GPU Compatibility Guide
RTX 50-Series (Blackwell) Compatibility Notice
Issue
NVIDIA RTX 50-series GPUs (RTX 5070, 5080, 5090) report CUDA compute capability 12.0 (sm_120, Blackwell architecture). PyTorch stable releases up to 2.5.1 are built only for architectures up to sm_90 (Hopper), so they ship no sm_120 kernels.
Warning you'll see:
NVIDIA GeForce RTX 5070 with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.
Current Status
- ✅ PyTorch detects the GPU
- ✅ CUDA operations generally work
- ⚠️ Some operations may fail or fall back to CPU
- ⚠️ Performance may not be optimal
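The warning above compares the card's compute capability against the architectures the installed wheel was compiled for. A minimal sketch of that comparison follows. The helper name `is_arch_supported` is hypothetical, and treating any build with the same major version as compatible is a simplifying assumption, not a guarantee that every kernel will run.

```python
def is_arch_supported(capability, arch_list):
    """Return True if arch_list holds an sm_ build with the same major version.

    capability: (major, minor) tuple, e.g. (12, 0) for sm_120.
    arch_list:  strings such as ["sm_80", "sm_90"].
    Assumption: a same-major build is treated as compatible.
    """
    major, _minor = capability
    for tag in arch_list:
        if not tag.startswith("sm_"):
            continue  # skip PTX-only entries such as "compute_90"
        digits = "".join(ch for ch in tag[3:] if ch.isdigit())
        if digits and int(digits) // 10 == major:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print(f"Device capability: sm_{cap[0]}{cap[1]}")
        print(f"Wheel built for: {' '.join(archs)}")
        print(f"Supported: {is_arch_supported(cap, archs)}")
```

With the architecture list from the warning, an RTX 5070 (capability 12.0) comes back as unsupported, which matches the message PyTorch prints.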
Workarounds
Option 1: Use PyTorch Nightly (Recommended for RTX 50-series)
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
Option 2: Use Current Stable with Known Limitations
Many workloads work fine despite the warning. Test your specific use case.
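One way to test a use case is a small smoke test that runs the same operation on CPU and GPU and compares the results. The sketch below uses a matrix multiply as a stand-in; the function name `gpu_smoke_test`, the matrix size, and the tolerances are illustrative assumptions. Swap in the operations your workload actually depends on.

```python
def gpu_smoke_test(size=256, tol=1e-3):
    """Compare a CPU matmul with the same matmul on the GPU.

    Returns "ok", "mismatch", "error: ...", or "skipped: ...".
    """
    try:
        import torch
    except ImportError:
        return "skipped: torch not installed"
    if not torch.cuda.is_available():
        return "skipped: CUDA not available"
    try:
        a = torch.randn(size, size)
        b = torch.randn(size, size)
        expected = a @ b                     # reference result on CPU
        actual = (a.cuda() @ b.cuda()).cpu() # same product on the GPU
    except RuntimeError as exc:
        # Missing kernels for the device typically surface here
        return f"error: {exc}"
    same = torch.allclose(expected, actual, rtol=tol, atol=tol)
    return "ok" if same else "mismatch"


if __name__ == "__main__":
    print(gpu_smoke_test())
```

A result of "ok" only covers the operation tested, so it does not rule out failures in other kernels.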
Option 3: Wait for PyTorch 2.7
Full sm_120 support is expected in the PyTorch 2.7 stable release.
Installation Steps for KV-RAG with GPU
- Install CUDA-enabled PyTorch:
  pip install torch==2.5.1+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
- Install unsloth without dependencies:
  pip install unsloth --no-deps
  pip install unsloth_zoo
- Install remaining training dependencies:
  pip install bitsandbytes accelerate peft transformers datasets trl
Note: Skip xformers as it may overwrite torch. Unsloth works without it.
Verify GPU is Working
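import torch
cuda_ok = torch.cuda.is_available()
print(f"CUDA available: {cuda_ok}")
if cuda_ok:  # get_device_name raises when no CUDA device is usable
    print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"CUDA version: {torch.version.cuda}")  # None on CPU-only builds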
Ollama GPU Status
Ollama runs natively on Windows and uses GPU automatically when available:
- Check with: nvidia-smi (look for ollama.exe processes)
- Embedding model (mxbai-embed-large:335m) runs on GPU
- Chat models also use GPU when loaded
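The nvidia-smi check above can be scripted. The following is a rough sketch that asks nvidia-smi for its compute process list in CSV form and filters for Ollama. The helper names `find_gpu_processes` and `query_compute_apps` are hypothetical, and an empty result is inconclusive because Ollama only holds GPU memory while a model is loaded.

```python
import subprocess


def find_gpu_processes(csv_text, needle="ollama"):
    """Return (pid, process_name) pairs whose name contains needle."""
    rows = []
    for line in csv_text.splitlines():
        parts = [p.strip() for p in line.split(",", 1)]
        if len(parts) == 2 and needle in parts[1].lower():
            rows.append((parts[0], parts[1]))
    return rows


def query_compute_apps():
    """Return nvidia-smi's compute process list as CSV text, or "" on failure."""
    cmd = [
        "nvidia-smi",
        "--query-compute-apps=pid,process_name",
        "--format=csv,noheader",
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return ""  # nvidia-smi missing or returned an error
    return result.stdout


if __name__ == "__main__":
    matches = find_gpu_processes(query_compute_apps())
    if matches:
        for pid, name in matches:
            print(f"Ollama on GPU: pid={pid} name={name}")
    else:
        print("No Ollama process found in the GPU process list")
```

Run it after sending a request to Ollama so that a model is resident when the query executes.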
Forge Training GPU Status
The training script uses unsloth + trl for QLoRA fine-tuning:
- Requires CUDA-enabled PyTorch
- Optimized for 12GB VRAM (RTX 5070)
- Uses 4-bit quantization + LoRA adapters
- See src/companion/forge/train.py for implementation
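To see why 4-bit quantization matters on a 12GB card, a back-of-the-envelope estimate of weight memory helps. The sketch below counts weights only and ignores activations, LoRA adapter parameters, optimizer state, and quantization overhead. The 7-billion-parameter figure is a hypothetical example, not the size of the model used by the training script.

```python
def quantized_weight_gib(num_params, bits_per_param=4):
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    params = 7e9  # hypothetical model size for illustration
    for bits in (16, 4):
        gib = quantized_weight_gib(params, bits)
        print(f"{bits}-bit weights: {gib:.2f} GiB")
```

Under these assumptions, 16-bit weights for such a model come to about 13 GiB, which already exceeds 12GB, while 4-bit weights take roughly a quarter of that and leave room for the rest of the training state.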
Troubleshooting
Issue: CUDA available: False after installation
Fix: PyTorch was overwritten by a package dependency. Reinstall:
pip install torch==2.5.1+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 --force-reinstall
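Before reinstalling, it can help to confirm that the wheel really was replaced by a CPU-only build, since a driver problem produces the same symptom. A small sketch follows; the helper name `diagnose_cuda` and its messages are hypothetical. It relies on `torch.version.cuda` being `None` for builds compiled without CUDA.

```python
def diagnose_cuda(cuda_build, cuda_available):
    """Classify the install from torch.version.cuda and torch.cuda.is_available()."""
    if cuda_build is None:
        return "cpu-only build: reinstall the CUDA wheel"
    if not cuda_available:
        return f"CUDA {cuda_build} build, no usable GPU: check the driver with nvidia-smi"
    return f"ok: CUDA {cuda_build} build with a usable GPU"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        print(f"torch version: {torch.__version__}")
        print(diagnose_cuda(torch.version.cuda, torch.cuda.is_available()))
```

Only the first outcome is fixed by the reinstall command above; the second points at the driver or the GPU rather than at PyTorch.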
Issue: xformers overwrites torch
Fix: Skip xformers, or install a wheel built for your installed torch version:
# Skip for now - unsloth works without it
# Or install specific version matching your torch
pip install xformers==0.0.28.post3 --index-url https://download.pytorch.org/whl/cu121