# GPU Compatibility Guide

## RTX 50-Series (Blackwell) Compatibility Notice

### Issue

NVIDIA RTX 50-series GPUs (RTX 5070, 5080, 5090) use CUDA capability `sm_120` (Blackwell architecture). PyTorch stable releases up to 2.5.1 officially support architectures only up to `sm_90` (Hopper; Ada is `sm_89`).

**Warning you'll see:**

```
NVIDIA GeForce RTX 5070 with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.
```

### Current Status

- ✅ PyTorch detects the GPU
- ✅ CUDA operations generally work
- ⚠️ Some operations may fail or fall back to CPU
- ⚠️ Performance may not be optimal

### Workarounds

#### Option 1: Use PyTorch Nightly (Recommended for RTX 50-series)

Blackwell kernels require a CUDA 12.8 build, so use the `cu128` nightly index:

```bash
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
```

#### Option 2: Use Current Stable with Known Limitations

Many workloads work fine despite the warning. Test your specific use case before relying on it.

#### Option 3: Wait for PyTorch 2.7

Full `sm_120` support is expected in the PyTorch 2.7 stable release.

### Installation Steps for KV-RAG with GPU

1. **Install CUDA-enabled PyTorch:**

   ```bash
   pip install torch==2.5.1+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
   ```

2. **Install unsloth without dependencies:**

   ```bash
   pip install unsloth --no-deps
   pip install unsloth_zoo
   ```

3. **Install remaining training dependencies:**

   ```bash
   pip install bitsandbytes accelerate peft transformers datasets trl
   ```

Note: Skip `xformers`, as installing it may overwrite torch. Unsloth works without it.
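
After installing, it can help to confirm whether the PyTorch build actually ships kernels for the GPU's compute capability, which is what the `sm_120` warning is about. The following is a minimal sketch and not part of KV-RAG: `arch_supported` and `report_installed_build` are hypothetical helper names, and the matching rule (same major version, compiled minor version no higher than the device's) is a simplification that ignores PTX-only `compute_XX` entries.

```python
import re


def arch_supported(capability, arch_list):
    """Return True if any compiled sm_XY target can run on a device with `capability`.

    Simplified rule (an assumption of this sketch): a binary built for sm_XY runs on
    devices with the same major version X and a minor version >= Y.
    """
    major, minor = capability
    for arch in arch_list:
        match = re.fullmatch(r"sm_(\d+)[a-z]?", arch)
        if match is None:
            continue  # skip PTX-only entries such as "compute_90"
        sm = int(match.group(1))
        if sm // 10 == major and sm % 10 <= minor:
            return True
    return False


def report_installed_build():
    """Print the device capability and the targets compiled into the installed torch."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
        return
    if not torch.cuda.is_available():
        print("CUDA is not available; cannot query the device capability.")
        return
    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    print(f"Device capability: sm_{capability[0]}{capability[1]}")
    print(f"Build targets: {' '.join(arch_list)}")
    print(f"Native kernels available: {arch_supported(capability, arch_list)}")


if __name__ == "__main__":
    report_installed_build()
```

With the architecture list from the warning message, `arch_supported((12, 0), [...])` returns `False`, which matches the behaviour described under Issue. A build that lists `sm_120` would return `True`.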
### Verify GPU is Working

```python
import torch

print(f"CUDA available: {torch.cuda.is_available()}")
# Only query the device name when CUDA is usable; otherwise the call raises.
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"CUDA version: {torch.version.cuda}")
```

### Ollama GPU Status

Ollama runs **natively on Windows** and uses the GPU automatically when available:

- Check with: `nvidia-smi` (look for `ollama.exe` processes)
- Embedding model (`mxbai-embed-large:335m`) runs on GPU
- Chat models also use GPU when loaded

### Forge Training GPU Status

The training script uses `unsloth` + `trl` for QLoRA fine-tuning:

- Requires CUDA-enabled PyTorch
- Optimized for 12GB VRAM (RTX 5070)
- Uses 4-bit quantization + LoRA adapters
- See `src/companion/forge/train.py` for implementation

### Troubleshooting

**Issue:** `CUDA available: False` after installation

**Fix:** PyTorch was overwritten by a package dependency. Reinstall:

```bash
pip install torch==2.5.1+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 --force-reinstall
```

**Issue:** `xformers` overwrites torch

**Fix:** Skip xformers or install a matching wheel:

```bash
# Skip for now - unsloth works without it
# Or install specific version matching your torch
pip install xformers==0.0.28.post3 --index-url https://download.pytorch.org/whl/cu121
```

### References

- [PyTorch CUDA Compatibility](https://pytorch.org/get-started/locally/)
- [NVIDIA CUDA Capability Matrix](https://developer.nvidia.com/cuda-gpus)
- [Unsloth Documentation](https://github.com/unsloth/unsloth)
- [RTX 50-Series Architecture](https://www.nvidia.com/en-us/geforce/graphics-cards/50-series/)
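
For the `CUDA available: False` case under Troubleshooting, one quick way to see whether a dependency replaced the CUDA wheel is to inspect the local tag of the installed version (for example `+cu121` versus `+cpu`). This is a sketch under the assumption that the installed wheel carries such a tag. `classify_torch_build` and `installed_torch_build` are hypothetical helper names, and a version with no tag is reported as `unknown` rather than guessed.

```python
from importlib import metadata


def classify_torch_build(version):
    """Classify a torch version string by its local tag (the part after '+')."""
    _, _, local = version.partition("+")
    if local.startswith("cu"):
        return "cuda"
    if local == "cpu":
        return "cpu"
    return "unknown"  # no recognisable tag; fall back to checking torch.version.cuda


def installed_torch_build():
    """Return (version, kind) for the installed torch package, or (None, 'missing')."""
    try:
        version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None, "missing"
    return version, classify_torch_build(version)


if __name__ == "__main__":
    version, kind = installed_torch_build()
    print(f"torch {version}: {kind} build")
```

If this reports a `cpu` build after installing other packages, the `--force-reinstall` command from Troubleshooting restores the CUDA wheel.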