kv-ai/docs/gpu-compatibility.md
Santhosh Janardhanan 2041dd9412 fix: add /api prefix to all backend routes
2026-04-13 17:09:17 -04:00

GPU Compatibility Guide

RTX 50-Series (Blackwell) Compatibility Notice

Issue

NVIDIA RTX 50-series GPUs (RTX 5070, 5080, 5090) use CUDA compute capability sm_120 (Blackwell architecture). Stable PyTorch releases up to 2.5.1 ship kernels only up to sm_90 (Hopper; Ada Lovelace is sm_89), so they print a compatibility warning on these cards.

Warning you'll see:

NVIDIA GeForce RTX 5070 with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.
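The same comparison that produces this warning can be reproduced by hand. The sketch below is a minimal illustration, not PyTorch's own check: the helper names are hypothetical, and only `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are real PyTorch calls. `torch` is imported lazily so the pure helpers can be used without it.

```python
def capability_to_arch(major: int, minor: int) -> str:
    """Format a compute capability such as (12, 0) as 'sm_120'."""
    return f"sm_{major}{minor}"


def is_arch_compiled_in(capability: tuple, arch_list: list) -> bool:
    """True if the wheel ships kernels built for exactly this capability."""
    return capability_to_arch(*capability) in arch_list


def check_current_gpu() -> None:
    """Report on GPU 0. Needs torch installed (imported lazily here)."""
    import torch

    if not torch.cuda.is_available():
        print("CUDA is not available")
        return
    cap = torch.cuda.get_device_capability(0)  # (major, minor) tuple
    archs = torch.cuda.get_arch_list()         # architectures this build targets
    status = "compiled in" if is_arch_compiled_in(cap, archs) else "NOT compiled in"
    print(f"{capability_to_arch(*cap)} is {status}; build targets: {archs}")
```

Calling `check_current_gpu()` on an affected machine should report that sm_120 is not among the compiled architectures, which matches the warning text above.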

Current Status

  • PyTorch detects the GPU
  • CUDA operations generally work
  • ⚠️ Some operations may fail or fall back to CPU
  • ⚠️ Performance may not be optimal

Workarounds

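Option 1: Use a PyTorch Nightly Build

Nightly wheels built against CUDA 12.8 include sm_120 kernels; builds against older CUDA toolkits such as cu124 do not.

pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128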

Option 2: Use Current Stable with Known Limitations

Many workloads work fine despite the warning. Test your specific use case.
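One way to test a specific use case is a small harness that runs each operation you care about and records failures instead of stopping at the first one. This is a sketch under assumptions: `run_checks` and `torch_gpu_checks` are hypothetical names, and the two example checks are placeholders for whatever your workload actually calls.

```python
from typing import Callable, Dict


def run_checks(checks: Dict[str, Callable[[], None]]) -> Dict[str, str]:
    """Run each check; record 'ok' or the exception text instead of stopping."""
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:  # CUDA kernel failures surface as RuntimeError
            results[name] = f"FAILED: {type(exc).__name__}: {exc}"
    return results


def torch_gpu_checks() -> Dict[str, Callable[[], None]]:
    """Hypothetical example checks; replace with the ops your workload uses."""
    import torch  # lazy import: run_checks itself has no torch dependency

    def matmul() -> None:
        a = torch.randn(256, 256, device="cuda")
        b = torch.randn(256, 256, device="cuda")
        # Compare the GPU product against the same product computed on CPU.
        assert torch.allclose((a @ b).cpu(), a.cpu() @ b.cpu(), atol=1e-2)

    def fp16_softmax() -> None:
        x = torch.randn(1024, device="cuda", dtype=torch.float16)
        torch.nn.functional.softmax(x, dim=0).sum().item()

    return {"matmul": matmul, "fp16 softmax": fp16_softmax}
```

On a machine with a CUDA build installed, `print(run_checks(torch_gpu_checks()))` gives a per-operation summary.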

Option 3: Wait for PyTorch 2.7

Full sm_120 support is expected in the PyTorch 2.7 stable release, which is built against CUDA 12.8.

Installation Steps for KV-RAG with GPU

  1. Install CUDA-enabled PyTorch:

    pip install torch==2.5.1+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
    
  2. Install unsloth without dependencies:

    pip install unsloth --no-deps
    pip install unsloth_zoo
    
  3. Install remaining training dependencies:

    pip install bitsandbytes accelerate peft transformers datasets trl
    

    Note: Skip xformers as it may overwrite torch. Unsloth works without it.
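Because step 2 uses `--no-deps`, pip will not report a missing dependency, so it can help to confirm afterwards that every package from the steps above is actually present. The sketch below uses only the standard library; `installed_versions` is a hypothetical helper, and the names in `REQUIRED` are copied from the commands above on the assumption that each pip name is also the installed distribution name.

```python
from importlib import metadata
from typing import Dict, List, Optional

# Distribution names taken from the install steps above.
REQUIRED = ["torch", "unsloth", "unsloth_zoo", "bitsandbytes",
            "accelerate", "peft", "transformers", "datasets", "trl"]


def installed_versions(names: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:15} {version or 'MISSING'}")
```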

Verify GPU is Working

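import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")  # None on a CPU-only build
if torch.cuda.is_available():
    # get_device_name() raises when no CUDA device is usable, so guard it
    print(f"GPU: {torch.cuda.get_device_name(0)}")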

Ollama GPU Status

Ollama runs natively on Windows and uses GPU automatically when available:

  • Check with: nvidia-smi (look for ollama.exe processes)
  • Embedding model (mxbai-embed-large:335m) runs on GPU
  • Chat models also use GPU when loaded
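The nvidia-smi check can also be scripted. The following is a rough sketch, assuming `nvidia-smi` is on the PATH; the function names are hypothetical. On Windows, per-process memory may be reported as unavailable, and compute processes may not be listed at all, so an empty result is not proof that Ollama is running on CPU.

```python
import subprocess
from typing import Dict, List

QUERY = ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader"]


def parse_compute_apps(csv_text: str) -> List[Dict[str, str]]:
    """Parse 'pid, name, memory' rows; the name may itself contain commas."""
    rows = []
    for line in csv_text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) >= 3:
            rows.append({"pid": parts[0],
                         "name": ",".join(parts[1:-1]),
                         "memory": parts[-1]})
    return rows


def ollama_processes(csv_text: str) -> List[Dict[str, str]]:
    """Keep only rows whose process name mentions ollama."""
    return [r for r in parse_compute_apps(csv_text) if "ollama" in r["name"].lower()]


def query_gpu_processes() -> str:
    """Run nvidia-smi and return its raw CSV output."""
    return subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
```

Typical use is `ollama_processes(query_gpu_processes())` after a model has been loaded, since Ollama only occupies the GPU while a model is resident.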

Forge Training GPU Status

The training script uses unsloth + trl for QLoRA fine-tuning:

  • Requires CUDA-enabled PyTorch
  • Optimized for 12GB VRAM (RTX 5070)
  • Uses 4-bit quantization + LoRA adapters
  • See src/companion/forge/train.py for implementation
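A back-of-the-envelope calculation shows why 4-bit quantization plus LoRA fits in 12 GB. The figures below are illustrative assumptions, not values from train.py: a hypothetical 7-billion-parameter base model, a 4096-wide linear layer, and LoRA rank 16. The estimate covers weights only and ignores quantization constants, activations, and optimizer state.

```python
def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out linear layer.

    LoRA learns two small matrices, A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


if __name__ == "__main__":
    n = 7e9  # hypothetical model size
    print(f"fp16 weights:  {weight_gib(n, 16):.2f} GiB")  # too large for 12 GB
    print(f"4-bit weights: {weight_gib(n, 4):.2f} GiB")
    print(f"LoRA params per 4096x4096 layer at rank 16: {lora_params(4096, 4096, 16)}")
```

Under these assumptions the frozen base weights shrink by a factor of four, and the trainable adapter matrices are small by comparison, which leaves the remaining VRAM for activations and optimizer state.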

Troubleshooting

Issue: CUDA available: False after installation
Fix: PyTorch was overwritten by a package dependency. Reinstall:

pip install torch==2.5.1+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 --force-reinstall
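To spot an overwritten install without importing torch, the installed version string can be inspected: wheels from the PyTorch index carry a local tag such as `+cu121` or `+cpu`. This is a sketch with hypothetical helper names. A missing tag is inconclusive, since wheels from PyPI may omit it, so `torch.version.cuda` remains the authoritative check.

```python
from importlib import metadata
from typing import Optional


def local_build_tag(version: str) -> Optional[str]:
    """Return the label after '+', e.g. 'cu121' for '2.5.1+cu121', else None."""
    _, sep, tag = version.partition("+")
    return tag if sep else None


def is_cuda_build(version: str) -> bool:
    """True only when the version carries an explicit CUDA tag."""
    tag = local_build_tag(version)
    return tag is not None and tag.startswith("cu")


def installed_torch_version() -> Optional[str]:
    """Installed torch version from package metadata, or None if not installed."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_torch_version()
    print("torch not installed" if v is None else f"{v}: CUDA tag present = {is_cuda_build(v)}")
```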

Issue: xformers overwrites torch
Fix: Skip xformers, or install the wheel that matches your torch version:

# Skip for now - unsloth works without it
# Or install specific version matching your torch
pip install xformers==0.0.28.post3 --index-url https://download.pytorch.org/whl/cu121
