
Santhosh Janardhanan 2041dd9412 fix: add /api prefix to all backend routes

Frontend expects /api/chat but backend had /chat.
Added APIRouter with prefix=/api to fix route mismatch.

2026-04-13 17:09:17 -04:00


GPU Compatibility Guide

RTX 50-Series (Blackwell) Compatibility Notice

Issue

NVIDIA RTX 50-series GPUs (RTX 5070, 5080, 5090) use the Blackwell architecture, with CUDA compute capability sm_120. PyTorch stable releases up to 2.5.1 only officially support up to sm_90 (Hopper).

Warning you'll see:

```
NVIDIA GeForce RTX 5070 with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.
```
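Roughly, this warning reflects a comparison between the card's compute capability and the architectures compiled into the installed wheel. The sketch below reproduces that comparison under a stated assumption: CUDA's rule that a binary built for sm_XY runs on devices with the same major version X and a minor version of at least Y. PTX entries (compute_XY) that could be JIT-compiled are ignored for simplicity. The helper `has_compatible_kernels` is hypothetical; only `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are PyTorch calls.

```python
import re


def has_compatible_kernels(capability, arch_list):
    """Return True if arch_list holds an sm_XY binary usable on a (major, minor) device.

    Hypothetical helper: assumes same-major, lower-or-equal-minor binary
    compatibility and skips PTX (compute_XY) entries.
    """
    major, minor = capability
    for arch in arch_list:
        match = re.fullmatch(r"sm_(\d+)[a-z]?", arch)  # e.g. "sm_86", "sm_90a"
        if match is None:
            continue  # not a compiled binary entry
        arch_major, arch_minor = divmod(int(match.group(1)), 10)  # "120" -> (12, 0)
        if arch_major == major and arch_minor <= minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print(f"Device sm_{cap[0]}{cap[1]}, build archs: {archs}")
        print("Compatible kernels:", has_compatible_kernels(cap, archs))
    else:
        print("PyTorch with CUDA not available; skipping device check")
```

With the arch list from the warning above, an sm_120 card has no matching binary, while an Ada card (sm_89) is still covered by the sm_80/sm_86 builds.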

Current Status

✅ PyTorch detects the GPU
✅ CUDA operations generally work
⚠️ Some operations may fail or fall back to CPU
⚠️ Performance may not be optimal

Workarounds

Option 1: Use PyTorch Nightly (Recommended for RTX 50-series)

Blackwell (sm_120) kernels require CUDA 12.8 or newer, so use the cu128 nightly index:

```
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
```

Option 2: Use Current Stable with Known Limitations

Many workloads work fine despite the warning. Test your specific use case.

Option 3: Wait for PyTorch 2.7

Full sm_120 support is expected in PyTorch 2.7, which provides wheels built against CUDA 12.8 (cu128).
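To check whether an installed build has reached the 2.7 cutoff, a small sketch can parse the major.minor part of `torch.__version__`, which looks like `2.5.1+cu121` (nightlies add a pre-release part, for example `.devYYYYMMDD`). The helper `at_least` is hypothetical. Version alone is not sufficient: the wheel must also be a cu128 build, which the `+cu128` suffix or `torch.cuda.get_arch_list()` can confirm.

```python
import re


def at_least(version, minimum):
    """Compare the major.minor part of a torch version string against minimum.

    Hypothetical helper: ignores patch, pre-release (.devYYYYMMDD) and local
    (+cu121) parts, so a 2.7 nightly counts as 2.7.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == "__main__":
    try:
        import torch
        print(torch.__version__, "is >= 2.7:", at_least(torch.__version__, (2, 7)))
    except ImportError:
        print("PyTorch is not installed")
```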

Installation Steps for KV-RAG with GPU

Install CUDA-enabled PyTorch:

```
pip install torch==2.5.1+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

Install unsloth without dependencies:

```
pip install unsloth --no-deps
pip install unsloth_zoo
```

Install remaining training dependencies:
```
pip install bitsandbytes accelerate peft transformers datasets trl
```
Note: Skip xformers, since installing it may replace your torch build. Unsloth works without it.
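Because several of these packages can pull in a different torch, it can help to list what actually ended up installed after each step. A minimal sketch using the standard library's `importlib.metadata`; the package list mirrors the pip commands above and `installed_versions` is a hypothetical helper. A torch version without the expected `+cu121` suffix suggests pip swapped in another build.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional

# Distributions installed by the steps above (names as passed to pip).
PACKAGES = [
    "torch", "torchvision", "torchaudio", "unsloth", "unsloth_zoo",
    "bitsandbytes", "accelerate", "peft", "transformers", "datasets", "trl",
]


def installed_versions(names: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:14} {ver or 'NOT INSTALLED'}")
```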

Verify GPU is Working

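```python
import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")  # None on CPU-only builds
if torch.cuda.is_available():
    # get_device_name raises if no CUDA device is visible, so guard it
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```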
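`torch.cuda.is_available()` only confirms that the driver and device are visible; it does not prove the installed build has kernels for the card. A short sketch that actually launches a kernel (a small matrix multiply) surfaces that failure directly, typically as a RuntimeError mentioning "no kernel image is available for execution on the device". The function name and tensor size are arbitrary choices for illustration.

```python
def cuda_smoke_test() -> str:
    """Run a tiny matmul on the GPU and report the outcome as a string."""
    try:
        import torch
    except ImportError:
        return "skipped: PyTorch not installed"
    if not torch.cuda.is_available():
        return "skipped: CUDA not available"
    try:
        a = torch.ones(64, 64, device="cuda")
        b = a @ a  # every entry should be 64.0 (64 products of 1.0 * 1.0)
        torch.cuda.synchronize()  # surface asynchronous kernel errors here
        return "ok" if bool((b == 64).all().item()) else "wrong result"
    except RuntimeError as exc:  # e.g. missing kernels for this compute capability
        return f"failed: {exc}"


if __name__ == "__main__":
    print(cuda_smoke_test())
```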

Ollama GPU Status

Ollama runs natively on Windows and uses the GPU automatically when one is available:

- Check with: nvidia-smi (look for ollama.exe processes)
- Embedding model (mxbai-embed-large:335m) runs on GPU
- Chat models also use GPU when loaded
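Besides nvidia-smi, Ollama's HTTP API can report how much of each loaded model sits in VRAM. This sketch assumes Ollama listens on its default address `http://localhost:11434` and that entries returned by its `/api/ps` endpoint carry `size` and `size_vram` in bytes; if your version reports different fields, adjust the helper. Both function names are hypothetical.

```python
import json
import urllib.request

OLLAMA_PS_URL = "http://localhost:11434/api/ps"  # assumed default Ollama address


def gpu_fraction(model: dict) -> float:
    """Fraction of a loaded model's memory that lives in VRAM (0.0 to 1.0)."""
    size = model.get("size", 0)
    return model.get("size_vram", 0) / size if size else 0.0


def loaded_models(url: str = OLLAMA_PS_URL, timeout: float = 2.0) -> list:
    """Return the 'models' list from Ollama's /api/ps response."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp).get("models", [])


if __name__ == "__main__":
    try:
        for m in loaded_models():
            print(f"{m.get('name')}: {gpu_fraction(m):.0%} in VRAM")
    except OSError as exc:  # connection refused, timeout, etc.
        print(f"Could not reach Ollama: {exc}")
```

A value below 100% means part of the model was offloaded to system RAM and runs on the CPU.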

Forge Training GPU Status

The training script uses unsloth + trl for QLoRA fine-tuning:

- Requires CUDA-enabled PyTorch
- Optimized for 12GB VRAM (RTX 5070)
- Uses 4-bit quantization + LoRA adapters
- See src/companion/forge/train.py for implementation
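Rough arithmetic shows why 4-bit QLoRA fits a 12GB card: n parameters stored at b bits take n·b/8 bytes, and a rank-r LoRA adapter on a d×k weight matrix adds r(d+k) trainable parameters (B is d×r, A is r×k). The sketch uses a hypothetical 8B-parameter model and a 4096×4096 projection purely for illustration; it ignores activations, quantization constants, optimizer state, and CUDA overhead, so real usage is higher.

```python
GIB = 1024**3


def weight_memory_gib(n_params: float, bits: int) -> float:
    """Memory for model weights alone, in GiB."""
    return n_params * bits / 8 / GIB


def lora_params(d: int, k: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d x k weight (B: d x r, A: r x k)."""
    return rank * (d + k)


if __name__ == "__main__":
    n = 8e9  # hypothetical 8B-parameter model
    print(f"16-bit weights: {weight_memory_gib(n, 16):.1f} GiB")
    print(f" 4-bit weights: {weight_memory_gib(n, 4):.1f} GiB")
    print(f"LoRA r=16 on a 4096x4096 matrix: {lora_params(4096, 4096, 16):,} params")
```

For this hypothetical model, 16-bit weights alone need about 14.9 GiB, while 4-bit weights need about 3.7 GiB, leaving headroom for adapters and activations.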

Troubleshooting

Issue: CUDA available: False after installation.
Fix: PyTorch was likely overwritten by a package dependency. Reinstall:

```
pip install torch==2.5.1+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 --force-reinstall
```
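Before reinstalling, you can confirm that a CPU-only wheel is the cause: on CPU-only builds `torch.version.cuda` is None, and wheels from the PyTorch index often carry a local tag such as `+cpu` or `+cu121` in `torch.__version__`. A sketch; `diagnose_build` and its messages are hypothetical.

```python
from typing import Optional


def diagnose_build(version: str, cuda_version: Optional[str]) -> str:
    """Classify a torch build from torch.__version__ and torch.version.cuda."""
    local_tag = version.partition("+")[2]  # "" when there is no local tag
    if cuda_version is None or local_tag == "cpu":
        return "cpu-only build: reinstall the CUDA wheel"
    return f"CUDA {cuda_version} build" + (f" ({local_tag})" if local_tag else "")


if __name__ == "__main__":
    try:
        import torch
        print(diagnose_build(torch.__version__, torch.version.cuda))
    except ImportError:
        print("PyTorch is not installed")
```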

Issue: Installing xformers overwrites torch.
Fix: Skip xformers, or install the wheel that matches your torch version:

```
# Skip for now - unsloth works without it
# Or install specific version matching your torch
pip install xformers==0.0.28.post3 --index-url https://download.pytorch.org/whl/cu121
```
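If you do install xformers, you can inspect which torch version it declares in its metadata, which shows whether it is built against a different torch than yours. A sketch using `importlib.metadata.requires` from the standard library; it works only for an installed distribution, and `torch_requirements` is a hypothetical helper.

```python
import re
from importlib.metadata import PackageNotFoundError, requires
from typing import List, Optional


def torch_requirements(reqs: Optional[List[str]]) -> List[str]:
    """Keep only requirement strings for the 'torch' distribution itself."""
    # The lookahead stops 'torchvision', 'torch-tensorrt', etc. from matching.
    return [r for r in (reqs or []) if re.match(r"torch(?![\w.-])", r)]


if __name__ == "__main__":
    try:
        print(torch_requirements(requires("xformers")))
    except PackageNotFoundError:
        print("xformers is not installed")
```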
