fix: add /api prefix to all backend routes

Frontend expects /api/chat but backend had /chat.
Added APIRouter with prefix=/api to fix route mismatch.
This commit is contained in:
2026-04-13 17:09:17 -04:00
parent f000f13672
commit 2041dd9412
2 changed files with 115 additions and 11 deletions

97
docs/gpu-compatibility.md Normal file
View File

@@ -0,0 +1,97 @@
# GPU Compatibility Guide
## RTX 50-Series (Blackwell) Compatibility Notice
### Issue
NVIDIA RTX 50-series GPUs (RTX 5070, 5080, 5090) use CUDA capability `sm_120` (Blackwell architecture). PyTorch stable releases (up to 2.5.1) only officially support up to `sm_90` (Hopper/Ada).
**Warning you'll see:**
```
NVIDIA GeForce RTX 5070 with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.
```
### Current Status
- ✅ PyTorch detects the GPU
- ✅ CUDA operations generally work
- ⚠️ Some operations may fail or fall back to CPU
- ⚠️ Performance may not be optimal
### Workarounds
#### Option 1: Use PyTorch Nightly (Recommended for RTX 50-series)
```bash
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124
```
#### Option 2: Use Current Stable with Known Limitations
Many workloads work fine despite the warning. Test your specific use case.
#### Option 3: Wait for PyTorch 2.7
Full sm_120 support is expected in the next stable release.
### Installation Steps for KV-RAG with GPU
1. **Install CUDA-enabled PyTorch:**
```bash
pip install torch==2.5.1+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
2. **Install unsloth without dependencies:**
```bash
pip install unsloth --no-deps
pip install unsloth_zoo
```
3. **Install remaining training dependencies:**
```bash
pip install bitsandbytes accelerate peft transformers datasets trl
```
Note: Skip `xformers` as it may overwrite torch. Unsloth works without it.
### Verify GPU is Working
```python
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"CUDA version: {torch.version.cuda}")
```
### Ollama GPU Status
Ollama runs **natively on Windows** and uses GPU automatically when available:
- Check with: `nvidia-smi` (look for `ollama.exe` processes)
- Embedding model (`mxbai-embed-large:335m`) runs on GPU
- Chat models also use GPU when loaded
### Forge Training GPU Status
The training script uses `unsloth` + `trl` for QLoRA fine-tuning:
- Requires CUDA-enabled PyTorch
- Optimized for 12GB VRAM (RTX 5070)
- Uses 4-bit quantization + LoRA adapters
- See `src/companion/forge/train.py` for implementation
### Troubleshooting
**Issue:** `CUDA available: False` after installation
**Fix:** PyTorch was overwritten by a package dependency. Reinstall:
```bash
pip install torch==2.5.1+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 --force-reinstall
```
**Issue:** `xformers` overwrites torch
**Fix:** Skip xformers or install matching wheel:
```bash
# Skip for now - unsloth works without it
# Or install specific version matching your torch
pip install xformers==0.0.28.post3 --index-url https://download.pytorch.org/whl/cu121
```
### References
- [PyTorch CUDA Compatibility](https://pytorch.org/get-started/locally/)
- [NVIDIA CUDA Capability Matrix](https://developer.nvidia.com/cuda-gpus)
- [Unsloth Documentation](https://github.com/unsloth/unsloth)
- [RTX 50-Series Architecture](https://www.nvidia.com/en-us/geforce/graphics-cards/50-series/)