# agents.md
## Project: Local AI Video Generation from Text Storyboards (Windows + RTX 5070 12GB)
### 0) Who this is for
The owner (user) is not an ML expert. The system must:
- be reproducible (conda + requirements)
- have guardrails (configs, logs, validation)
- be test-driven (pytest)
- maintain docs (developer + user)
---
## 1) High-Level Goal
Build a local pipeline that converts **text-only storyboards** into **15–30 second videos** by:
1) converting storyboard -> shot plan
2) generating shot clips (T2V or I2V when possible)
3) assembling clips into a final MP4
4) upscaling to 2K/4K if desired
This is a **shot-based** system, not “one prompt makes a whole movie”.
---
## 2) Hard Constraints (Hardware & OS)
Target system:
- Windows 11
- NVIDIA RTX 5070 (12GB VRAM) - Must use GPU.
- 32GB RAM
- 2TB SSD
- Anaconda available
Design must be stable under 12GB VRAM using:
- fp16/bf16
- attention slicing
- xFormers / SDPA where supported
- optional CPU offload
---
## 3) Output Targets (Realistic)
- Native generation: 720p–1080p (preferred)
- Final delivery: 1080p required; 2K/4K via upscaling
- Duration: 15–30s per video (may be segmented)
- FPS: 24 default
- Output: MP4 (H.264/H.265)
---
## 4) CUDA 13.1 Reality & PyTorch Plan (Critical)
User has CUDA Toolkit 13.1 installed. Current PyTorch builds generally ship with and target CUDA 12.x runtimes.
We must NOT assume PyTorch will build or run against the local CUDA 13.1 toolkit.
**Plan:**
- Use **PyTorch prebuilt binaries that bundle CUDA runtime** (e.g., cu121 / cu124).
- Rely on NVIDIA driver compatibility rather than local CUDA toolkit version.
- Avoid compiling custom CUDA extensions unless necessary.
Implementation notes:
- Prefer installing PyTorch via conda or pip using official CUDA 12.x builds.
- If xFormers causes build issues, use PyTorch SDPA and disable xFormers.
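A quick runtime sanity check helps confirm the plan above: the prebuilt wheel bundles its own CUDA 12.x runtime, so only the NVIDIA driver matters, not the locally installed toolkit. This is a minimal sketch using standard `torch` APIs:

```python
# Sanity check: confirm the prebuilt PyTorch wheel (bundled CUDA 12.x
# runtime) can reach the GPU through the NVIDIA driver. The local
# CUDA 13.1 toolkit is irrelevant here -- only the driver matters.
import torch

def report_cuda() -> dict:
    info = {
        "torch": torch.__version__,
        "cuda_runtime": torch.version.cuda,  # version bundled in the wheel, e.g. "12.4"
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
        info["vram_gb"] = torch.cuda.get_device_properties(0).total_memory / 1e9
    return info

if __name__ == "__main__":
    print(report_cuda())
```

If `cuda_available` is `False` here, fix the install (driver or wheel index) before touching any model code.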
---
## 5) Approved Stack (Do Not Deviate)
### Core
- Python 3.10 or 3.11 (conda env)
- PyTorch (CUDA 12.x build: cu121 or cu124)
- diffusers + transformers + accelerate + safetensors
- ffmpeg for assembly
- opencv-python for frame IO (if needed)
- pydantic for config/schema validation
- rich / loguru for logs
### Testing
- pytest
- pytest-cov
- snapshot-ish tests where feasible (metadata + shapes, not visual perfection)
### Docs
- /docs/developer.md (developer documentation)
- /docs/user.md (user manual)
- Keep docs updated alongside code changes.
---
## 6) Video Models (Pragmatic Choices)
### Primary (target)
- WAN 2.x family (T2V; optional I2V if supported in chosen pipeline)
Goal: best possible quality on consumer VRAM with chunking.
### Secondary / fallback
- Stable Video Diffusion (SVD) if WAN is unstable
- LTX-Video (only if it fits and is stable in our stack)
All model backends must implement the same interface:
- generate_shot(shot_spec) -> video_file + metadata
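The shared interface could be sketched as a `Protocol` so backends stay swappable. All names below (`ShotSpec`, `ShotResult`, the field list) are illustrative assumptions, not a fixed API:

```python
# Sketch of the common backend interface: every model backend returns a
# clip path plus metadata, so the runner and assembler never need to know
# which model produced a shot. Field names here are assumptions.
from dataclasses import dataclass, field
from pathlib import Path
from typing import Protocol

@dataclass
class ShotSpec:
    shot_id: str
    prompt: str
    negative_prompt: str = ""
    seed: int = 0
    duration_s: float = 4.0
    fps: int = 24
    width: int = 1280
    height: int = 720

@dataclass
class ShotResult:
    video_file: Path
    metadata: dict = field(default_factory=dict)

class VideoBackend(Protocol):
    def generate_shot(self, shot_spec: ShotSpec) -> ShotResult: ...
```

A WAN, SVD, or LTX-Video wrapper then only has to implement `generate_shot`; everything downstream is backend-agnostic.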
---
## 7) Canonical Input: Storyboard JSON
Storyboard source is text-only (often AI-generated). We will store and validate it as JSON.
A template exists at: `templates/storyboard.template.json`
We will later build a utility script:
- input: plain text fields or a simple text format
- output: valid storyboard JSON
---
## 8) Pipeline Modules (Required)
### A) Storyboard parsing & validation
- Load storyboard JSON
- Validate schema
- Expand defaults (fps, resolution, global style)
- Produce normalized shot list
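Since pydantic is in the approved stack, validation could look like the sketch below. The field names are assumptions until `templates/storyboard.template.json` is finalized:

```python
# Hypothetical pydantic schema for storyboard validation. Field names and
# defaults are placeholders, not the final template.
from pydantic import BaseModel, Field

class Shot(BaseModel):
    id: str
    prompt: str
    camera: str = ""
    duration_s: float = Field(4.0, gt=0)

class Storyboard(BaseModel):
    title: str
    fps: int = 24                  # expanded default
    resolution: str = "1280x720"   # expanded default
    global_style: str = ""
    shots: list[Shot]
```

Loading a storyboard is then `Storyboard.model_validate(json.loads(text))` (or `parse_obj` on pydantic v1), and schema errors surface before any GPU time is spent.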
### B) Prompt compilation
- Merge global style + shot prompt + camera notes
- Produce positive + negative prompts
- Keep deterministic via seeds
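The merge step could be as small as a pure function, which keeps it trivially deterministic and testable. The ordering and default negative prompt below are assumptions:

```python
# Minimal prompt compiler sketch: merges shot prompt, camera notes, and
# global style into one positive prompt. A pure function with no state,
# so identical inputs always yield identical outputs.
def compile_prompt(global_style: str, shot_prompt: str, camera: str = "",
                   base_negative: str = "blurry, low quality") -> tuple[str, str]:
    parts = [p.strip() for p in (shot_prompt, camera, global_style) if p and p.strip()]
    positive = ", ".join(parts)
    return positive, base_negative
```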
### C) Generation runner (per shot)
- For each shot: generate clip
- Support:
  - seed control
  - chunking (e.g., generate 4–6 seconds then continue)
  - optional init frame handoff between shots
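The chunking rule above can be sketched as a pure duration-to-frames planner (chunk size and fps defaults are assumptions):

```python
# Sketch of duration -> frame-count chunking: a 15 s shot at 24 fps is
# split into several 4-6 s segments so each generation call stays within
# the 12 GB VRAM budget.
def plan_chunks(duration_s: float, fps: int = 24, max_chunk_s: float = 6.0) -> list[int]:
    """Return per-chunk frame counts that together cover the full duration."""
    total_frames = round(duration_s * fps)
    max_frames = int(max_chunk_s * fps)
    chunks = []
    while total_frames > 0:
        n = min(total_frames, max_frames)
        chunks.append(n)
        total_frames -= n
    return chunks
```

The runner would generate each chunk in turn, optionally passing the last frame of one chunk as the init frame of the next.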
### D) Assembly
- Use ffmpeg concat to build final video
- Optionally add:
  - transitions
  - temp audio
  - burn-in shot IDs for debugging mode
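Building (not yet executing) the ffmpeg concat command keeps assembly testable without ffmpeg installed. The ffmpeg concat demuxer expects a list file with one `file '<path>'` line per clip:

```python
# Sketch of ffmpeg concat assembly: write the list file the concat
# demuxer expects, and build the command as an argv list so tests can
# assert on it without invoking ffmpeg.
from pathlib import Path

def write_concat_list(clips: list[Path], list_file: Path) -> None:
    list_file.write_text("".join(f"file '{c.as_posix()}'\n" for c in clips))

def concat_cmd(list_file: Path, out_file: Path) -> list[str]:
    # -c copy avoids re-encoding when all clips share codec/resolution/fps
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", str(list_file), "-c", "copy", str(out_file)]
```

Running it is then `subprocess.run(concat_cmd(...), check=True)`; transitions or burn-ins would swap `-c copy` for a filter graph.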
### E) Upscaling (optional)
- Upscale final to 2K/4K (post step)
- Keep this modular so user can skip.
---
## 9) Determinism & Logging (Must Have)
For each shot and final render, save:
- prompts (positive/negative)
- seed(s)
- model + revision/hash info if available
- inference params (steps, cfg, sampler, resolution, fps, frames)
- timing + VRAM notes if possible
Every run produces a folder:
- outputs/<project>/<timestamp>/
  - shots/
  - assembled/
  - metadata/
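Persisting that record could be one small helper; the exact keys are assumptions, but the principle is that everything needed to reproduce a shot lands in a JSON file next to the clip:

```python
# Sketch of per-shot metadata persistence. Keys beyond shot_id are
# whatever the caller passes (seed, steps, cfg, sampler, ...), so the
# schema can grow without changing this helper.
import json
from datetime import datetime
from pathlib import Path

def save_shot_metadata(out_dir: Path, shot_id: str, **params) -> Path:
    out_dir.mkdir(parents=True, exist_ok=True)
    record = {"shot_id": shot_id, "saved_at": datetime.now().isoformat(), **params}
    path = out_dir / f"{shot_id}.json"
    path.write_text(json.dumps(record, indent=2, default=str))
    return path
```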
---
## 10) Testing Rules (Hard Requirement)
- Tests must be written alongside features.
- Whenever a file/function is modified, corresponding tests MUST be updated.
- Prefer tests that verify:
  - schema validation works
  - prompt compiler output is stable
  - shot planner expands durations -> frame counts
  - assembly command lines are correct
  - metadata is generated correctly
Do not require “visual quality” assertions. Test structure and determinism.
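An illustrative pytest module in this spirit (the helper here stands in for a real `src/` function and is not the final implementation):

```python
# Illustrative pytest style: assert on structure and determinism, never
# on visual quality. seconds_to_frames is a stand-in for a src/ helper.
def seconds_to_frames(duration_s: float, fps: int = 24) -> int:
    return round(duration_s * fps)

def test_duration_expands_to_frames():
    assert seconds_to_frames(15, 24) == 360
    assert seconds_to_frames(30, 24) == 720

def test_identical_inputs_identical_outputs():
    # determinism: the same inputs must always yield the same plan
    assert seconds_to_frames(4.5) == seconds_to_frames(4.5)
```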
---
## 11) Documentation Rules (Hard Requirement)
Maintain these continuously:
- docs/developer.md
  - architecture
  - install steps
  - how to run tests
  - how to add a new model backend
- docs/user.md
  - quickstart
  - how to create storyboard JSON
  - how to run generation
  - where outputs go
  - troubleshooting (VRAM, drivers, ffmpeg)
Docs must be updated whenever CLI flags, file formats, or workflows change.
---
## 12) Project Files to Maintain
Required:
- requirements.txt (pip deps)
- environment.yml (conda env)
- templates/storyboard.template.json
- docs/developer.md
- docs/user.md
- src/ (implementation)
- tests/ (pytest)
---
## 13) Definition of Done
A feature is “done” only if:
- implemented
- tests added/updated
- docs updated
- reproducible install instructions remain valid
End of file.