# agents.md

## Project: Local AI Video Generation from Text Storyboards (Windows + RTX 5070 12GB)

### 0) Who this is for

The owner (user) is not an ML expert. The system must:

- be reproducible (conda + requirements)
- have guardrails (configs, logs, validation)
- be test-driven (pytest)
- maintain docs (developer + user)

---

## 1) High-Level Goal

Build a local pipeline that converts **text-only storyboards** into **15–30 second videos** by:

1) converting the storyboard -> shot plan
2) generating shot clips (T2V, or I2V when possible)
3) assembling clips into a final MP4
4) upscaling to 2K/4K if desired

This is a **shot-based** system, not “one prompt makes a whole movie”.

---

## 2) Hard Constraints (Hardware & OS)

Target system:

- Windows 11
- NVIDIA RTX 5070 (12GB VRAM) - must use the GPU.
- 32GB RAM
- 2TB SSD
- Anaconda available

The design must be stable under 12GB VRAM using:

- fp16/bf16
- attention slicing
- xFormers / SDPA where supported
- optional CPU offload
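
The options above map to standard diffusers memory helpers. A minimal sketch (the `pipe` argument is any loaded diffusers pipeline; the helper function name is ours):

```python
# Low-VRAM sketch: apply the memory-saving options listed above to a
# diffusers pipeline. fp16/bf16 is chosen at load time via torch_dtype
# (not shown here); these calls are the standard diffusers helpers.
def apply_low_vram_settings(pipe, cpu_offload: bool = False):
    pipe.enable_attention_slicing()        # trade a little speed for VRAM
    if cpu_offload and hasattr(pipe, "enable_model_cpu_offload"):
        pipe.enable_model_cpu_offload()    # optional CPU offload (accelerate)
    return pipe
```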

---

## 3) Output Targets (Realistic)

- Native generation: 720p–1080p (preferred)
- Final delivery: 1080p required; 2K/4K via upscaling
- Duration: 15–30s per video (may be segmented)
- FPS: 24 default
- Output: MP4 (H.264/H.265)

---

## 4) CUDA 13.1 Reality & PyTorch Plan (Critical)

The user has CUDA Toolkit 13.1 installed. Current PyTorch builds generally ship with, and target, CUDA 12.x runtimes.
We must NOT assume PyTorch will build or run against the local CUDA 13.1 toolkit.

**Plan:**

- Use **PyTorch prebuilt binaries that bundle the CUDA runtime** (e.g., cu121 / cu124).
- Rely on NVIDIA driver compatibility rather than the local CUDA toolkit version.
- Avoid compiling custom CUDA extensions unless necessary.

Implementation notes:

- Prefer installing PyTorch via conda or pip using the official CUDA 12.x builds.
- If xFormers causes build issues, use PyTorch SDPA and disable xFormers.
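
A quick sanity check along these lines confirms which CUDA runtime the installed wheel bundles (the function name is ours; it degrades gracefully if torch is not installed yet):

```python
# Sanity check: report the torch build, its bundled CUDA runtime, and GPU
# visibility. Returns a stub dict if torch is not installed yet.
def cuda_report() -> dict:
    try:
        import torch
    except ImportError:
        return {"torch": None, "note": "torch not installed yet"}
    info = {
        "torch": torch.__version__,
        "cuda_runtime": torch.version.cuda,   # bundled runtime, e.g. "12.1"
        "gpu_available": torch.cuda.is_available(),
    }
    if info["gpu_available"]:
        info["device"] = torch.cuda.get_device_name(0)
    return info

print(cuda_report())
```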

---

## 5) Approved Stack (Do Not Deviate)

### Core

- Python 3.10 or 3.11 (conda env)
- PyTorch (CUDA 12.x build: cu121 or cu124)
- diffusers + transformers + accelerate + safetensors
- ffmpeg for assembly
- opencv-python for frame I/O (if needed)
- pydantic for config/schema validation
- rich / loguru for logs

### Testing

- pytest
- pytest-cov
- snapshot-style tests where feasible (metadata + shapes, not visual perfection)

### Docs

- /docs/developer.md (developer documentation)
- /docs/user.md (user manual)
- Keep docs updated alongside code changes.

---

## 6) Video Models (Pragmatic Choices)

### Primary (target)

- WAN 2.x family (T2V; optional I2V if supported in the chosen pipeline)

Goal: the best possible quality on consumer VRAM, with chunking.

### Secondary / fallback

- Stable Video Diffusion (SVD) if WAN is unstable
- LTX-Video (only if it fits and is stable in our stack)

All model backends must implement the same interface:

- generate_shot(shot_spec) -> video_file + metadata
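
A sketch of that interface (class and field names here are our assumptions, not a fixed API):

```python
# Sketch of the common backend interface. Every model backend (WAN, SVD,
# LTX-Video) would implement generate_shot with this shape.
from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class ShotSpec:
    shot_id: str
    prompt: str
    negative_prompt: str = ""
    seed: int = 0
    duration_s: float = 4.0
    fps: int = 24

@dataclass
class ShotResult:
    video_file: str                          # path to the rendered clip
    metadata: dict = field(default_factory=dict)

class VideoBackend(Protocol):
    def generate_shot(self, shot: ShotSpec) -> ShotResult: ...
```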

---

## 7) Canonical Input: Storyboard JSON

The storyboard source is text-only (often AI-generated). We will store and validate it as JSON.

A template exists at: `templates/storyboard.template.json`

We will later build a utility script:

- input: plain text fields or a simple text format
- output: valid storyboard JSON
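
For illustration only, a minimal storyboard might look like the following; the field names are assumptions, and the actual schema is defined by `templates/storyboard.template.json`:

```json
{
  "project": "demo",
  "defaults": { "fps": 24, "resolution": [1280, 720], "style": "cinematic, soft light" },
  "shots": [
    { "id": "s01", "prompt": "a lighthouse at dawn", "duration_s": 4.0 },
    { "id": "s02", "prompt": "waves crashing on rocks", "duration_s": 5.0 }
  ]
}
```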

---

## 8) Pipeline Modules (Required)

### A) Storyboard parsing & validation

- Load the storyboard JSON
- Validate the schema
- Expand defaults (fps, resolution, global style)
- Produce a normalized shot list
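
The expansion step can be sketched as follows (plain dicts for brevity; the real module should validate with pydantic per section 5, and the field names follow our hypothetical storyboard shape):

```python
# Sketch: expand storyboard defaults into a normalized shot list and
# convert durations into frame counts. Field names are assumptions.
def normalize_shots(storyboard: dict) -> list:
    defaults = storyboard.get("defaults", {})
    fps = defaults.get("fps", 24)
    resolution = defaults.get("resolution", [1280, 720])
    style = defaults.get("style", "")
    normalized = []
    for shot in storyboard["shots"]:
        shot_fps = shot.get("fps", fps)
        duration_s = shot.get("duration_s", 4.0)
        normalized.append({
            "id": shot["id"],
            "prompt": shot["prompt"],
            "style": style,
            "fps": shot_fps,
            "resolution": shot.get("resolution", resolution),
            "duration_s": duration_s,
            "num_frames": round(duration_s * shot_fps),
        })
    return normalized
```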

### B) Prompt compilation

- Merge global style + shot prompt + camera notes
- Produce positive + negative prompts
- Keep output deterministic via seeds
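
A minimal sketch of the merge (the default negative terms are placeholders, not a tuned list):

```python
# Sketch of the prompt compiler: merge global style, shot prompt, and
# camera notes into one positive prompt, plus a standing negative prompt.
def compile_prompts(style: str, shot_prompt: str, camera: str = "",
                    extra_negative: str = "") -> tuple:
    positive = ", ".join(part for part in (style, shot_prompt, camera) if part)
    base_negative = "blurry, low quality, watermark"  # placeholder terms
    negative = ", ".join(part for part in (base_negative, extra_negative) if part)
    return positive, negative
```

Because the output is a pure function of its inputs, the same storyboard plus the same seed yields the same generation request.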

### C) Generation runner (per shot)

- For each shot: generate a clip
- Support:
  - seed control
  - chunking (e.g., generate 4–6 seconds, then continue)
  - optional init frame handoff between shots
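
The chunking rule can be sketched as a small planner (the 5-second cap is an example value, not a tuned one):

```python
# Sketch: split a shot's duration into generation chunks of at most
# max_chunk_s seconds, so each chunk fits comfortably in 12GB VRAM.
def plan_chunks(duration_s: float, max_chunk_s: float = 5.0) -> list:
    chunks = []
    remaining = duration_s
    while remaining > 0:
        chunk = min(remaining, max_chunk_s)
        chunks.append(chunk)
        remaining -= chunk
    return chunks
```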

### D) Assembly

- Use ffmpeg concat to build the final video
- Optionally add:
  - transitions
  - temp audio
  - burned-in shot IDs for debugging mode
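
The concat step can be sketched like this (paths are illustrative; `-c copy` assumes all clips share codec, fps, and resolution, which the pipeline should guarantee):

```python
# Sketch: build the ffmpeg concat-demuxer inputs. Returns the list-file
# body and the command argv; the caller writes the file and runs ffmpeg.
def concat_plan(clip_paths: list, list_file: str, out_file: str) -> tuple:
    list_body = "\n".join(f"file '{p}'" for p in clip_paths)
    cmd = ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
           "-i", list_file, "-c", "copy", out_file]
    return list_body, cmd
```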

### E) Upscaling (optional)

- Upscale the final video to 2K/4K (post-processing step)
- Keep this modular so the user can skip it.

---

## 9) Determinism & Logging (Must Have)

For each shot and the final render, save:

- prompts (positive/negative)
- seed(s)
- model + revision/hash info if available
- inference params (steps, cfg, sampler, resolution, fps, frames)
- timing + VRAM notes if possible

Every run produces a folder:

- outputs/<project>/<timestamp>/
  - shots/
  - assembled/
  - metadata/
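
A sketch of the run-folder creation and metadata writing (function names are ours):

```python
# Sketch: create the run folder layout described above and persist
# per-shot metadata as JSON alongside the rendered clips.
import json
import time
from pathlib import Path

def make_run_dir(project: str, root: str = "outputs") -> Path:
    stamp = time.strftime("%Y%m%d-%H%M%S")
    run = Path(root) / project / stamp
    for sub in ("shots", "assembled", "metadata"):
        (run / sub).mkdir(parents=True, exist_ok=True)
    return run

def save_shot_metadata(run: Path, shot_id: str, meta: dict) -> Path:
    path = run / "metadata" / f"{shot_id}.json"
    path.write_text(json.dumps(meta, indent=2))
    return path
```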

---

## 10) Testing Rules (Hard Requirement)

- Tests must be written alongside features.
- Whenever a file/function is modified, the corresponding tests MUST be updated.
- Prefer tests that verify:
  - schema validation works
  - prompt compiler output is stable
  - the shot planner expands durations -> frame counts
  - assembly command lines are correct
  - metadata is generated correctly
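
A representative structural test in this spirit (the function under test is inlined here for illustration):

```python
# Sketch of a structural pytest-style test: verify the shot planner's
# duration -> frame-count expansion, not visual quality.
def expand_frames(duration_s: float, fps: int = 24) -> int:
    return round(duration_s * fps)

def test_expand_frames():
    assert expand_frames(4.0) == 96
    assert expand_frames(2.5) == 60
    assert expand_frames(0.0) == 0
```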

Do not require “visual quality” assertions. Test structure and determinism.

---

## 11) Documentation Rules (Hard Requirement)

Maintain these continuously:

- docs/developer.md
  - architecture
  - install steps
  - how to run tests
  - how to add a new model backend
- docs/user.md
  - quickstart
  - how to create storyboard JSON
  - how to run generation
  - where outputs go
  - troubleshooting (VRAM, drivers, ffmpeg)

Docs must be updated whenever CLI flags, file formats, or workflows change.

---

## 12) Project Files to Maintain

Required:

- requirements.txt (pip deps)
- environment.yml (conda env)
- templates/storyboard.template.json
- docs/developer.md
- docs/user.md
- src/ (implementation)
- tests/ (pytest)

---

## 13) Definition of Done

A feature is “done” only if:

- it is implemented
- tests are added/updated
- docs are updated
- reproducible install instructions remain valid

End of file.