# agents.md

## Project: Local AI Video Generation from Text Storyboards (Windows + RTX 5070 12GB)

### 0) Who this is for
The owner (user) is not an ML expert. The system must:
- be reproducible (conda + requirements)
- have guardrails (configs, logs, validation)
- be test-driven (pytest)
- maintain docs (developer + user)

---

## 1) High-Level Goal
Build a local pipeline that converts **text-only storyboards** into **15–30 second videos** by:
1) converting storyboard -> shot plan
2) generating shot clips (T2V or I2V when possible)
3) assembling clips into a final MP4
4) upscaling to 2K/4K if desired

This is a **shot-based** system, not “one prompt makes a whole movie”.

---

## 2) Hard Constraints (Hardware & OS)
Target system:
- Windows 11
- NVIDIA RTX 5070 (12GB VRAM) - Must use GPU.
- 32GB RAM
- 2TB SSD
- Anaconda available

Design must be stable under 12GB VRAM using:
- fp16/bf16
- attention slicing
- xFormers / SDPA where supported
- optional CPU offload
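
The memory-saving switches above could be applied through one small helper, sketched below. `MemoryOptions` and `apply_memory_options` are hypothetical names. The three `enable_*` method names are the ones diffusers pipelines commonly expose, but not every pipeline implements all of them, so the helper checks before calling. Precision (fp16/bf16) is normally chosen when the model is loaded and is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryOptions:
    """Hypothetical switches for the VRAM-saving features listed above."""
    attention_slicing: bool = True
    xformers: bool = False      # off by default; PyTorch SDPA needs no extra package
    cpu_offload: bool = False


def apply_memory_options(pipe, opts: MemoryOptions) -> list:
    """Call the matching enable_* methods on a diffusers-style pipeline.

    Returns the names of the methods that were actually called, so the
    result can be written to the run log.
    """
    wanted = []
    if opts.attention_slicing:
        wanted.append("enable_attention_slicing")
    if opts.xformers:
        wanted.append("enable_xformers_memory_efficient_attention")
    if opts.cpu_offload:
        wanted.append("enable_model_cpu_offload")

    applied = []
    for name in wanted:
        method = getattr(pipe, name, None)
        if callable(method):    # skip methods this pipeline does not expose
            method()
            applied.append(name)
    return applied
```

Because the helper only touches the pipeline through `getattr`, it can be unit-tested with a fake object and no GPU.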

---

## 3) Output Targets (Realistic)
- Native generation: 720p–1080p (preferred)
- Final delivery: 1080p required; 2K/4K optional, via upscaling
- Duration: 15–30s per video (may be segmented)
- FPS: 24 default
- Output: MP4 (H.264/H.265)

---

## 4) CUDA 13.1 Reality & PyTorch Plan (Critical)
User has CUDA Toolkit 13.1 installed. Prebuilt PyTorch binaries bundle their own CUDA runtime and do not use the locally installed toolkit.
We must NOT assume PyTorch will build/run against the local CUDA 13.1 toolkit.

**Plan:**
- Use **PyTorch prebuilt binaries that bundle the CUDA runtime**. The RTX 5070 is a Blackwell GPU (compute capability 12.0), so the build must target CUDA 12.8 or newer (e.g., cu128); older cu121 / cu124 builds do not include kernels for it.
- Rely on NVIDIA driver compatibility rather than the local CUDA toolkit version.
- Avoid compiling custom CUDA extensions unless necessary.

Implementation notes:
- Prefer installing PyTorch with pip inside the conda env, using an official CUDA 12.8+ wheel (official conda packages are no longer published for recent PyTorch releases).
- If xFormers causes build issues, use PyTorch SDPA and disable xFormers.
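
A quick way to confirm the plan is working is to ask the installed PyTorch which CUDA runtime it bundles and whether it sees the GPU. The sketch below uses a hypothetical function name, `describe_torch_cuda`, and degrades to a short report when PyTorch is not installed.

```python
def describe_torch_cuda() -> dict:
    """Report which CUDA runtime the installed PyTorch build bundles."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "bundled_cuda": torch.version.cuda,   # None on CPU-only builds
        "gpu_available": torch.cuda.is_available(),
    }
    if info["gpu_available"]:
        info["gpu_name"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```

If `gpu_available` is false, or the first generation call fails with a missing-kernel error, the installed build most likely does not target this GPU, regardless of the local toolkit version.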

---

## 5) Approved Stack (Do Not Deviate)
### Core
- Python 3.10 or 3.11 (conda env)
- PyTorch (CUDA 12.8+ build, e.g., cu128; required for the RTX 5070)
- diffusers + transformers + accelerate + safetensors
- ffmpeg for assembly
- opencv-python for frame IO (if needed)
- pydantic for config/schema validation
- rich / loguru for logs

### Testing
- pytest
- pytest-cov
- snapshot-ish tests where feasible (metadata + shapes, not visual perfection)

### Docs
- /docs/developer.md (developer documentation)
- /docs/user.md (user manual)
- Keep docs updated alongside code changes.

---

## 6) Video Models (Pragmatic Choices)
### Primary (target)
- WAN 2.x family (T2V; optional I2V if supported in chosen pipeline)
  Goal: best possible quality on consumer VRAM with chunking.

### Secondary / fallback
- Stable Video Diffusion (SVD) if WAN is unstable (image-to-video only, so each shot needs an init frame)
- LTX-Video (only if it fits and is stable in our stack)

All model backends must implement the same interface:
- `generate_shot(shot_spec) -> video_file + metadata`
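
One way to pin down the shared interface is a `typing.Protocol` plus a dummy backend for tests, sketched below. Everything except the `generate_shot(shot_spec)` shape is an assumption: `ShotSpec`, `ShotResult`, `VideoBackend`, `DummyBackend`, and all field names are hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Protocol


@dataclass(frozen=True)
class ShotSpec:
    """Hypothetical normalized shot; field names are illustrative."""
    shot_id: str
    prompt: str
    negative_prompt: str
    seed: int
    num_frames: int
    fps: int
    width: int
    height: int


@dataclass
class ShotResult:
    """The 'video_file + metadata' pair returned by every backend."""
    video_path: Path
    metadata: dict = field(default_factory=dict)


class VideoBackend(Protocol):
    name: str

    def generate_shot(self, shot_spec: ShotSpec) -> ShotResult:
        ...


class DummyBackend:
    """Writes a placeholder file instead of running a model (for tests)."""
    name = "dummy"

    def __init__(self, out_dir: Path):
        self.out_dir = out_dir

    def generate_shot(self, shot_spec: ShotSpec) -> ShotResult:
        self.out_dir.mkdir(parents=True, exist_ok=True)
        video_path = self.out_dir / f"{shot_spec.shot_id}.mp4"
        video_path.write_bytes(b"")   # empty placeholder, not a real MP4
        metadata = {
            "backend": self.name,
            "seed": shot_spec.seed,
            "num_frames": shot_spec.num_frames,
        }
        return ShotResult(video_path=video_path, metadata=metadata)
```

A real WAN, SVD, or LTX-Video backend would keep the same method signature, so the runner and the tests never need to know which model is behind it.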

---

## 7) Canonical Input: Storyboard JSON
Storyboard source is text-only (often AI-generated). We will store and validate it as JSON.

A template exists at: `templates/storyboard.template.json`

We will later build a utility script:
- input: plain text fields or a simple text format
- output: valid storyboard JSON
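
As a rough illustration of what "valid storyboard JSON" could mean, the sketch below checks a minimal structure using only the standard library. The field names (`shots`, `id`, `prompt`, `duration_s`) are assumptions, not the contents of the template file, and in the real project pydantic models would take the place of these hand-written checks.

```python
import json


class StoryboardError(ValueError):
    """Raised when a storyboard fails validation."""


def validate_storyboard(raw: str) -> dict:
    """Parse storyboard JSON text and check the hypothetical required fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise StoryboardError(f"not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise StoryboardError("top level must be a JSON object")

    shots = data.get("shots")
    if not isinstance(shots, list) or not shots:
        raise StoryboardError("'shots' must be a non-empty list")

    seen_ids = set()
    for index, shot in enumerate(shots):
        where = f"shots[{index}]"
        if not isinstance(shot, dict):
            raise StoryboardError(f"{where} must be an object")

        shot_id = shot.get("id")
        if not isinstance(shot_id, str) or not shot_id:
            raise StoryboardError(f"{where}.id must be a non-empty string")
        if shot_id in seen_ids:
            raise StoryboardError(f"duplicate shot id: {shot_id}")
        seen_ids.add(shot_id)

        prompt = shot.get("prompt")
        if not isinstance(prompt, str) or not prompt.strip():
            raise StoryboardError(f"{where}.prompt must be non-empty text")

        duration = shot.get("duration_s")
        is_number = isinstance(duration, (int, float)) and not isinstance(duration, bool)
        if not is_number or duration <= 0:
            raise StoryboardError(f"{where}.duration_s must be a positive number")
    return data
```

Error messages name the failing shot by index, which matters when the storyboard text was produced by another AI tool and the owner has to fix it by hand.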

---

## 8) Pipeline Modules (Required)
### A) Storyboard parsing & validation
- Load storyboard JSON
- Validate schema
- Expand defaults (fps, resolution, global style)
- Produce normalized shot list
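
Default expansion and the duration-to-frames step can be sketched as follows. The 24 fps default comes from the output targets; the 1280x720 default, the field names, and the names `NormalizedShot`, `frames_for`, and `normalize` are assumptions. Values resolve in the order shot, then storyboard, then built-in defaults.

```python
import math
from dataclasses import dataclass

# Built-in fallbacks; fps matches the 24 fps default, resolution is an assumption.
DEFAULTS = {"fps": 24, "width": 1280, "height": 720}


@dataclass(frozen=True)
class NormalizedShot:
    shot_id: str
    prompt: str
    duration_s: float
    fps: int
    width: int
    height: int
    num_frames: int


def frames_for(duration_s: float, fps: int) -> int:
    """Convert a duration to a whole frame count (round half up, minimum 1)."""
    return max(1, math.floor(duration_s * fps + 0.5))


def normalize(storyboard: dict) -> list:
    """Return one NormalizedShot per storyboard shot with defaults filled in."""
    globals_ = {k: v for k, v in storyboard.items() if k != "shots"}
    shots = []
    for shot in storyboard["shots"]:
        merged = {**DEFAULTS, **globals_, **shot}   # later dicts win
        shots.append(NormalizedShot(
            shot_id=merged["id"],
            prompt=merged["prompt"],
            duration_s=float(merged["duration_s"]),
            fps=int(merged["fps"]),
            width=int(merged["width"]),
            height=int(merged["height"]),
            num_frames=frames_for(merged["duration_s"], merged["fps"]),
        ))
    return shots
```

For example, a 4-second shot at the default 24 fps normalizes to 96 frames.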

### B) Prompt compilation
- Merge global style + shot prompt + camera notes
- Produce positive + negative prompts
- Keep deterministic via seeds
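
A prompt compiler along these lines can be a pure function, which makes "same input, same output" easy to test. In the sketch below the join order, the baseline negative prompt, the per-shot seed rule (`base_seed + index`), and all names are assumptions.

```python
from dataclasses import dataclass

# Hypothetical baseline negative prompt; tune per model.
BASE_NEGATIVE = "blurry, low quality, watermark, text"


@dataclass(frozen=True)
class CompiledPrompt:
    positive: str
    negative: str
    seed: int


def _join(parts) -> str:
    """Join non-empty parts with ', ' after trimming whitespace."""
    return ", ".join(p.strip() for p in parts if p and p.strip())


def compile_prompt(shot: dict, global_style: str = "",
                   base_seed: int = 0, index: int = 0) -> CompiledPrompt:
    """Merge shot prompt, camera notes and global style into final prompts."""
    positive = _join([shot.get("prompt", ""), shot.get("camera", ""), global_style])
    negative = _join([BASE_NEGATIVE, shot.get("negative", "")])
    seed = shot.get("seed")
    if seed is None:
        seed = base_seed + index   # explicit per-shot seeds take priority
    return CompiledPrompt(positive, negative, int(seed))
```

The function reads no clock, random state, or environment, so a stored storyboard plus a base seed fully determines every compiled prompt.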

### C) Generation runner (per shot)
- For each shot: generate clip
- Support:
  - seed control
  - chunking (e.g., generate 4–6 seconds then continue)
  - optional init frame handoff between shots
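
Chunking reduces to splitting a shot's frame range into pieces that fit in VRAM. The sketch below is one possible planner; the function name and the 5-second default (picked from the 4–6 second range above) are assumptions.

```python
def plan_chunks(num_frames: int, fps: int, max_chunk_s: float = 5.0) -> list:
    """Split a shot into (start_frame, frame_count) chunks of at most max_chunk_s."""
    if num_frames <= 0 or fps <= 0 or max_chunk_s <= 0:
        raise ValueError("num_frames, fps and max_chunk_s must be positive")

    max_frames = max(1, int(max_chunk_s * fps))
    chunks = []
    start = 0
    while start < num_frames:
        count = min(max_frames, num_frames - start)   # last chunk may be shorter
        chunks.append((start, count))
        start += count
    return chunks
```

With an I2V-capable backend, the last frame of one chunk can serve as the init frame of the next, which is the same handoff idea the list above mentions for consecutive shots.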

### D) Assembly
- Use ffmpeg concat to build final video
- Optionally add:
  - transitions
  - temp audio
  - burn-in shot IDs for debugging mode
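
Building the ffmpeg command as a plain list, separately from running it, keeps assembly testable without ffmpeg installed. The sketch below assumes the concat demuxer (`-f concat` with a list file); the function names are hypothetical. Stream copy (`-c copy`) only works when all clips share codec, resolution, and frame rate, so a re-encode switch is included.

```python
from pathlib import Path


def concat_list_text(clips: list) -> str:
    """Build the text of the list file read by ffmpeg's concat demuxer."""
    lines = []
    for clip in clips:
        # Forward slashes also work on Windows; a single quote inside a
        # quoted path is written as '\'' in the list file.
        safe = Path(clip).as_posix().replace("'", "'\\''")
        lines.append(f"file '{safe}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file: Path, output: Path, reencode: bool = False) -> list:
    """Return the ffmpeg argument list; the caller decides when to run it."""
    cmd = ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", str(list_file)]
    if reencode:
        cmd += ["-c:v", "libx264", "-pix_fmt", "yuv420p"]
    else:
        cmd += ["-c", "copy"]
    cmd.append(str(output))
    return cmd
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, and the same list can be written to the run metadata for reproducibility.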

### E) Upscaling (optional)
- Upscale final to 2K/4K (post step)
- Keep this modular so user can skip.

---

## 9) Determinism & Logging (Must Have)
For each shot and final render, save:
- prompts (positive/negative)
- seed(s)
- model + revision/hash info if available
- inference params (steps, cfg, sampler, resolution, fps, frames)
- timing + VRAM notes if possible

Every run produces a folder:
- `outputs/<project>/<timestamp>/`
  - shots/
  - assembled/
  - metadata/
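
The run folder and per-shot metadata files could be created as sketched below. The timestamp format, the one-JSON-file-per-shot layout, and the function names are assumptions; only the folder structure comes from the list above.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Optional


def create_run_dir(root: Path, project: str, now: Optional[datetime] = None) -> Path:
    """Create <root>/<project>/<timestamp>/ with its three subfolders."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    run_dir = root / project / stamp
    for sub in ("shots", "assembled", "metadata"):
        (run_dir / sub).mkdir(parents=True, exist_ok=True)
    return run_dir


def write_shot_metadata(run_dir: Path, shot_id: str, record: dict) -> Path:
    """Write one JSON file per shot; sorted keys keep diffs stable."""
    path = run_dir / "metadata" / f"{shot_id}.json"
    path.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return path
```

Passing `now` explicitly lets tests pin the timestamp, so the folder name itself can be asserted.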

---

## 10) Testing Rules (Hard Requirement)
- Tests must be written alongside features.
- Whenever a file/function is modified, corresponding tests MUST be updated.
- Prefer tests that verify:
  - schema validation works
  - prompt compiler output is stable
  - shot planner expands durations -> frame counts
  - assembly command lines are correct
  - metadata is generated correctly

Do not require “visual quality” assertions. Test structure and determinism.

---

## 11) Documentation Rules (Hard Requirement)
Maintain these continuously:
- docs/developer.md
  - architecture
  - install steps
  - how to run tests
  - how to add a new model backend
- docs/user.md
  - quickstart
  - how to create storyboard JSON
  - how to run generation
  - where outputs go
  - troubleshooting (VRAM, drivers, ffmpeg)

Docs must be updated whenever CLI flags, file formats, or workflows change.

---

## 12) Project Files to Maintain
Required:
- requirements.txt (pip deps)
- environment.yml (conda env)
- templates/storyboard.template.json
- docs/developer.md
- docs/user.md
- src/ (implementation)
- tests/ (pytest)

---

## 13) Definition of Done
A feature is “done” only if all of the following hold:
- it is implemented
- tests are added/updated
- docs are updated
- reproducible install instructions remain valid

End of file.