# agents.md
## Project: Local AI Video Generation from Text Storyboards (Windows + RTX 5070 12GB)

### 0) Who this is for
The owner (user) is not an ML expert. The system must:
- be reproducible (conda + requirements)
- have guardrails (configs, logs, validation)
- be test-driven (pytest)
- maintain docs (developer + user)

---

## 1) High-Level Goal
Build a local pipeline that converts text-only storyboards into 15-30 second videos by:
1) converting storyboard -> shot plan
2) generating shot clips (T2V or I2V when possible)
3) assembling clips into a final MP4
4) upscaling to 2K/4K if desired

This is a shot-based system, not "one prompt makes a whole movie".

---

## 2) Hard Constraints (Hardware & OS)
Target system:
- Windows 11
- NVIDIA RTX 5070 (12GB VRAM) - Must use GPU.
- 32GB RAM
- 2TB SSD
- Anaconda available

Design must be stable under 12GB VRAM using:
- fp16/bf16
- attention slicing
- xFormers / SDPA where supported
- optional CPU offload

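
A minimal sketch of how the VRAM guardrails above might be applied to a diffusers-style pipeline object. The helper name `apply_memory_guards` and its flags are hypothetical; `enable_attention_slicing()` and `enable_model_cpu_offload()` are existing diffusers pipeline methods, but not every pipeline class exposes both, so the helper checks before calling. Precision (fp16/bf16) is normally chosen when the pipeline is loaded and is not handled here.

```python
from typing import Any, List


def apply_memory_guards(
    pipe: Any, *, attention_slicing: bool = True, cpu_offload: bool = False
) -> List[str]:
    """Enable VRAM-saving options that the given pipeline actually supports.

    Returns the names of the options that were applied, so they can be logged.
    """
    applied: List[str] = []
    # Attention slicing trades some speed for a lower peak memory footprint.
    if attention_slicing and hasattr(pipe, "enable_attention_slicing"):
        pipe.enable_attention_slicing()
        applied.append("attention_slicing")
    # CPU offload keeps idle sub-models in system RAM (slower, but fits 12GB more easily).
    if cpu_offload and hasattr(pipe, "enable_model_cpu_offload"):
        pipe.enable_model_cpu_offload()
        applied.append("model_cpu_offload")
    return applied
```

Returning the list of applied options makes it easy to record them in per-shot metadata.
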
---

## 3) Output Targets (Realistic)
- Native generation: 720p-1080p (preferred)
- Final delivery: 1080p required; 2K/4K via upscaling
- Duration: 15-30s per video (may be segmented)
- FPS: 24 default
- Output: MP4 (H.264/H.265)

---

## 4) CUDA 13.1 Reality & PyTorch Plan (Critical)
The user has CUDA Toolkit 13.1 installed. Current PyTorch builds generally ship with and target CUDA 12.x runtimes.
We must NOT assume PyTorch will build/run against the local CUDA 13.1 toolkit.

Plan:
- Use PyTorch prebuilt binaries that bundle the CUDA runtime (cu121/cu124/cu128; the RTX 5070 needs cu128, see below).
- Rely on NVIDIA driver compatibility rather than the local CUDA toolkit version.
- Avoid compiling custom CUDA extensions unless necessary.

Implementation notes:
- For RTX 5070 (sm_120), use CUDA 12.8 wheels via pip:
  `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128`
- Prefer conda for Python, ffmpeg, and general deps; use pip for torch if sm_120 support is required.
- If xFormers causes build issues, use PyTorch SDPA and disable xFormers.

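
A small environment check can confirm that the installed wheel, not the local toolkit, determines the CUDA runtime. The sketch below is an assumption about how such a check could look; the function name `describe_torch_cuda` is hypothetical. It only reads standard PyTorch attributes and degrades gracefully when torch is missing.

```python
from typing import Any, Dict


def describe_torch_cuda() -> Dict[str, Any]:
    """Report what the installed PyTorch wheel sees; the local CUDA toolkit is not consulted."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    info: Dict[str, Any] = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA runtime version bundled with the wheel; None on CPU-only builds.
        "bundled_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
        # Tuple (major, minor); sm_120 would be reported as (12, 0).
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```

If `cuda_available` is false on the target machine, the wheel or the driver is the likely culprit rather than the CUDA 13.1 toolkit.
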
---

## 5) Approved Stack (Do Not Deviate)
### Core
- Python 3.10 or 3.11 (conda env)
- PyTorch (CUDA 12.x build, cu121/cu124/cu128; cu128 for sm_120, see section 4)
- diffusers + transformers + accelerate + safetensors
- ffmpeg for assembly
- opencv-python for frame IO (if needed)
- pydantic for config/schema validation
- rich / loguru for logs
- ftfy for text normalization (required by WAN)

### Testing
- pytest
- pytest-cov
- snapshot-ish tests where feasible (metadata + shapes, not visual perfection)

### Docs
- docs/developer.md (developer documentation)
- docs/user.md (user manual)
- Keep docs updated alongside code changes.

---

## 6) Video Models (Pragmatic Choices)
### Primary (target)
- WAN 2.x family (T2V; optional I2V if supported in chosen pipeline)
  Goal: best possible quality on consumer VRAM with chunking.

### Secondary / fallback
- Stable Video Diffusion (SVD) if WAN is unstable (image-to-video only, so it needs an init frame)
- LTX-Video (only if it fits and is stable in our stack)

All model backends must implement the same interface:
- `generate_shot(shot_spec) -> video_file + metadata`

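
One way the shared backend interface could be expressed is a `typing.Protocol`. This is a sketch under assumptions: only `generate_shot(shot_spec)` returning a video file plus metadata comes from this document; the `ShotSpec` fields, the `VideoBackend` name, and the `DummyBackend` stand-in are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, Protocol, Tuple


@dataclass(frozen=True)
class ShotSpec:
    """Hypothetical per-shot input; the real fields would follow the storyboard schema."""
    shot_id: str
    prompt: str
    negative_prompt: str = ""
    seed: int = 0
    num_frames: int = 96
    fps: int = 24
    width: int = 1280
    height: int = 720


class VideoBackend(Protocol):
    """Every model backend (WAN, SVD, LTX-Video, ...) would expose this same method."""
    name: str

    def generate_shot(self, shot_spec: ShotSpec) -> Tuple[Path, Dict[str, Any]]:
        ...


class DummyBackend:
    """Test stand-in: writes an empty placeholder file, no GPU or model needed."""
    name = "dummy"

    def __init__(self, out_dir: Path) -> None:
        self.out_dir = Path(out_dir)

    def generate_shot(self, shot_spec: ShotSpec) -> Tuple[Path, Dict[str, Any]]:
        self.out_dir.mkdir(parents=True, exist_ok=True)
        video_file = self.out_dir / f"{shot_spec.shot_id}.mp4"
        video_file.write_bytes(b"")  # placeholder, not a playable video
        metadata = {
            "backend": self.name,
            "shot_id": shot_spec.shot_id,
            "seed": shot_spec.seed,
            "num_frames": shot_spec.num_frames,
        }
        return video_file, metadata
```

A dummy backend like this lets the runner, assembly, and metadata code be tested without loading any model.
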
---

## 7) Canonical Input: Storyboard JSON
Storyboard source is text-only (often AI-generated). We will store and validate it as JSON.

A template exists at: `templates/storyboard.template.json`

We will later build a utility script:
- input: plain text fields or a simple text format
- output: valid storyboard JSON

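
To illustrate what validating the storyboard as JSON could look like with pydantic, here is a minimal sketch. All field names (`title`, `global_style`, `duration_s`, and so on) are hypothetical placeholders; the template at `templates/storyboard.template.json` remains the authority on the real schema. Only basic `BaseModel` and `Field` usage is relied on.

```python
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class Shot(BaseModel):
    """One storyboard shot (hypothetical fields)."""
    id: str
    prompt: str
    duration_s: float = Field(gt=0)  # seconds; must be positive
    camera: Optional[str] = None
    seed: Optional[int] = None


class Storyboard(BaseModel):
    """Top-level storyboard with defaults matching this document's output targets."""
    title: str
    fps: int = Field(default=24, gt=0)
    width: int = Field(default=1280, gt=0)
    height: int = Field(default=720, gt=0)
    global_style: str = ""
    shots: List[Shot]


def load_storyboard(data: dict) -> Storyboard:
    """Validate a parsed JSON dict; raises pydantic.ValidationError on bad input."""
    return Storyboard(**data)


if __name__ == "__main__":
    try:
        load_storyboard({"title": "broken", "shots": [{"id": "s01", "prompt": "x", "duration_s": 0}]})
    except ValidationError as exc:
        print(exc)
```

Rejecting bad input at load time keeps errors readable for a non-expert user, before any GPU time is spent.
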
---

## 8) Pipeline Modules (Required)
### A) Storyboard parsing & validation
- Load storyboard JSON
- Validate schema
- Expand defaults (fps, resolution, global style)
- Produce normalized shot list

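
The default-expansion and normalization steps above can be sketched as follows. This assumes an already-validated storyboard dict with hypothetical keys (`shots`, `duration_s`, `global_style`); the default values and the rounding rule for frame counts are illustrative choices, not settled decisions.

```python
from typing import Any, Dict, List

# Hypothetical storyboard-level defaults.
DEFAULTS: Dict[str, Any] = {"fps": 24, "width": 1280, "height": 720, "global_style": ""}


def normalize_shots(storyboard: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Fill storyboard-level defaults and return one flat dict per shot."""
    settings = {**DEFAULTS, **{k: storyboard[k] for k in DEFAULTS if k in storyboard}}
    shots: List[Dict[str, Any]] = []
    for index, shot in enumerate(storyboard["shots"]):
        fps = shot.get("fps", settings["fps"])  # per-shot override wins
        shots.append({
            "index": index,
            "id": shot["id"],
            "prompt": shot["prompt"],
            "camera": shot.get("camera", ""),
            "fps": fps,
            "width": shot.get("width", settings["width"]),
            "height": shot.get("height", settings["height"]),
            "global_style": settings["global_style"],
            # Duration in seconds -> whole frames, never fewer than one.
            "num_frames": max(1, round(shot["duration_s"] * fps)),
        })
    return shots
```

For example, a 4-second shot at the default 24 fps expands to 96 frames.
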
### B) Prompt compilation
- Merge global style + shot prompt + camera notes
- Produce positive + negative prompts
- Keep deterministic via seeds

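
A possible shape for the prompt compiler, as a sketch only. The merge order, the comma separator, the `DEFAULT_NEGATIVE` text, and the seed-derivation scheme are all assumptions; the point is that the same inputs always produce the same prompts and seeds.

```python
import hashlib
from typing import Dict

# Placeholder negative prompt; the real default would be tuned per model.
DEFAULT_NEGATIVE = "blurry, low quality, watermark, text"


def compile_prompt(
    shot_prompt: str, global_style: str = "", camera: str = "", negative: str = ""
) -> Dict[str, str]:
    """Merge global style, shot prompt, and camera notes into positive/negative prompts."""
    parts = [p.strip() for p in (global_style, shot_prompt, camera) if p and p.strip()]
    neg_parts = [p.strip() for p in (DEFAULT_NEGATIVE, negative) if p and p.strip()]
    return {"positive": ", ".join(parts), "negative": ", ".join(neg_parts)}


def derive_seed(base_seed: int, shot_id: str) -> int:
    """Derive a stable 32-bit per-shot seed from a project-level seed.

    Uses SHA-256 rather than hash(), which is randomized between Python processes.
    """
    digest = hashlib.sha256(f"{base_seed}:{shot_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")
```

Because both functions are pure, their output can be compared directly in tests without running any model.
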
### C) Generation runner (per shot)
- For each shot: generate clip
- Support:
  - seed control
  - chunking (e.g., generate 4-6 seconds then continue)
  - optional init frame handoff between shots

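
Chunk planning for the generation runner might look like the sketch below. The function name `plan_chunks` and the 5-second default are hypothetical (the default simply sits inside the 4-6 second range mentioned above). It only computes frame boundaries; how one chunk continues from the previous one, for example by handing off the last frame as an init frame, depends on the model backend and is not shown.

```python
from typing import List, Tuple


def plan_chunks(num_frames: int, fps: int = 24, max_chunk_s: float = 5.0) -> List[Tuple[int, int]]:
    """Split a shot into (start_frame, frame_count) chunks no longer than max_chunk_s."""
    if num_frames <= 0 or fps <= 0 or max_chunk_s <= 0:
        raise ValueError("num_frames, fps and max_chunk_s must all be positive")
    max_frames = max(1, int(max_chunk_s * fps))
    chunks: List[Tuple[int, int]] = []
    start = 0
    while start < num_frames:
        count = min(max_frames, num_frames - start)
        chunks.append((start, count))
        start += count
    return chunks
```

A 10-second shot at 24 fps (240 frames) is planned as two chunks of 120 frames each; a shot shorter than the limit stays a single chunk.
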
### D) Assembly
- Use ffmpeg concat to build final video
- Optionally add:
  - transitions
  - temp audio
  - burn-in shot IDs for debugging mode

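
A sketch of how the assembly step could build its ffmpeg invocation with the concat demuxer. The helper names are hypothetical. The flags used (`-f concat`, `-safe 0`, `-c copy`, `-c:v libx264`, `-pix_fmt yuv420p`) are standard ffmpeg options; stream copy assumes all clips share codec, resolution, and frame rate, otherwise re-encoding is needed. Returning the command as a list keeps it easy to assert on in tests.

```python
from pathlib import Path
from typing import List, Sequence


def write_concat_list(clips: Sequence[Path], list_path: Path) -> Path:
    """Write the text file consumed by ffmpeg's concat demuxer (one `file '...'` line per clip)."""
    lines = []
    for clip in clips:
        # Forward slashes work on Windows too; single quotes are escaped as '\''.
        escaped = Path(clip).as_posix().replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    list_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return list_path


def build_concat_command(list_path: Path, output_path: Path, reencode: bool = False) -> List[str]:
    """Return the ffmpeg argument list; run it with subprocess.run(cmd, check=True)."""
    cmd = ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", str(list_path)]
    if reencode:
        # Re-encode when clips differ in codec parameters.
        cmd += ["-c:v", "libx264", "-pix_fmt", "yuv420p"]
    else:
        # Stream copy: fast and lossless when all clips match.
        cmd += ["-c", "copy"]
    cmd.append(str(output_path))
    return cmd
```

Relative clip paths in the list file are resolved against the list file's location, so writing absolute paths may be the safer choice in practice.
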
### E) Upscaling (optional)
- Upscale final to 2K/4K (post step)
- Keep this modular so user can skip.

---

## 9) Determinism & Logging (Must Have)
For each shot and final render, save:
- prompts (positive/negative)
- seed(s)
- model + revision/hash info if available
- inference params (steps, cfg, sampler, resolution, fps, frames)
- timing + VRAM notes if possible

Every run produces a folder:
- `outputs/<project>/`
  - `shots/`
  - `assembled/`
  - `metadata/`

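
The folder layout and per-shot metadata could be produced along these lines. This is a sketch: the helper names, the one-JSON-file-per-shot convention, and the record keys are assumptions; only the `outputs/<project>/` layout with `shots/`, `assembled/`, and `metadata/` comes from this document.

```python
import json
from pathlib import Path
from typing import Any, Dict


def make_run_dirs(root: Path, project: str) -> Dict[str, Path]:
    """Create outputs/<project>/{shots,assembled,metadata} and return the paths."""
    base = Path(root) / "outputs" / project
    dirs = {name: base / name for name in ("shots", "assembled", "metadata")}
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs


def write_shot_metadata(metadata_dir: Path, shot_id: str, record: Dict[str, Any]) -> Path:
    """Write one JSON file per shot; sorted keys keep diffs and snapshot tests stable."""
    out = Path(metadata_dir) / f"{shot_id}.json"
    out.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return out
```

A record would typically carry the prompts, seed, model identifier, and inference parameters listed above. For the VRAM note, `torch.cuda.max_memory_allocated()` is one value that could be logged when a GPU is present.
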
---

## 10) Testing Rules (Hard Requirement)
- Tests must be written alongside features.
- Whenever a file/function is modified, corresponding tests MUST be updated.
- Prefer tests that verify:
  - schema validation works
  - prompt compiler output is stable
  - shot planner expands durations -> frame counts
  - assembly command lines are correct
  - metadata is generated correctly

Do not require visual quality assertions. Test structure and determinism.

---

## 11) Documentation Rules (Hard Requirement)
Maintain these continuously:
- docs/developer.md
  - architecture
  - install steps
  - how to run tests
  - how to add a new model backend
- docs/user.md
  - quickstart
  - how to create storyboard JSON
  - how to run generation
  - where outputs go
  - troubleshooting (VRAM, drivers, ffmpeg)

Docs must be updated whenever CLI flags, file formats, or workflows change.

---

## 12) Project Files to Maintain
Required:
- requirements.txt (pip deps)
- environment.yml (conda env)
- templates/storyboard.template.json
- docs/developer.md
- docs/user.md
- src/ (implementation)
- tests/ (pytest)

---

## 13) Definition of Done
A feature is "done" only if:
- implemented
- tests added/updated
- docs updated
- reproducible install instructions remain valid

End of file.