# agents.md

## Project: Local AI Video Generation from Text Storyboards (Windows + RTX 5070 12GB)

### 0) Who this is for

The owner (user) is not an ML expert. The system must:

- be reproducible (conda + requirements)
- have guardrails (configs, logs, validation)
- be test-driven (pytest)
- maintain docs (developer + user)

---

## 1) High-Level Goal

Build a local pipeline that converts **text-only storyboards** into **15–30 second videos** by:

1) converting the storyboard -> shot plan
2) generating shot clips (T2V, or I2V when possible)
3) assembling clips into a final MP4
4) upscaling to 2K/4K if desired

This is a **shot-based** system, not “one prompt makes a whole movie”.

---

## 2) Hard Constraints (Hardware & OS)

Target system:

- Windows 11
- NVIDIA RTX 5070 (12GB VRAM) - must use the GPU.
- 32GB RAM
- 2TB SSD
- Anaconda available

The design must be stable under 12GB VRAM using:

- fp16/bf16
- attention slicing
- xFormers / SDPA where supported
- optional CPU offload
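
The options above map to standard diffusers memory helpers. A minimal sketch (the `pipe` argument is any loaded diffusers pipeline; the helper function name is ours):

```python
# Low-VRAM sketch: apply the memory-saving options listed above to a
# diffusers pipeline. fp16/bf16 is chosen at load time via torch_dtype
# (not shown here); these calls are the standard diffusers helpers.
def apply_low_vram_settings(pipe, cpu_offload: bool = False):
    pipe.enable_attention_slicing()        # trade a little speed for VRAM
    if cpu_offload and hasattr(pipe, "enable_model_cpu_offload"):
        pipe.enable_model_cpu_offload()    # optional CPU offload (accelerate)
    return pipe
```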

---

## 3) Output Targets (Realistic)

- Native generation: 720p–1080p (preferred)
- Final delivery: 1080p required; 2K/4K via upscaling
- Duration: 15–30s per video (may be segmented)
- FPS: 24 default
- Output: MP4 (H.264/H.265)

---

## 4) CUDA 13.1 Reality & PyTorch Plan (Critical)

The user has CUDA Toolkit 13.1 installed. Current PyTorch builds generally ship with, and target, CUDA 12.x runtimes.
We must NOT assume PyTorch will build or run against the local CUDA 13.1 toolkit.

**Plan:**

- Use **PyTorch prebuilt binaries that bundle the CUDA runtime** (e.g., cu121 / cu124).
- Rely on NVIDIA driver compatibility rather than the local CUDA toolkit version.
- Avoid compiling custom CUDA extensions unless necessary.

Implementation notes:

- Prefer installing PyTorch via conda or pip using the official CUDA 12.x builds.
- If xFormers causes build issues, use PyTorch SDPA and disable xFormers.
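
A quick sanity check along these lines confirms which CUDA runtime the installed wheel bundles (the function name is ours; it degrades gracefully if torch is not installed yet):

```python
# Sanity check: report the torch build, its bundled CUDA runtime, and GPU
# visibility. Returns a stub dict if torch is not installed yet.
def cuda_report() -> dict:
    try:
        import torch
    except ImportError:
        return {"torch": None, "note": "torch not installed yet"}
    info = {
        "torch": torch.__version__,
        "cuda_runtime": torch.version.cuda,   # bundled runtime, e.g. "12.1"
        "gpu_available": torch.cuda.is_available(),
    }
    if info["gpu_available"]:
        info["device"] = torch.cuda.get_device_name(0)
    return info

print(cuda_report())
```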

---

## 5) Approved Stack (Do Not Deviate)

### Core

- Python 3.10 or 3.11 (conda env)
- PyTorch (CUDA 12.x build: cu121 or cu124)
- diffusers + transformers + accelerate + safetensors
- ffmpeg for assembly
- opencv-python for frame I/O (if needed)
- pydantic for config/schema validation
- rich / loguru for logs

### Testing

- pytest
- pytest-cov
- snapshot-style tests where feasible (metadata + shapes, not visual perfection)

### Docs

- /docs/developer.md (developer documentation)
- /docs/user.md (user manual)
- Keep docs updated alongside code changes.

---

## 6) Video Models (Pragmatic Choices)

### Primary (target)

- WAN 2.x family (T2V; optional I2V if supported in the chosen pipeline)

Goal: the best possible quality on consumer VRAM, with chunking.

### Secondary / fallback

- Stable Video Diffusion (SVD) if WAN is unstable
- LTX-Video (only if it fits and is stable in our stack)

All model backends must implement the same interface:

- generate_shot(shot_spec) -> video_file + metadata
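
A sketch of that interface (class and field names here are our assumptions, not a fixed API):

```python
# Sketch of the common backend interface. Every model backend (WAN, SVD,
# LTX-Video) would implement generate_shot with this shape.
from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class ShotSpec:
    shot_id: str
    prompt: str
    negative_prompt: str = ""
    seed: int = 0
    duration_s: float = 4.0
    fps: int = 24

@dataclass
class ShotResult:
    video_file: str                          # path to the rendered clip
    metadata: dict = field(default_factory=dict)

class VideoBackend(Protocol):
    def generate_shot(self, shot: ShotSpec) -> ShotResult: ...
```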

---

## 7) Canonical Input: Storyboard JSON

The storyboard source is text-only (often AI-generated). We will store and validate it as JSON.

A template exists at: `templates/storyboard.template.json`

We will later build a utility script:

- input: plain text fields or a simple text format
- output: valid storyboard JSON
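
For illustration only, a minimal storyboard might look like the following; the field names are assumptions, and the actual schema is defined by `templates/storyboard.template.json`:

```json
{
  "project": "demo",
  "defaults": { "fps": 24, "resolution": [1280, 720], "style": "cinematic, soft light" },
  "shots": [
    { "id": "s01", "prompt": "a lighthouse at dawn", "duration_s": 4.0 },
    { "id": "s02", "prompt": "waves crashing on rocks", "duration_s": 5.0 }
  ]
}
```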

---

## 8) Pipeline Modules (Required)

### A) Storyboard parsing & validation

- Load the storyboard JSON
- Validate the schema
- Expand defaults (fps, resolution, global style)
- Produce a normalized shot list
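
The expansion step can be sketched as follows (plain dicts for brevity; the real module should validate with pydantic per section 5, and the field names follow our hypothetical storyboard shape):

```python
# Sketch: expand storyboard defaults into a normalized shot list and
# convert durations into frame counts. Field names are assumptions.
def normalize_shots(storyboard: dict) -> list:
    defaults = storyboard.get("defaults", {})
    fps = defaults.get("fps", 24)
    resolution = defaults.get("resolution", [1280, 720])
    style = defaults.get("style", "")
    normalized = []
    for shot in storyboard["shots"]:
        shot_fps = shot.get("fps", fps)
        duration_s = shot.get("duration_s", 4.0)
        normalized.append({
            "id": shot["id"],
            "prompt": shot["prompt"],
            "style": style,
            "fps": shot_fps,
            "resolution": shot.get("resolution", resolution),
            "duration_s": duration_s,
            "num_frames": round(duration_s * shot_fps),
        })
    return normalized
```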

### B) Prompt compilation

- Merge global style + shot prompt + camera notes
- Produce positive + negative prompts
- Keep output deterministic via seeds
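
A minimal sketch of the merge (the default negative terms are placeholders, not a tuned list):

```python
# Sketch of the prompt compiler: merge global style, shot prompt, and
# camera notes into one positive prompt, plus a standing negative prompt.
def compile_prompts(style: str, shot_prompt: str, camera: str = "",
                    extra_negative: str = "") -> tuple:
    positive = ", ".join(part for part in (style, shot_prompt, camera) if part)
    base_negative = "blurry, low quality, watermark"  # placeholder terms
    negative = ", ".join(part for part in (base_negative, extra_negative) if part)
    return positive, negative
```

Because the output is a pure function of its inputs, the same storyboard plus the same seed yields the same generation request.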

### C) Generation runner (per shot)

- For each shot: generate a clip
- Support:
  - seed control
  - chunking (e.g., generate 4–6 seconds, then continue)
  - optional init frame handoff between shots
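
The chunking rule can be sketched as a small planner (the 5-second cap is an example value, not a tuned one):

```python
# Sketch: split a shot's duration into generation chunks of at most
# max_chunk_s seconds, so each chunk fits comfortably in 12GB VRAM.
def plan_chunks(duration_s: float, max_chunk_s: float = 5.0) -> list:
    chunks = []
    remaining = duration_s
    while remaining > 0:
        chunk = min(remaining, max_chunk_s)
        chunks.append(chunk)
        remaining -= chunk
    return chunks
```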

### D) Assembly

- Use ffmpeg concat to build the final video
- Optionally add:
  - transitions
  - temp audio
  - burned-in shot IDs for debugging mode
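
The concat step can be sketched like this (paths are illustrative; `-c copy` assumes all clips share codec, fps, and resolution, which the pipeline should guarantee):

```python
# Sketch: build the ffmpeg concat-demuxer inputs. Returns the list-file
# body and the command argv; the caller writes the file and runs ffmpeg.
def concat_plan(clip_paths: list, list_file: str, out_file: str) -> tuple:
    list_body = "\n".join(f"file '{p}'" for p in clip_paths)
    cmd = ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
           "-i", list_file, "-c", "copy", out_file]
    return list_body, cmd
```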

### E) Upscaling (optional)

- Upscale the final video to 2K/4K (post-processing step)
- Keep this modular so the user can skip it.

---

## 9) Determinism & Logging (Must Have)

For each shot and the final render, save:

- prompts (positive/negative)
- seed(s)
- model + revision/hash info if available
- inference params (steps, cfg, sampler, resolution, fps, frames)
- timing + VRAM notes if possible

Every run produces a folder:

- outputs/<project>/<timestamp>/
  - shots/
  - assembled/
  - metadata/
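
A sketch of the run-folder creation and metadata writing (function names are ours):

```python
# Sketch: create the run folder layout described above and persist
# per-shot metadata as JSON alongside the rendered clips.
import json
import time
from pathlib import Path

def make_run_dir(project: str, root: str = "outputs") -> Path:
    stamp = time.strftime("%Y%m%d-%H%M%S")
    run = Path(root) / project / stamp
    for sub in ("shots", "assembled", "metadata"):
        (run / sub).mkdir(parents=True, exist_ok=True)
    return run

def save_shot_metadata(run: Path, shot_id: str, meta: dict) -> Path:
    path = run / "metadata" / f"{shot_id}.json"
    path.write_text(json.dumps(meta, indent=2))
    return path
```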

---

## 10) Testing Rules (Hard Requirement)

- Tests must be written alongside features.
- Whenever a file/function is modified, the corresponding tests MUST be updated.
- Prefer tests that verify:
  - schema validation works
  - prompt compiler output is stable
  - the shot planner expands durations -> frame counts
  - assembly command lines are correct
  - metadata is generated correctly
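
A representative structural test in this spirit (the function under test is inlined here for illustration):

```python
# Sketch of a structural pytest-style test: verify the shot planner's
# duration -> frame-count expansion, not visual quality.
def expand_frames(duration_s: float, fps: int = 24) -> int:
    return round(duration_s * fps)

def test_expand_frames():
    assert expand_frames(4.0) == 96
    assert expand_frames(2.5) == 60
    assert expand_frames(0.0) == 0
```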

Do not require “visual quality” assertions. Test structure and determinism.

---

## 11) Documentation Rules (Hard Requirement)

Maintain these continuously:

- docs/developer.md
  - architecture
  - install steps
  - how to run tests
  - how to add a new model backend
- docs/user.md
  - quickstart
  - how to create storyboard JSON
  - how to run generation
  - where outputs go
  - troubleshooting (VRAM, drivers, ffmpeg)

Docs must be updated whenever CLI flags, file formats, or workflows change.

---

## 12) Project Files to Maintain

Required:

- requirements.txt (pip deps)
- environment.yml (conda env)
- templates/storyboard.template.json
- docs/developer.md
- docs/user.md
- src/ (implementation)
- tests/ (pytest)

---

## 13) Definition of Done

A feature is “done” only if:

- it is implemented
- tests are added/updated
- docs are updated
- reproducible install instructions remain valid

End of file.