adjust the resolution based on available VRAM. add elapsed time.

2026-02-04 03:06:31 -05:00
parent 1252518832
commit c33c1d2f36
7 changed files with 347 additions and 174 deletions


@@ -11,13 +11,13 @@ The owner (user) is not an ML expert. The system must:
---
## 1) High-Level Goal
-Build a local pipeline that converts **text-only storyboards** into **15–30 second videos** by:
+Build a local pipeline that converts text-only storyboards into 15-30 second videos by:
1) converting storyboard -> shot plan
2) generating shot clips (T2V or I2V when possible)
3) assembling clips into a final MP4
4) upscaling to 2K/4K if desired
-This is a **shot-based** system, not one prompt makes a whole movie.
+This is a shot-based system, not "one prompt makes a whole movie".
---
@@ -38,9 +38,9 @@ Design must be stable under 12GB VRAM using:
---
## 3) Output Targets (Realistic)
-- Native generation: 720p–1080p (preferred)
+- Native generation: 720p-1080p (preferred)
- Final delivery: 1080p required; 2K/4K via upscaling
-- Duration: 15–30s per video (may be segmented)
+- Duration: 15-30s per video (may be segmented)
- FPS: 24 default
- Output: MP4 (H.264/H.265)
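Since the spec later requires tests that "assembly command lines are correct", the encode step can be expressed as a pure command builder that never runs ffmpeg itself. This is a minimal sketch under that assumption; the function name and frame-pattern argument are hypothetical, while the `-framerate`, `-c:v`, and `-pix_fmt` flags are standard ffmpeg options:

```python
import shlex

def build_encode_cmd(frames_pattern: str, out_path: str,
                     fps: int = 24, codec: str = "libx264") -> list[str]:
    """Build (but do not run) an ffmpeg command that encodes numbered frames to MP4."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", frames_pattern,   # e.g. a printf-style pattern like frame_%05d.png
        "-c:v", codec,          # libx264 (H.264) or libx265 (H.265)
        "-pix_fmt", "yuv420p",  # widest player compatibility
        out_path,
    ]

cmd = build_encode_cmd("frame_%05d.png", "out.mp4")
print(shlex.join(cmd))
```

Returning a list (rather than a shell string) keeps the command safe to pass to `subprocess.run` and trivial to assert on in tests.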
@@ -50,13 +50,15 @@ Design must be stable under 12GB VRAM using:
User has CUDA Toolkit 13.1 installed. Current PyTorch builds generally ship with and target CUDA 12.x runtimes.
We must NOT assume PyTorch will build/run against local CUDA 13.1 toolkit.
-**Plan:**
-- Use **PyTorch prebuilt binaries that bundle CUDA runtime** (e.g., cu121 / cu124).
+Plan:
+- Use PyTorch prebuilt binaries that bundle CUDA runtime (cu121/cu124/cu128).
- Rely on NVIDIA driver compatibility rather than local CUDA toolkit version.
- Avoid compiling custom CUDA extensions unless necessary.
Implementation notes:
- Prefer installing PyTorch via conda or pip using official CUDA 12.x builds.
+- For RTX 5070 (sm_120), use CUDA 12.8 wheels via pip:
+  `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128`
- Prefer conda for Python, ffmpeg, and general deps; use pip for torch if sm_120 support is required.
- If xFormers causes build issues, use PyTorch SDPA and disable xFormers.
---
@@ -64,12 +66,13 @@ Implementation notes:
## 5) Approved Stack (Do Not Deviate)
### Core
- Python 3.10 or 3.11 (conda env)
-- PyTorch (CUDA 12.x build: cu121 or cu124)
+- PyTorch (CUDA 12.x build, cu121/cu124/cu128)
- diffusers + transformers + accelerate + safetensors
- ffmpeg for assembly
- opencv-python for frame IO (if needed)
- pydantic for config/schema validation
- rich / loguru for logs
+- ftfy for text normalization (required by WAN)
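Since pydantic is the approved schema-validation library, a shot entry from the plan could be modeled roughly as follows. This is a sketch only; the `Shot` model, its field names, and the bounds are assumptions, not the project's actual schema:

```python
from pydantic import BaseModel, Field, ValidationError

class Shot(BaseModel):
    """One storyboard shot. Field names and bounds are illustrative assumptions."""
    id: str
    prompt: str
    seconds: float = Field(gt=0, le=30)  # a single shot cannot exceed the 30s video cap
    seed: int = 0
    fps: int = 24

shot = Shot(id="s01", prompt="wide shot of a harbor at dawn", seconds=5.0)

try:
    Shot(id="s02", prompt="too long", seconds=45.0)  # out of bounds -> rejected
except ValidationError:
    rejected = True
```

Validating the shot plan up front means a malformed storyboard fails fast, before any VRAM-heavy generation starts.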
### Testing
- pytest
@@ -124,7 +127,7 @@ We will later build a utility script:
- For each shot: generate clip
- Support:
- seed control
-- chunking (e.g., generate 4–6 seconds then continue)
+- chunking (e.g., generate 4-6 seconds then continue)
- optional init frame handoff between shots
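The chunking and seed-control bullets above can be sketched as two small pure functions, which keeps them unit-testable without any model loaded. Function names and the per-chunk seed scheme (base seed plus chunk index) are assumptions:

```python
def plan_chunks(total_s: float, chunk_s: float = 5.0) -> list[float]:
    """Split a shot duration into generation-sized chunks (the spec suggests 4-6 s each)."""
    chunks: list[float] = []
    remaining = total_s
    while remaining > 1e-9:            # tolerate float drift at the tail
        step = min(chunk_s, remaining)
        chunks.append(round(step, 3))
        remaining -= step
    return chunks

def chunk_seeds(base_seed: int, n_chunks: int) -> list[int]:
    """Deterministic per-chunk seeds so a rerun reproduces every chunk exactly."""
    return [base_seed + i for i in range(n_chunks)]

durations = plan_chunks(18.0)                 # -> [5.0, 5.0, 5.0, 3.0]
seeds = chunk_seeds(1234, len(durations))     # -> [1234, 1235, 1236, 1237]
```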
### D) Assembly
@@ -149,7 +152,7 @@ For each shot and final render, save:
- timing + VRAM notes if possible
Every run produces a folder:
-- outputs/<project>/<timestamp>/
+- outputs/<project>/
- shots/
- assembled/
- metadata/
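The run-folder layout above can be created with a few lines of `pathlib`; this sketch follows the post-change layout (`outputs/<project>/` with no timestamp level), and the helper's name and return shape are assumptions:

```python
from pathlib import Path
import tempfile

def make_run_dirs(root: Path, project: str) -> dict[str, Path]:
    """Create outputs/<project>/{shots,assembled,metadata} and return the paths."""
    base = root / "outputs" / project
    dirs = {name: base / name for name in ("shots", "assembled", "metadata")}
    for p in dirs.values():
        p.mkdir(parents=True, exist_ok=True)  # idempotent: reruns reuse the layout
    return dirs

run = make_run_dirs(Path(tempfile.mkdtemp()), "demo")
```

`exist_ok=True` makes the call idempotent, so a resumed run lands in the same folder instead of failing.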
@@ -166,7 +169,7 @@ Every run produces a folder:
- assembly command lines are correct
- metadata is generated correctly
Do not require visual quality assertions. Test structure and determinism.
---
@@ -201,10 +204,10 @@ Required:
---
## 13) Definition of Done
-A feature is done only if:
+A feature is "done" only if:
- implemented
- tests added/updated
- docs updated
- reproducible install instructions remain valid
End of file.