adjust the resolution based on available VRAM. add elapsed time.

2026-02-04 03:06:31 -05:00
parent 1252518832
commit c33c1d2f36
7 changed files with 347 additions and 174 deletions


@@ -11,13 +11,13 @@ The owner (user) is not an ML expert. The system must:
---
## 1) High-Level Goal
-Build a local pipeline that converts **text-only storyboards** into **15–30 second videos** by:
+Build a local pipeline that converts text-only storyboards into 15-30 second videos by:
1) converting storyboard -> shot plan
2) generating shot clips (T2V or I2V when possible)
3) assembling clips into a final MP4
4) upscaling to 2K/4K if desired
-This is a **shot-based** system, not one prompt makes a whole movie.
+This is a shot-based system, not "one prompt makes a whole movie".
---
@@ -38,9 +38,9 @@ Design must be stable under 12GB VRAM using:
---
## 3) Output Targets (Realistic)
-- Native generation: 720p–1080p (preferred)
+- Native generation: 720p-1080p (preferred)
- Final delivery: 1080p required; 2K/4K via upscaling
-- Duration: 15–30s per video (may be segmented)
+- Duration: 15-30s per video (may be segmented)
- FPS: 24 default
- Output: MP4 (H.264/H.265)
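Since the spec later requires tests that "assembly command lines are correct", the encode step can be expressed as a pure command builder that never runs ffmpeg itself. This is a minimal sketch under that assumption; the function name and frame-pattern argument are hypothetical, while the `-framerate`, `-c:v`, and `-pix_fmt` flags are standard ffmpeg options:

```python
import shlex

def build_encode_cmd(frames_pattern: str, out_path: str,
                     fps: int = 24, codec: str = "libx264") -> list[str]:
    """Build (but do not run) an ffmpeg command that encodes numbered frames to MP4."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", frames_pattern,   # e.g. a printf-style pattern like frame_%05d.png
        "-c:v", codec,          # libx264 (H.264) or libx265 (H.265)
        "-pix_fmt", "yuv420p",  # widest player compatibility
        out_path,
    ]

cmd = build_encode_cmd("frame_%05d.png", "out.mp4")
print(shlex.join(cmd))
```

Returning a list (rather than a shell string) keeps the command safe to pass to `subprocess.run` and trivial to assert on in tests.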
@@ -50,13 +50,15 @@ Design must be stable under 12GB VRAM using:
User has CUDA Toolkit 13.1 installed. Current PyTorch builds generally ship with and target CUDA 12.x runtimes.
We must NOT assume PyTorch will build/run against local CUDA 13.1 toolkit.
-**Plan:**
-- Use **PyTorch prebuilt binaries that bundle CUDA runtime** (e.g., cu121 / cu124).
+Plan:
+- Use PyTorch prebuilt binaries that bundle CUDA runtime (cu121/cu124/cu128).
- Rely on NVIDIA driver compatibility rather than local CUDA toolkit version.
- Avoid compiling custom CUDA extensions unless necessary.
Implementation notes:
- Prefer installing PyTorch via conda or pip using official CUDA 12.x builds.
+- For RTX 5070 (sm_120), use CUDA 12.8 wheels via pip:
+  `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128`
- Prefer conda for Python, ffmpeg, and general deps; use pip for torch if sm_120 support is required.
- If xFormers causes build issues, use PyTorch SDPA and disable xFormers.
---
@@ -64,12 +66,13 @@ Implementation notes:
## 5) Approved Stack (Do Not Deviate)
### Core
- Python 3.10 or 3.11 (conda env)
-- PyTorch (CUDA 12.x build: cu121 or cu124)
+- PyTorch (CUDA 12.x build, cu121/cu124/cu128)
- diffusers + transformers + accelerate + safetensors
- ffmpeg for assembly
- opencv-python for frame IO (if needed)
- pydantic for config/schema validation
- rich / loguru for logs
+- ftfy for text normalization (required by WAN)
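Since pydantic is the approved schema-validation library, a shot entry from the plan could be modeled roughly as follows. This is a sketch only; the `Shot` model, its field names, and the bounds are assumptions, not the project's actual schema:

```python
from pydantic import BaseModel, Field, ValidationError

class Shot(BaseModel):
    """One storyboard shot. Field names and bounds are illustrative assumptions."""
    id: str
    prompt: str
    seconds: float = Field(gt=0, le=30)  # a single shot cannot exceed the 30s video cap
    seed: int = 0
    fps: int = 24

shot = Shot(id="s01", prompt="wide shot of a harbor at dawn", seconds=5.0)

try:
    Shot(id="s02", prompt="too long", seconds=45.0)  # out of bounds -> rejected
except ValidationError:
    rejected = True
```

Validating the shot plan up front means a malformed storyboard fails fast, before any VRAM-heavy generation starts.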
### Testing
- pytest
@@ -124,7 +127,7 @@ We will later build a utility script:
- For each shot: generate clip
- Support:
- seed control
-- chunking (e.g., generate 4–6 seconds then continue)
+- chunking (e.g., generate 4-6 seconds then continue)
- optional init frame handoff between shots
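The chunking and seed-control bullets above can be sketched as two small pure functions, which keeps them unit-testable without any model loaded. Function names and the per-chunk seed scheme (base seed plus chunk index) are assumptions:

```python
def plan_chunks(total_s: float, chunk_s: float = 5.0) -> list[float]:
    """Split a shot duration into generation-sized chunks (the spec suggests 4-6 s each)."""
    chunks: list[float] = []
    remaining = total_s
    while remaining > 1e-9:            # tolerate float drift at the tail
        step = min(chunk_s, remaining)
        chunks.append(round(step, 3))
        remaining -= step
    return chunks

def chunk_seeds(base_seed: int, n_chunks: int) -> list[int]:
    """Deterministic per-chunk seeds so a rerun reproduces every chunk exactly."""
    return [base_seed + i for i in range(n_chunks)]

durations = plan_chunks(18.0)                 # -> [5.0, 5.0, 5.0, 3.0]
seeds = chunk_seeds(1234, len(durations))     # -> [1234, 1235, 1236, 1237]
```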
### D) Assembly
@@ -149,7 +152,7 @@ For each shot and final render, save:
- timing + VRAM notes if possible
Every run produces a folder:
-- outputs/<project>/<timestamp>/
+- outputs/<project>/
- shots/
- assembled/
- metadata/
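The run-folder layout above can be created with a few lines of `pathlib`; this sketch follows the post-change layout (`outputs/<project>/` with no timestamp level), and the helper's name and return shape are assumptions:

```python
from pathlib import Path
import tempfile

def make_run_dirs(root: Path, project: str) -> dict[str, Path]:
    """Create outputs/<project>/{shots,assembled,metadata} and return the paths."""
    base = root / "outputs" / project
    dirs = {name: base / name for name in ("shots", "assembled", "metadata")}
    for p in dirs.values():
        p.mkdir(parents=True, exist_ok=True)  # idempotent: reruns reuse the layout
    return dirs

run = make_run_dirs(Path(tempfile.mkdtemp()), "demo")
```

`exist_ok=True` makes the call idempotent, so a resumed run lands in the same folder instead of failing.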
@@ -166,7 +169,7 @@ Every run produces a folder:
- assembly command lines are correct
- metadata is generated correctly
Do not require visual quality assertions. Test structure and determinism.
---
@@ -201,10 +204,10 @@ Required:
---
## 13) Definition of Done
-A feature is done only if:
+A feature is "done" only if:
- implemented
- tests added/updated
- docs updated
- reproducible install instructions remain valid
End of file.