"""Preprocess a Ditto-1M subset for SCD training. Selects N video editing pairs from Ditto-1M, encodes edited videos to VAE latents, uv run python scripts/preprocess_ditto_subset.py --subset-size 500 ...
- Stage 1 :: Projection Matrix Alignment between Vision Encoder & Pretrained LLM on CC-3M-595K (Custom)
- Stage 2 :: Projection & LLM Finetuning on LLaVa v1.5 Instruct (including various ...