Smikke committed on
Commit d16eb70 · verified · 1 Parent(s): c0e1c6a

Deploy optimized Wan2.2 video generation with Zero GPU support

Files changed (5)
  1. .gitignore +66 -0
  2. DEPLOYMENT.md +285 -0
  3. README.md +130 -6
  4. app.py +296 -0
  5. requirements.txt +38 -0
.gitignore ADDED
@@ -0,0 +1,66 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
venv/
env/
ENV/
.venv

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Gradio
gradio_cached_examples/
flagged/

# Model outputs
output.mp4
*.mp4
*.avi
*.mov
outputs/

# Hugging Face cache
.cache/
models/

# Environment variables
.env
.env.local

# Logs
logs/
*.log

# Temporary files
tmp/
temp/
*.tmp
DEPLOYMENT.md ADDED
@@ -0,0 +1,285 @@
# Deployment Guide for Wan2.2 on Hugging Face Spaces

This guide explains how to deploy the Wan2.2 video generation model to Hugging Face Spaces with Zero GPU support.

## Prerequisites

1. A Hugging Face account (create one at https://huggingface.co/join)
2. Git installed on your local machine
3. Git LFS (Large File Storage) installed

## Deployment Steps

### Option 1: Deploy via Hugging Face Web Interface

1. **Create a New Space**
   - Go to https://huggingface.co/new-space
   - Choose a name for your Space (e.g., "wan2-video-gen")
   - Select "Gradio" as the SDK
   - Choose "Public" or "Private" visibility
   - Click "Create Space"

2. **Upload Files**
   - Use the web interface to upload files:
     - `app.py`
     - `requirements.txt`
     - `README.md`
     - `.gitignore`

3. **Enable Zero GPU**
   - In your Space settings, enable "Zero GPU"
   - This provides automatic GPU allocation during inference

4. **Wait for Build**
   - Hugging Face will automatically build your Space
   - This may take 10-15 minutes for the first build
   - Check the build logs for any errors

### Option 2: Deploy via Git (Recommended)

1. **Clone Your Space**
   ```bash
   git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
   cd YOUR_SPACE_NAME
   ```

2. **Copy Files**
   ```bash
   # Copy all files from the huggingface-wan2.2 directory
   cp /path/to/huggingface-wan2.2/* .
   ```

3. **Commit and Push**
   ```bash
   git add .
   git commit -m "Initial deployment of Wan2.2 video generation"
   git push
   ```

4. **Enable Zero GPU**
   - Go to your Space settings on Hugging Face
   - Navigate to "Settings" → "Zero GPU"
   - Enable Zero GPU support

### Option 3: Deploy from This Repository

If you've already cloned this repository:

```bash
cd /home/user/Kakka/huggingface-wan2.2

# Initialize git if not already done
git init

# Add Hugging Face Space as remote
git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME

# Commit files
git add .
git commit -m "Initial deployment of Wan2.2 video generation"

# Push to Hugging Face
git push hf main
```

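The same upload can also be scripted with the `huggingface_hub` client (already listed in `requirements.txt`). The sketch below is illustrative rather than part of this commit: the Space id and local path are placeholders, it assumes you are logged in via `huggingface-cli login` or have `HF_TOKEN` set, and Zero GPU hardware still has to be selected in the Space settings afterwards.

```python
from huggingface_hub import HfApi

api = HfApi()

# Create the Space if it does not exist yet (placeholder repo id)
api.create_repo(
    repo_id="YOUR_USERNAME/wan2-video-gen",
    repo_type="space",
    space_sdk="gradio",
    exist_ok=True,
)

# Upload app.py, requirements.txt, README.md and .gitignore from the local folder
api.upload_folder(
    folder_path="/path/to/huggingface-wan2.2",
    repo_id="YOUR_USERNAME/wan2-video-gen",
    repo_type="space",
)
```
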
## Configuration

### Zero GPU Settings

The app is configured to use Zero GPU with the following settings:
- **Duration**: 180 seconds (3 minutes) per generation
- **Allocation**: Automatic (triggered by a generation request)
- **Optimized defaults**: Reduced frames (73) and steps (35) to fit within the time limit

This is configured in `app.py` with the decorator:
```python
@spaces.GPU(duration=180)  # 3 minutes max for Pro accounts
```

**Important**: Even with a Pro subscription, the maximum GPU duration is limited to 180 seconds (3 minutes). The default settings have been optimized to complete generation within this time (the frame counts map to clip length as shown in the sketch below):
- Default frames: 73 (3 seconds of video at 24fps)
- Default inference steps: 35 (balanced speed/quality)
- Maximum frames slider: 145 (6 seconds)
- Maximum inference steps: 60

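The frame counts above follow directly from the 24 fps output rate. The helper below is purely illustrative (it is not part of `app.py`); it shows how the defaults map to clip length and keeps the count on the `4k + 1` grid that Wan-style VAEs generally expect:

```python
def seconds_to_frames(seconds: int, fps: int = 24) -> int:
    """Map a clip length in whole seconds to a Wan-friendly frame count (4k + 1)."""
    return seconds * fps + 1

print(seconds_to_frames(3))  # 73  -> default num_frames (~3 s)
print(seconds_to_frames(6))  # 145 -> slider maximum (~6 s)
```
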
### Memory Requirements

The Wan2.2-TI2V-5B model requires:
- **Minimum**: 24GB VRAM
- **Recommended**: 40GB+ VRAM for Zero GPU

Zero GPU on Hugging Face Spaces provides sufficient VRAM for this model (H200 GPU with 70GB).

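To confirm how much memory the allocated device actually exposes, a small check like the one below can be dropped inside the `@spaces.GPU`-decorated function while debugging. This is a sketch, not part of the committed `app.py`:

```python
import torch

def report_gpu_memory() -> None:
    """Print the visible GPU and its total VRAM (only meaningful inside the GPU context)."""
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print(f"GPU: {props.name}, total VRAM: {props.total_memory / 1024**3:.1f} GB")
    else:
        print("No CUDA device visible (expected outside the @spaces.GPU context)")
```
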
## Testing Your Deployment

1. **Wait for Build to Complete**
   - Check the build logs in your Space
   - Wait for "Running" status

2. **Test Basic Generation**
   - Try the default example: "Two anthropomorphic cats in comfy boxing gear fight on stage"
   - Generation should take roughly 2-3 minutes with the optimized default settings

3. **Test Image-to-Video**
   - Upload a test image
   - Add a descriptive prompt
   - Verify video generation works

## Troubleshooting

### Critical: Import Order Issue

**Issue**: `RuntimeError: CUDA has been initialized before importing the 'spaces' package`

**Solution**: This is CRITICAL! The `spaces` package MUST be imported BEFORE any CUDA-related packages (torch, diffusers, etc.).

**Correct import order in app.py:**
```python
# IMPORTANT: spaces must be imported first
import spaces

# Standard library imports
import os

# Third-party imports (non-CUDA)
import numpy as np
from PIL import Image
import gradio as gr

# CUDA-related imports (must come after spaces)
import torch
from diffusers import WanPipeline, AutoencoderKLWan
```

**Why this matters**: Hugging Face Zero GPU needs to manage CUDA initialization itself. If torch or other CUDA libraries initialize CUDA before `spaces` is imported, Zero GPU cannot properly manage GPU allocation.

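A quick way to sanity-check the import order locally is to confirm that CUDA has not been touched once the module-level imports have run. This is a debugging sketch, not something the committed `app.py` contains:

```python
import spaces   # must come first
import torch    # CUDA-related imports only after spaces

# Should print False; True means something initialized CUDA during import
print("CUDA initialized at import time:", torch.cuda.is_initialized())
```
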
### Build Fails

**Issue**: Requirements installation fails
- **Solution**: Check `requirements.txt` for compatibility issues
- Ensure the PyTorch version is compatible with CUDA on Zero GPU
- Make sure you are using the latest Gradio version (5.49.0+) for security fixes

**Issue**: Out of memory during build
- **Solution**: Zero GPU should have enough memory; check the model loading code

**Issue**: "Can't initialize NVML" warnings
- **Solution**: These are normal in the Zero GPU environment at build time
- They should not affect runtime once a GPU is allocated

### Runtime Errors

**Issue**: "CUDA out of memory"
- **Solution**: Reduce `num_frames` or the image resolution
- Check that Zero GPU is properly enabled in the Space settings

**Issue**: "Model not found"
- **Solution**: Verify internet connectivity for the model download
- Check Hugging Face Hub status

**Issue**: Generation timeout
- **Solution**: Reduce inference steps or video length
- Increase the GPU duration in `@spaces.GPU(duration=XX)` (up to the 180-second maximum)

**Issue**: Gradio security vulnerability warning
- **Solution**: Update to Gradio 5.49.0 or later in requirements.txt
- Check that the README.md YAML front matter has the correct `sdk_version: 5.49.0`

**Issue**: "ZeroGPU illegal duration! The requested GPU duration (Xs) is larger than the maximum allowed"
- **Solution**: Reduce the duration parameter in `@spaces.GPU(duration=XX)`
- For Pro accounts, use 180 seconds or less: `@spaces.GPU(duration=180)`
- The free tier is typically limited to 60 seconds
- Optimize your default settings to complete within the time limit:
  - Reduce `num_frames` (e.g., 73 for 3 seconds instead of 121 for 5 seconds)
  - Reduce `num_inference_steps` (e.g., 35 instead of 50)

### Slow Generation

**Issue**: Generation takes too long
- **Solution**: This is expected; video generation is compute-intensive
- Typical time: 2-3 minutes for a 3-second video with the optimized settings (73 frames, 35 steps)
- Consider reducing `num_inference_steps` to 25-30 for faster (but lower-quality) results
- Note: Generation must complete within 180 seconds (3 minutes) on Pro, 60 seconds on the free tier

## Optimization Tips

1. **Current Optimized Settings**
   - Already optimized: `num_frames=73` (3 seconds) and `num_inference_steps=35`
   - These settings are designed to complete within the 180-second Zero GPU limit
   - For even faster testing, reduce steps to 25-30

2. **Add Caching (Optional)**
   - Enable example caching with `cache_examples=True` to pre-generate examples
   - Note: this increases build time and storage requirements
   - Current setting: `cache_examples=False` for faster builds

3. **Queue Management**
   - Current setting: `demo.queue(max_size=20)`
   - Adjust based on expected traffic (see the sketch below)
   - Larger queue = more concurrent users but more resource usage

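If you do tune the queue, a minimal sketch looks like the following; `default_concurrency_limit` exists in recent Gradio releases, but the exact values are illustrative and not what `app.py` currently ships with:

```python
import gradio as gr

with gr.Blocks() as demo:
    gr.Markdown("Queue tuning sketch")

# app.py uses demo.queue(max_size=20); a smaller queue rejects excess requests sooner,
# and a concurrency limit of 1 keeps a single generation on the GPU at a time.
demo.queue(max_size=10, default_concurrency_limit=1)
demo.launch()
```
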
## Customization

### Change Default Model

To use a different Wan2.2 variant, modify `app.py`:

```python
# For larger model with better quality
MODEL_ID = "Wan-AI/Wan2.2-T2V-A14B-Diffusers"

# For image-to-video focused
MODEL_ID = "Wan-AI/Wan2.2-I2V-A14B-Diffusers"
```

### Adjust UI

Modify the Gradio interface in `app.py`:
- Change default values in sliders
- Add more examples
- Customize theme and styling

### Add Features

Consider adding:
- Video upscaling
- Multiple video outputs
- Batch generation
- Download history
- Custom aspect ratios

## Monitoring

### Check Space Status
- Visit your Space URL
- Check "Settings" → "Logs" for runtime logs
- Monitor usage in "Settings" → "Analytics"

### Usage Limits

Zero GPU on Hugging Face has:
- Time limits per session
- Concurrent user limits
- Monthly compute quotas (check your tier)

## Support

If you encounter issues:

1. **Check Logs**: Space logs often contain error details
2. **Hugging Face Forums**: https://discuss.huggingface.co/
3. **Model Issues**: Report at Wan-AI's GitHub or model card
4. **Space Settings**: Verify Zero GPU is enabled and quota is available

## License

This deployment uses:
- Wan2.2 model (Apache 2.0)
- Gradio (Apache 2.0)
- Diffusers (Apache 2.0)

Ensure compliance with all licenses when deploying.

---

**Happy Deploying!** 🚀
README.md CHANGED
@@ -1,12 +1,136 @@
---
- title: Wan2 Video Generation
- emoji: 📉
- colorFrom: indigo
- colorTo: blue
sdk: gradio
- sdk_version: 5.49.1
app_file: app.py
pinned: false
---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
---
+ title: Wan2.2 Video Generation
+ emoji: 🎥
+ colorFrom: purple
+ colorTo: pink
sdk: gradio
+ sdk_version: 5.49.0
app_file: app.py
pinned: false
+ license: apache-2.0
+ tags:
+ - video-generation
+ - text-to-video
+ - image-to-video
+ - diffusers
+ - wan
+ - ai-video
+ - zero-gpu
+ python_version: "3.10"
---

+ # Wan2.2 Video Generation 🎥
+
+ Generate high-quality videos from text prompts or images using the powerful **Wan2.2-TI2V-5B** model!
+
+ This Space provides an easy-to-use interface for creating videos with state-of-the-art AI technology.
+
+ ## Features ✨
+
+ - **Text-to-Video**: Generate videos from descriptive text prompts
+ - **Image-to-Video**: Animate your images by adding an input image
+ - **High Quality**: 720P resolution at 24fps
+ - **Customizable**: Adjust resolution, number of frames, guidance scale, and more
+ - **Reproducible**: Use seeds to recreate your favorite generations
+
+ ## Model Information 🤖
+
+ **Wan2.2-TI2V-5B** is a unified text-to-video and image-to-video generation model with:
+
+ - **5 billion parameters** optimized for consumer-grade GPUs
+ - **720P resolution** support (1280x704 default)
+ - **24 fps** smooth video output
+ - **Optimized duration**: Default 3 seconds (optimized for Zero GPU limits)
+
+ The TI2V-5B variant is the compact dense model in the Wan2.2 family (the larger A14B variants use a Mixture-of-Experts architecture) and delivers strong video generation quality while remaining practical to run on a single GPU.
+
+ ## How to Use 🚀
+
+ ### Text-to-Video Generation
+
+ 1. Enter your prompt describing the video you want to create
+ 2. Adjust settings in "Advanced Settings" if desired
+ 3. Click "Generate Video"
+ 4. Wait for generation (typically 2-3 minutes on Zero GPU with default settings)
+
+ ### Image-to-Video Generation
+
+ 1. Upload an input image
+ 2. Enter a prompt describing how the image should animate
+ 3. Click "Generate Video"
+ 4. The output will maintain the aspect ratio of your input image
+ 5. Generation takes 2-3 minutes with optimized settings
+
+ ## Advanced Settings ⚙️
+
+ - **Width/Height**: Video resolution (default: 1280x704)
+ - **Number of Frames**: Longer videos need more frames (default: 73 frames ≈ 3 seconds, max: 145)
+ - **Inference Steps**: More steps = better quality but slower (default: 35, optimized for speed)
+ - **Guidance Scale**: How closely to follow the prompt (default: 5.0)
+ - **Seed**: Set a specific seed for reproducible results
+
+ **Note**: Settings are optimized to complete within Zero GPU's 3-minute time limit for Pro users.
+
+ ## Tips for Best Results 💡
+
+ 1. **Detailed Prompts**: Be specific about what you want to see
+    - Good: "Two anthropomorphic cats in comfy boxing gear fight on stage with dramatic lighting"
+    - Basic: "cats fighting"
+
+ 2. **Image-to-Video**: Use clear, high-quality input images that match your prompt
+
+ 3. **Quality vs Speed** (optimized for Zero GPU limits):
+    - Fast: 25-30 steps (~2 minutes)
+    - Balanced: 35 steps (default, ~2-3 minutes)
+    - Higher Quality: 40-50 steps (~3+ minutes, may time out)
+
+ 4. **Experiment**: Try different guidance scales:
+    - Lower (3-4): More creative, less literal
+    - Default (5): Good balance
+    - Higher (7-10): Strictly follows prompt
+
+ ## Example Prompts 📝
+
+ - "Two anthropomorphic cats in comfy boxing gear fight on stage"
+ - "A serene underwater scene with colorful coral reefs and tropical fish swimming gracefully"
+ - "A bustling futuristic city at night with neon lights and flying cars"
+ - "A peaceful mountain landscape with snow-capped peaks and a flowing river"
+ - "An astronaut riding a horse through a nebula in deep space"
+ - "A dragon flying over a medieval castle at sunset"
+
+ ## Technical Details 🔧
+
+ - **Model**: Wan-AI/Wan2.2-TI2V-5B-Diffusers
+ - **Framework**: Hugging Face Diffusers
+ - **Backend**: PyTorch with bfloat16 precision
+ - **GPU**: Hugging Face Zero GPU (H200 with 70GB VRAM, automatically allocated)
+ - **GPU Duration**: 180 seconds (3 minutes) for Pro users
+ - **Generation Time**: ~2-3 minutes with optimized settings (73 frames, 35 steps)
+
+ ## Limitations ⚠️
+
+ - Generation requires compute time (2-3 minutes with default settings)
+ - Zero GPU allocation is time-limited (3 minutes for Pro, 60 seconds for Free)
+ - Videos longer than 6 seconds (145 frames) may time out
+ - Higher quality settings (50+ steps) may time out on Zero GPU
+ - Complex scenes with many objects may be challenging
+
+ ## Credits 🙏
+
+ - **Model**: [Wan-AI](https://huggingface.co/Wan-AI)
+ - **Original Repository**: [Wan2.2](https://github.com/Wan-Video/Wan2.2)
+ - **Framework**: [Hugging Face Diffusers](https://github.com/huggingface/diffusers)
+
+ ## License 📄
+
+ This Space uses the Wan2.2 model, which is released under the Apache 2.0 license.
+
+ ## Related Links 🔗
+
+ - [Wan-AI on Hugging Face](https://huggingface.co/Wan-AI)
+ - [Original Model Card](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers)
+ - [Diffusers Documentation](https://huggingface.co/docs/diffusers)
+
+ ---
+
+ **Note**: This is a community-created Space for easy access to Wan2.2 video generation. Generation times may vary based on current GPU availability.
app.py ADDED
@@ -0,0 +1,296 @@
# IMPORTANT: spaces must be imported first to avoid CUDA initialization issues
import spaces

# Standard library imports
import os

# Third-party imports (non-CUDA)
import numpy as np
from PIL import Image
import gradio as gr

# CUDA-related imports (must come after spaces)
import torch
from diffusers import WanPipeline, AutoencoderKLWan
from diffusers.utils import export_to_video

# Model configuration
MODEL_ID = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"
dtype = torch.bfloat16
device = "cuda" if torch.cuda.is_available() else "cpu"

# Global pipeline variable
pipe = None

def initialize_pipeline():
    """Initialize the Wan2.2 pipeline"""
    global pipe
    if pipe is None:
        print("Loading Wan2.2-TI2V-5B model...")
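        # Note: the VAE is loaded in float32 while the rest of the pipeline runs in
        # bfloat16; the Diffusers Wan examples use the same split, reportedly to keep
        # VAE encode/decode numerically stable.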
        vae = AutoencoderKLWan.from_pretrained(
            MODEL_ID,
            subfolder="vae",
            torch_dtype=torch.float32
        )
        pipe = WanPipeline.from_pretrained(
            MODEL_ID,
            vae=vae,
            torch_dtype=dtype
        )
        pipe.to(device)
        print("Model loaded successfully!")
    return pipe

@spaces.GPU(duration=180)  # Allocate GPU for 3 minutes (max allowed for Pro)
def generate_video(
    prompt: str,
    image: Image.Image = None,
    width: int = 1280,
    height: int = 704,
    num_frames: int = 73,
    num_inference_steps: int = 35,
    guidance_scale: float = 5.0,
    seed: int = -1
):
    """
    Generate video from text prompt and optional image

    Args:
        prompt: Text description of the video to generate
        image: Optional input image for image-to-video generation
        width: Video width (default: 1280)
        height: Video height (default: 704)
        num_frames: Number of frames to generate (default: 73 for 3 seconds at 24fps)
        num_inference_steps: Number of denoising steps (default: 35 for faster generation)
        guidance_scale: Guidance scale for generation (default: 5.0)
        seed: Random seed for reproducibility (-1 for random)
    """
    try:
        # Initialize pipeline
        pipeline = initialize_pipeline()

        # Set seed for reproducibility
        if seed == -1:
            seed = torch.randint(0, 2**32 - 1, (1,)).item()
        generator = torch.Generator(device=device).manual_seed(seed)

        # Prepare generation parameters
        gen_params = {
            "prompt": prompt,
            "height": height,
            "width": width,
            "num_frames": num_frames,
            "guidance_scale": guidance_scale,
            "num_inference_steps": num_inference_steps,
            "generator": generator,
        }

        # Add image if provided (for image-to-video)
        if image is not None:
            gen_params["image"] = image

        # Generate video
        print(f"Generating video with prompt: {prompt}")
        print(f"Parameters: {width}x{height}, {num_frames} frames, seed: {seed}")

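        # The pipeline returns a batch of videos; frames[0] selects the first (and here only) clip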
        output = pipeline(**gen_params).frames[0]

        # Export to video file
        output_path = "output.mp4"
        export_to_video(output, output_path, fps=24)

        return output_path, f"Video generated successfully! Seed used: {seed}"

    except Exception as e:
        error_msg = f"Error generating video: {str(e)}"
        print(error_msg)
        return None, error_msg

# Create Gradio interface
with gr.Blocks(title="Wan2.2 Video Generation") as demo:
    gr.Markdown(
        """
        # Wan2.2 Video Generation

        Generate high-quality videos from text prompts or images using the Wan2.2-TI2V-5B model.
        This model supports both **Text-to-Video** and **Image-to-Video** generation at 720P/24fps.

        **Note:** Generation takes 2-3 minutes. Settings are optimized for the Zero GPU 3-minute limit.
        """
    )

    with gr.Row():
        with gr.Column():
            # Input controls
            prompt_input = gr.Textbox(
                label="Prompt",
                placeholder="Describe the video you want to generate...",
                lines=3,
                value="Two anthropomorphic cats in comfy boxing gear fight on stage"
            )

            image_input = gr.Image(
                label="Input Image (Optional - for Image-to-Video)",
                type="pil",
                sources=["upload"]
            )

            with gr.Accordion("Advanced Settings", open=False):
                with gr.Row():
                    width_input = gr.Slider(
                        label="Width",
                        minimum=512,
                        maximum=1920,
                        step=64,
                        value=1280
                    )
                    height_input = gr.Slider(
                        label="Height",
                        minimum=512,
                        maximum=1080,
                        step=64,
                        value=704
                    )

                num_frames_input = gr.Slider(
                    label="Number of Frames (more frames = longer video)",
                    minimum=25,
                    maximum=145,
                    step=24,
                    value=73,
                    info="73 frames ≈ 3 seconds at 24fps (optimized for Zero GPU limits)"
                )

                num_steps_input = gr.Slider(
                    label="Inference Steps (more steps = better quality, slower)",
                    minimum=20,
                    maximum=60,
                    step=5,
                    value=35
                )

                guidance_scale_input = gr.Slider(
                    label="Guidance Scale (higher = closer to prompt)",
                    minimum=1.0,
                    maximum=15.0,
                    step=0.5,
                    value=5.0
                )

                seed_input = gr.Number(
                    label="Seed (-1 for random)",
                    value=-1,
                    precision=0
                )

            generate_btn = gr.Button("Generate Video", variant="primary", size="lg")

        with gr.Column():
            # Output
            video_output = gr.Video(
                label="Generated Video",
                autoplay=True
            )
            status_output = gr.Textbox(
                label="Status",
                lines=2
            )

    # Examples
    gr.Examples(
        examples=[
            [
                "Two anthropomorphic cats in comfy boxing gear fight on stage",
                None,
                1280,
                704,
                73,
                35,
                5.0,
                42
            ],
            [
                "A serene underwater scene with colorful coral reefs and tropical fish swimming gracefully",
                None,
                1280,
                704,
                73,
                35,
                5.0,
                123
            ],
            [
                "A bustling futuristic city at night with neon lights and flying cars",
                None,
                1280,
                704,
                73,
                35,
                5.0,
                456
            ],
            [
                "A peaceful mountain landscape with snow-capped peaks and a flowing river",
                None,
                1280,
                704,
                73,
                35,
                5.0,
                789
            ],
        ],
        inputs=[
            prompt_input,
            image_input,
            width_input,
            height_input,
            num_frames_input,
            num_steps_input,
            guidance_scale_input,
            seed_input
        ],
        outputs=[video_output, status_output],
        fn=generate_video,
        cache_examples=False,
    )

    # Connect generate button
    generate_btn.click(
        fn=generate_video,
        inputs=[
            prompt_input,
            image_input,
            width_input,
            height_input,
            num_frames_input,
            num_steps_input,
            guidance_scale_input,
            seed_input
        ],
        outputs=[video_output, status_output]
    )

    gr.Markdown(
        """
        ## Tips for Best Results:
        - Use detailed, descriptive prompts
        - For image-to-video: Upload a clear image that matches your prompt
        - Higher inference steps = better quality but slower generation
        - Adjust guidance scale to balance creativity vs. prompt adherence
        - Use the same seed to reproduce results
        - Keep generation under 3 minutes to fit Zero GPU limits

        ## Model Information:
        - Model: Wan2.2-TI2V-5B (5B parameters)
        - Resolution: 720P (1280x704 or custom)
        - Frame Rate: 24 fps
        - Default Duration: 3 seconds (optimized for Zero GPU)
        - Generation Time: ~2-3 minutes on Zero GPU (with optimized settings)
        """
    )

# Launch the app
if __name__ == "__main__":
    demo.queue(max_size=20)
    demo.launch()
requirements.txt ADDED
@@ -0,0 +1,38 @@
# CRITICAL: spaces must be imported first in app.py to avoid CUDA initialization issues

# Hugging Face Spaces GPU support
spaces>=0.30.0

# Gradio for the UI - using latest version (security update required)
gradio>=5.49.0

# Core dependencies for Wan2.2 video generation
torch>=2.5.0
torchvision>=0.20.0
numpy>=1.26.0
pillow>=10.4.0

# Diffusers - using main branch for latest Wan2.2 features
git+https://github.com/huggingface/diffusers

# Accelerate for optimization
accelerate>=1.0.0

# Additional dependencies for video processing
opencv-python>=4.10.0
av>=13.0.0
imageio>=2.35.0
imageio-ffmpeg>=0.5.0

# Transformers for T5 text encoder
transformers>=4.46.0

# Safe tensor loading
safetensors>=0.4.5

# Model downloading
huggingface-hub>=0.26.0

# Additional optimization libraries
sentencepiece>=0.2.0
protobuf>=5.28.0