---
title: Wan2.2 Video Generation
emoji: 🎥
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - video-generation
  - text-to-video
  - image-to-video
  - diffusers
  - wan
  - ai-video
  - zero-gpu
python_version: '3.10'
---

Wan2.2 Video Generation πŸŽ₯

Generate high-quality videos from text prompts or images using the powerful Wan2.2-TI2V-5B model!

This Space provides an easy-to-use interface for creating videos with state-of-the-art AI technology.

Features ✨

  • Text-to-Video: Generate videos from descriptive text prompts
  • Image-to-Video: Animate your images by adding an input image
  • High Quality: 720P resolution at 24fps
  • Customizable: Adjust resolution, number of frames, guidance scale, and more
  • Reproducible: Use seeds to recreate your favorite generations

Model Information πŸ€–

Wan2.2-TI2V-5B is a unified text-to-video and image-to-video generation model with:

  • 5 billion parameters optimized for consumer-grade GPUs
  • 720P resolution support (1280x704 default)
  • 24 fps smooth video output
  • Duration: 3 seconds by default (chosen to fit Zero GPU time limits)

The Wan2.2 family uses a Mixture-of-Experts (MoE) architecture in its larger variants and delivers outstanding video generation quality, surpassing many commercial models.
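
As a point of reference, loading this checkpoint with Diffusers typically looks like the sketch below. This is an illustrative, untested snippet: the `WanPipeline` class name follows the Diffusers Wan integration, and on Zero GPU the device placement would happen inside the GPU-decorated handler rather than at import time.

```python
# Illustrative sketch: loading Wan2.2-TI2V-5B via Hugging Face Diffusers.
# Verify the pipeline class against your installed diffusers version.
MODEL_ID = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"

def load_pipeline():
    import torch
    from diffusers import WanPipeline  # Diffusers' Wan video pipeline

    # bfloat16 matches the precision listed under Technical Details
    pipe = WanPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    pipe.to("cuda")  # on Zero GPU, do this inside the @spaces.GPU function
    return pipe
```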

How to Use πŸš€

Text-to-Video Generation

  1. Enter your prompt describing the video you want to create
  2. Adjust settings in "Advanced Settings" if desired
  3. Click "Generate Video"
  4. Wait for generation (typically 2-3 minutes on Zero GPU with default settings)
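
The four steps above map onto a single pipeline call. A minimal sketch, assuming a `pipe` loaded from the Diffusers checkpoint named under Technical Details (argument names follow the standard Diffusers video-pipeline interface):

```python
# Illustrative text-to-video call using this README's default settings.
def generate_t2v(pipe, prompt, seed=42, out_path="output.mp4"):
    import torch
    from diffusers.utils import export_to_video

    generator = torch.Generator(device="cuda").manual_seed(seed)
    result = pipe(
        prompt=prompt,
        width=1280,              # default resolution (see Advanced Settings)
        height=704,
        num_frames=73,           # ~3 seconds at 24 fps
        num_inference_steps=35,  # speed-optimized default
        guidance_scale=5.0,
        generator=generator,     # fixed seed -> reproducible output
    )
    export_to_video(result.frames[0], out_path, fps=24)
    return out_path
```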

Image-to-Video Generation

  1. Upload an input image
  2. Enter a prompt describing how the image should animate
  3. Click "Generate Video"
  4. The output will maintain the aspect ratio of your input image
  5. Generation takes 2-3 minutes with optimized settings
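
The aspect-ratio behavior in step 4 can be sketched as follows. `fit_resolution` and its round-to-16 rule are assumptions (video diffusion models generally want dimensions divisible by a fixed multiple), and the image-to-video call mirrors the text-to-video one with an added `image` argument, as in the Diffusers Wan image-to-video interface:

```python
# Illustrative image-to-video sketch; fit_resolution is a hypothetical
# helper that keeps the input image's aspect ratio at ~1280x704 area.
def fit_resolution(img_w, img_h, max_area=1280 * 704, multiple=16):
    """Pick an output size with (roughly) the input aspect ratio."""
    aspect = img_w / img_h
    h = round((max_area / aspect) ** 0.5 / multiple) * multiple
    w = round(h * aspect / multiple) * multiple
    return w, h

def generate_i2v(pipe, image, prompt, seed=42, out_path="i2v_output.mp4"):
    import torch
    from diffusers.utils import export_to_video

    w, h = fit_resolution(image.width, image.height)
    generator = torch.Generator(device="cuda").manual_seed(seed)
    result = pipe(
        prompt=prompt,
        image=image,             # the input image to animate
        width=w, height=h,       # preserves the input aspect ratio
        num_frames=73,
        num_inference_steps=35,
        guidance_scale=5.0,
        generator=generator,
    )
    export_to_video(result.frames[0], out_path, fps=24)
    return out_path
```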

Advanced Settings βš™οΈ

  • Width/Height: Video resolution (default: 1280x704)
  • Number of Frames: Longer videos need more frames (default: 73 frames β‰ˆ 3 seconds, max: 145)
  • Inference Steps: More steps = better quality but slower (default: 35, optimized for speed)
  • Guidance Scale: How closely to follow the prompt (default: 5.0)
  • Seed: Set a specific seed for reproducible results

Note: Settings are optimized to complete within Zero GPU's 3-minute time limit for Pro users.
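
To translate between frame counts and clip length, a small helper makes the arithmetic explicit. The "frames = 4k + 1" constraint is inferred from the defaults quoted above (73 ≈ 3 s, 145 ≈ 6 s at 24 fps) and should be treated as an assumption:

```python
# Frame-count arithmetic implied by this README's defaults.
# Assumption: valid frame counts have the form 4k + 1 (73, 145, ...).
FPS = 24

def frames_for_seconds(seconds, fps=FPS):
    """Nearest valid frame count (4k + 1) for a target duration."""
    n = round(seconds * fps)
    return 4 * round((n - 1) / 4) + 1

def seconds_for_frames(num_frames, fps=FPS):
    """Clip length in seconds for a given frame count."""
    return num_frames / fps
```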

Tips for Best Results πŸ’‘

  1. Detailed Prompts: Be specific about what you want to see

    • Good: "Two anthropomorphic cats in comfy boxing gear fight on stage with dramatic lighting"
    • Basic: "cats fighting"
  2. Image-to-Video: Use clear, high-quality input images that match your prompt

  3. Quality vs Speed (optimized for Zero GPU limits):

    • Fast: 25-30 steps (~2 minutes)
    • Balanced: 35 steps (default, ~2-3 minutes)
    • Higher Quality: 40-50 steps (~3+ minutes, may timeout)
  4. Experiment: Try different guidance scales:

    • Lower (3-4): More creative, less literal
    • Default (5): Good balance
    • Higher (7-10): Strictly follows prompt

Example Prompts πŸ“

  • "Two anthropomorphic cats in comfy boxing gear fight on stage"
  • "A serene underwater scene with colorful coral reefs and tropical fish swimming gracefully"
  • "A bustling futuristic city at night with neon lights and flying cars"
  • "A peaceful mountain landscape with snow-capped peaks and a flowing river"
  • "An astronaut riding a horse through a nebula in deep space"
  • "A dragon flying over a medieval castle at sunset"

Technical Details πŸ”§

  • Model: Wan-AI/Wan2.2-TI2V-5B-Diffusers
  • Framework: Hugging Face Diffusers
  • Backend: PyTorch with bfloat16 precision
  • GPU: Hugging Face Zero GPU (H200 with 70GB VRAM, automatically allocated)
  • GPU Duration: 180 seconds (3 minutes) for Pro users
  • Generation Time: ~2-3 minutes with optimized settings (73 frames, 35 steps)
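
For context, a Zero GPU Space requests its hardware per call with the `spaces.GPU` decorator. The sketch below is illustrative (the real handler lives in app.py); `duration=180` matches the 3-minute allocation listed above:

```python
# Illustrative ZeroGPU usage: the GPU is allocated only while the
# decorated function runs. `spaces` is Hugging Face's spaces package.
def build_handler(pipe):
    import spaces

    @spaces.GPU(duration=180)  # request up to 180 s of H200 time per call
    def generate(prompt: str):
        # CUDA work (the pipeline call) belongs inside this function
        return pipe(prompt=prompt)

    return generate
```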

Limitations ⚠️

  • Generation requires compute time (2-3 minutes with default settings)
  • Zero GPU allocation is time-limited (3 minutes for Pro, 60 seconds for Free)
  • Videos longer than 6 seconds (145 frames) may timeout
  • Higher quality settings (50+ steps) may timeout on Zero GPU
  • Complex scenes with many objects may be challenging
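
A rough way to sanity-check settings against the time budget, using only the throughput implied by the numbers in this README (~35 steps × 73 frames in roughly 150 s). This is a back-of-envelope estimate, not a measured model:

```python
# Timeout check derived from this README's quoted timings
# (35 steps, 73 frames -> ~2.5 minutes). The rate is an assumption.
SEC_PER_STEP_FRAME = 150 / (35 * 73)  # ~0.059 s per (step x frame)

def estimated_seconds(num_steps, num_frames):
    """Rough wall-clock estimate for a generation."""
    return num_steps * num_frames * SEC_PER_STEP_FRAME

def fits_budget(num_steps, num_frames, budget_s=180):
    """True if the estimate fits the Zero GPU allocation (180 s for Pro)."""
    return estimated_seconds(num_steps, num_frames) <= budget_s
```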

Credits πŸ™

License πŸ“„

This Space uses the Wan2.2 model, which is released under the Apache 2.0 license.

Note: This is a community-created Space for easy access to Wan2.2 video generation. Generation times may vary based on current GPU availability.