Smikke committed on
Commit d16eb70 · verified · 1 Parent(s): c0e1c6a

Deploy optimized Wan2.2 video generation with Zero GPU support

Files changed (5)
  1. .gitignore +66 -0
  2. DEPLOYMENT.md +285 -0
  3. README.md +130 -6
  4. app.py +296 -0
  5. requirements.txt +38 -0
.gitignore ADDED
@@ -0,0 +1,66 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
venv/
env/
ENV/
.venv

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Gradio
gradio_cached_examples/
flagged/

# Model outputs
output.mp4
*.mp4
*.avi
*.mov
outputs/

# Hugging Face cache
.cache/
models/

# Environment variables
.env
.env.local

# Logs
logs/
*.log

# Temporary files
tmp/
temp/
*.tmp
DEPLOYMENT.md ADDED
@@ -0,0 +1,285 @@
# Deployment Guide for Wan2.2 on Hugging Face Spaces

This guide explains how to deploy the Wan2.2 video generation model to Hugging Face Spaces with Zero GPU support.

## Prerequisites

1. A Hugging Face account (create one at https://huggingface.co/join)
2. Git installed on your local machine
3. Git LFS (Large File Storage) installed

## Deployment Steps

### Option 1: Deploy via Hugging Face Web Interface

1. **Create a New Space**
   - Go to https://huggingface.co/new-space
   - Choose a name for your Space (e.g., "wan2-video-gen")
   - Select "Gradio" as the SDK
   - Choose "Public" or "Private" visibility
   - Click "Create Space"

2. **Upload Files**
   - Use the web interface to upload files:
     - `app.py`
     - `requirements.txt`
     - `README.md`
     - `.gitignore`

3. **Enable Zero GPU**
   - In your Space settings, enable "Zero GPU"
   - This provides automatic GPU allocation during inference

4. **Wait for Build**
   - Hugging Face will automatically build your Space
   - This may take 10-15 minutes for the first build
   - Check the build logs for any errors

### Option 2: Deploy via Git (Recommended)

1. **Clone Your Space**
   ```bash
   git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
   cd YOUR_SPACE_NAME
   ```

2. **Copy Files**
   ```bash
   # Copy all files from the huggingface-wan2.2 directory
   cp /path/to/huggingface-wan2.2/* .
   ```

3. **Commit and Push**
   ```bash
   git add .
   git commit -m "Initial deployment of Wan2.2 video generation"
   git push
   ```

4. **Enable Zero GPU**
   - Go to your Space settings on Hugging Face
   - Navigate to "Settings" → "Zero GPU"
   - Enable Zero GPU support

### Option 3: Deploy from This Repository

If you've already cloned this repository:

```bash
cd /home/user/Kakka/huggingface-wan2.2

# Initialize git if not already done
git init

# Add Hugging Face Space as remote
git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME

# Commit files
git add .
git commit -m "Initial deployment of Wan2.2 video generation"

# Push to Hugging Face
git push hf main
```

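The same upload can also be scripted with the `huggingface_hub` client (already listed in `requirements.txt`). The sketch below is illustrative rather than part of this commit: the Space id and local path are placeholders, it assumes you are logged in via `huggingface-cli login` or have `HF_TOKEN` set, and Zero GPU hardware still has to be selected in the Space settings afterwards.

```python
from huggingface_hub import HfApi

api = HfApi()

# Create the Space if it does not exist yet (placeholder repo id)
api.create_repo(
    repo_id="YOUR_USERNAME/wan2-video-gen",
    repo_type="space",
    space_sdk="gradio",
    exist_ok=True,
)

# Upload app.py, requirements.txt, README.md and .gitignore from the local folder
api.upload_folder(
    folder_path="/path/to/huggingface-wan2.2",
    repo_id="YOUR_USERNAME/wan2-video-gen",
    repo_type="space",
)
```
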
## Configuration

### Zero GPU Settings

The app is configured to use Zero GPU with the following settings:
- **Duration**: 180 seconds (3 minutes) per generation
- **Allocation**: Automatic (triggered by a generation request)
- **Optimized defaults**: Reduced frames (73) and steps (35) to fit within the time limit

This is configured in `app.py` with the decorator:
```python
@spaces.GPU(duration=180)  # 3 minutes max for Pro accounts
```

**Important**: Even with a Pro subscription, the maximum GPU duration is limited to 180 seconds (3 minutes). The default settings have been optimized to complete generation within this time (the frame counts map to clip length as shown in the sketch below):
- Default frames: 73 (3 seconds of video at 24fps)
- Default inference steps: 35 (balanced speed/quality)
- Maximum frames slider: 145 (6 seconds)
- Maximum inference steps: 60

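The frame counts above follow directly from the 24 fps output rate. The helper below is purely illustrative (it is not part of `app.py`); it shows how the defaults map to clip length and keeps the count on the `4k + 1` grid that Wan-style VAEs generally expect:

```python
def seconds_to_frames(seconds: int, fps: int = 24) -> int:
    """Map a clip length in whole seconds to a Wan-friendly frame count (4k + 1)."""
    return seconds * fps + 1

print(seconds_to_frames(3))  # 73  -> default num_frames (~3 s)
print(seconds_to_frames(6))  # 145 -> slider maximum (~6 s)
```
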
### Memory Requirements

The Wan2.2-TI2V-5B model requires:
- **Minimum**: 24GB VRAM
- **Recommended**: 40GB+ VRAM for Zero GPU

Zero GPU on Hugging Face Spaces provides sufficient VRAM for this model (H200 GPU with 70GB).

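To confirm how much memory the allocated device actually exposes, a small check like the one below can be dropped inside the `@spaces.GPU`-decorated function while debugging. This is a sketch, not part of the committed `app.py`:

```python
import torch

def report_gpu_memory() -> None:
    """Print the visible GPU and its total VRAM (only meaningful inside the GPU context)."""
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print(f"GPU: {props.name}, total VRAM: {props.total_memory / 1024**3:.1f} GB")
    else:
        print("No CUDA device visible (expected outside the @spaces.GPU context)")
```
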
## Testing Your Deployment

1. **Wait for Build to Complete**
   - Check the build logs in your Space
   - Wait for "Running" status

2. **Test Basic Generation**
   - Try the default example: "Two anthropomorphic cats in comfy boxing gear fight on stage"
   - Generation should take roughly 2-3 minutes with the optimized default settings

3. **Test Image-to-Video**
   - Upload a test image
   - Add a descriptive prompt
   - Verify video generation works

## Troubleshooting

### Critical: Import Order Issue

**Issue**: `RuntimeError: CUDA has been initialized before importing the 'spaces' package`

**Solution**: This is CRITICAL! The `spaces` package MUST be imported BEFORE any CUDA-related packages (torch, diffusers, etc.).

**Correct import order in app.py:**
```python
# IMPORTANT: spaces must be imported first
import spaces

# Standard library imports
import os

# Third-party imports (non-CUDA)
import numpy as np
from PIL import Image
import gradio as gr

# CUDA-related imports (must come after spaces)
import torch
from diffusers import WanPipeline, AutoencoderKLWan
```

**Why this matters**: Hugging Face Zero GPU needs to manage CUDA initialization itself. If torch or other CUDA libraries initialize CUDA before `spaces` is imported, Zero GPU cannot properly manage GPU allocation.

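A quick way to sanity-check the import order locally is to confirm that CUDA has not been touched once the module-level imports have run. This is a debugging sketch, not something the committed `app.py` contains:

```python
import spaces   # must come first
import torch    # CUDA-related imports only after spaces

# Should print False; True means something initialized CUDA during import
print("CUDA initialized at import time:", torch.cuda.is_initialized())
```
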
### Build Fails

**Issue**: Requirements installation fails
- **Solution**: Check `requirements.txt` for compatibility issues
- Ensure the PyTorch version is compatible with CUDA on Zero GPU
- Make sure you are using the latest Gradio version (5.49.0+) for security fixes

**Issue**: Out of memory during build
- **Solution**: Zero GPU should have enough memory; check the model loading code

**Issue**: "Can't initialize NVML" warnings
- **Solution**: These are normal in the Zero GPU environment at build time
- They should not affect runtime once a GPU is allocated

### Runtime Errors

**Issue**: "CUDA out of memory"
- **Solution**: Reduce `num_frames` or the image resolution
- Check that Zero GPU is properly enabled in the Space settings

**Issue**: "Model not found"
- **Solution**: Verify internet connectivity for the model download
- Check Hugging Face Hub status

**Issue**: Generation timeout
- **Solution**: Reduce inference steps or video length
- Increase the GPU duration in `@spaces.GPU(duration=XX)` (up to the 180-second maximum)

**Issue**: Gradio security vulnerability warning
- **Solution**: Update to Gradio 5.49.0 or later in requirements.txt
- Check that the README.md YAML front matter has the correct `sdk_version: 5.49.0`

**Issue**: "ZeroGPU illegal duration! The requested GPU duration (Xs) is larger than the maximum allowed"
- **Solution**: Reduce the duration parameter in `@spaces.GPU(duration=XX)`
- For Pro accounts, use 180 seconds or less: `@spaces.GPU(duration=180)`
- The free tier is typically limited to 60 seconds
- Optimize your default settings to complete within the time limit:
  - Reduce `num_frames` (e.g., 73 for 3 seconds instead of 121 for 5 seconds)
  - Reduce `num_inference_steps` (e.g., 35 instead of 50)

### Slow Generation

**Issue**: Generation takes too long
- **Solution**: This is expected; video generation is compute-intensive
- Typical time: 2-3 minutes for a 3-second video with the optimized settings (73 frames, 35 steps)
- Consider reducing `num_inference_steps` to 25-30 for faster (but lower-quality) results
- Note: Generation must complete within 180 seconds (3 minutes) on Pro, 60 seconds on the free tier

## Optimization Tips

1. **Current Optimized Settings**
   - Already optimized: `num_frames=73` (3 seconds) and `num_inference_steps=35`
   - These settings are designed to complete within the 180-second Zero GPU limit
   - For even faster testing, reduce steps to 25-30

2. **Add Caching (Optional)**
   - Enable example caching with `cache_examples=True` to pre-generate examples
   - Note: this increases build time and storage requirements
   - Current setting: `cache_examples=False` for faster builds

3. **Queue Management**
   - Current setting: `demo.queue(max_size=20)`
   - Adjust based on expected traffic (see the sketch below)
   - Larger queue = more concurrent users but more resource usage

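If you do tune the queue, a minimal sketch looks like the following; `default_concurrency_limit` exists in recent Gradio releases, but the exact values are illustrative and not what `app.py` currently ships with:

```python
import gradio as gr

with gr.Blocks() as demo:
    gr.Markdown("Queue tuning sketch")

# app.py uses demo.queue(max_size=20); a smaller queue rejects excess requests sooner,
# and a concurrency limit of 1 keeps a single generation on the GPU at a time.
demo.queue(max_size=10, default_concurrency_limit=1)
demo.launch()
```
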
## Customization

### Change Default Model

To use a different Wan2.2 variant, modify `app.py`:

```python
# For larger model with better quality
MODEL_ID = "Wan-AI/Wan2.2-T2V-A14B-Diffusers"

# For image-to-video focused
MODEL_ID = "Wan-AI/Wan2.2-I2V-A14B-Diffusers"
```

### Adjust UI

Modify the Gradio interface in `app.py`:
- Change default values in sliders
- Add more examples
- Customize theme and styling

### Add Features

Consider adding:
- Video upscaling
- Multiple video outputs
- Batch generation
- Download history
- Custom aspect ratios

## Monitoring

### Check Space Status
- Visit your Space URL
- Check "Settings" → "Logs" for runtime logs
- Monitor usage in "Settings" → "Analytics"

### Usage Limits

Zero GPU on Hugging Face has:
- Time limits per session
- Concurrent user limits
- Monthly compute quotas (check your tier)

## Support

If you encounter issues:

1. **Check Logs**: Space logs often contain error details
2. **Hugging Face Forums**: https://discuss.huggingface.co/
3. **Model Issues**: Report at Wan-AI's GitHub or model card
4. **Space Settings**: Verify Zero GPU is enabled and quota is available

## License

This deployment uses:
- Wan2.2 model (Apache 2.0)
- Gradio (Apache 2.0)
- Diffusers (Apache 2.0)

Ensure compliance with all licenses when deploying.

---

**Happy Deploying!** 🚀
README.md CHANGED
@@ -1,12 +1,136 @@
---
- title: Wan2 Video Generation
- emoji: 📉
- colorFrom: indigo
- colorTo: blue
sdk: gradio
- sdk_version: 5.49.1
app_file: app.py
pinned: false
---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
---
+ title: Wan2.2 Video Generation
+ emoji: 🎥
+ colorFrom: purple
+ colorTo: pink
sdk: gradio
+ sdk_version: 5.49.0
app_file: app.py
pinned: false
+ license: apache-2.0
+ tags:
+ - video-generation
+ - text-to-video
+ - image-to-video
+ - diffusers
+ - wan
+ - ai-video
+ - zero-gpu
+ python_version: "3.10"
---

+ # Wan2.2 Video Generation 🎥
+
+ Generate high-quality videos from text prompts or images using the powerful **Wan2.2-TI2V-5B** model!
+
+ This Space provides an easy-to-use interface for creating videos with state-of-the-art AI technology.
+
+ ## Features ✨
+
+ - **Text-to-Video**: Generate videos from descriptive text prompts
+ - **Image-to-Video**: Animate your images by adding an input image
+ - **High Quality**: 720P resolution at 24fps
+ - **Customizable**: Adjust resolution, number of frames, guidance scale, and more
+ - **Reproducible**: Use seeds to recreate your favorite generations
+
+ ## Model Information 🤖
+
+ **Wan2.2-TI2V-5B** is a unified text-to-video and image-to-video generation model with:
+
+ - **5 billion parameters** optimized for consumer-grade GPUs
+ - **720P resolution** support (1280x704 default)
+ - **24 fps** smooth video output
+ - **Optimized duration**: Default 3 seconds (optimized for Zero GPU limits)
+
+ The TI2V-5B variant is the compact dense model in the Wan2.2 family (the larger A14B variants use a Mixture-of-Experts architecture) and delivers strong video generation quality while remaining practical to run on a single GPU.
+
+ ## How to Use 🚀
+
+ ### Text-to-Video Generation
+
+ 1. Enter your prompt describing the video you want to create
+ 2. Adjust settings in "Advanced Settings" if desired
+ 3. Click "Generate Video"
+ 4. Wait for generation (typically 2-3 minutes on Zero GPU with default settings)
+
+ ### Image-to-Video Generation
+
+ 1. Upload an input image
+ 2. Enter a prompt describing how the image should animate
+ 3. Click "Generate Video"
+ 4. The output will maintain the aspect ratio of your input image
+ 5. Generation takes 2-3 minutes with optimized settings
+
+ ## Advanced Settings ⚙️
+
+ - **Width/Height**: Video resolution (default: 1280x704)
+ - **Number of Frames**: Longer videos need more frames (default: 73 frames ≈ 3 seconds, max: 145)
+ - **Inference Steps**: More steps = better quality but slower (default: 35, optimized for speed)
+ - **Guidance Scale**: How closely to follow the prompt (default: 5.0)
+ - **Seed**: Set a specific seed for reproducible results
+
+ **Note**: Settings are optimized to complete within Zero GPU's 3-minute time limit for Pro users.
+
+ ## Tips for Best Results 💡
+
+ 1. **Detailed Prompts**: Be specific about what you want to see
+    - Good: "Two anthropomorphic cats in comfy boxing gear fight on stage with dramatic lighting"
+    - Basic: "cats fighting"
+
+ 2. **Image-to-Video**: Use clear, high-quality input images that match your prompt
+
+ 3. **Quality vs Speed** (optimized for Zero GPU limits):
+    - Fast: 25-30 steps (~2 minutes)
+    - Balanced: 35 steps (default, ~2-3 minutes)
+    - Higher Quality: 40-50 steps (~3+ minutes, may time out)
+
+ 4. **Experiment**: Try different guidance scales:
+    - Lower (3-4): More creative, less literal
+    - Default (5): Good balance
+    - Higher (7-10): Strictly follows prompt
+
+ ## Example Prompts 📝
+
+ - "Two anthropomorphic cats in comfy boxing gear fight on stage"
+ - "A serene underwater scene with colorful coral reefs and tropical fish swimming gracefully"
+ - "A bustling futuristic city at night with neon lights and flying cars"
+ - "A peaceful mountain landscape with snow-capped peaks and a flowing river"
+ - "An astronaut riding a horse through a nebula in deep space"
+ - "A dragon flying over a medieval castle at sunset"
+
+ ## Technical Details 🔧
+
+ - **Model**: Wan-AI/Wan2.2-TI2V-5B-Diffusers
+ - **Framework**: Hugging Face Diffusers
+ - **Backend**: PyTorch with bfloat16 precision
+ - **GPU**: Hugging Face Zero GPU (H200 with 70GB VRAM, automatically allocated)
+ - **GPU Duration**: 180 seconds (3 minutes) for Pro users
+ - **Generation Time**: ~2-3 minutes with optimized settings (73 frames, 35 steps)
+
+ ## Limitations ⚠️
+
+ - Generation requires compute time (2-3 minutes with default settings)
+ - Zero GPU allocation is time-limited (3 minutes for Pro, 60 seconds for Free)
+ - Videos longer than 6 seconds (145 frames) may time out
+ - Higher quality settings (50+ steps) may time out on Zero GPU
+ - Complex scenes with many objects may be challenging
+
+ ## Credits 🙏
+
+ - **Model**: [Wan-AI](https://huggingface.co/Wan-AI)
+ - **Original Repository**: [Wan2.2](https://github.com/Wan-Video/Wan2.2)
+ - **Framework**: [Hugging Face Diffusers](https://github.com/huggingface/diffusers)
+
+ ## License 📄
+
+ This Space uses the Wan2.2 model, which is released under the Apache 2.0 license.
+
+ ## Related Links 🔗
+
+ - [Wan-AI on Hugging Face](https://huggingface.co/Wan-AI)
+ - [Original Model Card](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers)
+ - [Diffusers Documentation](https://huggingface.co/docs/diffusers)
+
+ ---
+
+ **Note**: This is a community-created Space for easy access to Wan2.2 video generation. Generation times may vary based on current GPU availability.
app.py ADDED
@@ -0,0 +1,296 @@
# IMPORTANT: spaces must be imported first to avoid CUDA initialization issues
import spaces

# Standard library imports
import os

# Third-party imports (non-CUDA)
import numpy as np
from PIL import Image
import gradio as gr

# CUDA-related imports (must come after spaces)
import torch
from diffusers import WanPipeline, AutoencoderKLWan
from diffusers.utils import export_to_video

# Model configuration
MODEL_ID = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"
dtype = torch.bfloat16
device = "cuda" if torch.cuda.is_available() else "cpu"

# Global pipeline variable
pipe = None

def initialize_pipeline():
    """Initialize the Wan2.2 pipeline"""
    global pipe
    if pipe is None:
        print("Loading Wan2.2-TI2V-5B model...")
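        # Note: the VAE is loaded in float32 while the rest of the pipeline runs in
        # bfloat16; the Diffusers Wan examples use the same split, reportedly to keep
        # VAE encode/decode numerically stable.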
        vae = AutoencoderKLWan.from_pretrained(
            MODEL_ID,
            subfolder="vae",
            torch_dtype=torch.float32
        )
        pipe = WanPipeline.from_pretrained(
            MODEL_ID,
            vae=vae,
            torch_dtype=dtype
        )
        pipe.to(device)
        print("Model loaded successfully!")
    return pipe

@spaces.GPU(duration=180)  # Allocate GPU for 3 minutes (max allowed for Pro)
def generate_video(
    prompt: str,
    image: Image.Image = None,
    width: int = 1280,
    height: int = 704,
    num_frames: int = 73,
    num_inference_steps: int = 35,
    guidance_scale: float = 5.0,
    seed: int = -1
):
    """
    Generate video from text prompt and optional image

    Args:
        prompt: Text description of the video to generate
        image: Optional input image for image-to-video generation
        width: Video width (default: 1280)
        height: Video height (default: 704)
        num_frames: Number of frames to generate (default: 73 for 3 seconds at 24fps)
        num_inference_steps: Number of denoising steps (default: 35 for faster generation)
        guidance_scale: Guidance scale for generation (default: 5.0)
        seed: Random seed for reproducibility (-1 for random)
    """
    try:
        # Initialize pipeline
        pipeline = initialize_pipeline()

        # Set seed for reproducibility
        if seed == -1:
            seed = torch.randint(0, 2**32 - 1, (1,)).item()
        generator = torch.Generator(device=device).manual_seed(seed)

        # Prepare generation parameters
        gen_params = {
            "prompt": prompt,
            "height": height,
            "width": width,
            "num_frames": num_frames,
            "guidance_scale": guidance_scale,
            "num_inference_steps": num_inference_steps,
            "generator": generator,
        }

        # Add image if provided (for image-to-video)
        if image is not None:
            gen_params["image"] = image

        # Generate video
        print(f"Generating video with prompt: {prompt}")
        print(f"Parameters: {width}x{height}, {num_frames} frames, seed: {seed}")

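        # The pipeline returns a batch of videos; frames[0] selects the first (and here only) clip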
        output = pipeline(**gen_params).frames[0]

        # Export to video file
        output_path = "output.mp4"
        export_to_video(output, output_path, fps=24)

        return output_path, f"Video generated successfully! Seed used: {seed}"

    except Exception as e:
        error_msg = f"Error generating video: {str(e)}"
        print(error_msg)
        return None, error_msg

# Create Gradio interface
with gr.Blocks(title="Wan2.2 Video Generation") as demo:
    gr.Markdown(
        """
        # Wan2.2 Video Generation

        Generate high-quality videos from text prompts or images using the Wan2.2-TI2V-5B model.
        This model supports both **Text-to-Video** and **Image-to-Video** generation at 720P/24fps.

        **Note:** Generation takes 2-3 minutes. Settings are optimized for the Zero GPU 3-minute limit.
        """
    )

    with gr.Row():
        with gr.Column():
            # Input controls
            prompt_input = gr.Textbox(
                label="Prompt",
                placeholder="Describe the video you want to generate...",
                lines=3,
                value="Two anthropomorphic cats in comfy boxing gear fight on stage"
            )

            image_input = gr.Image(
                label="Input Image (Optional - for Image-to-Video)",
                type="pil",
                sources=["upload"]
            )

            with gr.Accordion("Advanced Settings", open=False):
                with gr.Row():
                    width_input = gr.Slider(
                        label="Width",
                        minimum=512,
                        maximum=1920,
                        step=64,
                        value=1280
                    )
                    height_input = gr.Slider(
                        label="Height",
                        minimum=512,
                        maximum=1080,
                        step=64,
                        value=704
                    )

                num_frames_input = gr.Slider(
                    label="Number of Frames (more frames = longer video)",
                    minimum=25,
                    maximum=145,
                    step=24,
                    value=73,
                    info="73 frames ≈ 3 seconds at 24fps (optimized for Zero GPU limits)"
                )

                num_steps_input = gr.Slider(
                    label="Inference Steps (more steps = better quality, slower)",
                    minimum=20,
                    maximum=60,
                    step=5,
                    value=35
                )

                guidance_scale_input = gr.Slider(
                    label="Guidance Scale (higher = closer to prompt)",
                    minimum=1.0,
                    maximum=15.0,
                    step=0.5,
                    value=5.0
                )

                seed_input = gr.Number(
                    label="Seed (-1 for random)",
                    value=-1,
                    precision=0
                )

            generate_btn = gr.Button("Generate Video", variant="primary", size="lg")

        with gr.Column():
            # Output
            video_output = gr.Video(
                label="Generated Video",
                autoplay=True
            )
            status_output = gr.Textbox(
                label="Status",
                lines=2
            )

    # Examples
    gr.Examples(
        examples=[
            [
                "Two anthropomorphic cats in comfy boxing gear fight on stage",
                None,
                1280,
                704,
                73,
                35,
                5.0,
                42
            ],
            [
                "A serene underwater scene with colorful coral reefs and tropical fish swimming gracefully",
                None,
                1280,
                704,
                73,
                35,
                5.0,
                123
            ],
            [
                "A bustling futuristic city at night with neon lights and flying cars",
                None,
                1280,
                704,
                73,
                35,
                5.0,
                456
            ],
            [
                "A peaceful mountain landscape with snow-capped peaks and a flowing river",
                None,
                1280,
                704,
                73,
                35,
                5.0,
                789
            ],
        ],
        inputs=[
            prompt_input,
            image_input,
            width_input,
            height_input,
            num_frames_input,
            num_steps_input,
            guidance_scale_input,
            seed_input
        ],
        outputs=[video_output, status_output],
        fn=generate_video,
        cache_examples=False,
    )

    # Connect generate button
    generate_btn.click(
        fn=generate_video,
        inputs=[
            prompt_input,
            image_input,
            width_input,
            height_input,
            num_frames_input,
            num_steps_input,
            guidance_scale_input,
            seed_input
        ],
        outputs=[video_output, status_output]
    )

    gr.Markdown(
        """
        ## Tips for Best Results:
        - Use detailed, descriptive prompts
        - For image-to-video: Upload a clear image that matches your prompt
        - Higher inference steps = better quality but slower generation
        - Adjust guidance scale to balance creativity vs. prompt adherence
        - Use the same seed to reproduce results
        - Keep generation under 3 minutes to fit Zero GPU limits

        ## Model Information:
        - Model: Wan2.2-TI2V-5B (5B parameters)
        - Resolution: 720P (1280x704 or custom)
        - Frame Rate: 24 fps
        - Default Duration: 3 seconds (optimized for Zero GPU)
        - Generation Time: ~2-3 minutes on Zero GPU (with optimized settings)
        """
    )

# Launch the app
if __name__ == "__main__":
    demo.queue(max_size=20)
    demo.launch()
requirements.txt ADDED
@@ -0,0 +1,38 @@
# CRITICAL: spaces must be imported first in app.py to avoid CUDA initialization issues

# Hugging Face Spaces GPU support
spaces>=0.30.0

# Gradio for the UI - using latest version (security update required)
gradio>=5.49.0

# Core dependencies for Wan2.2 video generation
torch>=2.5.0
torchvision>=0.20.0
numpy>=1.26.0
pillow>=10.4.0

# Diffusers - using main branch for latest Wan2.2 features
git+https://github.com/huggingface/diffusers

# Accelerate for optimization
accelerate>=1.0.0

# Additional dependencies for video processing
opencv-python>=4.10.0
av>=13.0.0
imageio>=2.35.0
imageio-ffmpeg>=0.5.0

# Transformers for T5 text encoder
transformers>=4.46.0

# Safe tensor loading
safetensors>=0.4.5

# Model downloading
huggingface-hub>=0.26.0

# Additional optimization libraries
sentencepiece>=0.2.0
protobuf>=5.28.0