# Deployment Guide for Wan2.2 on Hugging Face Spaces

This guide explains how to deploy the Wan2.2 video generation model to Hugging Face Spaces with Zero GPU support.

## Prerequisites

1. A Hugging Face account (create one at https://huggingface.co/join)
2. Git installed on your local machine
3. Git LFS (Large File Storage) installed

## Deployment Steps

### Option 1: Deploy via Hugging Face Web Interface

1. **Create a New Space**
   - Go to https://huggingface.co/new-space
   - Choose a name for your Space (e.g., "wan2-video-gen")
   - Select "Gradio" as the SDK
   - Choose "Public" or "Private" visibility
   - Click "Create Space"

2. **Upload Files**
   - Use the web interface to upload these files:
     - `app.py`
     - `requirements.txt`
     - `README.md`
     - `.gitignore`

3. **Enable Zero GPU**
   - In your Space settings, enable "Zero GPU"
   - This provides automatic GPU allocation during inference

4. **Wait for Build**
   - Hugging Face will automatically build your Space
   - The first build may take 10-15 minutes
   - Check the build logs for any errors

### Option 2: Deploy via Git (Recommended)

1. **Clone Your Space**

   ```bash
   git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
   cd YOUR_SPACE_NAME
   ```

2. **Copy Files**

   ```bash
   # Copy all files from the huggingface-wan2.2 directory
   cp /path/to/huggingface-wan2.2/* .
   ```

3. **Commit and Push**

   ```bash
   git add .
   git commit -m "Initial deployment of Wan2.2 video generation"
   git push
   ```

4. **Enable Zero GPU**
   - Go to your Space settings on Hugging Face
   - Navigate to "Settings" → "Zero GPU"
   - Enable Zero GPU support

### Option 3: Deploy from This Repository

If you've already cloned this repository:

```bash
cd /home/user/Kakka/huggingface-wan2.2

# Initialize git if not already done
git init

# Add the Hugging Face Space as a remote
git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME

# Commit files
git add .
git commit -m "Initial deployment of Wan2.2 video generation"

# Push to Hugging Face
git push hf main
```

## Configuration

### Zero GPU Settings

The app is configured to use Zero GPU with the following settings:

- **Duration**: 180 seconds (3 minutes) per generation
- **Allocation**: Automatic (triggered by each generation request)
- **Optimized defaults**: Reduced frames (73) and steps (35) to fit within the time limit

This is configured in `app.py` with the decorator:

```python
@spaces.GPU(duration=180)  # 3 minutes max for Pro accounts
```

**Important**: Even with a Pro subscription, the maximum GPU duration is 180 seconds (3 minutes). The default settings have been tuned so generation completes within that window:

- Default frames: 73 (3 seconds of video at 24fps)
- Default inference steps: 35 (balanced speed/quality)
- Maximum frames slider: 145 (6 seconds)
- Maximum inference steps: 60
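For orientation, here is a minimal sketch of how a generation function decorated this way might be structured. The model ID, function name, defaults, and loading details below are illustrative assumptions based on the settings above, not a verbatim copy of `app.py`:

```python
import spaces  # must come first; see the import-order note in Troubleshooting

import torch
from diffusers import WanPipeline, AutoencoderKLWan
from diffusers.utils import export_to_video

# Hypothetical model ID for illustration; adjust to the variant you deploy
MODEL_ID = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"

# Load once at startup; with Zero GPU, .to("cuda") is safe here because the
# spaces package defers actual GPU allocation until a decorated call runs
vae = AutoencoderKLWan.from_pretrained(MODEL_ID, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(MODEL_ID, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

@spaces.GPU(duration=180)  # request up to 180s of GPU time per generation
def generate(prompt: str, num_frames: int = 73, num_inference_steps: int = 35) -> str:
    # Defaults mirror the optimized settings above (73 frames, 35 steps)
    frames = pipe(
        prompt=prompt,
        num_frames=num_frames,
        num_inference_steps=num_inference_steps,
    ).frames[0]
    # Write the frames to an MP4 and return its path
    return export_to_video(frames, "output.mp4", fps=24)
```

Keeping the heavy model load outside the decorated function means it runs once at startup rather than on every request, so the 180-second budget is spent on inference alone.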
### Memory Requirements

The Wan2.2-TI2V-5B model requires:

- **Minimum**: 24GB VRAM
- **Recommended**: 40GB+ VRAM

Zero GPU on Hugging Face Spaces provides sufficient VRAM for this model (an H200 GPU with 70GB).

## Testing Your Deployment

1. **Wait for the Build to Complete**
   - Check the build logs in your Space
   - Wait for "Running" status

2. **Test Basic Generation**
   - Try the default example: "Two anthropomorphic cats in comfy boxing gear fight on stage"
   - Generation should take about 2-3 minutes with the default settings

3. **Test Image-to-Video**
   - Upload a test image
   - Add a descriptive prompt
   - Verify video generation works

## Troubleshooting

### Critical: Import Order Issue

**Issue**: `RuntimeError: CUDA has been initialized before importing the 'spaces' package`

**Solution**: This is critical! The `spaces` package MUST be imported BEFORE any CUDA-related packages (torch, diffusers, etc.).

**Correct import order in `app.py`:**

```python
# IMPORTANT: spaces must be imported first
import spaces

# Standard library imports
import os

# Third-party imports (non-CUDA)
import numpy as np
from PIL import Image
import gradio as gr

# CUDA-related imports (must come after spaces)
import torch
from diffusers import WanPipeline, AutoencoderKLWan
```

**Why this matters**: Hugging Face Zero GPU needs to manage CUDA initialization itself. If torch or another CUDA library initializes CUDA before `spaces` is imported, Zero GPU cannot properly manage GPU allocation.

### Build Fails

**Issue**: Requirements installation fails
- **Solution**: Check `requirements.txt` for compatibility issues
  - Ensure the PyTorch version is compatible with CUDA on Zero GPU
  - Make sure you're using the latest Gradio version (5.49.0+) for security fixes

**Issue**: Out of memory during build
- **Solution**: Zero GPU should have enough memory; check the model-loading code

**Issue**: "Can't initialize NVML" warnings
- **Solution**: These are normal in the Zero GPU environment at build time
  - They should not affect runtime once a GPU is allocated

### Runtime Errors

**Issue**: "CUDA out of memory"
- **Solution**: Reduce `num_frames` or the image resolution
  - Check that Zero GPU is properly enabled in your Space settings

**Issue**: "Model not found"
- **Solution**: Verify the Space has an internet connection for the model download
  - Check Hugging Face Hub status

**Issue**: Generation timeout
- **Solution**: Reduce inference steps or video length
  - Increase the GPU duration in `@spaces.GPU(duration=XX)` (within your tier's limit)

**Issue**: Gradio security vulnerability warning
- **Solution**: Update to Gradio 5.49.0 or later in `requirements.txt`
  - Check that the README.md YAML front matter has the correct `sdk_version: 5.49.0`

**Issue**: "ZeroGPU illegal duration! The requested GPU duration (Xs) is larger than the maximum allowed"
- **Solution**: Reduce the duration parameter in `@spaces.GPU(duration=XX)`
  - For Pro accounts, use 180 seconds or less: `@spaces.GPU(duration=180)`
  - The free tier is typically limited to 60 seconds
  - Optimize your default settings to complete within the time limit:
    - Reduce `num_frames` (e.g., 73 for 3 seconds instead of 121 for 5 seconds)
    - Reduce `num_inference_steps` (e.g., 35 instead of 50)

### Slow Generation

**Issue**: Generation takes too long
- **Solution**: This is expected; video generation is compute-intensive
  - Typical time: 2-3 minutes for a 3-second video with the optimized settings (73 frames, 35 steps)
  - Consider reducing `num_inference_steps` to 25-30 for faster (but lower-quality) results
  - Note: Generation must complete within 180 seconds (3 minutes) on Pro, or 60 seconds on the free tier

## Optimization Tips

1. **Current Optimized Settings**
   - Already optimized: `num_frames=73` (3 seconds) and `num_inference_steps=35`
   - These settings are designed to complete within the 180-second Zero GPU limit
   - For even faster testing, reduce steps to 25-30

2. **Add Caching (Optional)**
   - Enable example caching with `cache_examples=True` to pre-generate examples
   - Note: This increases build time and storage requirements
   - Current setting: `cache_examples=False` for faster builds

3. **Queue Management**
   - Current setting: `demo.queue(max_size=20)`
   - Adjust based on expected traffic
   - A larger queue serves more concurrent users but uses more resources (see the sketch after this list)
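As a rough illustration of where the queue setting sits, here is a sketch of the UI wiring, reusing the hypothetical `generate` function from the Configuration section above. The component names and layout are assumptions for illustration, and the slider ranges simply follow the limits listed earlier; the actual `app.py` may differ:

```python
import gradio as gr

with gr.Blocks() as demo:
    prompt = gr.Textbox(label="Prompt")
    num_frames = gr.Slider(1, 145, value=73, step=1, label="Frames (24fps)")
    steps = gr.Slider(1, 60, value=35, step=1, label="Inference steps")
    video = gr.Video(label="Generated video")
    gr.Button("Generate").click(
        generate, inputs=[prompt, num_frames, steps], outputs=video
    )

# Cap pending requests at 20; a larger max_size serves more waiting users
# at the cost of longer waits and higher resource usage
demo.queue(max_size=20)
demo.launch()
```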
## Customization

### Change Default Model

To use a different Wan2.2 variant, modify `app.py`:

```python
# For a larger model with better quality
MODEL_ID = "Wan-AI/Wan2.2-T2V-A14B-Diffusers"

# For image-to-video focused generation
MODEL_ID = "Wan-AI/Wan2.2-I2V-A14B-Diffusers"
```

### Adjust UI

Modify the Gradio interface in `app.py`:

- Change default values in sliders
- Add more examples
- Customize theme and styling

### Add Features

Consider adding:

- Video upscaling
- Multiple video outputs
- Batch generation
- Download history
- Custom aspect ratios

## Monitoring

### Check Space Status

- Visit your Space URL
- Check "Settings" → "Logs" for runtime logs
- Monitor usage in "Settings" → "Analytics"

### Usage Limits

Zero GPU on Hugging Face has:

- Time limits per session
- Concurrent user limits
- Monthly compute quotas (check your tier)

## Support

If you encounter issues:

1. **Check Logs**: Space logs often contain error details
2. **Hugging Face Forums**: https://discuss.huggingface.co/
3. **Model Issues**: Report on the Wan-AI GitHub repository or model card
4. **Space Settings**: Verify Zero GPU is enabled and quota is available

## License

This deployment uses:

- Wan2.2 model (Apache 2.0)
- Gradio (Apache 2.0)
- Diffusers (Apache 2.0)

Ensure compliance with all licenses when deploying.

---

**Happy Deploying!** 🚀