---
license: other
library_name: diffusers
pipeline_tag: image-to-video
tags:
- wan
- image-to-video
- video-generation
- gguf
---

# Wan 2.2 Image-to-Video (I2V-A14B) - GGUF FP16 Quantized Models

This repository contains GGUF quantized versions of the **Wan 2.2 Image-to-Video A14B** model, optimized for efficient inference with reduced VRAM requirements while maintaining high-quality video generation.

## Model Description

Wan 2.2 is an advanced large-scale video generative model that uses a **Mixture-of-Experts (MoE) architecture** designed for image-to-video synthesis. The A14B variant features a dual-expert design with approximately 14 billion parameters per expert:

- **High-Noise Expert**: Optimized for early denoising stages, focusing on overall layout and composition
- **Low-Noise Expert**: Specialized for later denoising stages, refining video details and quality

The model generates videos at **480P and 720P resolutions** from static images, with support for text prompts to guide the generation process. Wan 2.2 incorporates meticulously curated aesthetic data with detailed labels for lighting, composition, contrast, and color tone, enabling precise cinematic-style video generation.

## Repository Contents

This repository contains three GGUF model files optimized for different use cases:

```
diffusion_models/wan/
├── wan22-i2v-a14b-high.gguf          (15 GB)  - Full FP16 high-noise expert
├── wan22-i2v-a14b-high-q4-k-s.gguf   (8.2 GB) - Q4_K_S quantized high-noise expert
└── wan22-i2v-a14b-low-q4-k-s.gguf    (8.2 GB) - Q4_K_S quantized low-noise expert
```

**Total Repository Size**: 31 GB

### Model Files Explained

- **wan22-i2v-a14b-high.gguf**: Full-precision FP16 high-noise expert for maximum quality
- **wan22-i2v-a14b-high-q4-k-s.gguf**: Q4_K_S quantized high-noise expert (~46% size reduction)
- **wan22-i2v-a14b-low-q4-k-s.gguf**: Q4_K_S quantized low-noise expert (~46% size reduction)

**Quantization Format**: Q4_K_S (4-bit K-quant Small) provides a good balance between model size, memory usage, and generation quality.

## Hardware Requirements

### Minimum Requirements

| Configuration | VRAM | Disk Space | RAM |
|--------------|------|------------|-----|
| **Full FP16** | 24 GB | 31 GB | 32 GB |
| **Q4_K_S Quantized** | 12 GB | 31 GB | 16 GB |
| **Mixed (FP16 + Q4_K_S)** | 18 GB | 31 GB | 24 GB |

### Recommended Requirements

- **GPU**: NVIDIA RTX 4090 (24 GB), RTX 6000 Ada (48 GB), or A6000 (48 GB)
- **CPU**: Modern multi-core processor (8+ cores recommended)
- **Storage**: SSD for faster model loading
- **Operating System**: Windows 10/11, Linux (Ubuntu 22.04+)

### Performance Notes

- **FP16 models** provide the highest quality but require more VRAM
- **Q4_K_S quantization** reduces VRAM usage by ~50% with minimal quality loss (a simple way to pick a variant is sketched after this list)
- Video generation time depends on resolution (480P ~30-60 s, 720P ~60-120 s per video)
- Batch processing can improve throughput but requires additional VRAM
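If you are unsure which variant to download, the choice follows directly from the minimum-requirements table above. The following is a minimal sketch, assuming PyTorch is installed and the file names from this repository are unchanged; the 24 GB threshold simply mirrors the table.

```python
# Minimal sketch: pick the high-noise expert variant based on total GPU memory.
# Assumes PyTorch is installed and the GGUF files sit under diffusion_models/wan/.
from pathlib import Path

import torch

MODEL_DIR = Path("diffusion_models/wan")
FP16_MODEL = MODEL_DIR / "wan22-i2v-a14b-high.gguf"        # ~24 GB VRAM class
Q4_MODEL = MODEL_DIR / "wan22-i2v-a14b-high-q4-k-s.gguf"   # ~12 GB VRAM class


def pick_high_noise_model() -> Path:
    """Return the FP16 file on >=24 GB GPUs, otherwise the Q4_K_S file."""
    if not torch.cuda.is_available():
        # Without a CUDA GPU, the quantized file is the only realistic option.
        return Q4_MODEL
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    return FP16_MODEL if total_gb >= 24 else Q4_MODEL


if __name__ == "__main__":
    print(f"Selected high-noise model: {pick_high_noise_model()}")
```

The same reasoning applies to the low-noise expert, for which this repository only provides the Q4_K_S file.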
## Usage Examples

### ComfyUI Integration

The most common way to use these GGUF models is through [ComfyUI](https://github.com/comfyanonymous/ComfyUI) with the [ComfyUI-GGUF](https://github.com/city96/ComfyUI-GGUF) custom node.

**Installation**:

```bash
# Navigate to the ComfyUI custom nodes directory
cd ComfyUI/custom_nodes

# Clone the GGUF node
git clone https://github.com/city96/ComfyUI-GGUF

# Install dependencies
cd ComfyUI-GGUF
pip install -r requirements.txt
```

**Model Setup**:

```bash
# Copy models to the ComfyUI models directory
# (Windows-style source path shown; adjust to your local download location)
cp E:\huggingface\wan22-fp16-i2v-gguf\diffusion_models\wan\*.gguf ComfyUI\models\unet\
```

**Workflow Configuration**:

1. Load image input node
2. Add a **GGUF Model Loader** node
3. Select `wan22-i2v-a14b-high-q4-k-s.gguf` (for the high-noise expert)
4. Add prompt conditioning (optional)
5. Configure the video sampler with:
   - Steps: 50-100
   - CFG Scale: 7-9
   - Resolution: 480P or 720P
6. Connect to a video output node

### Python Usage (Diffusers)

For direct Python usage with absolute paths:

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

# Note: GGUF models require conversion or specialized loaders.
# For native Diffusers support, use the base model:
# pipe = DiffusionPipeline.from_pretrained("Wan-AI/Wan2.2-I2V-A14B-Diffusers")

# For GGUF files, use ComfyUI or a llama.cpp-based loader.
# Example using a hypothetical GGUF loader interface (requires a compatible library;
# no such package ships with this repository):
from comfyui_gguf_loader import load_gguf_model

model_path = r"E:\huggingface\wan22-fp16-i2v-gguf\diffusion_models\wan\wan22-i2v-a14b-high-q4-k-s.gguf"
model = load_gguf_model(model_path, device="cuda", dtype=torch.float16)

# Generate video from an input image
image = load_image("input_image.jpg")
video = model.generate(
    image=image,
    prompt="A serene landscape with gentle wind moving through grass",
    num_frames=48,
    resolution="720p",
    guidance_scale=8.0,
    num_inference_steps=75,
)

# Save video
video.save("output_video.mp4")
```

### Advanced Configuration

```python
# Memory-optimized configuration for 12 GB VRAM
config = {
    "model_path": r"E:\huggingface\wan22-fp16-i2v-gguf\diffusion_models\wan\wan22-i2v-a14b-high-q4-k-s.gguf",
    "vae_tiling": True,              # Reduce VAE memory usage
    "enable_xformers": True,         # Memory-efficient attention
    "gradient_checkpointing": True,
    "low_vram_mode": True,
    "chunk_size": 2,                 # Process video in chunks
}
```

## Model Specifications

### Architecture

- **Base Model**: Wan 2.2 I2V-A14B (Image-to-Video)
- **Parameters**: 14.3 billion per expert (~27B total, ~14B active per step)
- **Architecture**: Mixture-of-Experts (MoE) Diffusion Transformer
- **Experts**: Dual-expert design (high-noise + low-noise)
- **Precision**: FP16 (full) / Q4_K_S (quantized)
- **Format**: GGUF (GPT-Generated Unified Format)

### Capabilities

- **Input**: Static images (any resolution; 512x512 or higher recommended)
- **Output**: Video sequences at 480P (854x480) or 720P (1280x720)
- **Frame Count**: Configurable (typically 24-96 frames)
- **Frame Rate**: 24 FPS (configurable)
- **Duration**: Typically 1-4 seconds of output
- **Text Conditioning**: Optional prompt-guided generation
- **Style Control**: Lighting, composition, contrast, color tone

### Quantization Details

**Q4_K_S Quantization**:

- **Bit Depth**: 4 bits per weight (mixed with some 6-bit components)
- **Method**: K-quant Small (balanced quality/size trade-off)
- **Size Reduction**: ~46% compared to FP16
- **Quality Loss**: Minimal (~2-5% perceptual difference)
- **Speed**: Similar or faster inference due to reduced memory bandwidth
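To confirm what a given file actually contains, the GGUF header and per-tensor quantization types can be inspected with the `gguf` Python package from the llama.cpp project (`pip install gguf`). The snippet below is a small sketch under that assumption; the metadata keys written by different conversion tools vary.

```python
# Sketch: inspect a GGUF file's metadata and per-tensor quantization types.
# Assumes the `gguf` package (pip install gguf) and a local copy of the file.
from collections import Counter

from gguf import GGUFReader

reader = GGUFReader("diffusion_models/wan/wan22-i2v-a14b-high-q4-k-s.gguf")

# Header key/value metadata written by the conversion tool
for name in reader.fields:
    print(name)

# Count tensors by quantization type (e.g. Q4_K, Q6_K, F16, F32)
type_counts = Counter(tensor.tensor_type.name for tensor in reader.tensors)
for qtype, count in type_counts.most_common():
    print(f"{qtype}: {count} tensors")
```

In a Q4_K_S file, most weight tensors typically report `Q4_K`, with a smaller number kept at higher precision (e.g. `F16`/`F32`) for quality-sensitive layers.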
## Performance Tips and Optimization

### Memory Optimization

1. **Use Quantized Models**: Start with the Q4_K_S versions on 12 GB VRAM systems
2. **Enable VAE Tiling**: Reduces memory usage by processing the image in tiles
3. **Lower Resolution**: Generate at 480P first, upscale if needed
4. **Reduce Batch Size**: Process one video at a time on limited VRAM
5. **Model Offloading**: Move models to CPU between inference steps

### Quality Optimization

1. **Inference Steps**: Use 75-100 steps for best quality (50 minimum)
2. **Guidance Scale**: CFG 7-9 provides good prompt adherence
3. **Prompt Engineering**: Describe motion, lighting, and camera movement
4. **Input Image Quality**: Higher-quality input yields better video output
5. **Resolution Matching**: Match the input aspect ratio to the output resolution

### Speed Optimization

1. **Use Quantized Models**: Q4_K_S inference is 10-20% faster
2. **Enable xFormers**: Memory-efficient attention for faster processing
3. **Optimize Steps**: Balance quality vs. speed (50-75 steps for faster generation)
4. **Compile Model**: Use `torch.compile()` for a 15-25% speedup on PyTorch 2.0+ (see the sketch after this list)
5. **GPU Warmup**: Run one generation to compile kernels before batch processing
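As a rough illustration of item 4 above, the sketch below wraps whatever expert transformers a loaded pipeline exposes with `torch.compile`. The attribute names `transformer` and `transformer_2` are assumptions about the dual-expert Diffusers pipeline; adapt them to the loader you actually use.

```python
# Sketch: compile the denoising transformer(s) with torch.compile (PyTorch 2.0+).
# `pipe` stands in for an already-loaded pipeline; the attribute names below
# are assumptions and may differ in GGUF-based loaders.
import torch


def compile_denoisers(pipe):
    """Compile whichever expert transformers the pipeline exposes."""
    for attr in ("transformer", "transformer_2"):
        module = getattr(pipe, attr, None)
        if isinstance(module, torch.nn.Module):
            # "reduce-overhead" uses CUDA graphs to cut per-call launch cost;
            # the first call after compilation is slow while kernels are built.
            setattr(pipe, attr, torch.compile(module, mode="reduce-overhead"))
    return pipe
```

After compiling, run one throwaway generation (the GPU warmup in item 5) so kernel compilation does not count against the first real video.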
### Example Prompts

**Good Prompts**:

- "Gentle camera pan right, golden hour lighting, soft wind through trees"
- "Slow zoom in, dramatic lighting from left, subtle motion in background"
- "Static camera, clouds moving across sky, soft ambient lighting"

**Avoid**:

- Overly complex multi-action prompts
- Conflicting motion directions
- Unrealistic physics or transformations

## License

This model is released under a **custom Wan license**. Please refer to the original Wan 2.2 model repository for the complete licensing terms.

### Usage Terms

Users are accountable for the content they generate and must not:

- Violate laws or regulations
- Cause harm to individuals or groups
- Generate or spread misinformation or disinformation
- Target or harm vulnerable populations

### Commercial Use

Please consult the original Wan 2.2 license for commercial use terms and conditions.

## Citation

If you use Wan 2.2 models in your research or applications, please cite:

```bibtex
@article{wan2025,
  title={Wan: Open and Advanced Large-Scale Video Generative Models},
  author={Team Wan and Contributors},
  journal={arXiv preprint arXiv:2503.20314},
  year={2025}
}
```

## Related Resources

### Official Resources

- **Original Model**: [Wan-AI/Wan2.2-I2V-A14B](https://huggingface.co/Wan-AI/Wan2.2-I2V-A14B)
- **Diffusers Version**: [Wan-AI/Wan2.2-I2V-A14B-Diffusers](https://huggingface.co/Wan-AI/Wan2.2-I2V-A14B-Diffusers)
- **GGUF Collection**: [QuantStack/Wan2.2-I2V-A14B-GGUF](https://huggingface.co/QuantStack/Wan2.2-I2V-A14B-GGUF)
- **GitHub Repository**: [Wan-Video/Wan2.2](https://github.com/Wan-Video/Wan2.2)
- **Research Paper**: [arXiv:2503.20314](https://arxiv.org/abs/2503.20314)

### Community Resources

- **ComfyUI Integration**: [ComfyUI-GGUF](https://github.com/city96/ComfyUI-GGUF)
- **Tutorial**: [Wan 2.2 VideoGen in ComfyUI](https://www.stablediffusiontutorials.com/2025/08/wan-2.2-video-generation.html)
- **Low VRAM Guide**: [Running Wan 2.2 GGUF with Low VRAM](https://www.nextdiffusion.ai/tutorials/how-to-run-wan22-image-to-video-gguf-models-in-comfyui-low-vram)

### Other Wan 2.2 Variants

- **Text-to-Video**: [Wan2.2-T2V-A14B](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B)
- **Text+Image-to-Video**: [Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B)
- **Speech-to-Video**: [Wan2.2-S2V-14B](https://huggingface.co/Wan-AI/Wan2.2-S2V-14B)

## Troubleshooting

### Common Issues

**Issue**: Out-of-memory errors
**Solution**: Use the Q4_K_S quantized models, enable VAE tiling, reduce resolution to 480P

**Issue**: Slow generation speed
**Solution**: Use quantized models, enable xFormers, reduce inference steps to 50-75

**Issue**: Poor video quality
**Solution**: Increase inference steps to 75-100, use a higher guidance scale (8-9), improve input image quality

**Issue**: Model fails to load
**Solution**: Verify GGUF loader compatibility, check file integrity, ensure sufficient disk space

**Issue**: Inconsistent motion
**Solution**: Use clearer motion prompts, adjust the guidance scale, increase inference steps

## Support and Contact

For issues, questions, or contributions:

- **Model Issues**: [Wan-AI on Hugging Face](https://huggingface.co/Wan-AI)
- **GGUF Issues**: [ComfyUI-GGUF GitHub](https://github.com/city96/ComfyUI-GGUF/issues)
- **General Discussion**: [Hugging Face Forums](https://discuss.huggingface.co/)

---

**Model Version**: v2.2
**README Version**: v1.3
**Last Updated**: 2025-10-14
**Format**: GGUF (FP16 + Q4_K_S)
**Base Model**: Wan-AI/Wan2.2-I2V-A14B