---
datasets:
- quanhaol/MagicData
base_model:
- Wan-AI/Wan2.2-TI2V-5B
- quanhaol/Wan2.2-TI2V-5B-Turbo
---

# Wan2.2-TI2V-5B-Turbo-Diffusers

This repo is the Diffusers version of [quanhaol/Wan2.2-TI2V-5B-Turbo](https://huggingface.co/quanhaol/Wan2.2-TI2V-5B-Turbo) ([GitHub](https://github.com/quanhaol/Wan2.2-TI2V-5B-Turbo)).

Wan2.2-TI2V-5B-Turbo is a step-distilled and CFG-distilled variant of Wan2.2-TI2V-5B. Leveraging the Self-Forcing framework, it enables 4-step training of the TI2V-5B model. **Our model can generate 121-frame videos at 24 FPS with a resolution of 1280×704 in just 4 steps, eliminating the need for the CFG trick.** To the best of our knowledge, Wan2.2-TI2V-5B-Turbo is the **first** open-source distilled I2V version of Wan2.2-TI2V-5B.
## 🐍 Installation

```bash
pip install -U diffusers
```

## 🚀 Quick Start

### Text-to-Video

```python
import torch
from diffusers import WanPipeline, UniPCMultistepScheduler
from diffusers.utils import export_to_video

device = "cuda"
pipe = WanPipeline.from_pretrained(
    "yetter-ai/Wan2.2-TI2V-5B-Turbo-Diffusers", torch_dtype=torch.bfloat16
).to(device)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=5.0)

width = 1280
height = 704
num_frames = 121
prompt = "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."

with torch.inference_mode():
    video = pipe(
        prompt=prompt,
        guidance_scale=1.0,  # CFG is distilled away, so no classifier-free guidance is needed
        num_inference_steps=4,
        generator=torch.Generator(device=device).manual_seed(43),
        width=width,
        height=height,
        num_frames=num_frames,
    ).frames[0]

export_to_video(video, "video.mp4", fps=24)
```

### Image-to-Video

```python
import torch
import numpy as np
from diffusers import UniPCMultistepScheduler, WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

device = "cuda"
pipe = WanImageToVideoPipeline.from_pretrained(
    "yetter-ai/Wan2.2-TI2V-5B-Turbo-Diffusers", torch_dtype=torch.bfloat16
).to(device)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=5.0)

# Resize the input image so width and height are multiples of the VAE's
# spatial stride times the transformer patch size, while preserving the
# aspect ratio and staying close to the target pixel budget.
max_area = 1280 * 704
mod_value = pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]

image = load_image(
    "https://github.com/quanhaol/Wan2.2-TI2V-5B-Turbo/blob/main/examples/images/cat.JPG?raw=true"
).convert("RGB")
aspect_ratio = image.width / image.height
width = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
height = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
image = image.resize((width, height))

prompt = (
    "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. "
    "The fluffy-furred feline gazes directly at the camera with a relaxed expression. "
    "Blurred beach scenery forms the background featuring crystal-clear waters, distant "
    "green hills, and a blue sky dotted with white clouds. The cat assumes a naturally "
    "relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot "
    "highlights the feline's intricate details and the refreshing atmosphere of the seaside."
)
num_frames = 121

with torch.inference_mode():
    video = pipe(
        prompt=prompt,
        image=image,
        guidance_scale=1.0,  # CFG distillation removes the need for guidance
        num_inference_steps=4,
        generator=torch.Generator(device=device).manual_seed(43),
        width=width,
        height=height,
        num_frames=num_frames,
    ).frames[0]

export_to_video(video, "video.mp4", fps=24)
```
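
## 💡 Memory Optimization

The snippets above keep the entire pipeline resident on the GPU. If VRAM is tight, the standard Diffusers offloading hook should also work with this checkpoint; the sketch below is a minimal, untested adaptation of the text-to-video example that applies `enable_model_cpu_offload()`, so treat the exact speed/memory trade-off as an assumption rather than a measured result.

```python
import torch
from diffusers import WanPipeline, UniPCMultistepScheduler
from diffusers.utils import export_to_video

pipe = WanPipeline.from_pretrained(
    "yetter-ai/Wan2.2-TI2V-5B-Turbo-Diffusers", torch_dtype=torch.bfloat16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=5.0)

# Move each submodule to the GPU only for its forward pass and back to
# the CPU afterwards. Do NOT also call .to("cuda"); the offload hooks
# manage device placement themselves.
pipe.enable_model_cpu_offload()

with torch.inference_mode():
    video = pipe(
        prompt="Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage.",
        guidance_scale=1.0,
        num_inference_steps=4,
        width=1280,
        height=704,
        num_frames=121,
    ).frames[0]

export_to_video(video, "video.mp4", fps=24)
```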