DreamOmni2: Multimodal Instruction-based Editing and Generation Paper • 2510.06679 • Published Oct 8 • 73
OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models Paper • 2509.17627 • Published Sep 22 • 66
MGM-Omni Collection MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech • 18 items • Updated Oct 11 • 10
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators Paper • 2402.06894 • Published Feb 10, 2024 • 1
MoCha: Towards Movie-Grade Talking Character Synthesis Paper • 2503.23307 • Published Mar 30 • 138
FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation Paper • 2502.13995 • Published Feb 19 • 9
VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation Paper • 2502.07531 • Published Feb 11 • 13
Stable Flow: Vital Layers for Training-Free Image Editing Paper • 2411.14430 • Published Nov 21, 2024 • 22
Zero-Shot Voice Cloning Collection TTS models that support zero-shot voice cloning • 8 items • Updated 5 days ago • 14
steiner-preview Collection Reasoning models trained on synthetic data using reinforcement learning. • 3 items • Updated Oct 20, 2024 • 33
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 15 items • Updated Apr 18 • 241
Audio Dialogues: Dialogues dataset for audio and music understanding Paper • 2404.07616 • Published Apr 11, 2024 • 16