Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training Paper • 2509.26625 • Published Sep 30 • 43
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations Paper • 2506.18898 • Published Jun 23 • 33
Seedance 1.0: Exploring the Boundaries of Video Generation Models Paper • 2506.09113 • Published Jun 10 • 104
Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation Paper • 2506.09350 • Published Jun 11 • 48
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published May 14 • 97
DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion Paper • 2504.04010 • Published Apr 5 • 9
X-Dancer: Expressive Music to Human Dance Video Generation Paper • 2502.17414 • Published Feb 24 • 14