DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation Paper • 2601.22153 • Published 7 days ago • 68
SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer Paper • 2601.16515 • Published 14 days ago • 15
ActionMesh: Animated 3D Mesh Generation with Temporal 3D Diffusion Paper • 2601.16148 • Published 14 days ago • 12
Rethinking Video Generation Model for the Embodied World Paper • 2601.15282 • Published 15 days ago • 42
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning Paper • 2601.16163 • Published 14 days ago • 13
PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models Paper • 2601.11087 • Published 21 days ago • 11
Transition Matching Distillation for Fast Video Generation Paper • 2601.09881 • Published 22 days ago • 32
Efficient Camera-Controlled Video Generation of Static Scenes via Sparse Diffusion and 3D Rendering Paper • 2601.09697 • Published 22 days ago • 8
SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices Paper • 2601.08303 • Published 24 days ago • 16
VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction Paper • 2601.05966 • Published 27 days ago • 23
VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control Paper • 2601.05138 • Published 28 days ago • 18
InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams Paper • 2601.02281 • Published Jan 5 • 33
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos Paper • 2601.00393 • Published Jan 1 • 130
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation Paper • 2601.02204 • Published Jan 5 • 61
Self-Evaluation Unlocks Any-Step Text-to-Image Generation Paper • 2512.22374 • Published Dec 26, 2025 • 17