RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging Paper • 2510.20479 • Published Oct 23 • 10
Video-As-Prompt: Unified Semantic Control for Video Generation Paper • 2510.20888 • Published Oct 23 • 45
Reasoning with Sampling: Your Base Model is Smarter Than You Think Paper • 2510.14901 • Published Oct 16 • 47
DeepAgent: A General Reasoning Agent with Scalable Toolsets Paper • 2510.21618 • Published Oct 24 • 99
PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity Paper • 2510.23603 • Published Oct 27 • 22
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences Paper • 2510.23451 • Published Oct 27 • 26
Rethinking Visual Intelligence: Insights from Video Pretraining Paper • 2510.24448 • Published Oct 28 • 5
From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors Paper • 2510.17439 • Published Oct 20 • 26
RoboOmni: Proactive Robot Manipulation in Omni-modal Context Paper • 2510.23763 • Published Oct 27 • 53
Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning Paper • 2510.23473 • Published Oct 27 • 84
Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark Paper • 2510.26802 • Published Oct 30 • 33
Exploring Conditions for Diffusion models in Robotic Control Paper • 2510.15510 • Published Oct 17 • 39