InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search Paper • 2512.18745 • Published 19 days ago • 11
See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning Paper • 2512.22120 • Published 13 days ago • 13
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion Paper • 2512.17504 • Published 21 days ago • 95