REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding (Paper • 2511.13026 • Published 20 days ago)
HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration (Paper • 2510.27266 • Published Oct 31)
Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle (Paper • 2508.05612 • Published Aug 7)