Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks Paper • 2510.25760 • Published Oct 29 • 16
Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods Paper • 2510.07143 • Published Oct 8 • 12
PANORAMA: The Rise of Omnidirectional Vision in the Embodied AI Era Paper • 2509.12989 • Published Sep 16 • 28
A Good Student is Cooperative and Reliable: CNN-Transformer Collaborative Learning for Semantic Segmentation Paper • 2307.12574 • Published Jul 24, 2023
Look at the Neighbor: Distortion-aware Unsupervised Domain Adaptation for Panoramic Semantic Segmentation Paper • 2308.05493 • Published Aug 10, 2023
EventDance: Unsupervised Source-free Cross-modal Adaptation for Event-based Object Recognition Paper • 2403.14082 • Published Mar 21, 2024
Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities Paper • 2407.11351 • Published Jul 16, 2024
SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context Paper • 2411.16213 • Published Nov 25, 2024 • 2
TimeX++: Learning Time-Series Explanations with Information Bottleneck Paper • 2405.09308 • Published May 15, 2024
Image Anything: Towards Reasoning-coherent and Training-free Multi-modal Image Generation Paper • 2401.17664 • Published Jan 31, 2024
RealRAG: Retrieval-augmented Realistic Image Generation via Self-reflective Contrastive Learning Paper • 2502.00848 • Published Feb 2 • 1
Chasing Day and Night: Towards Robust and Efficient All-Day Object Detection Guided by an Event Camera Paper • 2309.09297 • Published Sep 17, 2023
A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges Paper • 2412.11936 • Published Dec 16, 2024
Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance Paper • 2503.02581 • Published Mar 4
Shifting AI Efficiency From Model-Centric to Data-Centric Compression Paper • 2505.19147 • Published May 25 • 144
Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness Paper • 2503.18445 • Published Mar 24 • 1