Xu Zheng

xz287

https://zhengxujosh.github.io/

AI & ML interests

Computer Vision & MLLM

Recent Activity

upvoted a paper about 1 month ago

Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks

upvoted a paper about 2 months ago

AI for Service: Proactive Assistance with AI Glasses

authored a paper about 2 months ago

Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods

Organizations

None yet

upvoted a paper about 1 month ago

Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks

Paper • 2510.25760 • Published Oct 29 • 16

upvoted a paper about 2 months ago

AI for Service: Proactive Assistance with AI Glasses

Paper • 2510.14359 • Published Oct 16 • 73

authored a paper about 2 months ago

Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods

Paper • 2510.07143 • Published Oct 8 • 12

upvoted a paper about 2 months ago

Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods

Paper • 2510.07143 • Published Oct 8 • 12

upvoted a paper 3 months ago

PANORAMA: The Rise of Omnidirectional Vision in the Embodied AI Era

Paper • 2509.12989 • Published Sep 16 • 28

authored 15 papers 6 months ago

A Good Student is Cooperative and Reliable: CNN-Transformer Collaborative Learning for Semantic Segmentation

Paper • 2307.12574 • Published Jul 24, 2023

Look at the Neighbor: Distortion-aware Unsupervised Domain Adaptation for Panoramic Semantic Segmentation

Paper • 2308.05493 • Published Aug 10, 2023

EventDance: Unsupervised Source-free Cross-modal Adaptation for Event-based Object Recognition

Paper • 2403.14082 • Published Mar 21, 2024

Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities

Paper • 2407.11351 • Published Jul 16, 2024

SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context

Paper • 2411.16213 • Published Nov 25, 2024 • 2

TimeX++: Learning Time-Series Explanations with Information Bottleneck

Paper • 2405.09308 • Published May 15, 2024

Image Anything: Towards Reasoning-coherent and Training-free Multi-modal Image Generation

Paper • 2401.17664 • Published Jan 31, 2024

RealRAG: Retrieval-augmented Realistic Image Generation via Self-reflective Contrastive Learning

Paper • 2502.00848 • Published Feb 2 • 1

Chasing Day and Night: Towards Robust and Efficient All-Day Object Detection Guided by an Event Camera

Paper • 2309.09297 • Published Sep 17, 2023

A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges

Paper • 2412.11936 • Published Dec 16, 2024

Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance

Paper • 2503.02581 • Published Mar 4

DiMeR: Disentangled Mesh Reconstruction Model

Paper • 2504.17670 • Published Apr 24 • 24

Shifting AI Efficiency From Model-Centric to Data-Centric Compression

Paper • 2505.19147 • Published May 25 • 144

Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness

Paper • 2503.18445 • Published Mar 24 • 1

MLLMs are Deeply Affected by Modality Bias

Paper • 2505.18657 • Published May 24 • 5
