Qwen2.5-Omni Collection: End-to-End Omni (text, audio, image, video, and natural speech interaction) model based on Qwen2.5 • 7 items • Updated 10 days ago • 161
Vision Arena (Testing VLMs side-by-side) • Running • Featured • 558 • Display image analysis results
Open VLM Video Leaderboard • Running • Featured • 129 • VLMEvalKit evaluation results on video understanding benchmarks