Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published 11 days ago • 165
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds Paper • 2511.08892 • Published 27 days ago • 194
Visual Programmability: A Guide for Code-as-Thought in Chart Understanding Paper • 2509.09286 • Published Sep 11 • 11
Visual Representation Alignment for Multimodal Large Language Models Paper • 2509.07979 • Published Sep 9 • 83
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic Paper • 2509.01363 • Published Sep 1 • 58
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25 • 208
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers Paper • 2508.21148 • Published Aug 28 • 140
👁️ LFM2-VL Collection LFM2-VL is our first series of vision-language models, designed for on-device deployment. • 10 items • Updated 7 days ago • 58
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning Paper • 2508.08221 • Published Aug 11 • 49
ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability Paper • 2508.07050 • Published Aug 9 • 117
Adapting Vision-Language Models Without Labels: A Comprehensive Survey Paper • 2508.05547 • Published Aug 7 • 11
Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal Paper • 2508.05988 • Published Aug 8 • 19
RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents Paper • 2507.03112 • Published Jul 3 • 31
VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents Paper • 2507.04590 • Published Jul 7 • 16