Collections

Collections including paper arxiv:2503.21776

Reasoning, Thinking, RL and Test-Time Scaling

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39
Token-Budget-Aware LLM Reasoning

Paper • 2412.18547 • Published Dec 24, 2024 • 46
Efficiently Serving LLM Reasoning Programs with Certaindex

Paper • 2412.20993 • Published Dec 30, 2024 • 37
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Paper • 2412.17256 • Published Dec 23, 2024 • 47

To Read collection

interesting papers to read

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

Paper • 2503.24290 • Published Mar 31 • 62
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders

Paper • 2503.18878 • Published Mar 24 • 119
START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 113
DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18 • 142

DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis

Paper • 2503.15667 • Published Mar 19 • 8
Video-R1: Reinforcing Video Reasoning in MLLMs

Paper • 2503.21776 • Published Mar 27 • 79

Multimodal Reasoning

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published Feb 17 • 9
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4 • 22
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published Feb 17 • 9
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39

deepseek-ai/DeepSeek-R1

Text Generation • 685B • Updated Mar 27 • 1.21M • 12.9k
deepseek-ai/DeepSeek-V3

Text Generation • 685B • Updated Mar 27 • 706k • 4k
mistralai/Mistral-Small-24B-Instruct-2501

24B • Updated Jul 28 • 381k • 948
deepseek-ai/Janus-Pro-1B

Any-to-Any • Updated Feb 1 • 8.61k • 465

Video Understanding

Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published Mar 6 • 96
Video-R1: Reinforcing Video Reasoning in MLLMs

Paper • 2503.21776 • Published Mar 27 • 79
Seedance 1.0: Exploring the Boundaries of Video Generation Models

Paper • 2506.09113 • Published Jun 10 • 104
Kwai Keye-VL 1.5 Technical Report

Paper • 2509.01563 • Published Sep 1 • 37

Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models

Paper • 2503.21380 • Published Mar 27 • 38
Video-R1: Reinforcing Video Reasoning in MLLMs

Paper • 2503.21776 • Published Mar 27 • 79
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks

Paper • 2503.21696 • Published Mar 27 • 23
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Paper • 2504.10449 • Published Apr 14 • 15

VLM RL Reasoning

OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement

Paper • 2503.17352 • Published Mar 21 • 24
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation

Paper • 2503.16660 • Published Mar 20 • 72
CoMP: Continual Multimodal Pre-training for Vision Foundation Models

Paper • 2503.18931 • Published Mar 24 • 30
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding

Paper • 2503.13964 • Published Mar 18 • 20

paper maybe useful

Light-A-Video: Training-free Video Relighting via Progressive Light Fusion

Paper • 2502.08590 • Published Feb 12 • 42
Distillation Scaling Laws

Paper • 2502.08606 • Published Feb 12 • 47
Soundwave: Less is More for Speech-Text Alignment in LLMs

Paper • 2502.12900 • Published Feb 18 • 86
Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space

Paper • 2503.09419 • Published Mar 12 • 6

RL+reason model

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 28
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 30
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 123
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 4
