LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding Paper • 2512.16229 • Published Dec 18, 2025 • 16
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies Paper • 2512.19673 • Published Dec 22, 2025 • 64
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models Paper • 2510.09541 • Published Oct 10, 2025 • 17
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning Paper • 2510.25992 • Published Oct 29, 2025 • 48
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published Apr 18, 2025 • 139
VisPlay: Self-Evolving Vision-Language Models from Images Paper • 2511.15661 • Published Nov 19, 2025 • 43
The Path Not Taken: RLVR Provably Learns Off the Principals Paper • 2511.08567 • Published Nov 11, 2025 • 34
The Smol Training Playbook: The secrets to building world-class LLMs • Featured • 2.95k
Kimi Linear: An Expressive, Efficient Attention Architecture Paper • 2510.26692 • Published Oct 30, 2025 • 122
Efficient Deep Learning: A Comprehensive Overview of Optimization Techniques Article • Aug 26, 2024 • 84
Demystifying Reinforcement Learning in Agentic Reasoning Paper • 2510.11701 • Published Oct 13, 2025 • 32
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10, 2025 • 190