Probability-Entropy Calibration: An Elastic Indicator for Adaptive Fine-tuning • Paper • 2602.01745 • Published 12 days ago • 7
Improving Data and Reward Design for Scientific Reasoning in Large Language Models • Paper • 2602.08321 • Published 5 days ago • 39
MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration • Paper • 2602.01734 • Published 12 days ago • 32
SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving • Paper • 2601.01426 • Published Jan 4 • 24
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR • Paper • 2508.14029 • Published Aug 19, 2025 • 118
MHPP: Exploring the Capabilities and Limitations of Language Models Beyond Basic Code Generation • Paper • 2405.11430 • Published May 19, 2024 • 2
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving • Paper • 2502.20238 • Published Feb 27, 2025 • 23