PIPer: On-Device Environment Setup via Online Reinforcement Learning Paper • 2509.25455 • Published Sep 29 • 37
🦫 PIPer Collection All the resources for our paper "PIPer: On-Device Environment Setup via Online Reinforcement Learning"! • 9 items • Updated Oct 1 • 3
FunReason-MT Technical Report: Overcoming the Complexity Barrier in Multi-Turn Function Calling Paper • 2510.24645 • Published Oct 28 • 7
Spurious Rewards: Rethinking Training Signals in RLVR Paper • 2506.10947 • Published Jun 12 • 2
Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning Paper • 2510.27044 • Published Oct 30 • 5
Data-Efficient RLVR via Off-Policy Influence Guidance Paper • 2510.26491 • Published Oct 30 • 9
The Path Not Taken: RLVR Provably Learns Off the Principals Paper • 2511.08567 • Published 25 days ago • 31
DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation Paper • 2511.06307 • Published 27 days ago • 50
Demystifying Reinforcement Learning in Agentic Reasoning Paper • 2510.11701 • Published Oct 13 • 31
— UI is a good thing 💅 — Collection cool spaces with a cool UI, what could be better? • 5 items • Updated May 5 • 28
[NeurIPS 2025] RPC Resources Collection Sampled Reasoning Paths for NeurIPS 2025 Paper: A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning • 6 items • Updated Oct 23 • 8
Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought Paper • 2510.04230 • Published Oct 5 • 26
MobileLLM Collection Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 49 items • Updated 12 days ago • 133
New Trends for Modern Machine Translation with Large Reasoning Models Paper • 2503.10351 • Published Mar 13 • 25
AceReason Collection Math and Code reasoning model trained through reinforcement learning (RL) • 7 items • Updated 3 days ago • 19
Tool Use Reasoning Collection A collection of tool-use reasoning datasets in Hermes format • 5 items • Updated Jul 23 • 9