PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling Paper • 2512.04784 • Published 8 days ago • 23
ARISE: An Adaptive Resolution-Aware Metric for Test-Time Scaling Evaluation in Large Reasoning Models Paper • 2510.06014 • Published Oct 7 • 10
InteractScience: Programmatic and Visually-Grounded Evaluation of Interactive Scientific Demonstration Code Generation Paper • 2510.09724 • Published Oct 10 • 10
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows Paper • 2510.24411 • Published Oct 28 • 71
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence Paper • 2510.23538 • Published Oct 27 • 96
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27 • 174
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data Paper • 2509.15221 • Published Sep 18 • 111
Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents Paper • 2509.09265 • Published Sep 11 • 46
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning Paper • 2508.20096 • Published Aug 27 • 36
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published Aug 19 • 118
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization Paper • 2508.14460 • Published Aug 20 • 84
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction Paper • 2508.11987 • Published Aug 16 • 71
CodeEvo: Interaction-Driven Synthesis of Code-centric Data through Hybrid and Iterative Feedback Paper • 2507.22080 • Published Jul 25 • 9
Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters Paper • 2507.13618 • Published Jul 18 • 16
MUR: Momentum Uncertainty guided Reasoning for Large Language Models Paper • 2507.14958 • Published Jul 20 • 46
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving Paper • 2507.06229 • Published Jul 8 • 75