OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions Paper • 2602.05843 • Published 6 days ago • 55
Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training Paper • 2602.05940 • Published 6 days ago • 17
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows Paper • 2510.24411 • Published Oct 28, 2025 • 72
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence Paper • 2510.23538 • Published Oct 27, 2025 • 97