Enterprise Deep Research: Steerable Multi-Agent Deep Research for Enterprise Analytics Paper • 2510.17797 • Published Oct 20 • 10
LoongRL:Reinforcement Learning for Advanced Reasoning over Long Contexts Paper • 2510.19363 • Published Oct 22 • 61
Establishing Best Practices for Building Rigorous Agentic Benchmarks Paper • 2507.02825 • Published Jul 3 • 1
Towards General Agentic Intelligence via Environment Scaling Paper • 2509.13311 • Published Sep 16 • 71
view article Article OpenAI just dropped two massive open-weight models — *but how do we separate the reality from the hype?* Aug 9 • 11
The Big Benchmarks Collection Collection Gathering benchmark spaces on the hub (beyond the Open LLM Leaderboard) • 13 items • Updated Nov 18, 2024 • 253
view article Article ColPali: Efficient Document Retrieval with Vision Language Models 👀 Jul 5, 2024 • 303
view article Article Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models +1 Mar 20, 2024 • 105
view article Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge Feb 7 • 255
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models Paper • 2502.09604 • Published Feb 13 • 37
SYNTHETIC-1 Collection A collection of tasks & verifiers for reasoning datasets • 9 items • Updated Oct 7 • 66
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19 • 175
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback Paper • 2501.10799 • Published Jan 18 • 15
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework Paper • 2411.06176 • Published Nov 9, 2024 • 45