FaithEval Benchmark Collection A collection of contextual QA datasets dedicated to evaluating the contextual faithfulness of LLMs • 3 items • Updated Oct 31 • 3
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26 • 70
Pre-Trained Policy Discriminators are General Reward Models Paper • 2507.05197 • Published Jul 7 • 39
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published Mar 18 • 142
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity Paper • 2502.13063 • Published Feb 18 • 72
You Do Not Fully Utilize Transformer's Representation Capacity Paper • 2502.09245 • Published Feb 13 • 37
FoNE: Precise Single-Token Number Embeddings via Fourier Features Paper • 2502.09741 • Published Feb 13 • 15
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published Feb 13 • 191