Probing-RM Collection Probing Preference Representations: A Multi-Dimensional Evaluation and Analysis Method for Reward Models • 2 items • Updated 18 days ago
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning Paper • 2509.07980 • Published Sep 9 • 101
GRAM-R^2: Self-Training Generative Foundation Reward Models for Reward Reasoning Paper • 2509.02492 • Published Sep 2 • 1
GRAM: A Generative Foundation Reward Model for Reward Generalization Paper • 2506.14175 • Published Jun 17 • 1
GRAM-RR Collection Self-Training Generative Foundation Reward Models for Reward Reasoning • 4 items • Updated 30 days ago
GRAM Collection Generative Foundation Reward Models for Reward Generalization • 8 items • Updated Jun 19 • 1