Article: DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge • Feb 7, 2025 • 275
mistralai/Mistral-7B-Instruct-v0.2 • Text Generation • 7B • Updated Jul 24, 2025 • 2.33M • 3.07k