PedagogyRL-Experiments

OpenLearnLM/deepseek_qwen3_8b_pedagogical_think_reward_grpo_step_300
8B • Updated Jul 9, 2025 • 14

OpenLearnLM/deepseek_qwen3_8b_pedagogical_think_noreward_grpo_step_300
8B • Updated Jul 9, 2025 • 19

OpenLearnLM/deepseek_qwen3_8b_think_noreward_grpo_step_300
8B • Updated Jul 9, 2025 • 21