PromptRL: Language Models as Co-Learners in Flow-Based Image Generation RL 🚀
We found two critical failure modes in flow-based RL:
1️⃣ Quality-Diversity Dilemma: High-quality models produce similar outputs, bottlenecking RL exploration
2️⃣ Prompt Linguistic Hacking: Models overfit to surface patterns—paraphrase the prompt and performance tanks
Solution: **Jointly train the LM and FM** — the LM dynamically generates semantically consistent yet diverse prompt variants
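The joint loop can be sketched roughly like this — a toy Python sketch, where `paraphrase`, `rollout`, and the advantage computation are hypothetical stand-ins for the LM prompt policy, the flow-model sampler, and the reward model, not the paper's actual API:

```python
import random

def paraphrase(prompt, n):
    # Stand-in for the LM: emit n semantically-consistent prompt variants.
    templates = ["{}", "a photo of {}", "an image showing {}"]
    return [random.choice(templates).format(prompt) for _ in range(n)]

def rollout(variant):
    # Stand-in for the flow model + reward model (e.g. a preference score).
    return random.random()

def joint_rl_step(prompt, n_variants=4):
    # One step: the LM diversifies the prompt, the FM rolls out each variant,
    # and mean-baselined advantages update BOTH policies.
    variants = paraphrase(prompt, n_variants)
    rewards = [rollout(v) for v in variants]
    baseline = sum(rewards) / len(rewards)
    advantages = [r - baseline for r in rewards]
    return variants, advantages
```

The key idea is that prompt diversity comes from the LM rather than sampling noise, so exploration survives even as the FM's outputs sharpen.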
📊 Results:
• GenEval: 0.97
• OCR accuracy: 0.98
• PickScore: 24.05
• Over 2× fewer rollouts than flow-only RL
Paper: arxiv.org/abs/2602.01382
Code: github.com/G-U-N/UniRL
#AI #TextToImage #ReinforcementLearning #Diffusion