EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
Paper • 2509.22576
AgentBench: Evaluating LLMs as Agents
Paper • 2308.03688
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290
JeonJinhyeok
jinn33
AI & ML interests
None yet
Recent Activity
Updated a model about 19 hours ago: jinn33/crm-sft-adapter-v2
Published a model about 19 hours ago: jinn33/crm-sft-adapter-v2
Updated a model 11 days ago: jinn33/crm-dpo-adapter
Organizations
None yet