VLM with textual-driven GRPO training for vision-grounded decision making (https://arxiv.org/pdf/2503.16965, NeurIPS 2025)
Derek Zhe Hu
zhehuderek
AI & ML interests
NLP, Multimodality
Recent Activity
updated a model 21 days ago
zhehuderek/qwen2.5-3b-diverse-arggen-1sample-sft-100k
published a model 21 days ago
zhehuderek/qwen2.5-3b-diverse-arggen-1sample-sft-100k
updated a model 24 days ago
zhehuderek/qwen25_vl_7b_stage2_virl39k_step80
Organizations
None yet
YesBut
A collection on visual humor understanding and comparative reasoning.
- zhehuderek/YESBUT_Benchmark
  Viewer • Updated • 348 • 19 • 1
- zhehuderek/YESBUT_Benchmark_V2
  Viewer • Updated • 1.26k • 18 • 1
- Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions
  Paper • 2405.19088 • Published
- When 'YES' Meets 'BUT': Can Large Models Comprehend Contradictory Humor Through Comparative Reasoning?
  Paper • 2503.23137 • Published • 2
Praxis-VLM
VLM with textual-driven GRPO training for vision-grounded decision making (https://arxiv.org/pdf/2503.16965, NeurIPS 2025)
models 11
zhehuderek/qwen2.5-3b-diverse-arggen-1sample-sft-100k
Text Generation • 3B • Updated • 16
zhehuderek/qwen25_vl_7b_stage2_virl39k_step80
Image-Text-to-Text • 8B • Updated • 15
zhehuderek/qwen25_vl_7b_guru_mixed_dapo_run2_step170
Image-Text-to-Text • 8B • Updated
zhehuderek/qwen25_vl_7b_guru_mixed_dapo_run2_step110
Image-Text-to-Text • 8B • Updated
zhehuderek/qwen25_vl_7b_guru_mixed_grpo_run4_step150
Image-Text-to-Text • 8B • Updated
zhehuderek/qwen2_5_vl_7b_GEOQA_8K_step90_hf
Image-Text-to-Text • 8B • Updated
zhehuderek/praxis_vlm_7b_decisionmaking
Image-Text-to-Text • 8B • Updated
zhehuderek/praxis_vlm_3b_decisionmaking
Image-Text-to-Text • 4B • Updated
zhehuderek/qwen2_5_vl_3b_GEOQA_8K_hf
Image-Text-to-Text • 4B • Updated • 1
zhehuderek/llama-2-7b-chinese
Text Generation • 7B • Updated
datasets 12
zhehuderek/ViRL39K_proc
Viewer • Updated • 38.9k • 31
zhehuderek/processed_guru-RL-92k
Viewer • Updated • 72.3k • 7
zhehuderek/VIVA_Plus_Benchmark
Viewer • Updated • 6.37k • 19
zhehuderek/OpenThoughts3-1.2M-processed
Viewer • Updated • 39.6k • 6
zhehuderek/humor_understanding_combined
Viewer • Updated • 4.89k • 12 • 1
zhehuderek/humor_understanding_nyt
Viewer • Updated • 2.69k • 9
zhehuderek/comparative_reasoning_mllm_compbench
Viewer • Updated • 21.8k • 504
zhehuderek/humor_understanding_deepeval
Viewer • Updated • 2.96k • 10
zhehuderek/textual_decisionmaking_data
Viewer • Updated • 11k • 12 • 1
zhehuderek/YESBUT_Benchmark_V2
Viewer • Updated • 1.26k • 18 • 1