head_dim * num_attention_heads != hidden_size
#4 opened 4 months ago by zhangchuanhu
【Evaluation】Best practice for evaluating Qwen3 !!
#2 opened 7 months ago by wangxingjun778
Add languages tag
#1 opened 7 months ago by de-francophones