Model weights for "Distilling to Hybrid Attention Models via KL-Guided Layer Selection" (https://arxiv.org/abs/2512.20569).
Yanhong Li
yanhong-li
AI & ML interests
None yet
Recent Activity
Updated a collection about 1 month ago: Hybrid-Distillation
Organizations
None yet