---
base_model: Qwen/Qwen3-0.6B-Base
tags:
  - knowledge-distillation
  - full-fine-tuning
  - mmlu
  - qwen
library_name: transformers
license: apache-2.0
datasets:
  - cais/mmlu
---

# Distilled Qwen Model - Full Fine-tuning

This model was created through knowledge distillation from Qwen/Qwen3-8B-Base (teacher) into Qwen/Qwen3-0.6B-Base (student) using full-parameter fine-tuning.

## Model Details

- Base Model: Qwen/Qwen3-0.6B-Base
- Teacher Model: Qwen/Qwen3-8B-Base
- Method: Knowledge distillation with full fine-tuning
- Dataset: MMLU (Massive Multitask Language Understanding)
- Distillation Alpha: 0.7 (see the loss sketch below)
- Temperature: 4.0
- Total Parameters: ~600M (all parameters updated)
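
Under the usual distillation formulation (an assumption here; the exact training code is not included in this card), α = 0.7 weights a KL term between temperature-softened teacher and student logits, with the remaining 0.3 on the standard cross-entropy against the ground-truth tokens. A minimal sketch of such a loss:

```python
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, alpha=0.7, temperature=4.0):
    """Hypothetical combined loss: alpha * soft KL term + (1 - alpha) * hard CE term."""
    # Soft targets: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard targets: ordinary cross-entropy against the ground-truth token ids.
    hard_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )

    return alpha * soft_loss + (1 - alpha) * hard_loss
```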

## Training Details

- Training Samples: 100
- Epochs: 1
- Batch Size: 2
- Learning Rate: 5e-05 (see the training-loop sketch below)
- Final Eval Loss: N/A
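
For reference, a minimal training-loop sketch matching the hyperparameters above (teacher frozen, AdamW at 5e-5, batch size 2, one epoch over the 100 MMLU samples). The `train_dataloader` and the `distillation_loss` helper from the sketch above are illustrative assumptions, not the actual training script:

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical setup; the real training script is not part of this repository.
teacher = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B-Base").eval()
student = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B-Base")
optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)

for epoch in range(1):                     # Epochs: 1
    for batch in train_dataloader:         # assumed DataLoader over the 100 MMLU samples, batch size 2
        with torch.no_grad():              # the teacher only provides soft targets
            teacher_logits = teacher(input_ids=batch["input_ids"],
                                     attention_mask=batch["attention_mask"]).logits
        student_logits = student(input_ids=batch["input_ids"],
                                 attention_mask=batch["attention_mask"]).logits
        loss = distillation_loss(student_logits, teacher_logits,
                                 batch["labels"], alpha=0.7, temperature=4.0)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```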

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the distilled model directly
tokenizer = AutoTokenizer.from_pretrained("CarlOwOs/distilled-qwen3-0.6b-full-mmlu")
model = AutoModelForCausalLM.from_pretrained("CarlOwOs/distilled-qwen3-0.6b-full-mmlu")

# Generate text
inputs = tokenizer("Question: What is the capital of France?\nA. London\nB. Berlin\nC. Paris\nD. Madrid\nAnswer:", return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Evaluation

This model should be evaluated on multiple-choice QA (MCQA) tasks by comparing the log-likelihood the model assigns to each answer choice, as implemented in the accompanying evaluation framework.
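
A minimal sketch of that log-likelihood comparison (the model IDs are real; the helper below is an illustrative assumption, not the actual evaluation framework): each answer choice is scored by the total log-probability its tokens receive when conditioned on the question, and the highest-scoring choice is taken as the prediction.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("CarlOwOs/distilled-qwen3-0.6b-full-mmlu")
model = AutoModelForCausalLM.from_pretrained("CarlOwOs/distilled-qwen3-0.6b-full-mmlu").eval()


def choice_loglikelihood(question: str, choice: str) -> float:
    """Sum of log-probabilities of the choice tokens, conditioned on the question."""
    prompt_ids = tokenizer(question, return_tensors="pt").input_ids
    full_ids = tokenizer(question + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids=full_ids).logits
    # Row t of log_probs is the distribution over the token at position t + 1.
    log_probs = F.log_softmax(logits[0, :-1], dim=-1)
    choice_positions = range(prompt_ids.shape[1] - 1, full_ids.shape[1] - 1)
    return sum(log_probs[pos, full_ids[0, pos + 1]].item() for pos in choice_positions)


question = "Question: What is the capital of France?\nAnswer:"
choices = [" London", " Berlin", " Paris", " Madrid"]
prediction = max(choices, key=lambda c: choice_loglikelihood(question, c))
print(prediction)
```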