---
base_model: Qwen/Qwen3-0.6B-Base
tags:
  - knowledge-distillation
  - full-fine-tuning
  - mmlu
  - qwen
library_name: transformers
license: apache-2.0
datasets:
  - cais/mmlu
---

# Distilled Qwen Model - Full Fine-tuning

This model was created through knowledge distillation from Qwen/Qwen3-8B-Base (teacher) into Qwen/Qwen3-0.6B-Base (student) using full-parameter fine-tuning.

## Model Details

- Base Model: Qwen/Qwen3-0.6B-Base
- Teacher Model: Qwen/Qwen3-8B-Base
- Method: Knowledge distillation with full fine-tuning
- Dataset: MMLU (Massive Multitask Language Understanding)
- Distillation Alpha: 0.7 (see the loss sketch below)
- Temperature: 4.0
- Total Parameters: ~600M (all parameters updated)
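
Under the usual distillation formulation (an assumption here; the exact training code is not included in this card), α = 0.7 weights a KL term between temperature-softened teacher and student logits, with the remaining 0.3 on the standard cross-entropy against the ground-truth tokens. A minimal sketch of such a loss:

```python
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, alpha=0.7, temperature=4.0):
    """Hypothetical combined loss: alpha * soft KL term + (1 - alpha) * hard CE term."""
    # Soft targets: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard targets: ordinary cross-entropy against the ground-truth token ids.
    hard_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )

    return alpha * soft_loss + (1 - alpha) * hard_loss
```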

## Training Details

- Training Samples: 100
- Epochs: 1
- Batch Size: 2
- Learning Rate: 5e-05 (see the training-loop sketch below)
- Final Eval Loss: N/A
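
For reference, a minimal training-loop sketch matching the hyperparameters above (teacher frozen, AdamW at 5e-5, batch size 2, one epoch over the 100 MMLU samples). The `train_dataloader` and the `distillation_loss` helper from the sketch above are illustrative assumptions, not the actual training script:

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical setup; the real training script is not part of this repository.
teacher = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B-Base").eval()
student = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B-Base")
optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)

for epoch in range(1):                     # Epochs: 1
    for batch in train_dataloader:         # assumed DataLoader over the 100 MMLU samples, batch size 2
        with torch.no_grad():              # the teacher only provides soft targets
            teacher_logits = teacher(input_ids=batch["input_ids"],
                                     attention_mask=batch["attention_mask"]).logits
        student_logits = student(input_ids=batch["input_ids"],
                                 attention_mask=batch["attention_mask"]).logits
        loss = distillation_loss(student_logits, teacher_logits,
                                 batch["labels"], alpha=0.7, temperature=4.0)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```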

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the distilled model directly
tokenizer = AutoTokenizer.from_pretrained("CarlOwOs/distilled-qwen3-0.6b-full-mmlu")
model = AutoModelForCausalLM.from_pretrained("CarlOwOs/distilled-qwen3-0.6b-full-mmlu")

# Generate text
inputs = tokenizer("Question: What is the capital of France?\nA. London\nB. Berlin\nC. Paris\nD. Madrid\nAnswer:", return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Evaluation

This model should be evaluated on multiple-choice QA (MCQA) tasks by comparing the log-likelihood the model assigns to each answer choice, as implemented in the accompanying evaluation framework.
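
A minimal sketch of that log-likelihood comparison (the model IDs are real; the helper below is an illustrative assumption, not the actual evaluation framework): each answer choice is scored by the total log-probability its tokens receive when conditioned on the question, and the highest-scoring choice is taken as the prediction.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("CarlOwOs/distilled-qwen3-0.6b-full-mmlu")
model = AutoModelForCausalLM.from_pretrained("CarlOwOs/distilled-qwen3-0.6b-full-mmlu").eval()


def choice_loglikelihood(question: str, choice: str) -> float:
    """Sum of log-probabilities of the choice tokens, conditioned on the question."""
    prompt_ids = tokenizer(question, return_tensors="pt").input_ids
    full_ids = tokenizer(question + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids=full_ids).logits
    # Row t of log_probs is the distribution over the token at position t + 1.
    log_probs = F.log_softmax(logits[0, :-1], dim=-1)
    choice_positions = range(prompt_ids.shape[1] - 1, full_ids.shape[1] - 1)
    return sum(log_probs[pos, full_ids[0, pos + 1]].item() for pos in choice_positions)


question = "Question: What is the capital of France?\nAnswer:"
choices = [" London", " Berlin", " Paris", " Madrid"]
prediction = max(choices, key=lambda c: choice_loglikelihood(question, c))
print(prediction)
```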