# LLaMA-3.1-8B SFT (No Prompt Masking)

LLaMA-3.1-8B fine-tuned with supervised instruction tuning (SFT) **without prompt masking**: the loss is computed on all tokens, prompt and response alike.

## Training Details

- **Base Model**: meta-llama/Llama-3.1-8B
- **Dataset**: UltraChat-200K + SafetyLlama (~200K examples)
- **Training**: 1 epoch (6726 steps)
- **Prompt Masking**: Disabled (loss on all tokens)
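
To make the prompt-masking setting concrete, here is a minimal sketch of how training labels could be built with and without masking. It is an illustration, not the training code used for this model: `build_labels` and the token IDs are hypothetical, and the sketch assumes the common convention of marking ignored positions with `-100`, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The one-position shift for next-token prediction is left to the training loop.

```python
IGNORE_INDEX = -100  # assumption: default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(prompt_ids, response_ids, mask_prompt):
    """Return (input_ids, labels) for one prompt/response pair.

    Hypothetical helper for illustration only.
    """
    input_ids = list(prompt_ids) + list(response_ids)
    if mask_prompt:
        # Prompt masking: prompt positions are excluded from the loss.
        labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    else:
        # No prompt masking (this model): every token contributes to the loss.
        labels = list(input_ids)
    return input_ids, labels


prompt = [101, 102, 103]  # hypothetical token IDs
response = [201, 202]

_, nomask_labels = build_labels(prompt, response, mask_prompt=False)
_, masked_labels = build_labels(prompt, response, mask_prompt=True)
print(nomask_labels)
print(masked_labels)
```

With masking disabled, the labels are simply a copy of the input IDs, which is the configuration listed above.
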
## Evaluation Results

Baseline is the pre-finetuning Llama-3.1-8B base model.

| Benchmark | Baseline | This Model |
|-----------|----------|------------|
| GSM8K | 16.4% | 29.0% |
| MMLU | 58.1% | 58.4% |
| SST Safety | 62.0% | 78.0% |
| AlpacaEval | 1.57% | **5.3%** |
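
For a quick read of the table, the absolute change per benchmark can be computed directly from the reported numbers. This is only a convenience sketch over the values above; the `results` dictionary is a hypothetical container, not a file shipped with this repository.

```python
# (baseline %, this model %) copied from the evaluation table
results = {
    "GSM8K": (16.4, 29.0),
    "MMLU": (58.1, 58.4),
    "SST Safety": (62.0, 78.0),
    "AlpacaEval": (1.57, 5.3),
}

# Absolute gain in percentage points, rounded to avoid float noise
gains = {name: round(tuned - base, 2) for name, (base, tuned) in results.items()}

for name, gain in gains.items():
    print(f"{name}: {gain:+.2f} points")
```
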
## Files

- `eval_baseline/`: Baseline evaluation results (pre-finetuning Llama-3.1-8B)

## Reference

Part of CS336 Assignment 5 (SFT Instruction Tuning). See [building-from-scratch/sft](https://github.com/garg-aayush/building-from-scratch/tree/main/sft) for details.