---
license: llama3.1
base_model: meta-llama/Llama-3.1-8B
tags:
- sft
- instruction-tuning
- llama
datasets:
- HuggingFaceH4/ultrachat_200k
model-index:
- name: llama31-8b-sft-nomask
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8K
      type: gsm8k
    metrics:
    - name: Accuracy
      type: accuracy
      value: 29.0
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU
      type: mmlu
    metrics:
    - name: Accuracy
      type: accuracy
      value: 58.4
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Simple Safety Tests
      type: simple_safety_tests
    metrics:
    - name: Safety Score
      type: accuracy
      value: 78.0
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AlpacaEval
      type: tatsu-lab/alpaca_eval
    metrics:
    - name: LC Win Rate
      type: win_rate
      value: 5.3
      verified: false
---
# LLaMA-3.1-8B SFT (No Prompt Masking)
LLaMA-3.1-8B fine-tuned with supervised fine-tuning (SFT) for instruction following **without prompt masking**: the loss is computed on all tokens, prompt and response alike, rather than on response tokens only.
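For context, the sketch below shows how disabling prompt masking changes the labels fed to the language-modeling loss. This is a minimal illustration, not the training code of this repo: `build_labels` and the plain prompt/response concatenation are simplifications of the actual chat template used during training.

```python
# Illustrative sketch: label construction with and without prompt masking.
# Positions set to -100 are ignored by PyTorch's cross-entropy loss.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")

def build_labels(prompt: str, response: str, mask_prompt: bool) -> dict:
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    response_ids = tokenizer(response, add_special_tokens=False)["input_ids"]
    input_ids = prompt_ids + response_ids + [tokenizer.eos_token_id]

    if mask_prompt:
        # Standard SFT: only response (and EOS) tokens contribute to the loss.
        labels = [-100] * len(prompt_ids) + response_ids + [tokenizer.eos_token_id]
    else:
        # This model: loss on every token, prompt included.
        labels = list(input_ids)
    return {"input_ids": input_ids, "labels": labels}
```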
## Training Details
- **Base Model**: meta-llama/Llama-3.1-8B
- **Dataset**: UltraChat-200K + SafetyLlama (~200K examples)
- **Training**: 1 epoch (6726 steps)
- **Prompt Masking**: Disabled (loss on all tokens)
## Evaluation Results
| Benchmark (metric) | Llama-3.1-8B baseline | This Model |
|--------------------|-----------------------|------------|
| GSM8K (accuracy) | 16.4% | 29.0% |
| MMLU (accuracy) | 58.1% | 58.4% |
| Simple Safety Tests (safety score) | 62.0% | 78.0% |
| AlpacaEval (LC win rate) | 1.57% | **5.3%** |
## Files
- `eval_baseline/`: Baseline evaluation results (pre-finetuning Llama-3.1-8B)
## Reference
Part of CS336 Assignment 5 (SFT Instruction Tuning). See [building-from-scratch/sft](https://github.com/garg-aayush/building-from-scratch/tree/main/sft) for details.