---
license: llama3.1
base_model: meta-llama/Llama-3.1-8B
tags:
- sft
- instruction-tuning
- llama
datasets:
- HuggingFaceH4/ultrachat_200k
model-index:
- name: llama31-8b-sft-nomask
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8K
      type: gsm8k
    metrics:
    - name: Accuracy
      type: accuracy
      value: 29.0
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU
      type: mmlu
    metrics:
    - name: Accuracy
      type: accuracy
      value: 58.4
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Simple Safety Tests
      type: simple_safety_tests
    metrics:
    - name: Safety Score
      type: accuracy
      value: 78.0
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AlpacaEval
      type: tatsu-lab/alpaca_eval
    metrics:
    - name: LC Win Rate
      type: win_rate
      value: 5.3
      verified: false
---
# LLaMA-3.1-8B SFT (No Prompt Masking)
LLaMA-3.1-8B fine-tuned with supervised fine-tuning (SFT) for instruction following, without prompt masking: the loss is computed on all tokens, including the prompt.
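
A minimal inference sketch with the `transformers` library is shown below. The repository id is a placeholder (the actual path to this checkpoint may differ), and the prompt format used during training is not specified here, so a plain prompt is used.

```python
# Minimal inference sketch (assumptions: placeholder repo id, plain text prompt).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/llama31-8b-sft-nomask"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Explain the difference between supervised fine-tuning and pretraining."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, dropping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```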
## Training Details
- Base Model: meta-llama/Llama-3.1-8B
- Dataset: UltraChat-200K + SafetyLlama (~200K examples)
- Training: 1 epoch (6726 steps)
- Prompt Masking: Disabled (loss on all tokens; see the label-construction sketch below)
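
The sketch below illustrates what "prompt masking disabled" means for label construction; it is not the exact training code. With masking, prompt positions get the ignore index `-100` so cross-entropy skips them; without masking, as in this model, the labels are simply the full input sequence.

```python
# Illustrative sketch of label construction with / without prompt masking.
# Labels equal to -100 are ignored by PyTorch's cross-entropy loss.
def build_labels(prompt_ids, response_ids, mask_prompt):
    input_ids = prompt_ids + response_ids
    if mask_prompt:
        # Standard SFT: loss only on response tokens.
        labels = [-100] * len(prompt_ids) + response_ids
    else:
        # This model: loss on all tokens, prompt included.
        labels = list(input_ids)
    return input_ids, labels

# Example: a 3-token prompt and a 2-token response, masking disabled.
ids, labels = build_labels([101, 102, 103], [201, 202], mask_prompt=False)
assert labels == [101, 102, 103, 201, 202]
```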
## Evaluation Results
| Benchmark | Baseline | This Model |
|---|---|---|
| GSM8K | 16.4% | 29.0% |
| MMLU | 58.1% | 58.4% |
| SST Safety | 62.0% | 78.0% |
| AlpacaEval | 1.57% | 5.3% |
## Files
- `eval_baseline/`: Baseline evaluation results for the pre-finetuning Llama-3.1-8B
## Reference
Part of CS336 Assignment 5 (SFT Instruction Tuning). See `building-from-scratch/sft` for details.