# gemma-2b-orpo Training notebook

gemma-2b-orpo is an ORPO fine-tune of [google/gemma-2b](https://huggingface.co/google/gemma-2b) on the
[`alvarobartt/dpo-mix-7k-simplified`](https://huggingface.co/datasets/alvarobartt/dpo-mix-7k-simplified) dataset.

Some good resources:
- [HF Transformers Trainer docs](https://huggingface.co/docs/transformers/main_classes/trainer)
- [Docs on training with ORPO using HF TRL](https://huggingface.co/docs/trl/main/en/orpo_trainer)
- [TRL example script for ORPO](https://github.com/huggingface/trl/blob/main/examples/scripts/orpo.py)
- [How to fine-tune Google Gemma with ChatML and Hugging Face TRL](https://www.philschmid.de/fine-tune-google-gemma)

In [None]:
! pip install git+https://github.com/huggingface/trl.git  # install TRL from the main branch to use the ORPOTrainer
! pip install bitsandbytes accelerate
! pip install ninja packaging
! MAX_JOBS=6 pip install flash-attn --no-build-isolation --upgrade  # flash-attn speeds up the training on compatible GPUs
! pip install wandb

In [None]:
# Log in to the Hugging Face Hub to save the model
from huggingface_hub import login

login(token="YOUR_TOKEN")

In [4]:
# https://huggingface.co/docs/trl/main/en/orpo_trainer#trl.ORPOConfig
# https://www.philschmid.de/fine-tune-google-gemma

from trl import ORPOConfig, ORPOTrainer

# in the following config, we combine the usual HF Trainer args with the ORPOConfig args (beta)

cfg = ORPOConfig(
    output_dir='content/gemma-2b-orpo',     # usual HF Trainer args: https://huggingface.co/docs/transformers/main_classes/trainer#transformers.Trainer.args
    num_train_epochs=3,                     # number of training epochs
    per_device_train_batch_size=2,          # batch size per device during training
    gradient_accumulation_steps=2,          # number of micro-batches to accumulate before each optimizer update
    gradient_checkpointing=True,            # use gradient checkpointing to save memory
    optim="adamw_torch_fused",              # use fused adamw optimizer
    logging_steps=20,                       # log every 20 steps
    bf16=True,                              # use bfloat16 precision
    tf32=True,                              # use TF32 matmuls (requires an Ampere or newer GPU)
    learning_rate=5e-5,                     # learning rate
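    warmup_ratio=0.1,                       # ignored here: warmup_steps below takes precedence
    warmup_steps=100,                       # linear warmup steps (overrides warmup_ratio)
    lr_scheduler_type="cosine",             # cosine decay after warmup
    max_prompt_length=512,                  # max prompt length in tokens
    remove_unused_columns=False,            # keep dataset columns that the model's forward() does not accept
    max_length=1024,                        # max length of prompt + completion in tokens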
    beta=0.1,                               # ORPO beta
    save_total_limit=3,                     # args related to saving the model...
    save_strategy="epoch",
    push_to_hub=True,                       # push the model to the Hub every time it is saved
    report_to=['wandb'],                    # report metrics to Weights & Biases
    hub_model_id='anakin87/gemma-2b-orpo',  # target repository on the Hub
)
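
To see what the batch and warmup settings above imply, here is a minimal sketch that estimates the effective batch size and the number of optimizer updates. `schedule_summary` is a hypothetical helper, not part of Transformers or TRL, and `num_examples=1000` is a placeholder, not the real dataset size. The step count only approximates what the Trainer computes and may be off by a step or so depending on the library version.

```python
import math


def schedule_summary(num_examples, per_device_batch_size, grad_accum_steps,
                     num_epochs, num_devices=1, warmup_steps=0, warmup_ratio=0.0):
    """Approximate the optimizer schedule implied by Trainer-style arguments."""
    # Examples consumed per optimizer update, across all devices
    effective_batch_size = per_device_batch_size * grad_accum_steps * num_devices
    # Micro-batches each device processes per epoch
    micro_batches = math.ceil(num_examples / (per_device_batch_size * num_devices))
    # One optimizer update every `grad_accum_steps` micro-batches
    updates_per_epoch = max(micro_batches // grad_accum_steps, 1)
    total_updates = updates_per_epoch * num_epochs
    # An explicit warmup_steps takes precedence over warmup_ratio
    if warmup_steps > 0:
        warmup = warmup_steps
    else:
        warmup = math.ceil(total_updates * warmup_ratio)
    return {
        "effective_batch_size": effective_batch_size,
        "updates_per_epoch": updates_per_epoch,
        "total_updates": total_updates,
        "warmup_steps": warmup,
    }


# num_examples is a placeholder value, chosen only for illustration
summary = schedule_summary(
    num_examples=1000,
    per_device_batch_size=2,
    grad_accum_steps=2,
    num_epochs=3,
    warmup_steps=100,
    warmup_ratio=0.1,
)
print(summary)
```

With a single GPU, the config above updates the weights once every 4 examples. Because `warmup_steps` is set, `warmup_ratio` has no effect on the schedule.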

In [5]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

In [None]:
model_id = "google/gemma-2b"
tokenizer_id = "philschmid/gemma-tokenizer-chatml"


# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
tokenizer.padding_side = 'right' # to prevent warnings

In [7]:
from datasets import load_dataset
import multiprocessing

In [None]:
# https://github.com/huggingface/trl/blob/main/examples/scripts/orpo.py

ds = load_dataset("alvarobartt/dpo-mix-7k-simplified")

def process(row):
    row["prompt"] = tokenizer.apply_chat_template(row["prompt"], tokenize=False)
    row["chosen"] = tokenizer.apply_chat_template(row["chosen"], tokenize=False)
    row["rejected"] = tokenizer.apply_chat_template(row["rejected"], tokenize=False)
    return row

ds = ds.map(
    process,
    num_proc=multiprocessing.cpu_count(),
    load_from_cache_file=False,
)
train_dataset = ds["train"]
eval_dataset = ds["test"]
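
The `process` function above relies on the tokenizer's ChatML chat template. As a rough illustration of the text it produces, here is a pure-Python sketch. `to_chatml` is a hypothetical helper. The rule for adding `<bos>` (only when the conversation starts with a user or system message) is an assumption inferred from the sample row printed in the next cell. The real template ships with the tokenizer and may differ in details.

```python
def to_chatml(messages, bos_token="<bos>"):
    """Render a list of {"role", "content"} dicts as ChatML text (approximation)."""
    text = ""
    # Assumption: BOS is only added when the conversation opens with user/system
    if messages and messages[0]["role"] in ("user", "system"):
        text += bos_token
    for message in messages:
        text += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    return text


prompt = to_chatml([{"role": "user", "content": "What is ORPO?"}])
chosen = to_chatml([{"role": "assistant", "content": "A preference tuning method."}])
print(prompt + chosen)
```

Concatenating the rendered prompt with the rendered chosen (or rejected) completion gives one ChatML conversation with a single `<bos>` at the start.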

In [9]:
train_dataset[0]

{'dataset': 'argilla/distilabel-intel-orca-dpo-pairs',
 'prompt': '<bos><|im_start|>user\nQ:Question: how old julio cesar chavez when he fought de la hoya I found the following answer on Google: He holds records for most successful consecutive defenses of world titles (27), most title fights (37), most title-fight victories (31) and he is after Joe Louis with (23) for most title defenses won by knockout (21). Is that a correct answer? Yes or no.\nA:<|im_end|>\n',
 'chosen': "<|im_start|>assistant\n Yes, the information you found on Google is correct. Julio César Chávez holds several records related to world title defenses and victories, and he is considered one of the greatest boxers in history. Here is a detailed answer to your question:\n\nJulio César Chávez was born on July 12, 1962, in Ciudad Obregón, Sonora, Mexico. He began boxing at a young age and quickly made a name for himself in the sport, winning his first world title in 1984 when he defeated Mario Miranda for the WBC super

In [None]:
import wandb
run = wandb.init(project="YOUR_PROJECT_NAME")

In [None]:
orpo_trainer = ORPOTrainer(
    model=model,
    args=cfg,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,  # recent TRL releases rename this argument to `processing_class`
)
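
For intuition about what `beta` weights in the trainer above, here is a minimal pure-Python sketch of the ORPO objective from the paper, where `beta` plays the role of λ: the usual NLL on the chosen answer plus `beta` times an odds-ratio penalty. `orpo_loss` and `log_sigmoid` are hypothetical helpers working on scalar, length-normalized log-probabilities. This is not TRL's code, which operates on batched tensors.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def orpo_loss(chosen_logp, rejected_logp, chosen_nll, beta=0.1):
    """ORPO loss for one example.

    chosen_logp / rejected_logp: average per-token log-probabilities (< 0).
    chosen_nll: negative log-likelihood of the chosen answer (the SFT term).
    """
    # log odds(y) = log p(y) - log(1 - p(y))
    log_odds_chosen = chosen_logp - math.log1p(-math.exp(chosen_logp))
    log_odds_rejected = rejected_logp - math.log1p(-math.exp(rejected_logp))
    # Penalty is small when the chosen answer has much higher odds
    odds_ratio_penalty = -log_sigmoid(log_odds_chosen - log_odds_rejected)
    return chosen_nll + beta * odds_ratio_penalty


# Illustrative values only
print(orpo_loss(chosen_logp=-0.2, rejected_logp=-1.5, chosen_nll=0.2))
print(orpo_loss(chosen_logp=-0.5, rejected_logp=-0.5, chosen_nll=0.5))
```

When the model is indifferent between the two answers, the penalty equals log 2. It shrinks toward zero as the chosen answer becomes more likely than the rejected one.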

In [None]:
orpo_trainer.train()

In [None]:
orpo_trainer.push_to_hub()