🌟 BERT-base SQuAD v1.1 Question Answering Model

Fine-tuned bert-base-uncased on SQuAD v1.1 for English extractive question answering.


This model takes a context paragraph and a question and predicts the most likely answer span inside the context.
It is intended as a compact, educational example of fine-tuning a small language model (~110M parameters) for extractive QA on Google Colab and deploying it on Hugging Face.


πŸ”Ž Use Cases

  • Educational demos of extractive question answering
  • Small QA systems over short English paragraphs
  • Teaching / learning how to:
    • preprocess SQuAD-style datasets
    • fine-tune BERT with Trainer
    • publish models + Spaces on Hugging Face

⚠️ Not intended for production-critical use (medical/legal/financial advice, etc.).


🧠 Model Details

  • Base model: bert-base-uncased
  • Architecture: Encoder-only Transformer with QA span head (start/end logits)
  • Parameters: ~110M
  • Task: Extractive Question Answering
  • Language: English
  • Author: omarbayoumi2
  • Training platform: Google Colab (free GPU)

πŸ“š Training Data

  • Dataset: SQuAD v1.1 (rajpurkar/squad)
  • Train split: ~87k question–answer pairs
  • Validation split: ~10k question–answer pairs
  • Domain: Wikipedia articles (encyclopedic text)

Each example provides:

  • context: paragraph
  • question: question string
  • answers: answer annotations with the answer text and answer_start (character offset into the context)
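
For reference, this is roughly how the raw records look when loaded with the datasets library (a minimal sketch; the comments describe field types, not exact values):

from datasets import load_dataset

squad = load_dataset("rajpurkar/squad")
example = squad["train"][0]

print(example["context"][:80])  # Wikipedia paragraph
print(example["question"])      # question string
print(example["answers"])       # {"text": [...], "answer_start": [...]}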

βš™οΈ Training Configuration

Fine-tuning was performed with the Hugging Face Trainer API.

  • Optimizer: AdamW (via Trainer)
  • Epochs: 2
  • Learning rate: 3e-5
  • Batch size: 8 (train), 16 (eval)
  • Max sequence length: 384
  • Doc stride: 128
  • Weight decay: 0.01
  • Mixed precision: FP16 (when GPU supports it)
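
As a rough sketch, the settings above translate to a tokenization step and TrainingArguments like the following (output_dir is illustrative; fp16 assumes a compatible GPU):

from transformers import AutoTokenizer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(examples):
    # Long contexts are split into overlapping 384-token windows (doc stride 128).
    return tokenizer(
        examples["question"],
        examples["context"],
        max_length=384,
        stride=128,
        truncation="only_second",
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length",
    )

training_args = TrainingArguments(
    output_dir="bert-base-qa-squad-colab",  # illustrative name
    num_train_epochs=2,
    learning_rate=3e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    weight_decay=0.01,
    fp16=True,  # only when the GPU supports it
)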

The loss is computed as the average of two cross-entropy terms, one over the start token position and one over the end token position.
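
In code, this is roughly what the QA head of BertForQuestionAnswering does during training (simplified; the real implementation also clamps out-of-range positions):

import torch.nn.functional as F

def span_loss(start_logits, end_logits, start_positions, end_positions):
    # logits: (batch, seq_len); positions: (batch,) gold token indices
    start_loss = F.cross_entropy(start_logits, start_positions)
    end_loss = F.cross_entropy(end_logits, end_positions)
    return (start_loss + end_loss) / 2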


πŸ“Š Evaluation

Evaluation was performed on the SQuAD v1.1 validation split using the standard SQuAD metrics:

  • Exact Match (EM): measures whether the predicted span exactly matches the ground-truth answer
  • F1 score: token-level overlap between predicted and true answer text
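
Both can be computed with the evaluate library's squad metric; the IDs and texts below are toy values:

import evaluate

squad_metric = evaluate.load("squad")

predictions = [{"id": "ex1", "prediction_text": "researchers at Google"}]
references = [{
    "id": "ex1",
    "answers": {"text": ["researchers at Google"], "answer_start": [53]},
}]

print(squad_metric.compute(predictions=predictions, references=references))
# {'exact_match': 100.0, 'f1': 100.0}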

Typical results for this setup are:

Metric        Score (approx.)
Exact Match   66–80%
F1            77–88%

Scores may vary with the number of training examples, the number of epochs, and the random seed.


πŸš€ How to Use

1. With Transformers pipeline

from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="omarbayoumi2/bert-base-qa-squad-colab",
)

context = "BERT is a language representation model developed by researchers at Google."
question = "Who developed BERT?"

result = qa(question=question, context=context)
print(result)
# {'score': ..., 'start': ..., 'end': ..., 'answer': 'researchers at Google'}
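
2. With AutoModel classes

If you prefer working with raw model outputs, a minimal sketch using the standard Transformers auto classes looks like this:

from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch

model_id = "omarbayoumi2/bert-base-qa-squad-colab"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForQuestionAnswering.from_pretrained(model_id)

question = "Who developed BERT?"
context = "BERT is a language representation model developed by researchers at Google."

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Take the most likely start/end token indices and decode that span.
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
answer = tokenizer.decode(inputs["input_ids"][0][start : end + 1])
print(answer)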