# BERT-base SQuAD v1.1 Question Answering Model
Fine-tuned bert-base-uncased on SQuAD v1.1 for English extractive question answering.
This model takes a context paragraph and a question and predicts the most likely answer span inside the context.
It is intended as a compact, educational example of fine-tuning a small language model (~110M parameters) for QA on Google Colab and deploying it on Hugging Face.
## Use Cases
- Educational demos of extractive question answering
- Small QA systems over short English paragraphs
- Teaching / learning how to:
  - preprocess SQuAD-style datasets
  - fine-tune BERT with `Trainer`
  - publish models + Spaces on Hugging Face
⚠️ Not intended for production-critical use (medical, legal, or financial advice, etc.).

## Model Details
- Base model: `bert-base-uncased`
- Architecture: Encoder-only Transformer with a QA span head (start/end logits)
- Parameters: ~110M (see the parameter-count sketch after this list)
- Task: Extractive Question Answering
- Language: English
- Author: omarbayoumi2
- Training platform: Google Colab (free GPU)
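As a quick check of the parameter count, a minimal sketch that loads the base checkpoint (the freshly initialized QA head adds only a negligible number of weights):

```python
from transformers import AutoModelForQuestionAnswering

# bert-base-uncased plus a span head comes out to roughly 110M parameters
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```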
## Training Data
- Dataset: SQuAD v1.1 (`rajpurkar/squad`)
- Train split: ~87k question-answer pairs
- Validation split: ~10k question-answer pairs
- Domain: Wikipedia articles (encyclopedic text)
Each example provides:
- `context`: paragraph text
- `question`: question string
- `answers`: list with `text` and `answer_start` (character offset)
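A quick way to load the dataset and inspect these fields, as a minimal sketch using the `datasets` library (the preprocessing used for training is not shown here):

```python
from datasets import load_dataset

# Load SQuAD v1.1 from the Hugging Face Hub
squad = load_dataset("rajpurkar/squad")
print(squad)  # train: ~87.6k examples, validation: ~10.6k examples

example = squad["train"][0]
print(example["context"][:80])  # paragraph text
print(example["question"])      # question string
print(example["answers"])       # {'text': [...], 'answer_start': [...]}
```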
## Training Configuration
Fine-tuning was performed with the Hugging Face Trainer API.
- Optimizer: AdamW (via `Trainer`)
- Epochs: 2
- Learning rate: 3e-5
- Batch size: 8 (train), 16 (eval)
- Max sequence length: 384
- Doc stride: 128
- Weight decay: 0.01
- Mixed precision: FP16 (when the GPU supports it)
Loss is computed as cross-entropy over the start and end token positions.
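A minimal sketch of this configuration with the `Trainer` API, assuming `tokenized_train` and `tokenized_validation` are SQuAD examples already tokenized with `max_length=384` and `doc_stride=128` and mapped to start/end token positions (that preprocessing step is not shown here):

```python
from transformers import (
    AutoModelForQuestionAnswering,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    default_data_collator,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")

args = TrainingArguments(
    output_dir="bert-base-qa-squad-colab",
    num_train_epochs=2,
    learning_rate=3e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    weight_decay=0.01,
    fp16=True,  # set to False on GPUs without FP16 support
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_train,      # assumed: preprocessed SQuAD train features
    eval_dataset=tokenized_validation,  # assumed: preprocessed SQuAD validation features
    data_collator=default_data_collator,
)

# BertForQuestionAnswering averages cross-entropy over the start and end positions
trainer.train()
```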
## Evaluation
Evaluation was performed on the SQuAD v1.1 validation split using the standard SQuAD metrics:
- Exact Match (EM): whether the predicted span exactly matches a ground-truth answer
- F1 score: token-level overlap between the predicted and ground-truth answer text
Typical results for this setup are:
| Metric | Score (approx) |
|---|---|
| Exact Match | 66β80% |
| F1 | 77β88% |
Scores vary with the number of training examples, the number of epochs, and the random seed.
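These metrics can be reproduced with the `evaluate` library's `squad` metric; a minimal sketch (the id and answer texts below are placeholders, not real SQuAD entries):

```python
import evaluate

squad_metric = evaluate.load("squad")

# Both lists use the SQuAD format; ids only need to match between the two
predictions = [
    {"id": "example-0", "prediction_text": "researchers at Google"}
]
references = [
    {"id": "example-0",
     "answers": {"text": ["researchers at Google"], "answer_start": [54]}}
]

print(squad_metric.compute(predictions=predictions, references=references))
# {'exact_match': 100.0, 'f1': 100.0}
```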
## How to Use
1. With Transformers pipeline

```python
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="omarbayoumi2/bert-base-qa-squad-colab",
)

context = "BERT is a language representation model developed by researchers at Google."
question = "Who developed BERT?"

result = qa(question=question, context=context)
print(result)
# {'score': ..., 'start': ..., 'end': ..., 'answer': 'researchers at Google'}
```
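2. Manual inference with AutoModel

The same prediction can be made without the pipeline by taking the argmax of the start and end logits; a minimal sketch (it does not handle invalid spans or long contexts the way the pipeline does):

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_id = "omarbayoumi2/bert-base-qa-squad-colab"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForQuestionAnswering.from_pretrained(model_id)

context = "BERT is a language representation model developed by researchers at Google."
question = "Who developed BERT?"

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Greedy span selection: most likely start and end token positions
start = outputs.start_logits.argmax(dim=-1).item()
end = outputs.end_logits.argmax(dim=-1).item()
answer = tokenizer.decode(
    inputs["input_ids"][0][start : end + 1], skip_special_tokens=True
)
print(answer)  # expected: something like "researchers at google"
```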