---
language: en
license: apache-2.0
tags:
- question-answering
- bert
- squad
- extractive-qa
- baseline
datasets:
- squad
metrics:
- f1
- exact_match
model-index:
- name: bert-base-uncased-squad-baseline
results:
- task:
type: question-answering
name: Question Answering
dataset:
name: SQuAD 1.1
type: squad
split: validation
metrics:
- type: exact_match
value: 79.45
name: Exact Match
- type: f1
value: 87.41
name: F1 Score
---
# BERT Base Uncased - SQuAD 1.1 Baseline
This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the SQuAD 1.1 dataset for extractive question answering.
## Model Description
**BERT (Bidirectional Encoder Representations from Transformers)** fine-tuned on the Stanford Question Answering Dataset (SQuAD 1.1) to perform extractive question answering: finding the answer span within a given context passage.
- **Model Type:** Question Answering (Extractive)
- **Base Model:** `bert-base-uncased`
- **Language:** English
- **License:** Apache 2.0
- **Fine-tuned on:** SQuAD 1.1
- **Parameters:** 108,893,186 (all trainable)
## Intended Use
### Primary Use Cases
This model is designed for extractive question answering tasks where:
- The answer exists as a contiguous span of text within the provided context
- Questions are factual and answerable from the context
- The input text is in English
### Example Usage
```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline
# Load model and tokenizer
model = AutoModelForQuestionAnswering.from_pretrained("G20-CS4248/bert-baseline-qa")
tokenizer = AutoTokenizer.from_pretrained("G20-CS4248/bert-baseline-qa")
# Create QA pipeline
qa_pipeline = pipeline(
"question-answering",
model=model,
tokenizer=tokenizer
)
# Ask a question
context = """
The Amazon rainforest is a moist broadleaf tropical rainforest in the Amazon biome
that covers most of the Amazon basin of South America. This basin encompasses
7,000,000 km2 (2,700,000 sq mi), of which 5,500,000 km2 (2,100,000 sq mi) are
covered by the rainforest.
"""
question = "How large is the Amazon basin?"
result = qa_pipeline(question=question, context=context)
print(f"Answer: {result['answer']}")
print(f"Confidence: {result['score']:.4f}")
```
**Example output** (the exact confidence score may vary slightly across environments):
```
Answer: 7,000,000 km2
Confidence: 0.9234
```
### Direct Model Usage (without pipeline)
```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer
model = AutoModelForQuestionAnswering.from_pretrained("G20-CS4248/bert-baseline-qa")
tokenizer = AutoTokenizer.from_pretrained("G20-CS4248/bert-baseline-qa")
question = "What is the capital of France?"
context = "Paris is the capital and largest city of France."
# Tokenize
inputs = tokenizer(question, context, return_tensors="pt")
# Get predictions
with torch.no_grad():
outputs = model(**inputs)
# Get answer span (naive decoding: the start and end logits are argmaxed
# independently, so end >= start is not guaranteed; see "Output Format" below)
answer_start = torch.argmax(outputs.start_logits)
answer_end = torch.argmax(outputs.end_logits) + 1
answer = tokenizer.decode(inputs.input_ids[0][answer_start:answer_end])
print(f"Answer: {answer}")
```
## Training Data
### Dataset: SQuAD 1.1
The Stanford Question Answering Dataset (SQuAD) v1.1 consists of questions posed by crowdworkers on a set of Wikipedia articles.
**Training Set:**
- **Examples:** 87,599
- **Average question length:** 10.06 words
- **Average context length:** 119.76 words
- **Average answer length:** 3.16 words
**Validation Set:**
- **Examples:** 10,570
- **Average question length:** 10.22 words
- **Average context length:** 123.95 words
- **Average answer length:** 3.02 words
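The counts above can be checked against the Hub copy of the dataset (a minimal sketch, assuming the standard `squad` dataset on the Hugging Face Hub is the one used here):

```python
from datasets import load_dataset

squad = load_dataset("squad")  # SQuAD v1.1
print(squad["train"].num_rows)       # 87599
print(squad["validation"].num_rows)  # 10570
print(squad["train"][0].keys())      # dict_keys(['id', 'title', 'context', 'question', 'answers'])
```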
### Data Preprocessing
- **Tokenizer:** `bert-base-uncased`
- **Max sequence length:** 384 tokens
- **Stride:** 128 tokens (for handling long contexts)
- **Padding:** Maximum length
- **Truncation:** Only second sequence (context)
Long contexts are split into multiple features with overlapping windows to ensure answers aren't lost at sequence boundaries.
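These settings correspond to a tokenizer call along the following lines (a minimal sketch; the toy question and context strings are illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

question = "What is the capital of France?"
# The repetition just makes the context long enough to overflow one window
context = "Paris is the capital and largest city of France. " * 100

encoded = tokenizer(
    question,
    context,
    max_length=384,
    stride=128,
    truncation="only_second",        # only the context is truncated, never the question
    padding="max_length",
    return_overflowing_tokens=True,  # emit one feature per overlapping window
    return_offsets_mapping=True,     # map tokens back to character positions
)
print(len(encoded["input_ids"]))  # number of overlapping features for this example
```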
## Training Procedure
### Training Hyperparameters
| Parameter | Value |
|-----------|-------|
| **Base model** | bert-base-uncased |
| **Optimizer** | AdamW |
| **Learning rate** | 3e-5 |
| **Learning rate schedule** | Linear with warmup |
| **Warmup ratio** | 0.1 (10% of training) |
| **Weight decay** | 0.01 |
| **Batch size (train)** | 8 |
| **Batch size (eval)** | 8 |
| **Number of epochs** | 1 |
| **Mixed precision** | FP16 (enabled) |
| **Gradient accumulation** | 1 |
| **Max gradient norm** | 1.0 |
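For reference, these hyperparameters map onto a `TrainingArguments` configuration roughly as follows (a hedged sketch; the `output_dir` is illustrative and the exact argument set used for this run is not published):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-baseline-qa",   # illustrative path
    learning_rate=3e-5,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    weight_decay=0.01,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=1,
    gradient_accumulation_steps=1,
    max_grad_norm=1.0,
    fp16=True,                       # mixed precision
    evaluation_strategy="epoch",     # named `eval_strategy` in newer transformers releases
)
```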
### Training Environment
- **Hardware:** NVIDIA GPU (CUDA enabled)
- **Framework:** PyTorch with Transformers library
- **Training time:** ~29.5 minutes (1 epoch)
- **Training samples/second:** 44.95
- **Total FLOPs:** 14,541,777 GFLOPs (≈1.45 × 10^16 FLOPs)
### Training Metrics
- **Final training loss:** 1.2236
- **Evaluation strategy:** End of epoch
- **Metric for best model:** Evaluation loss
## Performance
### Evaluation Results
Evaluated on SQuAD 1.1 validation set (10,570 examples):
| Metric | Score |
|--------|-------|
| **Exact Match (EM)** | **79.45%** |
| **F1 Score** | **87.41%** |
### Metric Explanations
- **Exact Match (EM):** Percentage of predictions that match a ground-truth answer exactly (after normalization)
- **F1 Score:** Token-level F1 measuring the overlap between the predicted and ground-truth answers, as sketched below
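A simplified sketch of both metrics (the official SQuAD evaluation script additionally strips articles and punctuation and takes the maximum score over all reference answers):

```python
from collections import Counter

def exact_match(prediction: str, reference: str) -> bool:
    # Simplified normalization: case and surrounding whitespace only
    return prediction.strip().lower() == reference.strip().lower()

def token_f1(prediction: str, reference: str) -> float:
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)  # per-token overlap counts
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("7,000,000 km2", "7,000,000 km2 (2,700,000 sq mi)"))  # ~0.57
```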
### Comparison to BERT Base Performance
| Model | EM | F1 | Training |
|-------|----|----|----------|
| **This model (1 epoch)** | 79.45 | 87.41 | 29.5 min |
| BERT Base (original paper, 3 epochs) | 80.8 | 88.5 | ~2-3 hours |
| BERT Base (fully trained) | 81-84 | 88-91 | ~2-3 hours |
**Note:** This is a baseline model trained for only 1 epoch. Performance can be improved with additional training epochs.
### Performance by Question Type
The model performs well on:
- ✅ Factual questions (What, When, Where, Who)
- ✅ Short answer spans (1-5 words)
- ✅ Questions with clear context
May struggle with:
- ⚠️ Questions requiring reasoning across multiple sentences
- ⚠️ Very long answer spans
- ⚠️ Ambiguous questions with multiple valid answers
- ⚠️ Questions requiring world knowledge not in context
## Limitations and Biases
### Known Limitations
1. **Extractive Only:** Can only extract answers present in the context; cannot generate or synthesize answers
2. **Single Answer:** Provides only one answer span, even if multiple valid answers exist
3. **Context Dependency:** Requires relevant context; cannot answer from general knowledge
4. **Length Constraints:** Limited to 384 tokens per context window
5. **English Only:** Trained on English text; not suitable for other languages
6. **Training Duration:** Only 1 epoch of training; may underfit compared to longer training
### Potential Biases
- **Domain Bias:** Trained primarily on Wikipedia articles; may perform worse on other text types (news, technical docs, etc.)
- **Temporal Bias:** Training data from 2016; may have outdated information
- **Cultural Bias:** Reflects biases present in Wikipedia content
- **Answer Position Bias:** May favor answers appearing in certain positions within context
- **BERT Base Biases:** Inherits any biases from the pre-trained BERT base model
### Out-of-Scope Use
This model should NOT be used for:
- ❌ Medical, legal, or financial advice
- ❌ High-stakes decision making
- ❌ Generative question answering (creating new answers)
- ❌ Non-English languages
- ❌ Yes/no or multiple choice questions (without adaptation)
- ❌ Questions requiring reasoning beyond the context
- ❌ Real-time fact checking or verification
## Technical Specifications
### Model Architecture
```
BertForQuestionAnswering(
(bert): BertModel(
(embeddings): BertEmbeddings
(encoder): BertEncoder (12 layers)
(pooler): BertPooler
)
(qa_outputs): Linear(768 -> 2) # Start and end position logits
)
```
- **Hidden size:** 768
- **Attention heads:** 12
- **Intermediate size:** 3072
- **Hidden layers:** 12
- **Vocabulary size:** 30,522
- **Max position embeddings:** 512
- **Total parameters:** 108,893,186
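The parameter count can be verified directly:

```python
from transformers import AutoModelForQuestionAnswering

model = AutoModelForQuestionAnswering.from_pretrained("G20-CS4248/bert-baseline-qa")
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{total:,} total / {trainable:,} trainable")  # 108,893,186 for both
```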
### Input Format
The model expects tokenized input with:
- Question and context concatenated with `[SEP]` token
- Format: `[CLS] question [SEP] context [SEP]`
- Token type IDs to distinguish question (0) from context (1)
- Attention mask to identify real vs padding tokens
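This layout can be inspected directly with the tokenizer:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer(
    "What is the capital of France?",
    "Paris is the capital and largest city of France.",
)
print(tokenizer.decode(enc["input_ids"]))
# [CLS] what is the capital of france? [SEP] paris is the capital and largest city of france. [SEP]
print(enc["token_type_ids"])  # 0s cover [CLS] + question + first [SEP]; 1s cover context + final [SEP]
```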
### Output Format
Returns:
- `start_logits`: Scores for each token being the start of the answer span
- `end_logits`: Scores for each token being the end of the answer span
The predicted answer is the span running from the token with the highest start_logit to the token with the highest end_logit, subject to the constraint end >= start.
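A minimal sketch of that constrained search (the function name and the `max_answer_len` cap are illustrative; the `question-answering` pipeline performs a similar valid-span search internally):

```python
import torch

def best_span(start_logits: torch.Tensor, end_logits: torch.Tensor,
              max_answer_len: int = 30) -> tuple[int, int]:
    """Return the (start, end) pair maximizing start_logit + end_logit,
    subject to start <= end < start + max_answer_len."""
    # scores[i, j] = start_logits[i] + end_logits[j] for every candidate pair
    scores = start_logits.unsqueeze(-1) + end_logits.unsqueeze(0)
    valid = torch.triu(torch.ones_like(scores, dtype=torch.bool))  # keep end >= start
    valid &= ~torch.triu(torch.ones_like(scores, dtype=torch.bool),
                         diagonal=max_answer_len)                  # cap the span length
    scores = scores.masked_fill(~valid, float("-inf"))
    flat = torch.argmax(scores)
    return (flat // scores.size(1)).item(), (flat % scores.size(1)).item()
```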
## Evaluation Data
**SQuAD 1.1 Validation Set**
- 10,570 question-context-answer triples
- Same source and format as training data
- Used for final performance evaluation
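The reported EM and F1 can be computed with the `evaluate` library's SQuAD metric (a sketch with a single toy example; the `id` string is illustrative):

```python
import evaluate

squad_metric = evaluate.load("squad")
predictions = [{"id": "ex1", "prediction_text": "7,000,000 km2"}]
references = [{
    "id": "ex1",
    "answers": {"text": ["7,000,000 km2"], "answer_start": [120]},
}]
print(squad_metric.compute(predictions=predictions, references=references))
# {'exact_match': 100.0, 'f1': 100.0}
```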
## Environmental Impact
- **Training hardware:** 1x NVIDIA GPU
- **Training time:** ~29.5 minutes
- **Compute region:** Not specified
- **Carbon footprint:** Not measured; likely small given the ~30-minute single-GPU run
## Model Card Authors
[Your Name / Team Name]
## Model Card Contact
[Your Email / Contact Information]
## Citation
If you use this model, please cite:
```bibtex
@misc{bert-squad-baseline-2025,
author = {Your Name},
title = {BERT Base Uncased Fine-tuned on SQuAD 1.1 (Baseline)},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/your-username/bert-squad-baseline}}
}
```
### Original BERT Paper
```bibtex
@article{devlin2018bert,
title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
journal={arXiv preprint arXiv:1810.04805},
year={2018}
}
```
### SQuAD Dataset
```bibtex
@article{rajpurkar2016squad,
title={SQuAD: 100,000+ Questions for Machine Comprehension of Text},
author={Rajpurkar, Pranav and Zhang, Jian and Lopyrev, Konstantin and Liang, Percy},
journal={arXiv preprint arXiv:1606.05250},
year={2016}
}
```
## Additional Information
### Future Improvements
Potential enhancements for this baseline model:
- 🔄 Train for additional epochs (2-3 epochs recommended)
- 📈 Increase batch size with gradient accumulation
- 🎯 Experiment with alternative learning rate schedules (a linear schedule with warmup is already used)
- 🔍 Add answer validation/verification
- 📊 Ensemble with multiple models
- 🚀 Distillation to smaller model for deployment
### Related Models
- [bert-base-uncased](https://huggingface.co/bert-base-uncased) - Base model
- [bert-large-uncased-whole-word-masking-finetuned-squad](https://huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad) - Larger BERT variant
- [distilbert-base-uncased-distilled-squad](https://huggingface.co/distilbert-base-uncased-distilled-squad) - Smaller, faster variant
### Acknowledgments
- Google Research for BERT
- Stanford NLP for SQuAD dataset
- Hugging Face for Transformers library
- [Your course/institution if applicable]
---
**Last updated:** October 2025
**Model version:** 1.0 (Baseline)
**Status:** Baseline model - suitable for development/comparison