Squash Code Corruptor Model
T5-based model for generating realistic Python code bugs for educational purposes.
Model Description
This model is trained to introduce realistic bugs into Python code, including:
- Logic errors (operator swaps, off-by-one errors, wrong variables)
- Syntax errors (missing colons, indentation issues)
Trained on 1500 examples:
- 1000 syntax error pairs
- 500 logic error pairs (7 different categories)
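The two bug families listed above can be told apart mechanically: a syntax error stops the code from parsing, while a logic error parses cleanly and only produces wrong results. The sketch below is a hand-written illustration and not output from the model; the snippets and the `has_syntax_error` helper are assumptions made up for this example.

```python
import ast

# Hand-written illustrations of each bug family (not model output).
correct = "def add(a, b):\n    return a + b\n"
syntax_bug = "def add(a, b)\n    return a + b\n"   # missing colon
logic_bug = "def add(a, b):\n    return a - b\n"   # operator swap


def has_syntax_error(source: str) -> bool:
    """Return True if the source fails to parse as Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return True
    return False


for name, src in [("correct", correct), ("syntax", syntax_bug), ("logic", logic_bug)]:
    print(name, has_syntax_error(src))
```

A logic bug passes this check, so detecting it needs a test that actually runs the code.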
Usage
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("onegaiosu/squash-code-corruptor")
tokenizer = AutoTokenizer.from_pretrained("onegaiosu/squash-code-corruptor")
# Corrupt a small snippet
code = "def add(a, b):\n    return a + b"
inputs = tokenizer(code, return_tensors="pt", max_length=512, truncation=True)
# do_sample=True is needed for temperature to take effect
outputs = model.generate(**inputs, max_length=512, do_sample=True, temperature=0.8)
corrupted = tokenizer.decode(outputs[0], skip_special_tokens=True)
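For repeated use, the steps above could be wrapped in a small helper. This is a minimal sketch, not part of the published model's API: the name `corrupt_code` and its parameters are hypothetical, and it assumes `model` and `tokenizer` were already loaded as in the usage snippet.

```python
def corrupt_code(code, model, tokenizer, max_length=512, temperature=0.8):
    """Return a corrupted version of `code` (hypothetical helper).

    `model` and `tokenizer` are expected to be the objects loaded with
    AutoModelForSeq2SeqLM / AutoTokenizer in the usage snippet.
    """
    inputs = tokenizer(
        code, return_tensors="pt", max_length=max_length, truncation=True
    )
    outputs = model.generate(
        **inputs,
        max_length=max_length,
        do_sample=True,          # sampling must be on for temperature to apply
        temperature=temperature,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Because sampling is enabled, repeated calls on the same input may return different corruptions.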
Training Data
Custom dataset of Python code pairs (correct → buggy) focusing on common programming mistakes for beginner and intermediate learners.
Intended Use
Educational tool for the Squash app, which helps students learn Python by fixing intentionally buggy code.
Limitations
- Trained specifically on Python code
- May not work well with very long or complex code snippets (the usage example truncates input at 512 tokens)
- Best for code snippets under 50 lines
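One way to respect the size limit is to check a snippet before sending it to the model. The sketch below is an assumption about how a caller might do that: `is_suitable_snippet` and its `max_lines` parameter are hypothetical names, and the threshold of 50 comes from the limitation stated above.

```python
def is_suitable_snippet(code: str, max_lines: int = 50) -> bool:
    """Return True if `code` is non-empty and has fewer than `max_lines` lines."""
    lines = code.splitlines()
    return 0 < len(lines) < max_lines


short_snippet = "def add(a, b):\n    return a + b\n"
long_snippet = "x = 1\n" * 60

print(is_suitable_snippet(short_snippet))
print(is_suitable_snippet(long_snippet))
```

A line count is only a rough proxy; the tokenizer's 512-token truncation in the usage snippet is the actual hard limit on input size.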
Citation
@misc{squash-code-corruptor,
author = {Mao Abel},
title = {Squash Code Corruptor},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/onegaiosu/squash-code-corruptor}}
}