Squash Code Corruptor Model

T5-based model for generating realistic Python code bugs for educational purposes.

Model Description

This model is trained to introduce realistic bugs into Python code, including:

Logic errors (operator swaps, off-by-one errors, wrong variables)
Syntax errors (missing colons, indentation issues)
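To make the two bug categories concrete, here is a small hand-written sketch. The buggy strings are written by hand for illustration and are not model outputs; the helpers `is_valid_python` and `run_add` are hypothetical names used only here. It shows that a logic error still compiles but returns the wrong result, while a syntax error fails to compile at all.

```python
# Hand-written examples of the two bug categories (not produced by the model).
correct = "def add(a, b):\n    return a + b"
logic_bug = "def add(a, b):\n    return a - b"   # operator swap
syntax_bug = "def add(a, b)\n    return a + b"   # missing colon


def is_valid_python(source: str) -> bool:
    """Return True if the source compiles, False on a syntax error."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


def run_add(source: str, a: int, b: int) -> int:
    """Execute the snippet and call the add() function it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace["add"](a, b)


print(is_valid_python(logic_bug))    # compiles, but behaves incorrectly
print(is_valid_python(syntax_bug))   # does not compile
print(run_add(correct, 2, 3), run_add(logic_bug, 2, 3))
```

Only run `exec` on snippets you trust; a learner-facing tool would likely need a sandbox.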

Trained on 1500 examples:

1000 syntax error pairs
500 logic error pairs (7 different categories)

Usage

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("onegaiosu/squash-code-corruptor")
tokenizer = AutoTokenizer.from_pretrained("onegaiosu/squash-code-corruptor")

# Corrupt a short snippet
code = "def add(a, b):\n    return a + b"
inputs = tokenizer(code, return_tensors="pt", max_length=512, truncation=True)
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_length=512, do_sample=True, temperature=0.8)
corrupted = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(corrupted)
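Once a corrupted snippet is available, it can help to see exactly what changed. The following is a minimal sketch using the standard library's `difflib`; the function name `show_corruption` is hypothetical, and `example_corrupted` is a hand-written stand-in for a model output, since actual outputs vary from run to run.

```python
import difflib


def show_corruption(original: str, corrupted: str) -> str:
    """Return a unified diff between the original and corrupted code."""
    diff = difflib.unified_diff(
        original.splitlines(),
        corrupted.splitlines(),
        fromfile="original",
        tofile="corrupted",
        lineterm="",
    )
    return "\n".join(diff)


original = "def add(a, b):\n    return a + b"
# Stand-in for a model output (assumption, not a real generation).
example_corrupted = "def add(a, b):\n    return a - b"

diff_text = show_corruption(original, example_corrupted)
print(diff_text)
```

An empty diff means the output is identical to the input, which is a useful signal to regenerate.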

Training Data

Custom dataset of Python code pairs (correct → buggy) focusing on common programming mistakes for beginner and intermediate learners.

Intended Use

Educational tool for the Squash app - helping students learn Python by fixing intentionally buggy code.

Limitations

Trained specifically on Python code
Works best on code snippets under 50 lines; longer or more complex code may be corrupted poorly
The usage example truncates input at 512 tokens, so anything beyond that limit is ignored
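The 50-line guideline above can be checked before calling the model. This is a minimal sketch, assuming that every line (including blank lines and comments) counts toward the limit; the name `within_recommended_size` and the constant `MAX_LINES` are hypothetical and not part of the model or the Squash app.

```python
MAX_LINES = 50  # guideline from the Limitations section


def within_recommended_size(code: str, max_lines: int = MAX_LINES) -> bool:
    """Return True if the snippet has fewer than max_lines lines."""
    # Assumption: blank lines and comments count toward the limit.
    return len(code.splitlines()) < max_lines


snippet = "def add(a, b):\n    return a + b"
if within_recommended_size(snippet):
    print("Snippet is within the recommended size.")
else:
    print("Snippet may be too long for reliable corruption.")
```

A stricter check could also count tokens with the model's tokenizer, since a snippet with few but very long lines can still exceed the 512-token limit.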

Citation

@misc{squash-code-corruptor,
  author = {Mao Abel},
  title = {Squash Code Corruptor},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/onegaiosu/squash-code-corruptor}}
}
