Squash Code Corruptor Model

T5-based model for generating realistic Python code bugs for educational purposes.

Model Description

This model is trained to introduce realistic bugs into Python code, including:

  • Logic errors (operator swaps, off-by-one errors, wrong variables)
  • Syntax errors (missing colons, indentation issues)
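To make the two categories concrete, the sketch below shows a hand-written illustration, not actual model output. It pairs a correct function with an off-by-one logic bug and a missing-colon syntax error. The names `sum_to_n`, `sum_to_n_off_by_one`, `MISSING_COLON`, and `has_syntax_error` are hypothetical and exist only for this example.

```python
# Hand-written illustration of the bug categories; not output from the model.

def sum_to_n(n):
    """Correct version: sum of the integers 1..n inclusive."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_to_n_off_by_one(n):
    """Logic error: the range stops one short, so n itself is never added."""
    total = 0
    for i in range(1, n):
        total += i
    return total


# Syntax error: the colon after the function signature is missing.
# Kept as a string because it cannot be defined as real code.
MISSING_COLON = "def sum_to_n(n)\n    return n * (n + 1) // 2"


def has_syntax_error(source):
    """Return True if the source string fails to compile."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return True
    return False
```

A logic error still runs but returns the wrong value, while a syntax error is rejected before any code executes. A learner has to approach the two kinds of fix differently.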

Trained on 1500 examples:

  • 1000 syntax error pairs
  • 500 logic error pairs (7 different categories)

Usage

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("onegaiosu/squash-code-corruptor")
tokenizer = AutoTokenizer.from_pretrained("onegaiosu/squash-code-corruptor")

# Corrupt a short snippet
code = "def add(a, b):\n    return a + b"
inputs = tokenizer(code, return_tensors="pt", max_length=512, truncation=True)
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_length=512, do_sample=True, temperature=0.8)
corrupted = tokenizer.decode(outputs[0], skip_special_tokens=True)
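Once a corrupted snippet is available, it can help to see exactly which lines changed. The following is a minimal sketch using only the standard library. The helper name `changed_lines` is hypothetical, and the `corrupted` string here is a hand-written stand-in rather than real model output.

```python
import difflib


def changed_lines(original, corrupted):
    """Return the removed ('-') and added ('+') lines of a unified diff."""
    diff = difflib.unified_diff(
        original.splitlines(),
        corrupted.splitlines(),
        fromfile="original",
        tofile="corrupted",
        lineterm="",
    )
    # Skip the '---' / '+++' file headers and keep only real changes.
    return [
        line
        for line in diff
        if line[:1] in "+-" and not line.startswith(("---", "+++"))
    ]


original = "def add(a, b):\n    return a + b"
corrupted = "def add(a, b):\n    return a - b"  # stand-in for model output

for line in changed_lines(original, corrupted):
    print(line)
```

An empty result means the two snippets are identical, so no bug was introduced and generation may need to be repeated.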

Training Data

Custom dataset of Python code pairs (correct → buggy) focusing on common programming mistakes for beginner and intermediate learners.

Intended Use

Educational tool for the Squash app: it helps students learn Python by fixing intentionally buggy code.

Limitations

  • Trained specifically on Python code
  • May not work well with very long or complex code snippets
  • Best for code snippets under 50 lines
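The 50-line guideline above can be checked before a snippet is sent to the model. This is a rough sketch under the assumption that every line, including blank lines, counts toward the limit. The helper name `within_recommended_size` and its `max_lines` parameter are hypothetical.

```python
def within_recommended_size(code, max_lines=50):
    """Return True if the snippet has fewer than max_lines lines.

    Assumption: blank lines are counted, since the guideline does not
    say otherwise.
    """
    return len(code.splitlines()) < max_lines


snippet = "def add(a, b):\n    return a + b"
if within_recommended_size(snippet):
    print("Snippet is within the recommended size.")
else:
    print("Snippet may be too long; consider splitting it.")
```

This checks line count only. Separately, the usage example truncates its input at 512 tokens, so a short snippet with very long lines could still be cut off.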

Citation

@misc{squash-code-corruptor,
  author = {Mao Abel},
  title = {Squash Code Corruptor},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/onegaiosu/squash-code-corruptor}}
}