Scaling up Masked Diffusion Models on Text
This is the official implementation of the paper "Scaling up Masked Diffusion Models on Text" (https://arxiv.org/abs/2410.18514).
SMDM is a family of masked diffusion models (MDMs) trained on the SlimPajama dataset. The models achieve performance competitive with autoregressive models (ARMs) while offering advantages such as bidirectional reasoning and temporal adaptation.
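For intuition, the forward (noising) process of an MDM independently replaces each token with a special mask symbol at a rate given by the diffusion time t, and the network is trained to reverse this corruption. A minimal PyTorch sketch of that idea (the function name, mask id, and shapes are illustrative, not the repository's actual API):

import torch

def forward_mask(x0, t, mask_id):
    # Forward process sketch: each token id in x0 (LongTensor) is
    # independently replaced by mask_id with probability t in (0, 1].
    noise = torch.rand(x0.shape, device=x0.device)
    return torch.where(noise < t, torch.full_like(x0, mask_id), x0)

At t = 1 every token is masked and at t near 0 the sequence is almost clean, which is what lets a single network denoise at any corruption level.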
Usage:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer (custom architectures may additionally
# require trust_remote_code=True)
model_name = "nieshen/SMDM"  # Replace with your model name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.to("cuda" if torch.cuda.is_available() else "cpu")

# Generate text from a prompt
input_text = "Once upon a time"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=100)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
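The `generate` call above assumes the repository exposes an ARM-style interface; natively, an MDM decodes by iteratively unmasking a fully masked sequence. A rough sketch of such a sampler, assuming an HF-style model whose forward pass returns vocabulary logits and a tokenizer with a mask token (the function name and unmasking schedule are illustrative):

import torch

@torch.no_grad()
def mdm_sample(model, tokenizer, length=64, steps=16):
    # Start from a fully masked sequence and unmask a few positions per
    # step, committing the model's most confident predictions first.
    mask_id = tokenizer.mask_token_id
    x = torch.full((1, length), mask_id, dtype=torch.long, device=model.device)
    for step in range(steps):
        still_masked = x == mask_id
        if not still_masked.any():
            break
        logits = model(input_ids=x).logits        # (1, length, vocab)
        conf, pred = logits.softmax(-1).max(-1)   # per-position confidence
        conf = conf.masked_fill(~still_masked, -1.0)
        k = max(1, still_masked.sum().item() // (steps - step))
        idx = conf.topk(k, dim=-1).indices
        x.scatter_(1, idx, pred.gather(1, idx))
    return tokenizer.decode(x[0], skip_special_tokens=True)

Committing the most confident tokens first is one common heuristic; the ability to fill positions in any order is what gives MDMs their bidirectional character.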
The model was trained on the SlimPajama dataset; see the paper for the full training recipe and scaling details.
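As a rough illustration of the masked-diffusion training objective (a sketch of the standard MDM loss, not the repository's exact training code): sample a mask rate t per sequence, corrupt the tokens, and reweight the cross-entropy on masked positions by 1/t, which yields an upper bound on the negative log-likelihood:

import torch
import torch.nn.functional as F

def mdm_loss(model, x0, mask_id):
    # x0: (batch, length) LongTensor of clean token ids.
    b, l = x0.shape
    t = torch.rand(b, 1, device=x0.device).clamp_min(1e-3)  # mask rate per sequence
    masked = torch.rand(b, l, device=x0.device) < t
    x_t = torch.where(masked, torch.full_like(x0, mask_id), x0)
    logits = model(input_ids=x_t).logits                    # (b, l, vocab)
    ce = F.cross_entropy(logits.transpose(1, 2), x0, reduction="none")
    # Score only masked positions; the 1/t weight makes this an NLL bound,
    # here averaged per token.
    return (ce * masked / t).sum() / (b * l)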
The model has been evaluated on standard language-modeling and downstream benchmarks; detailed results are reported in the paper.
If you use this model, please cite our paper:
@article{smdm2024,
  title={Scaling up Masked Diffusion Models on Text},
  author={Nie, Shen and Zhu, Fengqi and Du, Chao and Pang, Tianyu and Liu, Qian and Wang, Peng and Lin, Min and Li, Chongxuan},
  journal={arXiv preprint arXiv:2410.18514},
  year={2024}
}
This model is released under the Apache 2.0 license.