Improve model card and add paper metadata
Hi! I'm Niels from the Hugging Face team.
This PR improves the model card for Internal Guidance (IG) by:
- Linking the repository to its [original paper](https://huggingface.co/papers/2512.24176) and project page.
- Adding relevant metadata tags for better discoverability.
- Summarizing key results (FID scores) on ImageNet 256x256.
- Adding the BibTeX citation for researchers to attribute the work.
Please review and merge if this looks good!
README.md
CHANGED
@@ -1,13 +1,37 @@
 ---
 license: apache-2.0
 pipeline_tag: unconditional-image-generation
+tags:
+- image-generation
+- diffusion-transformer
 ---
+
 # IG: Guiding a Diffusion Transformer with the Internal Dynamics of Itself
 
-This repository contains the official PyTorch checkpoints for Internal Guidance.
+This repository contains the official PyTorch checkpoints for **Internal Guidance (IG)**, as presented in the paper [Guiding a Diffusion Transformer with the Internal Dynamics of Itself](https://huggingface.co/papers/2512.24176).
+
+Internal Guidance (IG) is a simple yet effective strategy that introduces auxiliary supervision on the intermediate layers during the training process. During sampling, it extrapolates the outputs of intermediate and deep layers to achieve superior generative results. This approach yields significant improvements in both training efficiency and image quality across various baselines like SiT and LightningDiT.
+
+- **Paper:** [https://huggingface.co/papers/2512.24176](https://huggingface.co/papers/2512.24176)
+- **Project Page:** [https://zhouxingyu13.github.io/Internal-Guidance/](https://zhouxingyu13.github.io/Internal-Guidance/)
+- **Code:** [https://github.com/CVL-UESTC/Internal-Guidance](https://github.com/CVL-UESTC/Internal-Guidance)
+
+## Results
+
+On ImageNet 256x256, IG-guided models achieve state-of-the-art performance:
+- **SiT-XL/2 + IG**: FID = 1.75 at 800 epochs.
+- **LightningDiT-XL/1 + IG**: FID = 1.34 (random sampling).
+- **LightningDiT-XL/1 + IG + CFG**: FID = 1.19 (random sampling) and **1.07** (uniform balanced sampling).
 
-
+## Citation
 
-
+If you find this work helpful or inspiring, please feel free to cite it:
 
-
+```bibtex
+@article{zhou2025guiding,
+title={Guiding a Diffusion Transformer with the Internal Dynamics of Itself},
+author={Zhou, Xingyu and Li, Qifan and Hu, Xiaobin and Chen, Hai and Gu, Shuhang},
+journal={arXiv preprint arXiv:2512.24176},
+year={2025}
+}
+```
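As a rough illustration of the sampling-time extrapolation that the model card text in this PR describes, the sketch below combines an intermediate-layer output with a deep-layer output. The function name `extrapolate`, the weight `w`, and the linear combination rule are assumptions made for illustration only. They are not taken from the paper or from the Internal-Guidance repository, which remain the reference for the actual formulation.

```python
from typing import List, Sequence


def extrapolate(
    intermediate: Sequence[float],
    deep: Sequence[float],
    w: float,
) -> List[float]:
    """Hypothetical linear extrapolation between two layer outputs.

    Starts at the intermediate-layer prediction and moves toward (and,
    for w > 1, past) the deep-layer prediction. With w == 1.0 the result
    equals the deep-layer output. This rule is an assumption, not the
    confirmed IG update.
    """
    if len(intermediate) != len(deep):
        raise ValueError("intermediate and deep outputs must have the same length")
    # Element-wise: intermediate + w * (deep - intermediate)
    return [i + w * (d - i) for i, d in zip(intermediate, deep)]


# Toy values standing in for flattened model outputs.
guided = extrapolate([0.0, 1.0], [1.0, 3.0], w=1.5)
```

Under this assumed rule, `w` plays a role similar to a guidance scale: values above 1 push the result further along the direction from the intermediate output to the deep output.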