SVG: Latent Diffusion Model without Variational Autoencoder
Model Description
SVG is a latent diffusion model framework that replaces the traditional VAE latent space with semantically structured features from self-supervised vision models (e.g., DINOv3). This design improves generative capability and downstream transferability while maintaining efficiency comparable to standard VAE-based latent diffusion models.
Key features:
- Replaces the low-dimensional VAE latent space with a high-dimensional semantic feature space.
- Includes a lightweight residual encoder for refining fine-grained details.
- Enables strong generation and perception performance.
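To give a rough sense of what "high-dimensional" means in the list above, the following sketch compares latent shapes using plain arithmetic. It does not use the SVG codebase. All numbers are illustrative assumptions: a 256x256 input, a VAE with stride 8 and 4 channels, a ViT-style semantic encoder with patch size 16 and 384-dimensional tokens, and a hypothetical 8-channel residual branch joined by channel-wise concatenation. The repository is the reference for the actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatentShape:
    """Spatial grid of feature vectors, stored as (height, width, channels)."""
    height: int
    width: int
    channels: int

    @property
    def numel(self) -> int:
        # Total number of scalar values in the latent.
        return self.height * self.width * self.channels


def grid_latent(image_size: int, stride: int, channels: int) -> LatentShape:
    """Shape of a latent produced by an encoder with the given spatial stride."""
    if image_size % stride != 0:
        raise ValueError("image_size must be divisible by stride")
    side = image_size // stride
    return LatentShape(side, side, channels)


def concat_channels(a: LatentShape, b: LatentShape) -> LatentShape:
    """Channel-wise concatenation of two latents on the same spatial grid."""
    if (a.height, a.width) != (b.height, b.width):
        raise ValueError("spatial grids must match")
    return LatentShape(a.height, a.width, a.channels + b.channels)


# Assumed, illustrative settings (not taken from the SVG repository).
IMAGE_SIZE = 256
vae = grid_latent(IMAGE_SIZE, stride=8, channels=4)          # typical VAE latent
semantic = grid_latent(IMAGE_SIZE, stride=16, channels=384)  # ViT-style features
residual = grid_latent(IMAGE_SIZE, stride=16, channels=8)    # hypothetical residual branch
svg_like = concat_channels(semantic, residual)

for name, shape in [("vae", vae), ("semantic", semantic), ("svg_like", svg_like)]:
    print(f"{name:9s} {shape.height}x{shape.width}x{shape.channels} "
          f"-> {shape.numel} values")
```

Under these assumptions the semantic latent has fewer spatial positions than the VAE latent but many more channels per position, which is the trade-off the feature list describes.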
How to Use
For code and instructions, see the GitHub repository:
https://github.com/shiml20/SVG
Official project page:
https://howlin-wang.github.io/svg/
arXiv paper: