Designing Scalable Vision Models in the Vision-language Era. The best performing model is 'jienengchen/ViTamin-XL-384px'.
Jieneng Chen
jienengchen
AI & ML interests
multi-modal LLMs
Recent Activity
upvoted
a
paper
about 1 month ago
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition
liked
a model
3 months ago
moonshotai/Kimi-K2-Thinking
liked
a Space
3 months ago
CSU-JPG/VCode