LTX-2: Efficient Joint Audio-Visual Foundation Model Paper • 2601.03233 • Published Jan 6 • 151
baidu/ERNIE-4.5-VL-28B-A3B-Thinking Image-Text-to-Text • 30B • Updated about 1 hour ago • 1.05k • 518
ERNIE-4.5-VL-28B-A3B-Thinking Demo • Running • Featured • 58 • Compact model, powerful multimodal reasoning.
Instruct-Imagen: Image Generation with Multi-modal Instruction Paper • 2401.01952 • Published Jan 3, 2024 • 32