UniAudio 2.0: A Unified Audio Language Model with Text-Aligned Factorized Audio Tokenization Paper • 2602.04683 • Published 8 days ago • 2
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published 7 days ago • 289
AgentCPM-Report: Interleaving Drafting and Deepening for Open-Ended Deep Research Paper • 2602.06540 • Published 6 days ago • 20
MiniCPM-o & MiniCPM-V Collection Multimodal models with leading performance. • 31 items • Updated 3 days ago • 64
OpenBEATs Collection Checkpoints for the WASPAA 2025 paper "OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder" • 93 items • Updated 17 days ago • 5
Nemotron Speech Collection Open, state-of-the-art, production-ready enterprise speech models from the NVIDIA Speech research team for ASR, TTS, Speaker Diarization and S2S • 9 items • Updated 7 days ago • 37
NVIDIA Nemotron v3 Collection Open, Production-ready Enterprise Models • 7 items • Updated 7 days ago • 134
AR-Omni: A Unified Autoregressive Model for Any-to-Any Generation Paper • 2601.17761 • Published 18 days ago • 14
DIFFA-2: A Practical Diffusion Large Language Model for General Audio Understanding Paper • 2601.23161 • Published 13 days ago • 10
Unifying Speech Recognition, Synthesis and Conversion with Autoregressive Transformers Paper • 2601.10770 • Published 28 days ago • 3
FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning Paper • 2601.11141 • Published 27 days ago • 23
Audio MultiChallenge: A Multi-Turn Evaluation of Spoken Dialogue Systems on Natural Human Interaction Paper • 2512.14865 • Published Dec 16, 2025 • 1
Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model Paper • 2506.13642 • Published Jun 16, 2025 • 27