ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet Paper • 2111.14706 • Published Nov 29, 2021
On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models Paper • 2406.09282 • Published Jun 13, 2024
OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models Paper • 2502.10373 • Published Feb 14, 2025
Granary: Speech Recognition and Translation Dataset in 25 European Languages Paper • 2505.13404 • Published May 19, 2025
OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning Paper • 2506.00338 • Published May 31, 2025
ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration Paper • 2409.09506 • Published Sep 14, 2024
Towards Robust Speech Representation Learning for Thousands of Languages Paper • 2407.00837 • Published Jun 30, 2024
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification Paper • 2402.12654 • Published Feb 20, 2024
E-Branchformer: Branchformer with Enhanced merging for speech recognition Paper • 2210.00077 • Published Sep 30, 2022
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models Paper • 2305.17651 • Published May 28, 2023
Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning Paper • 2309.15317 • Published Sep 26, 2023
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data Paper • 2309.13876 • Published Sep 25, 2023
I3D: Transformer architectures with input-dependent dynamic depth for speech recognition Paper • 2303.07624 • Published Mar 14, 2023
VoxtLM: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks Paper • 2309.07937 • Published Sep 14, 2023
Improving Massively Multilingual ASR With Auxiliary CTC Objectives Paper • 2302.12829 • Published Feb 24, 2023
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding Paper • 2207.02971 • Published Jul 6, 2022
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer Paper • 2401.16658 • Published Jan 30, 2024