Yifan Peng

pyf98

https://pyf98.github.io

AI & ML interests

Multimodal LLMs, Speech-to-Speech, Speech Recognition

authored 5 papers 7 months ago

ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet

Paper • 2111.14706 • Published Nov 29, 2021

On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models

Paper • 2406.09282 • Published Jun 13, 2024

OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models

Paper • 2502.10373 • Published Feb 14, 2025 • 1

Granary: Speech Recognition and Translation Dataset in 25 European Languages

Paper • 2505.13404 • Published May 19, 2025 • 2

OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning

Paper • 2506.00338 • Published May 31, 2025 • 10

authored 2 papers over 1 year ago

ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration

Paper • 2409.09506 • Published Sep 14, 2024 • 4

Towards Robust Speech Representation Learning for Thousands of Languages

Paper • 2407.00837 • Published Jun 30, 2024 • 11

authored 10 papers almost 2 years ago

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification

Paper • 2402.12654 • Published Feb 20, 2024 • 1

E-Branchformer: Branchformer with Enhanced merging for speech recognition

Paper • 2210.00077 • Published Sep 30, 2022 • 2

DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models

Paper • 2305.17651 • Published May 28, 2023 • 1

Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning

Paper • 2309.15317 • Published Sep 26, 2023

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Paper • 2309.13876 • Published Sep 25, 2023 • 1

I3D: Transformer architectures with input-dependent dynamic depth for speech recognition

Paper • 2303.07624 • Published Mar 14, 2023 • 1

VoxtLM: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks

Paper • 2309.07937 • Published Sep 14, 2023

Improving Massively Multilingual ASR With Auxiliary CTC Objectives

Paper • 2302.12829 • Published Feb 24, 2023

Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding

Paper • 2207.02971 • Published Jul 6, 2022

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

Paper • 2401.16658 • Published Jan 30, 2024 • 14