---
license: openrail
datasets:
- ARTPARK-IISc/Vaani
- ai4bharat/Kathbath
- ai4bharat/Shrutilipi
language:
- hi
- en
metrics:
- accuracy
base_model:
- openai/whisper-medium
pipeline_tag: automatic-speech-recognition
tags:
- Hinglish
- Codeswitching
- whisper
- Speech-to-text
- Indic
- STT
---
# Shunya Labs Hinglish ASR Model

We set out to build an ASR model that captures how conversational Hindi is actually spoken. On average, 2 out of every 10 words in conversational Hindi are English. Traditional ASR models are trained to handle one language at a time, which makes them slow and inaccurate when transcribing multilingual speech.

And so we innovated. We trained Zero STT Codeswitch to natively process Hinglish speech and generate mixed-script tokens. 

This is a model worthy of how India actually speaks, because it can capture the way people naturally switch between Hindi and English mid-conversation. 

And now, we're making the lighter version of Zero STT Codeswitch open source for the community!

For a faster version of Zero STT Codeswitch, visit shunyalabs.ai. 

## Model Details

Base Model: OpenAI Whisper Medium

Post-trained by: Shunya Labs

Language: Hinglish (Hindi-English code-switching)

## Why This Model?

Standard ASR models treat Hindi and English as separate languages, forcing transcription into one or the other. This creates errors when speakers switch between languages mid-sentence, which is how millions of people actually talk.

This model was trained specifically on code-switched speech, so it:

- Transcribes Hindi and English tokens as they naturally occur
- Handles mid-sentence language switches accurately
- Runs inference faster by avoiding language detection overhead
- Delivers higher accuracy on real-world Hinglish speech

### Demo


- Try the model at: https://www.shunyalabs.ai/zero-code-switch

## Transcription Comparison
   
   | Audio | Zero STT Codeswitch | Whisper Medium |
   |-------|-------------------|----------------|
   | <audio controls src="https://huggingface.co/shunyalabs/zero-stt-hinglish/resolve/main/FLEURS_sample.wav"></audio> | Rome में अलग अलग जगों पर कई बढ़े television screens लगाए गये ग ताकि लोग समारो देख सकें | रोम में अलग अलग जगहों पर कई बड़े टेलिवीजन स्क्रीन लगाए गए ताकि लोग स्मारो देख सकें |
   | <audio controls src="https://huggingface.co/shunyalabs/zero-stt-hinglish/resolve/main/Vaani_random_sample_06.wav"></audio> | और बागल में एक building है लाल कलर का पिंट किया हुआ | और बगल में एक बिल्डिंग है लाल कलर का पेंट किया हुआ है |
   | <audio controls src="https://huggingface.co/shunyalabs/zero-stt-hinglish/resolve/main/Vaani_random_sample_10.wav"></audio> | yoga med पर yoga कर रहे हैं | योगा मैट पर योगा कर रहे हैं |
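
As a rough illustration of the script mixing visible in the table above, the sketch below counts the share of Latin-script words in a transcript. The helper names `word_script` and `latin_share` are hypothetical and not part of the model or any library. Classifying words by Unicode range is a simplifying assumption, and the resulting number describes script usage only, not transcription accuracy.

```python
def word_script(word: str) -> str:
    """Classify a word by the script of its characters (simplified heuristic)."""
    # Devanagari occupies the Unicode block U+0900 to U+097F.
    if any("\u0900" <= ch <= "\u097f" for ch in word):
        return "devanagari"
    # Treat any ASCII letter as Latin script.
    if any(ch.isascii() and ch.isalpha() for ch in word):
        return "latin"
    return "other"


def latin_share(transcript: str) -> float:
    """Return the fraction of whitespace-separated words written in Latin script."""
    words = transcript.split()
    if not words:
        return 0.0
    latin_words = sum(1 for word in words if word_script(word) == "latin")
    return latin_words / len(words)


# Third row of the table above: both transcripts of the same audio sample.
codeswitch_output = "yoga med पर yoga कर रहे हैं"
whisper_output = "योगा मैट पर योगा कर रहे हैं"

print(f"Zero STT Codeswitch: {latin_share(codeswitch_output):.2f}")
print(f"Whisper Medium:      {latin_share(whisper_output):.2f}")
```

On this sample, three of the seven words in the Zero STT Codeswitch output are in Latin script, while the Whisper Medium output is entirely in Devanagari.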


### Use Cases

- Transcription of Hinglish conversations, podcasts, and videos
- Customer support and conversational agents serving Indian users 
- Meeting transcription for Indian workplaces
- Content creation and subtitling
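
For the subtitling use case, a minimal sketch of turning timestamped transcript chunks into SRT text might look like the following. It assumes chunks shaped like the `chunks` list that the `transformers` speech-recognition pipeline returns when called with `return_timestamps=True`: dictionaries with a `timestamp` pair and a `text` string. The function names, the fallback duration, and the sample timestamps are all made up for illustration.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def chunks_to_srt(chunks, fallback_duration: float = 2.0) -> str:
    """Convert timestamped chunks into SRT-formatted subtitle text."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        # The final chunk can come back without an end time; assume a
        # short fixed duration in that case (an arbitrary choice).
        if end is None:
            end = start + fallback_duration
        text = chunk["text"].strip()
        entries.append(
            f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n"
        )
    return "\n".join(entries)


# Hand-written sample chunks; the timestamps are invented for this example.
sample_chunks = [
    {"timestamp": (0.0, 2.4), "text": " और बगल में एक building है"},
    {"timestamp": (2.4, None), "text": " लाल कलर का"},
]

print(chunks_to_srt(sample_chunks))
```

With a loaded pipeline, passing `return_timestamps=True` when transcribing should yield a `chunks` list in this shape, which can be handed to `chunks_to_srt` directly.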


## How to Get Started with the Model

Use the code below to get started with the model.

```python
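from transformers import pipeline

# Load the model as a speech-recognition pipeline. The repo id below matches
# the one used in the sample-audio URLs above.
transcriber = pipeline("automatic-speech-recognition", model="shunyalabs/zero-stt-hinglish")

# Transcribe a local audio file (decoding compressed formats requires ffmpeg).
result = transcriber("audio.mp3")
print(result["text"])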
```

## Training Details

[openai/whisper-medium](https://huggingface.co/openai/whisper-medium) was post-trained on Google Vaani as well as proprietary datasets.

For a faster version of Zero STT Codeswitch, visit shunyalabs.ai.