---
license: openrail
datasets:
- ARTPARK-IISc/Vaani
- ai4bharat/Kathbath
- ai4bharat/Shrutilipi
language:
- hi
- en
metrics:
- accuracy
base_model:
- openai/whisper-medium
pipeline_tag: automatic-speech-recognition
tags:
- Hinglish
- Codeswitching
- whisper
- Speech-to-text
- Indic
- STT
---
# Shunya Labs Hinglish ASR Model
We wanted to build an ASR model that captures how conversational Hindi is actually spoken. On average, 2 out of every 10 words spoken in conversational Hindi are English. Traditional ASR models are trained to handle one language at a time, which makes them slow and inaccurate when transcribing multilingual speech.
And so we innovated. We trained Zero STT Codeswitch to natively process Hinglish speech and generate mixed-script tokens.
This is a model worthy of how India actually speaks, because it can capture the way people naturally switch between Hindi and English mid-conversation.
And now, we're making the lighter version of Zero STT Codeswitch open source for the community!
For a faster version of Zero STT Codeswitch, visit shunyalabs.ai.
## Model Details
- Base Model: OpenAI Whisper Medium
- Post-trained by: Shunya Labs
- Language: Hinglish (Hindi-English code-switching)
## Why This Model?
Standard ASR models treat Hindi and English as separate languages, forcing transcription into one or the other. This creates errors when speakers naturally switch between languages mid-sentence—which is how millions of people actually talk.
This model was trained specifically on code-switched speech, so it:
- Transcribes Hindi and English tokens as they naturally occur
- Handles mid-sentence language switches accurately
- Produces faster inference by avoiding language detection overhead
- Delivers higher accuracy on real-world Hinglish speech
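ASR accuracy is conventionally reported as word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below is a minimal, dependency-free WER function for spot-checking transcripts; `word_error_rate` is a hypothetical helper, not part of this model or of `transformers`. It splits on whitespace and applies no normalization, so the same English word written in Latin script in one transcript and in Devanagari in the other counts as a substitution.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words.

    Hypothetical helper for illustration; no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, with the hypothetical reference `मैं office जा रहा हूँ` and hypothesis `मैं ऑफिस जा रहा हूँ`, the function returns 0.2: one substitution across five reference words, caused purely by the script difference. Toolkits such as `jiwer` compute the same metric with optional normalization steps.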
### Demo
- Try the model at: https://www.shunyalabs.ai/zero-code-switch
## Transcription Comparison
| Audio | Zero STT Codeswitch | Whisper Medium |
|-------|-------------------|----------------|
| <audio controls src="https://huggingface.co/shunyalabs/zero-stt-hinglish/resolve/main/FLEURS_sample.wav"></audio> | Rome में अलग अलग जगों पर कई बढ़े television screens लगाए गये ग ताकि लोग समारो देख सकें | रोम में अलग अलग जगहों पर कई बड़े टेलिवीजन स्क्रीन लगाए गए ताकि लोग स्मारो देख सकें |
| <audio controls src="https://huggingface.co/shunyalabs/zero-stt-hinglish/resolve/main/Vaani_random_sample_06.wav"></audio> | और बागल में एक building है लाल कलर का पिंट किया हुआ | और बगल में एक बिल्डिंग है लाल कलर का पेंट किया हुआ है |
| <audio controls src="https://huggingface.co/shunyalabs/zero-stt-hinglish/resolve/main/Vaani_random_sample_10.wav"></audio> | yoga med पर yoga कर रहे हैं | योगा मैट पर योगा कर रहे हैं |
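Zero STT Codeswitch writes English words in Latin script and Hindi words in Devanagari, as the left column above shows. The sketch below measures the share of Latin-script words in a transcript, assuming script is a usable proxy for language. The helpers `script_of` and `latin_share` are hypothetical and not part of the model. The proxy undercounts English: words such as `कलर` ("color") in the second row are rendered in Devanagari in both columns and are counted as Hindi.

```python
import unicodedata


def script_of(word: str) -> str:
    """Classify a word as 'latin', 'devanagari', or 'other' by its first letter.

    Assumption: script stands in for language, which misses English
    words transliterated into Devanagari.
    """
    for ch in word:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name.startswith("LATIN"):
                return "latin"
            if name.startswith("DEVANAGARI"):
                return "devanagari"
            return "other"
    return "other"  # no letters at all, e.g. digits or punctuation


def latin_share(transcript: str) -> float:
    """Fraction of whitespace-separated words whose first letter is Latin."""
    words = transcript.split()
    if not words:
        return 0.0
    latin = sum(1 for w in words if script_of(w) == "latin")
    return latin / len(words)
```

Applied to the third row, `latin_share("yoga med पर yoga कर रहे हैं")` returns 3/7 (about 0.43), while the Whisper Medium output for the same clip scores 0.0 because it transliterates every English word into Devanagari.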
## Use Cases
- Transcription of Hinglish conversations, podcasts, and videos
- Customer support and conversational agents serving Indian users
- Meeting transcription for Indian workplaces
- Content creation and subtitling
## How to Get Started with the Model
Use the code below to get started with the model.
```python
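from transformers import pipeline

# Load the model from the Hugging Face Hub (same repo that hosts the audio samples above)
transcriber = pipeline("automatic-speech-recognition", model="shunyalabs/zero-stt-hinglish")

# Passing a file path requires ffmpeg to be installed for audio decoding
result = transcriber("audio.mp3")
print(result["text"])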
```
## Training Details
This model is [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) post-trained on Google Vaani as well as proprietary datasets.