---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-large-v3-turbo
tags:
- automatic-speech-recognition
- whisper
- urdu
- mozilla-foundation/common_voice_17_0
- hf-asr-leaderboard
datasets:
- mozilla-foundation/common_voice_17_0
metrics:
- wer
- cer
- bleu
- chrf
model-index:
- name: whisper-large-v3-turbo-urdu
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 17.0 (Urdu)
      type: mozilla-foundation/common_voice_17_0
      config: ur
      split: test
      args: ur
    metrics:
    - type: wer
      value: 26.234
      name: WER
    - type: cer
      value: 8.795
      name: CER
    - type: bleu
      value: 58.032
      name: BLEU
    - type: chrf
      value: 81.636
      name: ChrF
language:
- ur
pipeline_tag: automatic-speech-recognition
---


# Whisper Large V3 Turbo Urdu ASR Model 🥇

This model is a fine-tuned version of [openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo) on the Urdu (`ur`) subset of the Common Voice 17.0 dataset (`mozilla-foundation/common_voice_17_0`).

It achieves the following results on the evaluation set at the final checkpoint (step 1500):
- Loss: 0.3534
- WER: 25.7842


## Quick Usage

```python
from transformers import pipeline

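# Load the fine-tuned checkpoint from the Hugging Face Hub.
transcriber = pipeline(
    "automatic-speech-recognition",
    model="kingabzpro/whisper-large-v3-turbo-urdu",
)

# Clear any forced decoder prompt and set the transcription language to Urdu.
transcriber.model.generation_config.forced_decoder_ids = None
transcriber.model.generation_config.language = "ur"

# "audio2.mp3" is a local audio file; replace it with your own path.
transcription = transcriber("audio2.mp3")
print(transcription)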
```

```text
{'text': 'دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی ہے'}
```
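
As an alternative to editing `generation_config` in place, the language and task can usually be passed per call through the pipeline's `generate_kwargs` argument. The following is a minimal sketch, not the author's method: the helper name `transcribe` and the constant `GENERATE_KWARGS` are hypothetical, and it assumes a recent `transformers` release in which Whisper generation accepts `language` and `task`.

```python
# Hypothetical per-call generation settings (names are illustrative only).
GENERATE_KWARGS = {"language": "ur", "task": "transcribe"}


def transcribe(audio_path, model_id="kingabzpro/whisper-large-v3-turbo-urdu"):
    """Transcribe one audio file, passing language/task at call time."""
    # Imported lazily so this snippet can be loaded without transformers installed.
    from transformers import pipeline

    transcriber = pipeline("automatic-speech-recognition", model=model_id)
    result = transcriber(audio_path, generate_kwargs=GENERATE_KWARGS)
    return result["text"]


# Example call (downloads the model on first use):
# text = transcribe("audio2.mp3")
```

Passing the settings per call leaves the model's stored generation config untouched, which can be convenient when one pipeline object is shared across scripts.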


### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1500
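
The hyperparameters above can be sketched as keyword arguments for `Seq2SeqTrainingArguments`, together with the learning rate that a cosine schedule with linear warmup would produce at a given step. This is an illustration under assumptions rather than the actual training script: the output directory is hypothetical, the batch sizes are assumed to be per device on a single device, and `lr_at` is a hypothetical helper that mirrors the usual half-cosine decay to zero.

```python
import math

# Values copied from the hyperparameter list; key names follow
# transformers' Seq2SeqTrainingArguments (an assumption about the setup).
training_kwargs = {
    "output_dir": "whisper-large-v3-turbo-urdu",  # hypothetical path
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,  # assumes a single device
    "per_device_eval_batch_size": 4,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 100,
    "max_steps": 1500,
}
# With transformers installed: Seq2SeqTrainingArguments(**training_kwargs)


def lr_at(step, base_lr=2e-5, warmup=100, total=1500):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the scheduled learning rate at a few steps.
for step in (0, 100, 300, 800, 1500):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the learning rate peaks at 2e-05 at step 100, is half of that at step 800 (the midpoint of the decay phase), and reaches zero at step 1500.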

### Training results

| Training Loss | Epoch  | Step | Validation Loss | WER     |
|:-------------:|:------:|:----:|:---------------:|:-------:|
| 0.6764        | 0.2545 | 300  | 0.6244          | 44.9776 |
| 0.5881        | 0.5089 | 600  | 0.5089          | 37.6214 |
| 0.4662        | 0.7634 | 900  | 0.4349          | 32.1322 |
| 0.3661        | 1.0178 | 1200 | 0.3634          | 26.5683 |
| 0.2293        | 1.2723 | 1500 | 0.3534          | 25.7842 |



### Framework versions

- Transformers 4.53.1
- Pytorch 2.8.0.dev20250319+cu128
- Datasets 3.6.0
- Tokenizers 0.21.2

---

## Evaluation

Results on the Urdu test split of Common Voice 17.0:

| Metric   | Value   | Description                                 |
|----------|---------|---------------------------------------------|
| **WER**  | 26.234% | Word Error Rate (lower is better)           |
| **CER**  | 8.795%  | Character Error Rate (lower is better)      |
| **BLEU** | 58.032  | BLEU score (higher is better)               |
| **ChrF** | 81.636  | Character n-gram F-score (higher is better) |

>👉 Review the testing script: [Testing Whisper Large V3 Turbo Urdu](https://www.kaggle.com/code/kingabzpro/testing-whisper-large-v3-turbo-urdu?scriptVersionId=249057976)


### Summary

The Word Error Rate (WER) of 26.23% means that word-level errors (substitutions, deletions, and insertions) amount to roughly one for every four reference words. This is a functional level of accuracy, with room for improvement.
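
To make the metric concrete, here is a minimal sketch of how WER and CER are commonly computed from an edit distance. It is illustrative only and is not the linked testing script: the function names and the English sentence pair are hypothetical, and real evaluations typically normalize the text (punctuation, diacritics, spacing) before scoring.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word errors divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character errors divided by the number of reference characters."""
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one substituted word and one deleted word.
reference = "the water flows and the fish swims"
hypothesis = "the water flow and fish swims"
print(f"WER: {wer(reference, hypothesis):.2%}")  # 2 errors / 7 words
print(f"CER: {cer(reference, hypothesis):.2%}")
```

In this example a single missing letter counts as a full word error but only one character error, which is why CER is usually much lower than WER for the same transcript.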

The model is stronger at the character level, with a Character Error Rate (CER) of 8.80% and a ChrF score of 81.64. The gap between CER and WER suggests that many word-level errors differ from the reference by only a few characters. The BLEU score of 58.03 is likewise consistent with transcriptions that stay close to the reference text.

In summary, the model produces largely accurate and intelligible Urdu transcriptions on this test set.