mirror of
https://github.com/modelscope/FunASR
synced 2025-09-15 14:48:36 +08:00
Pretrained Models on ModelScope
Model License
- Apache License 2.0
Model Zoo
Here we provide several models pretrained on different datasets. Details of the models and datasets can be found on ModelScope.
Speech Recognition Models
Paraformer Models
| Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
|---|---|---|---|---|---|---|
| Paraformer-large | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Input wav duration <= 20 s |
| Paraformer-large-long | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Handles input wav of arbitrary length |
| Paraformer-large-contextual | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Supports hotword customization based on incentive enhancement, improving hotword recall and precision |
| Paraformer | CN & EN | Alibaba Speech Data (50000 hours) | 8358 | 68M | Offline | Input wav duration <= 20 s |
| Paraformer-online | CN & EN | Alibaba Speech Data (50000 hours) | 8404 | 68M | Online | Handles streaming input |
| Paraformer-large-online | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Online | Handles streaming input |
| Paraformer-tiny | CN | Alibaba Speech Data (200 hours) | 544 | 5.2M | Offline | Lightweight Paraformer model for Mandarin command word recognition |
| Paraformer-aishell | CN | AISHELL (178 hours) | 4234 | 43M | Offline | |
| ParaformerBert-aishell | CN | AISHELL (178 hours) | 4234 | 43M | Offline | |
| Paraformer-aishell2 | CN | AISHELL-2 (1000 hours) | 5212 | 64M | Offline | |
| ParaformerBert-aishell2 | CN | AISHELL-2 (1000 hours) | 5212 | 64M | Offline | |
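The 20-second limit in the Notes column can be checked before choosing a model. The following is a minimal sketch that uses only the Python standard library and assumes the input is an uncompressed PCM WAV file. The function names are hypothetical, and the returned strings are the row labels from the table above, not ModelScope model IDs.

```python
import wave

# Duration limit taken from the Notes column of the table above.
MAX_SHORT_SECONDS = 20.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def pick_offline_paraformer(path):
    """Pick a table row by input length (hypothetical helper).

    Returns a row label from the table, not a ModelScope model ID.
    """
    if wav_duration_seconds(path) <= MAX_SHORT_SECONDS:
        return "Paraformer-large"
    return "Paraformer-large-long"
```

For example, `pick_offline_paraformer("meeting.wav")` returns `"Paraformer-large-long"` when the file (a hypothetical name) is longer than 20 seconds.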
UniASR Models
| Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
|---|---|---|---|---|---|---|
| UniASR | CN & EN | Alibaba Speech Data (60000 hours) | 8358 | 100M | Online | UniASR streaming offline unifying models |
| UniASR-large | CN & EN | Alibaba Speech Data (60000 hours) | 8358 | 220M | Offline | UniASR streaming offline unifying models |
| UniASR English | EN | Alibaba Speech Data (10000 hours) | 1080 | 95M | Online | UniASR streaming online unifying models |
| UniASR Russian | RU | Alibaba Speech Data (5000 hours) | 1664 | 95M | Online | UniASR streaming online unifying models |
| UniASR Japanese | JA | Alibaba Speech Data (5000 hours) | 5977 | 95M | Online | UniASR streaming offline unifying models |
| UniASR Korean | KO | Alibaba Speech Data (2000 hours) | 6400 | 95M | Online | UniASR streaming online unifying models |
| UniASR Cantonese (CHS) | Cantonese (CHS) | Alibaba Speech Data (5000 hours) | 1468 | 95M | Online | UniASR streaming online unifying models |
| UniASR Indonesian | ID | Alibaba Speech Data (1000 hours) | 1067 | 95M | Online | UniASR streaming offline unifying models |
| UniASR Vietnamese | VI | Alibaba Speech Data (1000 hours) | 1001 | 95M | Online | UniASR streaming offline unifying models |
| UniASR Spanish | ES | Alibaba Speech Data (1000 hours) | 3445 | 95M | Online | UniASR streaming online unifying models |
| UniASR Portuguese | PT | Alibaba Speech Data (1000 hours) | 1617 | 95M | Online | UniASR streaming offline unifying models |
| UniASR French | FR | Alibaba Speech Data (1000 hours) | 3472 | 95M | Online | UniASR streaming online unifying models |
| UniASR German | DE | Alibaba Speech Data (1000 hours) | 3690 | 95M | Online | UniASR streaming online unifying models |
| UniASR Persian | FA | Alibaba Speech Data (1000 hours) | 1257 | 95M | Online | UniASR streaming offline unifying models |
| UniASR Burmese | MY | Alibaba Speech Data (1000 hours) | 696 | 95M | Online | UniASR streaming offline unifying models |
| UniASR Hebrew | HE | Alibaba Speech Data (1000 hours) | 1085 | 95M | Online | UniASR streaming offline unifying models |
| UniASR Urdu | UR | Alibaba Speech Data (1000 hours) | 877 | 95M | Online | UniASR streaming offline unifying models |
| UniASR Turkish | TR | Alibaba Speech Data (1000 hours) | 1582 | 95M | Online | UniASR streaming offline unifying models |
Conformer Models
| Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
|---|---|---|---|---|---|---|
| Conformer | CN | AISHELL (178 hours) | 4234 | 44M | Offline | Input wav duration <= 20 s |
| Conformer | CN | AISHELL-2 (1000 hours) | 5212 | 44M | Offline | Input wav duration <= 20 s |
| Conformer | EN | Alibaba Speech Data (10000 hours) | 4199 | 220M | Offline | Input wav duration <= 20 s |
RNN-T Models
Multi-talker Speech Recognition Models
MFCCA Models
| Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
|---|---|---|---|---|---|---|
| MFCCA | CN | AliMeeting, AISHELL-4, Simudata (917 hours) | 4950 | 45M | Offline | Input wav duration <= 20 s, input wav has at most 8 channels |
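The two input constraints in the MFCCA Notes column (duration and channel count) can be validated together before inference. This is a hedged sketch using only the standard library and assuming PCM WAV input. The helper name `mfcca_input_problems` and its message strings are made up for illustration and are not part of FunASR.

```python
import wave


def mfcca_input_problems(path, max_seconds=20.0, max_channels=8):
    """List which MFCCA input constraints a WAV file violates.

    Hypothetical helper: the defaults mirror the Notes column above.
    An empty list means the file satisfies both constraints.
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        # getnframes() counts frames; one frame holds a sample for every channel.
        seconds = wav.getnframes() / wav.getframerate()

    problems = []
    if channels > max_channels:
        problems.append(f"too many channels: {channels} > {max_channels}")
    if seconds > max_seconds:
        problems.append(f"too long: {seconds:.1f} s > {max_seconds:.1f} s")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every violated constraint at once.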
Voice Activity Detection Models
| Model Name | Training Data | Parameters | Sampling Rate (Hz) | Notes |
|---|---|---|---|---|
| FSMN-VAD | Alibaba Speech Data (5000 hours) | 0.4M | 16000 | |
| FSMN-VAD | Alibaba Speech Data (5000 hours) | 0.4M | 8000 | |
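Since FSMN-VAD is listed separately for 16000 Hz and 8000 Hz audio, one way to choose a variant is to read the sampling rate from the file header. The sketch below assumes PCM WAV input. The function name and the returned label format are illustrative only and do not correspond to ModelScope model IDs.

```python
import wave

# Sampling rates listed in the table above.
SUPPORTED_VAD_RATES = (16000, 8000)


def vad_variant_for(path):
    """Return an illustrative label for the FSMN-VAD row matching the file's rate."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
    if rate not in SUPPORTED_VAD_RATES:
        # Neither row applies; resampling is left to the caller.
        raise ValueError(f"unsupported sampling rate: {rate} Hz")
    return f"FSMN-VAD ({rate} Hz)"
```

Audio at any other rate would need resampling to one of the two listed rates first, which this sketch deliberately leaves out.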
Punctuation Restoration Models
| Model Name | Training Data | Parameters | Vocab Size | Offline/Online | Notes |
|---|---|---|---|---|---|
| CT-Transformer | Alibaba Text Data | 70M | 272727 | Offline | offline punctuation model |
| CT-Transformer | Alibaba Text Data | 70M | 272727 | Online | online punctuation model |
Language Models
| Model Name | Training Data | Parameters | Vocab Size | Notes |
|---|---|---|---|---|
| Transformer | Alibaba Speech Data (? hours) | 57M | 8404 | |
Speaker Verification Models
| Model Name | Training Data | Parameters | Number of Speakers | Notes |
|---|---|---|---|---|
| Xvector | CNCeleb (1,200 hours) | 17.5M | 3465 | Xvector, speaker verification, Chinese |
| Xvector | CallHome (60 hours) | 61M | 6135 | Xvector, speaker verification, English |
Speaker Diarization Models
| Model Name | Training Data | Parameters | Notes |
|---|---|---|---|
| SOND | AliMeeting (120 hours) | 40.5M | Speaker diarization, profiles and records, Chinese |
| SOND | CallHome (60 hours) | 12M | Speaker diarization, profiles and records, English |
Timestamp Prediction Models
| Model Name | Language | Training Data | Parameters | Notes |
|---|---|---|---|---|
| TP-Aligner | CN | Alibaba Speech Data (50000 hours) | 37.8M | Timestamp prediction, Mandarin, medium size |
Inverse Text Normalization (ITN) Models
| Model Name | Language | Parameters | Notes |
|---|---|---|---|
| English | EN | 1.54M | ITN, ASR post-processing |
| Russian | RU | 17.79M | ITN, ASR post-processing |
| Japanese | JA | 6.8M | ITN, ASR post-processing |
| Korean | KO | 1.28M | ITN, ASR post-processing |
| Indonesian | ID | 2.06M | ITN, ASR post-processing |
| Vietnamese | VI | 0.92M | ITN, ASR post-processing |
| Tagalog | TL | 0.65M | ITN, ASR post-processing |
| Spanish | ES | 1.32M | ITN, ASR post-processing |
| Portuguese | PT | 1.28M | ITN, ASR post-processing |
| French | FR | 4.39M | ITN, ASR post-processing |
| German | DE | 3.95M | ITN, ASR post-processing |