Pretrained models
Model License
- Apache License 2.0
Model Zoo
Here we provide several pretrained models trained on different datasets. Details of the models and datasets can be found on ModelScope.
Speech Recognition Models
Paraformer Models
| Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
|---|---|---|---|---|---|---|
| Paraformer-large | CN & EN | Alibaba Speech Data (60,000 hours) | 8404 | 220M | Offline | Duration of input wav <= 20 s |
| Paraformer-large-long | CN & EN | Alibaba Speech Data (60,000 hours) | 8404 | 220M | Offline | Handles input wav of arbitrary length |
| Paraformer-large-contextual | CN & EN | Alibaba Speech Data (60,000 hours) | 8404 | 220M | Offline | Supports hotword customization based on incentive enhancement, improving hotword recall and precision |
| Paraformer | CN & EN | Alibaba Speech Data (50,000 hours) | 8358 | 68M | Offline | Duration of input wav <= 20 s |
| Paraformer-online | CN & EN | Alibaba Speech Data (50,000 hours) | 8404 | 68M | Online | Handles streaming input |
| Paraformer-tiny | CN | Alibaba Speech Data (200 hours) | 544 | 5.2M | Offline | Lightweight Paraformer model for Mandarin command-word recognition |
| Paraformer-aishell | CN | AISHELL (178 hours) | 4234 | 43M | Offline | |
| ParaformerBert-aishell | CN | AISHELL (178 hours) | 4234 | 43M | Offline | |
| Paraformer-aishell2 | CN | AISHELL-2 (1,000 hours) | 5212 | 64M | Offline | |
| ParaformerBert-aishell2 | CN | AISHELL-2 (1,000 hours) | 5212 | 64M | Offline | |
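Several offline models in this table, and the Conformer models further down, require input wav of at most 20 s. The hypothetical helper `chunk_bounds` below is a minimal sketch of what that limit means in samples. It splits a recording into consecutive windows of at most 20 s and assumes 16 kHz audio, because the table does not state the ASR sampling rate.

```python
def chunk_bounds(num_samples: int, sample_rate: int = 16000, max_seconds: float = 20.0):
    """Return (start, end) sample indices of consecutive windows of at most max_seconds.

    Hypothetical helper; the 16 kHz default is an assumption, not taken from the table.
    """
    if num_samples < 0 or sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("num_samples must be >= 0; sample_rate and max_seconds must be > 0")
    max_len = int(sample_rate * max_seconds)  # samples per window
    # The last window may be shorter; min() clips it to the end of the signal.
    return [(start, min(start + max_len, num_samples))
            for start in range(0, num_samples, max_len)]


# 45 s of 16 kHz audio -> windows of 20 s, 20 s and 5 s
print(chunk_bounds(45 * 16000))  # [(0, 320000), (320000, 640000), (640000, 720000)]
```

Fixed-length windows can cut through words. For long recordings, the Paraformer-large-long model listed above, which accepts arbitrary-length input, is likely the better fit.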
UniASR Models
| Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
|---|---|---|---|---|---|---|
| UniASR | CN & EN | Alibaba Speech Data (60,000 hours) | 8358 | 100M | Online | UniASR model unifying streaming and offline recognition |
| UniASR-large | CN & EN | Alibaba Speech Data (60,000 hours) | 8358 | 220M | Offline | UniASR model unifying streaming and offline recognition |
| UniASR Burmese | Burmese | Alibaba Speech Data (? hours) | 696 | 95M | Online | UniASR model unifying streaming and offline recognition |
| UniASR Hebrew | Hebrew | Alibaba Speech Data (? hours) | 1085 | 95M | Online | UniASR model unifying streaming and offline recognition |
| UniASR Urdu | Urdu | Alibaba Speech Data (? hours) | 877 | 95M | Online | UniASR model unifying streaming and offline recognition |
Conformer Models
| Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
|---|---|---|---|---|---|---|
| Conformer | CN | AISHELL (178 hours) | 4234 | 44M | Offline | Duration of input wav <= 20 s |
| Conformer | CN | AISHELL-2 (1,000 hours) | 5212 | 44M | Offline | Duration of input wav <= 20 s |
RNN-T Models
Multi-talker Speech Recognition Models
MFCCA Models
| Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
|---|---|---|---|---|---|---|
| MFCCA | CN | AliMeeting, AISHELL-4, Simudata (917 hours) | 4950 | 45M | Offline | Duration of input wav <= 20 s; number of input channels <= 8 |
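The MFCCA notes give two input limits: at most 20 s of audio and at most 8 channels. The sketch below checks both before inference. It assumes the input is a standard PCM WAV file readable by Python's built-in `wave` module. The function name `check_mfcca_input` is hypothetical, and the check deliberately ignores sampling rate, which the table does not specify for MFCCA.

```python
import wave


def check_mfcca_input(path: str, max_seconds: float = 20.0, max_channels: int = 8) -> list[str]:
    """Return a list of problems with a PCM WAV file relative to the MFCCA limits (empty if OK).

    Hypothetical pre-check; the limits come from the MFCCA row in the table above.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        # getnframes() counts frames (one sample per channel), so this is the duration in seconds.
        duration = wav.getnframes() / wav.getframerate()
    if channels > max_channels:
        problems.append(f"{channels} channels > {max_channels}")
    if duration > max_seconds:
        problems.append(f"duration {duration:.2f}s > {max_seconds}s")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets a batch job report every violation in a file at once.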
Voice Activity Detection Models
| Model Name | Training Data | Parameters | Sampling Rate (Hz) | Notes |
|---|---|---|---|---|
| FSMN-VAD | Alibaba Speech Data (5,000 hours) | 0.4M | 16000 | |
| FSMN-VAD | Alibaba Speech Data (5,000 hours) | 0.4M | 8000 | |
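One way to combine a VAD model with the 20 s input limit of the offline ASR models is to pack consecutive speech segments into spans no longer than 20 s. The sketch below does this greedily. It assumes the VAD output is a list of `[start_ms, end_ms]` speech segments in time order; that format is an assumption, so check it against the model's actual output. The helper `group_segments` is hypothetical, and this is not a description of how Paraformer-large-long works internally.

```python
def group_segments(segments, max_ms=20_000):
    """Greedily merge consecutive [start_ms, end_ms] speech segments into spans of at most max_ms.

    Assumes segments are sorted by start time. A merged span also covers the
    silence between its segments. A single segment longer than max_ms is kept
    as its own span and would still need splitting.
    """
    groups = []
    for start, end in segments:
        if groups and end - groups[-1][0] <= max_ms:
            groups[-1][1] = end  # extend the current span to cover this segment
        else:
            groups.append([start, end])  # start a new span
    return groups


print(group_segments([[0, 5000], [6000, 15000], [16000, 24000], [25000, 30000]]))
# [[0, 15000], [16000, 30000]]
```

Greedy packing keeps the number of ASR calls low. It also never splits inside a detected speech segment unless that segment alone exceeds the limit.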
Punctuation Restoration Models
| Model Name | Training Data | Parameters | Vocab Size | Offline/Online | Notes |
|---|---|---|---|---|---|
| CT-Transformer | Alibaba Text Data | 70M | 272727 | Offline | offline punctuation model |
| CT-Transformer | Alibaba Text Data | 70M | 272727 | Online | online punctuation model |
Language Models
| Model Name | Training Data | Parameters | Vocab Size | Notes |
|---|---|---|---|---|
| Transformer | Alibaba Speech Data (? hours) | 57M | 8404 | |
Speaker Verification Models
| Model Name | Training Data | Parameters | Number of Speakers | Notes |
|---|---|---|---|---|
| Xvector | CNCeleb (1,200 hours) | 17.5M | 3465 | Xvector, speaker verification, Chinese |
| Xvector | CallHome (60 hours) | 61M | 6135 | Xvector, speaker verification, English |
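A common back end for x-vector speaker verification scores two speaker embeddings with cosine similarity and accepts the trial when the score clears a threshold. The table does not say which scoring method these models use (cosine or, for example, PLDA). The sketch below therefore only illustrates the cosine variant on embeddings that are assumed to be already extracted. The function names and the 0.5 threshold are hypothetical placeholders; a real threshold would be tuned on development data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("embedding has zero norm")
    return dot / norm


def same_speaker(a, b, threshold=0.5):
    """Accept the trial if the embeddings are similar enough (threshold is a placeholder)."""
    return cosine_similarity(a, b) >= threshold
```

Cosine scoring ignores embedding magnitude, so it does not require length-normalized embeddings.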
Speaker Diarization Models
| Model Name | Training Data | Parameters | Notes |
|---|---|---|---|
| SOND | AliMeeting (120 hours) | 40.5M | Speaker diarization from speaker profiles and recordings, Chinese |
| SOND | CallHome (60 hours) | 12M | Speaker diarization from speaker profiles and recordings, English |