FunASR/docs/modelscope_models.md
2023-04-18 11:29:17 +08:00


Pretrained models

Model License

  • Apache License 2.0

Model Zoo

Here we provide several models pretrained on different datasets. Details of the models and datasets can be found on ModelScope.

Speech Recognition Models

Paraformer Models

| Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
|---|---|---|---|---|---|---|
| Paraformer-large | CN & EN | Alibaba Speech Data (60,000 hours) | 8404 | 220M | Offline | Duration of input wav <= 20 s |
| Paraformer-large-long | CN & EN | Alibaba Speech Data (60,000 hours) | 8404 | 220M | Offline | Handles input wav of arbitrary length |
| paraformer-large-contextual | CN & EN | Alibaba Speech Data (60,000 hours) | 8404 | 220M | Offline | Supports hotword customization based on incentive enhancement, improving hotword recall and precision |
| Paraformer | CN & EN | Alibaba Speech Data (50,000 hours) | 8358 | 68M | Offline | Duration of input wav <= 20 s |
| Paraformer-online | CN & EN | Alibaba Speech Data (50,000 hours) | 8404 | 68M | Online | Handles streaming input |
| Paraformer-tiny | CN | Alibaba Speech Data (200 hours) | 544 | 5.2M | Offline | Lightweight Paraformer model for Mandarin command-word recognition |
| Paraformer-aishell | CN | AISHELL (178 hours) | 4234 | 43M | Offline | |
| ParaformerBert-aishell | CN | AISHELL (178 hours) | 4234 | 43M | Offline | |
| Paraformer-aishell2 | CN | AISHELL-2 (1,000 hours) | 5212 | 64M | Offline | |
| ParaformerBert-aishell2 | CN | AISHELL-2 (1,000 hours) | 5212 | 64M | Offline | |
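Several offline models above accept input wav of at most 20 s, while Paraformer-large-long handles input of arbitrary length. The following is a minimal sketch, using only Python's standard `wave` module, of choosing between the two by input duration. The helper names (`wav_duration_seconds`, `choose_paraformer_for_duration`) are hypothetical and not part of FunASR or ModelScope; the returned strings are simply the model names from this table, not ModelScope model IDs.

```python
import wave

# Input-length limit stated in the table for Paraformer-large.
MAX_SHORT_SECONDS = 20.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM wav file in seconds (frames / frame rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def choose_paraformer_for_duration(seconds: float) -> str:
    """Hypothetical helper: map an input duration to a model name from the table."""
    if seconds <= MAX_SHORT_SECONDS:
        return "Paraformer-large"
    # Longer recordings go to the variant that accepts arbitrary-length input.
    return "Paraformer-large-long"


if __name__ == "__main__":
    print(choose_paraformer_for_duration(12.5))  # Paraformer-large
    print(choose_paraformer_for_duration(95.0))  # Paraformer-large-long
```

For a file on disk, the two helpers compose as `choose_paraformer_for_duration(wav_duration_seconds("example.wav"))`, where `example.wav` is a placeholder path.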

UniASR Models

| Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
|---|---|---|---|---|---|---|
| UniASR | CN & EN | Alibaba Speech Data (60,000 hours) | 8358 | 100M | Online | Unified streaming and offline UniASR model |
| UniASR-large | CN & EN | Alibaba Speech Data (60,000 hours) | 8358 | 220M | Offline | Unified streaming and offline UniASR model |
| UniASR Burmese | Burmese | Alibaba Speech Data (? hours) | 696 | 95M | Online | Unified streaming and offline UniASR model |
| UniASR Hebrew | Hebrew | Alibaba Speech Data (? hours) | 1085 | 95M | Online | Unified streaming and offline UniASR model |
| UniASR Urdu | Urdu | Alibaba Speech Data (? hours) | 877 | 95M | Online | Unified streaming and offline UniASR model |

Conformer Models

| Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
|---|---|---|---|---|---|---|
| Conformer | CN | AISHELL (178 hours) | 4234 | 44M | Offline | Duration of input wav <= 20 s |
| Conformer | CN | AISHELL-2 (1,000 hours) | 5212 | 44M | Offline | Duration of input wav <= 20 s |

RNN-T Models

Multi-talker Speech Recognition Models

MFCCA Models

| Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
|---|---|---|---|---|---|---|
| MFCCA | CN | AliMeeting, AISHELL-4, Simudata (917 hours) | 4950 | 45M | Offline | Duration of input wav <= 20 s; input wav must have at most 8 channels |
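MFCCA is the only model here with a channel limit in addition to a duration limit. A minimal pre-check of both constraints, again using only the standard `wave` module, might look like the sketch below. `check_mfcca_input` and `check_mfcca_wav` are hypothetical names for illustration and are not FunASR functions.

```python
import wave

# Limits stated in the MFCCA row above.
MFCCA_MAX_SECONDS = 20.0
MFCCA_MAX_CHANNELS = 8


def check_mfcca_input(n_frames: int, sample_rate: int, n_channels: int) -> list:
    """Return a list of violated limits; an empty list means the input fits."""
    problems = []
    duration = n_frames / sample_rate
    if duration > MFCCA_MAX_SECONDS:
        problems.append(f"duration {duration:.1f} s exceeds {MFCCA_MAX_SECONDS:.0f} s")
    if n_channels > MFCCA_MAX_CHANNELS:
        problems.append(f"{n_channels} channels exceeds the limit of {MFCCA_MAX_CHANNELS}")
    return problems


def check_mfcca_wav(path: str) -> list:
    """Read the wav header and check it against the MFCCA limits."""
    with wave.open(path, "rb") as wf:
        return check_mfcca_input(wf.getnframes(), wf.getframerate(), wf.getnchannels())


if __name__ == "__main__":
    # 10 s of 8-channel 16 kHz audio fits; 30 s of 16-channel audio does not.
    print(check_mfcca_input(160000, 16000, 8))         # []
    print(len(check_mfcca_input(480000, 16000, 16)))   # 2
```

Returning a list of problems rather than raising lets a caller report every violated limit at once before deciding whether to trim or downmix the recording.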

Voice Activity Detection Models

| Model Name | Training Data | Parameters | Sampling Rate (Hz) | Notes |
|---|---|---|---|---|
| FSMN-VAD | Alibaba Speech Data (5,000 hours) | 0.4M | 16000 | |
| FSMN-VAD | Alibaba Speech Data (5,000 hours) | 0.4M | 8000 | |
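The two FSMN-VAD models differ in the sampling rate they expect, which also determines how a time offset maps to a sample index: 16 samples per millisecond at 16000 Hz, 8 at 8000 Hz. The sketch below cuts speech regions out of a sample sequence given segment boundaries in milliseconds. The `(start_ms, end_ms)` segment format and the helper names are assumptions for illustration, not the documented output format of these models.

```python
# The two FSMN-VAD variants in the table above expect these sampling rates.
SUPPORTED_RATES = (16000, 8000)


def ms_to_sample(ms: int, sample_rate: int) -> int:
    """Convert a time offset in milliseconds to a sample index."""
    return ms * sample_rate // 1000


def cut_segments(samples, segments_ms, sample_rate: int):
    """Slice speech regions out of a 1-D sample sequence.

    segments_ms is assumed to be a list of (start_ms, end_ms) pairs; this
    format is an illustrative assumption, not a documented model output.
    """
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError(
            f"no FSMN-VAD model is listed for {sample_rate} Hz; "
            "resample to 16000 or 8000 Hz first"
        )
    return [
        samples[ms_to_sample(start, sample_rate):ms_to_sample(end, sample_rate)]
        for start, end in segments_ms
    ]


if __name__ == "__main__":
    one_second = list(range(16000))  # stand-in for 1 s of 16 kHz audio
    pieces = cut_segments(one_second, [(0, 10), (500, 510)], 16000)
    print([len(p) for p in pieces])  # [160, 160]
```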

Punctuation Restoration Models

| Model Name | Training Data | Parameters | Vocab Size | Offline/Online | Notes |
|---|---|---|---|---|---|
| CT-Transformer | Alibaba Text Data | 70M | 272727 | Offline | Offline punctuation model |
| CT-Transformer | Alibaba Text Data | 70M | 272727 | Online | Online punctuation model |

Language Models

| Model Name | Training Data | Parameters | Vocab Size | Notes |
|---|---|---|---|---|
| Transformer | Alibaba Speech Data (? hours) | 57M | 8404 | |

Speaker Verification Models

| Model Name | Training Data | Parameters | Number of Speakers | Notes |
|---|---|---|---|---|
| Xvector | CNCeleb (1,200 hours) | 17.5M | 3465 | Xvector, speaker verification, Chinese |
| Xvector | CallHome (60 hours) | 61M | 6135 | Xvector, speaker verification, English |
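An x-vector model maps an utterance to a fixed-dimensional speaker embedding, and a common way to decide whether two utterances come from the same speaker is to compare their embeddings with cosine similarity against a threshold. The sketch below shows only that scoring step, on toy vectors rather than real model output. The threshold of 0.5 is an arbitrary placeholder (in practice it is tuned on held-out trials), and this is not the scoring code used by these models.

```python
import math

# Placeholder decision threshold; a real system tunes this on held-out trials.
DEFAULT_THRESHOLD = 0.5


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot score an all-zero embedding")
    return dot / (norm_a * norm_b)


def same_speaker(a, b, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Accept the trial if the cosine score reaches the threshold."""
    return cosine_similarity(a, b) >= threshold


if __name__ == "__main__":
    enrolled = [0.9, 0.1, 0.3]  # toy vectors, not real x-vectors
    test = [0.8, 0.2, 0.25]
    print(same_speaker(enrolled, test))  # True
```

Cosine scoring ignores embedding magnitude, so only the direction of each vector affects the decision.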

Speaker Diarization Models

| Model Name | Training Data | Parameters | Notes |
|---|---|---|---|
| SOND | AliMeeting (120 hours) | 40.5M | Speaker diarization from speaker profiles and recordings, Chinese |
| SOND | CallHome (60 hours) | 12M | Speaker diarization from speaker profiles and recordings, English |