mirror of https://github.com/modelscope/FunASR
synced 2025-09-15 14:48:36 +08:00

docs

This commit is contained in:
parent 9777fdec39
commit a323aa9385
@@ -1 +0,0 @@
../funasr/export/README.md

@@ -1 +0,0 @@
../funasr/runtime/python/grpc/Readme.md
94 docs/huggingface_models.md Normal file
@@ -0,0 +1,94 @@
# Pretrained Models on Huggingface

## Model License
- Apache License 2.0

## Model Zoo
Here we provide several pretrained models on different datasets. Details of the models and datasets can be found on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition).

### Speech Recognition Models

#### Paraformer Models

| Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
|:---:|:---:|:---:|:---:|:---:|:---:|:---|
| [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Duration of input wav <= 20s |
| [Paraformer-large-long](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Supports input wav of arbitrary length |
| [Paraformer-large-contextual](https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary) | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Supports hotword customization based on incentive enhancement, improving the recall and precision of hotwords |
| [Paraformer](https://modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8358-tensorflow1/summary) | CN & EN | Alibaba Speech Data (50000 hours) | 8358 | 68M | Offline | Duration of input wav <= 20s |
| [Paraformer-online](https://modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8358-tensorflow1/summary) | CN & EN | Alibaba Speech Data (50000 hours) | 8404 | 68M | Online | Supports streaming input |
| [Paraformer-tiny](https://www.modelscope.cn/models/damo/speech_paraformer-tiny-commandword_asr_nat-zh-cn-16k-vocab544-pytorch/summary) | CN | Alibaba Speech Data (200 hours) | 544 | 5.2M | Offline | Lightweight Paraformer model supporting Mandarin command word recognition |
| [Paraformer-aishell](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-aishell1-pytorch/summary) | CN | AISHELL (178 hours) | 4234 | 43M | Offline | |
| [ParaformerBert-aishell](https://modelscope.cn/models/damo/speech_paraformerbert_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/summary) | CN | AISHELL (178 hours) | 4234 | 43M | Offline | |
| [Paraformer-aishell2](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/summary) | CN | AISHELL-2 (1000 hours) | 5212 | 64M | Offline | |
| [ParaformerBert-aishell2](https://www.modelscope.cn/models/damo/speech_paraformerbert_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/summary) | CN | AISHELL-2 (1000 hours) | 5212 | 64M | Offline | |

#### UniASR Models

| Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
|:---:|:---:|:---:|:---:|:---:|:---:|:---|
| [UniASR](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-online/summary) | CN & EN | Alibaba Speech Data (60000 hours) | 8358 | 100M | Online | Unified streaming/offline UniASR model |
| [UniASR-large](https://modelscope.cn/models/damo/speech_UniASR-large_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-offline/summary) | CN & EN | Alibaba Speech Data (60000 hours) | 8358 | 220M | Offline | Unified streaming/offline UniASR model |
| [UniASR Burmese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-my-16k-common-vocab696-pytorch/summary) | Burmese | Alibaba Speech Data (? hours) | 696 | 95M | Online | Unified streaming/offline UniASR model |
| [UniASR Hebrew](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-he-16k-common-vocab1085-pytorch/summary) | Hebrew | Alibaba Speech Data (? hours) | 1085 | 95M | Online | Unified streaming/offline UniASR model |
| [UniASR Urdu](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ur-16k-common-vocab877-pytorch/summary) | Urdu | Alibaba Speech Data (? hours) | 877 | 95M | Online | Unified streaming/offline UniASR model |

#### Conformer Models

| Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
|:---:|:---:|:---:|:---:|:---:|:---:|:---|
| [Conformer](https://modelscope.cn/models/damo/speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/summary) | CN | AISHELL (178 hours) | 4234 | 44M | Offline | Duration of input wav <= 20s |
| [Conformer](https://www.modelscope.cn/models/damo/speech_conformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/summary) | CN | AISHELL-2 (1000 hours) | 5212 | 44M | Offline | Duration of input wav <= 20s |

#### RNN-T Models

### Multi-talker Speech Recognition Models

#### MFCCA Models

| Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
|:---:|:---:|:---:|:---:|:---:|:---:|:---|
| [MFCCA](https://www.modelscope.cn/models/NPU-ASLP/speech_mfcca_asr-zh-cn-16k-alimeeting-vocab4950/summary) | CN | AliMeeting, AISHELL-4, Simudata (917 hours) | 4950 | 45M | Offline | Duration of input wav <= 20s, up to 8 input channels |

### Voice Activity Detection Models

| Model Name | Training Data | Parameters | Sampling Rate | Notes |
|:---:|:---:|:---:|:---:|:---|
| [FSMN-VAD](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) | Alibaba Speech Data (5000 hours) | 0.4M | 16000 | |
| [FSMN-VAD](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-8k-common/summary) | Alibaba Speech Data (5000 hours) | 0.4M | 8000 | |

### Punctuation Restoration Models

| Model Name | Training Data | Parameters | Vocab Size | Offline/Online | Notes |
|:---:|:---:|:---:|:---:|:---:|:---|
| [CT-Transformer](https://modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary) | Alibaba Text Data | 70M | 272727 | Offline | Offline punctuation model |
| [CT-Transformer](https://modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727/summary) | Alibaba Text Data | 70M | 272727 | Online | Online punctuation model |

### Language Models

| Model Name | Training Data | Parameters | Vocab Size | Notes |
|:---:|:---:|:---:|:---:|:---|
| [Transformer](https://www.modelscope.cn/models/damo/speech_transformer_lm_zh-cn-common-vocab8404-pytorch/summary) | Alibaba Speech Data (? hours) | 57M | 8404 | |

### Speaker Verification Models

| Model Name | Training Data | Parameters | Number of Speakers | Notes |
|:---:|:---:|:---:|:---:|:---|
| [Xvector](https://www.modelscope.cn/models/damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch/summary) | CNCeleb (1,200 hours) | 17.5M | 3465 | Xvector, speaker verification, Chinese |
| [Xvector](https://www.modelscope.cn/models/damo/speech_xvector_sv-en-us-callhome-8k-spk6135-pytorch/summary) | CallHome (60 hours) | 61M | 6135 | Xvector, speaker verification, English |

### Speaker Diarization Models

| Model Name | Training Data | Parameters | Notes |
|:---:|:---:|:---:|:---|
| [SOND](https://www.modelscope.cn/models/damo/speech_diarization_sond-zh-cn-alimeeting-16k-n16k4-pytorch/summary) | AliMeeting (120 hours) | 40.5M | Speaker diarization, profiles and records, Chinese |
| [SOND](https://www.modelscope.cn/models/damo/speech_diarization_sond-en-us-callhome-8k-n16k4-pytorch/summary) | CallHome (60 hours) | 12M | Speaker diarization, profiles and records, English |

### Timestamp Prediction Models

| Model Name | Language | Training Data | Parameters | Notes |
|:---:|:---:|:---:|:---:|:---|
| [TP-Aligner](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) | CN | Alibaba Speech Data (50000 hours) | 37.8M | Timestamp prediction, Mandarin, medium size |
@@ -19,10 +19,10 @@ FunASR hopes to build a bridge between academic research and industrial applications
    :maxdepth: 1
    :caption: Recipe

-   ./asr_recipe.md
-   ./sv_recipe.md
-   ./punc_recipe.md
-   ./vad_recipe.md
+   ./recipe/asr_recipe.md
+   ./recipe/sv_recipe.md
+   ./recipe/punc_recipe.md
+   ./recipe/vad_recipe.md

 .. toctree::
    :maxdepth: 1
@@ -34,22 +34,30 @@ FunASR hopes to build a bridge between academic research and industrial applications
    :maxdepth: 1
    :caption: Runtime

-   ./export.md
-   ./onnxruntime_python.md
-   ./onnxruntime_cpp.md
-   ./libtorch_python.md
-   ./grpc_python.md
-   ./grpc_cpp.md
-   ./websocket_python.md
+   ./runtime/export.md
+   ./runtime/onnxruntime_python.md
+   ./runtime/onnxruntime_cpp.md
+   ./runtime/libtorch_python.md
+   ./runtime/grpc_python.md
+   ./runtime/grpc_cpp.md
+   ./runtime/websocket_python.md

+.. toctree::
+   :maxdepth: 1
+   :caption: Model Zoo
+
+   ./modelscope_models.md
+   ./huggingface_models.md
+
 .. toctree::
    :maxdepth: 1
    :caption: ModelScope pipeline

-   ./modelscope_models.md
-   ./modelscope_usages.md
+   ./modescope_pipeline/quick_start.md
+   ./modescope_pipeline/asr_pipeline.md
+   ./modescope_pipeline/vad_pipeline.md
+   ./modescope_pipeline/punc_pipeline.md
+   ./modescope_pipeline/sv_pipeline.md

 .. toctree::
    :maxdepth: 1
@@ -1,4 +1,4 @@
-# Pretrained models
+# Pretrained Models on ModelScope

 ## Model License
 - Apache License 2.0
14 docs/modescope_pipeline/asr_pipeline.md Normal file
@@ -0,0 +1,14 @@
# Speech Recognition

## Inference with pipeline
### Quick start
#### Inference with your data
#### Inference with multi-threads on CPU
#### Inference with multiple GPUs

## Finetune with pipeline
### Quick start
### Finetune with your data

## Inference with your finetuned model
14 docs/modescope_pipeline/lm_pipeline.md Normal file
@@ -0,0 +1,14 @@
# Language Model

## Inference with pipeline
### Quick start
#### Inference with your data
#### Inference with multi-threads on CPU
#### Inference with multiple GPUs

## Finetune with pipeline
### Quick start
### Finetune with your data

## Inference with your finetuned model
14 docs/modescope_pipeline/punc_pipeline.md Normal file
@@ -0,0 +1,14 @@
# Punctuation Restoration

## Inference with pipeline
### Quick start
#### Inference with your data
#### Inference with multi-threads on CPU
#### Inference with multiple GPUs

## Finetune with pipeline
### Quick start
### Finetune with your data

## Inference with your finetuned model
6 docs/modescope_pipeline/quick_start.md Normal file
@@ -0,0 +1,6 @@
# Quick Start

## Inference with pipeline

## Finetune with pipeline
14 docs/modescope_pipeline/sv_pipeline.md Normal file
@@ -0,0 +1,14 @@
# Speaker Verification

## Inference with pipeline
### Quick start
#### Inference with your data
#### Inference with multi-threads on CPU
#### Inference with multiple GPUs

## Finetune with pipeline
### Quick start
### Finetune with your data

## Inference with your finetuned model
14 docs/modescope_pipeline/tp_pipeline.md Normal file
@@ -0,0 +1,14 @@
# Timestamp Prediction

## Inference with pipeline
### Quick start
#### Inference with your data
#### Inference with multi-threads on CPU
#### Inference with multiple GPUs

## Finetune with pipeline
### Quick start
### Finetune with your data

## Inference with your finetuned model
14 docs/modescope_pipeline/vad_pipeline.md Normal file
@@ -0,0 +1,14 @@
# Voice Activity Detection

## Inference with pipeline
### Quick start
#### Inference with your data
#### Inference with multi-threads on CPU
#### Inference with multiple GPUs

## Finetune with pipeline
### Quick start
### Finetune with your data

## Inference with your finetuned model
@@ -1 +0,0 @@
../funasr/runtime/onnxruntime/readme.md
129 docs/recipe/lm_recipe.md Normal file
@@ -0,0 +1,129 @@
# Speech Recognition
Here we take "training a Paraformer model from scratch on the AISHELL-1 dataset" as an example to introduce how to use FunASR. Following this example, users can similarly employ other datasets (such as AISHELL-2) to train other models (such as Conformer or Transformer).

## Overall Introduction
We provide a recipe `egs/aishell/paraformer/run.sh` for training a Paraformer model on the AISHELL-1 dataset. This recipe consists of five stages, supporting training on multiple GPUs and decoding on CPU or GPU. Before introducing each stage in detail, we first explain several parameters that should be set by users.
- `CUDA_VISIBLE_DEVICES`: the list of visible GPUs
- `gpu_num`: the number of GPUs used for training
- `gpu_inference`: whether to use GPUs for decoding
- `njob`: for CPU decoding, the total number of CPU jobs; for GPU decoding, the number of jobs on each GPU
- `data_aishell`: the raw path of the AISHELL-1 dataset
- `feats_dir`: the path for saving processed data
- `nj`: the number of jobs for data preparation
- `speed_perturb`: the range of speed perturbation
- `exp_dir`: the path for saving experimental results
- `tag`: the suffix of the experimental result directory

## Stage 0: Data Preparation
This stage processes the raw AISHELL-1 dataset `$data_aishell` and generates the corresponding `wav.scp` and `text` files in `$feats_dir/data/xxx`, where `xxx` is one of `train/dev/test`. Here we assume users have already downloaded the AISHELL-1 dataset. If not, users can download the data [here](https://www.openslr.org/33/) and set the path for `$data_aishell`. Examples of `wav.scp` and `text` are as follows:
* `wav.scp`
```
BAC009S0002W0122 /nfs/ASR_DATA/AISHELL-1/data_aishell/wav/train/S0002/BAC009S0002W0122.wav
BAC009S0002W0123 /nfs/ASR_DATA/AISHELL-1/data_aishell/wav/train/S0002/BAC009S0002W0123.wav
BAC009S0002W0124 /nfs/ASR_DATA/AISHELL-1/data_aishell/wav/train/S0002/BAC009S0002W0124.wav
...
```
* `text`
```
BAC009S0002W0122 而 对 楼 市 成 交 抑 制 作 用 最 大 的 限 购
BAC009S0002W0123 也 成 为 地 方 政 府 的 眼 中 钉
BAC009S0002W0124 自 六 月 底 呼 和 浩 特 市 率 先 宣 布 取 消 限 购 后
...
```
Both files have two columns: the first column contains wav ids, and the second column contains the corresponding wav paths (in `wav.scp`) or label tokens (in `text`).
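The two-column layout above can be loaded with a short helper. The following is a minimal sketch, not part of the recipe itself; the function name and file paths are illustrative:

```python
def load_scp(path):
    """Parse a Kaldi-style two-column file (wav.scp or text) into a dict.

    The first whitespace-separated field is the wav id; everything after it
    (a path, or a space-separated token sequence) is kept as the value.
    """
    table = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            key, value = line.split(maxsplit=1)
            table[key] = value
    return table

# Hypothetical usage:
#   wavs = load_scp("data/train/wav.scp")
#   texts = load_scp("data/train/text")
```

Splitting with `maxsplit=1` matters for `text`, whose second column itself contains spaces between label tokens.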

## Stage 1: Feature Generation
This stage extracts FBank features from `wav.scp` and applies speed perturbation as data augmentation according to `speed_perturb`. Users can set `nj` to control the number of jobs for feature generation. The generated features are saved in `$feats_dir/dump/xxx/ark`, and the corresponding `feats.scp` files are saved as `$feats_dir/dump/xxx/feats.scp`. An example of `feats.scp` is as follows:
* `feats.scp`
```
...
BAC009S0002W0122_sp0.9 /nfs/funasr_data/aishell-1/dump/fbank/train/ark/feats.16.ark:592751055
...
```
Note that the samples in this file have already been shuffled randomly. The file contains two columns: the first column is wav ids, and the second column is Kaldi-ark feature paths. Besides, `speech_shape` and `text_shape` files are also generated in this stage, recording the speech feature shape and the text length of each sample. Examples are shown as follows:
* `speech_shape`
```
...
BAC009S0002W0122_sp0.9 665,80
...
```
* `text_shape`
```
...
BAC009S0002W0122_sp0.9 15
...
```
These two files have two columns: the first column is wav ids, and the second column is the corresponding speech feature shape or text length.
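For intuition, a `text_shape` entry can be derived from the corresponding `text` line by counting its space-separated label tokens. A minimal sketch (the function name is illustrative):

```python
def text_shape_line(text_line):
    """Turn one `text` line into the corresponding `text_shape` line:
    the shape is the number of space-separated label tokens."""
    key, tokens = text_line.strip().split(maxsplit=1)
    return f"{key} {len(tokens.split())}"

# A 15-token transcription yields "<wav_id> 15", matching the example above.
```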

## Stage 2: Dictionary Preparation
This stage builds the dictionary, which is used as a mapping between label characters and integer indices during ASR training. The processed dictionary file is saved as `$feats_dir/data/$lang_token_list/$token_type/tokens.txt`. An example of `tokens.txt` is as follows:
* `tokens.txt`
```
<blank>
<s>
</s>
一
丁
...
龚
龟
<unk>
```
* `<blank>`: the blank token for CTC
* `<s>`: the start-of-sentence token
* `</s>`: the end-of-sentence token
* `<unk>`: the out-of-vocabulary token
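The mapping this dictionary defines can be sketched as follows, assuming character tokens; out-of-vocabulary characters fall back to `<unk>`. The helper names are illustrative, not FunASR's API:

```python
def load_token2id(tokens):
    """Map each token of tokens.txt (one per line, in order) to its line index."""
    return {tok: idx for idx, tok in enumerate(tokens)}

def encode(chars, token2id):
    """Map characters to integer indices, falling back to <unk>."""
    unk = token2id["<unk>"]
    return [token2id.get(c, unk) for c in chars]

# Toy vocabulary mirroring the tokens.txt example above:
tokens = ["<blank>", "<s>", "</s>", "一", "丁", "<unk>"]
t2i = load_token2id(tokens)
# encode("一丁", t2i) -> [3, 4]; an unseen character maps to <unk>'s index.
```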

## Stage 3: Training
This stage trains the specified model. To start training, users should manually set `exp_dir`, `CUDA_VISIBLE_DEVICES` and `gpu_num`, which have been explained above. By default, the best `$keep_nbest_models` checkpoints on the validation set are averaged to generate a better model, which is adopted for decoding.

* DDP Training

We support DistributedDataParallel (DDP) training; details can be found [here](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html). To enable DDP training, set `gpu_num` greater than 1. For example, if you set `CUDA_VISIBLE_DEVICES=0,1,5,6,7` and `gpu_num=3`, then the GPUs with ids 0, 1 and 5 will be used for training.
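The GPU-selection rule in the example above amounts to taking the first `gpu_num` entries of `CUDA_VISIBLE_DEVICES`; a minimal sketch (the helper name is illustrative):

```python
def training_gpus(cuda_visible_devices, gpu_num):
    """Pick the first gpu_num device ids from the visible-device list."""
    ids = [int(x) for x in cuda_visible_devices.split(",")]
    return ids[:gpu_num]

# training_gpus("0,1,5,6,7", 3) -> [0, 1, 5]
```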

* DataLoader

We support an optional iterable-style DataLoader based on [PyTorch iterable-style DataPipes](https://pytorch.org/data/beta/torchdata.datapipes.iter.html) for large datasets; users can set `dataset_type=large` to enable it.

* Configuration

The training parameters, including model, optimization, dataset, etc., can be set by a YAML file in the `conf` directory. Users can also set the parameters directly in the `run.sh` recipe. Please avoid setting the same parameter in both the YAML file and the recipe.

* Training Steps

We support two parameters to specify the training steps, namely `max_epoch` and `max_update`. `max_epoch` indicates the total number of training epochs, while `max_update` indicates the total number of training steps. If both parameters are specified, training stops as soon as either limit is reached.
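The stopping rule reduces to a one-line predicate; a minimal sketch (the function name is illustrative):

```python
def should_stop(epoch, step, max_epoch, max_update):
    """Stop when either the epoch limit or the update (step) limit is reached."""
    return epoch >= max_epoch or step >= max_update

# With max_epoch=50 and max_update=3000, training stops at step 3000
# even though only 10 epochs have elapsed.
```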

* Tensorboard

Users can use TensorBoard to observe the loss, learning rate, etc., by running the following command:
```
tensorboard --logdir ${exp_dir}/exp/${model_dir}/tensorboard/train
```

## Stage 4: Decoding
This stage generates the recognition results and calculates the `CER` to verify the performance of the trained model.

* Mode Selection

As FunASR supports Paraformer, UniASR, Conformer and other models, a `mode` parameter should be specified as `asr/paraformer/uniasr` according to the trained model.

* Configuration

We support CTC decoding, attention decoding and hybrid CTC-attention decoding in FunASR, which can be specified by `ctc_weight` in a YAML file in the `conf` directory. Specifically, `ctc_weight=1.0` indicates CTC decoding, `ctc_weight=0.0` indicates attention decoding, and `0.0<ctc_weight<1.0` indicates hybrid CTC-attention decoding.
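Conceptually, hybrid decoding interpolates the two log-probabilities of each hypothesis with `ctc_weight`; a minimal sketch of the scoring rule (not FunASR's internal API):

```python
def hybrid_score(ctc_logp, att_logp, ctc_weight):
    """Interpolate CTC and attention log-probabilities for one hypothesis."""
    return ctc_weight * ctc_logp + (1.0 - ctc_weight) * att_logp

# ctc_weight=1.0 keeps only the CTC score; ctc_weight=0.0 only the attention score.
```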

* CPU/GPU Decoding

We support both CPU and GPU decoding in FunASR. For CPU decoding, set `gpu_inference=False` and set `njob` to specify the total number of CPU decoding jobs. For GPU decoding, set `gpu_inference=True`; you should also set `gpuid_list` to indicate which GPUs are used for decoding and `njob` to indicate the number of decoding jobs on each GPU.

* Performance

We adopt `CER` to verify the performance. The results are saved in `$exp_dir/exp/$model_dir/$decoding_yaml_name/$average_model_name/$dset`, namely `text.cer` and `text.cer.txt`. `text.cer` saves the comparison between the recognized text and the reference text, while `text.cer.txt` saves the final `CER` result. The following is an example of `text.cer`:
* `text.cer`
```
...
BAC009S0764W0213(nwords=11,cor=11,ins=0,del=0,sub=0) corr=100.00%,cer=0.00%
ref: 构 建 良 好 的 旅 游 市 场 环 境
res: 构 建 良 好 的 旅 游 市 场 环 境
...
```
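The `cer=0.00%` figure above follows from character-level edit distance: (substitutions + insertions + deletions) divided by the reference length. A minimal sketch of the metric (not the exact scoring tool FunASR uses):

```python
def edit_distance(ref, hyp):
    """Classic dynamic-programming edit distance over characters."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of ref[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of hyp[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n]

def cer(ref, hyp):
    """Character error rate: edit distance normalized by reference length."""
    return edit_distance(ref, hyp) / len(ref)

# Identical ref and res, as in the example above, give cer 0.0.
```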
1 docs/runtime/export.md Symbolic link
@@ -0,0 +1 @@
../../funasr/export/README.md

1 docs/runtime/grpc_python.md Symbolic link
@@ -0,0 +1 @@
../../funasr/runtime/python/grpc/Readme.md

1 docs/runtime/onnxruntime_cpp.md Symbolic link
@@ -0,0 +1 @@
../../funasr/runtime/onnxruntime/readme.md

1 docs/runtime/websocket_python.md Symbolic link
@@ -0,0 +1 @@
../../funasr/runtime/python/websocket/README.md
@@ -1 +0,0 @@
../funasr/runtime/python/websocket/README.md