Merge branch 'alibaba-damo-academy:main' into main

This commit is contained in:
Daniel 2023-04-23 08:56:31 +08:00 committed by GitHub
commit 5b2b979634
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
6 changed files with 81 additions and 52 deletions

View File

@ -9,11 +9,9 @@ Here we provided several pretrained models on different datasets. The details of
### Speech Recognition Models
#### Paraformer Models
[//]: # (| Model Name | Language | Training Data | Vocab Size | Parameter | Offline/Online | Notes |)
[//]: # (|:--------------------------------------------------------------------------------------------------------------------------------------------------:|:--------:|:--------------------------------:|:----------:|:---------:|:--------------:|:--------------------------------------------------------------------------------------------------------------------------------|)
[//]: # (| [Paraformer-large]&#40;https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary&#41; | CN & EN | Alibaba Speech Data &#40;60000hours&#41; | 8404 | 220M | Offline | Duration of input wav <= 20s |)
| Model Name | Language | Training Data | Vocab Size | Parameter | Offline/Online | Notes |
|:-----------------------------------------------------------------------:|:--------:|:----------------------------------:|:----------:|:---------:|:--------------:|:--------------------------------------------------------------------------------------------------------------------------------|
| [Paraformer-large](https://huggingface.co/funasr/paraformer-large) | CN & EN | Alibaba Speech Data (60000hours) | 8404 | 220M | Offline | Duration of input wav <= 20s |
[//]: # (| [Paraformer-large-long]&#40;https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary&#41; | CN & EN | Alibaba Speech Data &#40;60000hours&#41; | 8404 | 220M | Offline | Which could deal with arbitrary length input wav |)
@ -77,21 +75,17 @@ Here we provided several pretrained models on different datasets. The details of
### Voice Activity Detection Models
[//]: # (| Model Name | Training Data | Parameters | Sampling Rate | Notes |)
[//]: # (|:----------------------------------------------------------------------------------------------:|:----------------------------:|:----------:|:-------------:|:------|)
[//]: # (| [FSMN-VAD]&#40;https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary&#41; | Alibaba Speech Data &#40;5000hours&#41; | 0.4M | 16000 | |)
| Model Name | Training Data | Parameters | Sampling Rate | Notes |
|:----------------------------------------------------:|:----------------------------:|:----------:|:-------------:|:------|
| [FSMN-VAD](https://huggingface.co/funasr/FSMN-VAD) | Alibaba Speech Data (5000hours) | 0.4M | 16000 | |
[//]: # (| [FSMN-VAD]&#40;https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-8k-common/summary&#41; | Alibaba Speech Data &#40;5000hours&#41; | 0.4M | 8000 | |)
### Punctuation Restoration Models
[//]: # (| Model Name | Training Data | Parameters | Vocab Size| Offline/Online | Notes |)
[//]: # (|:--------------------------------------------------------------------------------------------------------------------------:|:----------------------------:|:----------:|:----------:|:--------------:|:------|)
[//]: # (| [CT-Transformer]&#40;https://modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary&#41; | Alibaba Text Data | 70M | 272727 | Offline | offline punctuation model |)
| Model Name | Training Data | Parameters | Vocab Size| Offline/Online | Notes |
|:--------------------------------------------------------------------:|:----------------------------:|:----------:|:----------:|:--------------:|:------|
| [CT-Transformer](https://huggingface.co/funasr/CT-Transformer-punc) | Alibaba Text Data | 70M | 272727 | Offline | offline punctuation model |
[//]: # (| [CT-Transformer]&#40;https://modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727/summary&#41; | Alibaba Text Data | 70M | 272727 | Online | online punctuation model |)

View File

@ -31,12 +31,7 @@ Overview
./academic_recipe/sd_recipe.md
.. toctree::
:maxdepth: 1
:caption: Model Zoo
./modelscope_models.md
./huggingface_models.md
.. toctree::
:maxdepth: 1
@ -56,11 +51,13 @@ Overview
.. toctree::
:maxdepth: 1
:caption: Funasr Library
:caption: Model Zoo
./build_task.md
./modelscope_models.md
./huggingface_models.md
.. toctree::
:maxdepth: 1
@ -82,6 +79,13 @@ Overview
./benchmark/benchmark_onnx_cpp.md
./benchmark/benchmark_libtorch.md
.. toctree::
:maxdepth: 1
:caption: Funasr Library
./build_task.md
.. toctree::
:maxdepth: 1
:caption: Papers

View File

@ -13,7 +13,7 @@ Here we provided several pretrained models on different datasets. The details of
|:--------------------------------------------------------------------------------------------------------------------------------------------------:|:--------:|:--------------------------------:|:----------:|:---------:|:--------------:|:--------------------------------------------------------------------------------------------------------------------------------|
| [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) | CN & EN | Alibaba Speech Data (60000hours) | 8404 | 220M | Offline | Duration of input wav <= 20s |
| [Paraformer-large-long](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) | CN & EN | Alibaba Speech Data (60000hours) | 8404 | 220M | Offline | Which could deal with arbitrary length input wav |
| [paraformer-large-contextual](https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary) | CN & EN | Alibaba Speech Data (60000hours) | 8404 | 220M | Offline | Which supports the hotword customization based on the incentive enhancement, and improves the recall and precision of hotwords. |
| [Paraformer-large-contextual](https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary) | CN & EN | Alibaba Speech Data (60000hours) | 8404 | 220M | Offline | Which supports the hotword customization based on the incentive enhancement, and improves the recall and precision of hotwords. |
| [Paraformer](https://modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8358-tensorflow1/summary) | CN & EN | Alibaba Speech Data (50000hours) | 8358 | 68M | Offline | Duration of input wav <= 20s |
| [Paraformer-online](https://modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8358-tensorflow1/summary) | CN & EN | Alibaba Speech Data (50000hours) | 8404 | 68M | Online | Which could deal with streaming input |
| [Paraformer-tiny](https://www.modelscope.cn/models/damo/speech_paraformer-tiny-commandword_asr_nat-zh-cn-16k-vocab544-pytorch/summary) | CN | Alibaba Speech Data (200hours) | 544 | 5.2M | Offline | Lightweight Paraformer model which supports Mandarin command words recognition |

View File

@ -1,7 +1,7 @@
# Speech Recognition
> **Note**:
> The modelscope pipeline supports inference and fine-tuning for all the models in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope). Here we take a typical model as an example to demonstrate the usage.
> The modelscope pipeline supports inference and fine-tuning for all the models in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope). Here we take typical models as examples to demonstrate the usage.
## Inference
@ -62,10 +62,10 @@
##### Define pipeline
- `task`: `Tasks.auto_speech_recognition`
- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path in local disk
- `ngpu`: `1` (Defalut), decoding on GPU. If ngpu=0, decoding on CPU
- `ncpu`: `1` (Defalut), sets the number of threads used for intraop parallelism on CPU
- `output_dir`: `None` (Defalut), the output path of results if set
- `batch_size`: `1` (Defalut), batch size when decoding
- `ngpu`: `1` (Default), decoding on GPU. If ngpu=0, decoding on CPU
- `ncpu`: `1` (Default), sets the number of threads used for intraop parallelism on CPU
- `output_dir`: `None` (Default), the output path of results if set
- `batch_size`: `1` (Default), batch size when decoding
##### Infer pipeline
- `audio_in`: the input to decode, which could be:
- wav_path, `e.g.`: asr_example.wav,
@ -79,7 +79,7 @@
```
In the case of `wav.scp` input, `output_dir` must be set to save the output results
- `audio_fs`: audio sampling rate, only set when audio_in is pcm audio
- `output_dir`: None (Defalut), the output path of results if set
- `output_dir`: None (Default), the output path of results if set
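Putting the parameters above together, a minimal sketch of defining and running the pipeline might look like this (the model name and wav path are illustrative):
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Define the ASR pipeline with a model name from the model zoo.
inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model="damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
    ngpu=1,           # 1: decode on GPU; 0: decode on CPU
    batch_size=1,
    output_dir=None,  # set a path when decoding wav.scp input
)

# Decode a single wav file.
rec_result = inference_pipeline(audio_in="asr_example.wav")
print(rec_result)
```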
### Inference with multi-threaded CPUs or multiple GPUs
FunASR also offers the recipe [infer.sh](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr/TEMPLATE/infer.sh) for decoding with multi-threaded CPUs or multiple GPUs.

View File

@ -1,7 +1,7 @@
# Voice Activity Detection
> **Note**:
> The modelscope pipeline supports inference and fine-tuning for all the models in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope). Here we take FSMN-VAD as an example to demonstrate the usage.
> The modelscope pipeline supports inference and fine-tuning for all the models in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope). Here we take the FSMN-VAD model as an example to demonstrate the usage.
## Inference
@ -47,10 +47,10 @@ Full code of demo, please ref to [demo](https://github.com/alibaba-damo-academy/
##### Define pipeline
- `task`: `Tasks.voice_activity_detection`
- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path in local disk
- `ngpu`: `1` (Defalut), decoding on GPU. If ngpu=0, decoding on CPU
- `ncpu`: `1` (Defalut), sets the number of threads used for intraop parallelism on CPU
- `output_dir`: `None` (Defalut), the output path of results if set
- `batch_size`: `1` (Defalut), batch size when decoding
- `ngpu`: `1` (Default), decoding on GPU. If ngpu=0, decoding on CPU
- `ncpu`: `1` (Default), sets the number of threads used for intraop parallelism on CPU
- `output_dir`: `None` (Default), the output path of results if set
- `batch_size`: `1` (Default), batch size when decoding
##### Infer pipeline
- `audio_in`: the input to decode, which could be:
- wav_path, `e.g.`: asr_example.wav,
@ -64,7 +64,7 @@ Full code of demo, please ref to [demo](https://github.com/alibaba-damo-academy/
```
In the case of `wav.scp` input, `output_dir` must be set to save the output results
- `audio_fs`: audio sampling rate, only set when audio_in is pcm audio
- `output_dir`: None (Defalut), the output path of results if set
- `output_dir`: None (Default), the output path of results if set
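The VAD pipeline follows the same pattern; a minimal sketch with the parameters above (the model name and wav path are illustrative):
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Define the VAD pipeline with the FSMN-VAD model from the model zoo.
inference_pipeline = pipeline(
    task=Tasks.voice_activity_detection,
    model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch",
    output_dir=None,  # set a path when decoding wav.scp input
)

# Returns the detected speech segments of the input audio.
segments_result = inference_pipeline(audio_in="vad_example.wav")
print(segments_result)
```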
### Inference with multi-threaded CPUs or multiple GPUs
FunASR also offers the recipe [infer.sh](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/vad/TEMPLATE/infer.sh) for decoding with multi-threaded CPUs or multiple GPUs.

View File

@ -19,7 +19,7 @@ python -m funasr.export.export_model --model-name damo/speech_paraformer-large_a
```
## Install the `funasr_onnx`
## Install `funasr_onnx`
Install from pip:
```shell
@ -46,16 +46,22 @@ pip install -e ./
from funasr_onnx import Paraformer
model_dir = "./export/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
model = Paraformer(model_dir, batch_size=1)
model = Paraformer(model_dir, batch_size=1, quantize=True)
wav_path = ['./export/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/example/asr_example.wav']
result = model(wav_path)
print(result)
```
- Model_dir: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
- Input: wav formt file, support formats: `str, np.ndarray, List[str]`
- Output: `List[str]`: recognition result
- `model_dir`: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
- `batch_size`: `1` (Default), the batch size during inference
- `device_id`: `-1` (Default), infer on CPU. To infer on GPU, set it to the gpu_id (please make sure that you have installed onnxruntime-gpu)
- `quantize`: `False` (Default), load `model.onnx` in `model_dir`. If set to `True`, load `model_quant.onnx` in `model_dir`
- `intra_op_num_threads`: `4` (Default), sets the number of threads used for intraop parallelism on CPU
Input: wav format file, supported types: `str`, `np.ndarray`, `List[str]`
Output: `List[str]`: recognition result
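Since `np.ndarray` input is supported, an in-memory waveform can be decoded directly; a small sketch, assuming `soundfile` is available to load the wav into an array:
```python
import numpy as np
import soundfile as sf  # assumption: used here only to read the wav file

# Load a mono 16 kHz waveform and decode it from memory.
speech, sample_rate = sf.read(wav_path[0])
result = model(speech.astype(np.float32))
print(result)
```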
#### Paraformer-online
@ -71,9 +77,16 @@ model = Fsmn_vad(model_dir)
result = model(wav_path)
print(result)
```
- Model_dir: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
- Input: wav formt file, support formats: `str, np.ndarray, List[str]`
- Output: `List[str]`: recognition result
- `model_dir`: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
- `batch_size`: `1` (Default), the batch size during inference
- `device_id`: `-1` (Default), infer on CPU. To infer on GPU, set it to the gpu_id (please make sure that you have installed onnxruntime-gpu)
- `quantize`: `False` (Default), load `model.onnx` in `model_dir`. If set to `True`, load `model_quant.onnx` in `model_dir`
- `intra_op_num_threads`: `4` (Default), sets the number of threads used for intraop parallelism on CPU
Input: wav format file, supported types: `str`, `np.ndarray`, `List[str]`
Output: the detected voice activity segments
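As a usage sketch, the detected segments can be used to cut the original waveform. This assumes the first result element is a list of `[begin_ms, end_ms]` pairs, as in the FunASR demos, and that `soundfile` is available:
```python
import soundfile as sf  # assumption: used here only to read the wav file

speech, fs = sf.read("vad_example.wav")  # illustrative path
segments = result[0]  # assumed format: [[begin_ms, end_ms], ...]
for begin_ms, end_ms in segments:
    chunk = speech[int(begin_ms * fs / 1000): int(end_ms * fs / 1000)]
    print(f"segment {begin_ms}-{end_ms} ms: {len(chunk)} samples")
```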
#### FSMN-VAD-online
```python
@ -105,9 +118,16 @@ for sample_offset in range(0, speech_length, min(step, speech_length - sample_of
if segments_result:
print(segments_result)
```
- Model_dir: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
- Input: wav formt file, support formats: `str, np.ndarray, List[str]`
- Output: `List[str]`: recognition result
- `model_dir`: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
- `batch_size`: `1` (Default), the batch size during inference
- `device_id`: `-1` (Default), infer on CPU. To infer on GPU, set it to the gpu_id (please make sure that you have installed onnxruntime-gpu)
- `quantize`: `False` (Default), load `model.onnx` in `model_dir`. If set to `True`, load `model_quant.onnx` in `model_dir`
- `intra_op_num_threads`: `4` (Default), sets the number of threads used for intraop parallelism on CPU
Input: wav format file, supported types: `str`, `np.ndarray`, `List[str]`
Output: the detected voice activity segments
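The streaming demo above feeds the waveform to the model chunk by chunk; a self-contained sketch of just that chunking arithmetic (the waveform and chunk size are illustrative, and the model call itself is omitted):
```python
import numpy as np

speech = np.zeros(16000 * 3, dtype=np.float32)  # placeholder: 3 s of 16 kHz audio
speech_length = speech.shape[0]
sample_offset = 0
step = 1600  # 100 ms per chunk at 16 kHz

for sample_offset in range(0, speech_length, min(step, speech_length - sample_offset)):
    is_final = sample_offset + step >= speech_length - 1
    chunk = speech[sample_offset: sample_offset + step]
    # feed `chunk` (with the is_final flag) to the online VAD model here
    print(sample_offset, len(chunk), is_final)
```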
### Punctuation Restoration
#### CT-Transformer
@ -121,9 +141,15 @@ text_in="跨境河流是养育沿岸人民的生命之源长期以来为帮助
result = model(text_in)
print(result[0])
```
- Model_dir: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
- Input: wav formt file, support formats: `str, np.ndarray, List[str]`
- Output: `List[str]`: recognition result
- `model_dir`: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
- `device_id`: `-1` (Default), infer on CPU. To infer on GPU, set it to the gpu_id (please make sure that you have installed onnxruntime-gpu)
- `quantize`: `False` (Default), load `model.onnx` in `model_dir`. If set to `True`, load `model_quant.onnx` in `model_dir`
- `intra_op_num_threads`: `4` (Default), sets the number of threads used for intraop parallelism on CPU
Input: `str`, the raw text of an ASR result
Output: `List[str]`: the punctuated text
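As a composition sketch, the punctuation model can be chained after the ONNX Paraformer from the speech recognition section. The `CT_Transformer` import name, the punctuation model path, and the output indexing are assumptions based on the sections and `List[str]` outputs described above:
```python
from funasr_onnx import Paraformer, CT_Transformer

asr_model_dir = "./export/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
punc_model_dir = "./export/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch"  # illustrative path

asr_model = Paraformer(asr_model_dir)
punc_model = CT_Transformer(punc_model_dir)

raw_text = asr_model(["asr_example.wav"])[0]  # recognition result (List[str])
print(punc_model(raw_text)[0])                # punctuated text
```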
#### CT-Transformer-online
```python
@ -143,9 +169,14 @@ for vad in vads:
print(rec_result_all)
```
- Model_dir: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
- Input: wav formt file, support formats: `str, np.ndarray, List[str]`
- Output: `List[str]`: recognition result
- `model_dir`: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
- `device_id`: `-1` (Default), infer on CPU. To infer on GPU, set it to the gpu_id (please make sure that you have installed onnxruntime-gpu)
- `quantize`: `False` (Default), load `model.onnx` in `model_dir`. If set to `True`, load `model_quant.onnx` in `model_dir`
- `intra_op_num_threads`: `4` (Default), sets the number of threads used for intraop parallelism on CPU
Input: `str`, the raw text of an ASR result
Output: `List[str]`: the punctuated text
## Performance benchmark