update docs and readme

This commit is contained in:
shixian.shi 2023-10-13 15:14:00 +08:00
parent d9c808a8df
commit 9a0bc00e5f
7 changed files with 53 additions and 3 deletions

@ -28,6 +28,7 @@
<a name="whats-new"></a>
## What's new:
- 2023/10/10: The combined ASR-speaker-diarization pipeline [speech_campplus_speaker-diarization_common](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr_vad_spk/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/demo.py) is now released. Try it out to get recognition results with speaker information.
- 2023/10/07: [FunCodec](https://github.com/alibaba-damo-academy/FunCodec): A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec.
- 2023/09/01: The offline file transcription service 2.0 (CPU) of Mandarin has been released, with added support for ffmpeg, timestamp, and hotword models. For more details, please refer to the [Deployment documentation](funasr/runtime/docs/SDK_tutorial.md).
- 2023/08/07: The real-time transcription service (CPU) of Mandarin has been released. For more details, please refer to the [Deployment documentation](funasr/runtime/docs/SDK_tutorial_online.md).

@ -31,8 +31,9 @@ FunASR aims to build a bridge between academic research and industrial applications of speech recognition
<a name="最新动态"></a>
## What's New
- 2023.10.10: The [Paraformer-long-Spk](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr_vad_spk/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/demo.py) model is released, supporting a speaker label for each sentence on top of long-audio recognition.
- 2023.10.07: [FunCodec](https://github.com/alibaba-damo-academy/FunCodec): FunCodec provides open-source models and training tools for discrete audio coding, as well as for speech recognition, speech synthesis, and other tasks based on discrete codes.
- 2023.09.01: The Mandarin offline file transcription service 2.0 (CPU) is released, adding support for ffmpeg, timestamps, and hotword models. For details, see the [one-click deployment documentation](funasr/runtime/docs/SDK_tutorial_zh.md).
- 2023.08.07: The one-click-deployable CPU version of the Mandarin real-time speech dictation service is released. For details, see the [one-click deployment documentation](funasr/runtime/docs/SDK_tutorial_online_zh.md).
- 2023.07.17: BAT, an RNN-T model with low latency and low memory consumption, is released. For details, see [BAT](egs/aishell/bat).
- 2023.07.03: The one-click-deployable CPU version of the Mandarin offline file transcription service is released. For details, see the [one-click deployment documentation](funasr/runtime/docs/SDK_tutorial_zh.md).

@ -17,7 +17,8 @@ Here we provided several pretrained models on different datasets. The details of
| Model Name | Language | Training Data | Vocab Size | Parameter | Offline/Online | Notes |
|:--------------------------------------------------------------------------------------------------------------------------------------------------:|:--------:|:--------------------------------:|:----------:|:---------:|:--------------:|:--------------------------------------------------------------------------------------------------------------------------------|
| [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) | CN & EN | Alibaba Speech Data (60,000 hours) | 8404 | 220M | Offline | Duration of input wav <= 20s |
| [Paraformer-large-long](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) | CN & EN | Alibaba Speech Data (60,000 hours) | 8404 | 220M | Offline | Handles input wav of arbitrary length |
| [Paraformer-large-Spk](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary) | CN & EN | Alibaba Speech Data (60,000 hours) | 8404 | 220M | Offline | Supports speaker diarization for ASR results, based on Paraformer-large-long |
| [Paraformer-large-contextual](https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary) | CN & EN | Alibaba Speech Data (60,000 hours) | 8404 | 220M | Offline | Supports hotword customization based on incentive enhancement, improving hotword recall and precision |
| [Paraformer](https://modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8358-tensorflow1/summary) | CN & EN | Alibaba Speech Data (50,000 hours) | 8358 | 68M | Offline | Duration of input wav <= 20s |
| [Paraformer-online](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/summary) | CN & EN | Alibaba Speech Data (50,000 hours) | 8404 | 68M | Online | Supports streaming input |

@ -17,7 +17,8 @@
| Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
|:--------------------------------------------------------------------------------------------------------------------------------------------------:|:--------:|:---------------------:|:-----------------:|:----:|:-------:|:---------------------------|
| [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) | CN & EN | Alibaba Speech Data (60,000 hours) | 8404 | 220M | Offline | Duration of input wav <= 20s |
| [Paraformer-large-long](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) | CN & EN | Alibaba Speech Data (60,000 hours) | 8404 | 220M | Offline | Handles input wav of arbitrary length |
| [Paraformer-large-Spk](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary) | CN & EN | Alibaba Speech Data (60,000 hours) | 8404 | 220M | Offline | Adds speaker diarization on top of the long-audio model |
| [Paraformer-large-contextual](https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary) | CN & EN | Alibaba Speech Data (60,000 hours) | 8404 | 220M | Offline | Supports hotword customization based on incentive enhancement, improving hotword recall and precision; duration of input wav <= 20s |
| [Paraformer](https://modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8358-tensorflow1/summary) | CN & EN | Alibaba Speech Data (50,000 hours) | 8358 | 68M | Offline | Duration of input wav <= 20s |
| [Paraformer-online](https://modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/summary) | CN & EN | Alibaba Speech Data (50,000 hours) | 8404 | 68M | Online | Supports streaming input |

@ -99,6 +99,28 @@ print(rec_result)
```
The `fast` and `normal` decoding modes are fake streaming and can be used to evaluate recognition accuracy.
For the full demo code, please refer to [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/151)
#### [Paraformer-Spk](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary)
This model returns recognition results together with speaker information for each sentence. Refer to [CAM++](https://modelscope.cn/models/damo/speech_campplus_speaker-diarization_common/summary) for details about the speaker diarization model.
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
if __name__ == '__main__':
    audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_speaker_demo.wav'
    output_dir = "./results"
    inference_pipeline = pipeline(
        task=Tasks.auto_speech_recognition,
        model='damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn',
        model_revision='v0.0.2',
        vad_model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch',
        punc_model='damo/punc_ct-transformer_cn-en-common-vocab471067-large',
        output_dir=output_dir,
    )
    rec_result = inference_pipeline(audio_in=audio_in, batch_size_token=5000,
                                    batch_size_token_threshold_s=40,
                                    max_single_segment_time=6000)
    print(rec_result)
```
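As a usage illustration, here is a minimal sketch of merging consecutive sentences from the same speaker into turns. The `sentences` list with `spk` and `text` keys is an assumed shape of the pipeline output, used here for illustration only, not a documented schema:

```python
# Hypothetical result shape; the actual keys returned by the pipeline may differ.
rec_result = {
    "sentences": [
        {"spk": 0, "text": "hello"},
        {"spk": 0, "text": "nice to meet you"},
        {"spk": 1, "text": "nice to meet you too"},
    ],
}

def group_by_speaker(result):
    """Merge consecutive sentences that share the same speaker label."""
    turns = []
    for sent in result.get("sentences", []):
        if turns and turns[-1][0] == sent["spk"]:
            # Same speaker as the previous sentence: append to the current turn.
            turns[-1] = (sent["spk"], turns[-1][1] + " " + sent["text"])
        else:
            # Speaker changed: start a new turn.
            turns.append((sent["spk"], sent["text"]))
    return turns

print(group_by_speaker(rec_result))
```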
#### [RNN-T-online model]()
To be released.

@ -100,6 +100,29 @@ print(rec_result)
The `fast` and `normal` decoding modes are fake streaming and can be used to evaluate recognition accuracy.
For the full demo code, please refer to [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/151)
#### [Paraformer-Spk model](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary)
Returns the recognition results along with a speaker label for each sentence. For details on the speaker diarization model, see [CAM++](https://modelscope.cn/models/damo/speech_campplus_speaker-diarization_common/summary).
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
if __name__ == '__main__':
    audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_speaker_demo.wav'
    output_dir = "./results"
    inference_pipeline = pipeline(
        task=Tasks.auto_speech_recognition,
        model='damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn',
        model_revision='v0.0.2',
        vad_model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch',
        punc_model='damo/punc_ct-transformer_cn-en-common-vocab471067-large',
        output_dir=output_dir,
    )
    rec_result = inference_pipeline(audio_in=audio_in, batch_size_token=5000,
                                    batch_size_token_threshold_s=40,
                                    max_single_segment_time=6000)
    print(rec_result)
```
#### [RNN-T-online model]()
To be released.

@ -0,0 +1 @@
../asr/TEMPLATE