Merge branch 'main' of github.com:alibaba-damo-academy/FunASR

merge
This commit is contained in:
游雁 2024-09-25 16:34:07 +08:00
commit fc00476a25
2 changed files with 4 additions and 0 deletions

View File

@ -29,6 +29,7 @@
<a name="whats-new"></a>
## What's new:
- 2024/09/25keyword spotting models are new supported. Supports fine-tuning and inference for four models: [fsmn_kws](https://modelscope.cn/models/iic/speech_sanm_kws_phone-xiaoyun-commands-online), [fsmn_kws_mt](https://modelscope.cn/models/iic/speech_sanm_kws_phone-xiaoyun-commands-online), [sanm_kws](https://modelscope.cn/models/iic/speech_sanm_kws_phone-xiaoyun-commands-offline), [sanm_kws_streaming](https://modelscope.cn/models/iic/speech_sanm_kws_phone-xiaoyun-commands-online).
- 2024/07/04[SenseVoice](https://github.com/FunAudioLLM/SenseVoice) is a speech foundation model with multiple speech understanding capabilities, including ASR, LID, SER, and AED.
- 2024/07/01: Offline File Transcription Service GPU 1.1 released, optimize BladeDISC model compatibility issues; ref to ([docs](runtime/readme.md))
- 2024/06/27: Offline File Transcription Service GPU 1.0 released, supporting dynamic batch processing and multi-threading concurrency. In the long audio test set, the single-thread RTF is 0.0076, and multi-threads' speedup is 1200+ (compared to 330+ on CPU); ref to ([docs](runtime/readme.md))
@ -100,6 +101,7 @@ FunASR has open-sourced a large number of pre-trained models on industrial data.
| conformer-en <br> ( [](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗](https://huggingface.co/funasr/conformer-en) ) | speech recognition, non-streaming | 50000 hours, English | 220M |
| ct-punc <br> ( [](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗](https://huggingface.co/funasr/ct-punc) ) | punctuation restoration | 100M, Mandarin and English | 290M |
| fsmn-vad <br> ( [](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗](https://huggingface.co/funasr/fsmn-vad) ) | voice activity detection | 5000 hours, Mandarin and English | 0.4M |
| fsmn-kws <br> ( [](https://modelscope.cn/models/iic/speech_charctc_kws_phone-xiaoyun/summary) ) | keyword spottingstreaming | 5000 hours, Mandarin | 0.7M |
| fa-zh <br> ( [](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗](https://huggingface.co/funasr/fa-zh) ) | timestamp prediction | 5000 hours, Mandarin | 38M |
| cam++ <br> ( [](https://modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) [🤗](https://huggingface.co/funasr/campplus) ) | speaker verification/diarization | 5000 hours | 7.2M |
| Whisper-large-v2 <br> ([⭐](https://www.modelscope.cn/models/iic/speech_whisper-large_asr_multilingual/summary) [🍀](https://github.com/openai/whisper) ) | speech recognition, with timestamps, non-streaming | multilingual | 1550 M |

View File

@ -33,6 +33,7 @@ FunASR希望在语音识别的学术研究和工业应用之间架起一座桥
<a name="最新动态"></a>
## 最新动态
- 2024/09/25新增语音唤醒模型支持[fsmn_kws](https://modelscope.cn/models/iic/speech_sanm_kws_phone-xiaoyun-commands-online), [fsmn_kws_mt](https://modelscope.cn/models/iic/speech_sanm_kws_phone-xiaoyun-commands-online), [sanm_kws](https://modelscope.cn/models/iic/speech_sanm_kws_phone-xiaoyun-commands-offline), [sanm_kws_streaming](https://modelscope.cn/models/iic/speech_sanm_kws_phone-xiaoyun-commands-online) 4个模型的微调和推理。
- 2024/07/04[SenseVoice](https://github.com/FunAudioLLM/SenseVoice) 是一个基础语音理解模型具备多种语音理解能力涵盖了自动语音识别ASR、语言识别LID、情感识别SER以及音频事件检测AED
- 2024/07/01中文离线文件转写服务GPU版本 1.1发布优化bladedisc模型兼容性问题详细信息参阅([部署文档](runtime/readme_cn.md))
- 2024/06/27中文离线文件转写服务GPU版本 1.0发布支持动态batch支持多路并发在长音频测试集上单线RTF为0.0076多线加速比为1200+CPU为330+);详细信息参阅([部署文档](runtime/readme_cn.md))
@ -107,6 +108,7 @@ FunASR开源了大量在工业数据上预训练模型您可以在[模型许
| conformer-en <br> ( [](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗](https://huggingface.co/funasr/conformer-en) ) | 语音识别,非实时 | 50000小时英文 | 220M |
| ct-punc <br> ( [](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗](https://huggingface.co/funasr/ct-punc) ) | 标点恢复 | 100M中文与英文 | 290M |
| fsmn-vad <br> ( [](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗](https://huggingface.co/funasr/fsmn-vad) ) | 语音端点检测,实时 | 5000小时中文与英文 | 0.4M |
| fsmn-kws <br> ( [](https://modelscope.cn/models/iic/speech_charctc_kws_phone-xiaoyun/summary) ) | 语音唤醒,实时 | 5000小时中文 | 0.7M |
| fa-zh <br> ( [](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗](https://huggingface.co/funasr/fa-zh) ) | 字级别时间戳预测 | 50000小时中文 | 38M |
| cam++ <br> ( [](https://modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) [🤗](https://huggingface.co/funasr/campplus) ) | 说话人确认/分割 | 5000小时 | 7.2M |
| Whisper-large-v3 <br> ([⭐](https://www.modelscope.cn/models/iic/Whisper-large-v3/summary) [🍀](https://github.com/openai/whisper) ) | 语音识别,带时间戳输出,非实时 | 多语言 | 1550 M |