diff --git a/README.md b/README.md
index 5d6503d1f..3f6b4349e 100644
--- a/README.md
+++ b/README.md
@@ -9,31 +9,32 @@
|
|
|
|
-|:---------------------------------------------------------------:|:---------------------------------------------------------------:|:--------------------------------------------------------------:|:-------------------------------------------------------:|:-----------------------------------------------------------:|:------------------------------------------------------:|
+|
|
|
|
|
+|:---------------------------------------------------------------:|:--------------------------------------------------------------:|:-------------------------------------------------------:|:-----------------------------------------------------------:|:------------------------------------------------------:|
The contributors can be found in [contributors list](./Acknowledge.md)
@@ -91,12 +162,6 @@ The use of pretraining model is subject to [model license](./MODEL_LICENSE)
year={2023},
booktitle={INTERSPEECH},
}
-@inproceedings{wang2023told,
- author={Jiaming Wang and Zhihao Du and Shiliang Zhang},
- title={{TOLD:} {A} Novel Two-Stage Overlap-Aware Framework for Speaker Diarization},
- year={2023},
- booktitle={ICASSP},
-}
@inproceedings{gao22b_interspeech,
author={Zhifu Gao and ShiLiang Zhang and Ian McLoughlin and Zhijie Yan},
title={{Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition}},
diff --git a/README_zh.md b/README_zh.md
index 051a4d274..554c0b61d 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -18,8 +18,8 @@ FunASR希望在语音识别的学术研究和工业应用之间架起一座桥
| 安装
| 快速开始
| 教程文档
-| 模型仓库
-| 服务部署
+| 模型仓库
+| 服务部署
| 联系我们
@@ -27,16 +27,17 @@ FunASR希望在语音识别的学术研究和工业应用之间架起一座桥
## 核心功能
- FunASR是一个基础语音识别工具包,提供多种功能,包括语音识别(ASR)、语音端点检测(VAD)、标点恢复、语言模型、说话人验证、说话人分离和多人对话语音识别等。FunASR提供了便捷的脚本和教程,支持预训练好的模型的推理与微调。
-- 我们在[ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition)与[huggingface](https://huggingface.co/FunAudio)上发布了大量开源数据集或者海量工业数据训练的模型,可以通过我们的[模型仓库](https://github.com/alibaba-damo-academy/FunASR/blob/main/docs/model_zoo/modelscope_models.md)了解模型的详细信息。代表性的[Paraformer](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)非自回归端到端语音识别模型具有高精度、高效率、便捷部署的优点,支持快速构建语音识别服务,详细信息可以阅读([服务部署文档](funasr/runtime/readme_cn.md))。
+- 我们在[ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition)与[huggingface](https://huggingface.co/FunASR)上发布了大量开源数据集或者海量工业数据训练的模型,可以通过我们的[模型仓库](https://github.com/alibaba-damo-academy/FunASR/blob/main/docs/model_zoo/modelscope_models.md)了解模型的详细信息。代表性的[Paraformer](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)非自回归端到端语音识别模型具有高精度、高效率、便捷部署的优点,支持快速构建语音识别服务,详细信息可以阅读([服务部署文档](runtime/readme_cn.md))。
## 最新动态
-- 20223/10/17: 英文离线文件转写服务一键部署的CPU版本发布,详细信息参阅([一键部署文档](funasr/runtime/docs/SDK_tutorial_en_zh.md))
+- 2023/11/08: 中文离线文件转写服务3.0 CPU版本发布,新增标点大模型、Ngram语言模型与wfst热词,详细信息参阅([一键部署文档](runtime/readme_cn.md#中文离线文件转写服务cpu版本))
+- 2023/10/17: 英文离线文件转写服务一键部署的CPU版本发布,详细信息参阅([一键部署文档](runtime/readme_cn.md#英文离线文件转写服务cpu版本))
- 2023/10/13: [SlideSpeech](https://slidespeech.github.io/): 一个大规模的多模态音视频语料库,主要是在线会议或者在线课程场景,包含了大量与发言人讲话实时同步的幻灯片。
- 2023.10.10: [Paraformer-long-Spk](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr_vad_spk/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/demo.py)模型发布,支持在长语音识别的基础上获取每句话的说话人标签。
- 2023.10.07: [FunCodec](https://github.com/alibaba-damo-academy/FunCodec): FunCodec提供开源模型和训练工具,可以用于音频离散编码,以及基于离散编码的语音识别、语音合成等任务。
-- 2023.09.01: 中文离线文件转写服务2.0 CPU版本发布,新增ffmpeg、时间戳与热词模型支持,详细信息参阅([一键部署文档](funasr/runtime/docs/SDK_tutorial_zh.md))
-- 2023.08.07: 中文实时语音听写服务一键部署的CPU版本发布,详细信息参阅([一键部署文档](funasr/runtime/docs/SDK_tutorial_online_zh.md))
+- 2023.09.01: 中文离线文件转写服务2.0 CPU版本发布,新增ffmpeg、时间戳与热词模型支持,详细信息参阅([一键部署文档](runtime/readme_cn.md#中文离线文件转写服务cpu版本))
+- 2023.08.07: 中文实时语音听写服务一键部署的CPU版本发布,详细信息参阅([一键部署文档](runtime/readme_cn.md#中文实时语音听写服务cpu版本))
- 2023.07.17: BAT一种低延迟低内存消耗的RNN-T模型发布,详细信息参阅([BAT](egs/aishell/bat))
- 2023.06.26: ASRU2023 多通道多方会议转录挑战赛2.0完成竞赛结果公布,详细信息参阅([M2MeT2.0](https://alibaba-damo-academy.github.io/FunASR/m2met2_cn/index.html))
@@ -51,17 +52,17 @@ FunASR开源了大量在工业数据上预训练模型,您可以在[模型许
(注:[🤗]()表示Huggingface模型仓库链接,[⭐]()表示ModelScope模型仓库链接)
-| 模型名字 | 任务详情 | 训练数据 | 参数量 |
-|:------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------:|:------------:|:----:|
-| paraformer-zh ([🤗]() [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) ) | 语音识别,带时间戳输出,非实时 | 60000小时,中文 | 220M |
-| paraformer-zh-spk ([🤗]() [⭐](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary) ) | 分角色语音识别,带时间戳输出,非实时 | 60000小时,中文 | 220M |
-| paraformer-zh-online ([🤗]() [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) ) | 语音识别,实时 | 60000小时,中文 | 220M |
-| paraformer-en ([🤗]() [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) ) | 分角色语音识别,带时间戳输出,非实时 | 50000小时,英文 | 220M |
-| paraformer-en-spk ([🤗]() [⭐]() ) | 语音识别,非实时 | 50000小时,英文 | 220M |
-| conformer-en ([🤗]() [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) ) | 语音识别,非实时 | 50000小时,英文 | 220M |
-| ct-punc ([🤗]() [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) ) | 标点恢复,非实时 | 100M,中文与英文 | 1.1G |
-| fsmn-vad ([🤗]() [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) ) | 语音端点检测,实时 | 5000小时,中文与英文 | 0.4M |
-| fa-zh ([🤗]() [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) ) | 字级别时间戳预测 | 50000小时,中文 | 38M |
+| 模型名字 | 任务详情 | 训练数据 | 参数量 |
+|:---------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------:|:------------:|:----:|
+| paraformer-zh ( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) [🤗]() ) | 语音识别,带时间戳输出,非实时 |  60000小时,中文  | 220M |
+| paraformer-zh-spk ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary) [🤗]() ) | 分角色语音识别,带时间戳输出,非实时 | 60000小时,中文 | 220M |
+| paraformer-zh-online ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗]() ) | 语音识别,实时 | 60000小时,中文 | 220M |
+| paraformer-en ( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗]() ) | 语音识别,非实时 | 50000小时,英文 | 220M |
+| paraformer-en-spk ( [⭐]() [🤗]() )                                                                                   |      分角色语音识别,非实时      |  50000小时,英文  | 220M |
+| conformer-en ( [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗]() ) | 语音识别,非实时 | 50000小时,英文 | 220M |
+| ct-punc ( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗]() ) | 标点恢复 | 100M,中文与英文 | 1.1G |
+| fsmn-vad ( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗]() ) | 语音端点检测,实时 | 5000小时,中文与英文 | 0.4M |
+| fa-zh ( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗]() ) | 字级别时间戳预测 | 50000小时,中文 | 38M |
@@ -116,7 +117,7 @@ FunASR支持预训练或者进一步微调的模型进行服务部署。目前
- 中文离线文件转写服务(GPU版本),进行中
- 更多支持中
-详细信息可以参阅([服务部署文档](funasr/runtime/readme_cn.md))。
+详细信息可以参阅([服务部署文档](runtime/readme_cn.md))。
diff --git a/docs/index.rst b/docs/index.rst
index b79aee099..bf4268b06 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -71,10 +71,10 @@ Overview
:maxdepth: 1
:caption: Runtime and Service
- ./funasr/runtime/readme.md
- ./funasr/runtime/docs/SDK_tutorial_online.md
- ./funasr/runtime/docs/SDK_tutorial.md
- ./funasr/runtime/html5/readme.md
+ ./runtime/readme.md
+ ./runtime/docs/SDK_tutorial_online.md
+ ./runtime/docs/SDK_tutorial.md
+ ./runtime/html5/readme.md
diff --git a/docs/runtime b/docs/runtime
new file mode 120000
index 000000000..3d1f9906c
--- /dev/null
+++ b/docs/runtime
@@ -0,0 +1 @@
+../runtime
\ No newline at end of file
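The new `docs/runtime` entry is a single directory symlink. Its target `../runtime` is resolved relative to the directory that contains the link (`docs/`), not relative to the current working directory. That is why the `toctree` entries in `docs/index.rst` can use `./runtime/...`, and why the per-file symlinks that pointed into `funasr/runtime` are deleted below.

The following is a minimal Python sketch of that path arithmetic. It assumes POSIX-style paths relative to the repository root, and `resolve_link` is a hypothetical helper, not part of FunASR:

```python
import posixpath


def resolve_link(link_path: str, target: str) -> str:
    """Resolve a relative symlink target lexically.

    The target is interpreted relative to the directory that contains the
    link (not the current working directory). normpath() only collapses
    '..' segments textually; it does not follow intermediate symlinks.
    """
    link_dir = posixpath.dirname(link_path)
    return posixpath.normpath(posixpath.join(link_dir, target))


# New layout: one directory symlink, docs/runtime -> ../runtime
print(resolve_link("docs/runtime", "../runtime"))  # runtime

# Old layout: per-file links such as docs/runtime/export.md
print(resolve_link("docs/runtime/export.md", "../../funasr/export/README.md"))
# funasr/export/README.md
```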
diff --git a/docs/runtime/demo.gif b/docs/runtime/demo.gif
deleted file mode 100644
index f487f2c66..000000000
Binary files a/docs/runtime/demo.gif and /dev/null differ
diff --git a/docs/runtime/export.md b/docs/runtime/export.md
deleted file mode 120000
index 91f8b98a0..000000000
--- a/docs/runtime/export.md
+++ /dev/null
@@ -1 +0,0 @@
-../../funasr/export/README.md
\ No newline at end of file
diff --git a/docs/runtime/grpc_cpp.md b/docs/runtime/grpc_cpp.md
deleted file mode 120000
index 590a5f701..000000000
--- a/docs/runtime/grpc_cpp.md
+++ /dev/null
@@ -1 +0,0 @@
-../../funasr/runtime/grpc/Readme.md
\ No newline at end of file
diff --git a/docs/runtime/grpc_python.md b/docs/runtime/grpc_python.md
deleted file mode 120000
index ee8d6ea43..000000000
--- a/docs/runtime/grpc_python.md
+++ /dev/null
@@ -1 +0,0 @@
-../../funasr/runtime/python/grpc/Readme.md
\ No newline at end of file
diff --git a/docs/runtime/html5.md b/docs/runtime/html5.md
deleted file mode 120000
index bf47840ed..000000000
--- a/docs/runtime/html5.md
+++ /dev/null
@@ -1 +0,0 @@
-../../funasr/runtime/html5/readme.md
\ No newline at end of file
diff --git a/docs/runtime/img.png b/docs/runtime/img.png
deleted file mode 100644
index 84e2efe62..000000000
Binary files a/docs/runtime/img.png and /dev/null differ
diff --git a/docs/runtime/libtorch_python.md b/docs/runtime/libtorch_python.md
deleted file mode 120000
index e8d628868..000000000
--- a/docs/runtime/libtorch_python.md
+++ /dev/null
@@ -1 +0,0 @@
-../../funasr/runtime/python/libtorch/README.md
\ No newline at end of file
diff --git a/docs/runtime/onnxruntime_cpp.md b/docs/runtime/onnxruntime_cpp.md
deleted file mode 120000
index 3661d18ef..000000000
--- a/docs/runtime/onnxruntime_cpp.md
+++ /dev/null
@@ -1 +0,0 @@
-../../funasr/runtime/onnxruntime/readme.md
\ No newline at end of file
diff --git a/docs/runtime/onnxruntime_python.md b/docs/runtime/onnxruntime_python.md
deleted file mode 120000
index 693bd5dec..000000000
--- a/docs/runtime/onnxruntime_python.md
+++ /dev/null
@@ -1 +0,0 @@
-../../funasr/runtime/python/onnxruntime/README.md
\ No newline at end of file
diff --git a/docs/runtime/websocket_cpp.md b/docs/runtime/websocket_cpp.md
deleted file mode 120000
index 8a87df5e4..000000000
--- a/docs/runtime/websocket_cpp.md
+++ /dev/null
@@ -1 +0,0 @@
-../../funasr/runtime/websocket/readme.md
\ No newline at end of file
diff --git a/docs/runtime/websocket_python.md b/docs/runtime/websocket_python.md
deleted file mode 120000
index 0fabb8547..000000000
--- a/docs/runtime/websocket_python.md
+++ /dev/null
@@ -1 +0,0 @@
-../../funasr/runtime/python/websocket/README.md
\ No newline at end of file
diff --git a/egs_modelscope/asr/TEMPLATE/README_zh.md b/egs_modelscope/asr/TEMPLATE/README_zh.md
index 0754bc672..583e63aba 100644
--- a/egs_modelscope/asr/TEMPLATE/README_zh.md
+++ b/egs_modelscope/asr/TEMPLATE/README_zh.md
@@ -30,12 +30,10 @@ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model='damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
vad_model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch',
- #punc_model='damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch',
punc_model='damo/punc_ct-transformer_cn-en-common-vocab471067-large',
)
-rec_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/vad_example.wav',
- batch_size_token=5000, batch_size_token_threshold_s=40, max_single_segment_time=6000)
+rec_result = inference_pipeline(audio_in='./vad_example.wav')
print(rec_result)
```
其中:
diff --git a/funasr/quick_start.md b/funasr/quick_start.md
index 202c709ec..6108f020c 100644
--- a/funasr/quick_start.md
+++ b/funasr/quick_start.md
@@ -26,7 +26,7 @@ python funasr_wss_server.py --port 10095
python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass --chunk_size "5,10,5"
```
-For more examples, please refer to [docs](runtime/python/websocket/README.md).
+For more examples, please refer to [docs](../runtime/python/websocket/README.md).
### C++ version Example
@@ -47,7 +47,7 @@ Testing [samples](https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sam
```shell
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass
```
-For more examples, please refer to [docs](runtime/docs/SDK_tutorial_online_zh.md)
+For more examples, please refer to [docs](../runtime/docs/SDK_tutorial_online_zh.md)
#### File Transcription Service, Mandarin (CPU)
@@ -68,7 +68,7 @@ Testing [samples](https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sam
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav"
```
-For more examples, please refer to [docs](runtime/docs/SDK_tutorial_zh.md)
+For more examples, please refer to [docs](../runtime/docs/SDK_tutorial_zh.md)
## Industrial Model Egs
diff --git a/funasr/quick_start_zh.md b/funasr/quick_start_zh.md
index a8d20a22f..9a3c2c94f 100644
--- a/funasr/quick_start_zh.md
+++ b/funasr/quick_start_zh.md
@@ -26,7 +26,7 @@ python funasr_wss_server.py --port 10095
python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass --chunk_size "5,10,5"
#python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass --chunk_size "8,8,4" --audio_in "./data/wav.scp"
```
-更多例子可以参考([点击此处](runtime/python/websocket/README.md))
+更多例子可以参考([点击此处](../runtime/python/websocket/README.md))
#### c++版本示例
@@ -46,7 +46,7 @@ sudo bash funasr-runtime-deploy-online-cpu-zh.sh install --workspace ./funasr-ru
```shell
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass
```
-更多例子参考([点击此处](runtime/docs/SDK_tutorial_online_zh.md))
+更多例子参考([点击此处](../runtime/docs/SDK_tutorial_online_zh.md))
##### 离线文件转写服务部署
###### 服务端部署
@@ -59,7 +59,7 @@ sudo bash funasr-runtime-deploy-offline-cpu-zh.sh install --workspace ./funasr-r
```shell
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav"
```
-更多例子参考([点击此处](runtime/docs/SDK_tutorial_zh.md))
+更多例子参考([点击此处](../runtime/docs/SDK_tutorial_zh.md))
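The link fixes in `funasr/quick_start.md` and `funasr/quick_start_zh.md` prepend `../` for a simple reason. Relative Markdown links resolve from the directory of the file that contains them, and after the move `runtime/` sits at the repository root rather than under `funasr/`.

The following small Python sketch computes such a link with `posixpath.relpath`. It assumes repository-root-relative paths, and `doc_link` is a hypothetical helper, not part of FunASR:

```python
import posixpath


def doc_link(doc_file: str, target: str) -> str:
    """Relative link to `target` as written inside `doc_file`.

    Both paths are relative to the repository root; Markdown resolves
    relative links from the directory of the file containing them.
    """
    doc_dir = posixpath.dirname(doc_file) or "."
    return posixpath.relpath(target, start=doc_dir)


# funasr/quick_start.md sits one level below the root-level runtime/ tree.
print(doc_link("funasr/quick_start.md", "runtime/python/websocket/README.md"))
# ../runtime/python/websocket/README.md

# README_zh.md sits at the root, so no ../ is needed.
print(doc_link("README_zh.md", "runtime/readme_cn.md"))
# runtime/readme_cn.md
```

The same helper explains why the root-level `README_zh.md` links change from `funasr/runtime/...` to plain `runtime/...` rather than gaining a `../` prefix.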
diff --git a/funasr/version.txt b/funasr/version.txt
index 100435be1..ee94dd834 100644
--- a/funasr/version.txt
+++ b/funasr/version.txt
@@ -1 +1 @@
-0.8.2
+0.8.3
diff --git a/runtime/docs/SDK_advanced_guide_offline.md b/runtime/docs/SDK_advanced_guide_offline.md
index dd1372683..6dc97984a 100644
--- a/runtime/docs/SDK_advanced_guide_offline.md
+++ b/runtime/docs/SDK_advanced_guide_offline.md
@@ -4,37 +4,28 @@ FunASR provides a Chinese offline file transcription service that can be deploye
This document serves as a development guide for the FunASR offline file transcription service. If you wish to quickly experience the offline file transcription service, please refer to the one-click deployment example for the FunASR offline file transcription service ([docs](./SDK_tutorial.md)).
-## Installation of Docker
+
-The following steps are for manually installing Docker and Docker images. If your Docker image has already been launched, you can ignore this step.
-### Installation of Docker environment
+| TIME | INFO | IMAGE VERSION | IMAGE ID |
+|------------|----------------------------------------------------------------------------------------------------------------------------------|------------------------------|--------------|
+| 2023.11.08 | supports the punc-large model, the Ngram model, FST hotwords, and server-side hotword loading; adapts to the runtime directory move (FunASR/funasr/runtime -> FunASR/runtime) | funasr-runtime-sdk-cpu-0.3.0 | caa64bddbb43 |
+| 2023.09.19 | supports the ITN model | funasr-runtime-sdk-cpu-0.2.2 | 2c5286be13e9 |
+| 2023.08.22 | integrates ffmpeg to accept various audio and video inputs; supports the nn-hotword model and the timestamp model | funasr-runtime-sdk-cpu-0.2.0 | 1ad3d19e0707 |
+| 2023.07.03 | 1.0 released | funasr-runtime-sdk-cpu-0.1.0 | 1ad3d19e0707 |
+
+## Quick Start
+### Docker installation
+If Docker is already installed, skip this step.
```shell
-# Ubuntu:
-curl -fsSL https://test.docker.com -o test-docker.sh
-sudo sh test-docker.sh
-# Debian:
-curl -fsSL https://get.docker.com -o get-docker.sh
-sudo sh get-docker.sh
-# CentOS:
-curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun
-# MacOS:
-brew install --cask --appdir=/Applications docker
-```
-
-More details could ref to [docs](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html)
-
-### Starting Docker
-
-```shell
-sudo systemctl start docker
+curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/install_docker.sh
+sudo bash install_docker.sh
```
+If the script fails to install Docker, refer to [Docker Installation](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html).
### Pulling and launching images
-
Use the following command to pull and launch the Docker image for the FunASR runtime-SDK:
-
```shell
sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.3.0
@@ -46,11 +37,9 @@ Introduction to command parameters:
-p
+
+| 时间 | 详情 | 镜像版本 | 镜像ID |
+|------------|---------------------------------------------------|------------------------------|--------------|
+| 2023.11.08 | 支持标点大模型、支持Ngram模型、支持fst热词、支持服务端加载热词、runtime结构变化适配 | funasr-runtime-sdk-cpu-0.3.0 | caa64bddbb43 |
+| 2023.09.19 | 支持ITN模型 | funasr-runtime-sdk-cpu-0.2.2 | 2c5286be13e9 |
+| 2023.08.22 | 集成ffmpeg支持多种音视频输入、支持热词模型、支持时间戳模型 | funasr-runtime-sdk-cpu-0.2.0 | 1ad3d19e0707 |
+| 2023.07.03 | 1.0 发布 | funasr-runtime-sdk-cpu-0.1.0 | 1ad3d19e0707 |
+
## 服务器配置
用户可以根据自己的业务需求,选择合适的服务器配置,推荐配置为:
@@ -25,6 +34,7 @@ FunASR提供可一键本地或者云端服务器部署的中文离线文件转
curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/install_docker.sh;
sudo bash install_docker.sh
```
+docker安装失败请参考 [Docker Installation](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html)
### 镜像启动
@@ -38,7 +48,6 @@ sudo docker run -p 10095:10095 -it --privileged=true \
-v $PWD/funasr-runtime-resources/models:/workspace/models \
registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.3.0
```
-如果您没有安装docker,可参考[Docker安装](#Docker安装)
### 服务端启动
@@ -51,15 +60,20 @@ nohup bash run_server.sh \
--model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
--punc-dir damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
--lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
- --itn-dir thuduj12/fst_itn_zh > log.out 2>&1 &
+ --itn-dir thuduj12/fst_itn_zh \
+ --hotword /workspace/models/hotwords.txt > log.out 2>&1 &
# 如果您想关闭ssl,增加参数:--certfile 0
# 如果您想使用时间戳或者nn热词模型进行部署,请设置--model-dir为对应模型:
-# damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx(时间戳)
-# 或者 damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-onnx(热词)
-
+# damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx(时间戳)
+# damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-onnx(nn热词)
+# 如果您想在服务端加载热词,请在宿主机文件./funasr-runtime-resources/models/hotwords.txt配置热词(docker映射地址为/workspace/models/hotwords.txt):
+# 每行一个热词,格式(热词 权重):阿里巴巴 20
```
+如果您想定制ngram,参考文档([如何训练LM](./lm_train_tutorial.md))
+
服务端详细参数介绍可参考[服务端用法详解](#服务端用法详解)
+
### 客户端测试与使用
下载客户端测试工具目录samples
@@ -71,34 +85,6 @@ wget https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_sa
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav"
```
-------------------
-## Docker安装
-
-下述步骤为手动安装docker环境的步骤:
-
-### docker环境安装
-```shell
-# Ubuntu:
-curl -fsSL https://test.docker.com -o test-docker.sh
-sudo sh test-docker.sh
-# Debian:
-curl -fsSL https://get.docker.com -o get-docker.sh
-sudo sh get-docker.sh
-# CentOS:
-curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun
-# MacOS:
-brew install --cask --appdir=/Applications docker
-```
-
-安装详见:https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html
-
-### docker启动
-
-```shell
-sudo systemctl start docker
-```
-
-
## 客户端用法详解
在服务器上完成FunASR服务部署以后,可以通过如下的步骤来测试和使用离线文件转写服务。
@@ -137,7 +123,6 @@ python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline \
```
命令参数说明:
-
```text
--server-ip 为FunASR runtime-SDK服务部署机器ip,默认为本机ip(127.0.0.1),如果client与服务不在同一台服务器,
需要改为部署机器ip
@@ -148,13 +133,11 @@ python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline \
```
### Html网页版
-
在浏览器中打开 html/static/index.html,即可出现如下页面,支持麦克风输入与文件上传,直接进行体验
### Java-client
-
```shell
FunasrWsClient --host localhost --port 10095 --audio_in ./asr_example.wav --mode offline
```
@@ -228,6 +211,7 @@ kill -9 PID
如果,您希望部署您finetune后的模型(例如10epoch.pb),需要手动将模型重命名为model.pb,并将原modelscope中模型model.pb替换掉,将路径指定为`model_dir`即可。
+------------------
## 如何定制服务部署
@@ -244,9 +228,6 @@ https://github.com/alibaba-damo-academy/FunASR/tree/main/runtime/python/websocke
如果您想定义自己的client,参考[websocket通信协议](./websocket_protocol_zh.md)
-
-```
-
### c++ 服务端:
#### VAD
diff --git a/runtime/docs/SDK_advanced_guide_online.md b/runtime/docs/SDK_advanced_guide_online.md
index 3a26db55a..6c973f132 100644
--- a/runtime/docs/SDK_advanced_guide_online.md
+++ b/runtime/docs/SDK_advanced_guide_online.md
@@ -2,18 +2,32 @@
FunASR provides a real-time speech transcription service that can be easily deployed on local or cloud servers, with the FunASR runtime-SDK as the core. It integrates the speech endpoint detection (VAD), Paraformer-large non-streaming speech recognition (ASR), Paraformer-large streaming speech recognition (ASR), punctuation (PUNC), and other related capabilities open-sourced by the speech laboratory of DAMO Academy on the Modelscope community. The software package can perform real-time speech-to-text transcription, and can also accurately transcribe text at the end of sentences for high-precision output. The output text contains punctuation and supports high-concurrency multi-channel requests.
+
+
+| TIME | INFO | IMAGE VERSION | IMAGE ID |
+|------------|-------------------------------------------------------------------------------------|-------------------------------------|--------------|
+| 2023.11.08 | supports server-side hotword loading (updated hotword protocol); adapts to the runtime directory move (FunASR/funasr/runtime -> FunASR/runtime) | funasr-runtime-sdk-online-cpu-0.1.4 | 691974017c38 |
+| 2023.09.19 | supports hotwords, timestamps, and the ITN model in 2pass mode | funasr-runtime-sdk-online-cpu-0.1.2 | 7222c5319bcf |
+| 2023.08.11 | fixes known bugs, including server crashes | funasr-runtime-sdk-online-cpu-0.1.1 | bdbdd0b27dee |
+| 2023.08.07 | 1.0 released | funasr-runtime-sdk-online-cpu-0.1.0 | bdbdd0b27dee |
+
## Quick Start
-### Pull Docker Image
-
-Use the following command to pull and start the FunASR software package docker image:
-
+### Docker installation
+If Docker is already installed, skip this step.
```shell
-sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.3
-mkdir -p ./funasr-runtime-resources/models
-sudo docker run -p 10095:10095 -it --privileged=true -v $PWD/funasr-runtime-resources/models:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.3
+curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/install_docker.sh
+sudo bash install_docker.sh
```
If you do not have Docker installed, please refer to [Docker Installation](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html)
+### Pull Docker Image
+Use the following command to pull and start the FunASR software package docker image:
+```shell
+sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.4
+mkdir -p ./funasr-runtime-resources/models
+sudo docker run -p 10096:10095 -it --privileged=true -v $PWD/funasr-runtime-resources/models:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.4
+```
+
### Launching the Server
After Docker is launched, start the funasr-wss-server-2pass service program:
diff --git a/runtime/docs/SDK_advanced_guide_online_zh.md b/runtime/docs/SDK_advanced_guide_online_zh.md
index ce9165552..d921e3d28 100644
--- a/runtime/docs/SDK_advanced_guide_online_zh.md
+++ b/runtime/docs/SDK_advanced_guide_online_zh.md
@@ -5,29 +5,38 @@ FunASR提供可便捷本地或者云端服务器部署的实时语音听写服
本文档为FunASR实时转写服务开发指南。如果您想快速体验实时语音听写服务,可参考[快速上手](#快速上手)。
+
+
+| 时间 | 详情 | 镜像版本 | 镜像ID |
+|:-----------|:----------------------------------|-------------------------------------|--------------|
+| 2023.11.08 | 支持服务端加载热词(更新热词通信协议)、runtime结构变化适配 | funasr-runtime-sdk-online-cpu-0.1.4 | 691974017c38 |
+| 2023.09.19 | 2pass模式支持热词、时间戳、ITN模型 | funasr-runtime-sdk-online-cpu-0.1.2 | 7222c5319bcf |
+| 2023.08.11 | 修复了部分已知的bug(包括server崩溃等) | funasr-runtime-sdk-online-cpu-0.1.1 | bdbdd0b27dee |
+| 2023.08.07 | 1.0 发布 | funasr-runtime-sdk-online-cpu-0.1.0 | bdbdd0b27dee |
+
+
## 快速上手
### docker安装
如果您已安装docker,忽略本步骤!!
通过下述命令在服务器上安装docker:
```shell
-curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/install_docker.sh;
+curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/install_docker.sh
sudo bash install_docker.sh
```
+docker安装失败请参考 [Docker Installation](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html)
### 镜像启动
-
通过下述命令拉取并启动FunASR软件包的docker镜像:
```shell
sudo docker pull \
- registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.3
+ registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.4
mkdir -p ./funasr-runtime-resources/models
-sudo docker run -p 10095:10095 -it --privileged=true \
+sudo docker run -p 10096:10095 -it --privileged=true \
-v $PWD/funasr-runtime-resources/models:/workspace/models \
- registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.3
+ registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.4
```
-如果您没有安装docker,可参考[Docker安装](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker_zh.html)
### 服务端启动
@@ -40,12 +49,15 @@ nohup bash run_server_2pass.sh \
--model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
--online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
--punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
- --itn-dir thuduj12/fst_itn_zh > log.out 2>&1 &
+ --itn-dir thuduj12/fst_itn_zh \
+ --hotword /workspace/models/hotwords.txt > log.out 2>&1 &
# 如果您想关闭ssl,增加参数:--certfile 0
-# 如果您想使用时间戳或者热词模型进行部署,请设置--model-dir为对应模型:
-# damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx(时间戳)
-# 或者 damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-onnx(热词)
+# 如果您想使用时间戳或者nn热词模型进行部署,请设置--model-dir为对应模型:
+# damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx(时间戳)
+# damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-onnx(nn热词)
+# 如果您想在服务端加载热词,请在宿主机文件./funasr-runtime-resources/models/hotwords.txt配置热词(docker映射地址为/workspace/models/hotwords.txt):
+# 每行一个热词,格式(热词 权重):阿里巴巴 20
```
服务端详细参数介绍可参考[服务端用法详解](#服务端用法详解)
### 客户端测试与使用
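Both server guides now pass `--hotword /workspace/models/hotwords.txt`. This file lives on the host at `./funasr-runtime-resources/models/hotwords.txt` and holds one hotword per line in the form `hotword weight`, for example `阿里巴巴 20`.

The following minimal Python sketch writes and sanity-checks that file before the container is started. It rests on assumptions that do not come from the FunASR runtime source:

- The file is UTF-8.
- Weights are integers.
- The weight is the last whitespace-separated field, so multi-word English hotwords keep their spaces.
- `write_hotwords` and `read_hotwords` are hypothetical helpers.

```python
from pathlib import Path


def write_hotwords(path: Path, hotwords: dict[str, int]) -> None:
    """Write one '<hotword> <weight>' pair per line (assumed UTF-8)."""
    lines = [f"{word} {weight}" for word, weight in hotwords.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def read_hotwords(path: Path) -> dict[str, int]:
    """Parse the file back; the last whitespace-separated field is the weight."""
    hotwords: dict[str, int] = {}
    text = path.read_text(encoding="utf-8")
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.rsplit(maxsplit=1)
        try:
            word, weight = parts[0], int(parts[1])
        except (IndexError, ValueError):
            raise ValueError(
                f"line {lineno}: expected '<hotword> <weight>', got {raw!r}"
            ) from None
        hotwords[word] = weight
    return hotwords


if __name__ == "__main__":
    # Host directory mounted into the container as /workspace/models.
    models_dir = Path("./funasr-runtime-resources/models")
    models_dir.mkdir(parents=True, exist_ok=True)
    hotword_file = models_dir / "hotwords.txt"
    write_hotwords(hotword_file, {"阿里巴巴": 20})  # overwrites any existing file
    print(read_hotwords(hotword_file))  # {'阿里巴巴': 20}
```

Splitting from the right is a deliberate choice. It keeps a hotword such as `machine learning 15` intact, where a plain `split()` would misread it.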
diff --git a/runtime/docs/images/offline_structure.jpg b/runtime/docs/images/offline_structure.jpg
new file mode 100644
index 000000000..772f7c630
Binary files /dev/null and b/runtime/docs/images/offline_structure.jpg differ
diff --git a/runtime/docs/images/online_structure.png b/runtime/docs/images/online_structure.png
new file mode 100644
index 000000000..53731baca
Binary files /dev/null and b/runtime/docs/images/online_structure.png differ
diff --git a/runtime/docs/images/sdk_roadmap.jpg b/runtime/docs/images/sdk_roadmap.jpg
new file mode 100644
index 000000000..b8e501017
Binary files /dev/null and b/runtime/docs/images/sdk_roadmap.jpg differ
diff --git a/runtime/html5/static/index.html b/runtime/html5/static/index.html
index d9c6be794..d98c62bf9 100644
--- a/runtime/html5/static/index.html
+++ b/runtime/html5/static/index.html
@@ -52,7 +52,7 @@
- 中文离线文件转写服务(CPU版本),已完成
- 中文流式语音识别服务(CPU版本),已完成
@@ -17,20 +19,13 @@ SDK 支持以下几种服务部署:
为了支持不同用户的需求,针对不同场景,准备了不同的图文教程:
### 最新动态
-- 2023/11/08: 英文离线文件转写服务 1.1 发布,runtime结构变化适配(FunASR/funasr/runtime->FunASR/runtime),dokcer镜像版本funasr-runtime-sdk-en-cpu-0.1.1 ()
-- 2023/10/16: 英文离线文件转写服务 1.0 发布,dokcer镜像版本funasr-runtime-sdk-en-cpu-0.1.0 (e0de03eb0163),详细文档参考([点击此处](https://mp.weixin.qq.com/s/DZZUTj-6xwFfi-96ml--4A))
+- 2023/11/08: 英文离线文件转写服务 1.1 发布,runtime结构变化适配(FunASR/funasr/runtime->FunASR/runtime),docker镜像版本funasr-runtime-sdk-en-cpu-0.1.1 (27017f70f72a)
+- 2023/10/16: 英文离线文件转写服务 1.0 发布,docker镜像版本funasr-runtime-sdk-en-cpu-0.1.0 (e0de03eb0163),原理介绍文档([点击此处](https://mp.weixin.qq.com/s/DZZUTj-6xwFfi-96ml--4A))
-### 便捷部署教程
-适用场景为,对服务部署SDK无修改需求,部署模型来自于ModelScope,或者用户finetune,详细教程参考([点击此处](./docs/SDK_tutorial_en_zh.md))
+### 部署与开发文档
-### 开发指南
-
-适用场景为,对服务部署SDK有修改需求,部署模型来自于ModelScope,或者用户finetune,详细文档参考([点击此处](./docs/SDK_advanced_guide_offline_en_zh.md))
-
-### 技术原理揭秘
-
-文档介绍了背后技术原理,识别准确率,计算效率等,以及核心优势介绍:便捷、高精度、高效率、长音频链路,详细文档参考([点击此处](https://mp.weixin.qq.com/s/DZZUTj-6xwFfi-96ml--4A))
+部署模型来自于ModelScope,或者用户finetune,支持用户定制服务,详细文档参考([点击此处](./docs/SDK_advanced_guide_offline_en_zh.md))
## 中文实时语音听写服务(CPU版本)
@@ -38,23 +33,16 @@ FunASR实时语音听写服务软件包,既可以实时地进行语音转文
为了支持不同用户的需求,针对不同场景,准备了不同的图文教程:
### 最新动态
-- 2023/11/08: 中文实时语音听写服务 1.4 发布,支持服务端加载热词(更新热词通信协议)、runtime结构变化适配(FunASR/funasr/runtime->FunASR/runtime),dokcer镜像版本funasr-runtime-sdk-online-cpu-0.1.4 ()
+- 2023/11/08: 中文实时语音听写服务 1.4 发布,支持服务端加载热词(更新热词通信协议)、runtime结构变化适配(FunASR/funasr/runtime->FunASR/runtime),docker镜像版本funasr-runtime-sdk-online-cpu-0.1.4 (691974017c38)
- 2023/09/19: 中文实时语音听写服务 1.2 发布,2pass模式支持热词、时间戳、ITN模型,dokcer镜像版本funasr-runtime-sdk-online-cpu-0.1.2 (7222c5319bcf)
- 2023/08/11: 中文实时语音听写服务 1.1 发布,修复了部分已知的bug(包括server崩溃等),dokcer镜像版本funasr-runtime-sdk-online-cpu-0.1.1 (bdbdd0b27dee)
-- 2023/08/07: 中文实时语音听写服务 1.0 发布,dokcer镜像版本funasr-runtime-sdk-online-cpu-0.1.0 (bdbdd0b27dee),详细文档参考([点击此处](https://mp.weixin.qq.com/s/8He081-FM-9IEI4D-lxZ9w))
-
-### 便捷部署教程
-
-适用场景为,对服务部署SDK无修改需求,部署模型来自于ModelScope,或者用户finetune,详细教程参考([点击此处](./docs/SDK_tutorial_online_zh.md))
+- 2023/08/07: 中文实时语音听写服务 1.0 发布,docker镜像版本funasr-runtime-sdk-online-cpu-0.1.0 (bdbdd0b27dee),原理介绍文档([点击此处](https://mp.weixin.qq.com/s/8He081-FM-9IEI4D-lxZ9w))
-### 开发指南
+### 部署与开发文档
-适用场景为,对服务部署SDK有修改需求,部署模型来自于ModelScope,或者用户finetune,详细文档参考([点击此处](./docs/SDK_advanced_guide_online_zh.md))
+部署模型来自于ModelScope,或者用户finetune,支持用户定制服务,详细文档参考([点击此处](./docs/SDK_advanced_guide_online_zh.md))
-### 技术原理揭秘
-
-文档介绍了背后技术原理,识别准确率,计算效率等,以及核心优势介绍:便捷、高精度、高效率、长音频链路,详细文档参考([点击此处](https://mp.weixin.qq.com/s/8He081-FM-9IEI4D-lxZ9w))
## 中文离线文件转写服务(CPU版本)
@@ -63,20 +51,14 @@ FunASR实时语音听写服务软件包,既可以实时地进行语音转文
为了支持不同用户的需求,针对不同场景,准备了不同的图文教程:
### 最新动态
-- 2023/11/08: 中文离线文件转写服务 3.0 发布,支持标点大模型、支持Ngram模型、支持fst热词(更新热词通信协议)、支持服务端加载热词、runtime结构变化适配(FunASR/funasr/runtime->FunASR/runtime),dokcer镜像版本funasr-runtime-sdk-cpu-0.3.0 (),详细文档参考([点击此处]())
+
+- 2023/11/08: 中文离线文件转写服务 3.0 发布,支持标点大模型、支持Ngram模型、支持fst热词(更新热词通信协议)、支持服务端加载热词、runtime结构变化适配(FunASR/funasr/runtime->FunASR/runtime),docker镜像版本funasr-runtime-sdk-cpu-0.3.0 (caa64bddbb43),原理介绍文档([点击此处](https://mp.weixin.qq.com/s/jSbnKw_m31BUUbTukPSOIw))
- 2023/09/19: 中文离线文件转写服务 2.2 发布,支持ITN模型,dokcer镜像版本funasr-runtime-sdk-cpu-0.2.2 (2c5286be13e9)
-- 2023/08/22: 中文离线文件转写服务 2.0 发布,集成ffmpeg支持多种音视频输入、支持热词模型、支持时间戳模型,dokcer镜像版本funasr-runtime-sdk-cpu-0.2.0 (1ad3d19e0707),详细文档参考([点击此处](https://mp.weixin.qq.com/s/oJHe0MKDqTeuIFH-F7GHMg))
-- 2023/07/03: 中文离线文件转写服务 1.0 发布,dokcer镜像版本funasr-runtime-sdk-cpu-0.1.0 (1ad3d19e0707),详细文档参考([点击此处](https://mp.weixin.qq.com/s/DHQwbgdBWcda0w_L60iUww))
+- 2023/08/22: 中文离线文件转写服务 2.0 发布,集成ffmpeg支持多种音视频输入、支持热词模型、支持时间戳模型,docker镜像版本funasr-runtime-sdk-cpu-0.2.0 (1ad3d19e0707),原理介绍文档([点击此处](https://mp.weixin.qq.com/s/oJHe0MKDqTeuIFH-F7GHMg))
+- 2023/07/03: 中文离线文件转写服务 1.0 发布,docker镜像版本funasr-runtime-sdk-cpu-0.1.0 (1ad3d19e0707),原理介绍文档([点击此处](https://mp.weixin.qq.com/s/DHQwbgdBWcda0w_L60iUww))
-### 便捷部署教程
+### 部署与开发文档
-适用场景为,对服务部署SDK无修改需求,部署模型来自于ModelScope,或者用户finetune,详细教程参考([点击此处](./docs/SDK_tutorial_zh.md))
+部署模型来自于ModelScope,或者用户finetune,支持用户定制服务,详细文档参考([点击此处](./docs/SDK_advanced_guide_offline_zh.md))
-### 开发指南
-
-适用场景为,对服务部署SDK有修改需求,部署模型来自于ModelScope,或者用户finetune,详细文档参考([点击此处](./docs/SDK_advanced_guide_offline_zh.md))
-
-### 技术原理揭秘
-
-文档介绍了背后技术原理,识别准确率,计算效率等,以及核心优势介绍:便捷、高精度、高效率、长音频链路,详细文档参考([点击此处](https://mp.weixin.qq.com/s/DHQwbgdBWcda0w_L60iUww))