funasr streaming sdk

游雁 2023-08-07 09:23:17 +08:00
parent c4ad81ff6f
commit 6836bfb77f
7 changed files with 772 additions and 36 deletions


@ -28,7 +28,7 @@
<a name="whats-new"></a>
## What's new:
- 2023/08/07: The real-time transcription service (CPU) of Mandarin has been released. For more details, please refer to ([Deployment documentation](funasr/runtime/docs/SDK_tutorial_online_zh.md)).
- 2023/08/07: The real-time transcription service (CPU) of Mandarin has been released. For more details, please refer to ([Deployment documentation](funasr/runtime/docs/SDK_tutorial_online.md)).
- 2023/07/17: BAT is released, which is a low-latency and low-memory-consumption RNN-T model. For more details, please refer to ([BAT](egs/aishell/bat)).
- 2023/07/03: The offline file transcription service (CPU) of Mandarin has been released. For more details, please refer to ([Deployment documentation](funasr/runtime/docs/SDK_tutorial.md)).
- 2023/06/26: ASRU2023 Multi-Channel Multi-Party Meeting Transcription Challenge 2.0 completed the competition and announced the results. For more details, please refer to ([M2MeT2.0](https://alibaba-damo-academy.github.io/FunASR/m2met2/index.html)).


@ -0,0 +1,259 @@
# Advanced Development Guide (File transcription service)
FunASR provides a Chinese offline file transcription service that can be deployed locally or on a cloud server with just one click. The core of the service is the FunASR runtime SDK, which has been open-sourced. FunASR-runtime combines various capabilities such as speech endpoint detection (VAD), large-scale speech recognition (ASR) using Paraformer-large, and punctuation detection (PUNC), which have all been open-sourced by the speech laboratory of DAMO Academy on the Modelscope community. This enables accurate and efficient high-concurrency transcription of audio files.
This document serves as a development guide for the FunASR offline file transcription service. If you wish to quickly experience the offline file transcription service, please refer to the one-click deployment example for the FunASR offline file transcription service ([docs](./SDK_tutorial.md)).
## Installation of Docker
The following steps are for manually installing Docker and Docker images. If your Docker image has already been launched, you can ignore this step.
### Installation of Docker environment
```shell
# Ubuntu
curl -fsSL https://test.docker.com -o test-docker.sh
sudo sh test-docker.sh
# Debian
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# CentOS
curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun
# MacOS
brew install --cask --appdir=/Applications docker
```
For more details, refer to the [docs](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html).
### Starting Docker
```shell
sudo systemctl start docker
```
### Pulling and launching images
Use the following command to pull and launch the Docker image for the FunASR runtime-SDK:
```shell
sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-latest
sudo docker run -p 10095:10095 -it --privileged=true -v /root:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-latest
```
Introduction to command parameters:
```text
-p <host port>:<mapped docker port>: In the example, host machine (ECS) port 10095 is mapped to port 10095 in the Docker container. Make sure that port 10095 is open in the ECS security rules.
-v <host path>:<mounted Docker path>: In the example, the host machine path /root is mounted to the Docker path /workspace/models.
```
## Starting the server
Use the following script to start the server:
```shell
./run_server.sh --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
--model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
--punc-dir damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx
```
More details about the script run_server.sh:
The FunASR-wss-server supports downloading models from Modelscope. You can set the model download address (--download-model-dir, default is /workspace/models) and the model ID (--model-dir, --vad-dir, --punc-dir). Here is an example:
```shell
cd /workspace/FunASR/funasr/runtime/websocket/build/bin
./funasr-wss-server \
--download-model-dir /workspace/models \
--model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
--vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
--punc-dir damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx \
--decoder-thread-num 32 \
--io-thread-num 8 \
--port 10095 \
--certfile ../../../ssl_key/server.crt \
--keyfile ../../../ssl_key/server.key
```
Introduction to command parameters:
```text
--download-model-dir: Model download address, download models from Modelscope by setting the model ID.
--model-dir: Modelscope model ID.
--quantize: True for quantized ASR model, False for non-quantized ASR model. Default is True.
--vad-dir: Modelscope model ID.
--vad-quant: True for quantized VAD model, False for non-quantized VAD model. Default is True.
--punc-dir: Modelscope model ID.
--punc-quant: True for quantized PUNC model, False for non-quantized PUNC model. Default is True.
--port: Port number that the server listens on. Default is 10095.
--decoder-thread-num: Number of inference threads that the server starts. Default is 8.
--io-thread-num: Number of IO threads that the server starts. Default is 1.
--certfile <string>: SSL certificate file. Default is ../../../ssl_key/server.crt.
--keyfile <string>: SSL key file. Default is ../../../ssl_key/server.key.
```
The FunASR-wss-server also supports loading models from a local path (see Preparing Model Resources for detailed instructions on preparing local model resources). Here is an example:
```shell
cd /workspace/FunASR/funasr/runtime/websocket/build/bin
./funasr-wss-server \
--model-dir /workspace/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
--vad-dir /workspace/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
--punc-dir /workspace/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx \
--decoder-thread-num 32 \
--io-thread-num 8 \
--port 10095 \
--certfile ../../../ssl_key/server.crt \
--keyfile ../../../ssl_key/server.key
```
## Preparing Model Resources
If you choose to download models from Modelscope through the FunASR-wss-server, you can skip this step. The vad, asr, and punc model resources in the offline file transcription service of FunASR are all from Modelscope. The model addresses are shown in the table below:
| Model | Modelscope url |
|-------|------------------------------------------------------------------------------------------------------------------|
| VAD | https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary |
| ASR | https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary |
| PUNC | https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary |
The offline file transcription service deploys quantized ONNX models. Below are instructions on how to export ONNX models and their quantization. You can choose to export ONNX models from Modelscope, local files, or finetuned resources:
### Exporting ONNX models from Modelscope
Download the corresponding model with the given model name from the Modelscope website, and then export the quantized ONNX model
```shell
python -m funasr.export.export_model \
--export-dir ./export \
--type onnx \
--quantize True \
--model-name damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch \
--model-name damo/speech_fsmn_vad_zh-cn-16k-common-pytorch \
--model-name damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch
```
Introduction to command parameters:
```text
--model-name: The name of the model on Modelscope, for example: damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
--export-dir: The export directory of ONNX model.
--type: Export format; currently supports onnx and torch.
--quantize: Whether to export an int8-quantized model (True or False).
```
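The export command above repeats `--model-name` three times. Whether the exporter processes every repeated flag is not stated here, so the following is a minimal sketch that assumes one model per invocation and builds one command line per model. The helper name `build_export_commands` is hypothetical; the flags and model names are the ones shown above.

```python
import shlex
import sys

# Model names taken from the export command above.
MODEL_NAMES = [
    "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
    "damo/speech_fsmn_vad_zh-cn-16k-common-pytorch",
    "damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
]


def build_export_commands(model_names, export_dir="./export", quantize=True):
    """Return one argv list per model (hypothetical helper, not part of FunASR)."""
    commands = []
    for name in model_names:
        commands.append([
            sys.executable, "-m", "funasr.export.export_model",
            "--model-name", name,
            "--export-dir", export_dir,
            "--type", "onnx",
            "--quantize", str(quantize),
        ])
    return commands


if __name__ == "__main__":
    for argv in build_export_commands(MODEL_NAMES):
        # Only prints the command; pass argv to subprocess.run(argv, check=True) to execute it.
        print(shlex.join(argv))
```

Printing the commands first makes it easy to review them before running the export, which needs FunASR installed.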
### Exporting ONNX models from local files
Set the model name to the local path of the model, and export the quantized ONNX model:
```shell
python -m funasr.export.export_model --model-name /workspace/models/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch --export-dir ./export --type onnx --quantize True
```
### Exporting models from finetuned resources
If you want to deploy a finetuned model, you can follow these steps:
Rename the model you want to deploy after finetuning (for example, 10epoch.pb) to model.pb, and replace the original model.pb in Modelscope with this one. If the path of the replaced model is /path/to/finetune/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch, use the following command to convert the finetuned model to an ONNX model:
```shell
python -m funasr.export.export_model --model-name /path/to/finetune/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch --export-dir ./export --type onnx --quantize True
```
## Starting the client
After completing the deployment of FunASR offline file transcription service on the server, you can test and use the service by following these steps. Currently, FunASR-bin supports multiple ways to start the client. The following are command-line examples based on python-client, c++-client, and custom client Websocket communication protocol:
### python-client
```shell
python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "./data/wav.scp" --send_without_sleep --output_dir "./results"
```
Introduction to command parameters:
```text
--host: the IP address of the server. It can be set to 127.0.0.1 for local testing.
--port: the port number of the server listener.
--audio_in: the audio input. Input can be a path to a wav file or a wav.scp file (a Kaldi-formatted wav list in which each line includes a wav_id followed by a tab and a wav_path).
--output_dir: the path to the recognition result output.
--ssl: whether to use SSL encryption. The default is to use SSL.
--mode: offline mode.
```
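As a rough illustration of the wav.scp format described for `--audio_in`, the sketch below parses a list in which each line holds a wav_id and a wav_path. The function names are hypothetical, and splitting on any whitespace (not only tabs) is an assumption made for leniency.

```python
from pathlib import Path


def parse_wav_scp(text):
    """Parse wav.scp content into (wav_id, wav_path) pairs."""
    entries = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace so paths may contain spaces.
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected '<wav_id> <wav_path>'")
        entries.append((parts[0], parts[1]))
    return entries


def resolve_audio_in(audio_in):
    """Return (wav_id, wav_path) pairs for a wav.scp list or a single audio path."""
    path = Path(audio_in)
    if path.suffix == ".scp":
        return parse_wav_scp(path.read_text(encoding="utf-8"))
    # A single file: use the file name without extension as the id.
    return [(path.stem, audio_in)]
```

For example, `resolve_audio_in("./data/wav.scp")` would return every listed file, while a plain wav path yields a single pair.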
### c++-client
```shell
./funasr-wss-client --server-ip 127.0.0.1 --port 10095 --wav-path test.wav --thread-num 1 --is-ssl 1
```
Introduction to command parameters:
```text
--server-ip: the IP address of the server. It can be set to 127.0.0.1 for local testing.
--port: the port number of the server listener.
--wav-path: the audio input, given as a path to a wav file.
--thread-num: the number of client threads.
--is-ssl: whether to use SSL encryption; 1 enables it.
```
### Custom client
If you want to define your own client, the Websocket communication protocol is as follows:
```text
# First communication
{"mode": "offline", "wav_name": wav_name, "is_speaking": True}
# Send wav data
Bytes data
# Send end flag
{"is_speaking": False}
```
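The message order above can be sketched without any network code. The generator below only produces the frames a custom client would send, in protocol order; the function name and the 3200-byte chunk size are illustrative assumptions, not protocol requirements. Note that once serialized as JSON, `True` and `False` become `true` and `false`.

```python
import json


def offline_frames(wav_name, audio_bytes, chunk_bytes=3200):
    """Yield the frames of one offline request in protocol order.

    str items are JSON text messages; bytes items are binary audio frames.
    chunk_bytes is an arbitrary illustrative size.
    """
    # First communication: announce the mode and the file name.
    yield json.dumps({"mode": "offline", "wav_name": wav_name, "is_speaking": True})
    # Send wav data as binary frames.
    for start in range(0, len(audio_bytes), chunk_bytes):
        yield audio_bytes[start:start + chunk_bytes]
    # Send the end flag.
    yield json.dumps({"is_speaking": False})


if __name__ == "__main__":
    fake_audio = bytes(8000)  # placeholder bytes standing in for real wav data
    for frame in offline_frames("demo", fake_audio):
        kind = "text" if isinstance(frame, str) else "binary"
        print(kind, len(frame))
```

Each yielded item would then be passed to the send call of whichever WebSocket library the client uses.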
## How to customize service deployment
The code for FunASR-runtime is open source. If the server and client cannot fully meet your needs, you can further develop them based on your own requirements:
### C++ client
https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/websocket
### Python client
https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/websocket
### C++ server
#### VAD
```c++
// Using the VAD model takes two steps: FsmnVadInit and FsmnVadInfer.
FUNASR_HANDLE vad_handle=FsmnVadInit(model_path, thread_num);
// Where: model_path contains "model-dir" and "quantize", and thread_num is the ONNX thread count.
FUNASR_RESULT result=FsmnVadInfer(vad_handle, wav_file.c_str(), NULL, 16000);
// Where: vad_handle is the return value of FsmnVadInit, wav_file is the path to the audio file, and 16000 is the sampling rate in Hz.
```
See the usage example for details [docs](https://github.com/alibaba-damo-academy/FunASR/blob/main/funasr/runtime/onnxruntime/bin/funasr-onnx-offline-vad.cpp)
#### ASR
```c++
// Using the ASR model takes two steps: FunOfflineInit and FunOfflineInfer.
FUNASR_HANDLE asr_handle=FunOfflineInit(model_path, thread_num);
// Where: model_path contains "model-dir" and "quantize", and thread_num is the ONNX thread count.
FUNASR_RESULT result=FunOfflineInfer(asr_handle, wav_file.c_str(), RASR_NONE, NULL, 16000);
// Where: asr_handle is the return value of FunOfflineInit, wav_file is the path to the audio file, and 16000 is the sampling rate in Hz.
```
See the usage example for details, [docs](https://github.com/alibaba-damo-academy/FunASR/blob/main/funasr/runtime/onnxruntime/bin/funasr-onnx-offline.cpp)
#### PUNC
```c++
// Using the PUNC model takes two steps: CTTransformerInit and CTTransformerInfer.
FUNASR_HANDLE punc_handle=CTTransformerInit(model_path, thread_num);
// Where: model_path contains "model-dir" and "quantize", and thread_num is the ONNX thread count.
FUNASR_RESULT result=CTTransformerInfer(punc_handle, txt_str.c_str(), RASR_NONE, NULL);
// Where: punc_handle is the return value of CTTransformerInit, and txt_str is the input text.
```
See the usage example for details, [docs](https://github.com/alibaba-damo-academy/FunASR/blob/main/funasr/runtime/onnxruntime/bin/funasr-onnx-offline-punc.cpp)


@ -0,0 +1,289 @@
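# FunASR Real-time Speech Transcription Service Development Guide
FunASR provides a real-time speech transcription service that can be easily deployed locally or on a cloud server. Its core is the open-source FunASR runtime-SDK.
It integrates capabilities open-sourced by the speech laboratory of DAMO Academy in the Modelscope community, including voice activity detection (VAD), Paraformer-large non-streaming speech recognition (ASR), Paraformer-large streaming speech recognition (ASR), and punctuation restoration (PUNC). The package transcribes speech to text in real time, corrects the output with high-accuracy transcription at the end of each sentence, outputs punctuated text, and supports highly concurrent requests.
This document is the development guide for the FunASR real-time speech transcription service. If you want to try the service quickly, see [Quick Start](#quick-start).
## Quick Start
### Launching the image
Use the following commands to pull and launch the Docker image of the FunASR runtime-SDK: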
```shell
sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.0
sudo docker run -p 10095:10095 -it --privileged=true -v /root:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.0
```
If Docker is not installed, see [Docker Installation](#Docker安装).
### Starting the server
After the Docker container starts, launch the funasr-wss-server service program:
```shell
cd FunASR/funasr/runtime
./run_server.sh \
--download-model-dir /workspace/models \
--vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
--model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
--punc-dir damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx
```
For details on the server parameters, see [Server parameters](#服务端参数介绍).
### Client testing and usage
Download the client test tool directory `samples`:
```shell
wget https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_samples.tar.gz
```
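We use the Python client as an example. It supports multiple audio formats (.wav, .pcm, .mp3, etc.), video input (.mp4, etc.), and multi-file wav.scp lists. For other clients, see [Detailed client usage](#客户端用法详解); for customized service deployment, see [How to customize service deployment](#如何定制服务部署).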
```shell
python3 wss_client_asr.py --host "127.0.0.1" --port 10095 --mode 2pass
```
------------------
<a name="Docker安装"></a>
## Docker Installation
The following steps install the Docker environment manually.
### Installing the Docker environment
```shell
# Ubuntu
curl -fsSL https://test.docker.com -o test-docker.sh
sudo sh test-docker.sh
# Debian
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# CentOS
curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun
# MacOS
brew install --cask --appdir=/Applications docker
```
For installation details, see https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html
### Starting Docker
```shell
sudo systemctl start docker
```
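<a name="客户端用法详解"></a>
## Detailed Client Usage
After deploying the FunASR service on the server, you can test and use it through the following steps.
Clients are currently available in the following programming languages:
- [Python](#python-client)
- [CPP](#cpp-client)
- [HTML web page](#Html网页版)
- [Java](#Java-client)
### python-client
To run the client directly for testing, refer to the following brief instructions, using the Python version as an example: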
```shell
python3 wss_client_asr.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav" --output_dir "./results"
```
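Command parameters:
```text
--host: IP address of the machine where the FunASR runtime-SDK service is deployed. Defaults to the local IP 127.0.0.1; if the client and the service are on different machines, set it to the IP of the deployment machine.
--port: deployment port number, 10095.
--mode: offline means offline file transcription.
--audio_in: audio to transcribe; accepts a file path or a wav.scp file list.
--output_dir: path where recognition results are saved.
```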
### cpp-client
After entering the samples/cpp directory, you can test with the C++ client using the following command:
```shell
./funasr-wss-client --server-ip 127.0.0.1 --port 10095 --wav-path ../audio/asr_example.wav
```
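Command parameters:
```text
--server-ip: IP address of the machine where the FunASR runtime-SDK service is deployed. Defaults to the local IP 127.0.0.1; if the client and the service are on different machines, set it to the IP of the deployment machine.
--port: deployment port number, 10095.
--wav-path: audio file to transcribe; accepts a file path.
```
<a name="Html网页版"></a>
### HTML web page
Open html/static/index.html in a browser to see the page below, which supports microphone input and file upload for direct testing: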
<img src="images/html.png" width="900"/>
### Java-client
```shell
FunasrWsClient --host localhost --port 10095 --audio_in ./asr_example.wav --mode offline
```
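For details, see the [documentation](../java/readme.md).
<a name="服务端参数介绍"></a>
## Server parameters
funasr-wss-server supports downloading models from Modelscope. Set the model download directory (--download-model-dir, default /workspace/models) and the model IDs (--model-dir, --vad-dir, --punc-dir). For example: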
```shell
cd /workspace/FunASR/funasr/runtime/websocket/build/bin
./funasr-wss-server \
--download-model-dir /workspace/models \
--model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
--vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
--punc-dir damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx \
--decoder-thread-num 32 \
--io-thread-num 8 \
--port 10095 \
--certfile ../../../ssl_key/server.crt \
--keyfile ../../../ssl_key/server.key
```
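Command parameters:
```text
--download-model-dir: model download directory; models are downloaded from Modelscope by setting the model ID
--model-dir: Modelscope model ID
--quantize: True for the quantized ASR model, False for the non-quantized ASR model; default is True
--vad-dir: Modelscope model ID
--vad-quant: True for the quantized VAD model, False for the non-quantized VAD model; default is True
--punc-dir: Modelscope model ID
--punc-quant: True for the quantized PUNC model, False for the non-quantized PUNC model; default is True
--port: port the server listens on; default is 10095
--decoder-thread-num: number of inference threads the server starts; default is 8
--io-thread-num: number of IO threads the server starts; default is 1
--certfile: SSL certificate file; default is ../../../ssl_key/server.crt
--keyfile: SSL key file; default is ../../../ssl_key/server.key
```
funasr-wss-server can also load models from a local path (for preparing local model resources, see [Preparing model resources](#模型资源准备)). For example: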
```shell
cd /workspace/FunASR/funasr/runtime/websocket/build/bin
./funasr-wss-server \
--model-dir /workspace/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
--vad-dir /workspace/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
--punc-dir /workspace/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx \
--decoder-thread-num 32 \
--io-thread-num 8 \
--port 10095 \
--certfile ../../../ssl_key/server.crt \
--keyfile ../../../ssl_key/server.key
```
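Command parameters:
```text
--model-dir: ASR model path; default is /workspace/models/asr
--quantize: True for the quantized ASR model, False for the non-quantized ASR model; default is True
--vad-dir: VAD model path; default is /workspace/models/vad
--vad-quant: True for the quantized VAD model, False for the non-quantized VAD model; default is True
--punc-dir: PUNC model path; default is /workspace/models/punc
--punc-quant: True for the quantized PUNC model, False for the non-quantized PUNC model; default is True
--port: port the server listens on; default is 10095
--decoder-thread-num: number of inference threads the server starts; default is 8
--io-thread-num: number of IO threads the server starts; default is 1
--certfile: SSL certificate file; default is ../../../ssl_key/server.crt
--keyfile: SSL key file; default is ../../../ssl_key/server.key
```
<a name="模型资源准备"></a>
## Preparing model resources
If you choose to download models from Modelscope through funasr-wss-server, you can skip this step.
The vad, asr, and punc model resources used by the FunASR service all come from Modelscope. The model addresses are listed in the table below:
| Model | Modelscope link |
|------|---------------------------------------------------------------------------------------------------------------|
| VAD | https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx/summary |
| ASR | https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx/summary |
| PUNC | https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx/summary |
The service deploys quantized ONNX models. The following describes how to export ONNX models and quantize them. You can export ONNX models from Modelscope or from finetuned resources:
### Exporting ONNX models from Modelscope
Download the model with the corresponding model name from the Modelscope website, then export the quantized ONNX model: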
```shell
python -m funasr.export.export_model \
--export-dir ./export \
--type onnx \
--quantize True \
--model-name damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch \
--model-name damo/speech_fsmn_vad_zh-cn-16k-common-pytorch \
--model-name damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch
```
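Command parameters:
```text
--model-name: name of the model on Modelscope, for example damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
--export-dir: export directory for the ONNX model
--type: export format; currently supports ONNX and torch
--quantize: int8 model quantization
```
### Exporting models from finetuned resources
If you want to deploy a finetuned model, follow these steps:
Rename the finetuned model you want to deploy (for example, 10epoch.pb) to model.pb, and replace the original model.pb from Modelscope with it. If the path of the replaced model is /path/to/finetune/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch, use the following command to convert the finetuned model to an ONNX model: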
```shell
python -m funasr.export.export_model --model-name /path/to/finetune/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch --export-dir ./export --type onnx --quantize True
```
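<a name="如何定制服务部署"></a>
## How to customize service deployment
The FunASR-runtime code is open source. If the server and client do not fully meet your needs, you can develop them further to suit your requirements:
### C++ client: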
https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/websocket
### Python client:
https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/websocket
### Custom client:
If you want to write your own client, the websocket communication protocol is as follows:
```text
# First communication
{"mode": "offline", "wav_name": wav_name, "is_speaking": True}
# Send wav data
Bytes data
# Send end flag
{"is_speaking": False}
```
### C++ server:
#### VAD
```c++
// Using the VAD model takes two steps: FsmnVadInit and FsmnVadInfer.
FUNASR_HANDLE vad_handle=FsmnVadInit(model_path, thread_num);
// Where: model_path contains "model-dir" and "quantize", and thread_num is the ONNX thread count.
FUNASR_RESULT result=FsmnVadInfer(vad_handle, wav_file.c_str(), NULL, 16000);
// Where: vad_handle is the return value of FsmnVadInit, wav_file is the audio path, and 16000 is the sampling rate in Hz.
```
For a usage example, see https://github.com/alibaba-damo-academy/FunASR/blob/main/funasr/runtime/onnxruntime/bin/funasr-onnx-online-vad.cpp
#### ASR
```c++
// Using the ASR model takes two steps: FunOfflineInit and FunOfflineInfer.
FUNASR_HANDLE asr_handle=FunOfflineInit(model_path, thread_num);
// Where: model_path contains "model-dir" and "quantize", and thread_num is the ONNX thread count.
FUNASR_RESULT result=FunOfflineInfer(asr_handle, wav_file.c_str(), RASR_NONE, NULL, 16000);
// Where: asr_handle is the return value of FunOfflineInit, wav_file is the audio path, and 16000 is the sampling rate in Hz.
```
For a usage example, see https://github.com/alibaba-damo-academy/FunASR/blob/main/funasr/runtime/onnxruntime/bin/funasr-onnx-offline.cpp
#### PUNC
```c++
// Using the PUNC model takes two steps: CTTransformerInit and CTTransformerInfer.
FUNASR_HANDLE punc_handle=CTTransformerInit(model_path, thread_num);
// Where: model_path contains "model-dir" and "quantize", and thread_num is the ONNX thread count.
FUNASR_RESULT result=CTTransformerInfer(punc_handle, txt_str.c_str(), RASR_NONE, NULL);
// Where: punc_handle is the return value of CTTransformerInit, and txt_str is the input text.
```
For a usage example, see https://github.com/alibaba-damo-academy/FunASR/blob/main/funasr/runtime/onnxruntime/bin/funasr-onnx-online-punc.cpp


@ -0,0 +1,196 @@
([简体中文](./SDK_tutorial_online_zh.md)|English)
# FunASR Real-time Speech Transcription: Convenient Deployment Tutorial
FunASR offers a real-time speech-to-text service that can be easily deployed locally or on cloud servers. The service integrates capabilities open-sourced by the speech laboratory of DAMO Academy on ModelScope, including voice activity detection (VAD), Paraformer-large non-streaming automatic speech recognition (ASR), Paraformer-large streaming ASR, and punctuation recovery (PUNC). The software package performs real-time speech-to-text conversion, corrects the output with high-precision transcription at the end of each sentence, outputs punctuated text, and supports highly concurrent requests.
## Server Configuration
Users can choose appropriate server configurations based on their business needs. The recommended configurations are:
- Configuration 1: (X86, computing-type) 4-core vCPU, 8GB memory, and a single machine can support about 32 requests.
- Configuration 2: (X86, computing-type) 16-core vCPU, 32GB memory, and a single machine can support about 64 requests.
- Configuration 3: (X86, computing-type) 64-core vCPU, 128GB memory, and a single machine can support about 200 requests.
Detailed performance [report](./benchmark_onnx_cpp.md)
Cloud service providers offer a 3-month free trial for new users. Application tutorial ([docs](./aliyun_server_tutorial.md)).
## Quick Start
### Server Startup
`Note`: The one-click deployment tool installs Docker, downloads the Docker image, and starts the service. If you want to start from the FunASR Docker image instead, please refer to the development guide ([docs](./SDK_advanced_guide_online.md)).
Download the deployment tool `funasr-runtime-deploy-online-cpu-zh.sh`
```shell
curl -O https://raw.githubusercontent.com/alibaba-damo-academy/FunASR/main/funasr/runtime/deploy_tools/funasr-runtime-deploy-online-cpu-zh.sh;
# If there is a network problem, users in mainland China can use the following command:
# curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/funasr-runtime-deploy-online-cpu-zh.sh;
```
Execute the deployment tool and press the Enter key at each prompt to complete the installation and deployment of the server. Currently, the one-click deployment tool only supports Linux environments. For other environments, please refer to the development guide ([docs](./SDK_advanced_guide_online.md)).
```shell
sudo bash funasr-runtime-deploy-online-cpu-zh.sh install --workspace ./funasr-runtime-resources
```
### Client Testing and Usage
After running the above installation instructions, the client testing tool directory samples will be downloaded in the default installation directory ./funasr-runtime-resources ([download click](https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_samples.tar.gz)).
We use the Python client as an example. It supports multiple audio formats (.wav, .pcm, .mp3, etc.), video input (.mp4, etc.), and multi-file wav.scp lists. For other client versions, please refer to the [documentation](#Detailed-Description-of-Client-Usage).
```shell
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass
```
## Detailed Description of Client Usage
After completing the FunASR runtime-SDK service deployment on the server, you can test and use the service through the following steps. Currently, the following programming language client versions are supported:
- [Python](#python-client)
- [CPP](#cpp-client)
- [html](#html-client)
- [java](#java-client)
For more client version support, please refer to the [websocket_protocol](./websocket_protocol_zh.md).
### python-client
If you want to run the client directly for testing, you can refer to the following simple instructions, using the Python version as an example:
```shell
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav"
```
Command parameter instructions:
```text
--host is the IP address of the FunASR runtime-SDK service deployment machine, which defaults to the local IP address (127.0.0.1). If the client and the service are not on the same server, it needs to be changed to the deployment machine IP address.
--port 10095 deployment port number
--mode: `offline` indicates that the inference mode is one-sentence recognition; `online` indicates that the inference mode is real-time speech recognition; `2pass` indicates real-time speech recognition, and offline models are used for error correction at the end of each sentence.
--chunk_size: indicates the latency configuration of the streaming model. [5,10,5] indicates that the current audio is 600ms, with a lookback of 300ms and a lookahead of 300ms.
--audio_in is the audio file that needs to be transcribed, supporting file paths and file list wav.scp
--thread_num sets the number of concurrent sending threads, default is 1
--ssl sets whether to enable SSL certificate verification, default is 1 to enable, and 0 to disable
```
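The `--chunk_size` arithmetic can be made explicit. The sketch below assumes each unit corresponds to 60 ms, a value inferred from the example above (10 units for 600 ms) rather than stated directly, and assumes the three entries are ordered as lookback, current chunk, lookahead. The function name is hypothetical.

```python
FRAME_MS = 60  # assumed duration of one chunk_size unit, inferred from 600 ms / 10


def chunk_latency_ms(chunk_size, frame_ms=FRAME_MS):
    """Convert a chunk_size triple into durations in milliseconds.

    Assumed order of the triple: [lookback, current, lookahead].
    """
    lookback, current, lookahead = chunk_size
    return {
        "lookback_ms": lookback * frame_ms,
        "current_ms": current * frame_ms,
        "lookahead_ms": lookahead * frame_ms,
    }


if __name__ == "__main__":
    print(chunk_latency_ms([5, 10, 5]))
    # {'lookback_ms': 300, 'current_ms': 600, 'lookahead_ms': 300}
```

Under the same assumptions, a smaller middle value would shorten the chunk the streaming model waits for before producing output.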
### cpp-client
After entering the samples/cpp directory, you can test it with CPP. The command is as follows:
```shell
./funasr-wss-client --server-ip 127.0.0.1 --port 10095 --wav-path ../audio/asr_example.wav
```
Command parameter description:
```text
--server-ip specifies the IP address of the machine where the FunASR runtime-SDK service is deployed. The default value is the local IP address (127.0.0.1). If the client and the service are not on the same server, the IP address needs to be changed to the IP address of the deployment machine.
--port specifies the deployment port number as 10095.
--mode: `offline` indicates that the inference mode is one-sentence recognition; `online` indicates that the inference mode is real-time speech recognition; `2pass` indicates real-time speech recognition, and offline models are used for error correction at the end of each sentence.
--chunk_size: indicates the latency configuration of the streaming model. [5,10,5] indicates that the current audio is 600ms, with a lookback of 300ms and a lookahead of 300ms.
--wav-path specifies the audio file to be transcribed, and supports file paths.
--thread_num sets the number of concurrent send threads, with a default value of 1.
--ssl sets whether to enable SSL certificate verification, with a default value of 1 for enabling and 0 for disabling.
```
### html-client
To experience it directly, open `html/static/index.html` in your browser. You will see the following page, which supports microphone input and file upload.
<img src="images/html.png" width="900"/>
### java-client
```shell
FunasrWsClient --host localhost --port 10095 --audio_in ./asr_example.wav --mode offline
```
For more details, please refer to the [docs](../java/readme.md)
## Server Usage Details
### Start the deployed FunASR service
If you have restarted the computer or shut down Docker after one-click deployment, you can start the FunASR service directly with the following command. The startup configuration is the same as the last one-click deployment.
```shell
sudo bash funasr-runtime-deploy-online-cpu-zh.sh start
```
### Set SSL
SSL verification is enabled by default. If you need to disable it, you can set it when starting.
```shell
sudo bash funasr-runtime-deploy-online-cpu-zh.sh start --ssl 0
```
### Stop the FunASR service
```shell
sudo bash funasr-runtime-deploy-online-cpu-zh.sh stop
```
### Release the FunASR service
Release the deployed FunASR service.
```shell
sudo bash funasr-runtime-deploy-online-cpu-zh.sh remove
```
### Restart the FunASR service
Restart the FunASR service with the same configuration as the last one-click deployment.
```shell
sudo bash funasr-runtime-deploy-online-cpu-zh.sh restart
```
### Replace the model and restart the FunASR service
Replace the currently used model, and restart the FunASR service. The model must be an ASR/VAD/PUNC model in ModelScope, or a finetuned model from ModelScope.
```shell
sudo bash funasr-runtime-deploy-online-cpu-zh.sh update [--asr_model | --vad_model | --punc_model] <model_id or local model path>
# e.g.
sudo bash funasr-runtime-deploy-online-cpu-zh.sh update --asr_model damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
```
### Update parameters and restart the FunASR service
Update the configured parameters and restart the FunASR service to take effect. The parameters that can be updated include the host and Docker port numbers, as well as the number of inference and IO threads.
```shell
sudo bash funasr-runtime-deploy-online-cpu-zh.sh update [--host_port | --docker_port] <port number>
sudo bash funasr-runtime-deploy-online-cpu-zh.sh update [--decode_thread_num | --io_thread_num] <the number of threads>
sudo bash funasr-runtime-deploy-online-cpu-zh.sh update [--workspace] <workspace in local>
sudo bash funasr-runtime-deploy-online-cpu-zh.sh update [--ssl] <0: close SSL; 1: open SSL, default:1>
# e.g.
sudo bash funasr-runtime-deploy-online-cpu-zh.sh update --decode_thread_num 32
sudo bash funasr-runtime-deploy-online-cpu-zh.sh update --workspace ./funasr-runtime-resources
```
## Contact Us
If you encounter any problems during use, please join our user group for feedback.
| DingDing Group | Wechat |
|:----------------------------------------------------------------------------:|:--------------------------------------------------------------:|
| <div align="left"><img src="../../../docs/images/dingding.jpg" width="250"/> | <img src="../../../docs/images/wechat.png" width="232"/></div> |


@ -115,6 +115,13 @@ FunasrWsClient --host localhost --port 10095 --mode 2pass
sudo bash funasr-runtime-deploy-online-cpu-zh.sh start
```
### Set SSL
SSL verification is enabled by default. If you need to disable it, you can set it when starting:
```shell
sudo bash funasr-runtime-deploy-online-cpu-zh.sh start --ssl 0
```
### Stop the FunASR service
```shell
@ -162,39 +169,6 @@ sudo bash funasr-runtime-deploy-online-cpu-zh.sh update --workspace ./funasr-run
```
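## Detailed server startup configuration
### Choosing the FunASR Docker image
We recommend option 1), our latest released image; earlier versions can also be selected.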
```text
[1/5]
Getting the list of docker images, please wait a few seconds.
[DONE]
Please choose the Docker image.
1) registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.1.0
Enter your choice, default(1):
You have chosen the Docker image: registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.1.0
```
### Setting the host port provided to FunASR
Set the host port provided to Docker; the default is 10095. Make sure this port is available.
```text
[2/5]
Please input the opened port in the host used for FunASR server.
Setting the opened host port [1-65535], default(10095):
The port of the host is 10095
The port in Docker for FunASR server is 10095
```
### Set SSL
SSL verification is enabled by default. If you need to disable it, you can set it when starting:
```shell
sudo bash funasr-runtime-deploy-online-cpu-zh.sh start --ssl 0
```
## Contact us
If you encounter problems during use, you are welcome to join the user group to give feedback.


@ -52,7 +52,7 @@ message is serialized as JSON
#### First communication
The message (which must be serialized as JSON) is:
```text
{"mode": "offline", "wav_name": "wav_name", "is_speaking": True, "wav_format":"pcm", "chunk_size":[5,10,5]}
{"mode": "2pass", "wav_name": "wav_name", "is_speaking": True, "wav_format":"pcm", "chunk_size":[5,10,5]}
```
Parameter description:
```text


@ -5,11 +5,29 @@ FunASR is a speech recognition framework developed by the Speech Lab of DAMO Aca
It has attracted many developers to try it out and build on it. To bridge the last mile to production and help integrate models into business applications, we have developed the FunASR runtime-SDK. The SDK supports several service deployments, including:
- File transcription service, Mandarin, CPU version, done
- The real-time transcription service, Mandarin (CPU), done
- File transcription service, Mandarin, GPU version, in progress
- File transcription service, English, in progress
- Streaming speech recognition service, is in progress
- and more.
## The real-time transcription service, Mandarin (CPU)
The FunASR real-time speech-to-text service software package not only performs real-time speech-to-text conversion, but also allows high-precision transcription text correction at the end of each sentence and outputs text with punctuation, supporting high-concurrency multiple requests.
In order to meet the needs of different users for different scenarios, different tutorials are prepared:
### Convenient Deployment Tutorial
This is suitable for scenarios where there is no need to modify the service deployment SDK and the deployed model comes from ModelScope or is finetuned by the user. For detailed tutorials, please refer to [docs](./docs/SDK_tutorial_online.md)
### Development Guide
This is suitable for scenarios where there is a need to modify the service deployment SDK and the deployed model comes from ModelScope or is finetuned by the user. For detailed documentation, please refer to [docs](./docs/SDK_advanced_guide_online.md)
### Technology Principles Revealed
The document introduces the technology principles behind the service, recognition accuracy, computing efficiency, and core advantages: convenience, high precision, high efficiency, and long audio chain. For detailed documentation, please refer to [docs]().
## File Transcription Service, Mandarin (CPU)