FunASR/funasr/runtime/docs/SDK_advanced_guide_online.md
Yabin Li 61ed60695a
2023-09-19 10:09:58 +08:00


Real-time Speech Transcription Service Development Guide

FunASR provides a real-time speech transcription service, built on the FunASR runtime SDK, that can be deployed on a local or cloud server. It integrates voice activity detection (VAD), Paraformer-large non-streaming speech recognition (ASR), Paraformer-large streaming speech recognition (ASR), punctuation restoration (PUNC), and other capabilities open-sourced by the speech laboratory of DAMO Academy in the ModelScope community. The service transcribes speech to text in real time and re-transcribes each sentence once it ends to produce a higher-accuracy final result. The output text is punctuated, and the service supports highly concurrent multi-channel requests.

Quick Start

Pull Docker Image

Use the following commands to pull and start the FunASR runtime Docker image:

sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.1
mkdir -p ./funasr-runtime-resources/models
sudo docker run -p 10095:10095 -it --privileged=true -v $PWD/funasr-runtime-resources/models:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.1

If Docker is not installed, please refer to Docker Installation.

Launching the Server

Once the container is running, start the funasr-wss-server-2pass service inside it:

cd FunASR/funasr/runtime
nohup bash run_server_2pass.sh \
  --download-model-dir /workspace/models \
  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
  --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx  \
  --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx  \
  --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
  --itn-dir thuduj12/fst_itn_zh > log.out 2>&1 &

# To disable SSL, add: --certfile 0

For a more detailed description of server parameters, please refer to Server Introduction
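Because the server is started in the background with nohup, it can be useful to confirm that it is accepting connections before running a client. The following is a minimal sketch using only the Python standard library; the helper name `is_port_open` is hypothetical, and the check assumes the default port 10095 published by the docker run command above. On first start the port will probably stay closed until the models have been downloaded and loaded, so a negative result may simply mean "not ready yet"; log.out is the place to look in that case.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 10095 is the port used throughout this guide.
    status = "listening" if is_port_open("127.0.0.1", 10095) else "not reachable yet"
    print(f"funasr-wss-server-2pass on port 10095: {status}")
```

This only verifies that the TCP port is open; it does not perform a WebSocket or SSL handshake.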

Client Testing and Usage

Download the client test tools (the samples directory):

wget https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_samples.tar.gz

As an example, we use the Python client, which accepts .wav and .pcm audio files as well as a wav.scp list of multiple files. For other clients, please refer to the documentation.

python3 wss_client_asr.py --host "127.0.0.1" --port 10095 --mode 2pass
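The wav.scp multi-file list mentioned above can be generated with a short script. This guide does not spell out the file layout, so the sketch below assumes the common Kaldi-style convention of one `utterance_id audio_path` pair per line; the helper names `write_wav_scp` and `read_wav_scp` are hypothetical and not part of the FunASR client.

```python
from pathlib import Path


def write_wav_scp(audio_paths, scp_path):
    """Write one 'utterance_id audio_path' pair per line and return the ids."""
    ids = []
    lines = []
    for path in map(Path, audio_paths):
        utt_id = path.stem  # assumption: file name without extension is a unique id
        ids.append(utt_id)
        lines.append(f"{utt_id} {path}")
    Path(scp_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return ids


def read_wav_scp(scp_path):
    """Parse a wav.scp file back into a {utterance_id: audio_path} dict."""
    entries = {}
    for line in Path(scp_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        utt_id, audio_path = line.split(maxsplit=1)
        entries[utt_id] = audio_path
    return entries
```

If two files share the same base name, the ids collide; in that case a different id scheme (for example, including the parent directory) would be needed.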

Client Usage Details

After the FunASR service has been deployed on the server, you can test and use the real-time speech transcription service by following these steps. Currently, the following programming language client versions are supported:

For more detailed usage, please click on the links above. For more client version support, please refer to WebSocket/GRPC Protocol.
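In 2pass mode a client receives two kinds of results: streaming fragments while the speaker is talking, and a corrected sentence once the sentence ends. The sketch below shows one way a client might merge them for display. It is an illustration under stated assumptions, not the official client logic: the JSON field names `mode` and `text` and the mode values `2pass-online` and `2pass-offline` are assumptions that should be checked against the WebSocket protocol documentation, and the class name `TwoPassTranscript` is hypothetical.

```python
import json


class TwoPassTranscript:
    """Keep finalized sentences plus the still-changing streaming tail."""

    def __init__(self):
        self.final_text = ""    # text confirmed by the second (non-streaming) pass
        self.partial_text = ""  # streaming text since the last confirmed sentence

    def update(self, raw_message: str) -> str:
        """Consume one server message and return the text to display."""
        msg = json.loads(raw_message)
        mode = msg.get("mode", "")  # assumed field name
        text = msg.get("text", "")  # assumed field name
        if mode == "2pass-online":
            # Streaming pass: append the new fragment to the provisional tail.
            self.partial_text += text
        elif mode == "2pass-offline":
            # Second pass: the corrected sentence replaces the provisional tail.
            self.final_text += text
            self.partial_text = ""
        return self.final_text + self.partial_text
```

Keeping the confirmed text and the provisional tail separate means a late correction never rewrites sentences that were already finalized.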

Server Introduction:

funasr-wss-server-2pass can either download models from ModelScope or load them from a local directory path. It is started as shown below:

cd /workspace/FunASR/funasr/runtime/websocket/build/bin
./funasr-wss-server-2pass  \
  --decoder-thread-num 32 \
  --io-thread-num  8 \
  --port 10095 

Command-line parameters:

--download-model-dir Local directory to download models into; models are fetched from ModelScope by model ID
--model-dir ModelScope model ID of the non-streaming ASR model
--online-model-dir ModelScope model ID of the streaming ASR model
--quantize True for quantized ASR models, False for non-quantized ASR models; default is True
--vad-dir ModelScope model ID of the VAD model
--vad-quant True for quantized VAD models, False for non-quantized VAD models; default is True
--punc-dir ModelScope model ID of the PUNC model
--punc-quant True for quantized PUNC models, False for non-quantized PUNC models; default is True
--itn-dir ModelScope model ID of the ITN model
--port Port number the server listens on; default is 10095
--decoder-thread-num Number of inference threads the server starts; default is 8
--io-thread-num Number of IO threads the server starts; default is 1
--certfile SSL certificate file; default is ../../../ssl_key/server.crt; set to "" to disable SSL
--keyfile SSL key file; default is ../../../ssl_key/server.key; set to "" to disable SSL
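When the server is launched from a deployment script, it can help to assemble the command line from a dictionary of options rather than by string concatenation. The sketch below builds the argument list from the flags listed above; the function name `build_server_command` and the `DEFAULTS` table are hypothetical, and the binary path assumes the process is started from the build/bin directory shown earlier. It only constructs and prints the command; it does not start the server.

```python
import shlex

# Defaults taken from the parameter list above.
DEFAULTS = {
    "port": 10095,
    "decoder-thread-num": 8,
    "io-thread-num": 1,
}


def build_server_command(binary="./funasr-wss-server-2pass", **overrides):
    """Return an argv list; keyword names use '_' where the flag uses '-'."""
    options = dict(DEFAULTS)
    for name, value in overrides.items():
        options[name.replace("_", "-")] = value
    argv = [binary]
    for flag, value in options.items():
        argv += [f"--{flag}", str(value)]
    return argv


cmd = build_server_command(
    model_dir="damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx",
    decoder_thread_num=32,
    io_thread_num=8,
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.Popen`, which avoids shell quoting issues with paths that contain spaces.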

After the above command is executed, the real-time speech transcription service starts. If a model is specified by its ModelScope model ID, it is downloaded automatically from ModelScope. The models are: FSMN-VAD, Paraformer-large online, Paraformer-large, CT-Transformer, and FST-ITN.

If you wish to deploy your own fine-tuned model (e.g., 10epoch.pb), manually rename it to model.pb, use it to replace the original model.pb in the downloaded ModelScope model directory, and then pass that directory as --model-dir.
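The rename-and-replace step can be scripted. The following is a minimal sketch under stated assumptions: the function name `install_finetuned_model` is hypothetical, `model_dir` is the local directory that already contains the original model.pb, and the backup copy named model.pb.orig is an addition of this sketch rather than something FunASR requires.

```python
import shutil
from pathlib import Path


def install_finetuned_model(checkpoint, model_dir, backup=True):
    """Copy a fine-tuned checkpoint into model_dir under the name model.pb."""
    checkpoint = Path(checkpoint)
    target = Path(model_dir) / "model.pb"
    if not checkpoint.is_file():
        raise FileNotFoundError(checkpoint)
    if backup and target.exists():
        # Keep the original weights so the swap can be undone.
        shutil.copy2(target, target.with_name("model.pb.orig"))
    shutil.copy2(checkpoint, target)
    return target
```

For example, `install_finetuned_model("10epoch.pb", "/workspace/models/my_model")` would leave the fine-tuned weights at /workspace/models/my_model/model.pb, where the directory name is only a placeholder.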