(简体中文|English)

# FunASR Real-time Transcription Service

FunASR offers a real-time speech-to-text service that can be easily deployed on a local or cloud server. The service integrates several models developed by the speech lab of Alibaba DAMO Academy and released on ModelScope, including voice activity detection (VAD), Paraformer-large non-streaming automatic speech recognition (ASR), Paraformer-large streaming ASR, and punctuation prediction (PUNC). It produces real-time transcripts, performs high-precision correction of each sentence once it ends, and outputs text with punctuation.

## Server Configurations

Users can choose appropriate server configurations based on their business needs. The recommended configurations are:

- Configuration 1: (X86, compute-optimized) 4-core vCPU, 8GB memory; a single machine can support about 32 concurrent requests.
- Configuration 2: (X86, compute-optimized) 16-core vCPU, 32GB memory; a single machine can support about 64 concurrent requests.
- Configuration 3: (X86, compute-optimized) 64-core vCPU, 128GB memory; a single machine can support about 200 concurrent requests.

A detailed performance report is available in the docs.

Cloud service providers offer a 3-month free trial for new users. Application tutorial (docs).

## Quick Start

### Server Startup

Note: The one-click deployment tool handles installing Docker, downloading the Docker image, and starting the service. If you want to start directly from the FunASR Docker image, please refer to the development guide (docs).

Download the deployment tool `funasr-runtime-deploy-online-cpu-zh.sh`:

```shell
curl -O https://raw.githubusercontent.com/alibaba-damo-academy/FunASR/main/funasr/runtime/deploy_tools/funasr-runtime-deploy-online-cpu-zh.sh;
# If there is a network problem, users in mainland China can use the following command:
# curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/funasr-runtime-deploy-online-cpu-zh.sh;
```

Execute the deployment tool and press Enter at each prompt to complete the installation and deployment of the server. Currently, the one-click deployment tool only supports Linux; for other environments, please refer to the development guide (docs).

```shell
sudo bash funasr-runtime-deploy-online-cpu-zh.sh install --workspace ./funasr-runtime-resources
```

Note: If you need to deploy the timestamp model or the hotword model, select the corresponding model in step 2 of the installation and deployment process: 1 is the paraformer-large model, 2 is the paraformer-large timestamp model, and 3 is the paraformer-large hotword model.
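After the script completes, you can check that the service came up. A minimal sanity check, assuming Docker is the container runtime installed by the one-click tool (as described in the note above):

```shell
# The FunASR runtime container should appear among the running containers
sudo docker ps
```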

### Client Testing and Usage

After running the installation command above, the client testing tools are downloaded to the samples directory under the default installation directory ./funasr-runtime-resources (download link). Taking the Python client as an example, it supports audio input in multiple formats (such as .wav, .pcm, and .mp3), video input (such as .mp4), and multi-file wav.scp lists. For other client versions, please refer to the documentation.

```shell
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass
```
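The wav.scp file list mentioned above uses the Kaldi-style two-column format, one `<utterance_id> <audio_path>` pair per line. A minimal sketch, with purely illustrative IDs and paths:

```shell
# Build a two-column file list and pass it to the client via --audio_in
cat > wav.scp <<EOF
asr_example_1 ../audio/asr_example.wav
asr_example_2 /data/audio/example2.mp3
EOF
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass --audio_in ./wav.scp
```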

## Detailed Description of Client Usage

After completing the deployment of the FunASR runtime-SDK service on the server, you can test and use the real-time speech-to-text service through the following steps. The following programming language clients are currently supported:

For clients in other languages, please refer to the websocket_protocol documentation.

### python-client

If you want to run the client directly for testing, you can refer to the following simple instructions, using the Python version as an example:

```shell
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav"
```

Command parameter description:

- `--host`: IP address of the machine where the FunASR runtime-SDK service is deployed; defaults to the local address (127.0.0.1). If the client and the service are not on the same machine, change it to the IP address of the deployment machine.
- `--port`: deployment port number (10095 in this example).
- `--mode`: `offline` means single-utterance (non-streaming) recognition; `online` means real-time (streaming) speech recognition; `2pass` means real-time speech recognition with offline-model correction at the end of each sentence.
- `--chunk_size`: latency configuration of the streaming model. `[5,10,5]` means the current chunk is 600 ms of audio, with 300 ms of lookback and 300 ms of lookahead.
- `--audio_in`: the audio to transcribe; supports a file path or a wav.scp file list.
- `--thread_num`: number of concurrent sending threads; default 1.
- `--ssl`: whether to enable SSL certificate verification; default 1 (enabled), 0 to disable.
- `--hotword`: if the ASR model is a hotword model, hotwords can be given either as a *.txt file (one hotword per line) or as space-separated words (e.g. 阿里巴巴 达摩院).
- `--use_itn`: whether to apply inverse text normalization (ITN); default 1 (enabled), 0 to disable.

A combined example using several of these options is shown below.
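A sketch combining several of these options; it assumes `--chunk_size` accepts the comma-separated form shown and that a hotword file named hotwords.txt exists (both are illustrative):

```shell
# 2pass recognition with a 600 ms streaming chunk, ITN enabled, and a hotword list
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass \
    --chunk_size "5,10,5" --audio_in "../audio/asr_example.wav" \
    --thread_num 1 --ssl 1 --use_itn 1 --hotword hotwords.txt
```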

### cpp-client

After entering the samples/cpp directory, you can test with the C++ client. The command is as follows:

```shell
./funasr-wss-client-2pass --server-ip 127.0.0.1 --port 10095 --wav-path ../audio/asr_example.wav
```

Command parameter description:

- `--server-ip`: IP address of the machine where the FunASR runtime-SDK service is deployed; defaults to the local address (127.0.0.1). If the client and the service are not on the same machine, change it to the IP address of the deployment machine.
- `--port`: deployment port number (10095 in this example).
- `--mode`: `offline` means single-utterance (non-streaming) recognition; `online` means real-time (streaming) speech recognition; `2pass` means real-time speech recognition with offline-model correction at the end of each sentence.
- `--chunk-size`: latency configuration of the streaming model. `[5,10,5]` means the current chunk is 600 ms of audio, with 300 ms of lookback and 300 ms of lookahead.
- `--wav-path`: the audio file to transcribe; supports file paths.
- `--thread-num`: number of concurrent sending threads; default 1.
- `--is-ssl`: whether to enable SSL certificate verification; default 1 (enabled), 0 to disable.
- `--hotword`: if the ASR model is a hotword model, hotwords can be given either as a *.txt file (one hotword per line) or as space-separated words (e.g. 阿里巴巴 达摩院).
- `--use-itn`: whether to apply inverse text normalization (ITN); default 1 (enabled), 0 to disable.

A fuller example combining these options is shown after this list.
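A comparable sketch for the C++ client, using only the flags listed above; the hotword file name is illustrative:

```shell
# 2pass recognition with SSL verification, ITN, and a hotword list
./funasr-wss-client-2pass --server-ip 127.0.0.1 --port 10095 --mode 2pass \
    --wav-path ../audio/asr_example.wav --thread-num 1 --is-ssl 1 \
    --use-itn 1 --hotword hotwords.txt
```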

### html-client

To try it out directly, open html/static/index.html in your browser; the demo page supports both microphone input and file upload.

### java-client

```shell
FunasrWsClient --host localhost --port 10095 --audio_in ./asr_example.wav --mode offline
```

For more details, please refer to the docs.

## Server Usage Details

### Start the deployed FunASR service

If you have restarted the computer or shut down Docker after one-click deployment, you can start the FunASR service directly with the following command. The startup configuration is the same as the last one-click deployment.

```shell
sudo bash funasr-runtime-deploy-online-cpu-zh.sh start
```

### Stop the FunASR service

```shell
sudo bash funasr-runtime-deploy-online-cpu-zh.sh stop
```

### Release the FunASR service

Release the deployed FunASR service.

```shell
sudo bash funasr-runtime-deploy-online-cpu-zh.sh remove
```

### Restart the FunASR service

Restart the FunASR service with the same configuration as the last one-click deployment.

```shell
sudo bash funasr-runtime-deploy-online-cpu-zh.sh restart
```

### Replace the model and restart the FunASR service

Replace the currently used model and restart the FunASR service. The model must be an ASR, VAD, or PUNC model from ModelScope, or a model fine-tuned from one of them.

```shell
sudo bash funasr-runtime-deploy-online-cpu-zh.sh update [--asr_model | --vad_model | --punc_model] <model_id or local model path>

# e.g.
sudo bash funasr-runtime-deploy-online-cpu-zh.sh update --asr_model damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
```

### Update parameters and restart the FunASR service

Update the configured parameters and restart the FunASR service for the changes to take effect. The parameters that can be updated include the host and Docker port numbers, the number of inference and I/O threads, the workspace path, and the SSL switch.

```shell
sudo bash funasr-runtime-deploy-online-cpu-zh.sh update [--host_port | --docker_port] <port number>
sudo bash funasr-runtime-deploy-online-cpu-zh.sh update [--decode_thread_num | --io_thread_num] <the number of threads>
sudo bash funasr-runtime-deploy-online-cpu-zh.sh update [--workspace] <workspace in local>
sudo bash funasr-runtime-deploy-online-cpu-zh.sh update [--ssl] <0: close SSL; 1: open SSL, default: 1>

# e.g.
sudo bash funasr-runtime-deploy-online-cpu-zh.sh update --decode_thread_num 32
sudo bash funasr-runtime-deploy-online-cpu-zh.sh update --workspace ./funasr-runtime-resources
```

### Set SSL

SSL verification is enabled by default. If you need to disable it, update the setting as follows:

```shell
sudo bash funasr-runtime-deploy-online-cpu-zh.sh update --ssl 0
```

## Contact Us

If you encounter any problems during use, please join our user group for feedback.

DingTalk group | WeChat group (QR codes)