Quick Start
You can use FunASR in the following ways:
- Service Deployment SDK
- Industrial model egs
- Academic model egs
Service Deployment SDK
Python version Example
Supports real-time streaming speech recognition, uses a non-streaming model to correct the streaming results, and outputs text with punctuation. Currently, only a single client is supported; for multi-concurrency, please refer to the C++ version service deployment SDK below.
Server Deployment
cd runtime/python/websocket
python funasr_wss_server.py --port 10095
Client Testing
python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass --chunk_size "5,10,5"
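As a rough illustration of what the client does, the sketch below streams a local 16 kHz mono wav to the server with the third-party websockets package, assuming the server is reachable over plain ws:// (use wss:// with an SSL context if SSL is enabled). The JSON handshake fields are assumptions modeled on funasr_wss_client.py, not a documented protocol; consult that script for the exact message format.

import asyncio, json, wave
import websockets  # pip install websockets

async def stream(path, uri="ws://127.0.0.1:10095"):
    async with websockets.connect(uri) as ws:
        # Handshake: declare the recognition mode and chunking (field names are assumptions).
        await ws.send(json.dumps({"mode": "2pass", "chunk_size": [5, 10, 5],
                                  "wav_name": path, "is_speaking": True}))
        with wave.open(path, "rb") as wav:
            pcm = wav.readframes(wav.getnframes())
        step = 3200  # roughly 100 ms of 16 kHz 16-bit mono audio
        for i in range(0, len(pcm), step):
            await ws.send(pcm[i:i + step])  # raw PCM chunk (the real client paces chunks to simulate real time)
        await ws.send(json.dumps({"is_speaking": False}))  # signal end of audio
        print(await ws.recv())  # first result message from the server (there may be more)

asyncio.run(stream("asr_example.wav"))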
For more examples, please refer to the docs.
Service Deployment Software
Supports both high-precision, high-efficiency, high-concurrency file transcription and low-latency real-time speech recognition. Both can be deployed with Docker and handle multiple concurrent requests.
Docker Installation (optional)
If you have already installed Docker, skip this step.
curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/install_docker.sh;
sudo bash install_docker.sh
Real-time Speech Recognition Service Deployment
Docker Image Download and Launch
Use the following commands to pull and launch the FunASR software package Docker image (get the latest image version):
sudo docker pull \
registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.5
mkdir -p ./funasr-runtime-resources/models
sudo docker run -p 10096:10095 -it --privileged=true \
-v $PWD/funasr-runtime-resources/models:/workspace/models \
registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.5
Server Start
After the Docker container is started, start the funasr-wss-server-2pass service program inside it:
cd FunASR/runtime
nohup bash run_server_2pass.sh \
--download-model-dir /workspace/models \
--vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
--model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
--online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
--punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
--itn-dir thuduj12/fst_itn_zh \
--hotword /workspace/models/hotwords.txt > log.out 2>&1 &
# If you want to disable SSL, add the parameter: --certfile 0
# If you want to deploy with a timestamp or nn hotword model, please set --model-dir to the corresponding model:
# damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx (timestamp)
# damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-onnx (nn hotword)
# If you want to load hotwords on the server side, configure them in the host file ./funasr-runtime-resources/models/hotwords.txt (mapped inside the Docker container to /workspace/models/hotwords.txt):
# One hotword per line, in the format "hotword weight", e.g.: Alibaba 20
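For example, the host-side hotwords.txt might contain the following (the entries and weights below are only illustrative):

阿里巴巴 20
语音识别 30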
Client Testing
Run the test sample:
python3 funasr_wss_client.py --host "127.0.0.1" --port 10096 --mode 2pass
For more examples, please refer to the docs.
File Transcription Service, Mandarin (CPU)
Docker Image Download and Launch
Use the following commands to pull and launch the FunASR software package Docker image (get the latest image version):
sudo docker pull \
registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.3.0
mkdir -p ./funasr-runtime-resources/models
sudo docker run -p 10095:10095 -it --privileged=true \
-v $PWD/funasr-runtime-resources/models:/workspace/models \
registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.3.0
Server Start
After the Docker container is started, start the funasr-wss-server service program inside it:
cd FunASR/runtime
nohup bash run_server.sh \
--download-model-dir /workspace/models \
--vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
--model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
--punc-dir damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
--lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
--itn-dir thuduj12/fst_itn_zh \
--hotword /workspace/models/hotwords.txt > log.out 2>&1 &
# If you want to disable SSL, add the parameter: --certfile 0
# If you want to use timestamp or nn hotword models for deployment, please set --model-dir to the corresponding model:
# damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx (timestamp)
# damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-onnx (nn hotword)
# If you want to load hotwords on the server side, configure them in the host file ./funasr-runtime-resources/models/hotwords.txt (mapped inside the Docker container to /workspace/models/hotwords.txt):
# One hotword per line, in the format "hotword weight", e.g.: Alibaba 20
Client Testing
Run the test sample:
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav"
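If you need to transcribe several recordings, a small wrapper like the sketch below (a hypothetical helper that only reuses the client flags shown above) invokes the offline client once per wav file:

import glob
import subprocess

# Transcribe every wav file in the sample audio folder, one client call per file.
for wav in sorted(glob.glob("../audio/*.wav")):  # adjust the folder to your data
    subprocess.run(["python3", "funasr_wss_client.py",
                    "--host", "127.0.0.1", "--port", "10095",
                    "--mode", "offline", "--audio_in", wav],
                   check=True)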
For more examples, please refer to the docs.
Industrial Model Egs
If you want to use the pre-trained industrial models from ModelScope for inference or fine-tuning, you can refer to the following example:
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model='damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
)
rec_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav')
print(rec_result)
# {'text': '欢迎大家来体验达摩院推出的语音识别模型'}
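The same pipeline call should also accept a local recording instead of a URL (the path below is a placeholder; a 16 kHz mono wav matches the model used above):

rec_result = inference_pipeline(audio_in='/path/to/your_audio_16k.wav')  # local file path (placeholder)
print(rec_result['text'])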
More examples can be found in the docs.
Academic model egs
If you want to train from scratch, as is typical for academic models, you can start training and inference with the following commands:
cd egs/aishell/paraformer
. ./run.sh --CUDA_VISIBLE_DEVICES="0,1" --gpu_num=2
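CUDA_VISIBLE_DEVICES selects which GPUs are used, and gpu_num should match the number of selected devices; for example, a single-GPU run would look like:

. ./run.sh --CUDA_VISIBLE_DEVICES="0" --gpu_num=1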
More examples can be found in the docs.