
## Inference with Triton

Steps:

  1. Refer to the ONNX export documentation to get `model.onnx`.

  2. Follow the instructions below to deploy with Triton:

```bash
# Build the docker image from Dockerfile/Dockerfile.server and start the container
docker build . -f Dockerfile/Dockerfile.server -t triton-paraformer:23.01
docker run -it --rm --name "paraformer_triton_server" --gpus all -v <path_host/funasr/runtime/>:/workspace --shm-size 1g --net host triton-paraformer:23.01

# Inside the docker container, move the previously exported model.onnx into the model repository
mv <path_model.onnx> /workspace/triton_gpu/model_repo_paraformer_large_offline/encoder/1/
```

The model repository should then look like this:

```
model_repo_paraformer_large_offline/
|-- encoder
|   |-- 1
|   |   `-- model.onnx
|   `-- config.pbtxt
|-- feature_extractor
|   |-- 1
|   |   `-- model.py
|   |-- config.pbtxt
|   `-- config.yaml
|-- infer_pipeline
|   |-- 1
|   `-- config.pbtxt
`-- scoring
    |-- 1
    |   `-- model.py
    |-- config.pbtxt
    `-- token_list.pkl

8 directories, 9 files
```
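Roughly, `feature_extractor` computes acoustic features in a Python backend, `encoder` runs the exported `model.onnx`, `scoring` turns encoder outputs into text using the vocabulary in `token_list.pkl`, and `infer_pipeline` is the ensemble that chains them into a single endpoint.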

```bash
# Launch the service
tritonserver --model-repository ./model_repo_paraformer_large_offline \
             --pinned-memory-pool-byte-size=512000000 \
             --cuda-memory-pool-byte-size=0:1024000000
```
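Once the server is running, it can be queried over gRPC (port 8001 by default). Below is a minimal Python client sketch using the `tritonclient` package. The ensemble name `infer_pipeline` matches the repository layout above, but the tensor names (`WAV`, `WAV_LENS`, `TRANSCRIPTS`) and the wav path are assumptions; verify them against the `config.pbtxt` of `infer_pipeline`.

```python
# Minimal gRPC client sketch (pip install tritonclient[grpc] soundfile).
# The tensor names WAV / WAV_LENS / TRANSCRIPTS and the wav path are
# assumptions; check the config.pbtxt of infer_pipeline for the real ones.
import numpy as np
import soundfile as sf
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Read a 16 kHz mono wav file; the path is a placeholder.
waveform, sample_rate = sf.read("test.wav", dtype="float32")
samples = waveform.reshape(1, -1)
lengths = np.array([[samples.shape[1]]], dtype=np.int32)

inputs = [
    grpcclient.InferInput("WAV", list(samples.shape), "FP32"),
    grpcclient.InferInput("WAV_LENS", list(lengths.shape), "INT32"),
]
inputs[0].set_data_from_numpy(samples)
inputs[1].set_data_from_numpy(lengths)

response = client.infer(
    "infer_pipeline",
    inputs,
    outputs=[grpcclient.InferRequestedOutput("TRANSCRIPTS")],
)
print(response.as_numpy("TRANSCRIPTS")[0].decode("utf-8"))
```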

## Performance benchmark

We benchmarked speech_paraformer on the AISHELL-1 test set with a single V100 GPU; the total audio duration is 36108.919 seconds.

(Note: the service was fully warmed up before measurement.)

| concurrent tasks | processing time (s) | RTF    |
|------------------|---------------------|--------|
| 60 (ONNX FP32)   | 116.0               | 0.0032 |
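RTF (real-time factor) here is the total processing time divided by the total audio duration: 116.0 / 36108.919 ≈ 0.0032.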

## Acknowledgement

This part originates from the NVIDIA CISI project. We also have TTS and NLP solutions deployed on Triton Inference Server. If you are interested, please contact us.