Inference with Triton

Steps:

  1. Refer here to get model.onnx

  2. Follow the instructions below to use Triton

# build the server image from Dockerfile/Dockerfile.server
docker build . -f Dockerfile/Dockerfile.server -t triton-paraformer:23.01
docker run -it --rm --name "paraformer_triton_server" --gpus all -v <path_host/funasr/runtime/>:/workspace --shm-size 1g --net host triton-paraformer:23.01
# inside the container, move the previously exported model.onnx into the encoder model directory
mv <path_model.onnx> /workspace/triton_gpu/model_repo_paraformer_large_offline/encoder/1/
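
The model repository should then have the following layout: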

model_repo_paraformer_large_offline/
|-- encoder
|   |-- 1
|   |   `-- model.onnx
|   `-- config.pbtxt
|-- feature_extractor
|   |-- 1
|   |   `-- model.py
|   |-- config.pbtxt
|   `-- config.yaml
|-- infer_pipeline
|   |-- 1
|   `-- config.pbtxt
`-- scoring
    |-- 1
    |   `-- model.py
    |-- config.pbtxt
    `-- token_list.pkl

8 directories, 9 files
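
As the layout suggests, feature_extractor is a Python-backend model (model.py, configured by config.yaml) that turns raw audio into fbank features, encoder serves the exported Paraformer model.onnx, and scoring is a Python-backend model that converts encoder outputs into text using token_list.pkl. infer_pipeline is the Triton ensemble that chains these three stages, so clients only need to send audio to infer_pipeline.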

# launch the service (memory pool sizes are given in bytes)
tritonserver --model-repository ./model_repo_paraformer_large_offline \
             --pinned-memory-pool-byte-size=512000000 \
             --cuda-memory-pool-byte-size=0:1024000000
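
Once the server reports all models READY, you can query the infer_pipeline ensemble over gRPC (port 8001 by default; with --net host this is the host port). Below is a minimal Python sketch using the tritonclient package (pip install tritonclient[grpc] soundfile). The tensor names WAV, WAV_LENS, and TRANSCRIPTS are assumptions borrowed from similar Triton ASR recipes; check the client directory for the exact names used by this pipeline.

# minimal_client.py -- a sketch, not the shipped client; tensor names are assumptions
import numpy as np
import soundfile as sf
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient("localhost:8001")

# load a 16 kHz mono wav (the sample rate Paraformer expects) as float32 samples
wav, sr = sf.read("test.wav", dtype="float32")
samples = wav[np.newaxis, :].astype(np.float32)       # shape (1, num_samples)
lengths = np.array([[wav.shape[0]]], dtype=np.int32)  # shape (1, 1)

inputs = [
    grpcclient.InferInput("WAV", list(samples.shape), "FP32"),
    grpcclient.InferInput("WAV_LENS", list(lengths.shape), "INT32"),
]
inputs[0].set_data_from_numpy(samples)
inputs[1].set_data_from_numpy(lengths)

# call the ensemble; Triton runs feature_extractor -> encoder -> scoring internally
result = client.infer(
    "infer_pipeline",
    inputs,
    outputs=[grpcclient.InferRequestedOutput("TRANSCRIPTS")],
)
print(result.as_numpy("TRANSCRIPTS")[0].decode("utf-8"))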

Performance benchmark

We benchmarked speech_paraformer on the AISHELL-1 test set with a single V100 GPU; the total audio duration is 36108.919 seconds.

(Note: the service was fully warmed up before measurement.)

concurrent tasks   processing time (s)   RTF
60 (onnx fp32)     116.0                 0.0032
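
For reference, RTF here is total processing time divided by total audio duration: 116.0 s / 36108.919 s ≈ 0.0032, i.e. the service decodes roughly 310x faster than real time at 60 concurrent tasks.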

Acknowledgement

This part originates from the NVIDIA CISI project. We also have TTS and NLP solutions deployed on Triton Inference Server. If you are interested, please contact us.