# Triton Inference Serving Best Practice for SenseVoice

## Quick Start
Launch the service directly with Docker Compose:
```bash
docker compose up --build
```
## Build Image
Build the Docker image from scratch:
```bash
# build from scratch, run from the parent directory of Dockerfile/Dockerfile.sensevoice
docker build . -f Dockerfile/Dockerfile.sensevoice -t soar97/triton-sensevoice:24.05
```
## Create Docker Container
```bash
your_mount_dir=/mnt:/mnt
docker run -it --name "sensevoice-server" --gpus all --net host -v $your_mount_dir --shm-size=2g soar97/triton-sensevoice:24.05
```
## Export SenseVoice Model to ONNX
Please follow the official FunASR export guide to produce the SenseVoice ONNX model. You also need to download the tokenizer file yourself.
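For reference, the export can be driven through FunASR's `AutoModel` interface. This is a minimal sketch that assumes the ModelScope model id `iic/SenseVoiceSmall`; the official FunASR export guide remains the authoritative source:
```python
# A minimal export sketch (assumes funasr and its ONNX export dependencies are
# installed, e.g. pip install -U funasr onnx onnxruntime); see the official
# FunASR export guide for the authoritative steps.
from funasr import AutoModel

# "iic/SenseVoiceSmall" is the ModelScope model id; adjust if you use a local copy.
model = AutoModel(model="iic/SenseVoiceSmall")

# Writes model.onnx into the downloaded model directory.
res = model.export(type="onnx", quantize=False)
print(res)
```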
## Launch Server
The model repository should have the following directory tree:
```
model_repo_sense_voice_small
|-- encoder
|   |-- 1
|   |   `-- model.onnx -> /your/path/model.onnx
|   `-- config.pbtxt
|-- feature_extractor
|   |-- 1
|   |   `-- model.py
|   |-- am.mvn
|   |-- config.pbtxt
|   `-- config.yaml
|-- scoring
|   |-- 1
|   |   `-- model.py
|   |-- chn_jpn_yue_eng_ko_spectok.bpe.model -> /your/path/chn_jpn_yue_eng_ko_spectok.bpe.model
|   `-- config.pbtxt
`-- sensevoice
    |-- 1
    `-- config.pbtxt

8 directories, 10 files
```
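The two symlinks in the tree point at the exported encoder and the downloaded tokenizer. A small sketch for creating them, where the `/your/path/...` placeholders stand for wherever your exported files actually live:
```python
# Wire the exported ONNX encoder and the tokenizer into the model repository.
# /your/path/... are placeholders; replace them with your real export locations.
import os

repo = "model_repo_sense_voice_small"
os.symlink("/your/path/model.onnx",
           os.path.join(repo, "encoder", "1", "model.onnx"))
os.symlink("/your/path/chn_jpn_yue_eng_ko_spectok.bpe.model",
           os.path.join(repo, "scoring", "chn_jpn_yue_eng_ko_spectok.bpe.model"))
```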
```bash
# launch the service
tritonserver --model-repository /workspace/model_repo_sense_voice_small \
             --pinned-memory-pool-byte-size=512000000 \
             --cuda-memory-pool-byte-size=0:1024000000
```
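Once the server is up, you can sanity-check it with a single request. This is a sketch using the `tritonclient` package; the gRPC port 10086 matches the benchmark command below, and the tensor names `WAV`, `WAV_LENS`, and `TRANSCRIPTS` are assumptions based on similar Triton ASR deployments, so check `sensevoice/config.pbtxt` for the actual names and dtypes:
```python
# A single-request sanity check (pip install tritonclient[grpc] soundfile).
# ASSUMPTIONS: gRPC endpoint localhost:10086, and the "sensevoice" BLS model
# exposes WAV/WAV_LENS inputs and a TRANSCRIPTS output; verify against
# sensevoice/config.pbtxt.
import numpy as np
import soundfile as sf
import tritonclient.grpc as grpcclient

triton_client = grpcclient.InferenceServerClient(url="localhost:10086")
assert triton_client.is_server_ready()

# Load a 16 kHz mono wav file (SenseVoice expects 16 kHz input).
samples, sample_rate = sf.read("test.wav", dtype="float32")
assert sample_rate == 16000

wav = samples.reshape(1, -1)                    # [batch, num_samples]
wav_lens = np.array([[wav.shape[1]]], dtype=np.int32)

inputs = [
    grpcclient.InferInput("WAV", list(wav.shape), "FP32"),
    grpcclient.InferInput("WAV_LENS", list(wav_lens.shape), "INT32"),
]
inputs[0].set_data_from_numpy(wav)
inputs[1].set_data_from_numpy(wav_lens)

outputs = [grpcclient.InferRequestedOutput("TRANSCRIPTS")]
response = triton_client.infer("sensevoice", inputs, outputs=outputs)
print(response.as_numpy("TRANSCRIPTS")[0].decode())
```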
## Benchmark using Dataset
```bash
git clone https://github.com/yuekaizhang/Triton-ASR-Client.git
cd Triton-ASR-Client
num_task=32
python3 client.py \
    --server-addr localhost \
    --server-port 10086 \
    --model-name sensevoice \
    --compute-cer \
    --num-tasks $num_task \
    --batch-size 16 \
    --manifest-dir ./datasets/aishell1_test
```
The benchmark results below were measured on the Aishell1 test set with a single V100 GPU; the total audio duration is 36108.919 seconds.
| concurrent-tasks | batch-size-per-task | processing time (s) | RTF |
|---|---|---|---|
| 32 (onnx fp32) | 16 | 67.09 | 0.0019 |
| 32 (onnx fp32) | 1 | 82.04 | 0.0023 |
(Note: for the batch-size-per-task=1 case, tritonserver can use dynamic batching to improve throughput.)
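RTF here is the total processing time divided by the total audio duration; a quick check against the first row:
```python
# RTF = processing time / total audio duration
total_audio_s = 36108.919  # Aishell1 test set duration quoted above
processing_s = 67.09       # 32 tasks, batch size 16
print(f"RTF = {processing_s / total_audio_s:.4f}")  # RTF = 0.0019
```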
## Acknowledgements
This work originates from the NVIDIA CISI project. We also have TTS and NLP solutions deployed on Triton Inference Server. If you are interested, please contact us.