GPU Benchmark (libtorch-cpp)

Configuration

Data set:

A long-audio test set (not open source) containing 103 audio files, with durations ranging from 2 to 30 minutes.

FSMN-VAD + Paraformer-large + CT-Transformer

./funasr-onnx-offline-rtf \
    --model-dir    ./damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript \
    --vad-dir   ./damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
    --punc-dir  ./damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
    --gpu \
    --thread-num 20 \
    --bladedisc true \
    --batch-size 20 \
    --wav-path     ./long_test.scp
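
To reproduce several rows of the results table, the command above can be generated once per configuration. The following is a minimal sketch, not part of the benchmark itself. It assumes that the "concurrent-tasks" column corresponds to `--thread-num` and the "batch" column to `--batch-size`; that mapping is an assumption and is not stated in this document. `build_command` and `CONFIGS` are hypothetical names used only for illustration. All flags and paths are copied from the command above.

```python
import shlex

# Flags and paths copied from the benchmark command; only the thread count
# and batch size vary between runs.
BASE_ARGS = [
    "./funasr-onnx-offline-rtf",
    "--model-dir", "./damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript",
    "--vad-dir", "./damo/speech_fsmn_vad_zh-cn-16k-common-onnx",
    "--punc-dir", "./damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx",
    "--gpu",
    "--bladedisc", "true",
    "--wav-path", "./long_test.scp",
]

# (concurrent tasks, batch size) pairs taken from the results table.
# Assumption: concurrent tasks map to --thread-num.
CONFIGS = [(1, 1), (1, 20), (5, 20), (10, 20), (20, 20)]


def build_command(threads, batch_size):
    """Return the argument list for one benchmark configuration."""
    return BASE_ARGS + [
        "--thread-num", str(threads),
        "--batch-size", str(batch_size),
    ]


if __name__ == "__main__":
    # Print one shell-quoted command line per configuration.
    for threads, batch_size in CONFIGS:
        print(shlex.join(build_command(threads, batch_size)))
```

The script only prints the command lines; pass each list to `subprocess.run` if you want to execute them from Python.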

Note: run in Docker; refer to (docs)

Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz, 16 cores / 32 processors, with avx512_vnni; GPU: NVIDIA A10

Concurrent tasks	Batch size	RTF	Speedup rate
1	1	0.0076	130
1	20	0.0048	208
5	20	0.0011	850
10	20	0.0008	1200+
20	20	0.0008	1200+

Note: On CPU, the single-thread RTF is 0.066, and the speedup with 32 threads is 330+.
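
The relationship between the RTF and speedup columns can be sketched as follows. This assumes the usual definitions, which this document does not spell out: RTF (real-time factor) is processing time divided by audio duration, and speedup is its reciprocal. The reciprocals of the rounded RTF values only approximately match the reported speedups (for example, 1 / 0.0011 is about 909 against a reported 850), presumably because the table rounds RTF to four decimal places. The function names are illustrative.

```python
def rtf(processing_seconds, audio_seconds):
    """Real-time factor: time spent decoding per second of audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speedup(rtf_value):
    """Seconds of audio processed per second of wall-clock time (assumed 1 / RTF)."""
    if rtf_value <= 0:
        raise ValueError("rtf_value must be positive")
    return 1.0 / rtf_value


# (concurrent tasks, batch size, RTF) rows copied from the table above.
GPU_ROWS = [
    (1, 1, 0.0076),
    (1, 20, 0.0048),
    (5, 20, 0.0011),
    (10, 20, 0.0008),
    (20, 20, 0.0008),
]

if __name__ == "__main__":
    for tasks, batch, value in GPU_ROWS:
        print(f"tasks={tasks:>2} batch={batch:>2} RTF={value:.4f} "
              f"speedup~{speedup(value):.0f}")
```

Under the same assumption, an RTF of 0.0008 means that one hour of audio takes roughly 2.9 seconds to process.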
