# GPU Benchmark (libtorch-cpp)

## Configuration
### Dataset
A long-audio test set (not open source) containing 103 audio files, with durations ranging from 2 to 30 minutes.
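
The benchmark command passes the test set through `--wav-path` as a list file (`long_test.scp`). Below is a minimal sketch for building such a list from a directory of WAV files. It assumes a Kaldi-style list with one `wav_id<TAB>wav_path` line per file. The `make_scp` helper and the `./long_audio` directory are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Kaldi-style list ("wav_id<TAB>wav_path"), one line per WAV in DIR.
make_scp() {
    local dir=$1 out=$2 f
    : > "$out"                       # create or truncate the output list
    for f in "$dir"/*.wav; do
        [ -e "$f" ] || continue      # no match: the glob stays literal, so skip it
        printf '%s\t%s\n' "$(basename "$f" .wav)" "$f" >> "$out"
    done
}

# Hypothetical location of the long-audio recordings.
make_scp ./long_audio ./long_test.scp
```

For this test set, `wc -l < ./long_test.scp` should then report 103 lines.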

## [FSMN-VAD](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx/summary) + [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript/summary) + [CT-Transformer](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx/summary) 

```shell
./funasr-onnx-offline-rtf \
    --model-dir ./damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript \
    --vad-dir ./damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
    --punc-dir ./damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
    --gpu \
    --thread-num 20 \
    --bladedisc true \
    --batch-size 20 \
    --wav-path ./long_test.scp
```
Note: Run the benchmark inside Docker; see the [docs](./SDK_advanced_guide_offline_gpu_zh.md) for details.
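
Each row of the results table can be reproduced by rerunning the same command with a different concurrency and batch size. The following is a hypothetical sketch: `run_row`, `run_table`, and the `RTF_BIN` override are illustrative names, not part of FunASR. It also assumes that the table's concurrent-tasks column corresponds to `--thread-num`; the command above uses `--thread-num 20 --batch-size 20`, which matches the last row.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one benchmark configuration.
# Assumption: "concurrent tasks" in the table maps to --thread-num.
# RTF_BIN may override the binary (e.g. RTF_BIN=echo for a dry run).
run_row() {
    local threads=$1 batch=$2
    "${RTF_BIN:-./funasr-onnx-offline-rtf}" \
        --model-dir ./damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-torchscript \
        --vad-dir ./damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
        --punc-dir ./damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
        --gpu \
        --thread-num "$threads" \
        --bladedisc true \
        --batch-size "$batch" \
        --wav-path ./long_test.scp
}

# Hypothetical sweep over the (concurrent tasks, batch size) pairs in the table.
run_table() {
    local cfg threads batch
    for cfg in "1 1" "1 20" "5 20" "10 20" "20 20"; do
        read -r threads batch <<< "$cfg"   # split "threads batch" into two variables
        echo "== concurrent tasks: $threads, batch size: $batch =="
        run_row "$threads" "$batch"
    done
}
```

For example, `run_row 10 20` runs a single configuration, and `RTF_BIN=echo run_table` prints the five command lines without launching the benchmark.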

### Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz (16 cores / 32 logical processors, avx512_vnni) + NVIDIA A10 GPU

| Concurrent tasks | Batch size |   RTF  |    Speedup   |
|------------------|:----------:|:------:|:------------:|
| 1                |   1    | 0.0076 |      130     |
| 1                |   20   | 0.0048 |      208     |
| 5                |   20   | 0.0011 |      850     |
| 10               |   20   | 0.0008 |      1200+   |
| 20               |   20   | 0.0008 |      1200+   |

Note: For comparison, on the CPU alone the single-thread RTF is 0.066, and the speedup with 32 threads is 330+.
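
RTF (real-time factor) is processing time divided by audio duration. The Speedup column appears to be its reciprocal: 1/0.0048 ≈ 208, for example. The sketch below uses a hypothetical `speedup` helper built on `awk` to recompute approximate speedups from the RTF values above. Because the table's RTF values are rounded, the results can differ from the listed figures; 0.0011, for instance, gives about 909 rather than 850.

```shell
#!/usr/bin/env bash
# Hypothetical helper: speedup over real time = 1 / RTF, rounded to a whole number.
speedup() {
    awk -v rtf="$1" 'BEGIN { printf "%.0f\n", 1 / rtf }'
}

# RTF values from the GPU table, plus the single-thread CPU figure from the note.
for rtf in 0.0076 0.0048 0.0011 0.0008 0.066; do
    echo "RTF $rtf -> ~$(speedup "$rtf")x real time"
done
```

By the same formula, the single-thread CPU RTF of 0.066 corresponds to roughly 15× real time.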