FunASR/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common

ModelScope Model

How to finetune and infer using a pretrained ModelScope Model

Inference

You can also run inference directly with the finetuned model.

  • Set the following parameters in infer.py:

    • audio_in: the input audio. Supported inputs are a wav file, a URL, bytes, and parsed audio formats.
    • output_dir: the output directory. It must be set when the input is a wav.scp file.
  • Then run the inference pipeline with:

    python infer.py
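
When audio_in is a wav.scp file, the list has to exist before infer.py runs. The sketch below shows one way to build it, assuming the common Kaldi-style layout of one `<utt_id> <wav_path>` pair per line. The helper name `write_wav_scp` and the choice of the file stem as the utterance ID are illustrative assumptions, not part of FunASR.

```python
from pathlib import Path


def write_wav_scp(wav_dir, scp_path):
    """Write one '<utt_id> <absolute wav path>' line per .wav file in wav_dir.

    Hypothetical helper: the utterance ID is assumed to be the file stem.
    Returns the number of entries written.
    """
    wav_files = sorted(Path(wav_dir).glob("*.wav"))
    lines = [f"{wav.stem} {wav.resolve()}" for wav in wav_files]
    # End the file with a newline only when there is at least one entry.
    text = "\n".join(lines) + ("\n" if lines else "")
    Path(scp_path).write_text(text, encoding="utf-8")
    return len(lines)
```

The resulting path can then be assigned to audio_in, with output_dir set as described above.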

Inference-related parameters can be modified in vad.yaml.

  • max_end_silence_time: the duration of trailing silence used to decide that a sentence has ended. The valid range is 500 ms to 6000 ms, and the default is 800 ms.
  • speech_noise_thres: the threshold that balances the speech and noise scores. The valid range is (-1, 1).
    • As the value approaches -1, noise is more likely to be classified as speech.
    • As the value approaches 1, speech is more likely to be classified as noise.
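
To build intuition for these two parameters, here is a toy sketch and not the FunASR implementation. It assumes a frame is labeled speech when its speech score is at least the noise score plus speech_noise_thres, and that the end of a sentence is declared once the silence after speech has lasted max_end_silence_time. The function names and the 10 ms frame length are illustrative assumptions.

```python
def is_speech(speech_prob, noise_prob, speech_noise_thres=0.0):
    """Toy decision rule (assumed): speech if speech_prob >= noise_prob + threshold."""
    return speech_prob >= noise_prob + speech_noise_thres


def first_endpoint(frame_is_speech, frame_ms=10, max_end_silence_time=800):
    """Return the time in ms at which the end of sentence is declared, or None.

    frame_is_speech is a sequence of per-frame booleans; frame_ms is an
    assumed frame length.
    """
    silence_ms = 0
    seen_speech = False
    for i, speech in enumerate(frame_is_speech):
        if speech:
            seen_speech = True
            silence_ms = 0  # any speech frame resets the silence counter
        elif seen_speech:
            silence_ms += frame_ms
            if silence_ms >= max_end_silence_time:
                return (i + 1) * frame_ms
    return None


# 500 ms of speech followed by 1000 ms of silence
frames = [True] * 50 + [False] * 100
print(first_endpoint(frames))                             # 1300
print(first_endpoint(frames, max_end_silence_time=500))   # 1000
print(first_endpoint(frames, max_end_silence_time=6000))  # None
```

In this toy model, a smaller max_end_silence_time cuts sentences sooner after speech stops, and a lower speech_noise_thres lets more borderline frames count as speech.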