Speech Recognition
.. HINT::
The ModelScope pipeline supports inference and fine-tuning for all the models in the model zoo. Here we take the Paraformer and Paraformer-online models as examples to demonstrate the usage.
Inference
Quick start
Paraformer model
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Define the ASR pipeline with the Paraformer-large model.
inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
)

# Decode a wav file from a URL.
rec_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav')
print(rec_result)
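The same pipeline also accepts a local wav path (see the API reference below); a minimal sketch, with a placeholder file name:

```python
# Decode a local wav file instead of a URL ('asr_example_zh.wav' is a placeholder).
rec_result = inference_pipeline(audio_in='./asr_example_zh.wav')
print(rec_result)
```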
Paraformer-online model
# Define the streaming ASR pipeline with the Paraformer-online model.
inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online',
)
import soundfile

# Read the whole waveform, then feed it to the pipeline chunk by chunk.
speech, sample_rate = soundfile.read("example/asr_example.wav")
param_dict = {"cache": dict(), "is_final": False}
chunk_stride = 7680  # 480 ms at 16 kHz

# first chunk, 480 ms
speech_chunk = speech[0:chunk_stride]
rec_result = inference_pipeline(audio_in=speech_chunk, param_dict=param_dict)
# next chunk, 480 ms
speech_chunk = speech[chunk_stride:chunk_stride + chunk_stride]
rec_result = inference_pipeline(audio_in=speech_chunk, param_dict=param_dict)
print(rec_result)
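In practice the whole file is streamed in a loop; a minimal sketch under the same assumptions (16 kHz audio, 480 ms chunks), setting `is_final=True` on the last chunk so the model flushes its cache:

```python
import soundfile

speech, sample_rate = soundfile.read("example/asr_example.wav")
chunk_stride = 7680  # 480 ms at 16 kHz

param_dict = {"cache": dict(), "is_final": False}
num_chunks = (len(speech) - 1) // chunk_stride + 1
for i in range(num_chunks):
    speech_chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
    # Mark the last chunk so the model emits the final result.
    param_dict["is_final"] = (i == num_chunks - 1)
    rec_result = inference_pipeline(audio_in=speech_chunk, param_dict=param_dict)
    print(rec_result)
```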
For the full demo code, please refer to the demo.
API reference
define pipeline
- task: Tasks.auto_speech_recognition
- model: model name in the model zoo, or a model path on local disk
- ngpu: 1 (default), decode on GPU; if ngpu=0, decode on CPU
- ncpu: 1 (default), the number of threads used for intra-op parallelism on CPU
- output_dir: None (default), the output path of the results, if set
- batch_size: 1 (default), batch size when decoding
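For example, a pipeline that decodes on CPU and writes its results to disk could be defined as follows (a minimal sketch; the parameter values are illustrative):

```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
    ngpu=0,                   # decode on CPU
    ncpu=4,                   # threads for intra-op parallelism on CPU
    output_dir='./results',   # save recognition results here
    batch_size=1,             # utterances decoded per batch
)
```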
infer pipeline
- audio_in: the input to decode, which could be:
  - wav_path, e.g.: asr_example.wav
  - pcm_path, e.g.: asr_example.pcm
  - audio bytes stream, e.g.: bytes data from a microphone
  - audio sample point, e.g.: audio, rate = soundfile.read("asr_example_zh.wav"); the dtype is numpy.ndarray or torch.Tensor
  - wav.scp, a Kaldi-style wav list (`wav_id \t wav_path`), e.g.:
    asr_example1 ./audios/asr_example1.wav
    asr_example2 ./audios/asr_example2.wav
    In the case of wav.scp input, output_dir must be set to save the output results
- audio_fs: audio sampling rate; set it only when audio_in is pcm audio
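A sketch of two of the input types above, assuming asr_example.pcm is raw 16 kHz 16-bit PCM and wav.scp follows the Kaldi format (the file names are placeholders):

```python
# pcm input: a raw stream carries no header, so pass the sampling rate
# explicitly via audio_fs.
rec_result = inference_pipeline(audio_in='asr_example.pcm', audio_fs=16000)
print(rec_result)

# wav.scp input: results for each utterance are written to output_dir,
# which must be set when the pipeline is defined.
inference_pipeline(audio_in='wav.scp')
```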