# Speaker Verification

> **Note**: The ModelScope pipeline supports inference and fine-tuning for all the models in the model zoo. Here we take the xvector_sv model as an example to demonstrate the usage.
## Inference with pipeline

### Quick start

#### Speaker verification
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

inference_sv_pipline = pipeline(
    task=Tasks.speaker_verification,
    model='damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch'
)

# The same speaker
rec_result = inference_sv_pipline(audio_in=(
    'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav',
    'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_same.wav'))
print("Similarity", rec_result["scores"])

# Different speakers
rec_result = inference_sv_pipline(audio_in=(
    'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav',
    'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_different.wav'))
print("Similarity", rec_result["scores"])
```
#### Speaker embedding extraction
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Define the extraction pipeline
inference_sv_pipline = pipeline(
    task=Tasks.speaker_verification,
    model='damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch'
)

# Extract the speaker embedding
rec_result = inference_sv_pipline(
    audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav')
speaker_embedding = rec_result["spk_embedding"]
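```

If you extract embeddings for two utterances, you can score them yourself. A minimal sketch with NumPy cosine similarity, assuming `spk_embedding` can be converted to a 1-D float array (convert a torch tensor with `.numpy()` first if needed):

```python
import numpy as np

def cosine_similarity(emb1, emb2):
    """Cosine similarity between two 1-D embedding vectors."""
    emb1 = np.asarray(emb1, dtype=np.float32).ravel()
    emb2 = np.asarray(emb2, dtype=np.float32).ravel()
    return float(np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2)))

# e.g. score = cosine_similarity(enroll_embedding, test_embedding)
```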
For the full demo code, please refer to `infer.py`.
## API reference

### Define pipeline
- `task`: `Tasks.speaker_verification`
- `model`: model name in the model zoo, or a model path on local disk
- `ngpu`: `1` (default), decoding on GPU; if `ngpu=0`, decoding on CPU
- `output_dir`: `None` (default), the output path of the results if set
- `batch_size`: `1` (default), batch size when decoding
- `sv_threshold`: `0.9465` (default), the similarity threshold for deciding whether two utterances belong to the same speaker; it should be in (0, 1)
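Putting these together, a pipeline definition with explicit parameters might look like the following sketch (all values other than the model name are illustrative, taken from the defaults listed above):

```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

inference_sv_pipline = pipeline(
    task=Tasks.speaker_verification,
    model='damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch',
    ngpu=1,                  # decode on GPU; set ngpu=0 to decode on CPU
    output_dir='./outputs',  # illustrative; None (default) skips saving results
    batch_size=1,            # batch size when decoding
    sv_threshold=0.9465,     # same-speaker similarity threshold, in (0, 1)
)
```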
### Infer pipeline for speaker embedding extraction
- `audio_in`: the input to process, which could be:
  - url (str), e.g. `https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav`
  - local path, e.g. `path/to/a.wav`
  - wav.scp, a Kaldi-style script file, e.g. `path/to/wav1.scp`, whose content looks like:
    ```
    test1 path/to/enroll1.wav
    test2 path/to/enroll2.wav
    ```
  - bytes, e.g. raw bytes data from a microphone (see the sketch after this list)
  - `fbank1.scp,speech,kaldi_ark`, e.g. 80-dimensional fbank features extracted with the Kaldi toolkit
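For instance, a sketch of the bytes input form, reading raw audio from a local file (in practice the bytes could come from a microphone stream; the path is a placeholder):

```python
# Sketch: feed raw wav bytes to the pipeline (placeholder path).
with open('path/to/a.wav', 'rb') as f:
    audio_bytes = f.read()

rec_result = inference_sv_pipline(audio_in=audio_bytes)
speaker_embedding = rec_result["spk_embedding"]
```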
### Infer pipeline for speaker verification
- `audio_in`: the input to process, which could be:
  - Tuple(url1, url2), e.g. (`https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav`, `https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_different.wav`)
  - Tuple(local_path1, local_path2), e.g. (`path/to/a.wav`, `path/to/b.wav`)
  - Tuple(wav1.scp, wav2.scp), e.g. (`path/to/wav1.scp`, `path/to/wav2.scp`), where the two script files pair utterances by id (see the sketch after this list); wav1.scp:
    ```
    test1 path/to/enroll1.wav
    test2 path/to/enroll2.wav
    ```
    wav2.scp:
    ```
    test1 path/to/same1.wav
    test2 path/to/diff2.wav
    ```
  - Tuple(bytes, bytes), e.g. raw bytes data from a microphone
  - Tuple("fbank1.scp,speech,kaldi_ark", "fbank2.scp,speech,kaldi_ark"), e.g. 80-dimensional fbank features extracted with the Kaldi toolkit
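For example, a sketch of the paired-scp form: write two Kaldi-style script files whose utterance ids match, then pass them as a tuple (all paths are placeholders):

```python
# Sketch: paired scp verification with matching utterance ids (placeholder paths).
with open('wav1.scp', 'w') as f:
    f.write('test1 path/to/enroll1.wav\n')
    f.write('test2 path/to/enroll2.wav\n')
with open('wav2.scp', 'w') as f:
    f.write('test1 path/to/same1.wav\n')
    f.write('test2 path/to/diff2.wav\n')

rec_result = inference_sv_pipline(audio_in=('wav1.scp', 'wav2.scp'))
```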
## Inference with your data

Use wav.scp or fbank.scp files to organize your own data for speaker embedding extraction or speaker verification. In this case, `output_dir` should be set so that all the embeddings or scores are saved.
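A sketch of the full flow on your own data (the `output_dir` value and scp path are illustrative):

```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# output_dir must be set so that all embeddings/scores are written to disk.
inference_sv_pipline = pipeline(
    task=Tasks.speaker_verification,
    model='damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch',
    output_dir='./outputs',  # illustrative path
)

# Batch embedding extraction over a script file (placeholder path).
inference_sv_pipline(audio_in='path/to/wav1.scp')
```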
## Inference with multi-threads on CPU

You can run inference with multiple threads on CPU with the following steps:

Step 1. Set `ngpu=0` while defining the pipeline in `infer.py`.

Step 2. Split `wav.scp` into several files, e.g. 4:

```shell
split -l $((`wc -l < wav.scp`/4+1)) --numeric-suffixes wav.scp splits/wav.scp.
```

Step 3. Start extracting embeddings:

```shell
for wav_scp in `ls splits/wav.scp.*`; do
    # run each split in the background so the splits are processed in parallel
    python infer.py ${wav_scp} outputs/$(basename ${wav_scp}) &
done
```

Step 4. The embeddings will be saved under `outputs/*`.
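Equivalently, a Python sketch of the same fan-out using `multiprocessing` (an assumption, not the official recipe; each worker builds its own CPU pipeline and handles one split):

```python
import glob
import os
from multiprocessing import Process

def extract(wav_scp, out_dir):
    # Each worker constructs its own CPU pipeline (ngpu=0) for one split.
    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks
    sv_pipeline = pipeline(
        task=Tasks.speaker_verification,
        model='damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch',
        ngpu=0,
        output_dir=out_dir,
    )
    sv_pipeline(audio_in=wav_scp)

if __name__ == '__main__':
    procs = []
    for wav_scp in sorted(glob.glob('splits/wav.scp.*')):
        out_dir = os.path.join('outputs', os.path.basename(wav_scp))
        proc = Process(target=extract, args=(wav_scp, out_dir))
        proc.start()
        procs.append(proc)
    for proc in procs:
        proc.join()
```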
## Inference with multiple GPUs

Similar to inference on CPU; the differences are as follows:

Step 1. Set `ngpu=1` while defining the pipeline in `infer.py`.

Step 3. Specify the GPU device with `CUDA_VISIBLE_DEVICES`:

```shell
for wav_scp in `ls splits/wav.scp.*`; do
    CUDA_VISIBLE_DEVICES=1 python infer.py ${wav_scp} outputs/$(basename ${wav_scp})
done
```