diff --git a/README.md b/README.md
index 665f42592..414eb9b89 100644
--- a/README.md
+++ b/README.md
@@ -28,7 +28,7 @@ For the release notes, please ref to [news](https://github.com/alibaba-damo-acad
## Highlights
- FunASR supports speech recognition (ASR), Multi-talker ASR, Voice Activity Detection (VAD), Punctuation Restoration, Language Models, Speaker Verification and Speaker Diarization.
-- We have released large number of academic and industrial pretrained models on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition)
+- We have released a large number of academic and industrial pretrained models on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition); refer to the [Model Zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html)
- The pretrained model [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) obtains the best performance on many tasks in [SpeechIO leaderboard](https://github.com/SpeechColab/Leaderboard)
- FunASR supplies an easy-to-use pipeline to finetune pretrained models from [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition)
- Compared to the [Espnet](https://github.com/espnet/espnet) framework, training on large-scale datasets in FunASR is much faster owing to the optimized dataloader.
diff --git a/docs/index.rst b/docs/index.rst
index e6aff5fab..b8fcacdeb 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -70,6 +70,7 @@ Overview
./runtime/grpc_python.md
./runtime/grpc_cpp.md
./runtime/websocket_python.md
+ ./runtime/websocket_cpp.md
.. toctree::
:maxdepth: 1
diff --git a/docs/modelscope_pipeline/punc_pipeline.md b/docs/modelscope_pipeline/punc_pipeline.md
deleted file mode 100644
index a0203d707..000000000
--- a/docs/modelscope_pipeline/punc_pipeline.md
+++ /dev/null
@@ -1,20 +0,0 @@
-# Punctuation Restoration
-
-## Inference with pipeline
-
-### Quick start
-
-### Inference with you data
-
-### Inference with multi-threads on CPU
-
-### Inference with multi GPU
-
-## Finetune with pipeline
-
-### Quick start
-
-### Finetune with your data
-
-## Inference with your finetuned model
-
diff --git a/docs/modelscope_pipeline/punc_pipeline.md b/docs/modelscope_pipeline/punc_pipeline.md
new file mode 120000
index 000000000..4ef4711b8
--- /dev/null
+++ b/docs/modelscope_pipeline/punc_pipeline.md
@@ -0,0 +1 @@
+../../egs_modelscope/punctuation/TEMPLATE/README.md
\ No newline at end of file
diff --git a/docs/modelscope_pipeline/sd_pipeline.md b/docs/modelscope_pipeline/sd_pipeline.md
deleted file mode 100644
index 1330fe6f7..000000000
--- a/docs/modelscope_pipeline/sd_pipeline.md
+++ /dev/null
@@ -1,20 +0,0 @@
-# Speaker Diarization
-
-## Inference with pipeline
-
-### Quick start
-
-### Inference with you data
-
-### Inference with multi-threads on CPU
-
-### Inference with multi GPU
-
-## Finetune with pipeline
-
-### Quick start
-
-### Finetune with your data
-
-## Inference with your finetuned model
-
diff --git a/docs/modelscope_pipeline/sd_pipeline.md b/docs/modelscope_pipeline/sd_pipeline.md
new file mode 120000
index 000000000..9c3ac9876
--- /dev/null
+++ b/docs/modelscope_pipeline/sd_pipeline.md
@@ -0,0 +1 @@
+../../egs_modelscope/speaker_diarization/TEMPLATE/README.md
\ No newline at end of file
diff --git a/docs/modelscope_pipeline/sv_pipeline.md b/docs/modelscope_pipeline/sv_pipeline.md
deleted file mode 100644
index c57db3890..000000000
--- a/docs/modelscope_pipeline/sv_pipeline.md
+++ /dev/null
@@ -1,20 +0,0 @@
-# Speaker Verification
-
-## Inference with pipeline
-
-### Quick start
-
-### Inference with you data
-
-### Inference with multi-threads on CPU
-
-### Inference with multi GPU
-
-## Finetune with pipeline
-
-### Quick start
-
-### Finetune with your data
-
-## Inference with your finetuned model
-
diff --git a/docs/modelscope_pipeline/sv_pipeline.md b/docs/modelscope_pipeline/sv_pipeline.md
new file mode 120000
index 000000000..321735574
--- /dev/null
+++ b/docs/modelscope_pipeline/sv_pipeline.md
@@ -0,0 +1 @@
+../../egs_modelscope/speaker_verification/TEMPLATE/README.md
\ No newline at end of file
diff --git a/docs/runtime/websocket_cpp.md b/docs/runtime/websocket_cpp.md
new file mode 120000
index 000000000..8a87df5e4
--- /dev/null
+++ b/docs/runtime/websocket_cpp.md
@@ -0,0 +1 @@
+../../funasr/runtime/websocket/readme.md
\ No newline at end of file
diff --git a/egs_modelscope/asr/TEMPLATE/README.md b/egs_modelscope/asr/TEMPLATE/README.md
index 83c462d98..30ae8c990 100644
--- a/egs_modelscope/asr/TEMPLATE/README.md
+++ b/egs_modelscope/asr/TEMPLATE/README.md
@@ -19,22 +19,24 @@ inference_pipeline = pipeline(
rec_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav')
print(rec_result)
```
-#### [Paraformer-online Model](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/summary)
+#### [Paraformer-online Model](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary)
```python
inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
- model='damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online',
+ model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online',
+ model_revision='v1.0.4'
)
import soundfile
speech, sample_rate = soundfile.read("example/asr_example.wav")
-param_dict = {"cache": dict(), "is_final": False}
-chunk_stride = 7680# 480ms
-# first chunk, 480ms
+chunk_size = [5, 10, 5]  # [5, 10, 5] = 600 ms chunks, [8, 8, 4] = 480 ms chunks
+param_dict = {"cache": dict(), "is_final": False, "chunk_size": chunk_size}
+chunk_stride = chunk_size[1] * 960  # samples per chunk: 9600 (600 ms) for [5, 10, 5], 7680 (480 ms) for [8, 8, 4]
+# first chunk, 600ms
speech_chunk = speech[0:chunk_stride]
rec_result = inference_pipeline(audio_in=speech_chunk, param_dict=param_dict)
print(rec_result)
-# next chunk, 480ms
+# next chunk, 600ms
speech_chunk = speech[chunk_stride:chunk_stride+chunk_stride]
rec_result = inference_pipeline(audio_in=speech_chunk, param_dict=param_dict)
print(rec_result)
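In the snippet above, each chunk covers `chunk_size[1] * 960` samples at 16 kHz (a 60 ms frame shift, so `[5, 10, 5]` gives 10 × 960 = 9600 samples = 600 ms). A minimal, dependency-free sketch of that chunking arithmetic — the `split_into_chunks` helper is illustrative only, not part of the pipeline API:

```python
def split_into_chunks(speech, chunk_size):
    """Split a waveform into streaming chunks the way the pipeline consumes them.

    Each chunk is (samples, is_final); the stride is chunk_size[1] frames of
    960 samples (60 ms at 16 kHz), matching chunk_stride = chunk_size[1] * 960.
    """
    stride = chunk_size[1] * 960
    chunks = []
    for offset in range(0, len(speech), stride):
        is_final = offset + stride >= len(speech)
        chunks.append((speech[offset:offset + stride], is_final))
    return chunks

dummy = list(range(16000))  # 1 s of fake 16 kHz samples
chunks = split_into_chunks(dummy, [5, 10, 5])  # 600 ms chunks of 9600 samples
print(len(chunks), len(chunks[0][0]), chunks[-1][1])
```

With one second of audio and 600 ms chunks this yields two chunks: a full 9600-sample chunk and a final 6400-sample remainder.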
@@ -74,15 +76,15 @@ rec_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyu
print(rec_result)
```
-#### API-reference
-##### Define pipeline
+### API-reference
+#### Define pipeline
- `task`: `Tasks.auto_speech_recognition`
- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path in local disk
- `ngpu`: `1` (Default), decoding on GPU. If ngpu=0, decoding on CPU
- `ncpu`: `1` (Default), sets the number of threads used for intraop parallelism on CPU
- `output_dir`: `None` (Default), the output path of results if set
- `batch_size`: `1` (Default), batch size when decoding
-##### Infer pipeline
+#### Infer pipeline
- `audio_in`: the input to decode, which could be:
- wav_path, `e.g.`: asr_example.wav,
- pcm_path, `e.g.`: asr_example.pcm,
@@ -100,20 +102,20 @@ print(rec_result)
### Inference with multi-thread CPUs or multi GPUs
FunASR also offers the recipe [egs_modelscope/asr/TEMPLATE/infer.sh](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr/TEMPLATE/infer.sh) for decoding with multi-thread CPUs or multiple GPUs.
-- Setting parameters in `infer.sh`
- - `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path in local disk
- - `data_dir`: the dataset dir needs to include `wav.scp`. If `${data_dir}/text` is also exists, CER will be computed
- - `output_dir`: output dir of the recognition results
- - `batch_size`: `64` (Default), batch size of inference on gpu
- - `gpu_inference`: `true` (Default), whether to perform gpu decoding, set false for CPU inference
- - `gpuid_list`: `0,1` (Default), which gpu_ids are used to infer
- - `njob`: only used for CPU inference (`gpu_inference`=`false`), `64` (Default), the number of jobs for CPU decoding
- - `checkpoint_dir`: only used for infer finetuned models, the path dir of finetuned models
- - `checkpoint_name`: only used for infer finetuned models, `valid.cer_ctc.ave.pb` (Default), which checkpoint is used to infer
- - `decoding_mode`: `normal` (Default), decoding mode for UniASR model(fast、normal、offline)
- - `hotword_txt`: `None` (Default), hotword file for contextual paraformer model(the hotword file name ends with .txt")
+#### Settings of `infer.sh`
+- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path in local disk
+- `data_dir`: the dataset dir must include `wav.scp`. If `${data_dir}/text` also exists, CER will be computed
+- `output_dir`: output dir of the recognition results
+- `batch_size`: `64` (Default), batch size of inference on gpu
+- `gpu_inference`: `true` (Default), whether to perform gpu decoding, set false for CPU inference
+- `gpuid_list`: `0,1` (Default), which gpu_ids are used to infer
+- `njob`: only used for CPU inference (`gpu_inference`=`false`), `64` (Default), the number of jobs for CPU decoding
+- `checkpoint_dir`: only used for inferring with finetuned models; the dir containing the finetuned model
+- `checkpoint_name`: only used for inferring with finetuned models, `valid.cer_ctc.ave.pb` (Default), which checkpoint to use
+- `decoding_mode`: `normal` (Default), decoding mode for the UniASR model (`fast`, `normal`, `offline`)
+- `hotword_txt`: `None` (Default), hotword file for the contextual Paraformer model (the file name must end with `.txt`)
-- Decode with multi GPUs:
+#### Decode with multi GPUs:
```shell
bash infer.sh \
--model "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" \
@@ -123,7 +125,7 @@ FunASR also offer recipes [egs_modelscope/asr/TEMPLATE/infer.sh](https://github.
--gpu_inference true \
--gpuid_list "0,1"
```
-- Decode with multi-thread CPUs:
+#### Decode with multi-thread CPUs:
```shell
bash infer.sh \
--model "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" \
@@ -133,7 +135,7 @@ FunASR also offer recipes [egs_modelscope/asr/TEMPLATE/infer.sh](https://github.
--njob 64
```
-- Results
+#### Results
The decoding results can be found in `$output_dir/1best_recog/text.cer`, which includes recognition results of each sample and the CER metric of the whole test set.
diff --git a/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/README.md b/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/README.md
deleted file mode 100644
index c68a8cd4f..000000000
--- a/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/README.md
+++ /dev/null
@@ -1,30 +0,0 @@
-# ModelScope Model
-
-## How to finetune and infer using a pretrained Paraformer-large Model
-
-### Finetune
-
-- Modify finetune training related parameters in `finetune.py`
- - output_dir: # result dir
- - data_dir: # the dataset dir needs to include files: train/wav.scp, train/text; validation/wav.scp, validation/text.
- - batch_bins: # batch size
- - max_epoch: # number of training epoch
- - lr: # learning rate
-
-- Then you can run the pipeline to finetune with:
-```python
- python finetune.py
-```
-
-### Inference
-
-Or you can use the finetuned model for inference directly.
-
-- Setting parameters in `infer.py`
- - audio_in: # support wav, url, bytes, and parsed audio format.
- - output_dir: # If the input format is wav.scp, it needs to be set.
-
-- Then you can run the pipeline to infer with:
-```python
- python infer.py
-```
diff --git a/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/README.md b/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/README.md
new file mode 120000
index 000000000..bb55ab52e
--- /dev/null
+++ b/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/README.md
@@ -0,0 +1 @@
+../../TEMPLATE/README.md
\ No newline at end of file
diff --git a/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/demo.py b/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/demo.py
new file mode 100644
index 000000000..87bb65299
--- /dev/null
+++ b/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/demo.py
@@ -0,0 +1,14 @@
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+if __name__ == '__main__':
+ audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav'
+ output_dir = None
+ inference_pipeline = pipeline(
+ task=Tasks.auto_speech_recognition,
+ model="damo/speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch",
+ output_dir=output_dir,
+ )
+ rec_result = inference_pipeline(audio_in=audio_in)
+ print(rec_result)
+
diff --git a/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/infer.py b/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/infer.py
deleted file mode 100644
index 3594815f7..000000000
--- a/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/infer.py
+++ /dev/null
@@ -1,14 +0,0 @@
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-
-if __name__ == '__main__':
- audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav'
- output_dir = None
- inference_pipline = pipeline(
- task=Tasks.auto_speech_recognition,
- model="damo/speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch",
- output_dir=output_dir,
- )
- rec_result = inference_pipline(audio_in=audio_in)
- print(rec_result)
-
diff --git a/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/infer.py b/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/infer.py
new file mode 120000
index 000000000..128fc31c2
--- /dev/null
+++ b/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/infer.py
@@ -0,0 +1 @@
+../../TEMPLATE/infer.py
\ No newline at end of file
diff --git a/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/infer.sh b/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/infer.sh
new file mode 120000
index 000000000..5e59f1841
--- /dev/null
+++ b/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/infer.sh
@@ -0,0 +1 @@
+../../TEMPLATE/infer.sh
\ No newline at end of file
diff --git a/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/README.md b/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/README.md
new file mode 120000
index 000000000..bb55ab52e
--- /dev/null
+++ b/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/README.md
@@ -0,0 +1 @@
+../../TEMPLATE/README.md
\ No newline at end of file
diff --git a/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/demo.py b/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/demo.py
new file mode 100644
index 000000000..3b0164a46
--- /dev/null
+++ b/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/demo.py
@@ -0,0 +1,13 @@
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+if __name__ == "__main__":
+ audio_in = "https://modelscope.oss-cn-beijing.aliyuncs.com/test/audios/asr_example.wav"
+ output_dir = "./results"
+ inference_pipeline = pipeline(
+ task=Tasks.auto_speech_recognition,
+ model="damo/speech_conformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch",
+ output_dir=output_dir,
+ )
+ rec_result = inference_pipeline(audio_in=audio_in)
+ print(rec_result)
diff --git a/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/infer.py b/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/infer.py
deleted file mode 100644
index b55b59f41..000000000
--- a/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/infer.py
+++ /dev/null
@@ -1,13 +0,0 @@
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-
-if __name__ == "__main__":
- audio_in = "https://modelscope.oss-cn-beijing.aliyuncs.com/test/audios/asr_example.wav"
- output_dir = "./results"
- inference_pipline = pipeline(
- task=Tasks.auto_speech_recognition,
- model="damo/speech_conformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch",
- output_dir=output_dir,
- )
- rec_result = inference_pipline(audio_in=audio_in)
- print(rec_result)
diff --git a/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/infer.py b/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/infer.py
new file mode 120000
index 000000000..128fc31c2
--- /dev/null
+++ b/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/infer.py
@@ -0,0 +1 @@
+../../TEMPLATE/infer.py
\ No newline at end of file
diff --git a/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/infer.sh b/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/infer.sh
new file mode 120000
index 000000000..5e59f1841
--- /dev/null
+++ b/egs_modelscope/asr/conformer/speech_conformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/infer.sh
@@ -0,0 +1 @@
+../../TEMPLATE/infer.sh
\ No newline at end of file
diff --git a/egs_modelscope/asr/data2vec/speech_data2vec_pretrain-paraformer-zh-cn-aishell2-16k/infer.py b/egs_modelscope/asr/data2vec/speech_data2vec_pretrain-paraformer-zh-cn-aishell2-16k/infer.py
index 77b2cbd23..7a6b750e1 100644
--- a/egs_modelscope/asr/data2vec/speech_data2vec_pretrain-paraformer-zh-cn-aishell2-16k/infer.py
+++ b/egs_modelscope/asr/data2vec/speech_data2vec_pretrain-paraformer-zh-cn-aishell2-16k/infer.py
@@ -16,13 +16,13 @@ def modelscope_infer_core(output_dir, split_dir, njob, idx):
os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_list[gpu_id])
else:
os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_data2vec_pretrain-paraformer-zh-cn-aishell2-16k",
output_dir=output_dir_job,
)
audio_in = os.path.join(split_dir, "wav.{}.scp".format(idx))
- inference_pipline(audio_in=audio_in)
+ inference_pipeline(audio_in=audio_in)
def modelscope_infer(params):
diff --git a/egs_modelscope/asr/data2vec/speech_data2vec_pretrain-zh-cn-aishell2-16k-pytorch/infer.py b/egs_modelscope/asr/data2vec/speech_data2vec_pretrain-zh-cn-aishell2-16k-pytorch/infer.py
index 0d06377e0..f07f308c2 100644
--- a/egs_modelscope/asr/data2vec/speech_data2vec_pretrain-zh-cn-aishell2-16k-pytorch/infer.py
+++ b/egs_modelscope/asr/data2vec/speech_data2vec_pretrain-zh-cn-aishell2-16k-pytorch/infer.py
@@ -16,13 +16,13 @@ def modelscope_infer_core(output_dir, split_dir, njob, idx):
os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_list[gpu_id])
else:
os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_data2vec_pretrain-zh-cn-aishell2-16k-pytorch",
output_dir=output_dir_job,
)
audio_in = os.path.join(split_dir, "wav.{}.scp".format(idx))
- inference_pipline(audio_in=audio_in)
+ inference_pipeline(audio_in=audio_in)
def modelscope_infer(params):
diff --git a/egs_modelscope/asr/mfcca/speech_mfcca_asr-zh-cn-16k-alimeeting-vocab4950/demo.py b/egs_modelscope/asr/mfcca/speech_mfcca_asr-zh-cn-16k-alimeeting-vocab4950/demo.py
new file mode 100644
index 000000000..f6026d631
--- /dev/null
+++ b/egs_modelscope/asr/mfcca/speech_mfcca_asr-zh-cn-16k-alimeeting-vocab4950/demo.py
@@ -0,0 +1,11 @@
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+inference_pipeline = pipeline(
+ task=Tasks.auto_speech_recognition,
+ model='NPU-ASLP/speech_mfcca_asr-zh-cn-16k-alimeeting-vocab4950',
+ model_revision='v3.0.0'
+)
+
+rec_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav')
+print(rec_result)
\ No newline at end of file
diff --git a/egs_modelscope/asr/mfcca/speech_mfcca_asr-zh-cn-16k-alimeeting-vocab4950/infer_after_finetune.py b/egs_modelscope/asr/mfcca/speech_mfcca_asr-zh-cn-16k-alimeeting-vocab4950/infer_after_finetune.py
deleted file mode 100755
index 333b66a72..000000000
--- a/egs_modelscope/asr/mfcca/speech_mfcca_asr-zh-cn-16k-alimeeting-vocab4950/infer_after_finetune.py
+++ /dev/null
@@ -1,67 +0,0 @@
-import json
-import os
-import shutil
-
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-
-from funasr.utils.compute_wer import compute_wer
-
-
-def modelscope_infer_after_finetune(params):
- # prepare for decoding
- pretrained_model_path = os.path.join(os.environ["HOME"], ".cache/modelscope/hub", params["modelscope_model_name"])
- for file_name in params["required_files"]:
- if file_name == "configuration.json":
- with open(os.path.join(pretrained_model_path, file_name)) as f:
- config_dict = json.load(f)
- config_dict["model"]["am_model_name"] = params["decoding_model_name"]
- with open(os.path.join(params["output_dir"], "configuration.json"), "w") as f:
- json.dump(config_dict, f, indent=4, separators=(',', ': '))
- else:
- shutil.copy(os.path.join(pretrained_model_path, file_name),
- os.path.join(params["output_dir"], file_name))
- decoding_path = os.path.join(params["output_dir"], "decode_results")
- if os.path.exists(decoding_path):
- shutil.rmtree(decoding_path)
- os.mkdir(decoding_path)
-
- # decoding
- inference_pipeline = pipeline(
- task=Tasks.auto_speech_recognition,
- model=params["output_dir"],
- output_dir=decoding_path,
- batch_size=1
- )
- audio_in = os.path.join(params["data_dir"], "wav.scp")
- inference_pipeline(audio_in=audio_in)
-
- # computer CER if GT text is set
- text_in = os.path.join(params["data_dir"], "text")
- if text_in is not None:
- text_proc_file = os.path.join(decoding_path, "1best_recog/token")
- text_proc_file2 = os.path.join(decoding_path, "1best_recog/token_nosep")
- with open(text_proc_file, 'r') as hyp_reader:
- with open(text_proc_file2, 'w') as hyp_writer:
- for line in hyp_reader:
- new_context = line.strip().replace("src","").replace(" "," ").replace(" "," ").strip()
- hyp_writer.write(new_context+'\n')
- text_in2 = os.path.join(decoding_path, "1best_recog/ref_text_nosep")
- with open(text_in, 'r') as ref_reader:
- with open(text_in2, 'w') as ref_writer:
- for line in ref_reader:
- new_context = line.strip().replace("src","").replace(" "," ").replace(" "," ").strip()
- ref_writer.write(new_context+'\n')
-
-
- compute_wer(text_in, text_proc_file, os.path.join(decoding_path, "text.sp.cer"))
- compute_wer(text_in2, text_proc_file2, os.path.join(decoding_path, "text.nosp.cer"))
-
-if __name__ == '__main__':
- params = {}
- params["modelscope_model_name"] = "NPU-ASLP/speech_mfcca_asr-zh-cn-16k-alimeeting-vocab4950"
- params["required_files"] = ["feats_stats.npz", "decoding.yaml", "configuration.json"]
- params["output_dir"] = "./checkpoint"
- params["data_dir"] = "./example_data/validation"
- params["decoding_model_name"] = "valid.acc.ave.pb"
- modelscope_infer_after_finetune(params)
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/README.md b/egs_modelscope/asr/paraformer/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/README.md
deleted file mode 100644
index 49c0aeb5d..000000000
--- a/egs_modelscope/asr/paraformer/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/README.md
+++ /dev/null
@@ -1,19 +0,0 @@
-# ModelScope Model
-
-## How to infer using a pretrained Paraformer-large Model
-
-### Inference
-
-You can use the pretrain model for inference directly.
-
-- Setting parameters in `infer.py`
- - audio_in: # Support wav, url, bytes, and parsed audio format.
- - output_dir: # If the input format is wav.scp, it needs to be set.
- - batch_size: # Set batch size in inference.
- - param_dict: # Set the hotword list in inference.
-
-- Then you can run the pipeline to infer with:
-```python
- python infer.py
-```
-
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/README.md b/egs_modelscope/asr/paraformer/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/README.md
new file mode 120000
index 000000000..92088a21d
--- /dev/null
+++ b/egs_modelscope/asr/paraformer/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/README.md
@@ -0,0 +1 @@
+../TEMPLATE/README.md
\ No newline at end of file
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/infer.sh b/egs_modelscope/asr/paraformer/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/infer.sh
deleted file mode 100644
index e60f6d973..000000000
--- a/egs_modelscope/asr/paraformer/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/infer.sh
+++ /dev/null
@@ -1,105 +0,0 @@
-#!/usr/bin/env bash
-
-set -e
-set -u
-set -o pipefail
-
-stage=1
-stop_stage=2
-model="damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404"
-data_dir="./data/test"
-output_dir="./results"
-batch_size=64
-gpu_inference=true # whether to perform gpu decoding
-gpuid_list="0,1" # set gpus, e.g., gpuid_list="0,1"
-njob=64 # the number of jobs for CPU decoding, if gpu_inference=false, use CPU decoding, please set njob
-checkpoint_dir=
-checkpoint_name="valid.cer_ctc.ave.pb"
-hotword_txt=None
-
-. utils/parse_options.sh || exit 1;
-
-if ${gpu_inference} == "true"; then
- nj=$(echo $gpuid_list | awk -F "," '{print NF}')
-else
- nj=$njob
- batch_size=1
- gpuid_list=""
- for JOB in $(seq ${nj}); do
- gpuid_list=$gpuid_list"-1,"
- done
-fi
-
-mkdir -p $output_dir/split
-split_scps=""
-for JOB in $(seq ${nj}); do
- split_scps="$split_scps $output_dir/split/wav.$JOB.scp"
-done
-perl utils/split_scp.pl ${data_dir}/wav.scp ${split_scps}
-
-if [ -n "${checkpoint_dir}" ]; then
- python utils/prepare_checkpoint.py ${model} ${checkpoint_dir} ${checkpoint_name}
- model=${checkpoint_dir}/${model}
-fi
-
-if [ $stage -le 1 ] && [ $stop_stage -ge 1 ];then
- echo "Decoding ..."
- gpuid_list_array=(${gpuid_list//,/ })
- for JOB in $(seq ${nj}); do
- {
- id=$((JOB-1))
- gpuid=${gpuid_list_array[$id]}
- mkdir -p ${output_dir}/output.$JOB
- python infer.py \
- --model ${model} \
- --audio_in ${output_dir}/split/wav.$JOB.scp \
- --output_dir ${output_dir}/output.$JOB \
- --batch_size ${batch_size} \
- --gpuid ${gpuid} \
- --hotword_txt ${hotword_txt}
- }&
- done
- wait
-
- mkdir -p ${output_dir}/1best_recog
- for f in token score text; do
- if [ -f "${output_dir}/output.1/1best_recog/${f}" ]; then
- for i in $(seq "${nj}"); do
- cat "${output_dir}/output.${i}/1best_recog/${f}"
- done | sort -k1 >"${output_dir}/1best_recog/${f}"
- fi
- done
-fi
-
-if [ $stage -le 2 ] && [ $stop_stage -ge 2 ];then
- echo "Computing WER ..."
- cp ${output_dir}/1best_recog/text ${output_dir}/1best_recog/text.proc
- cp ${data_dir}/text ${output_dir}/1best_recog/text.ref
- python utils/compute_wer.py ${output_dir}/1best_recog/text.ref ${output_dir}/1best_recog/text.proc ${output_dir}/1best_recog/text.cer
- tail -n 3 ${output_dir}/1best_recog/text.cer
-fi
-
-if [ $stage -le 3 ] && [ $stop_stage -ge 3 ];then
- echo "SpeechIO TIOBE textnorm"
- echo "$0 --> Normalizing REF text ..."
- ./utils/textnorm_zh.py \
- --has_key --to_upper \
- ${data_dir}/text \
- ${output_dir}/1best_recog/ref.txt
-
- echo "$0 --> Normalizing HYP text ..."
- ./utils/textnorm_zh.py \
- --has_key --to_upper \
- ${output_dir}/1best_recog/text.proc \
- ${output_dir}/1best_recog/rec.txt
- grep -v $'\t$' ${output_dir}/1best_recog/rec.txt > ${output_dir}/1best_recog/rec_non_empty.txt
-
- echo "$0 --> computing WER/CER and alignment ..."
- ./utils/error_rate_zh \
- --tokenizer char \
- --ref ${output_dir}/1best_recog/ref.txt \
- --hyp ${output_dir}/1best_recog/rec_non_empty.txt \
- ${output_dir}/1best_recog/DETAILS.txt | tee ${output_dir}/1best_recog/RESULTS.txt
- rm -rf ${output_dir}/1best_recog/rec.txt ${output_dir}/1best_recog/rec_non_empty.txt
-fi
-
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/infer.sh b/egs_modelscope/asr/paraformer/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/infer.sh
new file mode 120000
index 000000000..0b3b38b6f
--- /dev/null
+++ b/egs_modelscope/asr/paraformer/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/infer.sh
@@ -0,0 +1 @@
+../TEMPLATE/infer.sh
\ No newline at end of file
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/infer_aishell1_subtest_demo.py b/egs_modelscope/asr/paraformer/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/infer_aishell1_subtest_demo.py
index c3e18b43e..97e9fce67 100644
--- a/egs_modelscope/asr/paraformer/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/infer_aishell1_subtest_demo.py
+++ b/egs_modelscope/asr/paraformer/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/infer_aishell1_subtest_demo.py
@@ -1,4 +1,3 @@
-from itertools import count
import os
import tempfile
import codecs
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/infer.py b/egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/infer.py
new file mode 100644
index 000000000..4fd4cdf9c
--- /dev/null
+++ b/egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/infer.py
@@ -0,0 +1,39 @@
+import os
+import logging
+import torch
+import soundfile
+
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+from modelscope.utils.logger import get_logger
+
+logger = get_logger(log_level=logging.CRITICAL)
+logger.setLevel(logging.CRITICAL)
+
+os.environ["MODELSCOPE_CACHE"] = "./"
+inference_pipeline = pipeline(
+ task=Tasks.auto_speech_recognition,
+ model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online',
+ model_revision='v1.0.4'
+)
+
+model_dir = os.path.join(os.environ["MODELSCOPE_CACHE"], "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online")
+speech, sample_rate = soundfile.read(os.path.join(model_dir, "example/asr_example.wav"))
+speech_length = speech.shape[0]
+
+sample_offset = 0
+chunk_size = [5, 10, 5]  # [5, 10, 5] = 600 ms chunks, [8, 8, 4] = 480 ms chunks
+stride_size = chunk_size[1] * 960
+param_dict = {"cache": dict(), "is_final": False, "chunk_size": chunk_size}
+final_result = ""
+
+for sample_offset in range(0, speech_length, min(stride_size, speech_length - sample_offset)):
+ if sample_offset + stride_size >= speech_length - 1:
+ stride_size = speech_length - sample_offset
+ param_dict["is_final"] = True
+ rec_result = inference_pipeline(audio_in=speech[sample_offset: sample_offset + stride_size],
+ param_dict=param_dict)
+ if len(rec_result) != 0:
+ final_result += rec_result['text'][0]
+ print(rec_result)
+print(final_result)
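One subtlety in the loop above: `range` fixes its step before the first iteration, so shrinking `stride_size` inside the body only affects the final slice and the `is_final` flag, not the iteration step. A dependency-free sketch of the same offset arithmetic — dummy lengths, no ModelScope calls, and the `stream_offsets` name is ours:

```python
def stream_offsets(speech_length, stride_size):
    """Yield (offset, size, is_final) the way the streaming loop slices audio."""
    step = stride_size  # range() captures the step once, before iteration starts
    out = []
    for offset in range(0, speech_length, step):
        size = stride_size
        is_final = False
        if offset + stride_size >= speech_length - 1:
            size = speech_length - offset  # last chunk takes whatever remains
            is_final = True
        out.append((offset, size, is_final))
    return out

# 1.0 s of 16 kHz audio with 600 ms (9600-sample) chunks:
print(stream_offsets(16000, 9600))  # -> [(0, 9600, False), (9600, 6400, True)]
```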
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/README.md b/egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/README.md
deleted file mode 100644
index c740f7187..000000000
--- a/egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/README.md
+++ /dev/null
@@ -1,76 +0,0 @@
-# ModelScope Model
-
-## How to finetune and infer using a pretrained Paraformer-large Model
-
-### Finetune
-
-- Modify finetune training related parameters in `finetune.py`
- - output_dir: # result dir
- - data_dir: # the dataset dir needs to include files: `train/wav.scp`, `train/text`; `validation/wav.scp`, `validation/text`
- - dataset_type: # for dataset larger than 1000 hours, set as `large`, otherwise set as `small`
- - batch_bins: # batch size. For dataset_type is `small`, `batch_bins` indicates the feature frames. For dataset_type is `large`, `batch_bins` indicates the duration in ms
- - max_epoch: # number of training epoch
- - lr: # learning rate
-
-- Then you can run the pipeline to finetune with:
-```python
- python finetune.py
-```
-
-### Inference
-
-Or you can use the finetuned model for inference directly.
-
-- Setting parameters in `infer.sh`
- - model: # model name on ModelScope
- - data_dir: # the dataset dir needs to include `${data_dir}/wav.scp`. If `${data_dir}/text` is also exists, CER will be computed
- - output_dir: # result dir
- - batch_size: # batchsize of inference
- - gpu_inference: # whether to perform gpu decoding, set false for cpu decoding
- - gpuid_list: # set gpus, e.g., gpuid_list="0,1"
- - njob: # the number of jobs for CPU decoding, if `gpu_inference`=false, use CPU decoding, please set `njob`
-
-- Decode with multi GPUs:
-```shell
- bash infer.sh \
- --model "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" \
- --data_dir "./data/test" \
- --output_dir "./results" \
- --batch_size 64 \
- --gpu_inference true \
- --gpuid_list "0,1"
-```
-
-- Decode with multi-thread CPUs:
-```shell
- bash infer.sh \
- --model "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" \
- --data_dir "./data/test" \
- --output_dir "./results" \
- --gpu_inference false \
- --njob 64
-```
-
-- Results
-
-The decoding results can be found in `${output_dir}/1best_recog/text.cer`, which includes recognition results of each sample and the CER metric of the whole test set.
-
-If you decode the SpeechIO test sets, you can use textnorm with `stage`=3, and `DETAILS.txt`, `RESULTS.txt` record the results and CER after text normalization.
-
-### Inference using local finetuned model
-
-- Modify inference related parameters in `infer_after_finetune.py`
- - modelscope_model_name: # model name on ModelScope
- - output_dir: # result dir
- - data_dir: # the dataset dir needs to include `test/wav.scp`. If `test/text` is also exists, CER will be computed
- - decoding_model_name: # set the checkpoint name for decoding, e.g., `valid.cer_ctc.ave.pb`
- - batch_size: # batchsize of inference
-
-- Then you can run the pipeline to finetune with:
-```python
- python infer_after_finetune.py
-```
-
-- Results
-
-The decoding results can be found in `$output_dir/decoding_results/text.cer`, which includes recognition results of each sample and the CER metric of the whole test set.
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/README.md b/egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/README.md
new file mode 120000
index 000000000..92088a21d
--- /dev/null
+++ b/egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/README.md
@@ -0,0 +1 @@
+../TEMPLATE/README.md
\ No newline at end of file
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/infer.sh b/egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/infer.sh
deleted file mode 100644
index ef49d7a60..000000000
--- a/egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/infer.sh
+++ /dev/null
@@ -1,103 +0,0 @@
-#!/usr/bin/env bash
-
-set -e
-set -u
-set -o pipefail
-
-stage=1
-stop_stage=2
-model="damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
-data_dir="./data/test"
-output_dir="./results"
-batch_size=64
-gpu_inference=true # whether to perform gpu decoding
-gpuid_list="0,1" # set gpus, e.g., gpuid_list="0,1"
-njob=64 # the number of jobs for CPU decoding, if gpu_inference=false, use CPU decoding, please set njob
-checkpoint_dir=
-checkpoint_name="valid.cer_ctc.ave.pb"
-
-. utils/parse_options.sh || exit 1;
-
-if ${gpu_inference} == "true"; then
- nj=$(echo $gpuid_list | awk -F "," '{print NF}')
-else
- nj=$njob
- batch_size=1
- gpuid_list=""
- for JOB in $(seq ${nj}); do
- gpuid_list=$gpuid_list"-1,"
- done
-fi
-
-mkdir -p $output_dir/split
-split_scps=""
-for JOB in $(seq ${nj}); do
- split_scps="$split_scps $output_dir/split/wav.$JOB.scp"
-done
-perl utils/split_scp.pl ${data_dir}/wav.scp ${split_scps}
-
-if [ -n "${checkpoint_dir}" ]; then
- python utils/prepare_checkpoint.py ${model} ${checkpoint_dir} ${checkpoint_name}
- model=${checkpoint_dir}/${model}
-fi
-
-if [ $stage -le 1 ] && [ $stop_stage -ge 1 ];then
- echo "Decoding ..."
- gpuid_list_array=(${gpuid_list//,/ })
- for JOB in $(seq ${nj}); do
- {
- id=$((JOB-1))
- gpuid=${gpuid_list_array[$id]}
- mkdir -p ${output_dir}/output.$JOB
- python infer.py \
- --model ${model} \
- --audio_in ${output_dir}/split/wav.$JOB.scp \
- --output_dir ${output_dir}/output.$JOB \
- --batch_size ${batch_size} \
- --gpuid ${gpuid}
- }&
- done
- wait
-
- mkdir -p ${output_dir}/1best_recog
- for f in token score text; do
- if [ -f "${output_dir}/output.1/1best_recog/${f}" ]; then
- for i in $(seq "${nj}"); do
- cat "${output_dir}/output.${i}/1best_recog/${f}"
- done | sort -k1 >"${output_dir}/1best_recog/${f}"
- fi
- done
-fi
-
-if [ $stage -le 2 ] && [ $stop_stage -ge 2 ];then
- echo "Computing WER ..."
- cp ${output_dir}/1best_recog/text ${output_dir}/1best_recog/text.proc
- cp ${data_dir}/text ${output_dir}/1best_recog/text.ref
- python utils/compute_wer.py ${output_dir}/1best_recog/text.ref ${output_dir}/1best_recog/text.proc ${output_dir}/1best_recog/text.cer
- tail -n 3 ${output_dir}/1best_recog/text.cer
-fi
-
-if [ $stage -le 3 ] && [ $stop_stage -ge 3 ];then
- echo "SpeechIO TIOBE textnorm"
- echo "$0 --> Normalizing REF text ..."
- ./utils/textnorm_zh.py \
- --has_key --to_upper \
- ${data_dir}/text \
- ${output_dir}/1best_recog/ref.txt
-
- echo "$0 --> Normalizing HYP text ..."
- ./utils/textnorm_zh.py \
- --has_key --to_upper \
- ${output_dir}/1best_recog/text.proc \
- ${output_dir}/1best_recog/rec.txt
- grep -v $'\t$' ${output_dir}/1best_recog/rec.txt > ${output_dir}/1best_recog/rec_non_empty.txt
-
- echo "$0 --> computing WER/CER and alignment ..."
- ./utils/error_rate_zh \
- --tokenizer char \
- --ref ${output_dir}/1best_recog/ref.txt \
- --hyp ${output_dir}/1best_recog/rec_non_empty.txt \
- ${output_dir}/1best_recog/DETAILS.txt | tee ${output_dir}/1best_recog/RESULTS.txt
- rm -rf ${output_dir}/1best_recog/rec.txt ${output_dir}/1best_recog/rec_non_empty.txt
-fi
-
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/infer.sh b/egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/infer.sh
new file mode 120000
index 000000000..0b3b38b6f
--- /dev/null
+++ b/egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/infer.sh
@@ -0,0 +1 @@
+../TEMPLATE/infer.sh
\ No newline at end of file
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/infer_after_finetune.py b/egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/infer_after_finetune.py
deleted file mode 100644
index 2d311ddc6..000000000
--- a/egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/infer_after_finetune.py
+++ /dev/null
@@ -1,48 +0,0 @@
-import json
-import os
-import shutil
-
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-from modelscope.hub.snapshot_download import snapshot_download
-
-from funasr.utils.compute_wer import compute_wer
-
-def modelscope_infer_after_finetune(params):
- # prepare for decoding
-
- try:
- pretrained_model_path = snapshot_download(params["modelscope_model_name"], cache_dir=params["output_dir"])
- except BaseException:
- raise BaseException(f"Please download pretrain model from ModelScope firstly.")
- shutil.copy(os.path.join(params["output_dir"], params["decoding_model_name"]), os.path.join(pretrained_model_path, "model.pb"))
- decoding_path = os.path.join(params["output_dir"], "decode_results")
- if os.path.exists(decoding_path):
- shutil.rmtree(decoding_path)
- os.mkdir(decoding_path)
-
- # decoding
- inference_pipeline = pipeline(
- task=Tasks.auto_speech_recognition,
- model=pretrained_model_path,
- output_dir=decoding_path,
- batch_size=params["batch_size"]
- )
- audio_in = os.path.join(params["data_dir"], "wav.scp")
- inference_pipeline(audio_in=audio_in)
-
- # computer CER if GT text is set
- text_in = os.path.join(params["data_dir"], "text")
- if os.path.exists(text_in):
- text_proc_file = os.path.join(decoding_path, "1best_recog/text")
- compute_wer(text_in, text_proc_file, os.path.join(decoding_path, "text.cer"))
-
-
-if __name__ == '__main__':
- params = {}
- params["modelscope_model_name"] = "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
- params["output_dir"] = "./checkpoint"
- params["data_dir"] = "./data/test"
- params["decoding_model_name"] = "valid.acc.ave_10best.pb"
- params["batch_size"] = 64
- modelscope_infer_after_finetune(params)
\ No newline at end of file
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer-tiny-commandword_asr_nat-zh-cn-16k-vocab544-pytorch/infer.py b/egs_modelscope/asr/paraformer/speech_paraformer-tiny-commandword_asr_nat-zh-cn-16k-vocab544-pytorch/infer.py
index d1fbca22d..00be7935f 100644
--- a/egs_modelscope/asr/paraformer/speech_paraformer-tiny-commandword_asr_nat-zh-cn-16k-vocab544-pytorch/infer.py
+++ b/egs_modelscope/asr/paraformer/speech_paraformer-tiny-commandword_asr_nat-zh-cn-16k-vocab544-pytorch/infer.py
@@ -16,14 +16,14 @@ def modelscope_infer_core(output_dir, split_dir, njob, idx):
os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_list[gpu_id])
else:
os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_paraformer-tiny-commandword_asr_nat-zh-cn-16k-vocab544-pytorch",
output_dir=output_dir_job,
batch_size=64
)
audio_in = os.path.join(split_dir, "wav.{}.scp".format(idx))
- inference_pipline(audio_in=audio_in)
+ inference_pipeline(audio_in=audio_in)
def modelscope_infer(params):
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/README.md b/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/README.md
deleted file mode 100644
index c68a8cd4f..000000000
--- a/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/README.md
+++ /dev/null
@@ -1,30 +0,0 @@
-# ModelScope Model
-
-## How to finetune and infer using a pretrained Paraformer-large Model
-
-### Finetune
-
-- Modify finetune training related parameters in `finetune.py`
- - output_dir: # result dir
- - data_dir: # the dataset dir needs to include files: train/wav.scp, train/text; validation/wav.scp, validation/text.
- - batch_bins: # batch size
- - max_epoch: # number of training epoch
- - lr: # learning rate
-
-- Then you can run the pipeline to finetune with:
-```python
- python finetune.py
-```
-
-### Inference
-
-Or you can use the finetuned model for inference directly.
-
-- Setting parameters in `infer.py`
- - audio_in: # support wav, url, bytes, and parsed audio format.
- - output_dir: # If the input format is wav.scp, it needs to be set.
-
-- Then you can run the pipeline to infer with:
-```python
- python infer.py
-```
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/README.md b/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/README.md
new file mode 120000
index 000000000..92088a21d
--- /dev/null
+++ b/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/README.md
@@ -0,0 +1 @@
+../TEMPLATE/README.md
\ No newline at end of file
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/demo.py b/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/demo.py
new file mode 100644
index 000000000..2863c1ada
--- /dev/null
+++ b/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/demo.py
@@ -0,0 +1,15 @@
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+if __name__ == '__main__':
+ audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav'
+ output_dir = None
+ inference_pipeline = pipeline(
+ task=Tasks.auto_speech_recognition,
+ model="damo/speech_paraformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch",
+ output_dir=output_dir,
+ batch_size=1,
+ )
+ rec_result = inference_pipeline(audio_in=audio_in)
+ print(rec_result)
+
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/infer.py b/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/infer.py
deleted file mode 100644
index 8a6c87bbc..000000000
--- a/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/infer.py
+++ /dev/null
@@ -1,15 +0,0 @@
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-
-if __name__ == '__main__':
- audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav'
- output_dir = None
- inference_pipline = pipeline(
- task=Tasks.auto_speech_recognition,
- model="damo/speech_paraformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch",
- output_dir=output_dir,
- batch_size=32,
- )
- rec_result = inference_pipline(audio_in=audio_in)
- print(rec_result)
-
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/infer.py b/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/infer.py
new file mode 120000
index 000000000..f05fbbb8b
--- /dev/null
+++ b/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/infer.py
@@ -0,0 +1 @@
+../TEMPLATE/infer.py
\ No newline at end of file
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/infer.sh b/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/infer.sh
new file mode 120000
index 000000000..0b3b38b6f
--- /dev/null
+++ b/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/infer.sh
@@ -0,0 +1 @@
+../TEMPLATE/infer.sh
\ No newline at end of file
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/README.md b/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/README.md
new file mode 120000
index 000000000..92088a21d
--- /dev/null
+++ b/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/README.md
@@ -0,0 +1 @@
+../TEMPLATE/README.md
\ No newline at end of file
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/demo.py b/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/demo.py
new file mode 100644
index 000000000..f2db74e8b
--- /dev/null
+++ b/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/demo.py
@@ -0,0 +1,13 @@
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+if __name__ == "__main__":
+ audio_in = "https://modelscope.oss-cn-beijing.aliyuncs.com/test/audios/asr_example.wav"
+ output_dir = "./results"
+ inference_pipeline = pipeline(
+ task=Tasks.auto_speech_recognition,
+ model="damo/speech_paraformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch",
+ output_dir=output_dir,
+ )
+ rec_result = inference_pipeline(audio_in=audio_in)
+ print(rec_result)
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/infer.py b/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/infer.py
deleted file mode 100644
index dec7de041..000000000
--- a/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/infer.py
+++ /dev/null
@@ -1,13 +0,0 @@
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-
-if __name__ == "__main__":
- audio_in = "https://modelscope.oss-cn-beijing.aliyuncs.com/test/audios/asr_example.wav"
- output_dir = "./results"
- inference_pipline = pipeline(
- task=Tasks.auto_speech_recognition,
- model="damo/speech_paraformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch",
- output_dir=output_dir,
- )
- rec_result = inference_pipline(audio_in=audio_in)
- print(rec_result)
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/infer.py b/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/infer.py
new file mode 120000
index 000000000..f05fbbb8b
--- /dev/null
+++ b/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/infer.py
@@ -0,0 +1 @@
+../TEMPLATE/infer.py
\ No newline at end of file
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/infer.sh b/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/infer.sh
new file mode 120000
index 000000000..0b3b38b6f
--- /dev/null
+++ b/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/infer.sh
@@ -0,0 +1 @@
+../TEMPLATE/infer.sh
\ No newline at end of file
diff --git a/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/infer.py b/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/infer.py
index 2eb9cc8bf..0066c7b6f 100644
--- a/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/infer.py
+++ b/egs_modelscope/asr/paraformer/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/infer.py
@@ -14,24 +14,26 @@ os.environ["MODELSCOPE_CACHE"] = "./"
inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model='damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online',
- model_revision='v1.0.2')
+ model_revision='v1.0.4'
+)
model_dir = os.path.join(os.environ["MODELSCOPE_CACHE"], "damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online")
speech, sample_rate = soundfile.read(os.path.join(model_dir, "example/asr_example.wav"))
speech_length = speech.shape[0]
sample_offset = 0
-step = 4800 #300ms
-param_dict = {"cache": dict(), "is_final": False}
+chunk_size = [8, 8, 4]  # chunk latency: [8, 8, 4] -> 480ms, [5, 10, 5] -> 600ms
+stride_size = chunk_size[1] * 960
+param_dict = {"cache": dict(), "is_final": False, "chunk_size": chunk_size}
final_result = ""
-for sample_offset in range(0, speech_length, min(step, speech_length - sample_offset)):
- if sample_offset + step >= speech_length - 1:
- step = speech_length - sample_offset
+for sample_offset in range(0, speech_length, min(stride_size, speech_length - sample_offset)):
+ if sample_offset + stride_size >= speech_length - 1:
+ stride_size = speech_length - sample_offset
param_dict["is_final"] = True
- rec_result = inference_pipeline(audio_in=speech[sample_offset: sample_offset + step],
+ rec_result = inference_pipeline(audio_in=speech[sample_offset: sample_offset + stride_size],
param_dict=param_dict)
- if len(rec_result) != 0 and rec_result['text'] != "sil" and rec_result['text'] != "waiting_for_more_voice":
- final_result += rec_result['text']
- print(rec_result)
+ if len(rec_result) != 0:
+ final_result += rec_result['text'][0]
+ print(rec_result)
print(final_result)
diff --git a/egs_modelscope/asr/paraformerbert/speech_paraformerbert_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/infer.py b/egs_modelscope/asr/paraformerbert/speech_paraformerbert_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/infer.py
index df1890311..f4c4fc2fb 100644
--- a/egs_modelscope/asr/paraformerbert/speech_paraformerbert_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/infer.py
+++ b/egs_modelscope/asr/paraformerbert/speech_paraformerbert_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch/infer.py
@@ -4,11 +4,11 @@ from modelscope.utils.constant import Tasks
if __name__ == '__main__':
audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav'
output_dir = None
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_paraformerbert_asr_nat-zh-cn-16k-aishell1-vocab4234-pytorch",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in)
+ rec_result = inference_pipeline(audio_in=audio_in)
print(rec_result)
diff --git a/egs_modelscope/asr/paraformerbert/speech_paraformerbert_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/infer.py b/egs_modelscope/asr/paraformerbert/speech_paraformerbert_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/infer.py
index 83d680583..63bed40a0 100644
--- a/egs_modelscope/asr/paraformerbert/speech_paraformerbert_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/infer.py
+++ b/egs_modelscope/asr/paraformerbert/speech_paraformerbert_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://modelscope.oss-cn-beijing.aliyuncs.com/test/audios/asr_example.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_paraformerbert_asr_nat-zh-cn-16k-aishell2-vocab5212-pytorch",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in)
+ rec_result = inference_pipeline(audio_in=audio_in)
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-offline/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-offline/infer.py
index c15114934..862f88198 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-offline/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-offline/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_cantonese-CHS.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online/infer.py
index ac73adf72..d4f8d762f 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_cantonese-CHS.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-cn-dialect-16k-vocab8358-tensorflow1-offline/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-cn-dialect-16k-vocab8358-tensorflow1-offline/infer.py
index 227f4bf28..347d31694 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-cn-dialect-16k-vocab8358-tensorflow1-offline/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-cn-dialect-16k-vocab8358-tensorflow1-offline/infer.py
@@ -4,11 +4,11 @@ from modelscope.utils.constant import Tasks
if __name__ == '__main__':
audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav'
output_dir = None
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-cn-dialect-16k-vocab8358-tensorflow1-offline",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in)
+ rec_result = inference_pipeline(audio_in=audio_in)
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-cn-dialect-16k-vocab8358-tensorflow1-online/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-cn-dialect-16k-vocab8358-tensorflow1-online/infer.py
index 74d97643b..936d6d7ba 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-cn-dialect-16k-vocab8358-tensorflow1-online/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-cn-dialect-16k-vocab8358-tensorflow1-online/infer.py
@@ -4,11 +4,11 @@ from modelscope.utils.constant import Tasks
if __name__ == '__main__':
audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav'
output_dir = None
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-cn-dialect-16k-vocab8358-tensorflow1-online",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in)
+ rec_result = inference_pipeline(audio_in=audio_in)
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-de-16k-common-vocab3690-tensorflow1-offline/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-de-16k-common-vocab3690-tensorflow1-offline/infer.py
index 5ace7e4cf..f82c1f4c4 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-de-16k-common-vocab3690-tensorflow1-offline/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-de-16k-common-vocab3690-tensorflow1-offline/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_de.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-de-16k-common-vocab3690-tensorflow1-offline",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-de-16k-common-vocab3690-tensorflow1-online/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-de-16k-common-vocab3690-tensorflow1-online/infer.py
index f8d91b833..48b48071e 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-de-16k-common-vocab3690-tensorflow1-online/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-de-16k-common-vocab3690-tensorflow1-online/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_de.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-de-16k-common-vocab3690-tensorflow1-online",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-en-16k-common-vocab1080-tensorflow1-offline/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-en-16k-common-vocab1080-tensorflow1-offline/infer.py
index 49b884b2f..98f31b602 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-en-16k-common-vocab1080-tensorflow1-offline/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-en-16k-common-vocab1080-tensorflow1-offline/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_en.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-en-16k-common-vocab1080-tensorflow1-offline",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-en-16k-common-vocab1080-tensorflow1-online/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-en-16k-common-vocab1080-tensorflow1-online/infer.py
index 57a3afdf9..423c503ed 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-en-16k-common-vocab1080-tensorflow1-online/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-en-16k-common-vocab1080-tensorflow1-online/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_en.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-en-16k-common-vocab1080-tensorflow1-online",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-es-16k-common-vocab3445-tensorflow1-offline/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-es-16k-common-vocab3445-tensorflow1-offline/infer.py
index 510f00828..75e22a0e9 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-es-16k-common-vocab3445-tensorflow1-offline/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-es-16k-common-vocab3445-tensorflow1-offline/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_es.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-es-16k-common-vocab3445-tensorflow1-offline",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-es-16k-common-vocab3445-tensorflow1-online/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-es-16k-common-vocab3445-tensorflow1-online/infer.py
index 2ec59402c..cb1b4fa99 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-es-16k-common-vocab3445-tensorflow1-online/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-es-16k-common-vocab3445-tensorflow1-online/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_es.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-es-16k-common-vocab3445-tensorflow1-online",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-fa-16k-common-vocab1257-pytorch-offline/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-fa-16k-common-vocab1257-pytorch-offline/infer.py
index 040265d22..e6c39c2b8 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-fa-16k-common-vocab1257-pytorch-offline/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-fa-16k-common-vocab1257-pytorch-offline/infer.py
@@ -16,14 +16,14 @@ def modelscope_infer_core(output_dir, split_dir, njob, idx):
os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_list[gpu_id])
else:
os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-fa-16k-common-vocab1257-pytorch-offline",
output_dir=output_dir_job,
batch_size=1
)
audio_in = os.path.join(split_dir, "wav.{}.scp".format(idx))
- inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
+ inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
def modelscope_infer(params):
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-fa-16k-common-vocab1257-pytorch-online/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-fa-16k-common-vocab1257-pytorch-online/infer.py
index 055e4ebdb..124d5ed05 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-fa-16k-common-vocab1257-pytorch-online/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-fa-16k-common-vocab1257-pytorch-online/infer.py
@@ -16,14 +16,14 @@ def modelscope_infer_core(output_dir, split_dir, njob, idx):
os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_list[gpu_id])
else:
os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-fa-16k-common-vocab1257-pytorch-online",
output_dir=output_dir_job,
batch_size=1
)
audio_in = os.path.join(split_dir, "wav.{}.scp".format(idx))
- inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
+ inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
def modelscope_infer(params):
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-fr-16k-common-vocab3472-tensorflow1-offline/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-fr-16k-common-vocab3472-tensorflow1-offline/infer.py
index 6aedeeaa8..627d132fc 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-fr-16k-common-vocab3472-tensorflow1-offline/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-fr-16k-common-vocab3472-tensorflow1-offline/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_fr.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-fr-16k-common-vocab3472-tensorflow1-offline",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-fr-16k-common-vocab3472-tensorflow1-online/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-fr-16k-common-vocab3472-tensorflow1-online/infer.py
index 2f3e8330c..305d990c8 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-fr-16k-common-vocab3472-tensorflow1-online/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-fr-16k-common-vocab3472-tensorflow1-online/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_fr.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-fr-16k-common-vocab3472-tensorflow1-online",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-he-16k-common-vocab1085-pytorch/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-he-16k-common-vocab1085-pytorch/infer.py
index c54ab8c83..e0d1a4d35 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-he-16k-common-vocab1085-pytorch/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-he-16k-common-vocab1085-pytorch/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_he.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-he-16k-common-vocab1085-pytorch",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-id-16k-common-vocab1067-tensorflow1-offline/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-id-16k-common-vocab1067-tensorflow1-offline/infer.py
index 219c9ec42..e53c37e60 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-id-16k-common-vocab1067-tensorflow1-offline/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-id-16k-common-vocab1067-tensorflow1-offline/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_id.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-id-16k-common-vocab1067-tensorflow1-online",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-id-16k-common-vocab1067-tensorflow1-online/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-id-16k-common-vocab1067-tensorflow1-online/infer.py
index ad2671a3e..75ec783de 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-id-16k-common-vocab1067-tensorflow1-online/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-id-16k-common-vocab1067-tensorflow1-online/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_id.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-id-16k-common-vocab1067-tensorflow1-online",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ja-16k-common-vocab93-tensorflow1-offline/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ja-16k-common-vocab93-tensorflow1-offline/infer.py
index 1a174bbca..68cc41d54 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ja-16k-common-vocab93-tensorflow1-offline/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ja-16k-common-vocab93-tensorflow1-offline/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_ja.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-ja-16k-common-vocab93-tensorflow1-offline",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ja-16k-common-vocab93-tensorflow1-online/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ja-16k-common-vocab93-tensorflow1-online/infer.py
index f15bc2d2b..a741e18e7 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ja-16k-common-vocab93-tensorflow1-online/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ja-16k-common-vocab93-tensorflow1-online/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_ja.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-ja-16k-common-vocab93-tensorflow1-online",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ko-16k-common-vocab6400-tensorflow1-offline/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ko-16k-common-vocab6400-tensorflow1-offline/infer.py
index 618b3f601..b87bcbb84 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ko-16k-common-vocab6400-tensorflow1-offline/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ko-16k-common-vocab6400-tensorflow1-offline/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_ko.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-ko-16k-common-vocab6400-tensorflow1-offline",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ko-16k-common-vocab6400-tensorflow1-online/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ko-16k-common-vocab6400-tensorflow1-online/infer.py
index 135e8f8b9..9be791ceb 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ko-16k-common-vocab6400-tensorflow1-online/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ko-16k-common-vocab6400-tensorflow1-online/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_ko.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-ko-16k-common-vocab6400-tensorflow1-online",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-my-16k-common-vocab696-pytorch/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-my-16k-common-vocab696-pytorch/infer.py
index cfd869f04..b3a905859 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-my-16k-common-vocab696-pytorch/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-my-16k-common-vocab696-pytorch/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_my.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-my-16k-common-vocab696-pytorch",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-pt-16k-common-vocab1617-tensorflow1-offline/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-pt-16k-common-vocab1617-tensorflow1-offline/infer.py
index 2dcb6638a..4a43e7ce5 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-pt-16k-common-vocab1617-tensorflow1-offline/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-pt-16k-common-vocab1617-tensorflow1-offline/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_pt.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-pt-16k-common-vocab1617-tensorflow1-offline",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-pt-16k-common-vocab1617-tensorflow1-online/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-pt-16k-common-vocab1617-tensorflow1-online/infer.py
index aff2a9a51..7029fd9c8 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-pt-16k-common-vocab1617-tensorflow1-online/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-pt-16k-common-vocab1617-tensorflow1-online/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_pt.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-pt-16k-common-vocab1617-tensorflow1-online",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ru-16k-common-vocab1664-tensorflow1-offline/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ru-16k-common-vocab1664-tensorflow1-offline/infer.py
index 95f447d13..3c9d364e9 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ru-16k-common-vocab1664-tensorflow1-offline/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ru-16k-common-vocab1664-tensorflow1-offline/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_ru.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-ru-16k-common-vocab1664-tensorflow1-offline",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ru-16k-common-vocab1664-tensorflow1-online/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ru-16k-common-vocab1664-tensorflow1-online/infer.py
index 88c06b4c6..95da47935 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ru-16k-common-vocab1664-tensorflow1-online/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ru-16k-common-vocab1664-tensorflow1-online/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_ru.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-ru-16k-common-vocab1664-tensorflow1-online",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ur-16k-common-vocab877-pytorch/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ur-16k-common-vocab877-pytorch/infer.py
index e8c5524f0..04b02fe16 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ur-16k-common-vocab877-pytorch/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-ur-16k-common-vocab877-pytorch/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_ur.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-ur-16k-common-vocab877-pytorch",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-vi-16k-common-vocab1001-pytorch-offline/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-vi-16k-common-vocab1001-pytorch-offline/infer.py
index 9472104e5..4218f3d7a 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-vi-16k-common-vocab1001-pytorch-offline/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-vi-16k-common-vocab1001-pytorch-offline/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_vi.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-vi-16k-common-vocab1001-pytorch-offline",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"offline"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-vi-16k-common-vocab1001-pytorch-online/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-vi-16k-common-vocab1001-pytorch-online/infer.py
index 4a844fc82..355e412c4 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-vi-16k-common-vocab1001-pytorch-online/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-vi-16k-common-vocab1001-pytorch-online/infer.py
@@ -4,10 +4,10 @@ from modelscope.utils.constant import Tasks
if __name__ == "__main__":
audio_in = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_vi.wav"
output_dir = "./results"
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-vi-16k-common-vocab1001-pytorch-online",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
+ rec_result = inference_pipeline(audio_in=audio_in, param_dict={"decoding_model":"normal"})
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-offline/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-offline/infer.py
index 40686acca..35209896c 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-offline/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-offline/infer.py
@@ -4,11 +4,11 @@ from modelscope.utils.constant import Tasks
if __name__ == '__main__':
audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav'
output_dir = None
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-offline",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in)
+ rec_result = inference_pipeline(audio_in=audio_in)
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-online/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-online/infer.py
index dfe934d67..a3e2a002f 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-online/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-online/infer.py
@@ -4,11 +4,11 @@ from modelscope.utils.constant import Tasks
if __name__ == '__main__':
audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav'
output_dir = None
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-online",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in)
+ rec_result = inference_pipeline(audio_in=audio_in)
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-offline/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-offline/infer.py
index ce8988ef6..13d2a2e37 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-offline/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-offline/infer.py
@@ -16,14 +16,14 @@ def modelscope_infer_core(output_dir, split_dir, njob, idx):
os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_list[gpu_id])
else:
os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-offline",
output_dir=output_dir_job,
batch_size=1
)
audio_in = os.path.join(split_dir, "wav.{}.scp".format(idx))
- inference_pipline(audio_in=audio_in)
+ inference_pipeline(audio_in=audio_in)
def modelscope_infer(params):
# prepare for multi-GPU decoding
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-online/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-online/infer.py
index 8b4a04dd3..876d51cc9 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-online/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-online/infer.py
@@ -16,14 +16,14 @@ def modelscope_infer_core(output_dir, split_dir, njob, idx):
os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_list[gpu_id])
else:
os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-online",
output_dir=output_dir_job,
batch_size=1
)
audio_in = os.path.join(split_dir, "wav.{}.scp".format(idx))
- inference_pipline(audio_in=audio_in, param_dict={"decoding_model": "normal"})
+ inference_pipeline(audio_in=audio_in, param_dict={"decoding_model": "normal"})
def modelscope_infer(params):
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab8358-tensorflow1-offline/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab8358-tensorflow1-offline/infer.py
index 1c1e303f3..8ec42885d 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab8358-tensorflow1-offline/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab8358-tensorflow1-offline/infer.py
@@ -4,11 +4,11 @@ from modelscope.utils.constant import Tasks
if __name__ == '__main__':
audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav'
output_dir = None
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab8358-tensorflow1-offline",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in)
+ rec_result = inference_pipeline(audio_in=audio_in)
print(rec_result)
diff --git a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab8358-tensorflow1-online/infer.py b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab8358-tensorflow1-online/infer.py
index 94c1b6818..3ab16ea72 100644
--- a/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab8358-tensorflow1-online/infer.py
+++ b/egs_modelscope/asr/uniasr/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab8358-tensorflow1-online/infer.py
@@ -4,11 +4,11 @@ from modelscope.utils.constant import Tasks
if __name__ == '__main__':
audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav'
output_dir = None
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model="damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab8358-tensorflow1-online",
output_dir=output_dir,
)
- rec_result = inference_pipline(audio_in=audio_in)
+ rec_result = inference_pipeline(audio_in=audio_in)
print(rec_result)
diff --git a/egs_modelscope/asr_vad_punc/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/README.md b/egs_modelscope/asr_vad_punc/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/README.md
index 94144efa7..83c462d98 100644
--- a/egs_modelscope/asr_vad_punc/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/README.md
+++ b/egs_modelscope/asr_vad_punc/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/README.md
@@ -1,46 +1,246 @@
-# ModelScope Model
+# Speech Recognition
-## How to finetune and infer using a pretrained Paraformer-large Model
+> **Note**:
+> The ModelScope pipeline supports inference and finetuning for all the models in the [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope). Here we take typical models as examples to demonstrate the usage.
-### Finetune
+## Inference
-- Modify finetune training related parameters in `finetune.py`
- - output_dir: # result dir
- - data_dir: # the dataset dir needs to include files: train/wav.scp, train/text; validation/wav.scp, validation/text.
- - batch_bins: # batch size
- - max_epoch: # number of training epoch
- - lr: # learning rate
-
-- Then you can run the pipeline to finetune with:
+### Quick start
+#### [Paraformer Model](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)
```python
- python finetune.py
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+inference_pipeline = pipeline(
+ task=Tasks.auto_speech_recognition,
+ model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
+)
+
+rec_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav')
+print(rec_result)
+```
+#### [Paraformer-online Model](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/summary)
+```python
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+import soundfile
+
+inference_pipeline = pipeline(
+    task=Tasks.auto_speech_recognition,
+    model='damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online',
+)
+speech, sample_rate = soundfile.read("example/asr_example.wav")
+
+param_dict = {"cache": dict(), "is_final": False}
+chunk_stride = 7680  # 480 ms at a 16 kHz sample rate
+# first chunk, 480ms
+speech_chunk = speech[0:chunk_stride]
+rec_result = inference_pipeline(audio_in=speech_chunk, param_dict=param_dict)
+print(rec_result)
+# next chunk, 480ms
+speech_chunk = speech[chunk_stride:chunk_stride+chunk_stride]
+rec_result = inference_pipeline(audio_in=speech_chunk, param_dict=param_dict)
+print(rec_result)
+```
+For the full demo code, please refer to [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/241)
+
+#### [UniASR Model](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-online/summary)
+The UniASR model supports three decoding modes (`fast`, `normal`, `offline`); for more model details, please refer to [docs](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-online/summary)
+```python
+decoding_model = "fast"  # choose from "fast", "normal", "offline"
+inference_pipeline = pipeline(
+ task=Tasks.auto_speech_recognition,
+ model='damo/speech_UniASR_asr_2pass-minnan-16k-common-vocab3825',
+ param_dict={"decoding_model": decoding_model})
+
+rec_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav')
+print(rec_result)
+```
+The `fast` and `normal` decoding modes are fake streaming, which can be used to evaluate recognition accuracy.
+For the full demo code, please refer to [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/151)
+#### [RNN-T-online model]()
+To be released.
+
+#### [MFCCA Model](https://www.modelscope.cn/models/NPU-ASLP/speech_mfcca_asr-zh-cn-16k-alimeeting-vocab4950/summary)
+For more model details, please refer to [docs](https://www.modelscope.cn/models/NPU-ASLP/speech_mfcca_asr-zh-cn-16k-alimeeting-vocab4950/summary)
+```python
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+inference_pipeline = pipeline(
+ task=Tasks.auto_speech_recognition,
+ model='NPU-ASLP/speech_mfcca_asr-zh-cn-16k-alimeeting-vocab4950',
+ model_revision='v3.0.0'
+)
+
+rec_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav')
+print(rec_result)
```
-### Inference
+#### API-reference
+##### Define pipeline
+- `task`: `Tasks.auto_speech_recognition`
+- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path in local disk
+- `ngpu`: `1` (Default), decoding on GPU. If ngpu=0, decoding on CPU
+- `ncpu`: `1` (Default), sets the number of threads used for intraop parallelism on CPU
+- `output_dir`: `None` (Default), the output path of results if set
+- `batch_size`: `1` (Default), batch size when decoding
+##### Infer pipeline
+- `audio_in`: the input to decode, which could be:
+ - wav_path, `e.g.`: asr_example.wav
+ - pcm_path, `e.g.`: asr_example.pcm
+ - audio bytes stream, `e.g.`: bytes data from a microphone
+ - audio sample points, `e.g.`: `audio, rate = soundfile.read("asr_example_zh.wav")`, the dtype is numpy.ndarray or torch.Tensor
+ - wav.scp, kaldi style wav list (`wav_id \t wav_path`), `e.g.`:
+ ```text
+ asr_example1 ./audios/asr_example1.wav
+ asr_example2 ./audios/asr_example2.wav
+ ```
+  In the case of `wav.scp` input, `output_dir` must be set to save the output results
+- `audio_fs`: audio sampling rate, only set when audio_in is pcm audio
+- `output_dir`: None (Default), the output path of results if set
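For reference, a kaldi-style `wav.scp` is just an id-to-path mapping, and can be parsed with a few lines of plain Python (a hypothetical helper, not part of the FunASR API):

```python
def parse_wav_scp(scp_path):
    """Parse a kaldi-style wav.scp (wav_id <whitespace> wav_path) into a dict."""
    entries = {}
    with open(scp_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # split only on the first whitespace so paths may contain spaces
            wav_id, wav_path = line.split(maxsplit=1)
            entries[wav_id] = wav_path
    return entries
```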
-Or you can use the finetuned model for inference directly.
+### Inference with multi-thread CPUs or multi GPUs
+FunASR also offers the recipe [egs_modelscope/asr/TEMPLATE/infer.sh](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr/TEMPLATE/infer.sh) for decoding with multi-threaded CPUs or multiple GPUs.
-- Setting parameters in `infer.py`
- - audio_in: # support wav, url, bytes, and parsed audio format.
- - output_dir: # If the input format is wav.scp, it needs to be set.
+- Setting parameters in `infer.sh`
+ - `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path in local disk
+  - `data_dir`: the dataset dir needs to include `wav.scp`. If `${data_dir}/text` also exists, CER will be computed
+ - `output_dir`: output dir of the recognition results
+ - `batch_size`: `64` (Default), batch size of inference on gpu
+ - `gpu_inference`: `true` (Default), whether to perform gpu decoding, set false for CPU inference
+ - `gpuid_list`: `0,1` (Default), which gpu_ids are used to infer
+ - `njob`: only used for CPU inference (`gpu_inference`=`false`), `64` (Default), the number of jobs for CPU decoding
+ - `checkpoint_dir`: only used for infer finetuned models, the path dir of finetuned models
+ - `checkpoint_name`: only used for infer finetuned models, `valid.cer_ctc.ave.pb` (Default), which checkpoint is used to infer
+  - `decoding_mode`: `normal` (Default), decoding mode for the UniASR model (`fast`, `normal`, `offline`)
+  - `hotword_txt`: `None` (Default), hotword file for the contextual paraformer model (the hotword file name ends with `.txt`)
-- Then you can run the pipeline to infer with:
-```python
- python infer.py
+- Decode with multi GPUs:
+```shell
+ bash infer.sh \
+ --model "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" \
+ --data_dir "./data/test" \
+ --output_dir "./results" \
+ --batch_size 64 \
+ --gpu_inference true \
+ --gpuid_list "0,1"
```
-
-### Inference using local finetuned model
-
-- Modify inference related parameters in `infer_after_finetune.py`
- - output_dir: # result dir
- - data_dir: # the dataset dir needs to include `test/wav.scp`. If `test/text` is also exists, CER will be computed
- - decoding_model_name: # set the checkpoint name for decoding, e.g., `valid.cer_ctc.ave.pb`
-
-- Then you can run the pipeline to finetune with:
-```python
- python infer_after_finetune.py
+- Decode with multi-thread CPUs:
+```shell
+ bash infer.sh \
+ --model "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" \
+ --data_dir "./data/test" \
+ --output_dir "./results" \
+ --gpu_inference false \
+ --njob 64
```
- Results
-The decoding results can be found in `$output_dir/decoding_results/text.cer`, which includes recognition results of each sample and the CER metric of the whole test set.
+The decoding results can be found in `$output_dir/1best_recog/text.cer`, which includes recognition results of each sample and the CER metric of the whole test set.
+
+If you decode the SpeechIO test sets, you can apply text normalization by setting `stage`=3; `DETAILS.txt` and `RESULTS.txt` record the results and CER after text normalization.
+
+
+## Finetune with pipeline
+
+### Quick start
+[finetune.py](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr/TEMPLATE/finetune.py)
+```python
+import os
+from modelscope.metainfo import Trainers
+from modelscope.trainers import build_trainer
+from modelscope.msdatasets.audio.asr_dataset import ASRDataset
+
+def modelscope_finetune(params):
+ if not os.path.exists(params.output_dir):
+ os.makedirs(params.output_dir, exist_ok=True)
+ # dataset split ["train", "validation"]
+ ds_dict = ASRDataset.load(params.data_path, namespace='speech_asr')
+ kwargs = dict(
+ model=params.model,
+ data_dir=ds_dict,
+ dataset_type=params.dataset_type,
+ work_dir=params.output_dir,
+ batch_bins=params.batch_bins,
+ max_epoch=params.max_epoch,
+ lr=params.lr)
+ trainer = build_trainer(Trainers.speech_asr_trainer, default_args=kwargs)
+ trainer.train()
+
+
+if __name__ == '__main__':
+ from funasr.utils.modelscope_param import modelscope_args
+ params = modelscope_args(model="damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch")
+    params.output_dir = "./checkpoint"                 # path to save the model
+    params.data_path = "speech_asr_aishell1_trainsets" # dataset path, either a dataset uploaded to modelscope or local data
+    params.dataset_type = "small"                      # use "small" for small datasets; for more than 1000 hours of data, use "large"
+    params.batch_bins = 2000                           # batch size; with dataset_type="small", batch_bins counts fbank feature frames; with dataset_type="large", it counts milliseconds
+    params.max_epoch = 50                              # maximum number of training epochs
+    params.lr = 0.00005                                # learning rate
+
+ modelscope_finetune(params)
+```
+
+```shell
+python finetune.py &> log.txt &
+```
+
+### Finetune with your data
+
+- Modify finetune training related parameters in [finetune.py](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr/TEMPLATE/finetune.py)
+ - `output_dir`: result dir
+ - `data_dir`: the dataset dir needs to include files: `train/wav.scp`, `train/text`; `validation/wav.scp`, `validation/text`
+ - `dataset_type`: for dataset larger than 1000 hours, set as `large`, otherwise set as `small`
+  - `batch_bins`: batch size. If `dataset_type` is `small`, `batch_bins` indicates the number of feature frames; if `dataset_type` is `large`, `batch_bins` indicates the duration in ms
+ - `max_epoch`: number of training epoch
+ - `lr`: learning rate
+
+- Training data formats:
+```sh
+cat ./example_data/text
+BAC009S0002W0122 而 对 楼 市 成 交 抑 制 作 用 最 大 的 限 购
+BAC009S0002W0123 也 成 为 地 方 政 府 的 眼 中 钉
+english_example_1 hello world
+english_example_2 go swim 去 游 泳
+
+cat ./example_data/wav.scp
+BAC009S0002W0122 /mnt/data/wav/train/S0002/BAC009S0002W0122.wav
+BAC009S0002W0123 /mnt/data/wav/train/S0002/BAC009S0002W0123.wav
+english_example_1 /mnt/data/wav/train/S0002/english_example_1.wav
+english_example_2 /mnt/data/wav/train/S0002/english_example_2.wav
+```
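Since finetuning reads utterances by id from both files, it can help to sanity-check that `text` and `wav.scp` cover the same utterance ids before training (a hypothetical helper, not part of FunASR):

```python
def check_utt_ids(text_path, scp_path):
    """Return the sets of utterance ids present in only one of the two files."""
    def ids(path):
        with open(path) as f:
            # the utterance id is the first whitespace-separated field of each line
            return {line.split(maxsplit=1)[0] for line in f if line.strip()}
    text_ids, scp_ids = ids(text_path), ids(scp_path)
    return text_ids - scp_ids, scp_ids - text_ids
```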
+
+- Then you can run the pipeline to finetune with:
+```shell
+python finetune.py
+```
+If you want to finetune with multiple GPUs, you can run:
+```shell
+CUDA_VISIBLE_DEVICES=1,2 python -m torch.distributed.launch --nproc_per_node 2 finetune.py > log.txt 2>&1
+```
+## Inference with your finetuned model
+
+- The parameters in [egs_modelscope/asr/TEMPLATE/infer.sh](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr/TEMPLATE/infer.sh) are the same as in [docs](https://github.com/alibaba-damo-academy/FunASR/tree/main/egs_modelscope/asr/TEMPLATE#inference-with-multi-thread-cpus-or-multi-gpus); set `model` to the name of the modelscope model that you finetuned.
+
+- Decode with multi GPUs:
+```shell
+ bash infer.sh \
+ --model "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" \
+ --data_dir "./data/test" \
+ --output_dir "./results" \
+ --batch_size 64 \
+ --gpu_inference true \
+ --gpuid_list "0,1" \
+ --checkpoint_dir "./checkpoint" \
+ --checkpoint_name "valid.cer_ctc.ave.pb"
+```
+- Decode with multi-thread CPUs:
+```shell
+ bash infer.sh \
+ --model "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" \
+ --data_dir "./data/test" \
+ --output_dir "./results" \
+ --gpu_inference false \
+ --njob 64 \
+ --checkpoint_dir "./checkpoint" \
+ --checkpoint_name "valid.cer_ctc.ave.pb"
+```
diff --git a/egs_modelscope/asr_vad_punc/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/demo.py b/egs_modelscope/asr_vad_punc/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/demo.py
new file mode 100644
index 000000000..2fce734ed
--- /dev/null
+++ b/egs_modelscope/asr_vad_punc/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/demo.py
@@ -0,0 +1,16 @@
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+if __name__ == '__main__':
+ audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav'
+ output_dir = None
+ inference_pipeline = pipeline(
+ task=Tasks.auto_speech_recognition,
+ model='damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
+ vad_model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch',
+ punc_model='damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch',
+ output_dir=output_dir
+ )
+ rec_result = inference_pipeline(audio_in=audio_in)
+ print(rec_result)
+
diff --git a/egs_modelscope/asr_vad_punc/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/infer.py b/egs_modelscope/asr_vad_punc/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/infer.py
index df471d6dc..5bc205cda 100644
--- a/egs_modelscope/asr_vad_punc/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/infer.py
+++ b/egs_modelscope/asr_vad_punc/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/infer.py
@@ -1,19 +1,28 @@
+import os
+import shutil
+import argparse
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
-if __name__ == '__main__':
- audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav'
- output_dir = None
+def modelscope_infer(args):
+ os.environ['CUDA_VISIBLE_DEVICES'] = str(args.gpuid)
inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
- model='damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
- model_revision="v1.2.1",
- vad_model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch',
- vad_model_revision="v1.1.8",
- punc_model='damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch',
- punc_model_revision="v1.1.6",
- ngpu=1,
+ model=args.model,
+ output_dir=args.output_dir,
+ batch_size=args.batch_size,
+ param_dict={"decoding_model": args.decoding_mode, "hotword": args.hotword_txt}
)
- rec_result = inference_pipeline(audio_in=audio_in)
- print(rec_result)
+ inference_pipeline(audio_in=args.audio_in)
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser()
+ parser.add_argument('--model', type=str, default="damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch")
+ parser.add_argument('--audio_in', type=str, default="./data/test/wav.scp")
+ parser.add_argument('--output_dir', type=str, default="./results/")
+ parser.add_argument('--decoding_mode', type=str, default="normal")
+ parser.add_argument('--hotword_txt', type=str, default=None)
+ parser.add_argument('--batch_size', type=int, default=64)
+ parser.add_argument('--gpuid', type=str, default="0")
+ args = parser.parse_args()
+ modelscope_infer(args)
diff --git a/egs_modelscope/asr_vad_punc/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/infer.sh b/egs_modelscope/asr_vad_punc/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/infer.sh
new file mode 100644
index 000000000..ef49d7a60
--- /dev/null
+++ b/egs_modelscope/asr_vad_punc/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/infer.sh
@@ -0,0 +1,103 @@
+#!/usr/bin/env bash
+
+set -e
+set -u
+set -o pipefail
+
+stage=1
+stop_stage=2
+model="damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
+data_dir="./data/test"
+output_dir="./results"
+batch_size=64
+gpu_inference=true # whether to perform gpu decoding
+gpuid_list="0,1" # set gpus, e.g., gpuid_list="0,1"
+njob=64 # the number of jobs for CPU decoding, if gpu_inference=false, use CPU decoding, please set njob
+checkpoint_dir=
+checkpoint_name="valid.cer_ctc.ave.pb"
+
+. utils/parse_options.sh || exit 1;
+
+if [ "${gpu_inference}" = "true" ]; then
+ nj=$(echo $gpuid_list | awk -F "," '{print NF}')
+else
+ nj=$njob
+ batch_size=1
+ gpuid_list=""
+ for JOB in $(seq ${nj}); do
+ gpuid_list=$gpuid_list"-1,"
+ done
+fi
+
+mkdir -p $output_dir/split
+split_scps=""
+for JOB in $(seq ${nj}); do
+ split_scps="$split_scps $output_dir/split/wav.$JOB.scp"
+done
+perl utils/split_scp.pl ${data_dir}/wav.scp ${split_scps}
+
+if [ -n "${checkpoint_dir}" ]; then
+ python utils/prepare_checkpoint.py ${model} ${checkpoint_dir} ${checkpoint_name}
+ model=${checkpoint_dir}/${model}
+fi
+
+if [ $stage -le 1 ] && [ $stop_stage -ge 1 ];then
+ echo "Decoding ..."
+ gpuid_list_array=(${gpuid_list//,/ })
+ for JOB in $(seq ${nj}); do
+ {
+ id=$((JOB-1))
+ gpuid=${gpuid_list_array[$id]}
+ mkdir -p ${output_dir}/output.$JOB
+ python infer.py \
+ --model ${model} \
+ --audio_in ${output_dir}/split/wav.$JOB.scp \
+ --output_dir ${output_dir}/output.$JOB \
+ --batch_size ${batch_size} \
+ --gpuid ${gpuid}
+ }&
+ done
+ wait
+
+ mkdir -p ${output_dir}/1best_recog
+ for f in token score text; do
+ if [ -f "${output_dir}/output.1/1best_recog/${f}" ]; then
+ for i in $(seq "${nj}"); do
+ cat "${output_dir}/output.${i}/1best_recog/${f}"
+ done | sort -k1 >"${output_dir}/1best_recog/${f}"
+ fi
+ done
+fi
+
+if [ $stage -le 2 ] && [ $stop_stage -ge 2 ];then
+ echo "Computing WER ..."
+ cp ${output_dir}/1best_recog/text ${output_dir}/1best_recog/text.proc
+ cp ${data_dir}/text ${output_dir}/1best_recog/text.ref
+ python utils/compute_wer.py ${output_dir}/1best_recog/text.ref ${output_dir}/1best_recog/text.proc ${output_dir}/1best_recog/text.cer
+ tail -n 3 ${output_dir}/1best_recog/text.cer
+fi
+
+if [ $stage -le 3 ] && [ $stop_stage -ge 3 ];then
+ echo "SpeechIO TIOBE textnorm"
+ echo "$0 --> Normalizing REF text ..."
+ ./utils/textnorm_zh.py \
+ --has_key --to_upper \
+ ${data_dir}/text \
+ ${output_dir}/1best_recog/ref.txt
+
+ echo "$0 --> Normalizing HYP text ..."
+ ./utils/textnorm_zh.py \
+ --has_key --to_upper \
+ ${output_dir}/1best_recog/text.proc \
+ ${output_dir}/1best_recog/rec.txt
+ grep -v $'\t$' ${output_dir}/1best_recog/rec.txt > ${output_dir}/1best_recog/rec_non_empty.txt
+
+ echo "$0 --> computing WER/CER and alignment ..."
+ ./utils/error_rate_zh \
+ --tokenizer char \
+ --ref ${output_dir}/1best_recog/ref.txt \
+ --hyp ${output_dir}/1best_recog/rec_non_empty.txt \
+ ${output_dir}/1best_recog/DETAILS.txt | tee ${output_dir}/1best_recog/RESULTS.txt
+ rm -rf ${output_dir}/1best_recog/rec.txt ${output_dir}/1best_recog/rec_non_empty.txt
+fi
+
diff --git a/egs_modelscope/asr_vad_punc/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/infer_after_finetune.py b/egs_modelscope/asr_vad_punc/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/infer_after_finetune.py
deleted file mode 100644
index 473019c70..000000000
--- a/egs_modelscope/asr_vad_punc/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/infer_after_finetune.py
+++ /dev/null
@@ -1,47 +0,0 @@
-import json
-import os
-import shutil
-
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-from modelscope.hub.snapshot_download import snapshot_download
-
-from funasr.utils.compute_wer import compute_wer
-
-def modelscope_infer_after_finetune(params):
- # prepare for decoding
-
- try:
- pretrained_model_path = snapshot_download(params["modelscope_model_name"], cache_dir=params["output_dir"])
- except BaseException:
- raise BaseException(f"Please download pretrain model from ModelScope firstly.")shutil.copy(os.path.join(params["output_dir"], params["decoding_model_name"]), os.path.join(pretrained_model_path, "model.pb"))
- decoding_path = os.path.join(params["output_dir"], "decode_results")
- if os.path.exists(decoding_path):
- shutil.rmtree(decoding_path)
- os.mkdir(decoding_path)
-
- # decoding
- inference_pipeline = pipeline(
- task=Tasks.auto_speech_recognition,
- model=pretrained_model_path,
- output_dir=decoding_path,
- batch_size=params["batch_size"]
- )
- audio_in = os.path.join(params["data_dir"], "wav.scp")
- inference_pipeline(audio_in=audio_in)
-
- # computer CER if GT text is set
- text_in = os.path.join(params["data_dir"], "text")
- if os.path.exists(text_in):
- text_proc_file = os.path.join(decoding_path, "1best_recog/token")
- compute_wer(text_in, text_proc_file, os.path.join(decoding_path, "text.cer"))
-
-
-if __name__ == '__main__':
- params = {}
- params["modelscope_model_name"] = "damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
- params["output_dir"] = "./checkpoint"
- params["data_dir"] = "./data/test"
- params["decoding_model_name"] = "valid.acc.ave_10best.pb"
- params["batch_size"] = 64
- modelscope_infer_after_finetune(params)
\ No newline at end of file
diff --git a/egs_modelscope/asr_vad_punc/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/utils b/egs_modelscope/asr_vad_punc/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/utils
new file mode 120000
index 000000000..3d3dd06b0
--- /dev/null
+++ b/egs_modelscope/asr_vad_punc/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/utils
@@ -0,0 +1 @@
+../../asr/TEMPLATE/utils
\ No newline at end of file
diff --git a/egs_modelscope/lm/speech_transformer_lm_zh-cn-common-vocab8404-pytorch/infer.py b/egs_modelscope/lm/speech_transformer_lm_zh-cn-common-vocab8404-pytorch/infer.py
index ec309b2ce..628cdd86b 100644
--- a/egs_modelscope/lm/speech_transformer_lm_zh-cn-common-vocab8404-pytorch/infer.py
+++ b/egs_modelscope/lm/speech_transformer_lm_zh-cn-common-vocab8404-pytorch/infer.py
@@ -6,12 +6,12 @@ inputs = "hello 大 家 好 呀"
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
-inference_pipline = pipeline(
+inference_pipeline = pipeline(
task=Tasks.language_score_prediction,
model='damo/speech_transformer_lm_zh-cn-common-vocab8404-pytorch',
output_dir="./tmp/"
)
-rec_result = inference_pipline(text_in=inputs)
+rec_result = inference_pipeline(text_in=inputs)
print(rec_result)
diff --git a/egs_modelscope/punctuation/TEMPLATE/README.md b/egs_modelscope/punctuation/TEMPLATE/README.md
new file mode 100644
index 000000000..dfbe04480
--- /dev/null
+++ b/egs_modelscope/punctuation/TEMPLATE/README.md
@@ -0,0 +1,110 @@
+# Punctuation Restoration
+
+> **Note**:
+> The modelscope pipeline supports inference and finetuning for all the models in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope). Here we take the CT-Transformer punctuation model as an example to demonstrate the usage.
+
+## Inference
+
+### Quick start
+#### [CT-Transformer model](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary)
+```python
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+inference_pipeline = pipeline(
+ task=Tasks.punctuation,
+ model='damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch',
+ model_revision=None)
+
+rec_result = inference_pipeline(text_in='example/punc_example.txt')
+print(rec_result)
+```
+- Text binary data, `e.g.`: bytes data read directly from a file
+```python
+rec_result = inference_pipeline(text_in='我们都是木头人不会讲话不会动')
+```
+- Text file URL, `e.g.`: https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_text/punc_example.txt
+```python
+rec_result = inference_pipeline(text_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_text/punc_example.txt')
+```
+
+#### [CT-Transformer Realtime model](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727/summary)
+```python
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+inference_pipeline = pipeline(
+ task=Tasks.punctuation,
+ model='damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727',
+ model_revision=None,
+)
+
+inputs = "跨境河流是养育沿岸|人民的生命之源长期以来为帮助下游地区防灾减灾中方技术人员|在上游地区极为恶劣的自然条件下克服巨大困难甚至冒着生命危险|向印方提供汛期水文资料处理紧急事件中方重视印方在跨境河流问题上的关切|愿意进一步完善双方联合工作机制|凡是|中方能做的我们|都会去做而且会做得更好我请印度朋友们放心中国在上游的|任何开发利用都会经过科学|规划和论证兼顾上下游的利益"
+vads = inputs.split("|")
+rec_result_all="outputs:"
+param_dict = {"cache": []}
+for vad in vads:
+ rec_result = inference_pipeline(text_in=vad, param_dict=param_dict)
+ rec_result_all += rec_result['text']
+
+print(rec_result_all)
+```
+For the full demo code, please refer to [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/238)
+
+
+### API-reference
+#### Define pipeline
+- `task`: `Tasks.punctuation`
+- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path in local disk
+- `ngpu`: `1` (Default), decoding on GPU. If ngpu=0, decoding on CPU
+- `output_dir`: `None` (Default), the output path of results if set
+- `model_revision`: `None` (Default), setting the model version
+
+#### Infer pipeline
+- `text_in`: the input to decode, which could be:
+ - text bytes, `e.g.`: "我们都是木头人不会讲话不会动"
+ - text file, `e.g.`: example/punc_example.txt
+  In the case of text file input, `output_dir` must be set to save the output results
+- `param_dict`: holds the cache, which is required in realtime mode
+
+### Inference with multi-thread CPUs or multi GPUs
+FunASR also offers the recipe [egs_modelscope/punctuation/TEMPLATE/infer.sh](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/punctuation/TEMPLATE/infer.sh) for decoding with multi-threaded CPUs or multiple GPUs. It is an offline recipe and only supports offline models.
+
+#### Settings of `infer.sh`
+- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path in local disk
+- `data_dir`: the dataset dir needs to include `punc.txt`
+- `output_dir`: output dir of the recognition results
+- `gpu_inference`: `true` (Default), whether to perform gpu decoding, set false for CPU inference
+- `gpuid_list`: `0,1` (Default), which gpu_ids are used to infer
+- `njob`: only used for CPU inference (`gpu_inference`=`false`), `64` (Default), the number of jobs for CPU decoding
+- `checkpoint_dir`: only used for infer finetuned models, the path dir of finetuned models
+- `checkpoint_name`: only used for infer finetuned models, `punc.pb` (Default), which checkpoint is used to infer
+
+#### Decode with multi GPUs:
+```shell
+ bash infer.sh \
+ --model "damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch" \
+ --data_dir "./data/test" \
+ --output_dir "./results" \
+ --batch_size 1 \
+ --gpu_inference true \
+ --gpuid_list "0,1"
+```
+#### Decode with multi-thread CPUs:
+```shell
+ bash infer.sh \
+ --model "damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch" \
+ --data_dir "./data/test" \
+ --output_dir "./results" \
+ --gpu_inference false \
+ --njob 1
+```
+
+## Finetune with pipeline
+
+### Quick start
+
+### Finetune with your data
+
+## Inference with your finetuned model
+
diff --git a/egs_modelscope/punctuation/TEMPLATE/infer.py b/egs_modelscope/punctuation/TEMPLATE/infer.py
new file mode 100644
index 000000000..edcefbeda
--- /dev/null
+++ b/egs_modelscope/punctuation/TEMPLATE/infer.py
@@ -0,0 +1,23 @@
+import os
+import shutil
+import argparse
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+def modelscope_infer(args):
+ os.environ['CUDA_VISIBLE_DEVICES'] = str(args.gpuid)
+ inference_pipeline = pipeline(
+ task=Tasks.punctuation,
+ model=args.model,
+ output_dir=args.output_dir,
+ )
+ inference_pipeline(text_in=args.text_in)
+
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser()
+ parser.add_argument('--model', type=str, default="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch")
+ parser.add_argument('--text_in', type=str, default="./data/test/punc.txt")
+ parser.add_argument('--output_dir', type=str, default="./results/")
+ parser.add_argument('--gpuid', type=str, default="0")
+ args = parser.parse_args()
+ modelscope_infer(args)
\ No newline at end of file
diff --git a/egs_modelscope/punctuation/TEMPLATE/infer.sh b/egs_modelscope/punctuation/TEMPLATE/infer.sh
new file mode 100644
index 000000000..0af502ed2
--- /dev/null
+++ b/egs_modelscope/punctuation/TEMPLATE/infer.sh
@@ -0,0 +1,66 @@
+#!/usr/bin/env bash
+
+set -e
+set -u
+set -o pipefail
+
+stage=1
+stop_stage=2
+model="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch"
+data_dir="./data/test"
+output_dir="./results"
+gpu_inference=true # whether to perform gpu decoding
+gpuid_list="0,1" # set gpus, e.g., gpuid_list="0,1"
+njob=64 # the number of jobs for CPU decoding, if gpu_inference=false, use CPU decoding, please set njob
+checkpoint_dir=
+checkpoint_name="punc.pb"
+
+. utils/parse_options.sh || exit 1;
+
+if [ "${gpu_inference}" = "true" ]; then
+ nj=$(echo $gpuid_list | awk -F "," '{print NF}')
+else
+ nj=$njob
+ gpuid_list=""
+ for JOB in $(seq ${nj}); do
+ gpuid_list=$gpuid_list"-1,"
+ done
+fi
+
+mkdir -p $output_dir/split
+split_scps=""
+for JOB in $(seq ${nj}); do
+ split_scps="$split_scps $output_dir/split/text.$JOB.scp"
+done
+perl utils/split_scp.pl ${data_dir}/punc.txt ${split_scps}
+
+if [ -n "${checkpoint_dir}" ]; then
+ python utils/prepare_checkpoint.py ${model} ${checkpoint_dir} ${checkpoint_name}
+ model=${checkpoint_dir}/${model}
+fi
+
+if [ $stage -le 1 ] && [ $stop_stage -ge 1 ];then
+ echo "Decoding ..."
+ gpuid_list_array=(${gpuid_list//,/ })
+ for JOB in $(seq ${nj}); do
+ {
+ id=$((JOB-1))
+ gpuid=${gpuid_list_array[$id]}
+ mkdir -p ${output_dir}/output.$JOB
+ python infer.py \
+ --model ${model} \
+ --text_in ${output_dir}/split/text.$JOB.scp \
+ --output_dir ${output_dir}/output.$JOB \
+ --gpuid ${gpuid}
+ }&
+ done
+ wait
+
+ mkdir -p ${output_dir}/final_res
+ if [ -f "${output_dir}/output.1/infer.out" ]; then
+ for i in $(seq "${nj}"); do
+ cat "${output_dir}/output.${i}/infer.out"
+ done | sort -k1 >"${output_dir}/final_res/infer.out"
+ fi
+fi
+
diff --git a/egs_modelscope/punctuation/TEMPLATE/utils b/egs_modelscope/punctuation/TEMPLATE/utils
new file mode 120000
index 000000000..dc7d4171f
--- /dev/null
+++ b/egs_modelscope/punctuation/TEMPLATE/utils
@@ -0,0 +1 @@
+../../../egs/aishell/transformer/utils
\ No newline at end of file
diff --git a/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vadrealtime-vocab272727/README.md b/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vadrealtime-vocab272727/README.md
new file mode 120000
index 000000000..bb55ab52e
--- /dev/null
+++ b/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vadrealtime-vocab272727/README.md
@@ -0,0 +1 @@
+../../TEMPLATE/README.md
\ No newline at end of file
diff --git a/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vadrealtime-vocab272727/demo.py b/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vadrealtime-vocab272727/demo.py
new file mode 100644
index 000000000..a6629cdd0
--- /dev/null
+++ b/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vadrealtime-vocab272727/demo.py
@@ -0,0 +1,27 @@
+
+################## text binary data ##################
+inputs = "跨境河流是养育沿岸|人民的生命之源长期以来为帮助下游地区防灾减灾中方技术人员|在上游地区极为恶劣的自然条件下克服巨大困难甚至冒着生命危险|向印方提供汛期水文资料处理紧急事件中方重视印方在跨境河流问题上的关切|愿意进一步完善双方联合工作机制|凡是|中方能做的我们|都会去做而且会做得更好我请印度朋友们放心中国在上游的|任何开发利用都会经过科学|规划和论证兼顾上下游的利益"
+
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+from modelscope.utils.logger import get_logger
+import logging
+logger = get_logger(log_level=logging.CRITICAL)
+logger.setLevel(logging.CRITICAL)
+
+
+inference_pipeline = pipeline(
+ task=Tasks.punctuation,
+ model='damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727',
+ output_dir="./tmp/"
+)
+
+vads = inputs.split("|")
+rec_result_all="outputs:"
+param_dict = {"cache": []}
+for vad in vads:
+ rec_result = inference_pipeline(text_in=vad, param_dict=param_dict)
+ rec_result_all += rec_result['text']
+
+print(rec_result_all)
+
diff --git a/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vadrealtime-vocab272727/infer.py b/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vadrealtime-vocab272727/infer.py
deleted file mode 100644
index a6629cdd0..000000000
--- a/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vadrealtime-vocab272727/infer.py
+++ /dev/null
@@ -1,27 +0,0 @@
-
-##################text二进制数据#####################
-inputs = "跨境河流是养育沿岸|人民的生命之源长期以来为帮助下游地区防灾减灾中方技术人员|在上游地区极为恶劣的自然条件下克服巨大困难甚至冒着生命危险|向印方提供汛期水文资料处理紧急事件中方重视印方在跨境河流问题上的关切|愿意进一步完善双方联合工作机制|凡是|中方能做的我们|都会去做而且会做得更好我请印度朋友们放心中国在上游的|任何开发利用都会经过科学|规划和论证兼顾上下游的利益"
-
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-from modelscope.utils.logger import get_logger
-import logging
-logger = get_logger(log_level=logging.CRITICAL)
-logger.setLevel(logging.CRITICAL)
-
-
-inference_pipeline = pipeline(
- task=Tasks.punctuation,
- model='damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727',
- output_dir="./tmp/"
-)
-
-vads = inputs.split("|")
-rec_result_all="outputs:"
-param_dict = {"cache": []}
-for vad in vads:
- rec_result = inference_pipeline(text_in=vad, param_dict=param_dict)
- rec_result_all += rec_result['text']
-
-print(rec_result_all)
-
diff --git a/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vadrealtime-vocab272727/infer.py b/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vadrealtime-vocab272727/infer.py
new file mode 120000
index 000000000..128fc31c2
--- /dev/null
+++ b/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vadrealtime-vocab272727/infer.py
@@ -0,0 +1 @@
+../../TEMPLATE/infer.py
\ No newline at end of file
diff --git a/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vadrealtime-vocab272727/infer.sh b/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vadrealtime-vocab272727/infer.sh
new file mode 120000
index 000000000..5e59f1841
--- /dev/null
+++ b/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vadrealtime-vocab272727/infer.sh
@@ -0,0 +1 @@
+../../TEMPLATE/infer.sh
\ No newline at end of file
diff --git a/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/README.md b/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/README.md
deleted file mode 100644
index b125d48ff..000000000
--- a/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/README.md
+++ /dev/null
@@ -1,19 +0,0 @@
-# ModelScope Model
-
-## How to finetune and infer using a pretrained ModelScope Model
-
-### Inference
-
-Or you can use the finetuned model for inference directly.
-
-task=Tasks.punctuation,
- model='damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch',
-
-- Setting parameters in `modelscope_common_infer.sh`
- - model: damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch # pre-trained model, download from modelscope
- - text_in: input path, text or url
- - output_dir: the result dir
-- Then you can run the pipeline to infer with:
-```sh
- python ./infer.py
-```
diff --git a/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/README.md b/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/README.md
new file mode 120000
index 000000000..92088a21d
--- /dev/null
+++ b/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/README.md
@@ -0,0 +1 @@
+../TEMPLATE/README.md
\ No newline at end of file
diff --git a/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/data/punc_example.txt b/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/data/punc_example.txt
deleted file mode 100644
index 367be7997..000000000
--- a/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/data/punc_example.txt
+++ /dev/null
@@ -1,3 +0,0 @@
-1 跨境河流是养育沿岸人民的生命之源长期以来为帮助下游地区防灾减灾中方技术人员在上游地区极为恶劣的自然条件下克服巨大困难甚至冒着生命危险向印方提供汛期水文资料处理紧急事件中方重视印方在跨境河流问题上的关切愿意进一步完善双方联合工作机制凡是中方能做的我们都会去做而且会做得更好我请印度朋友们放心中国在上游的任何开发利用都会经过科学规划和论证兼顾上下游的利益
-2 从存储上来说仅仅是全景图片它就会是图片的四倍的容量然后全景的视频会是普通视频八倍的这个存储的容要求而三d的模型会是图片的十倍这都对我们今天运行在的云计算的平台存储的平台提出了更高的要求
-3 那今天的会就到这里吧 happy new year 明年见
diff --git a/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/demo.py b/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/demo.py
new file mode 100644
index 000000000..20994d39c
--- /dev/null
+++ b/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/demo.py
@@ -0,0 +1,23 @@
+
+################## path of a text.scp file ##################
+inputs = "./egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/data/punc_example.txt"
+
+################## raw text input ############################
+#inputs = "我们都是木头人不会讲话不会动"
+
+################## url of a text file ########################
+#inputs = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_text/punc_example.txt"
+
+
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+inference_pipeline = pipeline(
+ task=Tasks.punctuation,
+ model='damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch',
+ model_revision="v1.1.7",
+ output_dir="./tmp/"
+)
+
+rec_result = inference_pipeline(text_in=inputs)
+print(rec_result)
diff --git a/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/infer.py b/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/infer.py
deleted file mode 100644
index 0da8d25a1..000000000
--- a/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/infer.py
+++ /dev/null
@@ -1,23 +0,0 @@
-
-##################text.scp文件路径###################
-inputs = "./egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/data/punc_example.txt"
-
-##################text二进制数据#####################
-#inputs = "我们都是木头人不会讲话不会动"
-
-##################text文件url#######################
-#inputs = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_text/punc_example.txt"
-
-
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-
-inference_pipline = pipeline(
- task=Tasks.punctuation,
- model='damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch',
- model_revision="v1.1.7",
- output_dir="./tmp/"
-)
-
-rec_result = inference_pipline(text_in=inputs)
-print(rec_result)
diff --git a/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/infer.py b/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/infer.py
new file mode 120000
index 000000000..f05fbbb8b
--- /dev/null
+++ b/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/infer.py
@@ -0,0 +1 @@
+../TEMPLATE/infer.py
\ No newline at end of file
diff --git a/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/infer.sh b/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/infer.sh
new file mode 120000
index 000000000..0b3b38b6f
--- /dev/null
+++ b/egs_modelscope/punctuation/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/infer.sh
@@ -0,0 +1 @@
+../TEMPLATE/infer.sh
\ No newline at end of file
diff --git a/egs_modelscope/speaker_diarization/TEMPLATE/README.md b/egs_modelscope/speaker_diarization/TEMPLATE/README.md
new file mode 100644
index 000000000..99c9b593c
--- /dev/null
+++ b/egs_modelscope/speaker_diarization/TEMPLATE/README.md
@@ -0,0 +1,81 @@
+# Speaker Diarization
+
+> **Note**:
+> The modelscope pipeline supports inference and finetuning for all the models in the
+[model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope).
+Here we take the SOND model as an example to demonstrate the usage.
+
+## Inference with pipeline
+### Quick start
+```python
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+# initialize the diarization pipeline
+inference_diar_pipeline = pipeline(
+    mode="sond_demo",
+    num_workers=0,
+    task=Tasks.speaker_diarization,
+    diar_model_config="sond.yaml",
+    model='damo/speech_diarization_sond-zh-cn-alimeeting-16k-n16k4-pytorch',
+    model_revision="v1.0.5",
+    sv_model="damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch",
+    sv_model_revision="v1.2.2",
+)
+
+# input: a list of audio files in which the first item is the speech recording to diarize,
+# and the following wav files are used to extract speaker embeddings.
+audio_list = [
+    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_data/speaker_diarization/record.wav",
+    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_data/speaker_diarization/spk1.wav",
+    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_data/speaker_diarization/spk2.wav",
+    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_data/speaker_diarization/spk3.wav",
+    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_data/speaker_diarization/spk4.wav",
+]
+
+results = inference_diar_pipeline(audio_in=audio_list)
+print(results)
+```
+
+### API-reference
+#### Define pipeline
+- `task`: `Tasks.speaker_diarization`
+- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path in local disk
+- `ngpu`: `1` (Default), decoding on GPU. If ngpu=0, decoding on CPU
+- `output_dir`: `None` (Default), the output path of results if set
+- `batch_size`: `1` (Default), batch size when decoding
+- `smooth_size`: `83` (Default), the window size to perform smoothing
+- `dur_threshold`: `10` (Default), in frames; segments shorter than 10 frames (100 ms) will be dropped
+- `out_format`: `vad` (Default), the output format, choices `["vad", "rttm"]`
+  - vad format: spk1: [1.0, 3.0], [5.0, 8.0]
+  - rttm format: "SPEAKER test1 0 1.00 2.00 spk1" and "SPEAKER test1 0 5.00 3.00 spk1" (onset and duration in seconds)
+
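The two output formats above carry the same information; a minimal pure-Python sketch converting vad-style segments into rttm-style lines (field layout follows the examples above; the helper name is illustrative):

```python
def vad_to_rttm(file_id, segments):
    """Convert {spk: [[start, end], ...]} (seconds) into rttm-style lines.

    Field layout follows the example above:
    SPEAKER <file_id> 0 <onset> <duration> <spk>
    """
    lines = []
    for spk, spans in segments.items():
        for start, end in spans:
            lines.append(f"SPEAKER {file_id} 0 {start:.2f} {end - start:.2f} {spk}")
    return lines

# spk1 speaks in [1.0, 3.0] and [5.0, 8.0], as in the vad-format example
print(vad_to_rttm("test1", {"spk1": [[1.0, 3.0], [5.0, 8.0]]}))
```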
+#### Infer pipeline for speaker embedding extraction
+- `audio_in`: the input to process, which could be:
+  - list of urls: `e.g.`: waveform files hosted on a website
+  - list of local file paths: `e.g.`: path/to/a.wav
+ - ("wav.scp,speech,sound", "profile.scp,profile,kaldi_ark"): a script file of waveform files and another script file of speaker profiles (extracted with the [model](https://www.modelscope.cn/models/damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch/summary))
+ ```text
+ wav.scp
+ test1 path/to/enroll1.wav
+ test2 path/to/enroll2.wav
+
+ profile.scp
+ test1 path/to/profile.ark:11
+ test2 path/to/profile.ark:234
+ ```
+ The profile.ark file contains speaker embeddings in a kaldi-like style.
+  Please refer to [README.md](../../speaker_verification/TEMPLATE/README.md) for more details.
+
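The scp files above use kaldi-style "key value" lines, where an ark value may carry a byte offset (as in profile.ark:11). A minimal parsing sketch (the helper name is illustrative):

```python
def parse_scp(lines):
    """Parse kaldi-style scp lines 'key path' or 'key path.ark:offset'.

    Returns {key: (path, offset)}, where offset is None for plain paths.
    """
    table = {}
    for line in lines:
        key, rest = line.split(maxsplit=1)
        # an offset only appears after the last path component, e.g. profile.ark:11
        if ":" in rest.rsplit("/", 1)[-1]:
            path, offset = rest.rsplit(":", 1)
            table[key] = (path, int(offset))
        else:
            table[key] = (rest, None)
    return table

print(parse_scp(["test1 path/to/profile.ark:11", "test2 path/to/enroll2.wav"]))
```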
+### Inference with your data
+For a single input, we recommend the "list of local file paths" mode.
+For multiple inputs, we recommend the last mode, with a pre-organized wav.scp and profile.scp.
+
+### Inference with multi-threads on CPU
+We recommend the last mode: split wav.scp and profile.scp, then run inference on each split part.
+Please refer to [README.md](../../speaker_verification/TEMPLATE/README.md) for a similar process.
+
+### Inference with multi GPU
+Similar to the CPU case, but set `ngpu=1` for inference on GPU.
+Besides, use `CUDA_VISIBLE_DEVICES=0` to specify the GPU device.
+Please refer to [README.md](../../speaker_verification/TEMPLATE/README.md) for a similar process.
diff --git a/egs_modelscope/speaker_verification/TEMPLATE/README.md b/egs_modelscope/speaker_verification/TEMPLATE/README.md
new file mode 100644
index 000000000..f7b64ce4b
--- /dev/null
+++ b/egs_modelscope/speaker_verification/TEMPLATE/README.md
@@ -0,0 +1,121 @@
+# Speaker Verification
+
+> **Note**:
+> The modelscope pipeline supports inference and finetuning for all the models in the
+[model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope).
+Here we take the xvector_sv model as an example to demonstrate the usage.
+
+## Inference with pipeline
+
+### Quick start
+#### Speaker verification
+```python
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+inference_sv_pipeline = pipeline(
+    task=Tasks.speaker_verification,
+    model='damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch'
+)
+
+# The same speaker
+rec_result = inference_sv_pipeline(audio_in=(
+    'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav',
+    'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_same.wav'))
+print("Similarity", rec_result["scores"])
+
+# Different speakers
+rec_result = inference_sv_pipeline(audio_in=(
+    'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav',
+    'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_different.wav'))
+print("Similarity", rec_result["scores"])
+```
+#### Speaker embedding extraction
+```python
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+# Define extraction pipeline
+inference_sv_pipeline = pipeline(
+    task=Tasks.speaker_verification,
+    model='damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch'
+)
+# Extract the speaker embedding
+rec_result = inference_sv_pipeline(
+ audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav')
+speaker_embedding = rec_result["spk_embedding"]
+```
+For the full demo code, please refer to [infer.py](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/speaker_verification/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch/infer.py).
+
+### API-reference
+#### Define pipeline
+- `task`: `Tasks.speaker_verification`
+- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path in local disk
+- `ngpu`: `1` (Default), decoding on GPU. If ngpu=0, decoding on CPU
+- `output_dir`: `None` (Default), the output path of results if set
+- `batch_size`: `1` (Default), batch size when decoding
+- `sv_threshold`: `0.9465` (Default), the similarity threshold to determine
+whether utterances belong to the same speaker (it should be in (0, 1))
+
+#### Infer pipeline for speaker embedding extraction
+- `audio_in`: the input to process, which could be:
+ - url (str): `e.g.`: https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav
+ - local_path: `e.g.`: path/to/a.wav
+ - wav.scp: `e.g.`: path/to/wav1.scp
+ ```text
+ wav.scp
+ test1 path/to/enroll1.wav
+ test2 path/to/enroll2.wav
+ ```
+ - bytes: `e.g.`: raw bytes data from a microphone
+ - fbank1.scp,speech,kaldi_ark: `e.g.`: extracted 80-dimensional fbank features
+with kaldi toolkits.
+
+#### Infer pipeline for speaker verification
+- `audio_in`: the input to process, which could be:
+ - Tuple(url1, url2): `e.g.`: (https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav, https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_different.wav)
+ - Tuple(local_path1, local_path2): `e.g.`: (path/to/a.wav, path/to/b.wav)
+ - Tuple(wav1.scp, wav2.scp): `e.g.`: (path/to/wav1.scp, path/to/wav2.scp)
+ ```text
+ wav1.scp
+ test1 path/to/enroll1.wav
+ test2 path/to/enroll2.wav
+
+ wav2.scp
+ test1 path/to/same1.wav
+ test2 path/to/diff2.wav
+ ```
+ - Tuple(bytes, bytes): `e.g.`: raw bytes data from a microphone
+ - Tuple("fbank1.scp,speech,kaldi_ark", "fbank2.scp,speech,kaldi_ark"): `e.g.`: extracted 80-dimensional fbank features
+with kaldi toolkits.
+
+### Inference with your data
+Organize your own data with wav.scp or fbank.scp to extract speaker embeddings or perform speaker verification.
+In this case, `output_dir` should be set to save all the embeddings or scores.
+
+### Inference with multi-threads on CPU
+You can run inference with multiple threads on CPU as follows:
+1. Set `ngpu=0` while defining the pipeline in `infer.py`.
+2. Split wav.scp into several parts, e.g. 4:
+ ```shell
+ split -l $((`wc -l < wav.scp`/4+1)) --numeric-suffixes wav.scp splits/wav.scp.
+ ```
+3. Start to extract embeddings
+ ```shell
+   for wav_scp in `ls splits/wav.scp.*`; do
+       python infer.py ${wav_scp} outputs/$(basename ${wav_scp})
+   done
+ ```
+4. The embeddings will be saved in `outputs/*`
+
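The split step above can also be done in plain Python; a sketch of a balanced variant (chunk sizes may differ slightly from `split -l`, which fixes the lines-per-file count):

```python
def split_scp(lines, n):
    """Split wav.scp lines into n nearly equal chunks for parallel decoding."""
    base, remainder = divmod(len(lines), n)
    chunks, start = [], 0
    for j in range(n):
        size = base + (1 if j < remainder else 0)  # spread the remainder over the first chunks
        chunks.append(lines[start:start + size])
        start += size
    return chunks

# 10 utterances split into 4 jobs
print([len(c) for c in split_scp([f"utt{i} path/to/{i}.wav" for i in range(10)], 4)])
```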
+### Inference with multi GPU
+Similar to inference on CPU; the differences are as follows:
+
+Step 1. Set `ngpu=1` while defining the pipeline in `infer.py`.
+
+Step 3. Specify the GPU device with `CUDA_VISIBLE_DEVICES`:
+```shell
+for wav_scp in `ls splits/wav.scp.*`; do
+    CUDA_VISIBLE_DEVICES=1 python infer.py ${wav_scp} outputs/$(basename ${wav_scp})
+done
+```
diff --git a/egs_modelscope/speaker_verification/TEMPLATE/infer.py b/egs_modelscope/speaker_verification/TEMPLATE/infer.py
new file mode 100644
index 000000000..efab09729
--- /dev/null
+++ b/egs_modelscope/speaker_verification/TEMPLATE/infer.py
@@ -0,0 +1,15 @@
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+import sys
+
+# Define the extraction pipeline; argv[2] is the output directory
+inference_sv_pipeline = pipeline(
+    task=Tasks.speaker_verification,
+    model='damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch',
+    output_dir=sys.argv[2],
+)
+# Extract the speaker embedding; argv[1] is a wav path or wav.scp
+rec_result = inference_sv_pipeline(audio_in=sys.argv[1])
diff --git a/egs_modelscope/tp/TEMPLATE/README.md b/egs_modelscope/tp/TEMPLATE/README.md
index 2678a7fc0..62c35d80a 100644
--- a/egs_modelscope/tp/TEMPLATE/README.md
+++ b/egs_modelscope/tp/TEMPLATE/README.md
@@ -8,12 +8,12 @@
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
-inference_pipline = pipeline(
+inference_pipeline = pipeline(
task=Tasks.speech_timestamp,
model='damo/speech_timestamp_prediction-v1-16k-offline',
output_dir=None)
-rec_result = inference_pipline(
+rec_result = inference_pipeline(
audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_timestamps.wav',
text_in='一 个 东 太 平 洋 国 家 为 什 么 跑 到 西 太 平 洋 来 了 呢',)
print(rec_result)
@@ -23,15 +23,15 @@ Timestamp pipeline can also be used after ASR pipeline to compose complete ASR f
-#### API-reference
-##### Define pipeline
+### API-reference
+#### Define pipeline
- `task`: `Tasks.speech_timestamp`
- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path in local disk
- `ngpu`: `1` (Default), decoding on GPU. If ngpu=0, decoding on CPU
- `ncpu`: `1` (Default), sets the number of threads used for intraop parallelism on CPU
- `output_dir`: `None` (Default), the output path of results if set
- `batch_size`: `1` (Default), batch size when decoding
-##### Infer pipeline
+#### Infer pipeline
- `audio_in`: the input speech to predict, which could be:
- wav_path, `e.g.`: asr_example.wav (wav in local or url),
- wav.scp, kaldi style wav list (`wav_id wav_path`), `e.g.`:
@@ -59,37 +59,37 @@ Timestamp pipeline can also be used after ASR pipeline to compose complete ASR f
```
### Inference with multi-thread CPUs or multi GPUs
-FunASR also offer recipes [egs_modelscope/vad/TEMPLATE/infer.sh](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/vad/TEMPLATE/infer.sh) to decode with multi-thread CPUs, or multi GPUs.
+FunASR also offer recipes [egs_modelscope/tp/TEMPLATE/infer.sh](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/tp/TEMPLATE/infer.sh) to decode with multi-thread CPUs, or multi GPUs.
-- Setting parameters in `infer.sh`
- - `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path in local disk
- - `data_dir`: the dataset dir **must** include `wav.scp` and `text.scp`
- - `output_dir`: output dir of the recognition results
- - `batch_size`: `64` (Default), batch size of inference on gpu
- - `gpu_inference`: `true` (Default), whether to perform gpu decoding, set false for CPU inference
- - `gpuid_list`: `0,1` (Default), which gpu_ids are used to infer
- - `njob`: only used for CPU inference (`gpu_inference`=`false`), `64` (Default), the number of jobs for CPU decoding
- - `checkpoint_dir`: only used for infer finetuned models, the path dir of finetuned models
- - `checkpoint_name`: only used for infer finetuned models, `valid.cer_ctc.ave.pb` (Default), which checkpoint is used to infer
+#### Settings of `infer.sh`
+- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path in local disk
+- `data_dir`: the dataset dir **must** include `wav.scp` and `text.txt`
+- `output_dir`: output dir of the recognition results
+- `batch_size`: `64` (Default), batch size of inference on gpu
+- `gpu_inference`: `true` (Default), whether to perform gpu decoding, set false for CPU inference
+- `gpuid_list`: `0,1` (Default), which gpu_ids are used to infer
+- `njob`: only used for CPU inference (`gpu_inference`=`false`), `64` (Default), the number of jobs for CPU decoding
+- `checkpoint_dir`: only used for infer finetuned models, the path dir of finetuned models
+- `checkpoint_name`: only used for infer finetuned models, `valid.cer_ctc.ave.pb` (Default), which checkpoint is used to infer
-- Decode with multi GPUs:
+#### Decode with multi GPUs:
```shell
bash infer.sh \
--model "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" \
--data_dir "./data/test" \
--output_dir "./results" \
- --batch_size 64 \
+ --batch_size 1 \
--gpu_inference true \
--gpuid_list "0,1"
```
-- Decode with multi-thread CPUs:
+#### Decode with multi-thread CPUs:
```shell
bash infer.sh \
--model "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" \
--data_dir "./data/test" \
--output_dir "./results" \
--gpu_inference false \
- --njob 64
+ --njob 1
```
## Finetune with pipeline
diff --git a/egs_modelscope/tp/TEMPLATE/infer.py b/egs_modelscope/tp/TEMPLATE/infer.py
deleted file mode 120000
index df5dff25e..000000000
--- a/egs_modelscope/tp/TEMPLATE/infer.py
+++ /dev/null
@@ -1 +0,0 @@
-../speech_timestamp_prediction-v1-16k-offline/infer.py
\ No newline at end of file
diff --git a/egs_modelscope/tp/TEMPLATE/infer.py b/egs_modelscope/tp/TEMPLATE/infer.py
new file mode 100644
index 000000000..6a7e496f7
--- /dev/null
+++ b/egs_modelscope/tp/TEMPLATE/infer.py
@@ -0,0 +1,28 @@
+import os
+import argparse
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+def modelscope_infer(args):
+ os.environ['CUDA_VISIBLE_DEVICES'] = str(args.gpuid)
+ inference_pipeline = pipeline(
+ task=Tasks.speech_timestamp,
+ model=args.model,
+ output_dir=args.output_dir,
+ batch_size=args.batch_size,
+ )
+ if args.output_dir is not None:
+ inference_pipeline(audio_in=args.audio_in, text_in=args.text_in)
+ else:
+ print(inference_pipeline(audio_in=args.audio_in, text_in=args.text_in))
+
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser()
+ parser.add_argument('--model', type=str, default="damo/speech_timestamp_prediction-v1-16k-offline")
+ parser.add_argument('--audio_in', type=str, default="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_timestamps.wav")
+ parser.add_argument('--text_in', type=str, default="一 个 东 太 平 洋 国 家 为 什 么 跑 到 西 太 平 洋 来 了 呢")
+ parser.add_argument('--output_dir', type=str, default="./results/")
+ parser.add_argument('--batch_size', type=int, default=1)
+ parser.add_argument('--gpuid', type=str, default="0")
+ args = parser.parse_args()
+ modelscope_infer(args)
diff --git a/egs_modelscope/tp/TEMPLATE/infer.sh b/egs_modelscope/tp/TEMPLATE/infer.sh
index 2a923bb40..bae62e8b8 100644
--- a/egs_modelscope/tp/TEMPLATE/infer.sh
+++ b/egs_modelscope/tp/TEMPLATE/infer.sh
@@ -37,7 +37,7 @@ for JOB in $(seq ${nj}); do
split_texts="$split_texts $output_dir/split/text.$JOB.scp"
done
perl utils/split_scp.pl ${data_dir}/wav.scp ${split_scps}
-perl utils/split_scp.pl ${data_dir}/text.scp ${split_texts}
+perl utils/split_scp.pl ${data_dir}/text.txt ${split_texts}
if [ -n "${checkpoint_dir}" ]; then
python utils/prepare_checkpoint.py ${model} ${checkpoint_dir} ${checkpoint_name}
diff --git a/egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline/README.md b/egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline/README.md
deleted file mode 100644
index 5488aaa3c..000000000
--- a/egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline/README.md
+++ /dev/null
@@ -1,25 +0,0 @@
-# ModelScope Model
-
-## How to finetune and infer using a pretrained ModelScope Model
-
-### Inference
-
-Or you can use the finetuned model for inference directly.
-
-- Setting parameters in `infer.py`
- - audio_in: # support wav, url, bytes, and parsed audio format.
- - text_in: # support text, text url.
- - output_dir: # If the input format is wav.scp, it needs to be set.
-
-- Then you can run the pipeline to infer with:
-```python
- python infer.py
-```
-
-
-Modify inference related parameters in vad.yaml.
-
-- max_end_silence_time: The end-point silence duration to judge the end of sentence, the parameter range is 500ms~6000ms, and the default value is 800ms
-- speech_noise_thres: The balance of speech and silence scores, the parameter range is (-1,1)
- - The value tends to -1, the greater probability of noise being judged as speech
- - The value tends to 1, the greater probability of speech being judged as noise
diff --git a/egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline/README.md b/egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline/README.md
new file mode 120000
index 000000000..bb55ab52e
--- /dev/null
+++ b/egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline/README.md
@@ -0,0 +1 @@
+../../TEMPLATE/README.md
\ No newline at end of file
diff --git a/egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline/demo.py b/egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline/demo.py
new file mode 100644
index 000000000..bcc512837
--- /dev/null
+++ b/egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline/demo.py
@@ -0,0 +1,12 @@
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+inference_pipeline = pipeline(
+ task=Tasks.speech_timestamp,
+ model='damo/speech_timestamp_prediction-v1-16k-offline',
+ output_dir=None)
+
+rec_result = inference_pipeline(
+ audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_timestamps.wav',
+ text_in='一 个 东 太 平 洋 国 家 为 什 么 跑 到 西 太 平 洋 来 了 呢',)
+print(rec_result)
\ No newline at end of file
diff --git a/egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline/infer.py b/egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline/infer.py
deleted file mode 100644
index 6a7e496f7..000000000
--- a/egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline/infer.py
+++ /dev/null
@@ -1,28 +0,0 @@
-import os
-import argparse
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-
-def modelscope_infer(args):
- os.environ['CUDA_VISIBLE_DEVICES'] = str(args.gpuid)
- inference_pipeline = pipeline(
- task=Tasks.speech_timestamp,
- model=args.model,
- output_dir=args.output_dir,
- batch_size=args.batch_size,
- )
- if args.output_dir is not None:
- inference_pipeline(audio_in=args.audio_in, text_in=args.text_in)
- else:
- print(inference_pipeline(audio_in=args.audio_in, text_in=args.text_in))
-
-if __name__ == "__main__":
- parser = argparse.ArgumentParser()
- parser.add_argument('--model', type=str, default="damo/speech_timestamp_prediction-v1-16k-offline")
- parser.add_argument('--audio_in', type=str, default="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_timestamps.wav")
- parser.add_argument('--text_in', type=str, default="一 个 东 太 平 洋 国 家 为 什 么 跑 到 西 太 平 洋 来 了 呢")
- parser.add_argument('--output_dir', type=str, default="./results/")
- parser.add_argument('--batch_size', type=int, default=1)
- parser.add_argument('--gpuid', type=str, default="0")
- args = parser.parse_args()
- modelscope_infer(args)
diff --git a/egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline/infer.py b/egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline/infer.py
new file mode 120000
index 000000000..128fc31c2
--- /dev/null
+++ b/egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline/infer.py
@@ -0,0 +1 @@
+../../TEMPLATE/infer.py
\ No newline at end of file
diff --git a/egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline/infer.sh b/egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline/infer.sh
new file mode 120000
index 000000000..5e59f1841
--- /dev/null
+++ b/egs_modelscope/tp/speech_timestamp_prediction-v1-16k-offline/infer.sh
@@ -0,0 +1 @@
+../../TEMPLATE/infer.sh
\ No newline at end of file
diff --git a/egs_modelscope/vad/TEMPLATE/README.md b/egs_modelscope/vad/TEMPLATE/README.md
index 6f746d5d3..503b9bf8e 100644
--- a/egs_modelscope/vad/TEMPLATE/README.md
+++ b/egs_modelscope/vad/TEMPLATE/README.md
@@ -43,15 +43,15 @@ Full code of demo, please ref to [demo](https://github.com/alibaba-damo-academy/
-#### API-reference
-##### Define pipeline
+### API-reference
+#### Define pipeline
- `task`: `Tasks.voice_activity_detection`
- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path in local disk
- `ngpu`: `1` (Default), decoding on GPU. If ngpu=0, decoding on CPU
- `ncpu`: `1` (Default), sets the number of threads used for intraop parallelism on CPU
- `output_dir`: `None` (Default), the output path of results if set
- `batch_size`: `1` (Default), batch size when decoding
-##### Infer pipeline
+#### Infer pipeline
- `audio_in`: the input to decode, which could be:
- wav_path, `e.g.`: asr_example.wav,
- pcm_path, `e.g.`: asr_example.pcm,
@@ -69,35 +69,35 @@ Full code of demo, please ref to [demo](https://github.com/alibaba-damo-academy/
### Inference with multi-thread CPUs or multi GPUs
FunASR also offer recipes [egs_modelscope/vad/TEMPLATE/infer.sh](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/vad/TEMPLATE/infer.sh) to decode with multi-thread CPUs, or multi GPUs.
-- Setting parameters in `infer.sh`
- - `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path in local disk
- - `data_dir`: the dataset dir needs to include `wav.scp`
- - `output_dir`: output dir of the recognition results
- - `batch_size`: `64` (Default), batch size of inference on gpu
- - `gpu_inference`: `true` (Default), whether to perform gpu decoding, set false for CPU inference
- - `gpuid_list`: `0,1` (Default), which gpu_ids are used to infer
- - `njob`: only used for CPU inference (`gpu_inference`=`false`), `64` (Default), the number of jobs for CPU decoding
- - `checkpoint_dir`: only used for infer finetuned models, the path dir of finetuned models
- - `checkpoint_name`: only used for infer finetuned models, `valid.cer_ctc.ave.pb` (Default), which checkpoint is used to infer
+#### Settings of `infer.sh`
+- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path in local disk
+- `data_dir`: the dataset dir needs to include `wav.scp`
+- `output_dir`: output dir of the recognition results
+- `batch_size`: `64` (Default), batch size of inference on gpu
+- `gpu_inference`: `true` (Default), whether to perform gpu decoding, set false for CPU inference
+- `gpuid_list`: `0,1` (Default), which gpu_ids are used to infer
+- `njob`: only used for CPU inference (`gpu_inference`=`false`), `64` (Default), the number of jobs for CPU decoding
+- `checkpoint_dir`: only used for infer finetuned models, the path dir of finetuned models
+- `checkpoint_name`: only used for infer finetuned models, `valid.cer_ctc.ave.pb` (Default), which checkpoint is used to infer
-- Decode with multi GPUs:
+#### Decode with multi GPUs:
```shell
bash infer.sh \
--model "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" \
--data_dir "./data/test" \
--output_dir "./results" \
- --batch_size 64 \
+ --batch_size 1 \
--gpu_inference true \
--gpuid_list "0,1"
```
-- Decode with multi-thread CPUs:
+#### Decode with multi-thread CPUs:
```shell
bash infer.sh \
--model "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" \
--data_dir "./data/test" \
--output_dir "./results" \
--gpu_inference false \
- --njob 64
+ --njob 1
```
## Finetune with pipeline
diff --git a/egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/README.md b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/README.md
deleted file mode 100644
index 6d9cd3024..000000000
--- a/egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/README.md
+++ /dev/null
@@ -1,24 +0,0 @@
-# ModelScope Model
-
-## How to finetune and infer using a pretrained ModelScope Model
-
-### Inference
-
-Or you can use the finetuned model for inference directly.
-
-- Setting parameters in `infer.py`
- - audio_in: # support wav, url, bytes, and parsed audio format.
- - output_dir: # If the input format is wav.scp, it needs to be set.
-
-- Then you can run the pipeline to infer with:
-```python
- python infer.py
-```
-
-
-Modify inference related parameters in vad.yaml.
-
-- max_end_silence_time: The end-point silence duration to judge the end of sentence, the parameter range is 500ms~6000ms, and the default value is 800ms
-- speech_noise_thres: The balance of speech and silence scores, the parameter range is (-1,1)
- - The value tends to -1, the greater probability of noise being judged as speech
- - The value tends to 1, the greater probability of speech being judged as noise
diff --git a/egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/README.md b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/README.md
new file mode 120000
index 000000000..bb55ab52e
--- /dev/null
+++ b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/README.md
@@ -0,0 +1 @@
+../../TEMPLATE/README.md
\ No newline at end of file
diff --git a/egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/demo.py b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/demo.py
new file mode 100644
index 000000000..bbc16c5b6
--- /dev/null
+++ b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/demo.py
@@ -0,0 +1,15 @@
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+if __name__ == '__main__':
+ audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/vad_example.wav'
+ output_dir = None
+ inference_pipeline = pipeline(
+ task=Tasks.voice_activity_detection,
+ model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch",
+ model_revision='v1.2.0',
+ output_dir=output_dir,
+ batch_size=1,
+ )
+ segments_result = inference_pipeline(audio_in=audio_in)
+ print(segments_result)
diff --git a/egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/infer_online.py b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/demo_online.py
similarity index 89%
rename from egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/infer_online.py
rename to egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/demo_online.py
index 02e919d2e..65693b5f1 100644
--- a/egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/infer_online.py
+++ b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/demo_online.py
@@ -8,7 +8,7 @@ import soundfile
if __name__ == '__main__':
output_dir = None
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.voice_activity_detection,
model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch",
model_revision='v1.2.0',
@@ -30,7 +30,7 @@ if __name__ == '__main__':
else:
is_final = False
param_dict['is_final'] = is_final
- segments_result = inference_pipline(audio_in=speech[sample_offset: sample_offset + step],
+ segments_result = inference_pipeline(audio_in=speech[sample_offset: sample_offset + step],
param_dict=param_dict)
print(segments_result)
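The renamed `demo_online.py` above drives the VAD pipeline chunk by chunk, carrying state in `param_dict` and flagging the last slice as final. A minimal, runnable sketch of that chunking loop with the modelscope pipeline replaced by a stub (the 1600-sample `chunk_stride` and the `param_dict` key names are illustrative assumptions, not the library's exact API):

```python
def stream_chunks(speech, chunk_stride=1600):
    """Yield (chunk, is_final) pairs the way demo_online.py slices audio."""
    total = len(speech)
    for offset in range(0, total, chunk_stride):
        step = min(chunk_stride, total - offset)
        yield speech[offset: offset + step], offset + step >= total

def fake_pipeline(chunk, param_dict):
    # Stand-in for inference_pipeline(audio_in=..., param_dict=...):
    # only the final call "emits" a segment in this toy version.
    return {"segments": [(0, len(chunk))] if param_dict["is_final"] else []}

speech = [0.0] * 16000          # 1 s of fake 16 kHz samples
param_dict = {"cache": {}}      # state carried across calls (assumed key name)
chunks, finals = 0, 0
for chunk, is_final in stream_chunks(speech):
    param_dict["is_final"] = is_final
    result = fake_pipeline(chunk, param_dict)
    chunks += 1
    finals += int(is_final)
print(chunks, finals)           # 10 chunks, exactly one final
```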
diff --git a/egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/infer.py b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/infer.py
deleted file mode 100644
index 2bf3251e3..000000000
--- a/egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/infer.py
+++ /dev/null
@@ -1,15 +0,0 @@
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-
-if __name__ == '__main__':
- audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/vad_example.wav'
- output_dir = None
- inference_pipline = pipeline(
- task=Tasks.voice_activity_detection,
- model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch",
- model_revision='v1.2.0',
- output_dir=output_dir,
- batch_size=1,
- )
- segments_result = inference_pipline(audio_in=audio_in)
- print(segments_result)
diff --git a/egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/infer.py b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/infer.py
new file mode 120000
index 000000000..128fc31c2
--- /dev/null
+++ b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/infer.py
@@ -0,0 +1 @@
+../../TEMPLATE/infer.py
\ No newline at end of file
diff --git a/egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/infer.sh b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/infer.sh
new file mode 120000
index 000000000..5e59f1841
--- /dev/null
+++ b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-16k-common/infer.sh
@@ -0,0 +1 @@
+../../TEMPLATE/infer.sh
\ No newline at end of file
diff --git a/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/README.md b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/README.md
deleted file mode 100644
index 6d9cd3024..000000000
--- a/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/README.md
+++ /dev/null
@@ -1,24 +0,0 @@
-# ModelScope Model
-
-## How to finetune and infer using a pretrained ModelScope Model
-
-### Inference
-
-Or you can use the finetuned model for inference directly.
-
-- Setting parameters in `infer.py`
- - audio_in: # support wav, url, bytes, and parsed audio format.
- - output_dir: # If the input format is wav.scp, it needs to be set.
-
-- Then you can run the pipeline to infer with:
-```python
- python infer.py
-```
-
-
-Modify inference related parameters in vad.yaml.
-
-- max_end_silence_time: The end-point silence duration to judge the end of sentence, the parameter range is 500ms~6000ms, and the default value is 800ms
-- speech_noise_thres: The balance of speech and silence scores, the parameter range is (-1,1)
- - The value tends to -1, the greater probability of noise being judged as speech
- - The value tends to 1, the greater probability of speech being judged as noise
diff --git a/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/README.md b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/README.md
new file mode 120000
index 000000000..bb55ab52e
--- /dev/null
+++ b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/README.md
@@ -0,0 +1 @@
+../../TEMPLATE/README.md
\ No newline at end of file
diff --git a/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/demo.py b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/demo.py
new file mode 100644
index 000000000..84863d082
--- /dev/null
+++ b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/demo.py
@@ -0,0 +1,15 @@
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+if __name__ == '__main__':
+ audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/vad_example_8k.wav'
+ output_dir = None
+ inference_pipeline = pipeline(
+ task=Tasks.voice_activity_detection,
+ model="damo/speech_fsmn_vad_zh-cn-8k-common",
+ model_revision='v1.2.0',
+ output_dir=output_dir,
+ batch_size=1,
+ )
+ segments_result = inference_pipeline(audio_in=audio_in)
+ print(segments_result)
diff --git a/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/infer_online.py b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/demo_online.py
similarity index 89%
rename from egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/infer_online.py
rename to egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/demo_online.py
index a8cc912d6..5b67da74a 100644
--- a/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/infer_online.py
+++ b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/demo_online.py
@@ -8,7 +8,7 @@ import soundfile
if __name__ == '__main__':
output_dir = None
- inference_pipline = pipeline(
+ inference_pipeline = pipeline(
task=Tasks.voice_activity_detection,
model="damo/speech_fsmn_vad_zh-cn-8k-common",
model_revision='v1.2.0',
@@ -30,7 +30,7 @@ if __name__ == '__main__':
else:
is_final = False
param_dict['is_final'] = is_final
- segments_result = inference_pipline(audio_in=speech[sample_offset: sample_offset + step],
+ segments_result = inference_pipeline(audio_in=speech[sample_offset: sample_offset + step],
param_dict=param_dict)
print(segments_result)
diff --git a/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/infer.py b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/infer.py
deleted file mode 100644
index 2e5027500..000000000
--- a/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/infer.py
+++ /dev/null
@@ -1,15 +0,0 @@
-from modelscope.pipelines import pipeline
-from modelscope.utils.constant import Tasks
-
-if __name__ == '__main__':
- audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/vad_example_8k.wav'
- output_dir = None
- inference_pipline = pipeline(
- task=Tasks.voice_activity_detection,
- model="damo/speech_fsmn_vad_zh-cn-8k-common",
- model_revision='v1.2.0',
- output_dir=output_dir,
- batch_size=1,
- )
- segments_result = inference_pipline(audio_in=audio_in)
- print(segments_result)
diff --git a/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/infer.py b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/infer.py
new file mode 120000
index 000000000..128fc31c2
--- /dev/null
+++ b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/infer.py
@@ -0,0 +1 @@
+../../TEMPLATE/infer.py
\ No newline at end of file
diff --git a/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/infer.sh b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/infer.sh
new file mode 120000
index 000000000..5e59f1841
--- /dev/null
+++ b/egs_modelscope/vad/speech_fsmn_vad_zh-cn-8k-common/infer.sh
@@ -0,0 +1 @@
+../../TEMPLATE/infer.sh
\ No newline at end of file
diff --git a/fun_text_processing/inverse_text_normalization/id/taggers/cardinal.py b/fun_text_processing/inverse_text_normalization/id/taggers/cardinal.py
index 6b2fce5ad..539acbc8c 100644
--- a/fun_text_processing/inverse_text_normalization/id/taggers/cardinal.py
+++ b/fun_text_processing/inverse_text_normalization/id/taggers/cardinal.py
@@ -27,7 +27,7 @@ class CardinalFst(GraphFst):
graph_hundreds = pynini.string_file(get_abs_path("data/numbers/hundreds.tsv"))
graph_thousand = pynini.string_file(get_abs_path("data/numbers/thousand.tsv"))
- graph_cents = pynini.cross("seratus", "100") | pynini.cross("ratus", "100") | pynini.union(graph_hundreds, pynutil.insert("00"))
+ graph_cents = pynini.cross("seratus", "100") | pynini.cross("ratus", "100") | pynini.union(graph_hundreds, pynutil.insert("0"))
graph_hundred = pynini.cross("ratus", "") | pynini.cross("seratus", "")
graph_hundred_component = pynini.union(graph_digit + delete_space + graph_hundred, pynutil.insert("00"))
diff --git a/funasr/bin/asr_inference_paraformer_streaming.py b/funasr/bin/asr_inference_paraformer_streaming.py
index 821f69429..bf5590c9b 100644
--- a/funasr/bin/asr_inference_paraformer_streaming.py
+++ b/funasr/bin/asr_inference_paraformer_streaming.py
@@ -8,6 +8,7 @@ import os
import codecs
import tempfile
import requests
+import yaml
from pathlib import Path
from typing import Optional
from typing import Sequence
@@ -40,11 +41,12 @@ from funasr.utils.types import str2bool
from funasr.utils.types import str2triple_str
from funasr.utils.types import str_or_none
from funasr.utils import asr_utils, wav_utils, postprocess_utils
-from funasr.models.frontend.wav_frontend import WavFrontend
-from funasr.models.e2e_asr_paraformer import BiCifParaformer, ContextualParaformer
+from funasr.models.frontend.wav_frontend import WavFrontend, WavFrontendOnline
from funasr.export.models.e2e_asr_paraformer import Paraformer as Paraformer_export
+
np.set_printoptions(threshold=np.inf)
+
class Speech2Text:
"""Speech2Text class
@@ -89,7 +91,7 @@ class Speech2Text:
)
frontend = None
if asr_train_args.frontend is not None and asr_train_args.frontend_conf is not None:
- frontend = WavFrontend(cmvn_file=cmvn_file, **asr_train_args.frontend_conf)
+ frontend = WavFrontendOnline(cmvn_file=cmvn_file, **asr_train_args.frontend_conf)
logging.info("asr_model: {}".format(asr_model))
logging.info("asr_train_args: {}".format(asr_train_args))
@@ -189,8 +191,7 @@ class Speech2Text:
@torch.no_grad()
def __call__(
- self, cache: dict, speech: Union[torch.Tensor, np.ndarray], speech_lengths: Union[torch.Tensor, np.ndarray] = None,
- begin_time: int = 0, end_time: int = None,
+ self, cache: dict, speech: Union[torch.Tensor], speech_lengths: Union[torch.Tensor] = None
):
"""Inference
@@ -201,38 +202,62 @@ class Speech2Text:
"""
assert check_argument_types()
-
- # Input as audio signal
- if isinstance(speech, np.ndarray):
- speech = torch.tensor(speech)
- if self.frontend is not None:
- feats, feats_len = self.frontend.forward(speech, speech_lengths)
- feats = to_device(feats, device=self.device)
- feats_len = feats_len.int()
+ results = []
+ cache_en = cache["encoder"]
+ if speech.shape[1] < 16 * 60 and cache_en["is_final"]:
+ if cache_en["start_idx"] == 0:
+ return []
+ cache_en["tail_chunk"] = True
+ feats = cache_en["feats"]
+ feats_len = torch.tensor([feats.shape[1]])
self.asr_model.frontend = None
+ results = self.infer(feats, feats_len, cache)
+ return results
else:
- feats = speech
- feats_len = speech_lengths
- lfr_factor = max(1, (feats.size()[-1] // 80) - 1)
- feats_len = cache["encoder"]["stride"] + cache["encoder"]["pad_left"] + cache["encoder"]["pad_right"]
- feats = feats[:,cache["encoder"]["start_idx"]:cache["encoder"]["start_idx"]+feats_len,:]
- feats_len = torch.tensor([feats_len])
- batch = {"speech": feats, "speech_lengths": feats_len, "cache": cache}
+ if self.frontend is not None:
+ feats, feats_len = self.frontend.forward(speech, speech_lengths, cache_en["is_final"])
+ feats = to_device(feats, device=self.device)
+ feats_len = feats_len.int()
+ self.asr_model.frontend = None
+ else:
+ feats = speech
+ feats_len = speech_lengths
- # a. To device
+ if feats.shape[1] != 0:
+ if cache_en["is_final"]:
+ if feats.shape[1] + cache_en["chunk_size"][2] < cache_en["chunk_size"][1]:
+ cache_en["last_chunk"] = True
+ else:
+ # first chunk
+ feats_chunk1 = feats[:, :cache_en["chunk_size"][1], :]
+ feats_len = torch.tensor([feats_chunk1.shape[1]])
+ results_chunk1 = self.infer(feats_chunk1, feats_len, cache)
+
+ # last chunk
+ cache_en["last_chunk"] = True
+ feats_chunk2 = feats[:, -(feats.shape[1] + cache_en["chunk_size"][2] - cache_en["chunk_size"][1]):, :]
+ feats_len = torch.tensor([feats_chunk2.shape[1]])
+ results_chunk2 = self.infer(feats_chunk2, feats_len, cache)
+
+ return ["".join(results_chunk1 + results_chunk2)]
+
+ results = self.infer(feats, feats_len, cache)
+
+ return results
+
+ @torch.no_grad()
+ def infer(self, feats: Union[torch.Tensor], feats_len: Union[torch.Tensor], cache: List = None):
+ batch = {"speech": feats, "speech_lengths": feats_len}
batch = to_device(batch, device=self.device)
-
# b. Forward Encoder
- enc, enc_len = self.asr_model.encode_chunk(feats, feats_len, cache)
+ enc, enc_len = self.asr_model.encode_chunk(feats, feats_len, cache=cache)
if isinstance(enc, tuple):
enc = enc[0]
# assert len(enc) == 1, len(enc)
enc_len_batch_total = torch.sum(enc_len).item() * self.encoder_downsampling_factor
predictor_outs = self.asr_model.calc_predictor_chunk(enc, cache)
- pre_acoustic_embeds, pre_token_length, alphas, pre_peak_index = predictor_outs[0], predictor_outs[1], \
- predictor_outs[2], predictor_outs[3]
- pre_token_length = pre_token_length.floor().long()
+        pre_acoustic_embeds, pre_token_length = predictor_outs[0], predictor_outs[1]
if torch.max(pre_token_length) < 1:
return []
decoder_outs = self.asr_model.cal_decoder_with_predictor_chunk(enc, pre_acoustic_embeds, cache)
@@ -279,166 +304,12 @@ class Speech2Text:
text = self.tokenizer.tokens2text(token)
else:
text = None
-
- results.append((text, token, token_int, hyp, enc_len_batch_total, lfr_factor))
+ results.append(text)
# assert check_return_type(results)
return results
-class Speech2TextExport:
- """Speech2TextExport class
-
- """
-
- def __init__(
- self,
- asr_train_config: Union[Path, str] = None,
- asr_model_file: Union[Path, str] = None,
- cmvn_file: Union[Path, str] = None,
- lm_train_config: Union[Path, str] = None,
- lm_file: Union[Path, str] = None,
- token_type: str = None,
- bpemodel: str = None,
- device: str = "cpu",
- maxlenratio: float = 0.0,
- minlenratio: float = 0.0,
- dtype: str = "float32",
- beam_size: int = 20,
- ctc_weight: float = 0.5,
- lm_weight: float = 1.0,
- ngram_weight: float = 0.9,
- penalty: float = 0.0,
- nbest: int = 1,
- frontend_conf: dict = None,
- hotword_list_or_file: str = None,
- **kwargs,
- ):
-
- # 1. Build ASR model
- asr_model, asr_train_args = ASRTask.build_model_from_file(
- asr_train_config, asr_model_file, cmvn_file, device
- )
- frontend = None
- if asr_train_args.frontend is not None and asr_train_args.frontend_conf is not None:
- frontend = WavFrontend(cmvn_file=cmvn_file, **asr_train_args.frontend_conf)
-
- logging.info("asr_model: {}".format(asr_model))
- logging.info("asr_train_args: {}".format(asr_train_args))
- asr_model.to(dtype=getattr(torch, dtype)).eval()
-
- token_list = asr_model.token_list
-
- logging.info(f"Decoding device={device}, dtype={dtype}")
-
- # 5. [Optional] Build Text converter: e.g. bpe-sym -> Text
- if token_type is None:
- token_type = asr_train_args.token_type
- if bpemodel is None:
- bpemodel = asr_train_args.bpemodel
-
- if token_type is None:
- tokenizer = None
- elif token_type == "bpe":
- if bpemodel is not None:
- tokenizer = build_tokenizer(token_type=token_type, bpemodel=bpemodel)
- else:
- tokenizer = None
- else:
- tokenizer = build_tokenizer(token_type=token_type)
- converter = TokenIDConverter(token_list=token_list)
- logging.info(f"Text tokenizer: {tokenizer}")
-
- # self.asr_model = asr_model
- self.asr_train_args = asr_train_args
- self.converter = converter
- self.tokenizer = tokenizer
-
- self.device = device
- self.dtype = dtype
- self.nbest = nbest
- self.frontend = frontend
-
- model = Paraformer_export(asr_model, onnx=False)
- self.asr_model = model
-
- @torch.no_grad()
- def __call__(
- self, speech: Union[torch.Tensor, np.ndarray], speech_lengths: Union[torch.Tensor, np.ndarray] = None
- ):
- """Inference
-
- Args:
- speech: Input speech data
- Returns:
- text, token, token_int, hyp
-
- """
- assert check_argument_types()
-
- # Input as audio signal
- if isinstance(speech, np.ndarray):
- speech = torch.tensor(speech)
-
- if self.frontend is not None:
- feats, feats_len = self.frontend.forward(speech, speech_lengths)
- feats = to_device(feats, device=self.device)
- feats_len = feats_len.int()
- self.asr_model.frontend = None
- else:
- feats = speech
- feats_len = speech_lengths
-
- enc_len_batch_total = feats_len.sum()
- lfr_factor = max(1, (feats.size()[-1] // 80) - 1)
- batch = {"speech": feats, "speech_lengths": feats_len}
-
- # a. To device
- batch = to_device(batch, device=self.device)
-
- decoder_outs = self.asr_model(**batch)
- decoder_out, ys_pad_lens = decoder_outs[0], decoder_outs[1]
-
- results = []
- b, n, d = decoder_out.size()
- for i in range(b):
- am_scores = decoder_out[i, :ys_pad_lens[i], :]
-
- yseq = am_scores.argmax(dim=-1)
- score = am_scores.max(dim=-1)[0]
- score = torch.sum(score, dim=-1)
- # pad with mask tokens to ensure compatibility with sos/eos tokens
- yseq = torch.tensor(
- yseq.tolist(), device=yseq.device
- )
- nbest_hyps = [Hypothesis(yseq=yseq, score=score)]
-
- for hyp in nbest_hyps:
- assert isinstance(hyp, (Hypothesis)), type(hyp)
-
- # remove sos/eos and get results
- last_pos = -1
- if isinstance(hyp.yseq, list):
- token_int = hyp.yseq[1:last_pos]
- else:
- token_int = hyp.yseq[1:last_pos].tolist()
-
- # remove blank symbol id, which is assumed to be 0
- token_int = list(filter(lambda x: x != 0 and x != 2, token_int))
-
- # Change integer-ids to tokens
- token = self.converter.ids2tokens(token_int)
-
- if self.tokenizer is not None:
- text = self.tokenizer.tokens2text(token)
- else:
- text = None
-
- results.append((text, token, token_int, hyp, enc_len_batch_total, lfr_factor))
-
- return results
-
-
def inference(
maxlenratio: float,
minlenratio: float,
@@ -536,8 +407,6 @@ def inference_modelscope(
**kwargs,
):
assert check_argument_types()
- ncpu = kwargs.get("ncpu", 1)
- torch.set_num_threads(ncpu)
if word_lm_train_config is not None:
raise NotImplementedError("Word LM is not implemented")
@@ -580,11 +449,9 @@ def inference_modelscope(
penalty=penalty,
nbest=nbest,
)
- if export_mode:
- speech2text = Speech2TextExport(**speech2text_kwargs)
- else:
- speech2text = Speech2Text(**speech2text_kwargs)
-
+
+ speech2text = Speech2Text(**speech2text_kwargs)
+
def _load_bytes(input):
middle_data = np.frombuffer(input, dtype=np.int16)
middle_data = np.asarray(middle_data)
@@ -599,7 +466,46 @@ def inference_modelscope(
offset = i.min + abs_max
array = np.frombuffer((middle_data.astype(dtype) - offset) / abs_max, dtype=np.float32)
return array
-
+
+ def _read_yaml(yaml_path: Union[str, Path]) -> Dict:
+ if not Path(yaml_path).exists():
+            raise FileNotFoundError(f'The {yaml_path} does not exist.')
+
+ with open(str(yaml_path), 'rb') as f:
+ data = yaml.load(f, Loader=yaml.Loader)
+ return data
+
+ def _prepare_cache(cache: dict = {}, chunk_size=[5,10,5], batch_size=1):
+ if len(cache) > 0:
+ return cache
+ config = _read_yaml(asr_train_config)
+ enc_output_size = config["encoder_conf"]["output_size"]
+ feats_dims = config["frontend_conf"]["n_mels"] * config["frontend_conf"]["lfr_m"]
+ cache_en = {"start_idx": 0, "cif_hidden": torch.zeros((batch_size, 1, enc_output_size)),
+ "cif_alphas": torch.zeros((batch_size, 1)), "chunk_size": chunk_size, "last_chunk": False,
+ "feats": torch.zeros((batch_size, chunk_size[0] + chunk_size[2], feats_dims)), "tail_chunk": False}
+ cache["encoder"] = cache_en
+
+ cache_de = {"decode_fsmn": None}
+ cache["decoder"] = cache_de
+
+ return cache
+
+ def _cache_reset(cache: dict = {}, chunk_size=[5,10,5], batch_size=1):
+ if len(cache) > 0:
+ config = _read_yaml(asr_train_config)
+ enc_output_size = config["encoder_conf"]["output_size"]
+ feats_dims = config["frontend_conf"]["n_mels"] * config["frontend_conf"]["lfr_m"]
+ cache_en = {"start_idx": 0, "cif_hidden": torch.zeros((batch_size, 1, enc_output_size)),
+ "cif_alphas": torch.zeros((batch_size, 1)), "chunk_size": chunk_size, "last_chunk": False,
+ "feats": torch.zeros((batch_size, chunk_size[0] + chunk_size[2], feats_dims)), "tail_chunk": False}
+ cache["encoder"] = cache_en
+
+ cache_de = {"decode_fsmn": None}
+ cache["decoder"] = cache_de
+
+ return cache
+
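`_prepare_cache` and `_cache_reset` above differ only in their guard: prepare is a no-op on a non-empty cache, while reset rebuilds the state at utterance end. A dict-based analogue of that lifecycle (tensors replaced by list placeholders; field names follow the diff, sizes are illustrative):

```python
def prepare_cache(cache, chunk_size=(5, 10, 5)):
    if cache:                       # already initialized: reuse as-is
        return cache
    left, _, right = chunk_size
    cache["encoder"] = {"start_idx": 0, "chunk_size": list(chunk_size),
                        "last_chunk": False, "tail_chunk": False,
                        "feats": [0.0] * (left + right)}
    cache["decoder"] = {"decode_fsmn": None}
    return cache

def cache_reset(cache, chunk_size=(5, 10, 5)):
    if cache:                       # end of utterance: rebuild from scratch
        cache.clear()
        prepare_cache(cache, chunk_size)
    return cache

c = prepare_cache({})
c["encoder"]["start_idx"] = 7       # mutated while streaming
c = cache_reset(c)
print(c["encoder"]["start_idx"])    # back to 0 for the next utterance
```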
def _forward(
data_path_and_name_and_type,
raw_inputs: Union[np.ndarray, torch.Tensor] = None,
@@ -610,123 +516,56 @@ def inference_modelscope(
):
# 3. Build data-iterator
- is_final = False
- cache = {}
- if param_dict is not None and "cache" in param_dict:
- cache = param_dict["cache"]
- if param_dict is not None and "is_final" in param_dict:
- is_final = param_dict["is_final"]
-
if data_path_and_name_and_type is not None and data_path_and_name_and_type[2] == "bytes":
raw_inputs = _load_bytes(data_path_and_name_and_type[0])
raw_inputs = torch.tensor(raw_inputs)
if data_path_and_name_and_type is not None and data_path_and_name_and_type[2] == "sound":
raw_inputs = torchaudio.load(data_path_and_name_and_type[0])[0][0]
- is_final = True
if data_path_and_name_and_type is None and raw_inputs is not None:
if isinstance(raw_inputs, np.ndarray):
raw_inputs = torch.tensor(raw_inputs)
+ is_final = False
+ cache = {}
+ chunk_size = [5, 10, 5]
+ if param_dict is not None and "cache" in param_dict:
+ cache = param_dict["cache"]
+ if param_dict is not None and "is_final" in param_dict:
+ is_final = param_dict["is_final"]
+ if param_dict is not None and "chunk_size" in param_dict:
+ chunk_size = param_dict["chunk_size"]
+
# 7 .Start for-loop
# FIXME(kamo): The output format should be discussed about
+ raw_inputs = torch.unsqueeze(raw_inputs, axis=0)
asr_result_list = []
- results = []
- asr_result = ""
- wait = True
- if len(cache) == 0:
- cache["encoder"] = {"start_idx": 0, "pad_left": 0, "stride": 10, "pad_right": 5, "cif_hidden": None, "cif_alphas": None, "is_final": is_final, "left": 0, "right": 0}
- cache_de = {"decode_fsmn": None}
- cache["decoder"] = cache_de
- cache["first_chunk"] = True
- cache["speech"] = []
- cache["accum_speech"] = 0
-
- if raw_inputs is not None:
- if len(cache["speech"]) == 0:
- cache["speech"] = raw_inputs
- else:
- cache["speech"] = torch.cat([cache["speech"], raw_inputs], dim=0)
- cache["accum_speech"] += len(raw_inputs)
- while cache["accum_speech"] >= 960:
- if cache["first_chunk"]:
- if cache["accum_speech"] >= 14400:
- speech = torch.unsqueeze(cache["speech"], axis=0)
- speech_length = torch.tensor([len(cache["speech"])])
- cache["encoder"]["pad_left"] = 5
- cache["encoder"]["pad_right"] = 5
- cache["encoder"]["stride"] = 10
- cache["encoder"]["left"] = 5
- cache["encoder"]["right"] = 0
- results = speech2text(cache, speech, speech_length)
- cache["accum_speech"] -= 4800
- cache["first_chunk"] = False
- cache["encoder"]["start_idx"] = -5
- cache["encoder"]["is_final"] = False
- wait = False
- else:
- if is_final:
- cache["encoder"]["stride"] = len(cache["speech"]) // 960
- cache["encoder"]["pad_left"] = 0
- cache["encoder"]["pad_right"] = 0
- speech = torch.unsqueeze(cache["speech"], axis=0)
- speech_length = torch.tensor([len(cache["speech"])])
- results = speech2text(cache, speech, speech_length)
- cache["accum_speech"] = 0
- wait = False
- else:
- break
+ cache = _prepare_cache(cache, chunk_size=chunk_size, batch_size=1)
+ item = {}
+ if data_path_and_name_and_type is not None and data_path_and_name_and_type[2] == "sound":
+ sample_offset = 0
+ speech_length = raw_inputs.shape[1]
+ stride_size = chunk_size[1] * 960
+ cache = _prepare_cache(cache, chunk_size=chunk_size, batch_size=1)
+ final_result = ""
+ for sample_offset in range(0, speech_length, min(stride_size, speech_length - sample_offset)):
+ if sample_offset + stride_size >= speech_length - 1:
+ stride_size = speech_length - sample_offset
+ cache["encoder"]["is_final"] = True
else:
- if cache["accum_speech"] >= 19200:
- cache["encoder"]["start_idx"] += 10
- cache["encoder"]["stride"] = 10
- cache["encoder"]["pad_left"] = 5
- cache["encoder"]["pad_right"] = 5
- cache["encoder"]["left"] = 0
- cache["encoder"]["right"] = 0
- speech = torch.unsqueeze(cache["speech"], axis=0)
- speech_length = torch.tensor([len(cache["speech"])])
- results = speech2text(cache, speech, speech_length)
- cache["accum_speech"] -= 9600
- wait = False
- else:
- if is_final:
- cache["encoder"]["is_final"] = True
- if cache["accum_speech"] >= 14400:
- cache["encoder"]["start_idx"] += 10
- cache["encoder"]["stride"] = 10
- cache["encoder"]["pad_left"] = 5
- cache["encoder"]["pad_right"] = 5
- cache["encoder"]["left"] = 0
- cache["encoder"]["right"] = cache["accum_speech"] // 960 - 15
- speech = torch.unsqueeze(cache["speech"], axis=0)
- speech_length = torch.tensor([len(cache["speech"])])
- results = speech2text(cache, speech, speech_length)
- cache["accum_speech"] -= 9600
- wait = False
- else:
- cache["encoder"]["start_idx"] += 10
- cache["encoder"]["stride"] = cache["accum_speech"] // 960 - 5
- cache["encoder"]["pad_left"] = 5
- cache["encoder"]["pad_right"] = 0
- cache["encoder"]["left"] = 0
- cache["encoder"]["right"] = 0
- speech = torch.unsqueeze(cache["speech"], axis=0)
- speech_length = torch.tensor([len(cache["speech"])])
- results = speech2text(cache, speech, speech_length)
- cache["accum_speech"] = 0
- wait = False
- else:
- break
-
- if len(results) >= 1:
- asr_result += results[0][0]
- if asr_result == "":
- asr_result = "sil"
- if wait:
- asr_result = "waiting_for_more_voice"
- item = {'key': "utt", 'value': asr_result}
- asr_result_list.append(item)
+ cache["encoder"]["is_final"] = False
+ input_lens = torch.tensor([stride_size])
+ asr_result = speech2text(cache, raw_inputs[:, sample_offset: sample_offset + stride_size], input_lens)
+ if len(asr_result) != 0:
+ final_result += asr_result[0]
+ item = {'key': "utt", 'value': [final_result]}
else:
- return []
+ input_lens = torch.tensor([raw_inputs.shape[1]])
+ cache["encoder"]["is_final"] = is_final
+ asr_result = speech2text(cache, raw_inputs, input_lens)
+ item = {'key': "utt", 'value': asr_result}
+
+ asr_result_list.append(item)
+ if is_final:
+ cache = _cache_reset(cache, chunk_size=chunk_size, batch_size=1)
return asr_result_list
return _forward
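The `sound` branch of `_forward` above walks the waveform in fixed strides of `chunk_size[1] * 960` samples and flags the trailing remainder as the final chunk. A standalone sketch of just that slicing arithmetic (a rewritten loop, not the exact code from the diff):

```python
def split_strides(speech_length, chunk_size=(5, 10, 5)):
    """Return (offset, step, is_final) triples covering the waveform."""
    stride = chunk_size[1] * 960    # 9600 samples per chunk by default
    offset, out = 0, []
    while offset < speech_length:
        step = min(stride, speech_length - offset)
        is_final = offset + stride >= speech_length - 1
        out.append((offset, step, is_final))
        offset += step
    return out

# 20000 samples -> two full 9600-sample chunks plus an 800-sample final chunk
print(split_strides(20000))
```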
@@ -920,5 +759,3 @@ if __name__ == "__main__":
#
# rec_result = inference_16k_pipline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav')
# print(rec_result)
-
-
diff --git a/funasr/export/models/CT_Transformer.py b/funasr/export/models/CT_Transformer.py
index 932e3afe6..2319c4abe 100644
--- a/funasr/export/models/CT_Transformer.py
+++ b/funasr/export/models/CT_Transformer.py
@@ -53,7 +53,7 @@ class CT_Transformer(nn.Module):
def get_dummy_inputs(self):
length = 120
- text_indexes = torch.randint(0, self.embed.num_embeddings, (2, length))
+ text_indexes = torch.randint(0, self.embed.num_embeddings, (2, length)).type(torch.int32)
text_lengths = torch.tensor([length-20, length], dtype=torch.int32)
return (text_indexes, text_lengths)
@@ -130,7 +130,7 @@ class CT_Transformer_VadRealtime(nn.Module):
def get_dummy_inputs(self):
length = 120
- text_indexes = torch.randint(0, self.embed.num_embeddings, (1, length))
+ text_indexes = torch.randint(0, self.embed.num_embeddings, (1, length)).type(torch.int32)
text_lengths = torch.tensor([length], dtype=torch.int32)
vad_mask = torch.ones(length, length, dtype=torch.float32)[None, None, :, :]
sub_masks = torch.ones(length, length, dtype=torch.float32)
diff --git a/funasr/models/e2e_asr_paraformer.py b/funasr/models/e2e_asr_paraformer.py
index 699d85fdb..d02783f49 100644
--- a/funasr/models/e2e_asr_paraformer.py
+++ b/funasr/models/e2e_asr_paraformer.py
@@ -712,9 +712,9 @@ class ParaformerOnline(Paraformer):
def calc_predictor_chunk(self, encoder_out, cache=None):
- pre_acoustic_embeds, pre_token_length, alphas, pre_peak_index = \
+ pre_acoustic_embeds, pre_token_length = \
self.predictor.forward_chunk(encoder_out, cache["encoder"])
- return pre_acoustic_embeds, pre_token_length, alphas, pre_peak_index
+ return pre_acoustic_embeds, pre_token_length
def cal_decoder_with_predictor_chunk(self, encoder_out, sematic_embeds, cache=None):
decoder_outs = self.decoder.forward_chunk(
diff --git a/funasr/models/encoder/sanm_encoder.py b/funasr/models/encoder/sanm_encoder.py
index f2502bbb6..2a680114e 100644
--- a/funasr/models/encoder/sanm_encoder.py
+++ b/funasr/models/encoder/sanm_encoder.py
@@ -6,9 +6,11 @@ from typing import Union
import logging
import torch
import torch.nn as nn
+import torch.nn.functional as F
from funasr.modules.streaming_utils.chunk_utilis import overlap_chunk
from typeguard import check_argument_types
import numpy as np
+from funasr.torch_utils.device_funcs import to_device
from funasr.modules.nets_utils import make_pad_mask
from funasr.modules.attention import MultiHeadedAttention, MultiHeadedAttentionSANM, MultiHeadedAttentionSANMwithMask
from funasr.modules.embedding import SinusoidalPositionEncoder, StreamSinusoidalPositionEncoder
@@ -349,6 +351,23 @@ class SANMEncoder(AbsEncoder):
return (xs_pad, intermediate_outs), olens, None
return xs_pad, olens, None
+ def _add_overlap_chunk(self, feats: np.ndarray, cache: dict = {}):
+ if len(cache) == 0:
+ return feats
+ # process last chunk
+ cache["feats"] = to_device(cache["feats"], device=feats.device)
+ overlap_feats = torch.cat((cache["feats"], feats), dim=1)
+ if cache["is_final"]:
+ cache["feats"] = overlap_feats[:, -cache["chunk_size"][0]:, :]
+ if not cache["last_chunk"]:
+ padding_length = sum(cache["chunk_size"]) - overlap_feats.shape[1]
+ overlap_feats = overlap_feats.transpose(1, 2)
+ overlap_feats = F.pad(overlap_feats, (0, padding_length))
+ overlap_feats = overlap_feats.transpose(1, 2)
+ else:
+ cache["feats"] = overlap_feats[:, -(cache["chunk_size"][0] + cache["chunk_size"][2]):, :]
+ return overlap_feats
+
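`_add_overlap_chunk` above prepends cached context frames to each incoming chunk, right-pads non-final chunks to the full `sum(chunk_size)` span, and keeps the tail frames for the next call. A simplified list-based analogue (the `is_final` branch is dropped and frames are plain floats, so this mirrors only the non-final path):

```python
def add_overlap_chunk(feats, cache):
    overlap = cache["feats"] + feats              # concatenate along time
    if not cache["last_chunk"]:
        pad = sum(cache["chunk_size"]) - len(overlap)
        overlap = overlap + [0.0] * max(0, pad)   # right-pad to full span
    # retain left + right context frames for the next chunk
    keep = cache["chunk_size"][0] + cache["chunk_size"][2]
    cache["feats"] = overlap[-keep:]
    return overlap

cache = {"feats": [0.0] * 10, "chunk_size": [5, 10, 5], "last_chunk": False}
out = add_overlap_chunk([1.0] * 10, cache)
print(len(out))   # 10 cached + 10 new frames = sum(chunk_size) = 20
```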
def forward_chunk(self,
xs_pad: torch.Tensor,
ilens: torch.Tensor,
@@ -360,7 +379,10 @@ class SANMEncoder(AbsEncoder):
xs_pad = xs_pad
else:
xs_pad = self.embed(xs_pad, cache)
-
+ if cache["tail_chunk"]:
+ xs_pad = to_device(cache["feats"], device=xs_pad.device)
+ else:
+ xs_pad = self._add_overlap_chunk(xs_pad, cache)
encoder_outs = self.encoders0(xs_pad, None, None, None, None)
xs_pad, masks = encoder_outs[0], encoder_outs[1]
intermediate_outs = []
diff --git a/funasr/models/predictor/cif.py b/funasr/models/predictor/cif.py
index a5273f841..c59e24502 100644
--- a/funasr/models/predictor/cif.py
+++ b/funasr/models/predictor/cif.py
@@ -2,6 +2,7 @@ import torch
from torch import nn
import logging
import numpy as np
+from funasr.torch_utils.device_funcs import to_device
from funasr.modules.nets_utils import make_pad_mask
from funasr.modules.streaming_utils.utils import sequence_mask
@@ -200,7 +201,7 @@ class CifPredictorV2(nn.Module):
return acoustic_embeds, token_num, alphas, cif_peak
def forward_chunk(self, hidden, cache=None):
- b, t, d = hidden.size()
+ batch_size, len_time, hidden_size = hidden.shape
h = hidden
context = h.transpose(1, 2)
queries = self.pad(context)
@@ -211,58 +212,81 @@ class CifPredictorV2(nn.Module):
alphas = torch.nn.functional.relu(alphas * self.smooth_factor - self.noise_threshold)
alphas = alphas.squeeze(-1)
- mask_chunk_predictor = None
- if cache is not None:
- mask_chunk_predictor = None
- mask_chunk_predictor = torch.zeros_like(alphas)
- mask_chunk_predictor[:, cache["pad_left"]:cache["stride"] + cache["pad_left"]] = 1.0
-
- if mask_chunk_predictor is not None:
- alphas = alphas * mask_chunk_predictor
-
- if cache is not None:
- if cache["is_final"]:
- alphas[:, cache["stride"] + cache["pad_left"] - 1] += 0.45
- if cache["cif_hidden"] is not None:
- hidden = torch.cat((cache["cif_hidden"], hidden), 1)
- if cache["cif_alphas"] is not None:
- alphas = torch.cat((cache["cif_alphas"], alphas), -1)
- token_num = alphas.sum(-1)
- acoustic_embeds, cif_peak = cif(hidden, alphas, self.threshold)
- len_time = alphas.size(-1)
- last_fire_place = len_time - 1
- last_fire_remainds = 0.0
- pre_alphas_length = 0
- last_fire = False
-
- mask_chunk_peak_predictor = None
- if cache is not None:
- mask_chunk_peak_predictor = None
- mask_chunk_peak_predictor = torch.zeros_like(cif_peak)
- if cache["cif_alphas"] is not None:
- pre_alphas_length = cache["cif_alphas"].size(-1)
- mask_chunk_peak_predictor[:, :pre_alphas_length] = 1.0
- mask_chunk_peak_predictor[:, pre_alphas_length + cache["pad_left"]:pre_alphas_length + cache["stride"] + cache["pad_left"]] = 1.0
-
- if mask_chunk_peak_predictor is not None:
- cif_peak = cif_peak * mask_chunk_peak_predictor.squeeze(-1)
-
- for i in range(len_time):
- if cif_peak[0][len_time - 1 - i] > self.threshold or cif_peak[0][len_time - 1 - i] == self.threshold:
- last_fire_place = len_time - 1 - i
- last_fire_remainds = cif_peak[0][len_time - 1 - i] - self.threshold
- last_fire = True
- break
- if last_fire:
- last_fire_remainds = torch.tensor([last_fire_remainds], dtype=alphas.dtype).to(alphas.device)
- cache["cif_hidden"] = hidden[:, last_fire_place:, :]
- cache["cif_alphas"] = torch.cat((last_fire_remainds.unsqueeze(0), alphas[:, last_fire_place+1:]), -1)
- else:
- cache["cif_hidden"] = hidden
- cache["cif_alphas"] = alphas
- token_num_int = token_num.floor().type(torch.int32).item()
- return acoustic_embeds[:, 0:token_num_int, :], token_num, alphas, cif_peak
+ token_length = []
+ list_fires = []
+ list_frames = []
+ cache_alphas = []
+ cache_hiddens = []
+
+ if cache is not None and "chunk_size" in cache:
+ alphas[:, :cache["chunk_size"][0]] = 0.0
+ alphas[:, sum(cache["chunk_size"][:2]):] = 0.0
+ if cache is not None and "cif_alphas" in cache and "cif_hidden" in cache:
+ cache["cif_hidden"] = to_device(cache["cif_hidden"], device=hidden.device)
+ cache["cif_alphas"] = to_device(cache["cif_alphas"], device=alphas.device)
+ hidden = torch.cat((cache["cif_hidden"], hidden), dim=1)
+ alphas = torch.cat((cache["cif_alphas"], alphas), dim=1)
+ if cache is not None and "last_chunk" in cache and cache["last_chunk"]:
+ tail_hidden = torch.zeros((batch_size, 1, hidden_size), device=hidden.device)
+ tail_alphas = torch.tensor([[self.tail_threshold]], device=alphas.device)
+ tail_alphas = torch.tile(tail_alphas, (batch_size, 1))
+ hidden = torch.cat((hidden, tail_hidden), dim=1)
+ alphas = torch.cat((alphas, tail_alphas), dim=1)
+
+ len_time = alphas.shape[1]
+ for b in range(batch_size):
+ integrate = 0.0
+ frames = torch.zeros((hidden_size), device=hidden.device)
+ list_frame = []
+ list_fire = []
+ for t in range(len_time):
+ alpha = alphas[b][t]
+ if alpha + integrate < self.threshold:
+ integrate += alpha
+ list_fire.append(integrate)
+ frames += alpha * hidden[b][t]
+ else:
+ frames += (self.threshold - integrate) * hidden[b][t]
+ list_frame.append(frames)
+ integrate += alpha
+ list_fire.append(integrate)
+ integrate -= self.threshold
+ frames = integrate * hidden[b][t]
+
+ cache_alphas.append(integrate)
+ if integrate > 0.0:
+ cache_hiddens.append(frames / integrate)
+ else:
+ cache_hiddens.append(frames)
+
+ token_length.append(torch.tensor(len(list_frame), device=alphas.device))
+ list_fires.append(list_fire)
+ list_frames.append(list_frame)
+
+ cache["cif_alphas"] = torch.stack(cache_alphas, axis=0)
+ cache["cif_alphas"] = torch.unsqueeze(cache["cif_alphas"], axis=0)
+ cache["cif_hidden"] = torch.stack(cache_hiddens, axis=0)
+ cache["cif_hidden"] = torch.unsqueeze(cache["cif_hidden"], axis=0)
+
+ max_token_len = max(token_length)
+ if max_token_len == 0:
+ return hidden, torch.stack(token_length, 0)
+ list_ls = []
+ for b in range(batch_size):
+ pad_frames = torch.zeros((max_token_len - token_length[b], hidden_size), device=alphas.device)
+ if token_length[b] == 0:
+ list_ls.append(pad_frames)
+ else:
+ list_frames[b] = torch.stack(list_frames[b])
+ list_ls.append(torch.cat((list_frames[b], pad_frames), dim=0))
+
+ return torch.stack(list_ls, 0), torch.stack(token_length, 0)
+
def tail_process_fn(self, hidden, alphas, token_num=None, mask=None):
b, t, d = hidden.size()
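The rewritten `forward_chunk` above applies the continuous integrate-and-fire (CIF) rule frame by frame: alpha weights accumulate until they reach the threshold, a weighted acoustic frame fires, and the remainder carries over (and, across chunks, into `cache["cif_alphas"]`/`cache["cif_hidden"]`). A standalone sketch of the firing loop for a single sequence (illustrative values, NumPy):

```python
import numpy as np

def cif_fire(alphas, hidden, threshold=1.0):
    """Integrate alphas over time; emit one weighted frame per threshold crossing."""
    integrate = 0.0
    frame = np.zeros(hidden.shape[-1])
    fired = []
    for a, h in zip(alphas, hidden):
        if integrate + a < threshold:
            integrate += a
            frame += a * h
        else:
            frame += (threshold - integrate) * h   # fill up to the threshold
            fired.append(frame)
            integrate += a - threshold             # carry the remainder
            frame = integrate * h                  # start the next frame
    frames = np.stack(fired) if fired else np.zeros((0, hidden.shape[-1]))
    return frames, integrate  # leftover alpha mass is what the cache would keep

alphas = np.array([0.4, 0.4, 0.4, 0.4, 0.4])
hidden = np.ones((5, 2))
frames, leftover = cif_fire(alphas, hidden)
```

With these inputs two frames fire and no alpha mass is left over, which is the state the chunked predictor would stash in its cache.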
diff --git a/funasr/modules/embedding.py b/funasr/modules/embedding.py
index c347e24f1..aaac80a7d 100644
--- a/funasr/modules/embedding.py
+++ b/funasr/modules/embedding.py
@@ -425,21 +425,14 @@ class StreamSinusoidalPositionEncoder(torch.nn.Module):
return encoding.type(dtype)
def forward(self, x, cache=None):
- start_idx = 0
- pad_left = 0
- pad_right = 0
batch_size, timesteps, input_dim = x.size()
+ start_idx = 0
if cache is not None:
start_idx = cache["start_idx"]
- pad_left = cache["left"]
- pad_right = cache["right"]
+ cache["start_idx"] += timesteps
positions = torch.arange(1, timesteps+start_idx+1)[None, :]
position_encoding = self.encode(positions, input_dim, x.dtype).to(x.device)
- outputs = x + position_encoding[:, start_idx: start_idx + timesteps]
- outputs = outputs.transpose(1, 2)
- outputs = F.pad(outputs, (pad_left, pad_right))
- outputs = outputs.transpose(1, 2)
- return outputs
+ return x + position_encoding[:, start_idx: start_idx + timesteps]
class StreamingRelPositionalEncoding(torch.nn.Module):
"""Relative positional encoding.
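The simplified `StreamSinusoidalPositionEncoder.forward` above now just advances `start_idx` by the chunk length, so consecutive chunks receive consecutive positions. A sketch of why that makes chunked encoding match whole-sequence encoding (generic sinusoidal table for illustration; not necessarily FunASR's exact `encode()` formula):

```python
import numpy as np

def sinusoid(positions, dim):
    # generic sin/cos positional table (illustrative, not FunASR's exact encode())
    inv = np.exp(np.arange(0, dim, 2) * (-np.log(10000.0) / dim))
    ang = positions[:, None].astype(float) * inv[None, :]
    enc = np.zeros((len(positions), dim))
    enc[:, 0::2] = np.sin(ang)
    enc[:, 1::2] = np.cos(ang)
    return enc

def encode_chunk(x, cache):
    t, d = x.shape
    start = cache["start_idx"]
    cache["start_idx"] += t                      # next chunk continues the sequence
    pos = np.arange(1, start + t + 1)
    return x + sinusoid(pos, d)[start:start + t]

cache = {"start_idx": 0}
a = encode_chunk(np.zeros((3, 4)), cache)
b = encode_chunk(np.zeros((2, 4)), cache)
whole = sinusoid(np.arange(1, 6), 4)
```

Stacking the two chunk outputs reproduces the whole-sequence table; that invariant is what the cached `start_idx` maintains.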
diff --git a/funasr/runtime/grpc/CMakeLists.txt b/funasr/runtime/grpc/CMakeLists.txt
index c7727d57c..98c478752 100644
--- a/funasr/runtime/grpc/CMakeLists.txt
+++ b/funasr/runtime/grpc/CMakeLists.txt
@@ -42,17 +42,23 @@ add_custom_command(
"${rg_proto}"
DEPENDS "${rg_proto}")
-
# Include generated *.pb.h files
include_directories("${CMAKE_CURRENT_BINARY_DIR}")
-include_directories(../onnxruntime/include/)
-link_directories(../onnxruntime/build/src/)
-link_directories(../onnxruntime/build/third_party/yaml-cpp/)
-
link_directories(${ONNXRUNTIME_DIR}/lib)
+
+include_directories(${PROJECT_SOURCE_DIR}/../onnxruntime/include/)
+include_directories(${PROJECT_SOURCE_DIR}/../onnxruntime/third_party/yaml-cpp/include/)
+include_directories(${PROJECT_SOURCE_DIR}/../onnxruntime/third_party/kaldi-native-fbank)
+
+add_subdirectory(${PROJECT_SOURCE_DIR}/../onnxruntime/third_party/yaml-cpp yaml-cpp)
+add_subdirectory(${PROJECT_SOURCE_DIR}/../onnxruntime/third_party/kaldi-native-fbank/kaldi-native-fbank/csrc csrc)
add_subdirectory("../onnxruntime/src" onnx_src)
+include_directories(${PROJECT_SOURCE_DIR}/../onnxruntime/third_party/glog)
+set(BUILD_TESTING OFF)
+add_subdirectory(${PROJECT_SOURCE_DIR}/../onnxruntime/third_party/glog glog)
+
# rg_grpc_proto
add_library(rg_grpc_proto
${rg_grpc_srcs}
@@ -60,16 +66,13 @@ add_library(rg_grpc_proto
${rg_proto_srcs}
${rg_proto_hdrs})
-
-
target_link_libraries(rg_grpc_proto
${_REFLECTION}
${_GRPC_GRPCPP}
${_PROTOBUF_LIBPROTOBUF})
-# Targets paraformer_(server)
foreach(_target
- paraformer_server)
+ paraformer-server)
add_executable(${_target}
"${_target}.cc")
target_link_libraries(${_target}
diff --git a/funasr/runtime/grpc/Readme.md b/funasr/runtime/grpc/Readme.md
index 23e618c22..44994412b 100644
--- a/funasr/runtime/grpc/Readme.md
+++ b/funasr/runtime/grpc/Readme.md
@@ -1,18 +1,9 @@
-# Using funasr with grpc-cpp
+# Service with grpc-cpp
## For the Server
### Build [onnxruntime](./onnxruntime_cpp.md) following its documentation
-```
-#put onnx-lib & onnx-asr-model into /path/to/asrmodel(eg: /data/asrmodel)
-ls /data/asrmodel/
-onnxruntime-linux-x64-1.14.0 speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
-
-#make sure you have config.yaml, am.mvn, model.onnx(or model_quant.onnx) under speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
-
-```
-
### Compile and install grpc v1.52.0 to avoid known grpc bugs
```
export GRPC_INSTALL_DIR=/data/soft/grpc
@@ -46,8 +37,39 @@ source ~/.bashrc
### Start grpc paraformer server
```
-Usage: ./cmake/build/paraformer_server port thread_num /path/to/model_file quantize(true or false)
-./cmake/build/paraformer_server 10108 4 /data/asrmodel/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch false
+./cmake/build/paraformer-server --port-id <string> [--punc-config <string>]
+                                [--punc-model <string>] --am-config <string>
+                                --am-cmvn <string> --am-model <string>
+                                [--vad-config <string>] [--vad-cmvn <string>]
+                                [--vad-model <string>] [--] [--version] [-h]
+Where:
+   --port-id <string>
+     (required) port id
+
+   --am-config <string>
+     (required) am config path
+   --am-cmvn <string>
+     (required) am cmvn path
+   --am-model <string>
+     (required) am model path
+
+   --punc-config <string>
+     punc config path
+   --punc-model <string>
+     punc model path
+
+   --vad-config <string>
+     vad config path
+   --vad-cmvn <string>
+     vad cmvn path
+   --vad-model <string>
+     vad model path
+
+   Required: --port-id --am-config --am-cmvn --am-model
+   To enable VAD, also pass: --vad-config <string> --vad-cmvn <string> --vad-model <string>
+   To enable punctuation, also pass: --punc-config <string> --punc-model <string>
```
## For the client
diff --git a/funasr/runtime/grpc/paraformer_server.cc b/funasr/runtime/grpc/paraformer-server.cc
similarity index 65%
rename from funasr/runtime/grpc/paraformer_server.cc
rename to funasr/runtime/grpc/paraformer-server.cc
index 2893d4cfb..31333c9eb 100644
--- a/funasr/runtime/grpc/paraformer_server.cc
+++ b/funasr/runtime/grpc/paraformer-server.cc
@@ -13,7 +13,10 @@
#include
#include "paraformer.grpc.pb.h"
-#include "paraformer_server.h"
+#include "paraformer-server.h"
+#include "tclap/CmdLine.h"
+#include "com-define.h"
+#include "glog/logging.h"
using grpc::Server;
using grpc::ServerBuilder;
@@ -27,31 +30,43 @@ using paraformer::Request;
using paraformer::Response;
using paraformer::ASR;
-ASRServicer::ASRServicer(const char* model_path, int thread_num, bool quantize) {
- AsrHanlde=FunASRInit(model_path, thread_num, quantize);
+ASRServicer::ASRServicer(std::map<std::string, std::string>& model_path) {
+ AsrHanlde=FunASRInit(model_path, 1);
std::cout << "ASRServicer init" << std::endl;
init_flag = 0;
}
+void ASRServicer::clear_states(const std::string& user) {
+ clear_buffers(user);
+ clear_transcriptions(user);
+}
+
+void ASRServicer::clear_buffers(const std::string& user) {
+ if (client_buffers.count(user)) {
+ client_buffers.erase(user);
+ }
+}
+
+void ASRServicer::clear_transcriptions(const std::string& user) {
+ if (client_transcription.count(user)) {
+ client_transcription.erase(user);
+ }
+}
+
+void ASRServicer::disconnect(const std::string& user) {
+ clear_states(user);
+ std::cout << "Disconnecting user: " << user << std::endl;
+}
+
grpc::Status ASRServicer::Recognize(
grpc::ServerContext* context,
grpc::ServerReaderWriter<Response, Request>* stream) {
Request req;
- std::unordered_map<std::string, std::string> client_buffers;
- std::unordered_map<std::string, std::string> client_transcription;
-
while (stream->Read(&req)) {
if (req.isend()) {
std::cout << "asr end" << std::endl;
- // disconnect
- if (client_buffers.count(req.user())) {
- client_buffers.erase(req.user());
- }
- if (client_transcription.count(req.user())) {
- client_transcription.erase(req.user());
- }
-
+ disconnect(req.user());
Response res;
res.set_sentence(
R"({"success": true, "detail": "asr end"})"
@@ -89,14 +104,8 @@ grpc::Status ASRServicer::Recognize(
auto& buf = client_buffers[req.user()];
buf.insert(buf.end(), req.audio_data().begin(), req.audio_data().end());
}
- std::string tmp_data = client_buffers[req.user()];
- // clear_states
- if (client_buffers.count(req.user())) {
- client_buffers.erase(req.user());
- }
- if (client_transcription.count(req.user())) {
- client_transcription.erase(req.user());
- }
+ std::string tmp_data = this->client_buffers[req.user()];
+ this->clear_states(req.user());
Response res;
res.set_sentence(
@@ -161,10 +170,17 @@ grpc::Status ASRServicer::Recognize(
return Status::OK;
}
-void RunServer(const std::string& port, int thread_num, const char* model_path, bool quantize) {
+void RunServer(std::map<std::string, std::string>& model_path) {
+ std::string port;
+ try{
+ port = model_path.at(PORT_ID);
+ }catch(std::exception const &e){
+ printf("Error when reading port.\n");
+ exit(-1);
+ }
std::string server_address;
server_address = "0.0.0.0:" + port;
- ASRServicer service(model_path, thread_num, quantize);
+ ASRServicer service(model_path);
ServerBuilder builder;
builder.AddListeningPort(server_address, grpc::InsecureServerCredentials());
@@ -174,16 +190,54 @@ void RunServer(const std::string& port, int thread_num, const char* model_path,
server->Wait();
}
-int main(int argc, char* argv[]) {
- if (argc < 5)
- {
- printf("Usage: %s port thread_num /path/to/model_file quantize(true or false) \n", argv[0]);
- exit(-1);
+void GetValue(TCLAP::ValueArg<std::string>& value_arg, std::string key, std::map<std::string, std::string>& model_path)
+{
+ if (value_arg.isSet()){
+ model_path.insert({key, value_arg.getValue()});
+ LOG(INFO)<< key << " : " << value_arg.getValue();
}
+}
- // is quantize
- bool quantize = false;
- std::istringstream(argv[4]) >> std::boolalpha >> quantize;
- RunServer(argv[1], atoi(argv[2]), argv[3], quantize);
+int main(int argc, char* argv[]) {
+
+ google::InitGoogleLogging(argv[0]);
+ FLAGS_logtostderr = true;
+
+ TCLAP::CmdLine cmd("paraformer-server", ' ', "1.0");
+ TCLAP::ValueArg<std::string> vad_model("", VAD_MODEL_PATH, "vad model path", false, "", "string");
+ TCLAP::ValueArg<std::string> vad_cmvn("", VAD_CMVN_PATH, "vad cmvn path", false, "", "string");
+ TCLAP::ValueArg<std::string> vad_config("", VAD_CONFIG_PATH, "vad config path", false, "", "string");
+
+ TCLAP::ValueArg<std::string> am_model("", AM_MODEL_PATH, "am model path", true, "", "string");
+ TCLAP::ValueArg<std::string> am_cmvn("", AM_CMVN_PATH, "am cmvn path", true, "", "string");
+ TCLAP::ValueArg<std::string> am_config("", AM_CONFIG_PATH, "am config path", true, "", "string");
+
+ TCLAP::ValueArg<std::string> punc_model("", PUNC_MODEL_PATH, "punc model path", false, "", "string");
+ TCLAP::ValueArg<std::string> punc_config("", PUNC_CONFIG_PATH, "punc config path", false, "", "string");
+ TCLAP::ValueArg<std::string> port_id("", PORT_ID, "port id", true, "", "string");
+
+ cmd.add(vad_model);
+ cmd.add(vad_cmvn);
+ cmd.add(vad_config);
+ cmd.add(am_model);
+ cmd.add(am_cmvn);
+ cmd.add(am_config);
+ cmd.add(punc_model);
+ cmd.add(punc_config);
+ cmd.add(port_id);
+ cmd.parse(argc, argv);
+
+ std::map<std::string, std::string> model_path;
+ GetValue(vad_model, VAD_MODEL_PATH, model_path);
+ GetValue(vad_cmvn, VAD_CMVN_PATH, model_path);
+ GetValue(vad_config, VAD_CONFIG_PATH, model_path);
+ GetValue(am_model, AM_MODEL_PATH, model_path);
+ GetValue(am_cmvn, AM_CMVN_PATH, model_path);
+ GetValue(am_config, AM_CONFIG_PATH, model_path);
+ GetValue(punc_model, PUNC_MODEL_PATH, model_path);
+ GetValue(punc_config, PUNC_CONFIG_PATH, model_path);
+ GetValue(port_id, PORT_ID, model_path);
+
+ RunServer(model_path);
return 0;
}
diff --git a/funasr/runtime/grpc/paraformer_server.h b/funasr/runtime/grpc/paraformer-server.h
similarity index 70%
rename from funasr/runtime/grpc/paraformer_server.h
rename to funasr/runtime/grpc/paraformer-server.h
index dba1e45c2..108e3b688 100644
--- a/funasr/runtime/grpc/paraformer_server.h
+++ b/funasr/runtime/grpc/paraformer-server.h
@@ -37,13 +37,18 @@ typedef struct
float snippet_time;
}FUNASR_RECOG_RESULT;
-
class ASRServicer final : public ASR::Service {
private:
int init_flag;
+ std::unordered_map<std::string, std::string> client_buffers;
+ std::unordered_map<std::string, std::string> client_transcription;
public:
- ASRServicer(const char* model_path, int thread_num, bool quantize);
+ ASRServicer(std::map<std::string, std::string>& model_path);
+ void clear_states(const std::string& user);
+ void clear_buffers(const std::string& user);
+ void clear_transcriptions(const std::string& user);
+ void disconnect(const std::string& user);
grpc::Status Recognize(grpc::ServerContext* context, grpc::ServerReaderWriter<Response, Request>* stream);
FUNASR_HANDLE AsrHanlde;
diff --git a/funasr/runtime/onnxruntime/CMakeLists.txt b/funasr/runtime/onnxruntime/CMakeLists.txt
index 25b816f98..9f6013f76 100644
--- a/funasr/runtime/onnxruntime/CMakeLists.txt
+++ b/funasr/runtime/onnxruntime/CMakeLists.txt
@@ -38,5 +38,4 @@ if(ENABLE_GLOG)
include_directories(${PROJECT_SOURCE_DIR}/third_party/glog)
set(BUILD_TESTING OFF)
add_subdirectory(third_party/glog)
-endif()
-
+endif()
\ No newline at end of file
diff --git a/funasr/runtime/onnxruntime/include/com-define.h b/funasr/runtime/onnxruntime/include/com-define.h
index 8c885178e..9b7b212b7 100644
--- a/funasr/runtime/onnxruntime/include/com-define.h
+++ b/funasr/runtime/onnxruntime/include/com-define.h
@@ -24,6 +24,7 @@
#define WAV_PATH "wav-path"
#define WAV_SCP "wav-scp"
#define THREAD_NUM "thread-num"
+#define PORT_ID "port-id"
// vad
#ifndef VAD_SILENCE_DURATION
diff --git a/funasr/runtime/onnxruntime/include/libfunasrapi.h b/funasr/runtime/onnxruntime/include/libfunasrapi.h
index 8dca7f4d6..f65efccfc 100644
--- a/funasr/runtime/onnxruntime/include/libfunasrapi.h
+++ b/funasr/runtime/onnxruntime/include/libfunasrapi.h
@@ -47,10 +47,9 @@ typedef enum {
typedef void (* QM_CALLBACK)(int cur_step, int n_total); // n_total: total steps; cur_step: Current Step.
-// APIs for funasr
+// ASR
_FUNASRAPI FUNASR_HANDLE FunASRInit(std::map<std::string, std::string>& model_path, int thread_num);
-// if not give a fn_callback ,it should be NULL
_FUNASRAPI FUNASR_RESULT FunASRRecogBuffer(FUNASR_HANDLE handle, const char* sz_buf, int n_len, FUNASR_MODE mode, QM_CALLBACK fn_callback);
_FUNASRAPI FUNASR_RESULT FunASRRecogPCMBuffer(FUNASR_HANDLE handle, const char* sz_buf, int n_len, int sampling_rate, FUNASR_MODE mode, QM_CALLBACK fn_callback);
_FUNASRAPI FUNASR_RESULT FunASRRecogPCMFile(FUNASR_HANDLE handle, const char* sz_filename, int sampling_rate, FUNASR_MODE mode, QM_CALLBACK fn_callback);
@@ -62,6 +61,14 @@ _FUNASRAPI void FunASRFreeResult(FUNASR_RESULT result);
_FUNASRAPI void FunASRUninit(FUNASR_HANDLE handle);
_FUNASRAPI const float FunASRGetRetSnippetTime(FUNASR_RESULT result);
+// VAD
+_FUNASRAPI FUNASR_HANDLE FunVadInit(std::map<std::string, std::string>& model_path, int thread_num);
+
+_FUNASRAPI FUNASR_RESULT FunASRVadBuffer(FUNASR_HANDLE handle, const char* sz_buf, int n_len, FUNASR_MODE mode, QM_CALLBACK fn_callback);
+_FUNASRAPI FUNASR_RESULT FunASRVadPCMBuffer(FUNASR_HANDLE handle, const char* sz_buf, int n_len, int sampling_rate, FUNASR_MODE mode, QM_CALLBACK fn_callback);
+_FUNASRAPI FUNASR_RESULT FunASRVadPCMFile(FUNASR_HANDLE handle, const char* sz_filename, int sampling_rate, FUNASR_MODE mode, QM_CALLBACK fn_callback);
+_FUNASRAPI FUNASR_RESULT FunASRVadFile(FUNASR_HANDLE handle, const char* sz_wavfile, FUNASR_MODE mode, QM_CALLBACK fn_callback);
+
#ifdef __cplusplus
}
diff --git a/funasr/runtime/onnxruntime/readme.md b/funasr/runtime/onnxruntime/readme.md
index 95840e530..436c7df69 100644
--- a/funasr/runtime/onnxruntime/readme.md
+++ b/funasr/runtime/onnxruntime/readme.md
@@ -4,9 +4,10 @@
### Install [modelscope and funasr](https://github.com/alibaba-damo-academy/FunASR#installation)
```shell
-pip3 install torch torchaudio
-pip install -U modelscope
-pip install -U funasr
+# pip3 install torch torchaudio
+pip install -U modelscope funasr
+# For users in China, you can install with the following command:
+# pip install -U modelscope funasr -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html -i https://mirror.sjtu.edu.cn/pypi/web/simple
```
### Export [onnx model](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export)
@@ -79,5 +80,6 @@ Where:
## Acknowledge
1. This project is maintained by [FunASR community](https://github.com/alibaba-damo-academy/FunASR).
-2. We acknowledge [mayong](https://github.com/RapidAI/RapidASR/tree/main/cpp_onnx) for contributing the onnxruntime(cpp api).
-3. We borrowed a lot of code from [FastASR](https://github.com/chenkui164/FastASR) for audio frontend and text-postprocess.
+2. We acknowledge mayong for contributing the onnxruntime implementations of Paraformer and CT_Transformer: [repo-asr](https://github.com/RapidAI/RapidASR/tree/main/cpp_onnx), [repo-punc](https://github.com/RapidAI/RapidPunc).
+3. We acknowledge [ChinaTelecom](https://github.com/zhuzizyf/damo-fsmn-vad-infer-httpserver) for contributing the VAD runtime.
+4. We borrowed a lot of code from [FastASR](https://github.com/chenkui164/FastASR) for audio frontend and text-postprocess.
diff --git a/funasr/runtime/onnxruntime/src/CMakeLists.txt b/funasr/runtime/onnxruntime/src/CMakeLists.txt
index 28a67b4be..d33c5402f 100644
--- a/funasr/runtime/onnxruntime/src/CMakeLists.txt
+++ b/funasr/runtime/onnxruntime/src/CMakeLists.txt
@@ -28,5 +28,4 @@ target_link_libraries(funasr PUBLIC onnxruntime ${EXTRA_LIBS})
add_executable(funasr-onnx-offline "funasr-onnx-offline.cpp")
add_executable(funasr-onnx-offline-rtf "funasr-onnx-offline-rtf.cpp")
target_link_libraries(funasr-onnx-offline PUBLIC funasr)
-target_link_libraries(funasr-onnx-offline-rtf PUBLIC funasr)
-
+target_link_libraries(funasr-onnx-offline-rtf PUBLIC funasr)
\ No newline at end of file
diff --git a/funasr/runtime/onnxruntime/src/audio.cpp b/funasr/runtime/onnxruntime/src/audio.cpp
index d104500d1..8f46a4f63 100644
--- a/funasr/runtime/onnxruntime/src/audio.cpp
+++ b/funasr/runtime/onnxruntime/src/audio.cpp
@@ -380,8 +380,10 @@ bool Audio::LoadPcmwav(const char* filename, int32_t* sampling_rate)
FILE* fp;
fp = fopen(filename, "rb");
if (fp == nullptr)
+ {
LOG(ERROR) << "Failed to read " << filename;
return false;
+ }
fseek(fp, 0, SEEK_END);
uint32_t n_file_len = ftell(fp);
fseek(fp, 0, SEEK_SET);
@@ -517,4 +519,4 @@ void Audio::Split(Model* recog_obj)
frame_queue.push(frame);
frame = NULL;
}
-}
+}
\ No newline at end of file
diff --git a/funasr/runtime/onnxruntime/src/e2e-vad.h b/funasr/runtime/onnxruntime/src/e2e-vad.h
index 90f2635f6..02bae6296 100644
--- a/funasr/runtime/onnxruntime/src/e2e-vad.h
+++ b/funasr/runtime/onnxruntime/src/e2e-vad.h
@@ -1,6 +1,7 @@
/**
* Copyright FunASR (https://github.com/alibaba-damo-academy/FunASR). All Rights Reserved.
* MIT License (https://opensource.org/licenses/MIT)
+ * Collaborators: zhuzizyf(China Telecom Shanghai)
*/
#include
@@ -381,10 +382,11 @@ private:
int max_end_sil_frame_cnt_thresh;
float speech_noise_thres;
std::vector<std::vector<float>> scores;
+ int idx_pre_chunk = 0;
bool max_time_out;
std::vector<float> decibel;
- std::vector<float> data_buf;
- std::vector<float> data_buf_all;
+ int data_buf_size = 0;
+ int data_buf_all_size = 0;
std::vector<float> waveform;
void AllResetDetection() {
@@ -409,10 +411,11 @@ private:
max_end_sil_frame_cnt_thresh = vad_opts.max_end_silence_time - vad_opts.speech_to_sil_time_thres;
speech_noise_thres = vad_opts.speech_noise_thres;
scores.clear();
+ idx_pre_chunk = 0;
max_time_out = false;
decibel.clear();
- data_buf.clear();
- data_buf_all.clear();
+ data_buf_size = 0;
+ data_buf_all_size = 0;
waveform.clear();
ResetDetection();
}
@@ -432,18 +435,17 @@ private:
void ComputeDecibel() {
int frame_sample_length = int(vad_opts.frame_length_ms * vad_opts.sample_rate / 1000);
int frame_shift_length = int(vad_opts.frame_in_ms * vad_opts.sample_rate / 1000);
- if (data_buf_all.empty()) {
- data_buf_all = waveform;
- data_buf = data_buf_all;
+ if (data_buf_all_size == 0) {
+ data_buf_all_size = waveform.size();
+ data_buf_size = data_buf_all_size;
} else {
- data_buf_all.insert(data_buf_all.end(), waveform.begin(), waveform.end());
+ data_buf_all_size += waveform.size();
}
for (int offset = 0; offset < waveform.size() - frame_sample_length + 1; offset += frame_shift_length) {
float sum = 0.0;
for (int i = 0; i < frame_sample_length; i++) {
sum += waveform[offset + i] * waveform[offset + i];
}
-// float decibel = 10 * log10(sum + 0.000001);
this->decibel.push_back(10 * log10(sum + 0.000001));
}
}
@@ -451,30 +453,17 @@ private:
void ComputeScores(const std::vector<std::vector<float>> &scores) {
vad_opts.nn_eval_block_size = scores.size();
frm_cnt += scores.size();
- if (this->scores.empty()) {
- this->scores = scores; // the first calculation
- } else {
- this->scores.insert(this->scores.end(), scores.begin(), scores.end());
- }
+ this->scores = scores;
}
void PopDataBufTillFrame(int frame_idx) {
int frame_sample_length = int(vad_opts.frame_in_ms * vad_opts.sample_rate / 1000);
- int start_pos=-1;
- int data_length= data_buf.size();
while (data_buf_start_frame < frame_idx) {
- if (data_length >= frame_sample_length) {
+ if (data_buf_size >= frame_sample_length) {
data_buf_start_frame += 1;
- start_pos= data_buf_start_frame* frame_sample_length;
- data_length=data_buf_all.size()-start_pos;
+ data_buf_size = data_buf_all_size - data_buf_start_frame * frame_sample_length;
+ } else {
+ break;
}
}
- if (start_pos!=-1){
- data_buf.resize(data_length);
- std::copy(data_buf_all.begin() + start_pos, data_buf_all.end(), data_buf.begin());
- }
}
void PopDataToOutputBuf(int start_frm, int frm_cnt, bool first_frm_is_start_point, bool last_frm_is_end_point,
@@ -487,9 +476,9 @@ private:
expected_sample_number += int(extra_sample);
}
if (end_point_is_sent_end) {
- expected_sample_number = std::max(expected_sample_number, int(data_buf.size()));
+ expected_sample_number = std::max(expected_sample_number, data_buf_size);
}
- if (data_buf.size() < expected_sample_number) {
+ if (data_buf_size < expected_sample_number) {
std::cout << "error in calling pop data_buf\n";
}
if (output_data_buf.size() == 0 || first_frm_is_start_point) {
@@ -510,10 +499,10 @@ private:
} else {
data_to_pop = int(frm_cnt * vad_opts.frame_in_ms * vad_opts.sample_rate / 1000);
}
- if (data_to_pop > int(data_buf.size())) {
+ if (data_to_pop > data_buf_size) {
std::cout << "VAD data_to_pop is bigger than data_buf.size()!!!\n";
- data_to_pop = (int) data_buf.size();
- expected_sample_number = (int) data_buf.size();
+ data_to_pop = data_buf_size;
+ expected_sample_number = data_buf_size;
}
cur_seg.doa = 0;
for (int sample_cpy_out = 0; sample_cpy_out < data_to_pop; sample_cpy_out++) {
@@ -619,7 +608,7 @@ private:
if (sil_pdf_ids.size() > 0) {
std::vector<float> sil_pdf_scores;
for (auto sil_pdf_id: sil_pdf_ids) {
- sil_pdf_scores.push_back(scores[t][sil_pdf_id]);
+ sil_pdf_scores.push_back(scores[t - idx_pre_chunk][sil_pdf_id]);
}
sum_score = accumulate(sil_pdf_scores.begin(), sil_pdf_scores.end(), 0.0);
noise_prob = log(sum_score) * vad_opts.speech_2_noise_ratio;
@@ -663,6 +652,7 @@ private:
frame_state = GetFrameState(frm_cnt - 1 - i);
DetectOneFrame(frame_state, frm_cnt - 1 - i, false);
}
+ idx_pre_chunk += scores.size();
return 0;
}
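`ComputeDecibel` above computes per-frame log energy over sliding windows; the refactor only replaces the buffered waveform copies with running sizes. The frame-energy computation itself, as a plain Python sketch (hypothetical parameters):

```python
import math

def frame_decibels(waveform, frame_len, frame_shift):
    """10*log10 of windowed signal energy, mirroring ComputeDecibel."""
    out = []
    for off in range(0, len(waveform) - frame_len + 1, frame_shift):
        energy = sum(s * s for s in waveform[off:off + frame_len])
        out.append(10 * math.log10(energy + 1e-6))  # epsilon avoids log10(0)
    return out

# 25 ms frames with a 10 ms shift at 16 kHz would give frame_len=400, frame_shift=160
db = frame_decibels([0.0] * 400, frame_len=400, frame_shift=160)
```

A silent 400-sample buffer yields exactly one frame at the epsilon floor of -60 dB.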
diff --git a/funasr/runtime/onnxruntime/src/libfunasrapi.cpp b/funasr/runtime/onnxruntime/src/libfunasrapi.cpp
index 93434bb73..01aa38a8c 100644
--- a/funasr/runtime/onnxruntime/src/libfunasrapi.cpp
+++ b/funasr/runtime/onnxruntime/src/libfunasrapi.cpp
@@ -11,6 +11,12 @@ extern "C" {
return mm;
}
+ _FUNASRAPI FUNASR_HANDLE FunVadInit(std::map<std::string, std::string>& model_path, int thread_num)
+ {
+ Model* mm = CreateModel(model_path, thread_num);
+ return mm;
+ }
+
_FUNASRAPI FUNASR_RESULT FunASRRecogBuffer(FUNASR_HANDLE handle, const char* sz_buf, int n_len, FUNASR_MODE mode, QM_CALLBACK fn_callback)
{
Model* recog_obj = (Model*)handle;
diff --git a/funasr/runtime/onnxruntime/src/precomp.h b/funasr/runtime/onnxruntime/src/precomp.h
index cf69ad976..1630e55fb 100644
--- a/funasr/runtime/onnxruntime/src/precomp.h
+++ b/funasr/runtime/onnxruntime/src/precomp.h
@@ -21,8 +21,8 @@ using namespace std;
// third part
#include "onnxruntime_run_options_config_keys.h"
#include "onnxruntime_cxx_api.h"
-#include
-#include
+#include "kaldi-native-fbank/csrc/feature-fbank.h"
+#include "kaldi-native-fbank/csrc/online-feature.h"
// mine
#include
@@ -40,6 +40,7 @@ using namespace std;
#include "util.h"
#include "resample.h"
#include "model.h"
+//#include "vad-model.h"
#include "paraformer.h"
#include "libfunasrapi.h"
diff --git a/funasr/runtime/onnxruntime/third_party/glog/.bazelci/presubmit.yml b/funasr/runtime/onnxruntime/third_party/glog/.bazelci/presubmit.yml
deleted file mode 100644
index 04a538507..000000000
--- a/funasr/runtime/onnxruntime/third_party/glog/.bazelci/presubmit.yml
+++ /dev/null
@@ -1,62 +0,0 @@
----
-tasks:
- ubuntu1804:
- name: "Ubuntu 18.04"
- platform: ubuntu1804
- build_flags:
- - "--features=layering_check"
- - "--copt=-Werror"
- build_targets:
- - "//..."
- test_flags:
- - "--features=layering_check"
- - "--copt=-Werror"
- test_targets:
- - "//..."
- macos:
- name: "macOS: latest Xcode"
- platform: macos
- build_flags:
- - "--features=layering_check"
- - "--copt=-Werror"
- build_targets:
- - "//..."
- test_flags:
- - "--features=layering_check"
- - "--copt=-Werror"
- test_targets:
- - "//..."
- windows-msvc:
- name: "Windows: MSVC 2017"
- platform: windows
- environment:
- BAZEL_VC: "C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\BuildTools\\VC"
- build_flags:
- - "--features=layering_check"
- - "--copt=/WX"
- build_targets:
- - "//..."
- test_flags:
- - "--features=layering_check"
- - "--copt=/WX"
- test_targets:
- - "//..."
- windows-clang-cl:
- name: "Windows: Clang"
- platform: windows
- environment:
- BAZEL_VC: "C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\BuildTools\\VC"
- build_flags:
- - "--extra_toolchains=@local_config_cc//:cc-toolchain-x64_windows-clang-cl"
- - "--extra_execution_platforms=//:x64_windows-clang-cl"
- - "--compiler=clang-cl"
- - "--features=layering_check"
- build_targets:
- - "//..."
- test_flags:
- - "--extra_toolchains=@local_config_cc//:cc-toolchain-x64_windows-clang-cl"
- - "--extra_execution_platforms=//:x64_windows-clang-cl"
- - "--compiler=clang-cl"
- - "--features=layering_check"
- test_targets:
- - "//..."
diff --git a/funasr/runtime/onnxruntime/third_party/glog/.clang-format b/funasr/runtime/onnxruntime/third_party/glog/.clang-format
deleted file mode 100644
index 039559d63..000000000
--- a/funasr/runtime/onnxruntime/third_party/glog/.clang-format
+++ /dev/null
@@ -1,168 +0,0 @@
----
-Language: Cpp
-# BasedOnStyle: Google
-AccessModifierOffset: -1
-AlignAfterOpenBracket: Align
-AlignConsecutiveMacros: false
-AlignConsecutiveAssignments: false
-AlignConsecutiveDeclarations: false
-AlignEscapedNewlines: Left
-AlignOperands: true
-AlignTrailingComments: true
-AllowAllArgumentsOnNextLine: true
-AllowAllConstructorInitializersOnNextLine: true
-AllowAllParametersOfDeclarationOnNextLine: true
-AllowShortBlocksOnASingleLine: Never
-AllowShortCaseLabelsOnASingleLine: false
-AllowShortFunctionsOnASingleLine: All
-AllowShortLambdasOnASingleLine: All
-AllowShortIfStatementsOnASingleLine: WithoutElse
-AllowShortLoopsOnASingleLine: true
-AlwaysBreakAfterDefinitionReturnType: None
-AlwaysBreakAfterReturnType: None
-AlwaysBreakBeforeMultilineStrings: true
-AlwaysBreakTemplateDeclarations: Yes
-BinPackArguments: true
-BinPackParameters: true
-BraceWrapping:
- AfterCaseLabel: false
- AfterClass: false
- AfterControlStatement: false
- AfterEnum: false
- AfterFunction: false
- AfterNamespace: false
- AfterObjCDeclaration: false
- AfterStruct: false
- AfterUnion: false
- AfterExternBlock: false
- BeforeCatch: false
- BeforeElse: false
- IndentBraces: false
- SplitEmptyFunction: true
- SplitEmptyRecord: true
- SplitEmptyNamespace: true
-BreakBeforeBinaryOperators: None
-BreakBeforeBraces: Attach
-BreakBeforeInheritanceComma: false
-BreakInheritanceList: BeforeColon
-BreakBeforeTernaryOperators: true
-BreakConstructorInitializersBeforeComma: false
-BreakConstructorInitializers: BeforeColon
-BreakAfterJavaFieldAnnotations: false
-BreakStringLiterals: true
-ColumnLimit: 80
-CommentPragmas: '^ IWYU pragma:'
-CompactNamespaces: false
-ConstructorInitializerAllOnOneLineOrOnePerLine: true
-ConstructorInitializerIndentWidth: 4
-ContinuationIndentWidth: 4
-Cpp11BracedListStyle: true
-DeriveLineEnding: true
-DerivePointerAlignment: true
-DisableFormat: false
-ExperimentalAutoDetectBinPacking: false
-FixNamespaceComments: true
-ForEachMacros:
- - foreach
- - Q_FOREACH
- - BOOST_FOREACH
-IncludeBlocks: Regroup
-IncludeCategories:
- - Regex: '^'
- Priority: 2
- SortPriority: 0
- - Regex: '^<.*\.h>'
- Priority: 1
- SortPriority: 0
- - Regex: '^<.*'
- Priority: 2
- SortPriority: 0
- - Regex: '.*'
- Priority: 3
- SortPriority: 0
-IncludeIsMainRegex: '([-_](test|unittest))?$'
-IncludeIsMainSourceRegex: ''
-IndentCaseLabels: true
-IndentGotoLabels: true
-IndentPPDirectives: None
-IndentWidth: 2
-IndentWrappedFunctionNames: false
-JavaScriptQuotes: Leave
-JavaScriptWrapImports: true
-KeepEmptyLinesAtTheStartOfBlocks: false
-MacroBlockBegin: ''
-MacroBlockEnd: ''
-MaxEmptyLinesToKeep: 1
-NamespaceIndentation: None
-ObjCBinPackProtocolList: Never
-ObjCBlockIndentWidth: 2
-ObjCSpaceAfterProperty: false
-ObjCSpaceBeforeProtocolList: true
-PenaltyBreakAssignment: 2
-PenaltyBreakBeforeFirstCallParameter: 1
-PenaltyBreakComment: 300
-PenaltyBreakFirstLessLess: 120
-PenaltyBreakString: 1000
-PenaltyBreakTemplateDeclaration: 10
-PenaltyExcessCharacter: 1000000
-PenaltyReturnTypeOnItsOwnLine: 200
-PointerAlignment: Left
-RawStringFormats:
- - Language: Cpp
- Delimiters:
- - cc
- - CC
- - cpp
- - Cpp
- - CPP
- - 'c++'
- - 'C++'
- CanonicalDelimiter: ''
- BasedOnStyle: google
- - Language: TextProto
- Delimiters:
- - pb
- - PB
- - proto
- - PROTO
- EnclosingFunctions:
- - EqualsProto
- - EquivToProto
- - PARSE_PARTIAL_TEXT_PROTO
- - PARSE_TEST_PROTO
- - PARSE_TEXT_PROTO
- - ParseTextOrDie
- - ParseTextProtoOrDie
- CanonicalDelimiter: ''
- BasedOnStyle: google
-ReflowComments: true
-SortIncludes: true
-SortUsingDeclarations: true
-SpaceAfterCStyleCast: false
-SpaceAfterLogicalNot: false
-SpaceAfterTemplateKeyword: true
-SpaceBeforeAssignmentOperators: true
-SpaceBeforeCpp11BracedList: false
-SpaceBeforeCtorInitializerColon: true
-SpaceBeforeInheritanceColon: true
-SpaceBeforeParens: ControlStatements
-SpaceBeforeRangeBasedForLoopColon: true
-SpaceInEmptyBlock: false
-SpaceInEmptyParentheses: false
-SpacesBeforeTrailingComments: 2
-SpacesInAngles: false
-SpacesInConditionalStatement: false
-SpacesInContainerLiterals: true
-SpacesInCStyleCastParentheses: false
-SpacesInParentheses: false
-SpacesInSquareBrackets: false
-SpaceBeforeSquareBrackets: false
-Standard: c++14
-StatementMacros:
- - Q_UNUSED
- - QT_REQUIRE_VERSION
-TabWidth: 8
-UseCRLF: false
-UseTab: Never
-...
-
diff --git a/funasr/runtime/onnxruntime/third_party/glog/.clang-tidy b/funasr/runtime/onnxruntime/third_party/glog/.clang-tidy
deleted file mode 100644
index 1f4ea16fd..000000000
--- a/funasr/runtime/onnxruntime/third_party/glog/.clang-tidy
+++ /dev/null
@@ -1,59 +0,0 @@
----
-Checks: 'clang-diagnostic-*,clang-analyzer-*,google-*,modernize-*,-modernize-use-trailing-return-type,readability-*,portability-*,performance-*,bugprone-*,android-*,darwin-*,clang-analyzer-*'
-WarningsAsErrors: ''
-HeaderFilterRegex: ''
-AnalyzeTemporaryDtors: false
-FormatStyle: file
-CheckOptions:
- - key: cert-dcl16-c.NewSuffixes
- value: 'L;LL;LU;LLU'
- - key: cert-oop54-cpp.WarnOnlyIfThisHasSuspiciousField
- value: '0'
- - key: cppcoreguidelines-explicit-virtual-functions.IgnoreDestructors
- value: '1'
- - key: cppcoreguidelines-non-private-member-variables-in-classes.IgnoreClassesWithAllMemberVariablesBeingPublic
- value: '1'
- - key: google-build-namespaces.HeaderFileExtensions
- value: ',h,hh,hpp,hxx'
- - key: google-global-names-in-headers.HeaderFileExtensions
- value: ',h,hh,hpp,hxx'
- - key: google-readability-braces-around-statements.ShortStatementLines
- value: '1'
- - key: google-readability-function-size.BranchThreshold
- value: '4294967295'
- - key: google-readability-function-size.LineThreshold
- value: '4294967295'
- - key: google-readability-function-size.NestingThreshold
- value: '4294967295'
- - key: google-readability-function-size.ParameterThreshold
- value: '4294967295'
- - key: google-readability-function-size.StatementThreshold
- value: '800'
- - key: google-readability-function-size.VariableThreshold
- value: '4294967295'
- - key: google-readability-namespace-comments.ShortNamespaceLines
- value: '10'
- - key: google-readability-namespace-comments.SpacesBeforeComments
- value: '2'
- - key: google-runtime-int.SignedTypePrefix
- value: int
- - key: google-runtime-int.TypeSuffix
- value: ''
- - key: google-runtime-int.UnsignedTypePrefix
- value: uint
- - key: google-runtime-references.WhiteListTypes
- value: ''
- - key: modernize-loop-convert.MaxCopySize
- value: '16'
- - key: modernize-loop-convert.MinConfidence
- value: reasonable
- - key: modernize-loop-convert.NamingStyle
- value: CamelCase
- - key: modernize-pass-by-value.IncludeStyle
- value: llvm
- - key: modernize-replace-auto-ptr.IncludeStyle
- value: llvm
- - key: modernize-use-nullptr.NullMacros
- value: 'NULL'
-...
-
diff --git a/funasr/runtime/onnxruntime/third_party/glog/.gitattributes b/funasr/runtime/onnxruntime/third_party/glog/.gitattributes
deleted file mode 100644
index 2f6d49472..000000000
--- a/funasr/runtime/onnxruntime/third_party/glog/.gitattributes
+++ /dev/null
@@ -1 +0,0 @@
-*.h linguist-language=C++
diff --git a/funasr/runtime/onnxruntime/third_party/glog/.github/workflows/android.yml b/funasr/runtime/onnxruntime/third_party/glog/.github/workflows/android.yml
deleted file mode 100644
index e860d0a3f..000000000
--- a/funasr/runtime/onnxruntime/third_party/glog/.github/workflows/android.yml
+++ /dev/null
@@ -1,55 +0,0 @@
-name: Android
-
-on: [push, pull_request]
-
-jobs:
- build-android:
- name: NDK-C++${{matrix.std}}-${{matrix.abi}}-${{matrix.build_type}}
- runs-on: ubuntu-latest
- defaults:
- run:
- shell: bash
- env:
- NDK_VERSION: 25.0.8775105
- strategy:
- fail-fast: true
- matrix:
- std: [14, 17, 20]
- abi: [arm64-v8a, armeabi-v7a, x86_64, x86]
- build_type: [Debug, Release]
-
- steps:
- - uses: actions/checkout@v3
-
- - name: Setup Ninja
- uses: ashutoshvarma/setup-ninja@master
- with:
- version: 1.10.0
-
- - name: Setup NDK
- env:
- ANDROID_SDK_ROOT: /usr/local/lib/android/sdk
- run: |
- echo 'y' | ${{env.ANDROID_SDK_ROOT}}/cmdline-tools/latest/bin/sdkmanager --install 'ndk;${{env.NDK_VERSION}}'
-
- - name: Configure
- env:
- CXXFLAGS: -Wall -Wextra -Wpedantic -Wsign-conversion -Wtautological-compare -Wformat-nonliteral -Wundef -Werror ${{env.CXXFLAGS}}
- run: |
- cmake -S . -B build_${{matrix.abi}} \
- -DCMAKE_ANDROID_API=28 \
- -DCMAKE_ANDROID_ARCH_ABI=${{matrix.abi}} \
- -DCMAKE_ANDROID_NDK=/usr/local/lib/android/sdk/ndk/${{env.NDK_VERSION}} \
- -DCMAKE_ANDROID_STL_TYPE=c++_shared \
- -DCMAKE_BUILD_TYPE=${{matrix.build_type}} \
- -DCMAKE_CXX_EXTENSIONS=OFF \
- -DCMAKE_CXX_STANDARD=${{matrix.std}} \
- -DCMAKE_CXX_STANDARD_REQUIRED=ON \
- -DCMAKE_SYSTEM_NAME=Android \
- -G Ninja \
- -Werror
-
- - name: Build
- run: |
- cmake --build build_${{matrix.abi}} \
- --config ${{matrix.build_type}}
diff --git a/funasr/runtime/onnxruntime/third_party/glog/.github/workflows/cifuzz.yml b/funasr/runtime/onnxruntime/third_party/glog/.github/workflows/cifuzz.yml
deleted file mode 100644
index 091024b88..000000000
--- a/funasr/runtime/onnxruntime/third_party/glog/.github/workflows/cifuzz.yml
+++ /dev/null
@@ -1,26 +0,0 @@
-name: CIFuzz
-on: [pull_request]
-jobs:
- Fuzzing:
- runs-on: ubuntu-latest
- steps:
- - name: Build Fuzzers
- id: build
- uses: google/oss-fuzz/infra/cifuzz/actions/build_fuzzers@master
- with:
- oss-fuzz-project-name: 'glog'
- dry-run: false
- language: c++
- - name: Run Fuzzers
- uses: google/oss-fuzz/infra/cifuzz/actions/run_fuzzers@master
- with:
- oss-fuzz-project-name: 'glog'
- fuzz-seconds: 60
- dry-run: false
- language: c++
- - name: Upload Crash
- uses: actions/upload-artifact@v3
- if: failure() && steps.build.outcome == 'success'
- with:
- name: artifacts
- path: ./out/artifacts
diff --git a/funasr/runtime/onnxruntime/third_party/glog/.github/workflows/emscripten.yml b/funasr/runtime/onnxruntime/third_party/glog/.github/workflows/emscripten.yml
deleted file mode 100644
index 4bbb6dc56..000000000
--- a/funasr/runtime/onnxruntime/third_party/glog/.github/workflows/emscripten.yml
+++ /dev/null
@@ -1,52 +0,0 @@
-name: Emscripten
-
-on: [push, pull_request]
-
-jobs:
- build-linux:
- defaults:
- run:
- shell: bash
- name: Emscripten-C++${{matrix.std}}-${{matrix.build_type}}-${{matrix.lib}}
- runs-on: ubuntu-latest
- container: emscripten/emsdk
- strategy:
- fail-fast: true
- matrix:
- build_type: [Release, Debug]
- lib: [static]
- std: [14, 17, 20]
-
- steps:
- - uses: actions/checkout@v3
-
- - name: Setup Dependencies
- run: |
- apt-get update
- DEBIAN_FRONTEND=noninteractive sudo apt-get install -y \
- cmake \
- ninja-build
-
- - name: Configure
- env:
- CXXFLAGS: -Wall -Wextra -Wsign-conversion -Wtautological-compare -Wformat-nonliteral -Wundef -Werror -Wno-error=wasm-exception-spec ${{env.CXXFLAGS}}
- run: |
- cmake -S . -B build_${{matrix.build_type}} \
- -DBUILD_SHARED_LIBS=${{matrix.lib == 'shared'}} \
- -DCMAKE_AR=$(which emar) \
- -DCMAKE_CXX_COMPILER=$(which em++) \
- -DCMAKE_CXX_STANDARD=${{matrix.std}} \
- -DCMAKE_CXX_STANDARD_REQUIRED=ON \
- -DCMAKE_FIND_ROOT_PATH_MODE_INCLUDE=ONLY \
- -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
- -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ONLY \
- -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
- -DCMAKE_INSTALL_PREFIX=${{github.workspace}}/install \
- -DCMAKE_RANLIB=$(which emranlib) \
- -G Ninja \
- -Werror
-
- - name: Build
- run: |
- cmake --build build_${{matrix.build_type}} \
- --config ${{matrix.build_type}}
diff --git a/funasr/runtime/onnxruntime/third_party/glog/.github/workflows/linux.yml b/funasr/runtime/onnxruntime/third_party/glog/.github/workflows/linux.yml
deleted file mode 100644
index 9d29f597b..000000000
--- a/funasr/runtime/onnxruntime/third_party/glog/.github/workflows/linux.yml
+++ /dev/null
@@ -1,143 +0,0 @@
-name: Linux
-
-on: [push, pull_request]
-
-jobs:
- build-linux:
- defaults:
- run:
- shell: bash
- name: GCC-C++${{matrix.std}}-${{matrix.build_type}}-${{matrix.lib}}
- runs-on: ubuntu-22.04
- strategy:
- fail-fast: true
- matrix:
- build_type: [Release, Debug]
- lib: [shared, static]
- std: [14, 17, 20]
-
- steps:
- - uses: actions/checkout@v3
-
- - name: Setup Dependencies
- run: |
- sudo apt-get update
- DEBIAN_FRONTEND=noninteractive sudo apt-get install -y \
- build-essential \
- cmake \
- lcov \
- libgflags-dev \
- libunwind-dev \
- ninja-build
-
- - name: Cache GTest
- id: cache-gtest
- uses: actions/cache@v3
- with:
- path: gtest/
- key: ${{runner.os}}-gtest-1.11
-
- - name: Download GTest
- if: steps.cache-gtest.outputs.cache-hit != 'true'
- run: |
- wget https://github.com/google/googletest/archive/refs/tags/release-1.11.0.tar.gz
- tar xvf release-1.11.0.tar.gz
-
- - name: Build GTest
- if: steps.cache-gtest.outputs.cache-hit != 'true'
- run: |
- cmake -S googletest-release-1.11.0 -B build-googletest \
- -DBUILD_SHARED_LIBS=${{matrix.shared}} \
- -DCMAKE_BUILD_TYPE=${{matrix.build_type}} \
- -DCMAKE_INSTALL_PREFIX=${{github.workspace}}/gtest \
- -G Ninja
- cmake --build build-googletest --target install
-
- - name: Setup Environment
- if: matrix.build_type == 'Debug'
- run: |
- echo 'CXXFLAGS=--coverage' >> $GITHUB_ENV
- echo 'GTest_ROOT=${{github.workspace}}/gtest' >> $GITHUB_ENV
-
- - name: Configure
- env:
- CXXFLAGS: -Wall -Wextra -Wsign-conversion -Wtautological-compare -Wformat-nonliteral -Wundef -Werror ${{env.CXXFLAGS}}
- run: |
- cmake -S . -B build_${{matrix.build_type}} \
- -DBUILD_SHARED_LIBS=${{matrix.lib == 'shared'}} \
- -DCMAKE_CXX_STANDARD=${{matrix.std}} \
- -DCMAKE_CXX_STANDARD_REQUIRED=ON \
- -DCMAKE_INSTALL_PREFIX=${{github.workspace}}/install \
- -G Ninja \
- -Werror
-
- - name: Build
- run: |
- cmake --build build_${{matrix.build_type}} \
- --config ${{matrix.build_type}}
-
- - name: Install
- run: |
- cmake --build build_${{matrix.build_type}} \
- --config ${{matrix.build_type}} \
- --target install
-
- cmake build_${{matrix.build_type}} \
- -DCMAKE_INSTALL_INCLUDEDIR=${{runner.workspace}}/foo/include \
- -DCMAKE_INSTALL_LIBDIR=${{runner.workspace}}/foo/lib \
- -DCMAKE_INSTALL_DATAROOTDIR=${{runner.workspace}}/foo/share
- cmake --build build_${{matrix.build_type}} \
- --config ${{matrix.build_type}} \
- --target install
-
- - name: Test CMake Package (relative GNUInstallDirs)
- run: |
- cmake -S src/package_config_unittest/working_config \
- -B build_${{matrix.build_type}}_package \
- -DCMAKE_BUILD_TYPE=${{matrix.build_type}} \
- -DCMAKE_PREFIX_PATH=${{github.workspace}}/install \
- -G Ninja
- cmake --build build_${{matrix.build_type}}_package \
- --config ${{matrix.build_type}}
-
- - name: Test CMake Package (absolute GNUInstallDirs)
- run: |
- cmake -S src/package_config_unittest/working_config \
- -B build_${{matrix.build_type}}_package_foo \
- -DCMAKE_BUILD_TYPE=${{matrix.build_type}} \
- -DCMAKE_PREFIX_PATH=${{runner.workspace}}/foo \
- -G Ninja
- cmake --build build_${{matrix.build_type}}_package_foo \
- --config ${{matrix.build_type}}
-
- - name: Test
- run: |
- ctest --test-dir build_${{matrix.build_type}} -j$(nproc) --output-on-failure
-
- - name: Generate Coverage
- if: matrix.build_type == 'Debug'
- run: |
- lcov --directory . --capture --output-file coverage.info
- lcov --remove coverage.info \
- '${{github.workspace}}/gtest/*' \
- '*/src/*_unittest.cc' \
- '*/src/googletest.h' \
- '*/src/mock-log.h' \
- '/usr/*' \
- --output-file coverage.info
-
- for file in src/glog/*.h.in; do
- name=$(basename ${file})
- name_we=${name%.h.in}
- sed -i "s|build_${{matrix.build_type}}/glog/${name_we}.h\$|${file}|g" coverage.info
- done
-
- lcov --list coverage.info
-
- - name: Upload Coverage to Codecov
- if: matrix.build_type == 'Debug'
- uses: codecov/codecov-action@v3
- with:
- token: ${{ secrets.CODECOV_TOKEN }}
- fail_ci_if_error: true
- verbose: true
diff --git a/funasr/runtime/onnxruntime/third_party/glog/.github/workflows/macos.yml b/funasr/runtime/onnxruntime/third_party/glog/.github/workflows/macos.yml
deleted file mode 100644
index 51cf7ffb6..000000000
--- a/funasr/runtime/onnxruntime/third_party/glog/.github/workflows/macos.yml
+++ /dev/null
@@ -1,83 +0,0 @@
-name: macOS
-
-on: [push, pull_request]
-
-jobs:
- build-macos:
- name: AppleClang-C++${{matrix.std}}-${{matrix.build_type}}
- runs-on: macos-12
- strategy:
- fail-fast: true
- matrix:
- std: [14, 17, 20]
- include:
- - generator: Ninja
- - build_type: Debug
-
- steps:
- - uses: actions/checkout@v3
-
- - name: Setup Ninja
- uses: ashutoshvarma/setup-ninja@master
- with:
- version: 1.10.0
-
- - name: Setup Dependencies
- run: |
- brew install lcov
-
- - name: Setup Environment
- if: matrix.build_type == 'Debug'
- run: |
- echo 'CXXFLAGS=--coverage' >> $GITHUB_ENV
-
- - name: Configure
- shell: bash
- env:
- CXXFLAGS: -Wall -Wextra -Wsign-conversion -Wtautological-compare -Wformat-nonliteral -Wundef -Werror ${{env.CXXFLAGS}}
- run: |
- cmake -S . -B build_${{matrix.build_type}} \
- -DCMAKE_CXX_EXTENSIONS=OFF \
- -DCMAKE_CXX_FLAGS_DEBUG=-pedantic-errors \
- -DCMAKE_CXX_FLAGS_RELEASE=-pedantic-errors \
- -DCMAKE_CXX_STANDARD=${{matrix.std}} \
- -DCMAKE_CXX_STANDARD_REQUIRED=ON \
- -G "${{matrix.generator}}" \
- -Werror
-
- - name: Build
- run: |
- cmake --build build_${{matrix.build_type}} \
- --config ${{matrix.build_type}}
-
- - name: Test
- run: |
- ctest --test-dir build_${{matrix.build_type}} \
- --output-on-failure
-
- - name: Generate Coverage
- if: matrix.build_type == 'Debug'
- run: |
- lcov --directory . --capture --output-file coverage.info
- lcov --remove coverage.info \
- '*/src/*_unittest.cc' \
- '*/src/googletest.h' \
- '*/src/mock-log.h' \
- '*/usr/*' \
- --output-file coverage.info
-
- for file in src/glog/*.h.in; do
- name=$(basename ${file})
- name_we=${name%.h.in}
- sed -i "" "s|${{github.workspace}}/glog/${name_we}.h\$|${file}|g" coverage.info
- done
-
- lcov --list coverage.info
-
- - name: Upload Coverage to Codecov
- if: matrix.build_type == 'Debug'
- uses: codecov/codecov-action@v3
- with:
- token: ${{ secrets.CODECOV_TOKEN }}
- fail_ci_if_error: true
- verbose: true
diff --git a/funasr/runtime/onnxruntime/third_party/glog/.github/workflows/windows.yml b/funasr/runtime/onnxruntime/third_party/glog/.github/workflows/windows.yml
deleted file mode 100644
index 158e8d1db..000000000
--- a/funasr/runtime/onnxruntime/third_party/glog/.github/workflows/windows.yml
+++ /dev/null
@@ -1,224 +0,0 @@
-name: Windows
-
-on: [push, pull_request]
-
-jobs:
- build-msvc:
- name: ${{matrix.msvc}}-${{matrix.arch}}-C++${{matrix.std}}-${{matrix.build_type}}-${{matrix.lib}}
- runs-on: ${{matrix.os}}
- defaults:
- run:
- shell: powershell
- env:
- CL: /MP
- CXXFLAGS: /WX /permissive-
- strategy:
- fail-fast: true
- matrix:
- arch: [Win32, x64]
- build_type: [Debug, Release]
- lib: [shared, static]
- msvc: [VS-16-2019, VS-17-2022]
- std: [14, 17, 20]
- include:
- - msvc: VS-16-2019
- os: windows-2019
- generator: 'Visual Studio 16 2019'
- - msvc: VS-17-2022
- os: windows-2022
- generator: 'Visual Studio 17 2022'
-
- steps:
- - uses: actions/checkout@v3
-
- - name: Cache GTest
- id: cache-gtest
- uses: actions/cache@v3
- with:
- path: gtest/
- key: ${{runner.os}}-gtest-1.11-${{matrix.lib}}-${{matrix.arch}}-${{matrix.build_type}}
-
- - name: Download GTest
- if: steps.cache-gtest.outputs.cache-hit != 'true'
- run: |
- (New-Object System.Net.WebClient).DownloadFile("https://github.com/google/googletest/archive/refs/tags/release-1.11.0.zip", "release-1.11.0.zip")
- Expand-Archive release-1.11.0.zip .
-
- - name: Build GTest
- if: steps.cache-gtest.outputs.cache-hit != 'true'
- run: |
- cmake -S googletest-release-1.11.0 -B build-googletest `
- -A ${{matrix.arch}} `
- -DBUILD_SHARED_LIBS=${{matrix.lib == 'shared'}} `
- -Dgtest_force_shared_crt=ON `
- -DCMAKE_INSTALL_PREFIX=${{github.workspace}}/gtest
- cmake --build build-googletest `
- --config ${{matrix.build_type}} `
- --target install
-
- - name: Cache gflags
- id: cache-gflags
- uses: actions/cache@v3
- with:
- path: gflags/
- key: ${{runner.os}}-gflags-2.2.2-${{matrix.lib}}-${{matrix.arch}}-${{matrix.build_type}}
-
- - name: Download gflags
- if: steps.cache-gflags.outputs.cache-hit != 'true'
- run: |
- (New-Object System.Net.WebClient).DownloadFile("https://github.com/gflags/gflags/archive/refs/tags/v2.2.2.zip", "v2.2.2.zip")
- Expand-Archive v2.2.2.zip .
-
- - name: Build gflags
- if: steps.cache-gflags.outputs.cache-hit != 'true'
- run: |
- cmake -S gflags-2.2.2 -B build-gflags `
- -A ${{matrix.arch}} `
- -DBUILD_SHARED_LIBS=${{matrix.lib == 'shared'}} `
- -DCMAKE_INSTALL_PREFIX=${{github.workspace}}/gflags
- cmake --build build-gflags `
- --config ${{matrix.build_type}} `
- --target install
-
- - name: Setup Environment
- run: |
- echo "GTest_ROOT=$((Get-Item .).FullName)/gtest" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append
- echo "gflags_ROOT=$((Get-Item .).FullName)/gflags" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append
- echo "${{github.workspace}}/gtest/bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
- echo "${{github.workspace}}/gflags/bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
-
- - name: Setup Release Environment
- if: matrix.build_type != 'Debug'
- run: |
- echo "CXXFLAGS=/Zi ${{env.CXXFLAGS}}" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append
-
- - name: Configure
- run: |
- cmake -S . -B build_${{matrix.build_type}} `
- -A ${{matrix.arch}} `
- -DBUILD_SHARED_LIBS=${{matrix.lib == 'shared'}} `
- -DCMAKE_CXX_EXTENSIONS=OFF `
- -DCMAKE_CXX_STANDARD=${{matrix.std}} `
- -DCMAKE_CXX_STANDARD_REQUIRED=ON `
- -DCMAKE_EXE_LINKER_FLAGS='/NOIMPLIB' `
- -DCMAKE_EXE_LINKER_FLAGS_RELEASE='/INCREMENTAL:NO /DEBUG' `
- -DCMAKE_INSTALL_PREFIX:PATH=./install `
- -DCMAKE_MSVC_RUNTIME_LIBRARY='MultiThreaded$<$:Debug>DLL' `
- -G "${{matrix.generator}}" `
- -Werror
-
- - name: Build
- run: cmake --build build_${{matrix.build_type}} `
- --config ${{matrix.build_type}}
-
- - name: Test
- env:
- CTEST_OUTPUT_ON_FAILURE: 1
- run: |
- cmake --build build_${{matrix.build_type}}/ `
- --config ${{matrix.build_type}} `
- --target RUN_TESTS
-
- - name: Install
- run: |
- cmake --build build_${{matrix.build_type}}/ `
- --config ${{matrix.build_type}} `
- --target install
-
- build-mingw:
- name: ${{matrix.sys}}-${{matrix.env}}-C++${{matrix.std}}-${{matrix.build_type}}-${{matrix.lib}}
- runs-on: windows-2022
- env:
- BUILDDIR: 'build_${{matrix.sys}}-${{matrix.env}}-C++${{matrix.std}}-${{matrix.build_type}}-${{matrix.lib}}'
- defaults:
- run:
- shell: msys2 {0}
- strategy:
- fail-fast: true
- matrix:
- build_type: [Debug]
- lib: [shared, static]
- std: [14, 17, 20]
- sys: [mingw32, mingw64]
- include:
- - sys: mingw32
- env: i686
- - sys: mingw64
- env: x86_64
-
- steps:
- - uses: actions/checkout@v3
- - uses: msys2/setup-msys2@v2
- with:
- msystem: ${{matrix.sys}}
- install: >-
- lcov
- mingw-w64-${{matrix.env}}-cmake
- mingw-w64-${{matrix.env}}-gcc
- mingw-w64-${{matrix.env}}-gflags
- mingw-w64-${{matrix.env}}-ninja
-
- - name: Setup Environment
- if: matrix.build_type == 'Debug'
- run: |
- echo 'CXXFLAGS=--coverage ${{env.CXXFLAGS}}' >> $GITHUB_ENV
-
- - name: Configure
- env:
- CXXFLAGS: -Wall -Wextra -Wpedantic -Wsign-conversion -Wtautological-compare -Wformat-nonliteral -Wundef -Werror ${{env.CXXFLAGS}}
- run: |
- cmake -S . -B build_${{matrix.build_type}}/ \
- -DBUILD_SHARED_LIBS=${{matrix.lib == 'shared'}} \
- -DCMAKE_BUILD_TYPE=${{matrix.build_type}} \
- -DCMAKE_CXX_EXTENSIONS=OFF \
- -DCMAKE_CXX_STANDARD=${{matrix.std}} \
- -DCMAKE_CXX_STANDARD_REQUIRED=ON \
- -DCMAKE_INSTALL_PREFIX:PATH=./install \
- -G Ninja \
- -Werror
-
- - name: Build
- run: |
- cmake --build build_${{matrix.build_type}}/ --config ${{matrix.build_type}}
-
- - name: Test
- env:
- CTEST_OUTPUT_ON_FAILURE: 1
- run: |
- cmake --build build_${{matrix.build_type}}/ --config ${{matrix.build_type}} \
- --target test
-
- - name: Install
- run: |
- cmake --build build_${{matrix.build_type}}/ \
- --config ${{matrix.build_type}} \
- --target install
-
- - name: Generate Coverage
- if: matrix.build_type == 'Debug'
- run: |
- lcov --directory . --capture --output-file coverage.info
- lcov --remove coverage.info \
- '*/install/include/*' \
- '*/msys64/mingw32/*' \
- '*/msys64/mingw64/*' \
- '*/src/*_unittest.cc' \
- '*/src/googletest.h' \
- '*/src/mock-log.h' \
- --output-file coverage.info
-
- for file in src/glog/*.h.in; do
- name=$(basename ${file})
- name_we=${name%.h.in}
- sed -i "s|build_${{matrix.build_type}}/glog/${name_we}.h\$|${file}|g" coverage.info
- done
-
- lcov --list coverage.info
-
- - name: Upload Coverage to Codecov
- if: matrix.build_type == 'Debug'
- uses: codecov/codecov-action@v3
- with:
- token: ${{ secrets.CODECOV_TOKEN }}
- fail_ci_if_error: true
- verbose: true
diff --git a/funasr/runtime/onnxruntime/third_party/glog/AUTHORS b/funasr/runtime/onnxruntime/third_party/glog/AUTHORS
deleted file mode 100644
index 9d711ec62..000000000
--- a/funasr/runtime/onnxruntime/third_party/glog/AUTHORS
+++ /dev/null
@@ -1,29 +0,0 @@
-# This is the official list of glog authors for copyright purposes.
-# This file is distinct from the CONTRIBUTORS files.
-# See the latter for an explanation.
-#
-# Names should be added to this file as:
-# Name or Organization
-# The email address is not required for organizations.
-#
-# Please keep the list sorted.
-
-Abhishek Dasgupta
-Abhishek Parmar
-Andrew Schwartzmeyer
-Andy Ying
-Brian Silverman
-Dmitriy Arbitman
-Google Inc.
-Guillaume Dumont
-Marco Wang
-Michael Tanner
-MiniLight
-romange
-Roman Perepelitsa
-Sergiu Deitsch
-tbennun
-Teddy Reed
-Vijaymahantesh Sattigeri
-Zhongming Qu
-Zhuoran Shen
diff --git a/funasr/runtime/onnxruntime/third_party/glog/BUILD.bazel b/funasr/runtime/onnxruntime/third_party/glog/BUILD.bazel
deleted file mode 100644
index 0acdc72b6..000000000
--- a/funasr/runtime/onnxruntime/third_party/glog/BUILD.bazel
+++ /dev/null
@@ -1,22 +0,0 @@
-licenses(["notice"])
-
-exports_files(["COPYING"])
-
-load(":bazel/glog.bzl", "glog_library")
-
-glog_library()
-
-# platform() to build with clang-cl on Bazel CI. This is enabled with
-# the flags in .bazelci/presubmit.yml:
-#
-# --incompatible_enable_cc_toolchain_resolution
-# --extra_toolchains=@local_config_cc//:cc-toolchain-x64_windows-clang-cl
-# --extra_execution_platforms=//:x64_windows-clang-cl
-platform(
- name = "x64_windows-clang-cl",
- constraint_values = [
- "@platforms//cpu:x86_64",
- "@platforms//os:windows",
- "@bazel_tools//tools/cpp:clang-cl",
- ],
-)
diff --git a/funasr/runtime/onnxruntime/third_party/glog/CONTRIBUTORS b/funasr/runtime/onnxruntime/third_party/glog/CONTRIBUTORS
deleted file mode 100644
index 05cb688cd..000000000
--- a/funasr/runtime/onnxruntime/third_party/glog/CONTRIBUTORS
+++ /dev/null
@@ -1,52 +0,0 @@
-# People who have agreed to one of the CLAs and can contribute patches.
-# The AUTHORS file lists the copyright holders; this file
-# lists people. For example, Google employees are listed here
-# but not in AUTHORS, because Google holds the copyright.
-#
-# Names should be added to this file only after verifying that
-# the individual or the individual's organization has agreed to
-# the appropriate Contributor License Agreement, found here:
-#
-# https://developers.google.com/open-source/cla/individual
-# https://developers.google.com/open-source/cla/corporate
-#
-# The agreement for individuals can be filled out on the web.
-#
-# When adding J Random Contributor's name to this file,
-# either J's name or J's organization's name should be
-# added to the AUTHORS file, depending on whether the
-# individual or corporate CLA was used.
-#
-# Names should be added to this file as:
-# Name
-#
-# Please keep the list sorted.
-
-Abhishek Dasgupta
-Abhishek Parmar
-Andrew Schwartzmeyer
-Andy Ying
-Bret McKee
-Brian Silverman
-Dmitriy Arbitman
-Fumitoshi Ukai
-Guillaume Dumont
-Håkan L. S. Younes
-Ivan Penkov
-Jacob Trimble
-Jim Ray
-Marco Wang
-Michael Darr
-Michael Tanner
-MiniLight
-Peter Collingbourne
-Rodrigo Queiro
-romange
-Roman Perepelitsa
-Sergiu Deitsch
-Shinichiro Hamaji
-tbennun
-Teddy Reed
-Vijaymahantesh Sattigeri
-Zhongming Qu
-Zhuoran Shen
diff --git a/funasr/runtime/onnxruntime/third_party/glog/COPYING b/funasr/runtime/onnxruntime/third_party/glog/COPYING
deleted file mode 100644
index 38396b580..000000000
--- a/funasr/runtime/onnxruntime/third_party/glog/COPYING
+++ /dev/null
@@ -1,65 +0,0 @@
-Copyright (c) 2008, Google Inc.
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without
-modification, are permitted provided that the following conditions are
-met:
-
- * Redistributions of source code must retain the above copyright
-notice, this list of conditions and the following disclaimer.
- * Redistributions in binary form must reproduce the above
-copyright notice, this list of conditions and the following disclaimer
-in the documentation and/or other materials provided with the
-distribution.
- * Neither the name of Google Inc. nor the names of its
-contributors may be used to endorse or promote products derived from
-this software without specific prior written permission.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-
-A function gettimeofday in utilities.cc is based on
-
-http://www.google.com/codesearch/p?hl=en#dR3YEbitojA/COPYING&q=GetSystemTimeAsFileTime%20license:bsd
-
-The license of this code is:
-
-Copyright (c) 2003-2008, Jouni Malinen and contributors
-All Rights Reserved.
-
-Redistribution and use in source and binary forms, with or without
-modification, are permitted provided that the following conditions are
-met:
-
-1. Redistributions of source code must retain the above copyright
- notice, this list of conditions and the following disclaimer.
-
-2. Redistributions in binary form must reproduce the above copyright
- notice, this list of conditions and the following disclaimer in the
- documentation and/or other materials provided with the distribution.
-
-3. Neither the name(s) of the above-listed copyright holder(s) nor the
- names of its contributors may be used to endorse or promote products
- derived from this software without specific prior written permission.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/funasr/runtime/onnxruntime/third_party/glog/ChangeLog b/funasr/runtime/onnxruntime/third_party/glog/ChangeLog
deleted file mode 100644
index a107e9391..000000000
--- a/funasr/runtime/onnxruntime/third_party/glog/ChangeLog
+++ /dev/null
@@ -1,109 +0,0 @@
-2022-04-05 Google Inc.
-
- * google-glog: version 0.6.0.
- * See git log for the details.
-
-2021-05-08 Google Inc.
-
- * google-glog: version 0.5.0.
- * See git log for the details.
-
-2019-01-22 Google Inc.
-
- * google-glog: version 0.4.0.
- * See git log for the details.
-
-2017-05-09 Google Inc.
-
- * google-glog: version 0.3.5
- * See git log for the details.
-
-2015-03-09 Google Inc.
-
- * google-glog: version 0.3.4
- * See git log for the details.
-
-2013-02-01 Google Inc.
-
- * google-glog: version 0.3.3
- * Add --disable-rtti option for configure.
- * Visual Studio build and test fix.
- * QNX build fix (thanks vanuan).
- * Reduce warnings.
- * Fixed LOG_SYSRESULT (thanks ukai).
- * FreeBSD build fix (thanks yyanagisawa).
- * Clang build fix.
- * Now users can re-initialize glog after ShutdownGoogleLogging.
- * Color output support by GLOG_colorlogtostderr (thanks alexs).
- * Now glog's ABI around flags are compatible with gflags.
- * Document mentions how to modify flags from user programs.
-
-2012-01-12 Google Inc.
-
- * google-glog: version 0.3.2
- * Clang support.
- * Demangler and stacktrace improvement for newer GCCs.
- * Now fork(2) doesn't mess up log files.
- * Make valgrind happier.
- * Reduce warnings for more -W options.
- * Provide a workaround for ERROR defined by windows.h.
-
-2010-06-15 Google Inc.
-
- * google-glog: version 0.3.1
- * GLOG_* environment variables now work even when gflags is installed.
- * Snow leopard support.
- * Now we can build and test from out side tree.
- * Add DCHECK_NOTNULL.
- * Add ShutdownGoogleLogging to close syslog (thanks DGunchev)
- * Fix --enable-frame-pointers option (thanks kazuki.ohta)
- * Fix libunwind detection (thanks giantchen)
-
-2009-07-30 Google Inc.
-
- * google-glog: version 0.3.0
- * Fix a deadlock happened when user uses glog with recent gflags.
- * Suppress several unnecessary warnings (thanks keir).
- * NetBSD and OpenBSD support.
- * Use Win32API GetComputeNameA properly (thanks magila).
- * Fix user name detection for Windows (thanks ademin).
- * Fix several minor bugs.
-
-2009-04-10 Google Inc.
- * google-glog: version 0.2.1
- * Fix timestamps of VC++ version.
- * Add pkg-config support (thanks Tomasz)
- * Fix build problem when building with gtest (thanks Michael)
- * Add --with-gflags option for configure (thanks Michael)
- * Fixes for GCC 4.4 (thanks John)
-
-2009-01-23 Google Inc.
- * google-glog: version 0.2
- * Add initial Windows VC++ support.
- * Google testing/mocking frameworks integration.
- * Link pthread library automatically.
- * Flush logs in signal handlers.
- * Add macros LOG_TO_STRING, LOG_AT_LEVEL, DVLOG, and LOG_TO_SINK_ONLY.
- * Log microseconds.
- * Add --log_backtrace_at option.
- * Fix some minor bugs.
-
-2008-11-18 Google Inc.
- * google-glog: version 0.1.2
- * Add InstallFailureSignalHandler(). (satorux)
- * Re-organize the way to produce stacktraces.
- * Don't define unnecessary macro DISALLOW_EVIL_CONSTRUCTORS.
-
-2008-10-15 Google Inc.
- * google-glog: version 0.1.1
- * Support symbolize for MacOSX 10.5.
- * BUG FIX: --vmodule didn't work with gflags.
- * BUG FIX: symbolize_unittest failed with GCC 4.3.
- * Several fixes on the document.
-
-2008-10-07 Google Inc.
-
- * google-glog: initial release:
- The glog package contains a library that implements application-level
- logging. This library provides logging APIs based on C++-style
- streams and various helper macros.
diff --git a/funasr/runtime/onnxruntime/third_party/glog/README.rst b/funasr/runtime/onnxruntime/third_party/glog/README.rst
deleted file mode 100644
index 29a38d57a..000000000
--- a/funasr/runtime/onnxruntime/third_party/glog/README.rst
+++ /dev/null
@@ -1,890 +0,0 @@
-Google Logging Library
-======================
-
-|Linux Github actions| |Windows Github actions| |macOS Github actions| |Codecov|
-
-Google Logging (glog) is a C++14 library that implements application-level
-logging. The library provides logging APIs based on C++-style streams and
-various helper macros.
-
-.. role:: cmake(code)
- :language: cmake
-
-.. role:: cmd(code)
- :language: bash
-
-.. role:: cpp(code)
- :language: cpp
-
-.. role:: bazel(code)
- :language: starlark
-
-
-Getting Started
----------------
-
-You can log a message by simply streaming things to ``LOG``\ (<severity level>), e.g.,
-
-.. code:: cpp
-
- #include <glog/logging.h>
-
- int main(int argc, char* argv[]) {
- // Initialize Google’s logging library.
- google::InitGoogleLogging(argv[0]);
-
- // ...
- LOG(INFO) << "Found " << num_cookies << " cookies";
- }
-
-
-For a detailed overview of glog features and their usage, please refer
-to the `user guide <#user-guide>`__.
-
-.. contents:: Table of Contents
-
-
-Building from Source
---------------------
-
-glog supports multiple build systems for compiling the project from
-source: `Bazel <#bazel>`__, `CMake <#cmake>`__, `vcpkg <#vcpkg>`__, and `conan <#conan>`__.
-
-Bazel
-~~~~~
-
-To use glog within a project which uses the
-`Bazel <https://bazel.build/>`__ build tool, add the following lines to
-your ``WORKSPACE`` file:
-
-.. code:: bazel
-
- load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
-
- http_archive(
- name = "com_github_gflags_gflags",
- sha256 = "34af2f15cf7367513b352bdcd2493ab14ce43692d2dcd9dfc499492966c64dcf",
- strip_prefix = "gflags-2.2.2",
- urls = ["https://github.com/gflags/gflags/archive/v2.2.2.tar.gz"],
- )
-
- http_archive(
- name = "com_github_google_glog",
- sha256 = "122fb6b712808ef43fbf80f75c52a21c9760683dae470154f02bddfc61135022",
- strip_prefix = "glog-0.6.0",
- urls = ["https://github.com/google/glog/archive/v0.6.0.zip"],
- )
-
-You can then add :bazel:`@com_github_google_glog//:glog` to the deps section
-of a :bazel:`cc_binary` or :bazel:`cc_library` rule, and :code:`#include
-<glog/logging.h>` to include it in your source code. Here’s a simple example:
-
-.. code:: bazel
-
- cc_binary(
- name = "main",
- srcs = ["main.cc"],
- deps = ["@com_github_google_glog//:glog"],
- )
-
-CMake
-~~~~~
-
-glog also supports CMake, which can be used to build the project on a wide
-range of platforms. If you don’t have CMake installed already, you can
-download it for free from CMake’s `official
-website <https://cmake.org/>`__.
-
-CMake works by generating native makefiles or build projects that can be
-used in the compiler environment of your choice. You can either build
-glog with CMake as a standalone project or it can be incorporated into
-an existing CMake build for another project.
-
-Building glog with CMake
-^^^^^^^^^^^^^^^^^^^^^^^^
-
-When building glog as a standalone project, on Unix-like systems with
-GNU Make as build tool, the typical workflow is:
-
-1. Get the source code and change to it. e.g., cloning with git:
-
- .. code:: bash
-
- git clone https://github.com/google/glog.git
- cd glog
-
-2. Run CMake to configure the build tree.
-
- .. code:: bash
-
- cmake -S . -B build -G "Unix Makefiles"
-
- CMake provides different generators, and by default will pick the most
- relevant one to your environment. If you need a specific version of Visual
- Studio, use :cmd:`cmake . -G `, and see :cmd:`cmake --help`
- for the available generators. Also see :cmd:`-T `, which can
- be used to request the native x64 toolchain with :cmd:`-T host=x64`.
-
-3. Afterwards, generated files can be used to compile the project.
-
- .. code:: bash
-
- cmake --build build
-
-4. Test the built software (optional).
-
- .. code:: bash
-
- cmake --build build --target test
-
-5. Install the built files (optional).
-
- .. code:: bash
-
- cmake --build build --target install
-
-Consuming glog in a CMake Project
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-If you have glog installed in your system, you can use the CMake command
-:cmake:`find_package` to build against glog in your CMake Project as follows:
-
-.. code:: cmake
-
- cmake_minimum_required (VERSION 3.16)
- project (myproj VERSION 1.0)
-
- find_package (glog 0.6.0 REQUIRED)
-
- add_executable (myapp main.cpp)
- target_link_libraries (myapp glog::glog)
-
-Compile definitions and options will be added automatically to your
-target as needed.
-
-Incorporating glog into a CMake Project
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-You can also use the CMake command :cmake:`add_subdirectory` to include glog
-directly from a subdirectory of your project by replacing the
-:cmake:`find_package` call from the previous example by
-:cmake:`add_subdirectory`. The :cmake:`glog::glog` target is in this case an
-:cmake:`ALIAS` library target for the ``glog`` library target.
-
-Again, compile definitions and options will be added automatically to
-your target as needed.
-
-vcpkg
-~~~~~
-
-You can download and install glog using the `vcpkg
-<https://github.com/microsoft/vcpkg>`__ dependency manager:
-
-.. code:: bash
-
- git clone https://github.com/Microsoft/vcpkg.git
- cd vcpkg
- ./bootstrap-vcpkg.sh
- ./vcpkg integrate install
- ./vcpkg install glog
-
-The glog port in vcpkg is kept up to date by Microsoft team members and
-community contributors. If the version is out of date, please create an
-issue or pull request on the vcpkg repository.
-
-conan
-~~~~~
-
-You can download and install glog using the `conan
-<https://conan.io>`__ package manager:
-
-.. code:: bash
-
- pip install conan
- conan install -r conancenter glog/<glog_version>@
-
-The glog recipe in conan center is kept up to date by conan center index community
-contributors. If the version is out of date, please create an
-issue or pull request on the `conan-center-index
-`__ repository.
-
-User Guide
-----------
-
-glog defines a series of macros that simplify many common logging tasks.
-You can log messages by severity level, control logging behavior from
-the command line, log based on conditionals, abort the program when
-expected conditions are not met, introduce your own verbose logging
-levels, customize the prefix attached to log messages, and more.
-
-The following sections describe the functionality supported by glog. Note that
-this description is not exhaustive but covers the most useful features. For
-less common features, please check the header files under the `src/glog
-<https://github.com/google/glog/tree/master/src/glog>`__ directory.
-
-Severity Levels
-~~~~~~~~~~~~~~~
-
-You can specify one of the following severity levels (in increasing
-order of severity): ``INFO``, ``WARNING``, ``ERROR``, and ``FATAL``.
-Logging a ``FATAL`` message terminates the program (after the message is
-logged). Note that messages of a given severity are logged not only in
-the logfile for that severity, but also in all logfiles of lower
-severity. E.g., a message of severity ``FATAL`` will be logged to the
-logfiles of severity ``FATAL``, ``ERROR``, ``WARNING``, and ``INFO``.
-
-The ``DFATAL`` severity logs a ``FATAL`` error in debug mode (i.e.,
-there is no ``NDEBUG`` macro defined), but avoids halting the program in
-production by automatically reducing the severity to ``ERROR``.
-
-Unless otherwise specified, glog writes to the filename
-``/tmp/\<program name\>.\<hostname\>.\<user name\>.log.\<severity level\>.\<date\>-\<time\>.\<pid\>``