This commit is contained in:
游雁 2023-07-27 14:37:55 +08:00
parent 89ab5d5a3b
commit e22f256ee6
27 changed files with 595 additions and 650 deletions

View File

@ -71,12 +71,10 @@ Overview
:maxdepth: 1
:caption: Runtime and Service
./funasr/export/README.md
./funasr/runtime/python/onnxruntime/README.md
./funasr/runtime/docs/SDK_tutorial.md
./funasr/runtime/python/websocket/README.md
./funasr/runtime/websocket/readme.md
./funasr/runtime/html5/readme.md
./funasr/runtime/python/libtorch/README.md

View File

@ -0,0 +1 @@
../TEMPLATE/README_zh.md

View File

@ -1,58 +0,0 @@
# ModelScope Model
## How to finetune and infer using a pretrained Paraformer-large Model
### Finetune
- Modify finetune training related parameters in `finetune.py`
- <strong>output_dir:</strong> # result dir
- <strong>data_dir:</strong> # the dataset dir needs to include files: `train/wav.scp`, `train/text`; `validation/wav.scp`, `validation/text`
- <strong>dataset_type:</strong> # for datasets larger than 1000 hours, set it to `large`; otherwise set it to `small`
- <strong>batch_bins:</strong> # batch size. If `dataset_type` is `small`, `batch_bins` indicates the number of feature frames; if `dataset_type` is `large`, `batch_bins` indicates the duration in ms
- <strong>max_epoch:</strong> # number of training epochs
- <strong>lr:</strong> # learning rate
- Then you can run the pipeline to finetune with:
```shell
python finetune.py
```
### Inference
You can also use a pretrained or finetuned model for inference directly.
- Setting parameters in `infer.sh`
- <strong>model:</strong> # model name on ModelScope
- <strong>data_dir:</strong> # the dataset dir needs to include `test/wav.scp`. If `test/text` also exists, the CER will be computed
- <strong>output_dir:</strong> # result dir
- <strong>batch_size:</strong> # batch size for inference
- <strong>gpu_inference:</strong> # whether to perform GPU decoding; set to false for CPU decoding
- <strong>gpuid_list:</strong> # set the GPUs to use, e.g., gpuid_list="0,1"
- <strong>njob:</strong> # the number of jobs for CPU decoding; only used when `gpu_inference`=false
- Then you can run the pipeline to infer with:
```shell
sh infer.sh
```
- Results
The decoding results can be found in `$output_dir/1best_recog/text.cer`, which includes recognition results of each sample and the CER metric of the whole test set.
### Inference using local finetuned model
- Modify inference related parameters in `infer_after_finetune.py`
- <strong>modelscope_model_name: </strong> # model name on ModelScope
- <strong>output_dir:</strong> # result dir
- <strong>data_dir:</strong> # the dataset dir needs to include `test/wav.scp`. If `test/text` also exists, the CER will be computed
- <strong>decoding_model_name:</strong> # set the checkpoint name for decoding, e.g., `valid.cer_ctc.ave.pb`
- <strong>batch_size:</strong> # batch size for inference
- Then you can run the pipeline to infer with:
```shell
python infer_after_finetune.py
```
- Results
The decoding results can be found in `$output_dir/decoding_results/text.cer`, which includes recognition results of each sample and the CER metric of the whole test set.

View File

@ -0,0 +1 @@
../asr/TEMPLATE

View File

@ -1,246 +0,0 @@
# Speech Recognition
> **Note**:
> The modelscope pipeline supports inference and finetuning for all the models in the [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope). Here we take typical models as examples to demonstrate the usage.
## Inference
### Quick start
#### [Paraformer Model](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
)
rec_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav')
print(rec_result)
```
#### [Paraformer-online Model](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/summary)
```python
inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model='damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online',
)
import soundfile
speech, sample_rate = soundfile.read("example/asr_example.wav")
param_dict = {"cache": dict(), "is_final": False}
chunk_stride = 7680  # 480ms
# first chunk, 480ms
speech_chunk = speech[0:chunk_stride]
rec_result = inference_pipeline(audio_in=speech_chunk, param_dict=param_dict)
print(rec_result)
# next chunk, 480ms
speech_chunk = speech[chunk_stride:chunk_stride+chunk_stride]
rec_result = inference_pipeline(audio_in=speech_chunk, param_dict=param_dict)
print(rec_result)
```
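A fuller loop over all chunks, continuing the snippet above, might look like the following sketch; handling the last chunk via `is_final` is an assumption based on the `param_dict` fields shown here, not the exact demo code.
```python
# sketch: stream the whole file in 480 ms chunks, reusing the same cache
for start in range(0, len(speech), chunk_stride):
    speech_chunk = speech[start:start + chunk_stride]
    # assumption: mark the last chunk so the model can flush its internal state
    param_dict["is_final"] = start + chunk_stride >= len(speech)
    rec_result = inference_pipeline(audio_in=speech_chunk, param_dict=param_dict)
    print(rec_result)
```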
For the full demo code, please refer to [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/241)
#### [UniASR Model](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-online/summary)
The UniASR model has three decoding modes (`fast`, `normal`, `offline`); for more model details, please refer to [docs](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-online/summary)
```python
decoding_model = "fast"  # "fast", "normal", "offline"
inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model='damo/speech_UniASR_asr_2pass-minnan-16k-common-vocab3825',
param_dict={"decoding_model": decoding_model})
rec_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav')
print(rec_result)
```
The `fast` and `normal` decoding modes are simulated (non-real-time) streaming, which can be used to evaluate recognition accuracy.
For the full demo code, please refer to [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/151)
#### [RNN-T-online model]()
To be done.
#### [MFCCA Model](https://www.modelscope.cn/models/NPU-ASLP/speech_mfcca_asr-zh-cn-16k-alimeeting-vocab4950/summary)
For more model details, please refer to [docs](https://www.modelscope.cn/models/NPU-ASLP/speech_mfcca_asr-zh-cn-16k-alimeeting-vocab4950/summary)
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model='NPU-ASLP/speech_mfcca_asr-zh-cn-16k-alimeeting-vocab4950',
model_revision='v3.0.0'
)
rec_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav')
print(rec_result)
```
#### API-reference
##### Define pipeline
- `task`: `Tasks.auto_speech_recognition`
- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path in local disk
- `ngpu`: `1` (Default), decoding on GPU. If ngpu=0, decoding on CPU
- `ncpu`: `1` (Default), sets the number of threads used for intraop parallelism on CPU
- `output_dir`: `None` (Default), the output path of results if set
- `batch_size`: `1` (Default), batch size when decoding
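For example, a CPU-only pipeline that batches utterances and writes its results to disk could be defined as in the sketch below, assuming these keyword arguments are passed straight to `pipeline()` as described above:
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# sketch: decode on CPU with 4 threads, batch 16 utterances, save results to ./results
inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
    ngpu=0,
    ncpu=4,
    batch_size=16,
    output_dir='./results',
)
```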
##### Infer pipeline
- `audio_in`: the input to decode, which could be:
- wav_path, `e.g.`: asr_example.wav,
- pcm_path, `e.g.`: asr_example.pcm,
- audio bytes stream, `e.g.`: bytes data from a microphone
- audio samples, `e.g.`: `audio, rate = soundfile.read("asr_example_zh.wav")`, with dtype numpy.ndarray or torch.Tensor
- wav.scp, kaldi style wav list (`wav_id \t wav_path`), `e.g.`:
```text
asr_example1 ./audios/asr_example1.wav
asr_example2 ./audios/asr_example2.wav
```
When using `wav.scp` input, `output_dir` must be set to save the output results (see the sketch after this list)
- `audio_fs`: audio sampling rate, only set when audio_in is pcm audio
- `output_dir`: None (Default), the output path of results if set
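A sketch of the `wav.scp` case (the data path here is illustrative), with `output_dir` set so the results can be written to disk:
```python
# sketch: decode a kaldi-style wav list; output_dir is required for wav.scp input
inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
    output_dir='./results',
)
rec_result = inference_pipeline(audio_in='./data/test/wav.scp')
```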
### Inference with multi-thread CPUs or multi GPUs
FunASR also offers the recipe [egs_modelscope/asr/TEMPLATE/infer.sh](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr/TEMPLATE/infer.sh) to decode with multi-thread CPUs or multiple GPUs.
- Setting parameters in `infer.sh`
- `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path in local disk
- `data_dir`: the dataset dir needs to include `wav.scp`. If `${data_dir}/text` also exists, the CER will be computed
- `output_dir`: output dir of the recognition results
- `batch_size`: `64` (Default), batch size of inference on gpu
- `gpu_inference`: `true` (Default), whether to perform gpu decoding, set false for CPU inference
- `gpuid_list`: `0,1` (Default), which gpu_ids are used to infer
- `njob`: only used for CPU inference (`gpu_inference`=`false`), `64` (Default), the number of jobs for CPU decoding
- `checkpoint_dir`: only used for infer finetuned models, the path dir of finetuned models
- `checkpoint_name`: only used for infer finetuned models, `valid.cer_ctc.ave.pb` (Default), which checkpoint is used to infer
- `decoding_mode`: `normal` (Default), decoding mode for the UniASR model (fast, normal, offline)
- `hotword_txt`: `None` (Default), hotword file for the contextual paraformer model (the hotword file name ends with .txt)
- Decode with multi GPUs:
```shell
bash infer.sh \
--model "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" \
--data_dir "./data/test" \
--output_dir "./results" \
--batch_size 64 \
--gpu_inference true \
--gpuid_list "0,1"
```
- Decode with multi-thread CPUs:
```shell
bash infer.sh \
--model "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" \
--data_dir "./data/test" \
--output_dir "./results" \
--gpu_inference false \
--njob 64
```
- Results
The decoding results can be found in `$output_dir/1best_recog/text.cer`, which includes recognition results of each sample and the CER metric of the whole test set.
If you decode the SpeechIO test sets, you can apply text normalization with `stage`=3; `DETAILS.txt` and `RESULTS.txt` then record the results and the CER after text normalization.
## Finetune with pipeline
### Quick start
[finetune.py](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr/TEMPLATE/finetune.py)
```python
import os
from modelscope.metainfo import Trainers
from modelscope.trainers import build_trainer
from modelscope.msdatasets.audio.asr_dataset import ASRDataset
def modelscope_finetune(params):
    if not os.path.exists(params.output_dir):
        os.makedirs(params.output_dir, exist_ok=True)
    # dataset split ["train", "validation"]
    ds_dict = ASRDataset.load(params.data_path, namespace='speech_asr')
    kwargs = dict(
        model=params.model,
        data_dir=ds_dict,
        dataset_type=params.dataset_type,
        work_dir=params.output_dir,
        batch_bins=params.batch_bins,
        max_epoch=params.max_epoch,
        lr=params.lr)
    trainer = build_trainer(Trainers.speech_asr_trainer, default_args=kwargs)
    trainer.train()


if __name__ == '__main__':
    from funasr.utils.modelscope_param import modelscope_args
    params = modelscope_args(model="damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch")
    params.output_dir = "./checkpoint"                    # directory to save the model
    params.data_path = "speech_asr_aishell1_trainsets"    # data path; either a dataset uploaded to ModelScope or local data
    params.dataset_type = "small"                         # use small for small datasets; use large if the data exceeds 1000 hours
    params.batch_bins = 2000                              # batch size; for dataset_type="small" it is in fbank feature frames, for dataset_type="large" it is in milliseconds
    params.max_epoch = 50                                 # maximum number of training epochs
    params.lr = 0.00005                                   # learning rate
    modelscope_finetune(params)
```
```shell
python finetune.py &> log.txt &
```
### Finetune with your data
- Modify finetune training related parameters in [finetune.py](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr/TEMPLATE/finetune.py)
- `output_dir`: result dir
- `data_dir`: the dataset dir needs to include files: `train/wav.scp`, `train/text`; `validation/wav.scp`, `validation/text`
- `dataset_type`: for datasets larger than 1000 hours, set it to `large`; otherwise set it to `small`
- `batch_bins`: batch size. If `dataset_type` is `small`, `batch_bins` indicates the number of feature frames; if `dataset_type` is `large`, `batch_bins` indicates the duration in ms
- `max_epoch`: number of training epochs
- `lr`: learning rate
- Training data formats (a preparation sketch follows the example below)
```sh
cat ./example_data/text
BAC009S0002W0122 而 对 楼 市 成 交 抑 制 作 用 最 大 的 限 购
BAC009S0002W0123 也 成 为 地 方 政 府 的 眼 中 钉
english_example_1 hello world
english_example_2 go swim 去 游 泳
cat ./example_data/wav.scp
BAC009S0002W0122 /mnt/data/wav/train/S0002/BAC009S0002W0122.wav
BAC009S0002W0123 /mnt/data/wav/train/S0002/BAC009S0002W0123.wav
english_example_1 /mnt/data/wav/train/S0002/english_example_1.wav
english_example_2 /mnt/data/wav/train/S0002/english_example_2.wav
```
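If your data is not yet in this layout, a small helper such as the hypothetical sketch below (the helper and the paths are illustrative, not part of FunASR) can write the `wav.scp` and `text` files for each split:
```python
import os

def write_split(split_dir, utts):
    """Write kaldi-style wav.scp and text files for one split.
    utts is a list of (utt_id, wav_path, transcription) tuples."""
    os.makedirs(split_dir, exist_ok=True)
    with open(os.path.join(split_dir, "wav.scp"), "w", encoding="utf-8") as f_wav, \
         open(os.path.join(split_dir, "text"), "w", encoding="utf-8") as f_text:
        for utt_id, wav_path, transcription in utts:
            f_wav.write(f"{utt_id} {wav_path}\n")
            f_text.write(f"{utt_id} {transcription}\n")

# illustrative ids and paths only
write_split("./data/train", [
    ("BAC009S0002W0122", "/mnt/data/wav/train/S0002/BAC009S0002W0122.wav", "而 对 楼 市 成 交 抑 制 作 用 最 大 的 限 购"),
])
```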
- Then you can run the pipeline to finetune with:
```shell
python finetune.py
```
If you want to finetune with multiple GPUs, you can run:
```shell
CUDA_VISIBLE_DEVICES=1,2 python -m torch.distributed.launch --nproc_per_node 2 finetune.py > log.txt 2>&1
```
## Inference with your finetuned model
- The parameters in [egs_modelscope/asr/TEMPLATE/infer.sh](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr/TEMPLATE/infer.sh) are set the same way as in the [docs](https://github.com/alibaba-damo-academy/FunASR/tree/main/egs_modelscope/asr/TEMPLATE#inference-with-multi-thread-cpus-or-multi-gpus); `model` is the name of the ModelScope model that you finetuned from.
- Decode with multi GPUs:
```shell
bash infer.sh \
--model "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" \
--data_dir "./data/test" \
--output_dir "./results" \
--batch_size 64 \
--gpu_inference true \
--gpuid_list "0,1" \
--checkpoint_dir "./checkpoint" \
--checkpoint_name "valid.cer_ctc.ave.pb"
```
- Decode with multi-thread CPUs:
```shell
bash infer.sh \
--model "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" \
--data_dir "./data/test" \
--output_dir "./results" \
--gpu_inference false \
--njob 64 \
--checkpoint_dir "./checkpoint" \
--checkpoint_name "valid.cer_ctc.ave.pb"
```

View File

@ -1,3 +1,5 @@
([简体中文](./README_zh.md)|English)
# Punctuation Restoration
> **Note**:

View File

@ -0,0 +1,112 @@
(简体中文|[English](./README.md))
# Punctuation Restoration
> **Note**:
> The pipeline supports inference and finetuning for all models in the [modelscope model zoo](https://alibaba-damo-academy.github.io/FunASR/en/model_zoo/modelscope_models.html#pretrained-models-on-modelscope). Here we take the CT-Transformer model as an example to demonstrate the usage.
## Inference
### Quick start
#### [CT-Transformer Model](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary)
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
inference_pipeline = pipeline(
task=Tasks.punctuation,
model='damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch',
model_revision=None)
rec_result = inference_pipeline(text_in='example/punc_example.txt')
print(rec_result)
```
- Text binary data, e.g., bytes data read directly from a file by the user:
```python
rec_result = inference_pipeline(text_in='我们都是木头人不会讲话不会动')
```
- Text file URL, e.g.: https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_text/punc_example.txt
```python
rec_result = inference_pipeline(text_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_text/punc_example.txt')
```
#### [CT-Transformer Realtime Model](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727/summary)
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
inference_pipeline = pipeline(
task=Tasks.punctuation,
model='damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727',
model_revision=None,
)
inputs = "跨境河流是养育沿岸|人民的生命之源长期以来为帮助下游地区防灾减灾中方技术人员|在上游地区极为恶劣的自然条件下克服巨大困难甚至冒着生命危险|向印方提供汛期水文资料处理紧急事件中方重视印方在跨境河流问题上的关切|愿意进一步完善双方联合工作机制|凡是|中方能做的我们|都会去做而且会做得更好我请印度朋友们放心中国在上游的|任何开发利用都会经过科学|规划和论证兼顾上下游的利益"
vads = inputs.split("|")
rec_result_all="outputs:"
param_dict = {"cache": []}
for vad in vads:
rec_result = inference_pipeline(text_in=vad, param_dict=param_dict)
rec_result_all += rec_result['text']
print(rec_result_all)
```
For the full demo code, please refer to [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/238)
### API Reference
#### Define pipeline
- `task`: `Tasks.punctuation`
- `model`: model name in the [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/model_zoo/modelscope_models.html#pretrained-models-on-modelscope), or a model path on local disk
- `ngpu`: `1` (default), use GPU for inference; if ngpu=0, inference runs on CPU
- `ncpu`: `1` (default), the number of threads used for intra-op parallelism on CPU
- `output_dir`: `None` (default), the output path for the results if set
- `model_revision`: `None` (default), the model version on ModelScope
#### Infer pipeline
- `text_in`: the input to decode, which supports the following:
- a text string, e.g.: "我们都是木头人不会讲话不会动"
- a text file, e.g.: example/punc_example.txt
When using text file input, `output_dir` must be set to save the output results (a sketch follows this list).
- `param_dict`: the cache required in realtime mode.
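A sketch of the text-file case, with `output_dir` set so the results are saved (the output path is illustrative):
```python
# sketch: text-file input requires output_dir to be set
inference_pipeline = pipeline(
    task=Tasks.punctuation,
    model='damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch',
    output_dir='./punc_results',
)
rec_result = inference_pipeline(text_in='example/punc_example.txt')
```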
### Inference with multi-thread CPUs or multi GPUs
FunASR also provides the [egs_modelscope/punctuation/TEMPLATE/infer.sh](infer.sh) script to decode with multi-thread CPUs or multiple GPUs.
#### `infer.sh` settings
- `model`: model name in the [modelscope model zoo](https://alibaba-damo-academy.github.io/FunASR/en/model_zoo/modelscope_models.html#pretrained-models-on-modelscope), or a model path on local disk
- `data_dir`: the dataset dir, which needs to include the `punc.txt` file
- `output_dir`: output dir of the recognition results
- `batch_size`: `1` (default), batch size for inference on GPU
- `gpu_inference`: `true` (default), whether to perform GPU decoding; set to `false` for CPU inference
- `gpuid_list`: `0,1` (default), GPU IDs used for inference
- `njob`: only used for CPU inference (`gpu_inference`=`false`), `64` (default), the number of jobs for CPU decoding
#### Decode with multiple GPUs:
```shell
bash infer.sh \
--model "damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch" \
--data_dir "./data/test" \
--output_dir "./results" \
--batch_size 1 \
--gpu_inference true \
--gpuid_list "0,1"
```
#### Decode with multi-thread CPUs:
```shell
bash infer.sh \
--model "damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch" \
--data_dir "./data/test" \
--output_dir "./results" \
--gpu_inference false \
--njob 1
```
## Finetune with pipeline
### Quick start
### Finetune with your data
## Inference with your finetuned model

View File

@ -0,0 +1 @@
../TEMPLATE/README_zh.md

View File

@ -0,0 +1 @@
../TEMPLATE/README_zh.md

View File

@ -0,0 +1 @@
../TEMPLATE/README_zh.md

View File

@ -1,3 +1,5 @@
([简体中文](./README_zh.md)|English)
# Timestamp Prediction (FA)
## Inference

View File

@ -0,0 +1,102 @@
(简体中文|[English](./README.md))
# Timestamp Prediction
## Inference
### Quick start
#### [TP-Aligner Model](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary)
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
inference_pipeline = pipeline(
task=Tasks.speech_timestamp,
model='damo/speech_timestamp_prediction-v1-16k-offline',
model_revision='v1.1.0')
rec_result = inference_pipeline(
audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_timestamps.wav',
text_in='一 个 东 太 平 洋 国 家 为 什 么 跑 到 西 太 平 洋 来 了 呢',)
print(rec_result)
```
The timestamp pipeline can also be used after the ASR pipeline to form a complete ASR system; refer to [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/246).
### API Reference
#### Define pipeline
- `task`: `Tasks.speech_timestamp`
- `model`: model name in the [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/model_zoo/modelscope_models.html#pretrained-models-on-modelscope), or a model path on local disk
- `ngpu`: `1` (default), use GPU for inference; if ngpu=0, inference runs on CPU
- `ncpu`: `1` (default), the number of threads used for intra-op parallelism on CPU
- `output_dir`: `None` (default), the output path for the results if set
- `batch_size`: `1` (default), batch size when decoding
#### Infer pipeline
- `audio_in`: the input speech to predict on, which can be:
- a wav file path, e.g.: asr_example.wav (a local wav file or a URL)
- wav.scp, a kaldi-style wav list (`wav_id wav_path`), e.g.:
```text
asr_example1 ./audios/asr_example1.wav
asr_example2 ./audios/asr_example2.wav
```
When using `wav.scp` input, `output_dir` must be set to save the output results.
- `text_in`: the input text to predict on, separated by spaces, which can be:
- a text string, e.g.: `今 天 天 气 怎 么 样`
- text.scp, a kaldi-style text file (`wav_id transcription`), e.g.:
```text
asr_example1 今 天 天 气 怎 么 样
asr_example2 欢 迎 体 验 达 摩 院 语 音 识 别 模 型
```
- `audio_fs`: audio sampling rate, only set when the input is PCM audio
- `output_dir`: None (default); if set, the output path for the results (a usage sketch follows this list), containing:
- output_dir/timestamp_prediction/tp_sync: timestamps in seconds including silence segments, `wav_id# token1 start_time end_time;`, e.g.:
```text
test_wav1# <sil> 0.000 0.500;温 0.500 0.680;州 0.680 0.840;化 0.840 1.040;工 1.040 1.280;仓 1.280 1.520;<sil> 1.520 1.680;库 1.680 1.920;<sil> 1.920 2.160;起 2.160 2.380;火 2.380 2.580;殃 2.580 2.760;及 2.760 2.920;附 2.920 3.100;近 3.100 3.340;<sil> 3.340 3.400;河 3.400 3.640;<sil> 3.640 3.700;流 3.700 3.940;<sil> 3.940 4.240;大 4.240 4.400;量 4.400 4.520;死 4.520 4.680;鱼 4.680 4.920;<sil> 4.920 4.940;漂 4.940 5.120;浮 5.120 5.300;河 5.300 5.500;面 5.500 5.900;<sil> 5.900 6.240;
```
- output_dir/timestamp_prediction/tp_time: timestamp list without silences, in milliseconds, with the same length as the input text, `wav_id# [[start_time, end_time],]`, e.g.:
```text
test_wav1# [[500, 680], [680, 840], [840, 1040], [1040, 1280], [1280, 1520], [1680, 1920], [2160, 2380], [2380, 2580], [2580, 2760], [2760, 2920], [2920, 3100], [3100, 3340], [3400, 3640], [3700, 3940], [4240, 4400], [4400, 4520], [4520, 4680], [4680, 4920], [4940, 5120], [5120, 5300], [5300, 5500], [5500, 5900]]
```
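A sketch of batch prediction from kaldi-style lists (the data paths are illustrative), with `output_dir` set so the `tp_sync`/`tp_time` files above are written:
```python
# sketch: batch timestamp prediction from wav.scp / text.scp lists
inference_pipeline = pipeline(
    task=Tasks.speech_timestamp,
    model='damo/speech_timestamp_prediction-v1-16k-offline',
    model_revision='v1.1.0',
    output_dir='./tp_results',
)
rec_result = inference_pipeline(
    audio_in='./data/test/wav.scp',
    text_in='./data/test/text.scp',
)
```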
### Inference with multi-thread CPUs or multi GPUs
FunASR also provides the [egs_modelscope/tp/TEMPLATE/infer.sh](infer.sh) script to decode with multi-thread CPUs or multiple GPUs.
#### `infer.sh` settings
- `model`: model name in the [modelscope model zoo](https://alibaba-damo-academy.github.io/FunASR/en/model_zoo/modelscope_models.html#pretrained-models-on-modelscope), or a model path on local disk
- `data_dir`: the dataset dir, which needs to include the `wav.scp` file. If `${data_dir}/text` also exists, the CER will be computed
- `output_dir`: output dir of the recognition results
- `batch_size`: `1` (default), batch size for inference on GPU
- `gpu_inference`: `true` (default), whether to perform GPU decoding; set to `false` for CPU inference
- `gpuid_list`: `0,1` (default), GPU IDs used for inference
- `njob`: only used for CPU inference (`gpu_inference`=`false`), `64` (default), the number of jobs for CPU decoding
#### Decode with multiple GPUs:
```shell
bash infer.sh \
--model "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" \
--data_dir "./data/test" \
--output_dir "./results" \
--batch_size 1 \
--gpu_inference true \
--gpuid_list "0,1"
```
#### Decode with multi-thread CPUs:
```shell
bash infer.sh \
--model "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch" \
--data_dir "./data/test" \
--output_dir "./results" \
--gpu_inference false \
--njob 1
```
## Finetune with pipeline
### Quick start
### Finetune with your data
## Inference with your finetuned model

View File

@ -1,3 +1,5 @@
([简体中文](./README_zh.md)|English)
# Voice Activity Detection
> **Note**:

View File

@ -0,0 +1,113 @@
(简体中文|[English](./README.md))
# Voice Activity Detection
> **Note**:
> The pipeline supports inference and finetuning for all models in the [modelscope model zoo](https://alibaba-damo-academy.github.io/FunASR/en/model_zoo/modelscope_models.html#pretrained-models-on-modelscope). Here we take the FSMN-VAD model as an example to demonstrate the usage.
## Inference
### Quick start
#### [FSMN-VAD Model](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary)
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
inference_pipeline = pipeline(
task=Tasks.voice_activity_detection,
model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch',
)
segments_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/vad_example.wav')
print(segments_result)
```
#### [FSMN-VAD Realtime Model](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary)
```python
inference_pipeline = pipeline(
task=Tasks.auto_speech_recognition,
model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch',
)
import soundfile
speech, sample_rate = soundfile.read("example/asr_example.wav")
param_dict = {"in_cache": dict(), "is_final": False}
chunk_stride = 1600  # 100ms
# first chunk, 100ms
speech_chunk = speech[0:chunk_stride]
rec_result = inference_pipeline(audio_in=speech_chunk, param_dict=param_dict)
print(rec_result)
# next chunk, 100ms
speech_chunk = speech[chunk_stride:chunk_stride+chunk_stride]
rec_result = inference_pipeline(audio_in=speech_chunk, param_dict=param_dict)
print(rec_result)
```
For the full demo code, please refer to [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/236)
### API Reference
#### Define pipeline
- `task`: `Tasks.voice_activity_detection`
- `model`: model name in the [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/model_zoo/modelscope_models.html#pretrained-models-on-modelscope), or a model path on local disk
- `ngpu`: `1` (default), use GPU for inference; if ngpu=0, inference runs on CPU
- `ncpu`: `1` (default), the number of threads used for intra-op parallelism on CPU
- `output_dir`: `None` (default), the output path for the results if set
- `batch_size`: `1` (default), batch size when decoding
#### Infer pipeline
- `audio_in`: the input to decode, which can be:
- a wav file path, e.g.: asr_example.wav
- a pcm file path, e.g.: asr_example.pcm
- an audio byte stream, e.g.: bytes data from a microphone
- audio samples, e.g.: `audio, rate = soundfile.read("asr_example_zh.wav")`, with dtype numpy.ndarray or torch.Tensor
- wav.scp, a kaldi-style wav list (`wav_id \t wav_path`), e.g.:
```text
asr_example1 ./audios/asr_example1.wav
asr_example2 ./audios/asr_example2.wav
```
When using `wav.scp` input, `output_dir` must be set to save the output results (a sketch follows this list)
- `audio_fs`: audio sampling rate, only set when audio_in is PCM audio
- `output_dir`: None (default), the output path for the results if set
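A sketch of the `wav.scp` case (the data path is illustrative), with `output_dir` set so the segment results are written to disk:
```python
# sketch: batch VAD over a kaldi-style wav list; output_dir is required here
inference_pipeline = pipeline(
    task=Tasks.voice_activity_detection,
    model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch',
    output_dir='./vad_results',
)
segments_result = inference_pipeline(audio_in='./data/test/wav.scp')
```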
### Inference with multi-thread CPUs or multi GPUs
FunASR also provides the [egs_modelscope/vad/TEMPLATE/infer.sh](infer.sh) script to decode with multi-thread CPUs or multiple GPUs.
#### `infer.sh` settings
- `model`: model name in the [modelscope model zoo](https://alibaba-damo-academy.github.io/FunASR/en/model_zoo/modelscope_models.html#pretrained-models-on-modelscope), or a model path on local disk
- `data_dir`: the dataset dir, which needs to include the `wav.scp` file. If `${data_dir}/text` also exists, the CER will be computed
- `output_dir`: output dir of the recognition results
- `batch_size`: `1` (default), batch size for inference on GPU
- `gpu_inference`: `true` (default), whether to perform GPU decoding; set to `false` for CPU inference
- `gpuid_list`: `0,1` (default), GPU IDs used for inference
- `njob`: only used for CPU inference (`gpu_inference`=`false`), `64` (default), the number of jobs for CPU decoding
#### Decode with multiple GPUs:
```shell
bash infer.sh \
--model "damo/speech_fsmn_vad_zh-cn-16k-common-pytorch" \
--data_dir "./data/test" \
--output_dir "./results" \
--batch_size 1 \
--gpu_inference true \
--gpuid_list "0,1"
```
#### Decode with multi-thread CPUs:
```shell
bash infer.sh \
--model "damo/speech_fsmn_vad_zh-cn-16k-common-pytorch" \
--data_dir "./data/test" \
--output_dir "./results" \
--gpu_inference false \
--njob 64
```
## Finetune with pipeline
### Quick start
### Finetune with your data
## Inference with your finetuned model

View File

@ -0,0 +1 @@
../TEMPLATE/README_zh.md

View File

@ -0,0 +1 @@
../TEMPLATE/README_zh.md

View File

@ -1,328 +1,206 @@
# FunASR File Transcription Service Convenient Deployment Tutorial
([简体中文](./SDK_tutorial_zh.md)|English)
FunASR provides offline file transcription services that can be conveniently deployed on local or cloud servers. The core of the service is based on the open-source runtime-SDK of FunASR. It integrates various related capabilities, such as voice endpoint detection (VAD) and Paraformer-large speech recognition (ASR), as well as punctuation recovery (PUNC), which have been open-sourced by the speech laboratory of DAMO Academy on the Modelscope community. With these capabilities, the service can transcribe audio accurately and efficiently under high concurrency.
# FunASR Offline File Transcription Service Convenient Deployment Tutorial
## Installation and Start Service
FunASR provides an offline file transcription service that can be easily deployed on a local or cloud server. Its core is the open-source FunASR runtime-SDK. It integrates capabilities such as speech endpoint detection (VAD), Paraformer-large speech recognition (ASR), and punctuation restoration (PUNC) released by the speech laboratory of DAMO Academy in the Modelscope community. It provides a complete speech recognition pipeline, can transcribe tens of hours of audio or video into punctuated text, and supports transcription for hundreds of concurrent requests.
Environment preparation and configuration: [docs](./aliyun_server_tutorial.md)
## Server Configuration
### Downloading Tools and Deployment
Users can choose appropriate server configurations based on their business needs. The recommended configurations are:
- Configuration 1: (X86, computing-type) 4-core vCPU, 8GB memory, and a single machine can support about 32 requests.
- Configuration 2: (X86, computing-type) 16-core vCPU, 32GB memory, and a single machine can support about 64 requests.
- Configuration 3: (X86, computing-type) 64-core vCPU, 128GB memory, and a single machine can support about 200 requests.
Run the following command to perform a one-click deployment of the FunASR runtime-SDK service. Follow the prompts to complete the deployment and running of the service. Currently, only Linux environments are supported, and for other environments, please refer to the Advanced SDK Development Guide ([docs](./SDK_advanced_guide_offline.md)).
Detailed performance [report](./benchmark_onnx_cpp.md)
[//]: # (Due to network restrictions, the download of the funasr-runtime-deploy.sh one-click deployment tool may not proceed smoothly. If the tool has not been downloaded and entered into the one-click deployment tool after several seconds, please terminate it with Ctrl + C and run the following command again.)
Cloud service providers offer a 3-month free trial for new users. Application tutorial ([docs](./aliyun_server_tutorial.md)).
## Quick Start
### Server Startup
`Note`: The one-click deployment tool process includes installing Docker, downloading the Docker image, and starting the service. If you want to start from the FunASR Docker image directly, please refer to the development guide ([docs](./SDK_advanced_guide_offline.md)).
Download the deployment tool `funasr-runtime-deploy-offline-cpu-zh.sh`
```shell
curl -O https://raw.githubusercontent.com/alibaba-damo-academy/FunASR-APP/main/TransAudio/funasr-runtime-deploy.sh; sudo bash funasr-runtime-deploy.sh install
# For the users in China, you could install with the command:
# curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/funasr-runtime-deploy.sh; sudo bash funasr-runtime-deploy.sh install
curl -O https://raw.githubusercontent.com/alibaba-damo-academy/FunASR/main/funasr/runtime/deploy_tools/funasr-runtime-deploy-offline-cpu-en.sh;
# If there is a network problem, users in mainland China can use the following command:
# curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/funasr-runtime-deploy-offline-cpu-en.sh;
```
#### Details of Configuration
Execute the deployment tool and press the Enter key at the prompt to complete the installation and deployment of the server. Currently, the convenient deployment tool only supports Linux environments. For other environments, please refer to the development guide ([docs](./SDK_advanced_guide_offline.md)).
```shell
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh install --workspace /root/funasr-runtime-resources
```
##### Choosing FunASR Docker Image
### Client Testing and Usage
We recommend selecting the "latest" tag to use our latest image, but you can also choose from our historical versions.
After running the above installation command, the client testing tool directory `samples` will be downloaded to the default installation directory /root/funasr-runtime-resources ([download link](https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_samples.tar.gz)).
Taking the Python client as an example, it supports multiple audio input formats (such as .wav, .pcm, .mp3), video inputs (.mp4, etc.), and multi-file wav.scp list inputs. For other client versions, please refer to the [documentation](#Detailed-Description-of-Client-Usage).
```shell
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav"
```
## Detailed Description of Client Usage
After completing the FunASR runtime-SDK service deployment on the server, you can test and use the offline file transcription service through the following steps. Currently, the following programming language client versions are supported:
- [Python](#python-client)
- [CPP](#cpp-client)
- [html](#html-client)
- [java](#java-client)
For more client version support, please refer to the [development guide](./SDK_advanced_guide_offline_zh.md).
### python-client
If you want to run the client directly for testing, you can refer to the following simple instructions, using the Python version as an example:
```shell
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav"
```
Command parameter instructions:
```text
--host is the IP address of the FunASR runtime-SDK service deployment machine, which defaults to the local IP address (127.0.0.1). If the client and the service are not on the same server, it needs to be changed to the deployment machine IP address.
--port 10095 deployment port number
--mode offline represents offline file transcription
--audio_in is the audio file that needs to be transcribed, supporting file paths and file list wav.scp
--thread_num sets the number of concurrent sending threads, default is 1
--ssl sets whether to enable SSL certificate verification, default is 1 to enable, and 0 to disable
```
### cpp-client
After entering the samples/cpp directory, you can test it with CPP. The command is as follows:
```shell
./funasr-wss-client --server-ip 127.0.0.1 --port 10095 --wav-path ../audio/asr_example.wav
```
Command parameter description:
```text
--server-ip specifies the IP address of the machine where the FunASR runtime-SDK service is deployed. The default value is the local IP address (127.0.0.1). If the client and the service are not on the same server, the IP address needs to be changed to the IP address of the deployment machine.
--port specifies the deployment port number as 10095.
--wav-path specifies the audio file to be transcribed, and supports file paths.
--thread_num sets the number of concurrent send threads, with a default value of 1.
--ssl sets whether to enable SSL certificate verification, with a default value of 1 for enabling and 0 for disabling.
```
### html-client
To experience it directly, open `html/static/index.html` in your browser. You will see the following page, which supports microphone input and file upload.
<img src="images/html.png" width="900"/>
### java-client
```shell
FunasrWsClient --host localhost --port 10095 --audio_in ./asr_example.wav --mode offline
```
For more details, please refer to the [docs](../java/readme.md)
## Server Usage Details
### Start the deployed FunASR service
If you have restarted the computer or shut down Docker after one-click deployment, you can start the FunASR service directly with the following command. The startup configuration is the same as the last one-click deployment.
```shell
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh start
```
### Stop the FunASR service
```shell
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh stop
```
### Release the FunASR service
Release the deployed FunASR service.
```shell
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh remove
```
### Restart the FunASR service
Restart the FunASR service with the same configuration as the last one-click deployment.
```shell
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh restart
```
### Replace the model and restart the FunASR service
Replace the currently used model, and restart the FunASR service. The model must be an ASR/VAD/PUNC model in ModelScope, or a finetuned model from ModelScope.
```shell
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh update [--asr_model | --vad_model | --punc_model] <model_id or local model path>
e.g
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh update --asr_model damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
```
### Update parameters and restart the FunASR service
Update the configured parameters and restart the FunASR service to take effect. The parameters that can be updated include the host and Docker port numbers, as well as the number of inference and IO threads.
```shell
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh update [--host_port | --docker_port] <port number>
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh update [--decode_thread_num | --io_thread_num] <the number of threads>
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh update [--workspace] <workspace in local>
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh update [--ssl] <0: close SSL; 1: open SSL, default:1>
e.g
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh update --decode_thread_num 32
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh update --workspace /root/funasr-runtime-resources
```
## Detailed Configuration of Server Startup Process
### Select FunASR Docker image
We recommend choosing to use our latest released image, but you can also choose historical versions.
```text
[1/9]
[1/5]
Getting the list of docker images, please wait a few seconds.
[DONE]
Please choose the Docker image.
1) registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-latest
2) registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.1.0
Enter your choice: 1
You have chosen the Docker image: registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-latest
1) registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.1.0
Enter your choice, default(1):
You have chosen the Docker image: registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.1.0
```
##### Choosing ASR/VAD/PUNC Models
You can choose a model from the list by its ModelScope name, fill in the name of another ModelScope model as `model_name` (the model will be downloaded automatically when Docker runs), or select `model_path` and fill in a local model path on the host machine.
### Set the port provided by the host for FunASR
Set the host port provided to Docker, which is 10095 by default. Please make sure that this port is available.
```text
[2/9]
Please input [Y/n] to confirm whether to automatically download model_id in ModelScope or use a local model.
[y] With the model in ModelScope, the model will be automatically downloaded to Docker(/workspace/models).
If you select both the local model and the model in ModelScope, select [y].
[n] Use the models on the localhost, the directory where the model is located will be mapped to Docker.
Setting confirmation[Y/n]:
You have chosen to use the model in ModelScope, please set the model ID in the next steps, and the model will be automatically downloaded in (/workspace/models) during the run.
Please enter the local path to download models, the corresponding path in Docker is /workspace/models.
Setting the local path to download models, default(/root/models):
The local path(/root/models) set will store models during the run.
[2.1/9]
Please select ASR model_id in ModelScope from the list below.
1) damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx
2) model_name
3) model_path
Enter your choice: 1
The model ID is damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx
The model dir in Docker is /workspace/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx
[2.2/9]
Please select VAD model_id in ModelScope from the list below.
1) damo/speech_fsmn_vad_zh-cn-16k-common-onnx
2) model_name
3) model_path
Enter your choice: 1
The model ID is damo/speech_fsmn_vad_zh-cn-16k-common-onnx
The model dir in Docker is /workspace/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx
[2.3/9]
Please select PUNC model_id in ModelScope from the list below.
1) damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx
2) model_name
3) model_path
Enter your choice: 1
The model ID is damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx
The model dir in Docker is /workspace/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx
```
##### Enter the executable path of the FunASR service on the host machine
Enter the host path of the executable of the FunASR service. It will be automatically mounted and run in Docker at runtime. If left blank, the default path in Docker will be set to /workspace/FunASR/funasr/runtime/websocket/build/bin/funasr-wss-server.
```text
[3/9]
Please enter the path to the excutor of the FunASR service on the localhost.
If not set, the default /workspace/FunASR/funasr/runtime/websocket/build/bin/funasr-wss-server in Docker is used.
Setting the path to the excutor of the FunASR service on the localhost:
Corresponding, the path of FunASR in Docker is /workspace/FunASR/funasr/runtime/websocket/build/bin/funasr-wss-server
```
##### Setting the port on the host machine for FunASR
Setting the port on the host machine for Docker. The default port is 10095. Please ensure that this port is available.
```text
[4/9]
[2/5]
Please input the opened port in the host used for FunASR server.
Default: 10095
Setting the opened host port [1-65535]:
Setting the opened host port [1-65535], default(10095):
The port of the host is 10095
The port in Docker for FunASR server is 10095
```
### Set SSL
##### Setting the number of inference threads for the FunASR service
Setting the number of inference threads for the FunASR service. The default value is the number of cores on the host machine. The number of I/O threads for the service will also be automatically set to one-quarter of the number of inference threads.
```text
[5/9]
Please input thread number for FunASR decoder.
Default: 1
Setting the number of decoder thread:
The number of decoder threads is 1
The number of IO threads is 1
```
##### Displaying all set parameters for confirmation
Displaying the parameters set in the previous 6 steps. Confirming will save all parameters to /var/funasr/config and start Docker. Otherwise, users will be prompted to reset the parameters.
```text
[6/9]
Show parameters of FunASR server setting and confirm to run ...
The current Docker image is : registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-latest
The model is downloaded or stored to this directory in local : /root/models
The model will be automatically downloaded to the directory : /workspace/models
The ASR model_id used : damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx
The ASR model directory corresponds to the directory in Docker : /workspace/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx
The VAD model_id used : damo/speech_fsmn_vad_zh-cn-16k-common-onnx
The VAD model directory corresponds to the directory in Docker : /workspace/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx
The PUNC model_id used : damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx
The PUNC model directory corresponds to the directory in Docker: /workspace/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx
The path in the docker of the FunASR service executor : /workspace/FunASR/funasr/runtime/websocket/build/bin/funasr-wss-server
Set the host port used for use by the FunASR service : 10095
Set the docker port used by the FunASR service : 10095
Set the number of threads used for decoding the FunASR service : 1
Set the number of threads used for IO the FunASR service : 1
Please input [Y/n] to confirm the parameters.
[y] Verify that these parameters are correct and that the service will run.
[n] The parameters set are incorrect, it will be rolled out, please rerun.
read confirmation[Y/n]:
Will run FunASR server later ...
Parameters are stored in the file /var/funasr/config
```
##### Checking the Docker service
Check whether the Docker service is installed on the host machine; if it is not installed, Docker will be installed and started.
```text
[7/9]
Start install docker for ubuntu
Get docker installer: curl -fsSL https://test.docker.com -o test-docker.sh
Get docker run: sudo sh test-docker.sh
# Executing docker install script, commit: c2de0811708b6d9015ed1a2c80f02c9b70c8ce7b
+ sh -c apt-get update -qq >/dev/null
+ sh -c DEBIAN_FRONTEND=noninteractive apt-get install -y -qq apt-transport-https ca-certificates curl >/dev/null
+ sh -c install -m 0755 -d /etc/apt/keyrings
+ sh -c curl -fsSL "https://download.docker.com/linux/ubuntu/gpg" | gpg --dearmor --yes -o /etc/apt/keyrings/docker.gpg
+ sh -c chmod a+r /etc/apt/keyrings/docker.gpg
+ sh -c echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu focal test" > /etc/apt/sources.list.d/docker.list
+ sh -c apt-get update -qq >/dev/null
+ sh -c DEBIAN_FRONTEND=noninteractive apt-get install -y -qq docker-ce docker-ce-cli containerd.io docker-compose-plugin docker-ce-rootless-extras docker-buildx-plugin >/dev/null
+ sh -c docker version
Client: Docker Engine - Community
Version: 24.0.2
...
...
Docker install success, start docker server.
```
##### Downloading the FunASR Docker image
Downloading and updating the FunASR Docker image selected in step 1.1
```text
[8/9]
Pull docker image(registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-latest)...
funasr-runtime-cpu-0.0.1: Pulling from funasr_repo/funasr
7608715873ec: Pull complete
3e1014c56f38: Pull complete
...
...
```
##### Starting the FunASR Docker
Start the FunASR Docker container and wait for the models selected in step 1.2 to finish downloading, after which the FunASR service starts.
```text
[9/9]
Construct command and run docker ...
943d8f02b4e5011b71953a0f6c1c1b9bc5aff63e5a96e7406c83e80943b23474
Loading models:
[ASR ][Done ][==================================================][100%][1.10MB/s][v1.2.1]
[VAD ][Done ][==================================================][100%][7.26MB/s][v1.2.0]
[PUNC][Done ][==================================================][100%][ 474kB/s][v1.1.7]
The service has been started.
If you want to see an example of how to use the client, you can run sudo bash funasr-runtime-deploy.sh -c .
```
#### Starting the deployed FunASR service
If the computer is restarted or Docker is closed after one-click deployment, the following command can be used to start the FunASR service directly with the settings from the last one-click deployment.
SSL verification is enabled by default. If you need to disable it, you can set it when starting.
```shell
sudo bash funasr-runtime-deploy.sh start
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh --ssl 0
```
#### Shutting down the FunASR service
## Contact Us
```shell
sudo bash funasr-runtime-deploy.sh stop
```
If you encounter any problems during use, please join our user group for feedback.
| DingDing Group | Wechat |
|:----------------------------------------------------------------------------:|:--------------------------------------------------------------:|
| <div align="left"><img src="../../../docs/images/dingding.jpg" width="250"/> | <img src="../../../docs/images/wechat.png" width="232"/></div> |
#### Restarting the FunASR service
Restarting the FunASR service with the settings from the last one-click deployment
```shell
sudo bash funasr-runtime-deploy.sh restart
```
#### Replacing the model and restarting the FunASR service
Replacing the currently used model and restarting the FunASR service. The model must be an ASR/VAD/PUNC model from ModelScope.
```shell
sudo bash scripts/funasr-runtime-deploy.sh update model <model ID in ModelScope>
e.g
sudo bash scripts/funasr-runtime-deploy.sh update model damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
```
### How to test and use the offline file transcription service
After completing the FunASR service deployment on the server, you can test and use the offline file transcription service by following these steps. Currently, command line running is supported for Python, C++, and Java client versions, as well as an HTML web page version that can be directly experienced in the browser. For more client language support, please refer to the "FunASR Advanced Development Guide" documentation.
After the funasr-runtime-deploy.sh script finishes running, you can use the following command to automatically download the test samples to the funasr_samples directory in the current directory and run the program with the set parameters in an interactive manner:
```shell
sudo bash funasr-runtime-deploy.sh client
```
You can choose from the provided Python and Linux C++ sample programs. Taking the Python sample as an example:
```text
Will download sample tools for the client to show how speech recognition works.
Please select the client you want to run.
1) Python
2) Linux_Cpp
Enter your choice: 1
Please enter the IP of server, default(127.0.0.1):
Please enter the port of server, default(10095):
Please enter the audio path, default(/root/funasr_samples/audio/asr_example.wav):
Run pip3 install click>=8.0.4
Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
Requirement already satisfied: click>=8.0.4 in /usr/local/lib/python3.8/dist-packages (8.1.3)
Run pip3 install -r /root/funasr_samples/python/requirements_client.txt
Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
Requirement already satisfied: websockets in /usr/local/lib/python3.8/dist-packages (from -r /root/funasr_samples/python/requirements_client.txt (line 1)) (11.0.3)
Run python3 /root/funasr_samples/python/funasr_wss_client.py --host 127.0.0.1 --port 10095 --mode offline --audio_in /root/funasr_samples/audio/asr_example.wav --send_without_sleep --output_dir ./funasr_samples/python
...
...
pid0_0: 欢迎大家来体验达摩院推出的语音识别模型。
Exception: sent 1000 (OK); then received 1000 (OK)
end
If failed, you can try (python3 /root/funasr_samples/python/funasr_wss_client.py --host 127.0.0.1 --port 10095 --mode offline --audio_in /root/funasr_samples/audio/asr_example.wav --send_without_sleep --output_dir ./funasr_samples/python) in your Shell.
```
#### python-client
If you want to directly run the client for testing, you can refer to the following simple instructions, taking the Python version as an example:
```shell
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav" --send_without_sleep --output_dir "./results"
```
Command parameter instructions:
```text
--host: The IP address of the machine where the FunASR runtime-SDK service is deployed. The default is the local IP address (127.0.0.1). If the client and service are not on the same server, the IP address should be changed to that of the deployment machine.
--port 10095: The deployment port number.
--mode offline: Indicates offline file transcription.
--audio_in: The audio file(s) to be transcribed, which can be a file path or a file list (wav.scp).
--output_dir: The path to save the recognition results.
```
#### cpp-client
```shell
export LD_LIBRARY_PATH=/root/funasr_samples/cpp/libs:$LD_LIBRARY_PATH
/root/funasr_samples/cpp/funasr-wss-client --server-ip 127.0.0.1 --port 10095 --wav-path /root/funasr_samples/audio/asr_example.wav
```
Command parameter instructions:
```text
--server-ip: The IP address of the machine where the FunASR runtime-SDK service is deployed. The default is the local IP address (127.0.0.1). If the client and service are not on the same server, the IP address should be changed to that of the deployment machine.
--port 10095: The deployment port number.
--wav-path: The audio file(s) to be transcribed, which can be a file path.
```
### Video demo
[demo]()

View File

@ -1,3 +1,5 @@
(简体中文|[English](./SDK_tutorial.md))
# FunASR Offline File Transcription Service Convenient Deployment Tutorial
FunASR provides an offline file transcription service that can be conveniently deployed on local or cloud servers, with the open-source FunASR runtime-SDK at its core.

View File

@ -1,72 +1,93 @@
([简体中文](./readme_zh.md)|English)
# Html5 server for asr service
# Speech Recognition Service Html5 Client Access Interface
The server deployment uses the websocket protocol. The client can support html5 webpage access and microphone input or file input. There are two ways to access the service:
- Method 1:
Directly connect to the html client, manually download the client ([click here](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/html5/static)) to the local computer, and open the index.html webpage to enter the wss address and port number.
- Method 2:
Html5 server, automatically download the client to the local computer, and support access by mobile phones and other devices.
## Starting Speech Recognition Service
Support the deployment of Python and C++ versions, where
- Python version
Deploys the Python pipeline directly; it supports streaming real-time speech recognition models, offline speech recognition models, and integrated streaming-offline correction models, and outputs text with punctuation. Single server, supporting a single client.
- C++ version
funasr-runtime-sdk (version 0.1.0) supports one-click deployment and offline file transcription. Single server, supporting requests from hundreds of clients.
### Starting Python Version Service
#### Install Dependencies
## Requirement
#### Install the modelscope and funasr
```shell
pip install -U modelscope funasr
# For the users in China, you could install with the command:
# pip install -U modelscope funasr -i https://mirror.sjtu.edu.cn/pypi/web/simple
pip3 install -U modelscope funasr flask
# Users in mainland China, if encountering network issues, can install with the following command:
# pip3 install -U modelscope funasr -i https://mirror.sjtu.edu.cn/pypi/web/simple
git clone https://github.com/alibaba/FunASR.git && cd FunASR
```
#### Install the requirements for server
```shell
pip install flask
# pip install gevent (Optional)
# pip install pyOpenSSL (Optional)
```
### javascript (Optional)
[html5 recorder.js](https://github.com/xiangyuecn/Recorder)
```shell
Recorder
```
#### Start ASR Service
## demo
<div align="center"><img src="./demo.gif" width="150"/> </div>
## Steps
### Html5 demo
#### wss Method
```shell
usage: h5Server.py [-h] [--host HOST] [--port PORT] [--certfile CERTFILE] [--keyfile KEYFILE]
```
`e.g.`
```shell
cd funasr/runtime/html5
python h5Server.py --host 0.0.0.0 --port 1337
```
### asr service
[detail for asr](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/websocket)
`Tip:` the asr service and the html5 service should be deployed on the same device.
```shell
cd ../python/websocket
cd funasr/runtime/python/websocket
python funasr_wss_server.py --port 10095
```
For detailed parameter configuration and analysis, please click [here](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/websocket).
#### Html5 Service (Optional)
If you want to access the service as described in Method 2 above, start the html5 service:
```shell
h5Server.py [-h] [--host HOST] [--port PORT] [--certfile CERTFILE] [--keyfile KEYFILE]
```
As shown in the example below, pay attention to the IP address. If accessing from another device (such as a mobile phone), you need to set the IP address to the real public IP address.
```shell
cd funasr/runtime/html5
python h5Server.py --host 0.0.0.0 --port 1337
```
After starting, enter ([https://127.0.0.1:1337/static/index.html](https://127.0.0.1:1337/static/index.html)) in the browser to access it.
### Starting C++ Version Service
Since the C++ version has many dependencies, it is recommended to deploy it with Docker, which supports one-click start of the service.
```shell
curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/funasr-runtime-deploy-offline-cpu-zh.sh;
sudo bash funasr-runtime-deploy-offline-cpu-zh.sh install --workspace /root/funasr-runtime-resources
```
For detailed parameter configuration and analysis, please click [here](https://github.com/alibaba-damo-academy/FunASR/blob/main/funasr/runtime/docs/SDK_tutorial_zh.md).
## Client Testing
### Method 1
Directly connect to the html client, manually download the client ([click here](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/html5/static)) to the local computer, and open the index.html webpage, enter the wss address and port number to use.
### Method 2
Html5 server, automatically download the client to the local computer, and support access by mobile phones and other devices. The IP address needs to be consistent with the html5 server. If it is a local computer, you can use 127.0.0.1.
### Open the html5 demo in a browser
```shell
https://127.0.0.1:1337/static/index.html
# https://30.220.136.139:1337/static/index.html
```
### Open the html5 file directly in a browser without h5Server
You can run the html5 client by simply opening the index.html file directly on your computer:
1) Launch the asr service without ssl; it must be in ws mode, as the ssl protocol will prohibit such access.
2) Copy the whole directory /funasr/runtime/html5/static to your computer.
3) Open /funasr/runtime/html5/static/index.html in a browser.
4) Enter the asr service ws address and connect.
Enter the wss address and port number to use.
## Acknowledge
1. This project is maintained by [FunASR community](https://github.com/alibaba-damo-academy/FunASR).
2. We acknowledge [AiHealthx](http://www.aihealthx.com/) for contributing the html5 demo.