diff --git a/README.md b/README.md
index 1e3707c5d..499bb4015 100644
--- a/README.md
+++ b/README.md
@@ -55,16 +55,16 @@ FunASR has open-sourced a large number of pre-trained models on industrial data.
(Note: 🤗 represents the Huggingface model zoo link, ⭐ represents the ModelScope model zoo link)
-| Model Name | Task Details | Training Data | Parameters |
-|:------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------:|:--------------------------------:|:----------:|
-| paraformer-zh
([⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) [🤗]() ) | speech recognition, with timestamps, non-streaming | 60000 hours, Mandarin | 220M |
-| paraformer-zh-spk
( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary) [🤗]() ) | speech recognition with speaker diarization, with timestamps, non-streaming | 60000 hours, Mandarin | 220M |
-| paraformer-zh-online
( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗]() ) | speech recognition, streaming | 60000 hours, Mandarin | 220M |
-| paraformer-en
( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗]() ) | speech recognition, with timestamps, non-streaming | 50000 hours, English | 220M |
-| conformer-en
( [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗]() ) | speech recognition, non-streaming | 50000 hours, English | 220M |
-| ct-punc
( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗]() ) | punctuation restoration | 100M, Mandarin and English | 1.1G |
-| fsmn-vad
( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗]() ) | voice activity detection | 5000 hours, Mandarin and English | 0.4M |
-| fa-zh
( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗]() ) | timestamp prediction | 5000 hours, Mandarin | 38M |
+| Model Name | Task Details | Training Data | Parameters |
+|:------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:--------------------------------------------------:|:--------------------------------:|:----------:|
+| paraformer-zh
([⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) [🤗]() ) | speech recognition, with timestamps, non-streaming | 60000 hours, Mandarin | 220M |
+| paraformer-zh-online
( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗]() ) | speech recognition, streaming | 60000 hours, Mandarin | 220M |
+| paraformer-en
( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗]() ) | speech recognition, with timestamps, non-streaming | 50000 hours, English | 220M |
+| conformer-en
( [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗]() ) | speech recognition, non-streaming | 50000 hours, English | 220M |
+| ct-punc
( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗]() ) | punctuation restoration | 100M, Mandarin and English | 1.1G |
+| fsmn-vad
( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗]() ) | voice activity detection | 5000 hours, Mandarin and English | 0.4M |
+| fa-zh
( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗]() ) | timestamp prediction | 5000 hours, Mandarin | 38M |
+| cam++
( [⭐](https://modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) [🤗]() ) | speaker verification/diarization | 5000 hours | 7.2M |
diff --git a/README_zh.md b/README_zh.md
index 552a5a0b2..9d7c151d1 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -57,16 +57,16 @@ FunASR开源了大量在工业数据上预训练模型,您可以在[模型许
(注:[🤗]()表示Huggingface模型仓库链接,[⭐]()表示ModelScope模型仓库链接)
-| 模型名字 | 任务详情 | 训练数据 | 参数量 |
+| 模型名字 | 任务详情 | 训练数据 | 参数量 |
|:------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------:|:------------:|:----:|
| paraformer-zh
([⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) [🤗]() ) | 语音识别,带时间戳输出,非实时 | 60000小时,中文 | 220M |
-| paraformer-zh-spk
( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary) [🤗]() ) | 分角色语音识别,带时间戳输出,非实时 | 60000小时,中文 | 220M |
-| paraformer-zh-streaming
( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗]() ) | 语音识别,实时 | 60000小时,中文 | 220M |
-| paraformer-en
( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗]() ) | 语音识别,非实时 | 50000小时,英文 | 220M |
-| conformer-en
( [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗]() ) | 语音识别,非实时 | 50000小时,英文 | 220M |
-| ct-punc
( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗]() ) | 标点恢复 | 100M,中文与英文 | 1.1G |
-| fsmn-vad
( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗]() ) | 语音端点检测,实时 | 5000小时,中文与英文 | 0.4M |
-| fa-zh
( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗]() ) | 字级别时间戳预测 | 50000小时,中文 | 38M |
+| paraformer-zh-streaming
( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗]() ) | 语音识别,实时 | 60000小时,中文 | 220M |
+| paraformer-en
( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗]() ) | 语音识别,非实时 | 50000小时,英文 | 220M |
+| conformer-en
( [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗]() ) | 语音识别,非实时 | 50000小时,英文 | 220M |
+| ct-punc
( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗]() ) | 标点恢复 | 100M,中文与英文 | 1.1G |
+| fsmn-vad
( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗]() ) | 语音端点检测,实时 | 5000小时,中文与英文 | 0.4M |
+| fa-zh
( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗]() ) | 字级别时间戳预测 | 50000小时,中文 | 38M |
+| cam++
( [⭐](https://modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) [🤗]() ) | 说话人确认/分割 | 5000小时 | 7.2M |
diff --git a/examples/industrial_data_pretraining/uniasr/demo.py b/examples/industrial_data_pretraining/uniasr/demo.py
index 125902174..6dcd557f8 100644
--- a/examples/industrial_data_pretraining/uniasr/demo.py
+++ b/examples/industrial_data_pretraining/uniasr/demo.py
@@ -5,11 +5,7 @@
from funasr import AutoModel
-model = AutoModel(model="/Users/zhifu/Downloads/modelscope_models/speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-online", model_revision="v2.0.4",
- # vad_model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch",
- # vad_model_revision="v2.0.4",
- # punc_model="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
- # punc_model_revision="v2.0.4",
+model = AutoModel(model="iic/speech_UniASR-large_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-offline", model_revision="v2.0.4",
)
res = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav")
diff --git a/funasr/auto/auto_model.py b/funasr/auto/auto_model.py
index 5f89dd671..5bfc08040 100644
--- a/funasr/auto/auto_model.py
+++ b/funasr/auto/auto_model.py
@@ -224,7 +224,7 @@ class AutoModel:
asr_result_list = []
num_samples = len(data_list)
disable_pbar = kwargs.get("disable_pbar", False)
- pbar = tqdm(colour="blue", total=num_samples+1, dynamic_ncols=True) if not disable_pbar else None
+ pbar = tqdm(colour="blue", total=num_samples, dynamic_ncols=True) if not disable_pbar else None
time_speech_total = 0.0
time_escape_total = 0.0
for beg_idx in range(0, num_samples, batch_size):
@@ -350,6 +350,7 @@ class AutoModel:
end_asr_total = time.time()
time_escape_total_per_sample = end_asr_total - beg_asr_total
+ pbar_sample.update(1)
pbar_sample.set_description(f"rtf_avg_per_sample: {time_escape_total_per_sample / time_speech_total_per_sample:0.3f}, "
f"time_speech_total_per_sample: {time_speech_total_per_sample: 0.3f}, "
f"time_escape_total_per_sample: {time_escape_total_per_sample:0.3f}")
diff --git a/funasr/models/uniasr/template.yaml b/funasr/models/uniasr/template.yaml
index f4815c131..35c6b2eb2 100644
--- a/funasr/models/uniasr/template.yaml
+++ b/funasr/models/uniasr/template.yaml
@@ -18,6 +18,7 @@ model_conf:
decoder_attention_chunk_type2: chunk
loss_weight_model1: 0.5
+
# encoder
encoder: SANMEncoderChunkOpt
encoder_conf:
@@ -34,11 +35,21 @@ encoder_conf:
kernel_size: 11
sanm_shfit: 0
selfattention_layer_type: sanm
- chunk_size: [20, 60]
- stride: [10, 40]
- pad_left: [5, 10]
- encoder_att_look_back_factor: [0, 0]
- decoder_att_look_back_factor: [0, 0]
+ chunk_size:
+ - 20
+ - 60
+ stride:
+ - 10
+ - 40
+ pad_left:
+ - 5
+ - 10
+ encoder_att_look_back_factor:
+ - 0
+ - 0
+ decoder_att_look_back_factor:
+ - 0
+ - 0
# decoder
decoder: FsmnDecoderSCAMAOpt
@@ -55,6 +66,7 @@ decoder_conf:
kernel_size: 11
concat_embeds: true
+# predictor
predictor: CifPredictorV2
predictor_conf:
idim: 320
@@ -62,6 +74,8 @@ predictor_conf:
l_order: 1
r_order: 1
+
+# encoder2
encoder2: SANMEncoderChunkOpt
encoder2_conf:
output_size: 320
@@ -77,12 +91,23 @@ encoder2_conf:
kernel_size: 21
sanm_shfit: 0
selfattention_layer_type: sanm
- chunk_size: [45, 70]
- stride: [35, 50]
- pad_left: [5, 10]
- encoder_att_look_back_factor: [0, 0]
- decoder_att_look_back_factor: [0, 0]
+ chunk_size:
+ - 45
+ - 70
+ stride:
+ - 35
+ - 50
+ pad_left:
+ - 5
+ - 10
+ encoder_att_look_back_factor:
+ - 0
+ - 0
+ decoder_att_look_back_factor:
+ - 0
+ - 0
+# decoder
decoder2: FsmnDecoderSCAMAOpt
decoder2_conf:
attention_dim: 320
@@ -108,10 +133,12 @@ stride_conv: stride_conv1d
stride_conv_conf:
kernel_size: 2
stride: 2
- pad: [0, 1]
+ pad:
+ - 0
+ - 1
# frontend related
-frontend: WavFrontendOnline
+frontend: WavFrontend
frontend_conf:
fs: 16000
window: hamming
@@ -120,6 +147,7 @@ frontend_conf:
frame_shift: 10
lfr_m: 7
lfr_n: 6
+ dither: 0.0
specaug: SpecAugLFR
specaug_conf:
diff --git a/model_zoo/modelscope_models_zh.md b/model_zoo/modelscope_models_zh.md
index 88fa23e76..1f501a36b 100644
--- a/model_zoo/modelscope_models_zh.md
+++ b/model_zoo/modelscope_models_zh.md
@@ -33,26 +33,26 @@
#### UniASR模型
-| 模型名字 | 语言 | 训练数据 | Vocab Size | Parameter | 非实时/实时 | 备注 |
-|:-------------------------------------------------------------------------------------------------------------------------------------------------:|:--------:|:---------------------------------:|:----------:|:---------:|:--------------:|:--------------------------------------------------------------------------------------------------------------------------------|
-| [UniASR](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-实时/summary) | 中文和英文 | 阿里巴巴语音数据 (60000 小时) | 8358 | 100M | 实时 | 流式离线一体化模型 |
-| [UniASR-large](https://modelscope.cn/models/damo/speech_UniASR-large_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-非实时/summary) | 中文和英文 | 阿里巴巴语音数据 (60000 小时) | 8358 | 220M | 非实时 | 流式离线一体化模型 |
-| [UniASR English](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-en-16k-common-vocab1080-tensorflow1-实时/summary) | 英文 | 阿里巴巴语音数据 (10000 小时) | 1080 | 95M | 实时 | 流式离线一体化模型 |
-| [UniASR Russian](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ru-16k-common-vocab1664-tensorflow1-实时/summary) | 俄语 | 阿里巴巴语音数据 (5000 小时) | 1664 | 95M | 实时 | 流式离线一体化模型 |
-| [UniASR Japanese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ja-16k-common-vocab93-tensorflow1-实时/summary) | 日语 | 阿里巴巴语音数据 (5000 小时) | 5977 | 95M | 实时 | 流式离线一体化模型 |
-| [UniASR Korean](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ko-16k-common-vocab6400-tensorflow1-实时/summary) | 韩语 | 阿里巴巴语音数据 (2000 小时) | 6400 | 95M | 实时 | 流式离线一体化模型 |
-| [UniASR Cantonese (CHS)](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-实时/summary) | 粤语(简体中文) | 阿里巴巴语音数据 (5000 小时) | 1468 | 95M | 实时 | 流式离线一体化模型 |
-| [UniASR Indonesian](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-id-16k-common-vocab1067-tensorflow1-实时/summary) | 印尼语 | 阿里巴巴语音数据 (1000 小时) | 1067 | 95M | 实时 | 流式离线一体化模型 |
-| [UniASR Vietnamese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-vi-16k-common-vocab1001-pytorch-实时/summary) | 越南语 | 阿里巴巴语音数据 (1000 小时) | 1001 | 95M | 实时 | 流式离线一体化模型 |
-| [UniASR Spanish](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-es-16k-common-vocab3445-tensorflow1-实时/summary) | 西班牙语 | 阿里巴巴语音数据 (1000 小时) | 3445 | 95M | 实时 | 流式离线一体化模型 |
-| [UniASR Portuguese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-pt-16k-common-vocab1617-tensorflow1-实时/summary) | 葡萄牙语 | 阿里巴巴语音数据 (1000 小时) | 1617 | 95M | 实时 | 流式离线一体化模型 |
-| [UniASR French](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-fr-16k-common-vocab3472-tensorflow1-实时/summary) | 法语 | 阿里巴巴语音数据 (1000 小时) | 3472 | 95M | 实时 | 流式离线一体化模型 |
-| [UniASR German](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-de-16k-common-vocab3690-tensorflow1-实时/summary) | 德语 | 阿里巴巴语音数据 (1000 小时) | 3690 | 95M | 实时 | 流式离线一体化模型 |
-| [UniASR Persian](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-fa-16k-common-vocab1257-pytorch-实时/summary) | 波斯语 | 阿里巴巴语音数据 (1000 小时) | 1257 | 95M | 实时 | 流式离线一体化模型 |
-| [UniASR Burmese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-my-16k-common-vocab696-pytorch/summary) | 缅甸语 | 阿里巴巴语音数据 (1000 小时) | 696 | 95M | 实时 | 流式离线一体化模型 |
-| [UniASR Hebrew](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-he-16k-common-vocab1085-pytorch/summary) | 希伯来语 | 阿里巴巴语音数据 (1000 小时) | 1085 | 95M | 实时 | 流式离线一体化模型 |
-| [UniASR Urdu](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ur-16k-common-vocab877-pytorch/summary) | 乌尔都语 | 阿里巴巴语音数据 (1000 小时) | 877 | 95M | 实时 | 流式离线一体化模型 |
-| [UniASR Turkish](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-tr-16k-common-vocab1582-pytorch/summary) | 土耳其语 | 阿里巴巴语音数据 (1000 小时) | 1582 | 95M | 实时 | 流式离线一体化模型 |
+| 模型名字 | 语言 | 训练数据 | Vocab Size | Parameter | 非实时/实时 | 备注 |
+|:---------------------------------------------------------------------------------------------------------------------------------------------:|:--------:|:---------------------------------:|:----------:|:---------:|:--------------:|:--------------------------------------------------------------------------------------------------------------------------------|
+| [UniASR](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-online/summary) | 中文和英文 | 阿里巴巴语音数据 (60000 小时) | 8358 | 100M | 实时 | 流式离线一体化模型 |
+| [UniASR-large](https://modelscope.cn/models/damo/speech_UniASR-large_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-offline/summary) | 中文和英文 | 阿里巴巴语音数据 (60000 小时) | 8358 | 220M | 非实时 | 流式离线一体化模型 |
+| [UniASR English](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-en-16k-common-vocab1080-tensorflow1-online/summary) | 英文 | 阿里巴巴语音数据 (10000 小时) | 1080 | 95M | 实时 | 流式离线一体化模型 |
+| [UniASR Russian](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ru-16k-common-vocab1664-tensorflow1-online/summary) | 俄语 | 阿里巴巴语音数据 (5000 小时) | 1664 | 95M | 实时 | 流式离线一体化模型 |
+| [UniASR Japanese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ja-16k-common-vocab93-tensorflow1-online/summary) | 日语 | 阿里巴巴语音数据 (5000 小时) | 5977 | 95M | 实时 | 流式离线一体化模型 |
+| [UniASR Korean](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ko-16k-common-vocab6400-tensorflow1-online/summary) | 韩语 | 阿里巴巴语音数据 (2000 小时) | 6400 | 95M | 实时 | 流式离线一体化模型 |
+| [UniASR Cantonese (CHS)](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online/summary) | 粤语(简体中文) | 阿里巴巴语音数据 (5000 小时) | 1468 | 95M | 实时 | 流式离线一体化模型 |
+| [UniASR Indonesian](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-id-16k-common-vocab1067-tensorflow1-online/summary) | 印尼语 | 阿里巴巴语音数据 (1000 小时) | 1067 | 95M | 实时 | 流式离线一体化模型 |
+| [UniASR Vietnamese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-vi-16k-common-vocab1001-pytorch-online/summary) | 越南语 | 阿里巴巴语音数据 (1000 小时) | 1001 | 95M | 实时 | 流式离线一体化模型 |
+| [UniASR Spanish](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-es-16k-common-vocab3445-tensorflow1-online/summary) | 西班牙语 | 阿里巴巴语音数据 (1000 小时) | 3445 | 95M | 实时 | 流式离线一体化模型 |
+| [UniASR Portuguese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-pt-16k-common-vocab1617-tensorflow1-online/summary) | 葡萄牙语 | 阿里巴巴语音数据 (1000 小时) | 1617 | 95M | 实时 | 流式离线一体化模型 |
+| [UniASR French](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-fr-16k-common-vocab3472-tensorflow1-online/summary) | 法语 | 阿里巴巴语音数据 (1000 小时) | 3472 | 95M | 实时 | 流式离线一体化模型 |
+| [UniASR German](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-de-16k-common-vocab3690-tensorflow1-online/summary) | 德语 | 阿里巴巴语音数据 (1000 小时) | 3690 | 95M | 实时 | 流式离线一体化模型 |
+| [UniASR Persian](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-fa-16k-common-vocab1257-pytorch-online/summary) | 波斯语 | 阿里巴巴语音数据 (1000 小时) | 1257 | 95M | 实时 | 流式离线一体化模型 |
+| [UniASR Burmese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-my-16k-common-vocab696-pytorch/summary) | 缅甸语 | 阿里巴巴语音数据 (1000 小时) | 696 | 95M | 实时 | 流式离线一体化模型 |
+| [UniASR Hebrew](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-he-16k-common-vocab1085-pytorch/summary) | 希伯来语 | 阿里巴巴语音数据 (1000 小时) | 1085 | 95M | 实时 | 流式离线一体化模型 |
+| [UniASR Urdu](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ur-16k-common-vocab877-pytorch/summary) | 乌尔都语 | 阿里巴巴语音数据 (1000 小时) | 877 | 95M | 实时 | 流式离线一体化模型 |
+| [UniASR Turkish](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-tr-16k-common-vocab1582-pytorch/summary) | 土耳其语 | 阿里巴巴语音数据 (1000 小时) | 1582 | 95M | 实时 | 流式离线一体化模型 |
#### Conformer模型
@@ -115,7 +115,7 @@
| 模型名字 | 语言 | 训练数据 | 模型参数 | 备注 |
|:--------------------------------------------------------------------------------------------------:|:--------------:|:-------------------:|:----------:|:---------|
-| [TP-Aligner](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-非实时/summary) |中文| 阿里巴巴语音数据 (50000hours) | 37.8M | 时间戳模型,中文 |
+| [TP-Aligner](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) |中文| 阿里巴巴语音数据 (50000hours) | 37.8M | 时间戳模型,中文 |
### 逆文本正则化