Merge branch 'main' of https://github.com/alibaba-damo-academy/FunASR into main

2025-09-15 14:48:36 +08:00 · 2023-11-09 11:03:45 +08:00 · 2023-11-09 11:03:45 +08:00 · 6e26ad0e14
commit 6e26ad0e14
parent 71f95f256b c3e2ca4b73
36 changed files with 342 additions and 321 deletions
--- a/README.md
+++ b/README.md
@ -9,31 +9,32 @@
    <a href=""><img src="https://img.shields.io/badge/Pytorch-%3E%3D1.11-blue"></a>
 </p>

-<strong>FunASR</strong> hopes to build a bridge between academic research and industrial applications on speech recognition. By supporting the training & finetuning of the industrial-grade speech recognition model released on [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition), researchers and developers can conduct research and production of speech recognition models more conveniently, and promote the development of speech recognition ecology. ASR for Fun！
+<strong>FunASR</strong> hopes to build a bridge between academic research and industrial applications on speech recognition. By supporting the training & finetuning of the industrial-grade speech recognition model, researchers and developers can conduct research and production of speech recognition models more conveniently, and promote the development of speech recognition ecology. ASR for Fun！

 [**Highlights**](#highlights)
 | [**News**](https://github.com/alibaba-damo-academy/FunASR#whats-new) 
 | [**Installation**](#installation)
 | [**Quick Start**](#quick-start)
-| [**Runtime**](./funasr/runtime/readme.md)
-| [**Model Zoo**](./docs/model_zoo/modelscope_models.md)
+| [**Runtime**](./runtime/readme.md)
+| [**Model Zoo**](#model-zoo)
 | [**Contact**](#contact)


 <a name="highlights"></a>
 ## Highlights
 - FunASR is a fundamental speech recognition toolkit that offers a variety of features, including speech recognition (ASR), Voice Activity Detection (VAD), Punctuation Restoration, Language Models, Speaker Verification, Speaker Diarization and multi-talker ASR. FunASR provides convenient scripts and tutorials, supporting inference and fine-tuning of pre-trained models.
- We have released a vast collection of academic and industrial pretrained models on the [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition), which can be accessed through our [Model Zoo](https://github.com/alibaba-damo-academy/FunASR/blob/main/docs/model_zoo/modelscope_models.md). The representative [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary), a non-autoregressive end-to-end speech recognition model, has the advantages of high accuracy, high efficiency, and convenient deployment, supporting the rapid construction of speech recognition services. For more details on service deployment, please refer to the [service deployment document](funasr/runtime/readme_cn.md). 
+- We have released a vast collection of academic and industrial pretrained models on the [ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition) and [huggingface](https://huggingface.co/FunASR), which can be accessed through our [Model Zoo](https://github.com/alibaba-damo-academy/FunASR/blob/main/docs/model_zoo/modelscope_models.md). The representative [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary), a non-autoregressive end-to-end speech recognition model, has the advantages of high accuracy, high efficiency, and convenient deployment, supporting the rapid construction of speech recognition services. For more details on service deployment, please refer to the [service deployment document](runtime/readme_cn.md). 


 <a name="whats-new"></a>
 ## What's new: 
- 2023/10/17: The offline file transcription service (CPU) of English has been released. For more details, please refer to ([Deployment documentation](funasr/runtime/docs/SDK_tutorial_en.md)).
+- 2023/11/08: The offline file transcription service 3.0 (CPU) of Mandarin has been released, adding a large punctuation model, an N-gram language model, and WFST hotwords. For detailed information, please refer to [docs](runtime#file-transcription-service-mandarin-cpu).
+- 2023/10/17: The offline file transcription service (CPU) of English has been released. For more details, please refer to ([docs](runtime#file-transcription-service-english-cpu)).
 - 2023/10/13: [SlideSpeech](https://slidespeech.github.io/): A large scale multi-modal audio-visual corpus with a significant amount of real-time synchronized slides.
 - 2023/10/10: The ASR-SpeakersDiarization combined pipeline [Paraformer-VAD-SPK](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr_vad_spk/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/demo.py) is now released. Experience the model to get recognition results with speaker information.
 - 2023/10/07: [FunCodec](https://github.com/alibaba-damo-academy/FunCodec): A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec.
- 2023/09/01: The offline file transcription service 2.0 (CPU) of Mandarin has been released, with added support for ffmpeg, timestamp, and hotword models. For more details, please refer to ([Deployment documentation](funasr/runtime/docs/SDK_tutorial.md)).
- 2023/08/07: The real-time transcription service (CPU) of Mandarin has been released. For more details, please refer to ([Deployment documentation](funasr/runtime/docs/SDK_tutorial_online.md)).
+- 2023/09/01: The offline file transcription service 2.0 (CPU) of Mandarin has been released, with added support for ffmpeg, timestamp, and hotword models. For more details, please refer to ([docs](runtime#file-transcription-service-mandarin-cpu)).
+- 2023/08/07: The real-time transcription service (CPU) of Mandarin has been released. For more details, please refer to ([docs](runtime#the-real-time-transcription-service-mandarin-cpu)).
 - 2023/07/17: BAT is released, which is a low-latency and low-memory-consumption RNN-T model. For more details, please refer to ([BAT](egs/aishell/bat)).
 - 2023/06/26: ASRU2023 Multi-Channel Multi-Party Meeting Transcription Challenge 2.0 completed the competition and announced the results. For more details, please refer to ([M2MeT2.0](https://alibaba-damo-academy.github.io/FunASR/m2met2/index.html)).

@ -43,19 +44,89 @@

 Please refer to the [installation docs](https://alibaba-damo-academy.github.io/FunASR/en/installation/installation.html)

-## Deployment Service
+## Model Zoo
+FunASR has open-sourced a large number of models pre-trained on industrial data. You are free to use, copy, modify, and share FunASR models under the [Model License Agreement](./MODEL_LICENSE). Below are some representative models; for more models, please refer to the [Model Zoo]().

-FunASR supports pre-trained or further fine-tuned models for deployment as a service. The CPU version of the Chinese offline file conversion service has been released, details can be found in [docs](funasr/runtime/docs/SDK_tutorial.md). More detailed information about service deployment can be found in the [deployment roadmap](funasr/runtime/readme_cn.md).
+(Note: 🤗 represents the Huggingface model zoo link, ⭐ represents the ModelScope model zoo link)
+
+
+|                                                                              Model Name                                                                              |                                Task Details                                 |          Training Data           | Parameters |
+|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------:|:--------------------------------:|:----------:|
+| <nobr>paraformer-zh ([⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)  [🤗]() )</nobr> |             speech recognition, with timestamps, non-streaming              |      60000 hours, Mandarin       |    220M    |
+|             <nobr>paraformer-zh-spk ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary)  [🤗]() )</nobr>             | speech recognition with speaker diarization, with timestamps, non-streaming |      60000 hours, Mandarin       |    220M    |
+|    <nobr>paraformer-zh-online ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗]() )</nobr>     |                        speech recognition, streaming                        |      60000 hours, Mandarin       |    220M    |
+|      <nobr>paraformer-en ( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗]() )</nobr>      |             speech recognition, with timestamps, non-streaming              |       50000 hours, English       |    220M    |
+|                                                            <nobr>paraformer-en-spk ([🤗]() [⭐]() )</nobr>                                                            |         speech recognition with speaker diarization, non-streaming          |       50000 hours, English       |    220M    |
+|                  <nobr>conformer-en ( [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗]() )</nobr>                   |                      speech recognition, non-streaming                      |       50000 hours, English       |    220M    |
+|                  <nobr>ct-punc ( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗]() )</nobr>                   |                           punctuation restoration                           |    100M, Mandarin and English    |    1.1G    | 
+|                       <nobr>fsmn-vad ( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗]() )</nobr>                       |                          voice activity detection                           | 5000 hours, Mandarin and English |    0.4M    | 
+|                       <nobr>fa-zh ( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗]() )</nobr>                        |                            timestamp prediction                             |       5000 hours, Mandarin       |    38M     | 
+
+
+
+
+[//]: # ()
+[//]: # (FunASR supports pre-trained or further fine-tuned models for deployment as a service. The CPU version of the Chinese offline file conversion service has been released, details can be found in [docs]&#40;funasr/runtime/docs/SDK_tutorial.md&#41;. More detailed information about service deployment can be found in the [deployment roadmap]&#40;funasr/runtime/readme_cn.md&#41;.)


 <a name="quick-start"></a>
 ## Quick Start
 Quick start for new users（[tutorial](https://alibaba-damo-academy.github.io/FunASR/en/funasr/quick_start.html)）

+FunASR supports inference and fine-tuning of models trained on tens of thousands of hours of industrial data. For more details, please refer to [modelscope_egs](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_pipeline/quick_start.html). It also supports training and fine-tuning of models on academic standard datasets. For more information, please refer to [egs](https://alibaba-damo-academy.github.io/FunASR/en/academic_recipe/asr_recipe.html).

-FunASR supports inference and fine-tuning of models trained on industrial datasets of tens of thousands of hours. For more details, please refer to ([modelscope_egs](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_pipeline/quick_start.html)). It also supports training and fine-tuning of models on academic standard datasets. For more details, please refer to([egs](https://alibaba-damo-academy.github.io/FunASR/en/academic_recipe/asr_recipe.html)). The models include speech recognition (ASR), speech activity detection (VAD), punctuation recovery, language model, speaker verification, speaker separation, and multi-party conversation speech recognition. For a detailed list of models, please refer to the [Model Zoo](https://github.com/alibaba-damo-academy/FunASR/blob/main/docs/model_zoo/modelscope_models.md):
+Below is a quick start tutorial. Test audio files ([Mandarin](https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/vad_example.wav), [English]()).
+### Speech Recognition (Non-streaming)
+```python
+from funasr import infer

-<a name="Community Communication"></a>
+p = infer(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc", model_hub="ms")
+
+res = p("asr_example_zh.wav", batch_size_token=5000)
+print(res)
+```
+Note: `model_hub` selects the model repository: `ms` downloads from ModelScope, `hf` downloads from Huggingface.
+
+### Speech Recognition (Streaming)
+```python
+import torchaudio
+from funasr import infer
+
+p = infer(model="paraformer-zh-streaming", model_hub="ms")
+
+chunk_size = [0, 10, 5]  # [0, 10, 5] -> 600ms, [0, 8, 4] -> 480ms
+param_dict = {"cache": dict(), "is_final": False, "chunk_size": chunk_size, "encoder_chunk_look_back": 4, "decoder_chunk_look_back": 1}
+
+# Load the audio and keep the first channel as a 1-D tensor of samples
+speech = torchaudio.load("asr_example_zh.wav")[0][0]
+speech_length = speech.shape[0]
+
+# One chunk unit is 60ms, i.e. 960 samples at 16kHz
+stride_size = chunk_size[1] * 960
+for sample_offset in range(0, speech_length, stride_size):
+    # Mark the last chunk so the model flushes its remaining output
+    param_dict["is_final"] = sample_offset + stride_size >= speech_length - 1
+    chunk = speech[sample_offset: sample_offset + stride_size]
+    rec_result = p(input=chunk, param_dict=param_dict)
+    print(rec_result)
+```
+Note: `chunk_size` is the configuration for streaming latency. `[0,10,5]` indicates that the real-time display granularity is `10*60=600ms`, and the lookahead information is `5*60=300ms`. Each inference input is `600ms` (sample points are `16000*0.6=9600`), and the output is the corresponding text. For the last speech segment input, `is_final=True` needs to be set to force the output of the last word.
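
The latency arithmetic in the note above can be checked with a small sketch. It assumes 16 kHz audio and a 60 ms chunk unit, as the note states. The helpers `chunk_layout` and `chunk_offsets` are hypothetical names used for illustration and are not part of FunASR, and the `is_final` rule here is a simplified "last chunk" test rather than the exact condition in the README snippet.

```python
# Hypothetical helpers, not part of FunASR: they convert a chunk_size setting
# into milliseconds and sample counts, assuming 16 kHz audio and 60 ms units.
SAMPLE_RATE = 16000
UNIT_MS = 60


def chunk_layout(chunk_size):
    """Return (chunk_ms, lookahead_ms, samples_per_chunk) for a chunk_size triple."""
    _, chunk_units, lookahead_units = chunk_size
    chunk_ms = chunk_units * UNIT_MS
    lookahead_ms = lookahead_units * UNIT_MS
    samples_per_chunk = SAMPLE_RATE * chunk_ms // 1000
    return chunk_ms, lookahead_ms, samples_per_chunk


def chunk_offsets(num_samples, samples_per_chunk):
    """Return (start, end, is_final) for each chunk covering num_samples."""
    offsets = []
    for start in range(0, num_samples, samples_per_chunk):
        end = min(start + samples_per_chunk, num_samples)
        # Simplified rule: only the chunk that reaches the end is final
        offsets.append((start, end, end == num_samples))
    return offsets


if __name__ == "__main__":
    print(chunk_layout([0, 10, 5]))
    print(chunk_offsets(20000, 9600))
```

With `[0, 10, 5]` each chunk is 600 ms, or 9600 samples, which matches `chunk_size[1] * 960` in the streaming example.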
+
+Quick start for new users can be found in [docs](https://alibaba-damo-academy.github.io/FunASR/en/funasr/quick_start_zh.html)
+
+
+[//]: # (FunASR supports inference and fine-tuning of models trained on industrial datasets of tens of thousands of hours. For more details, please refer to &#40;[modelscope_egs]&#40;https://alibaba-damo-academy.github.io/FunASR/en/modelscope_pipeline/quick_start.html&#41;&#41;. It also supports training and fine-tuning of models on academic standard datasets. For more details, please refer to&#40;[egs]&#40;https://alibaba-damo-academy.github.io/FunASR/en/academic_recipe/asr_recipe.html&#41;&#41;. The models include speech recognition &#40;ASR&#41;, speech activity detection &#40;VAD&#41;, punctuation recovery, language model, speaker verification, speaker separation, and multi-party conversation speech recognition. For a detailed list of models, please refer to the [Model Zoo]&#40;https://github.com/alibaba-damo-academy/FunASR/blob/main/docs/model_zoo/modelscope_models.md&#41;:)
+
+## Deployment Service
+FunASR supports deploying pre-trained or further fine-tuned models as services. Currently, it supports the following types of service deployment:
+- File transcription service, Mandarin (CPU), done
+- Real-time transcription service, Mandarin (CPU), done
+- File transcription service, English (CPU), done
+- File transcription service, Mandarin (GPU), in progress
+- and more.
+
+For more detailed information, please refer to the [service deployment documentation](runtime/readme.md).
+
+
+<a name="contact"></a>
 ## Community Communication
 If you encounter problems in use, you can directly raise Issues on the github page.

@ -67,8 +138,8 @@ You can also scan the following DingTalk group or WeChat group QR code to join t

 ## Contributors

-| <div align="left"><img src="docs/images/damo.png" width="180"/> | <div align="left"><img src="docs/images/nwpu.png" width="260"/> | <img src="docs/images/China_Telecom.png" width="200"/> </div>  | <img src="docs/images/RapidAI.png" width="200"/> </div> | <img src="docs/images/aihealthx.png" width="200"/> </div> | <img src="docs/images/XVERSE.png" width="250"/> </div> |
-|:---------------------------------------------------------------:|:---------------------------------------------------------------:|:--------------------------------------------------------------:|:-------------------------------------------------------:|:-----------------------------------------------------------:|:------------------------------------------------------:|
+| <div align="left"><img src="docs/images/nwpu.png" width="260"/> | <img src="docs/images/China_Telecom.png" width="200"/> </div>  | <img src="docs/images/RapidAI.png" width="200"/> </div> | <img src="docs/images/aihealthx.png" width="200"/> </div> | <img src="docs/images/XVERSE.png" width="250"/> </div> |
+|:---------------------------------------------------------------:|:--------------------------------------------------------------:|:-------------------------------------------------------:|:-----------------------------------------------------------:|:------------------------------------------------------:|

 The contributors can be found in [contributors list](./Acknowledge.md)

@ -91,12 +162,6 @@ The use of pretraining model is subject to [model license](./MODEL_LICENSE)
  year={2023},
  booktitle={INTERSPEECH},
 }
-@inproceedings{wang2023told,
-  author={Jiaming Wang and Zhihao Du and Shiliang Zhang},
-  title={{TOLD:} {A} Novel Two-Stage Overlap-Aware Framework for Speaker Diarization},
-  year={2023},
-  booktitle={ICASSP},
-}
@inproceedings{gao22b_interspeech,
  author={Zhifu Gao and ShiLiang Zhang and Ian McLoughlin and Zhijie Yan},
  title={{Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition}},
--- a/README_zh.md
+++ b/README_zh.md
@ -18,8 +18,8 @@ FunASR希望在语音识别的学术研究和工业应用之间架起一座桥
 ｜<a href="#安装教程"> 安装 </a>
 ｜<a href="#快速开始"> 快速开始 </a>
 ｜<a href="https://alibaba-damo-academy.github.io/FunASR/en/index.html"> 教程文档 </a>
-｜<a href="./docs/model_zoo/modelscope_models.md"> 模型仓库 </a>
-｜<a href="./funasr/runtime/readme_cn.md"> 服务部署 </a>
+｜<a href="#模型仓库"> 模型仓库 </a>
+｜<a href="#服务部署"> 服务部署 </a>
 ｜<a href="#联系我们"> 联系我们 </a>
 </h4>
 </div>
@ -27,16 +27,17 @@ FunASR希望在语音识别的学术研究和工业应用之间架起一座桥
 <a name="核心功能"></a>
 ## 核心功能
 - FunASR是一个基础语音识别工具包，提供多种功能，包括语音识别（ASR）、语音端点检测（VAD）、标点恢复、语言模型、说话人验证、说话人分离和多人对话语音识别等。FunASR提供了便捷的脚本和教程，支持预训练好的模型的推理与微调。
- 我们在[ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition)与[huggingface](https://huggingface.co/FunAudio)上发布了大量开源数据集或者海量工业数据训练的模型，可以通过我们的[模型仓库](https://github.com/alibaba-damo-academy/FunASR/blob/main/docs/model_zoo/modelscope_models.md)了解模型的详细信息。代表性的[Paraformer](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)非自回归端到端语音识别模型具有高精度、高效率、便捷部署的优点，支持快速构建语音识别服务，详细信息可以阅读([服务部署文档](funasr/runtime/readme_cn.md))。
+- 我们在[ModelScope](https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition)与[huggingface](https://huggingface.co/FunASR)上发布了大量开源数据集或者海量工业数据训练的模型，可以通过我们的[模型仓库](https://github.com/alibaba-damo-academy/FunASR/blob/main/docs/model_zoo/modelscope_models.md)了解模型的详细信息。代表性的[Paraformer](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)非自回归端到端语音识别模型具有高精度、高效率、便捷部署的优点，支持快速构建语音识别服务，详细信息可以阅读([服务部署文档](runtime/readme_cn.md))。

 <a name="最新动态"></a>
 ## 最新动态
- 20223/10/17: 英文离线文件转写服务一键部署的CPU版本发布，详细信息参阅([一键部署文档](funasr/runtime/docs/SDK_tutorial_en_zh.md))
+- 2023/11/08：中文离线文件转写服务3.0 CPU版本发布，新增标点大模型、Ngram语言模型与wfst热词，详细信息参阅([一键部署文档](runtime/readme_cn.md#中文离线文件转写服务cpu版本))
+- 2023/10/17: 英文离线文件转写服务一键部署的CPU版本发布，详细信息参阅([一键部署文档](runtime/readme_cn.md#英文离线文件转写服务cpu版本))
 - 2023/10/13: [SlideSpeech](https://slidespeech.github.io/): 一个大规模的多模态音视频语料库，主要是在线会议或者在线课程场景，包含了大量与发言人讲话实时同步的幻灯片。
 - 2023.10.10: [Paraformer-long-Spk](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr_vad_spk/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/demo.py)模型发布，支持在长语音识别的基础上获取每句话的说话人标签。
 - 2023.10.07: [FunCodec](https://github.com/alibaba-damo-academy/FunCodec): FunCodec提供开源模型和训练工具，可以用于音频离散编码，以及基于离散编码的语音识别、语音合成等任务。
- 2023.09.01: 中文离线文件转写服务2.0 CPU版本发布，新增ffmpeg、时间戳与热词模型支持，详细信息参阅([一键部署文档](funasr/runtime/docs/SDK_tutorial_zh.md))
- 2023.08.07: 中文实时语音听写服务一键部署的CPU版本发布，详细信息参阅([一键部署文档](funasr/runtime/docs/SDK_tutorial_online_zh.md))
+- 2023.09.01: 中文离线文件转写服务2.0 CPU版本发布，新增ffmpeg、时间戳与热词模型支持，详细信息参阅([一键部署文档](runtime/readme_cn.md#中文离线文件转写服务cpu版本))
+- 2023.08.07: 中文实时语音听写服务一键部署的CPU版本发布，详细信息参阅([一键部署文档](runtime/readme_cn.md#中文实时语音听写服务cpu版本))
 - 2023.07.17: BAT一种低延迟低内存消耗的RNN-T模型发布，详细信息参阅（[BAT](egs/aishell/bat)）
 - 2023.06.26: ASRU2023 多通道多方会议转录挑战赛2.0完成竞赛结果公布，详细信息参阅（[M2MeT2.0](https://alibaba-damo-academy.github.io/FunASR/m2met2_cn/index.html)）

@ -51,17 +52,17 @@ FunASR开源了大量在工业数据上预训练模型，您可以在[模型许
 （注：[🤗]()表示Huggingface模型仓库链接，[⭐]()表示ModelScope模型仓库链接）


-|                                                                          模型名字                                                                          |        任务详情        |     训练数据     | 参数量  |
-|:------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------:|:------------:|:----:|
-| paraformer-zh ([🤗]() [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) ) |  语音识别，带时间戳输出，非实时   |  60000小时，中文  | 220M |
-|             paraformer-zh-spk ([🤗]() [⭐](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary) )              | 分角色语音识别，带时间戳输出，非实时 |  60000小时，中文  | 220M |
-|    paraformer-zh-online ([🤗]() [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) )     |      语音识别，实时       |  60000小时，中文  | 220M |
-|      paraformer-en ([🤗]() [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) )      | 分角色语音识别，带时间戳输出，非实时 |  50000小时，英文  | 220M |
-|                                 paraformer-en-spk ([🤗]() [⭐]() )                                                                                      |      语音识别，非实时      |  50000小时，英文  | 220M |
-|                  conformer-en ([🤗]() [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) )                   |      语音识别，非实时      |  50000小时，英文  | 220M |
-|                  ct-punc ([🤗]() [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) )                   |      标点恢复，非实时      |  100M，中文与英文  | 1.1G | 
-|                       fsmn-vad ([🤗]() [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) )                       |     语音端点检测，实时      | 5000小时，中文与英文 | 0.4M | 
-|                       fa-zh ([🤗]() [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) )                        |   字级别时间戳预测         |  50000小时，中文  | 38M  | 
+|                                                                              模型名字                                                                               |        任务详情        |     训练数据     | 参数量  |
+|:---------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------:|:------------:|:----:|
+|     paraformer-zh ([⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)  [🤗]() )     |  语音识别，带时间戳输出，非实时   |  60000小时，中文  | 220M |
+|                 paraformer-zh-spk ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn/summary)  [🤗]() )                 | 分角色语音识别，带时间戳输出，非实时 |  60000小时，中文  | 220M |
+|        paraformer-zh-online ( [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗]() )         |      语音识别，实时       |  60000小时，中文  | 220M |
+|          paraformer-en ( [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020/summary) [🤗]() )          | 语音识别，非实时 |  50000小时，英文  | 220M |
+|                                                                paraformer-en-spk ([🤗]() [⭐]() )                                                                |      语音识别，非实时      |  50000小时，英文  | 220M |
+|                      conformer-en ( [⭐](https://modelscope.cn/models/damo/speech_conformer_asr-en-16k-vocab4199-pytorch/summary) [🤗]() )                       |      语音识别，非实时      |  50000小时，英文  | 220M |
+|                      ct-punc ( [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗]() )                       |      标点恢复      |  100M，中文与英文  | 1.1G | 
+|                           fsmn-vad ( [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗]() )                           |     语音端点检测，实时      | 5000小时，中文与英文 | 0.4M | 
+|                           fa-zh ( [⭐](https://modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary) [🤗]() )                            |   字级别时间戳预测         |  50000小时，中文  | 38M  |


 <a name="快速开始"></a>
@ -116,7 +117,7 @@ FunASR支持预训练或者进一步微调的模型进行服务部署。目前
 - 中文离线文件转写服务（GPU版本），进行中
 - 更多支持中

-详细信息可以参阅([服务部署文档](funasr/runtime/readme_cn.md))。
+详细信息可以参阅([服务部署文档](runtime/readme_cn.md))。


 <a name="社区交流"></a>
--- a/docs/index.rst
+++ b/docs/index.rst
@ -71,10 +71,10 @@ Overview
   :maxdepth: 1
   :caption: Runtime and Service

-   ./funasr/runtime/readme.md
-   ./funasr/runtime/docs/SDK_tutorial_online.md
-   ./funasr/runtime/docs/SDK_tutorial.md
-   ./funasr/runtime/html5/readme.md
+   ./runtime/readme.md
+   ./runtime/docs/SDK_tutorial_online.md
+   ./runtime/docs/SDK_tutorial.md
+   ./runtime/html5/readme.md



--- a/docs/runtime
+++ b/docs/runtime
@ -0,0 +1 @@
+../runtime
--- a/docs/runtime/demo.gif
+++ b/docs/runtime/demo.gif
--- a/docs/runtime/export.md
+++ b/docs/runtime/export.md
@ -1 +0,0 @@
-../../funasr/export/README.md
--- a/docs/runtime/grpc_cpp.md
+++ b/docs/runtime/grpc_cpp.md
@ -1 +0,0 @@
-../../funasr/runtime/grpc/Readme.md
--- a/docs/runtime/grpc_python.md
+++ b/docs/runtime/grpc_python.md
@ -1 +0,0 @@
-../../funasr/runtime/python/grpc/Readme.md
--- a/docs/runtime/html5.md
+++ b/docs/runtime/html5.md
@ -1 +0,0 @@
-../../funasr/runtime/html5/readme.md
--- a/docs/runtime/img.png
+++ b/docs/runtime/img.png
--- a/docs/runtime/libtorch_python.md
+++ b/docs/runtime/libtorch_python.md
@ -1 +0,0 @@
-../../funasr/runtime/python/libtorch/README.md
--- a/docs/runtime/onnxruntime_cpp.md
+++ b/docs/runtime/onnxruntime_cpp.md
@ -1 +0,0 @@
-../../funasr/runtime/onnxruntime/readme.md
--- a/docs/runtime/onnxruntime_python.md
+++ b/docs/runtime/onnxruntime_python.md
@ -1 +0,0 @@
-../../funasr/runtime/python/onnxruntime/README.md
--- a/docs/runtime/websocket_cpp.md
+++ b/docs/runtime/websocket_cpp.md
@ -1 +0,0 @@
-../../funasr/runtime/websocket/readme.md
--- a/docs/runtime/websocket_python.md
+++ b/docs/runtime/websocket_python.md
@ -1 +0,0 @@
-../../funasr/runtime/python/websocket/README.md
--- a/egs_modelscope/asr/TEMPLATE/README_zh.md
+++ b/egs_modelscope/asr/TEMPLATE/README_zh.md
@ -30,12 +30,10 @@ inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
    vad_model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch',
-    #punc_model='damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch',
    punc_model='damo/punc_ct-transformer_cn-en-common-vocab471067-large',
 )

-rec_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/vad_example.wav', 
-                                batch_size_token=5000, batch_size_token_threshold_s=40, max_single_segment_time=6000)
+rec_result = inference_pipeline(audio_in='./vad_example.wav')
 print(rec_result)
 ```
 其中： 
--- a/funasr/quick_start.md
+++ b/funasr/quick_start.md
@ -26,7 +26,7 @@ python funasr_wss_server.py --port 10095
 python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass --chunk_size "5,10,5"
 ```

-For more examples, please refer to [docs](runtime/python/websocket/README.md).
+For more examples, please refer to [docs](../runtime/python/websocket/README.md).

 ### C++ version Example

@ -47,7 +47,7 @@ Testing [samples](https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sam
 ```shell
 python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass
 ```
-For more examples, please refer to [docs](runtime/docs/SDK_tutorial_online_zh.md)
+For more examples, please refer to [docs](../runtime/docs/SDK_tutorial_online_zh.md)


 #### File Transcription Service, Mandarin (CPU)
@ -68,7 +68,7 @@ Testing [samples](https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sam
 python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav"
 ```

-For more examples, please refer to [docs](runtime/docs/SDK_tutorial_zh.md)
+For more examples, please refer to [docs](../runtime/docs/SDK_tutorial_zh.md)
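
The link changes in this file follow from `runtime/` moving out of `funasr/` to the repository root: a relative Markdown link is resolved against the directory of the file that contains it. A minimal sketch of that resolution, using Python's `posixpath` (the `resolve_link` helper is a hypothetical name for illustration, not part of the repository):

```python
import posixpath


def resolve_link(source_file, link):
    """Resolve a relative Markdown link against the file that contains it."""
    base_dir = posixpath.dirname(source_file)
    return posixpath.normpath(posixpath.join(base_dir, link))


if __name__ == "__main__":
    # Old link: points inside funasr/, where runtime/ no longer lives
    print(resolve_link("funasr/quick_start.md", "runtime/docs/SDK_tutorial_zh.md"))
    # New link: climbs one level to reach the top-level runtime/ directory
    print(resolve_link("funasr/quick_start.md", "../runtime/docs/SDK_tutorial_zh.md"))
```

The old link resolves to `funasr/runtime/docs/SDK_tutorial_zh.md`, while the updated one resolves to `runtime/docs/SDK_tutorial_zh.md`.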


 ## Industrial Model Egs
--- a/funasr/quick_start_zh.md
+++ b/funasr/quick_start_zh.md
@ -26,7 +26,7 @@ python funasr_wss_server.py --port 10095
 python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass --chunk_size "5,10,5"
 #python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass --chunk_size "8,8,4" --audio_in "./data/wav.scp"
 ```
-更多例子可以参考（[点击此处](runtime/python/websocket/README.md)）
+更多例子可以参考（[点击此处](../runtime/python/websocket/README.md)）

 <a name="cpp版本示例"></a>
 #### c++版本示例
@ -46,7 +46,7 @@ sudo bash funasr-runtime-deploy-online-cpu-zh.sh install --workspace ./funasr-ru
 ```shell
 python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass
 ```
-更多例子参考（[点击此处](runtime/docs/SDK_tutorial_online_zh.md)）
+更多例子参考（[点击此处](../runtime/docs/SDK_tutorial_online_zh.md)）

 ##### 离线文件转写服务部署
 ###### 服务端部署
@ -59,7 +59,7 @@ sudo bash funasr-runtime-deploy-offline-cpu-zh.sh install --workspace ./funasr-r
 ```shell
 python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav"
 ```
-更多例子参考（[点击此处](runtime/docs/SDK_tutorial_zh.md)）
+更多例子参考（[点击此处](../runtime/docs/SDK_tutorial_zh.md)）



--- a/funasr/version.txt
+++ b/funasr/version.txt
@ -1 +1 @@
-0.8.2
+0.8.3
--- a/runtime/docs/SDK_advanced_guide_offline.md
+++ b/runtime/docs/SDK_advanced_guide_offline.md
@ -4,37 +4,28 @@ FunASR provides a Chinese offline file transcription service that can be deploye

 This document serves as a development guide for the FunASR offline file transcription service. If you wish to quickly experience the offline file transcription service, please refer to the one-click deployment example for the FunASR offline file transcription service ([docs](./SDK_tutorial.md)).

-## Installation of Docker
+<img src="images/offline_structure.jpg"  width="900"/>

-The following steps are for manually installing Docker and Docker images. If your Docker image has already been launched, you can ignore this step.

-### Installation of Docker environment
+| TIME       | INFO                                                                                                                             | IMAGE VERSION                | IMAGE ID     |
+|------------|----------------------------------------------------------------------------------------------------------------------------------|------------------------------|--------------|
+| 2023.11.08 | supporting punc-large model, Ngram model, fst hotwords, server-side loading of hotwords, adaptation to runtime structure changes | funasr-runtime-sdk-cpu-0.3.0 | caa64bddbb43 |
+| 2023.09.19 | supporting ITN model                                                                                                             | funasr-runtime-sdk-cpu-0.2.2 | 2c5286be13e9 |
+| 2023.08.22 | integrated ffmpeg to support various audio and video inputs, supporting nn-hotword model and timestamp model                     | funasr-runtime-sdk-cpu-0.2.0 | 1ad3d19e0707 |
+| 2023.07.03 | 1.0 released                                                                                                                     | funasr-runtime-sdk-cpu-0.1.0 | 1ad3d19e0707 |

+
+## Quick start
+### Docker install
+If you have already installed Docker, ignore this step!
 ```shell
-# Ubuntu：
-curl -fsSL https://test.docker.com -o test-docker.sh 
-sudo sh test-docker.sh 
-# Debian：
-curl -fsSL https://get.docker.com -o get-docker.sh 
-sudo sh get-docker.sh 
-# CentOS：
-curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun 
-# MacOS：
-brew install --cask --appdir=/Applications docker
-```
-
-For more details, refer to the [docs](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html)
-
-### Starting Docker
-
-```shell
-sudo systemctl start docker
+curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/install_docker.sh;
+sudo bash install_docker.sh
 ```
+If the Docker installation fails, please refer to [Docker Installation](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html)

 ### Pulling and launching images
-
 Use the following command to pull and launch the Docker image for the FunASR runtime-SDK:
-
 ```shell
 sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.3.0

@ -46,11 +37,9 @@ Introduction to command parameters:
 -p <host port>:<mapped docker port>: In the example, host machine (ECS) port 10095 is mapped to port 10095 in the Docker container. Make sure that port 10095 is open in the ECS security rules.

 -v <host path>:<mounted Docker path>: In the example, the host machine path /root is mounted to the Docker path /workspace/models.
-
 ```
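The `-p` mapping above only helps if the host port is actually reachable. As a rough check, the sketch below tries a plain TCP connection to the mapped port. The helper name `port_is_open` is hypothetical and not part of FunASR, and a successful connection only shows that something is listening, not that the FunASR server itself is healthy.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 10095 is the host port used in the docker run example above.
    print(port_is_open("127.0.0.1", 10095))
```

Run it from the machine that will host the client: a `False` result from a remote machine, with `True` locally, usually points at the security-group or firewall rules rather than the container.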

-## Starting the server
-
+### Starting the server
 Use the following script to start the server:
 ```shell
 nohup bash run_server.sh \
@ -59,13 +48,15 @@ nohup bash run_server.sh \
  --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx  \
  --punc-dir damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
  --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
-  --itn-dir thuduj12/fst_itn_zh > log.out 2>&1 &
+  --itn-dir thuduj12/fst_itn_zh \
+  --hotword /workspace/models/hotwords.txt > log.out 2>&1 &

 # To disable SSL, add: --certfile 0
 # If you want to deploy the timestamp or nn hotword model, please set --model-dir to the corresponding model:
-# speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx（timestamp）
-# damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-onnx（hotword）
-
+#   damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx (timestamp)
+#   damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-onnx (hotword)
+# If you want to load hotwords on the server side, please configure the hotwords in the host machine file ./funasr-runtime-resources/models/hotwords.txt (docker mapping address: /workspace/models/hotwords.txt):
+# One hotword per line, format (hotword weight): 阿里巴巴 20
 ```
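The hotword file format described in the comments above (one hotword per line, with the weight as the last space-separated field) can be checked before the file is mounted into the container. The sketch below is a hypothetical helper, not part of FunASR: the name `parse_hotwords` and its policy of silently skipping malformed lines are assumptions, loosely modelled on the parsing in `funasr_wss_client.py`.

```python
import json


def parse_hotwords(text):
    """Parse lines of the form '<hotword> <weight>' into a {hotword: weight} dict.

    Hypothetical helper; malformed lines are skipped rather than reported.
    """
    hotwords = {}
    for line in text.splitlines():
        fields = line.strip().split(" ")
        if len(fields) < 2:
            continue  # blank line or missing weight
        try:
            # The last field is the weight; everything before it is the hotword.
            hotwords[" ".join(fields[:-1])] = int(fields[-1])
        except ValueError:
            continue  # weight is not an integer
    return hotwords


if __name__ == "__main__":
    sample = "阿里巴巴 20\nhello world 40\nno-weight\n"
    print(json.dumps(parse_hotwords(sample), ensure_ascii=False))
```

Because the weight is taken from the last field, a hotword may itself contain spaces, as in `hello world 40`.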

 ### More details about the script run_server.sh:
@ -90,7 +81,6 @@ nohup bash run_server.sh \
 ```

 Introduction to run_server.sh parameters: 
-
 ```text
 --download-model-dir: Model download address, download models from Modelscope by setting the model ID.
 --model-dir: Modelscope model ID.
@ -139,19 +129,14 @@ After executing the above command, the real-time speech transcription service wi

 If you wish to deploy your fine-tuned model (e.g., 10epoch.pb), you need to manually rename the model to model.pb and replace the original model.pb in ModelScope. Then, specify the path as `model_dir`.

-
-
 ## Starting the client
-
 After completing the deployment of FunASR offline file transcription service on the server, you can test and use the service by following these steps. Currently, FunASR-bin supports multiple ways to start the client. The following are command-line examples based on python-client, c++-client, and custom client Websocket communication protocol: 

 ### python-client
 ```shell
 python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "./data/wav.scp" --send_without_sleep --output_dir "./results"
 ```
-
 Introduction to command parameters:
-
 ```text
 --host: the IP address of the server. It can be set to 127.0.0.1 for local testing.
 --port: the port number of the server listener.
@ -169,7 +154,6 @@ Introduction to command parameters:
 ```

 Introduction to command parameters:
-
 ```text
 --server-ip: the IP address of the server. It can be set to 127.0.0.1 for local testing.
 --port: the port number of the server listener.
@ -180,19 +164,15 @@ Introduction to command parameters:
 ```

 ### Custom client
-
 If you want to define your own client, see the [Websocket communication protocol](./websocket_protocol.md)
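As a rough illustration of what a custom client sends first, the sketch below builds a JSON configuration message with the same fields that the Python client in this commit sends (`mode`, `chunk_size`, `chunk_interval`, `wav_name`, `is_speaking`, `hotwords`, `itn`). The helper name `build_first_message` is hypothetical, and the linked protocol document remains the authoritative reference for field names and types.

```python
import json


def build_first_message(mode, chunk_size, chunk_interval, wav_name,
                        hotwords=None, use_itn=True):
    """Build the initial JSON text message for a websocket session (illustrative only)."""
    request = {
        "mode": mode,
        "chunk_size": chunk_size,
        "chunk_interval": chunk_interval,
        "wav_name": wav_name,
        "is_speaking": True,
        # Hotwords travel as a JSON-encoded string; empty string when none are set.
        "hotwords": json.dumps(hotwords) if hotwords else "",
        "itn": use_itn,
    }
    return json.dumps(request)


if __name__ == "__main__":
    print(build_first_message("offline", [5, 10, 5], 10, "demo", {"阿里巴巴": 20}))
```

Note that the hotword dictionary is serialized to a string before being placed in the request, mirroring how the client passes `json.dumps(fst_dict)` as the `hotwords` value.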

 ## How to customize service deployment
-
 The code for FunASR-runtime is open source. If the server and client cannot fully meet your needs, you can further develop them based on your own requirements:

 ### C++ client
-
 https://github.com/alibaba-damo-academy/FunASR/tree/main/runtime/websocket

 ### Python client
-
 https://github.com/alibaba-damo-academy/FunASR/tree/main/runtime/python/websocket

 ### C++ server
@ -216,7 +196,6 @@ FUNASR_HANDLE asr_hanlde=FunOfflineInit(model_path, thread_num);
 FUNASR_RESULT result=FunOfflineInfer(asr_hanlde, wav_file.c_str(), RASR_NONE, NULL, 16000);
 // Where: asr_hanlde is the return value of FunOfflineInit, wav_file is the path to the audio file, and sampling_rate is the sampling rate (default 16k).
 ```
-
 See the usage example for details, [docs](https://github.com/alibaba-damo-academy/FunASR/blob/main/runtime/onnxruntime/bin/funasr-onnx-offline.cpp)

 #### PUNC
--- a/runtime/docs/SDK_advanced_guide_offline_en.md
+++ b/runtime/docs/SDK_advanced_guide_offline_en.md
@ -4,54 +4,36 @@ FunASR provides a English offline file transcription service that can be deploye

 This document serves as a development guide for the FunASR offline file transcription service. If you wish to quickly experience the offline file transcription service, please refer to the one-click deployment example for the FunASR offline file transcription service ([docs](./SDK_tutorial.md)).

-## Installation of Docker
-
-The following steps are for manually installing Docker and Docker images. If your Docker image has already been launched, you can ignore this step.
-
-### Installation of Docker environment
+| TIME       | INFO                                    | IMAGE VERSION                   | IMAGE ID     |
+|------------|-----------------------------------------|---------------------------------|--------------|
+| 2023.11.08 | Adaptation to runtime structure changes | funasr-runtime-sdk-en-cpu-0.1.1 | 27017f70f72a |
+| 2023.10.16 | 1.0 released                            | funasr-runtime-sdk-en-cpu-0.1.0 | e0de03eb0163 |

+## Quick start
+### Docker install
+If you have already installed Docker, ignore this step!
 ```shell
-# Ubuntu：
-curl -fsSL https://test.docker.com -o test-docker.sh 
-sudo sh test-docker.sh 
-# Debian：
-curl -fsSL https://get.docker.com -o get-docker.sh 
-sudo sh get-docker.sh 
-# CentOS：
-curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun 
-# MacOS：
-brew install --cask --appdir=/Applications docker
-```
-
-For more details, refer to the [docs](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html)
-
-### Starting Docker
-
-```shell
-sudo systemctl start docker
+curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/install_docker.sh;
+sudo bash install_docker.sh
 ```
+If the Docker installation fails, please refer to [Docker Installation](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html)

 ### Pulling and launching images
-
 Use the following command to pull and launch the Docker image for the FunASR runtime-SDK:
-
 ```shell
 sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-en-cpu-0.1.1

-sudo docker run -p 10095:10095 -it --privileged=true -v /root:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-en-cpu-0.1.1
+sudo docker run -p 10097:10095 -it --privileged=true -v /root:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-en-cpu-0.1.1
 ```
-
 Introduction to command parameters: 
 ```text
-p <host port>:<mapped docker port>: In the example, host machine (ECS) port 10095 is mapped to port 10095 in the Docker container. Make sure that port 10095 is open in the ECS security rules.
+-p <host port>:<mapped docker port>: In the example, host machine (ECS) port 10097 is mapped to port 10095 in the Docker container. Make sure that port 10097 is open in the ECS security rules.

 -v <host path>:<mounted Docker path>: In the example, the host machine path /root is mounted to the Docker path /workspace/models.

 ```

-
-## Starting the server
-
+### Starting the server
 Use the following script to start the server:
 ```shell
 nohup bash run_server.sh \
@ -61,11 +43,9 @@ nohup bash run_server.sh \
  --punc-dir damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx > log.out 2>&1 &

 # To disable SSL, add: --certfile 0
-
 ```

 ### More details about the script run_server.sh:
-
 The funasr-wss-server supports downloading models from Modelscope. You can set the model download address (--download-model-dir, default is /workspace/models) and the model ID (--model-dir, --vad-dir, --punc-dir). Here is an example:

 ```shell
@ -83,7 +63,6 @@ nohup bash run_server.sh \
 ```

 Introduction to run_server.sh parameters: 
-
 ```text
 --download-model-dir: Model download address, download models from Modelscope by setting the model ID.
 --model-dir: Modelscope model ID.
--- a/runtime/docs/SDK_advanced_guide_offline_en_zh.md
+++ b/runtime/docs/SDK_advanced_guide_offline_en_zh.md
@ -4,6 +4,12 @@ FunASR提供可一键本地或者云端服务器部署的英文离线文件转

 This document is the development guide for the FunASR offline file transcription service. For a quick trial of the service, see [Quick Start](#快速上手).

+| Time       | Details                                 | Image version                   | Image ID     |
+|------------|-----------------------------------------|---------------------------------|--------------|
+| 2023.11.08 | Adaptation to runtime structure changes | funasr-runtime-sdk-en-cpu-0.1.1 | 27017f70f72a |
+| 2023.10.16 | 1.0 released                            | funasr-runtime-sdk-en-cpu-0.1.0 | e0de03eb0163 |
+
+
 ## Server configuration

 Choose a server configuration that suits your workload. The recommended configurations are:
@ -17,7 +23,6 @@ FunASR提供可一键本地或者云端服务器部署的英文离线文件转


 ## 快速上手
-
 ### Docker installation
 If Docker is already installed, skip this step.
 Install Docker on the server with the following commands:
@ -25,20 +30,18 @@ FunASR提供可一键本地或者云端服务器部署的英文离线文件转
 curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/install_docker.sh
 sudo bash install_docker.sh
 ```
+If the Docker installation fails, see [Docker Installation](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html)

 ### Pulling and launching the image
-
 Use the following commands to pull and launch the Docker image for the FunASR runtime-SDK:
-
 ```shell
 sudo docker pull \
  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-en-cpu-0.1.1
 mkdir -p ./funasr-runtime-resources/models
-sudo docker run -p 10095:10095 -it --privileged=true \
+sudo docker run -p 10097:10095 -it --privileged=true \
  -v $PWD/funasr-runtime-resources/models:/workspace/models \
  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-en-cpu-0.1.1
 ```
-If Docker is not installed, see [Docker installation](#Docker安装)

 ### Starting the server

@ -67,33 +70,6 @@ python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --au
 ```

 ------------------
-## Docker installation
-
-The following steps install the Docker environment manually:
-
-### Docker environment installation
-```shell
-# Ubuntu：
-curl -fsSL https://test.docker.com -o test-docker.sh 
-sudo sh test-docker.sh 
-# Debian：
-curl -fsSL https://get.docker.com -o get-docker.sh 
-sudo sh get-docker.sh 
-# CentOS：
-curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun 
-# MacOS：
-brew install --cask --appdir=/Applications docker
-```
-
-For installation details, see: https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html
-
-### Starting Docker
-
-```shell
-sudo systemctl start docker
-```
-
-
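 ## Client usage in detail

 After the FunASR service has been deployed on the server, you can test and use the offline file transcription service with the following steps.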
@ -155,8 +131,6 @@ FunasrWsClient --host localhost --port 10095 --audio_in ./asr_example.wav --mode
 ```
 For details, see the documentation ([click here](../java/readme.md))

-
-
 ## 服务端用法详解：

 ### Starting the FunASR service
@ -212,7 +186,6 @@ kill -9 PID
    --certfile 0
 ```

-
 After the above command is executed, the English offline file transcription service starts. If a model is specified by its ModelScope model ID, the following models are downloaded from ModelScope automatically:
 [FSMN-VAD model](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx/summary),
 [Paraformer-large model](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-en-16k-common-vocab10020-onnx/summary),
@ -220,7 +193,6 @@ kill -9 PID

 If you wish to deploy your fine-tuned model (e.g., 10epoch.pb), you need to manually rename the model to model.pb and replace the original model.pb in ModelScope. Then, specify the path as `model_dir`.

-
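 ## How to customize service deployment

 The code for FunASR-runtime is open source. If the server and client cannot fully meet your needs, you can develop them further based on your own requirements: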
@ -236,9 +208,6 @@ https://github.com/alibaba-damo-academy/FunASR/tree/main/runtime/python/websocke

 If you want to define your own client, see the [Websocket communication protocol](./websocket_protocol_zh.md)

-
-```
-
 ### C++ server:

 #### VAD
--- a/runtime/docs/SDK_advanced_guide_offline_zh.md
+++ b/runtime/docs/SDK_advanced_guide_offline_zh.md
@ -4,6 +4,15 @@ FunASR提供可一键本地或者云端服务器部署的中文离线文件转

 This document is the development guide for the FunASR offline file transcription service. For a quick trial of the service, see [Quick Start](#快速上手).

+<img src="images/offline_structure.jpg"  width="900"/>
+
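+| Time       | Details                                                                                                                              | Image version                | Image ID     |
+|------------|--------------------------------------------------------------------------------------------------------------------------------------|------------------------------|--------------|
+| 2023.11.08 | Supports the punc-large model, Ngram model, fst hotwords, server-side loading of hotwords; adaptation to runtime structure changes  | funasr-runtime-sdk-cpu-0.3.0 | caa64bddbb43 |
+| 2023.09.19 | Supports the ITN model                                                                                                               | funasr-runtime-sdk-cpu-0.2.2 | 2c5286be13e9 |
+| 2023.08.22 | Integrated ffmpeg to support various audio and video inputs; supports the hotword model and timestamp model                         | funasr-runtime-sdk-cpu-0.2.0 | 1ad3d19e0707 |
+| 2023.07.03 | 1.0 released                                                                                                                         | funasr-runtime-sdk-cpu-0.1.0 | 1ad3d19e0707 |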
+
 ## Server configuration

 Choose a server configuration that suits your workload. The recommended configurations are:
@ -25,6 +34,7 @@ FunASR提供可一键本地或者云端服务器部署的中文离线文件转
 curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/install_docker.sh
 sudo bash install_docker.sh
 ```
+If the Docker installation fails, see [Docker Installation](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html)

 ### Pulling and launching the image

@ -38,7 +48,6 @@ sudo docker run -p 10095:10095 -it --privileged=true \
  -v $PWD/funasr-runtime-resources/models:/workspace/models \
  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.3.0
 ```
-If Docker is not installed, see [Docker installation](#Docker安装)

 ### Starting the server

@ -51,15 +60,20 @@ nohup bash run_server.sh \
  --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx  \
  --punc-dir damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
  --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
-  --itn-dir thuduj12/fst_itn_zh > log.out 2>&1 &
+  --itn-dir thuduj12/fst_itn_zh \
+  --hotword /workspace/models/hotwords.txt > log.out 2>&1 &

 # To disable SSL, add: --certfile 0
 # To deploy the timestamp or nn hotword model, set --model-dir to the corresponding model:
-# damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx (timestamp)
-# or damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-onnx (hotword)
-
+#   damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx (timestamp)
+#   damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-onnx (nn hotword)
+# To load hotwords on the server side, configure them in the host file ./funasr-runtime-resources/models/hotwords.txt (mapped inside Docker to /workspace/models/hotwords.txt):
+#   One hotword per line, format (hotword weight): 阿里巴巴 20
 ```
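+To customize the ngram model, see the documentation ([How to train an LM](./lm_train_tutorial.md))
+
 For a detailed description of the server parameters, see [Server usage in detail](#服务端用法详解)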
+
 ### Testing and using the client

 Download the client test tool directory, samples
@ -71,34 +85,6 @@ wget https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_sa
 python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav"
 ```

------------------
-## Docker installation
-
-The following steps install the Docker environment manually:
-
-### Docker environment installation
-```shell
-# Ubuntu：
-curl -fsSL https://test.docker.com -o test-docker.sh 
-sudo sh test-docker.sh 
-# Debian：
-curl -fsSL https://get.docker.com -o get-docker.sh 
-sudo sh get-docker.sh 
-# CentOS：
-curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun 
-# MacOS：
-brew install --cask --appdir=/Applications docker
-```
-
-For installation details, see: https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html
-
-### Starting Docker
-
-```shell
-sudo systemctl start docker
-```
-
-
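 ## Client usage in detail

 After the FunASR service has been deployed on the server, you can test and use the offline file transcription service with the following steps.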
@ -137,7 +123,6 @@ python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline \
 ```

 Command parameters:
-
 ```text
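 --server-ip: IP address of the machine where the FunASR runtime-SDK service is deployed; defaults to the local IP (127.0.0.1). If the client and the service
             are not on the same server, change it to the IP of the deployment machine.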
@ -148,13 +133,11 @@ python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline \
 ```

 ### HTML web page
-
 Open html/static/index.html in a browser to display the page below, which supports microphone input and file upload for a direct trial.

 <img src="images/html.png"  width="900"/>

 ### Java-client
-
 ```shell
 FunasrWsClient --host localhost --port 10095 --audio_in ./asr_example.wav --mode offline
 ```
@ -228,6 +211,7 @@ kill -9 PID

 If you wish to deploy your fine-tuned model (e.g., 10epoch.pb), you need to manually rename the model to model.pb and replace the original model.pb in ModelScope. Then, specify the path as `model_dir`.

+------------------

 ## How to customize service deployment

@ -244,9 +228,6 @@ https://github.com/alibaba-damo-academy/FunASR/tree/main/runtime/python/websocke

 If you want to define your own client, see the [Websocket communication protocol](./websocket_protocol_zh.md)

-
-```
-
 ### C++ server:

 #### VAD
--- a/runtime/docs/SDK_advanced_guide_online.md
+++ b/runtime/docs/SDK_advanced_guide_online.md
@ -2,18 +2,32 @@

 FunASR provides a real-time speech transcription service that can be easily deployed on local or cloud servers, with the FunASR runtime-SDK as the core. It integrates the speech endpoint detection (VAD), Paraformer-large non-streaming speech recognition (ASR), Paraformer-large streaming speech recognition (ASR), punctuation (PUNC), and other related capabilities open-sourced by the speech laboratory of DAMO Academy on the Modelscope community. The software package can perform real-time speech-to-text transcription, and can also accurately transcribe text at the end of sentences for high-precision output. The output text contains punctuation and supports high-concurrency multi-channel requests.

+<img src="images/online_structure.png"  width="900"/>
+
+| TIME       | INFO                                                                                | IMAGE VERSION                       | IMAGE ID     |
+|------------|-------------------------------------------------------------------------------------|-------------------------------------|--------------|
+| 2023.11.08 | supporting server-side loading of hotwords, adaptation to runtime structure changes | funasr-runtime-sdk-online-cpu-0.1.4 | 691974017c38 |
+| 2023.09.19 | supporting hotwords, timestamps, and ITN model in 2pass mode                        | funasr-runtime-sdk-online-cpu-0.1.2 | 7222c5319bcf |
+| 2023.08.11 | addressing some known bugs (including server crashes)                               | funasr-runtime-sdk-online-cpu-0.1.1 | bdbdd0b27dee |
+| 2023.08.07 | 1.0 released                                                                        | funasr-runtime-sdk-online-cpu-0.1.0 | bdbdd0b27dee |
+
 ## Quick Start
-### Pull Docker Image
-
-Use the following command to pull and start the FunASR software package docker image:
-
+### Docker install
+If you have already installed Docker, ignore this step!
 ```shell
-sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.3
-mkdir -p ./funasr-runtime-resources/models
-sudo docker run -p 10095:10095 -it --privileged=true -v $PWD/funasr-runtime-resources/models:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.3
+curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/install_docker.sh;
+sudo bash install_docker.sh
 ```
 If the Docker installation fails, please refer to [Docker Installation](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html)

+### Pull Docker Image
+Use the following command to pull and start the FunASR software package docker image:
+```shell
+sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.4
+mkdir -p ./funasr-runtime-resources/models
+sudo docker run -p 10096:10095 -it --privileged=true -v $PWD/funasr-runtime-resources/models:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.4
+```
+
 ### Launching the Server

 After Docker is launched, start the funasr-wss-server-2pass service program:
--- a/runtime/docs/SDK_advanced_guide_online_zh.md
+++ b/runtime/docs/SDK_advanced_guide_online_zh.md
@ -5,29 +5,38 @@ FunASR提供可便捷本地或者云端服务器部署的实时语音听写服

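 This document is the development guide for the FunASR real-time transcription service. For a quick trial of the real-time speech dictation service, see [Quick Start](#快速上手).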

+<img src="images/online_structure.png"  width="900"/>
+
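+| Time       | Details                                                                                                                  | Image version                       | Image ID     |
+|:-----------|:-------------------------------------------------------------------------------------------------------------------------|-------------------------------------|--------------|
+| 2023.11.08 | Supports server-side loading of hotwords (updated hotword communication protocol); adaptation to runtime structure changes | funasr-runtime-sdk-online-cpu-0.1.4 | 691974017c38 |
+| 2023.09.19 | 2pass mode supports hotwords, timestamps, and the ITN model                                                              | funasr-runtime-sdk-online-cpu-0.1.2 | 7222c5319bcf |
+| 2023.08.11 | Fixed some known bugs (including server crashes)                                                                         | funasr-runtime-sdk-online-cpu-0.1.1 | bdbdd0b27dee |
+| 2023.08.07 | 1.0 released                                                                                                             | funasr-runtime-sdk-online-cpu-0.1.0 | bdbdd0b27dee |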
+
+
 ## 快速上手

 ### Docker installation
 If Docker is already installed, skip this step.
 Install Docker on the server with the following commands:
 ```shell
-curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/install_docker.sh；
+curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/install_docker.sh
 sudo bash install_docker.sh
 ```
+If the Docker installation fails, see [Docker Installation](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker.html)

 ### Pulling and launching the image
-
 Use the following commands to pull and launch the Docker image for the FunASR software package:

 ```shell
 sudo docker pull \
-  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.3
+  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.4
 mkdir -p ./funasr-runtime-resources/models
-sudo docker run -p 10095:10095 -it --privileged=true \
+sudo docker run -p 10096:10095 -it --privileged=true \
  -v $PWD/funasr-runtime-resources/models:/workspace/models \
-  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.3
+  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.4
 ```
-If Docker is not installed, see [Docker installation](https://alibaba-damo-academy.github.io/FunASR/en/installation/docker_zh.html)

 ### Starting the server

@ -40,12 +49,15 @@ nohup bash run_server_2pass.sh \
  --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx  \
  --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx  \
  --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
-  --itn-dir thuduj12/fst_itn_zh  > log.out 2>&1 &
+  --itn-dir thuduj12/fst_itn_zh \
+  --hotword /workspace/models/hotwords.txt > log.out 2>&1 &

 # To disable SSL, add: --certfile 0
-# To deploy the timestamp or hotword model, set --model-dir to the corresponding model:
-# damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx (timestamp)
-# or damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-onnx (hotword)
+# To deploy the timestamp or nn hotword model, set --model-dir to the corresponding model:
+#   damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx (timestamp)
+#   damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-onnx (nn hotword)
+# To load hotwords on the server side, configure them in the host file ./funasr-runtime-resources/models/hotwords.txt (mapped inside Docker to /workspace/models/hotwords.txt):
+#   One hotword per line, format (hotword weight): 阿里巴巴 20
 ```
 For a detailed description of the server parameters, see [Server usage in detail](#服务端用法详解)
 ### Testing and using the client
--- a/runtime/docs/images/offline_structure.jpg
+++ b/runtime/docs/images/offline_structure.jpg
--- a/runtime/docs/images/online_structure.png
+++ b/runtime/docs/images/online_structure.png
--- a/runtime/docs/images/sdk_roadmap.jpg
+++ b/runtime/docs/images/sdk_roadmap.jpg
--- a/runtime/html5/static/index.html
+++ b/runtime/html5/static/index.html
@ -52,7 +52,7 @@
 				</div>
 				<br>
 		        <div  style="border:2px solid #ccc;">
-					热词设置(一行一个关键字，空格隔开权重,如"阿里巴巴 20 hello world 40")：
+					热词设置(一行一个关键字，空格隔开权重,如"阿里巴巴 20")：
 					<br>
 	
 	
--- a/runtime/html5/static/main.js
+++ b/runtime/html5/static/main.js
@ -262,7 +262,7 @@ function getHotwords(){
 	var obj = document.getElementById("varHot");

 	if(typeof(obj) == 'undefined' || obj==null || obj.value.length<=0){
-	  return "";
+	  return null;
 	}
 	let val = obj.value.toString();
  
@ -279,11 +279,11 @@ function getHotwords(){
 			for(var i=0;i<result.length-1;i++)
 				wordstr=wordstr+result[i]+" ";
  
-			jsonresult[wordstr.trim()]=result[result.length-1];
+			jsonresult[wordstr.trim()]= parseInt(result[result.length-1]);
 		}
 	}
 	console.log("jsonresult="+JSON.stringify(jsonresult));
-	return jsonresult;
+	return  JSON.stringify(jsonresult);

 }
 function getAsrMode(){
--- a/runtime/html5/static/wsconnecter.js
+++ b/runtime/html5/static/wsconnecter.js
@ -85,12 +85,13 @@ function WebSocketConnectMethod( config ) { //定义socket连接方法类
 		}
 		
 		var hotwords=getHotwords();
-		if(hotwords.length>0)
+ 
+		if(hotwords!=null  )
 		{
 			request.hotwords=hotwords;
 		}
-		console.log(request);
-		speechSokt.send( JSON.stringify(request) );
+		console.log(JSON.stringify(request));
+		speechSokt.send(JSON.stringify(request));
 		console.log("连接成功");
 		stateHandle(0);
 
--- a/runtime/python/onnxruntime/funasr_onnx/paraformer_bin.py
+++ b/runtime/python/onnxruntime/funasr_onnx/paraformer_bin.py
@ -36,7 +36,6 @@ class Paraformer():
                 intra_op_num_threads: int = 4,
                 cache_dir: str = None
                 ):
-
        if not Path(model_dir).exists():
            try:
                from modelscope.hub.snapshot_download import snapshot_download
@ -241,6 +240,13 @@ class ContextualParaformer(Paraformer):
                 ):

        if not Path(model_dir).exists():
+            try:
+                from modelscope.hub.snapshot_download import snapshot_download
+            except ImportError:
+                # Raising a plain string is a TypeError in Python 3, so raise a real exception.
+                raise ImportError("You are exporting model from modelscope, please install modelscope and try it again. To install modelscope, you could:\n"
+                                  "\npip3 install -U modelscope\n"
+                                  "For the users in China, you could install with the command:\n"
+                                  "\npip3 install -U modelscope -i https://mirror.sjtu.edu.cn/pypi/web/simple")
            try:
                model_dir = snapshot_download(model_dir, cache_dir=cache_dir)
            except:
--- a/runtime/python/websocket/README.md
+++ b/runtime/python/websocket/README.md
@ -110,15 +110,15 @@ python funasr_wss_client.py --host "0.0.0.0" --port 10095 --mode 2pass --chunk_s

 #### Websocket api
 ```python
-    # class Funasr_websocket_recognizer example with 3 step
-    # 1.create an recognizer 
-    rcg=Funasr_websocket_recognizer(host="127.0.0.1",port="30035",is_ssl=True,mode="2pass")
-    # 2.send pcm data to asr engine and get asr result
-    text=rcg.feed_chunk(data)
-    print("text",text)
-    # 3.get last result, set timeout=3
-    text=rcg.close(timeout=3)
-    print("text",text)
+# class Funasr_websocket_recognizer example with 3 step
+# 1.create an recognizer 
+rcg=Funasr_websocket_recognizer(host="127.0.0.1",port="30035",is_ssl=True,mode="2pass")
+# 2.send pcm data to asr engine and get asr result
+text=rcg.feed_chunk(data)
+print("text",text)
+# 3.get last result, set timeout=3
+text=rcg.close(timeout=3)
+print("text",text)
 ```

 ## Acknowledge
--- a/runtime/python/websocket/funasr_wss_client.py
+++ b/runtime/python/websocket/funasr_wss_client.py
@ -27,20 +27,16 @@ parser.add_argument("--port",
                    help="grpc server port")
 parser.add_argument("--chunk_size",
                    type=str,
-                    default="0, 10, 5",
+                    default="5, 10, 5",
                    help="chunk")
-parser.add_argument("--encoder_chunk_look_back",
-                    type=int,
-                    default=4,
-                    help="number of chunks to lookback for encoder self-attention")
-parser.add_argument("--decoder_chunk_look_back",
-                    type=int,
-                    default=1,
-                    help="number of encoder chunks to lookback for decoder cross-attention")
 parser.add_argument("--chunk_interval",
                    type=int,
                    default=10,
                    help="chunk")
+parser.add_argument("--hotword",
+                    type=str,
+                    default="",
+                    help="hotword file path, one hotword per line (e.g.: 阿里巴巴 20)")
 parser.add_argument("--audio_in",
                    type=str,
                    default=None,
@ -61,11 +57,14 @@ parser.add_argument("--output_dir",
                    type=str,
                    default=None,
                    help="output_dir")
-
 parser.add_argument("--ssl",
                    type=int,
                    default=1,
                    help="1 for ssl connect, 0 for no ssl")
+parser.add_argument("--use_itn",
+                    type=int,
+                    default=1,
+                    help="1 for using itn, 0 for not itn")
 parser.add_argument("--mode",
                    type=str,
                    default="2pass",
@ -106,10 +105,29 @@ async def record_microphone():
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
+    # hotwords
+    fst_dict = {}
+    hotword_msg = ""
+    if args.hotword.strip() != "":
+        # Read the hotword file and close it once the lines are loaded.
+        with open(args.hotword) as f_scp:
+            hot_lines = f_scp.readlines()
+        for line in hot_lines:
+            words = line.strip().split(" ")
+            if len(words) < 2:
+                print("Please check the format of hotwords")
+                continue
+            try:
+                # The last field is the weight; the rest is the hotword.
+                fst_dict[" ".join(words[:-1])] = int(words[-1])
+            except ValueError:
+                print("Please check the format of hotwords")
+        hotword_msg = json.dumps(fst_dict)

-    message = json.dumps({"mode": args.mode, "chunk_size": args.chunk_size, "encoder_chunk_look_back": args.encoder_chunk_look_back,
-                          "decoder_chunk_look_back": args.decoder_chunk_look_back, "chunk_interval": args.chunk_interval, 
-                          "wav_name": "microphone", "is_speaking": True})
+    use_itn=True
+    if args.use_itn == 0:
+        use_itn=False
+    
+    message = json.dumps({"mode": args.mode, "chunk_size": args.chunk_size, "chunk_interval": args.chunk_interval,
+                          "wav_name": "microphone", "is_speaking": True, "hotwords":hotword_msg, "itn": use_itn})
    #voices.put(message)
    await websocket.send(message)
    while True:
@ -127,6 +145,31 @@ async def record_from_scp(chunk_begin, chunk_size):
        wavs = f_scp.readlines()
    else:
        wavs = [args.audio_in]
+
+    # hotwords
+    fst_dict = {}
+    hotword_msg = ""
+    if args.hotword.strip() != "":
+        # Read the hotword file and close it once the lines are loaded.
+        with open(args.hotword) as f_scp:
+            hot_lines = f_scp.readlines()
+        for line in hot_lines:
+            words = line.strip().split(" ")
+            if len(words) < 2:
+                print("Please check the format of hotwords")
+                continue
+            try:
+                # The last field is the weight; the rest is the hotword.
+                fst_dict[" ".join(words[:-1])] = int(words[-1])
+            except ValueError:
+                print("Please check the format of hotwords")
+        hotword_msg = json.dumps(fst_dict)
+        print(hotword_msg)
+
+    sample_rate = 16000
+    wav_format = "pcm"
+    use_itn=True
+    if args.use_itn == 0:
+        use_itn=False
+     
    if chunk_size > 0:
        wavs = wavs[chunk_begin:chunk_begin + chunk_size]
    for wav in wavs:
@ -143,20 +186,13 @@ async def record_from_scp(chunk_begin, chunk_size):
            import wave
            with wave.open(wav_path, "rb") as wav_file:
                params = wav_file.getparams()
+                sample_rate = wav_file.getframerate()
                frames = wav_file.readframes(wav_file.getnframes())
                audio_bytes = bytes(frames)
        else:
-            import ffmpeg
-            try:
-                # This launches a subprocess to decode audio while down-mixing and resampling as necessary.
-                # Requires the ffmpeg CLI and `ffmpeg-python` package to be installed.
-                audio_bytes, _ = (
-                    ffmpeg.input(wav_path, threads=0)
-                    .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=16000)
-                    .run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True)
-                )
-            except ffmpeg.Error as e:
-                raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
+            wav_format = "others"
+            with open(wav_path, "rb") as f:
+                audio_bytes = f.read()

        # stride = int(args.chunk_size/1000*16000*2)
        stride = int(60 * args.chunk_size[1] / args.chunk_interval / 1000 * 16000 * 2)
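A worked example of the stride expression above, as a sketch. The function name and its defaults are hypothetical. Reading `60` as milliseconds per chunk unit is an assumption inferred from the `/ 1000` factor, and the sample values `[5, 10, 5]` and `10` are illustrative only. The expression in the client hardcodes 16000 Hz and 2 bytes per sample, while this commit now also reads `sample_rate` from the WAV header; whether the two are meant to agree is not stated in the diff.

```python
def chunk_stride_bytes(chunk_size, chunk_interval,
                       sample_rate=16000, bytes_per_sample=2):
    """Bytes of audio sent per websocket message.

    Mirrors the order of operations in the client expression:
    60 * chunk_size[1] / chunk_interval / 1000 * 16000 * 2
    """
    # Duration of one message in milliseconds (assumed unit).
    chunk_ms = 60 * chunk_size[1] / chunk_interval
    return int(chunk_ms / 1000 * sample_rate * bytes_per_sample)


# 60 * 10 / 10 = 60 ms per message; 0.06 s * 16000 samples/s * 2 bytes.
stride = chunk_stride_bytes([5, 10, 5], 10)
print(stride)
```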
@ -164,8 +200,9 @@ async def record_from_scp(chunk_begin, chunk_size):
        # print(stride)

        # send first time
-        message = json.dumps({"mode": args.mode, "chunk_size": args.chunk_size, "chunk_interval": args.chunk_interval,
-                              "wav_name": wav_name, "is_speaking": True})
+        message = json.dumps({"mode": args.mode, "chunk_size": args.chunk_size, "chunk_interval": args.chunk_interval, "audio_fs":sample_rate,
+                          "wav_name": wav_name, "wav_format": wav_format, "is_speaking": True, "hotwords":hotword_msg, "itn": use_itn})
+
        #voices.put(message)
        await websocket.send(message)
        is_speaking = True
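The first message sent for each file now carries the sample rate, the container format, the hotwords and the ITN switch. A minimal sketch of how that payload is assembled follows; the helper name `build_start_message` and the sample arguments are hypothetical, while the field names are taken from the hunk above. The `hotwords` field is itself a JSON-encoded string (or an empty string when no hotword file is given), so a receiver has to decode it a second time.

```python
import json


def build_start_message(mode, chunk_size, chunk_interval, wav_name,
                        hotwords=None, use_itn_flag=1,
                        sample_rate=16000, wav_format="pcm"):
    """Build the initial JSON message for one audio stream."""
    return json.dumps({
        "mode": mode,
        "chunk_size": chunk_size,
        "chunk_interval": chunk_interval,
        "audio_fs": sample_rate,
        "wav_name": wav_name,
        "wav_format": wav_format,
        "is_speaking": True,
        # Nested JSON string, matching hotword_msg in the client.
        "hotwords": "" if hotwords is None else json.dumps(hotwords),
        # --use_itn is an int flag; the protocol field is a bool.
        "itn": use_itn_flag != 0,
    })


message = build_start_message("offline", [5, 10, 5], 10, "demo",
                              hotwords={"alibaba": 20})
decoded = json.loads(message)
```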
@ -213,12 +250,17 @@ async def message(id):
        
            meg = await websocket.recv()
            meg = json.loads(meg)
-            # print(meg)
            wav_name = meg.get("wav_name", "demo")
            text = meg["text"]
+            timestamp=""
+            if "timestamp" in meg:
+                timestamp = meg["timestamp"]

            if ibest_writer is not None:
-                text_write_line = "{}\t{}\n".format(wav_name, text)
+                if timestamp !="":
+                    text_write_line = "{}\t{}\t{}\n".format(wav_name, text, timestamp)
+                else:
+                    text_write_line = "{}\t{}\n".format(wav_name, text)
                ibest_writer.write(text_write_line)
                
            if meg["mode"] == "online":
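The result line written to `ibest_writer` gains a third tab-separated column when the server returns a `timestamp` field. A minimal sketch of that formatting is shown below; the function name `format_result_line` and the sample messages are hypothetical, and the field names come from the hunk above.

```python
def format_result_line(meg):
    """Format one decoded server message as a tab-separated line."""
    wav_name = meg.get("wav_name", "demo")
    text = meg["text"]
    # The timestamp is optional; older servers may omit it.
    timestamp = meg.get("timestamp", "")
    if timestamp != "":
        return "{}\t{}\t{}\n".format(wav_name, text, timestamp)
    return "{}\t{}\n".format(wav_name, text)


with_ts = format_result_line(
    {"wav_name": "a.wav", "text": "hello", "timestamp": "[[0,120]]"})
without_ts = format_result_line({"text": "hello"})
```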
@ -227,15 +269,15 @@ async def message(id):
                os.system('clear')
                print("\rpid" + str(id) + ": " + text_print)
            elif meg["mode"] == "offline":
-                text_print += "{}".format(text)
+                if timestamp !="":
+                    text_print += "{} timestamp: {}".format(text, timestamp)
+                else:
+                    text_print += "{}".format(text)
+
                # text_print = text_print[-args.words_max_print:]
                # os.system('clear')
                print("\rpid" + str(id) + ": " + wav_name + ": " + text_print)
-                if ("is_final" in meg and meg["is_final"]==False):
-                    offline_msg_done = True
-                
-                if not "is_final" in meg:
-                    offline_msg_done = True
+                offline_msg_done = True
            else:
                if meg["mode"] == "2pass-online":
                    text_print_2pass_online += "{}".format(text)
--- a/runtime/readme.md
+++ b/runtime/readme.md
@ -17,7 +17,7 @@ Currently, the FunASR runtime-SDK supports the deployment of file transcription
 To meet the needs of different users, we have prepared different tutorials with text and images for both novice and advanced developers.

 ### Whats-new
- 2023/11/08: Adaptation to runtime structure changes (FunASR/funasr/runtime -> FunASR/runtime), docker image version funasr-runtime-sdk-en-cpu-0.1.1 ().
+- 2023/11/08: Adaptation to runtime structure changes (FunASR/funasr/runtime -> FunASR/runtime), docker image version funasr-runtime-sdk-en-cpu-0.1.1 (27017f70f72a).
 - 2023/10/16: English File Transcription Service 1.0 released, docker image version funasr-runtime-sdk-en-cpu-0.1.0 (e0de03eb0163), refer to the detailed documentation（[here](https://mp.weixin.qq.com/s/DZZUTj-6xwFfi-96ml--4A)）

 ### Technical Principles
@ -39,7 +39,7 @@ The FunASR real-time speech-to-text service software package not only performs r
 In order to meet the needs of different users for different scenarios, different tutorials are prepared:

 ### Whats-new
- 2023/11/08: Real-time Transcription Service 1.4 released, supporting server-side loading of hotwords (updated hotword communication protocol), adaptation to runtime structure changes (FunASR/funasr/runtime -> FunASR/runtime), docker image version funasr-runtime-sdk-online-cpu-0.1.4().
+- 2023/11/08: Real-time Transcription Service 1.4 released, supporting server-side loading of hotwords (updated hotword communication protocol), adaptation to runtime structure changes (FunASR/funasr/runtime -> FunASR/runtime), docker image version funasr-runtime-sdk-online-cpu-0.1.4(691974017c38).
 - 2023/09/19: Real-time Transcription Service 1.2 released, supporting hotwords, timestamps, and ITN model in 2pass mode, docker image version funasr-runtime-sdk-online-cpu-0.1.2 (7222c5319bcf).
 - 2023/08/11: Real-time Transcription Service 1.1 released, addressing some known bugs (including server crashes), docker image version funasr-runtime-sdk-online-cpu-0.1.1 (bdbdd0b27dee).
 - 2023/08/07: Real-time Transcription Service 1.0 released, docker image version funasr-runtime-sdk-online-cpu-0.1.0(bdbdd0b27dee), refer to the detailed documentation（[here](https://mp.weixin.qq.com/s/8He081-FM-9IEI4D-lxZ9w)）
@ -65,7 +65,7 @@ Currently, the FunASR runtime-SDK supports the deployment of file transcription
 To meet the needs of different users, we have prepared different tutorials with text and images for both novice and advanced developers.

 ### Whats-new
-2023/11/08: File Transcription Service 3.0 released, supporting punctuation large model, Ngram model, fst hotwords (updated hotword communication protocol), server-side loading of hotwords, adaptation to runtime structure changes (FunASR/funasr/runtime -> FunASR/runtime), docker image version funasr-runtime-sdk-cpu-0.3.0 (), refer to the detailed documentation （[here]()）
+2023/11/08: File Transcription Service 3.0 released, supporting punctuation large model, Ngram model, fst hotwords (updated hotword communication protocol), server-side loading of hotwords, adaptation to runtime structure changes (FunASR/funasr/runtime -> FunASR/runtime), docker image version funasr-runtime-sdk-cpu-0.3.0 (caa64bddbb43), refer to the detailed documentation （[here]()）
 2023/09/19: File Transcription Service 2.2 released, supporting ITN model, docker image version funasr-runtime-sdk-cpu-0.2.2 (2c5286be13e9).
 2023/08/22: File Transcription Service 2.0 released, integrated ffmpeg to support various audio and video inputs, supporting hotword model and timestamp model, docker image version funasr-runtime-sdk-cpu-0.2.0 (1ad3d19e0707), refer to the detailed documentation （[here](https://mp.weixin.qq.com/s/oJHe0MKDqTeuIFH-F7GHMg)）
 2023/07/03: File Transcription Service 1.0 released, docker image version funasr-runtime-sdk-cpu-0.1.0 (1ad3d19e0707), refer to the detailed documentation （[here](https://mp.weixin.qq.com/s/DHQwbgdBWcda0w_L60iUww)）
--- a/runtime/readme_cn.md
+++ b/runtime/readme_cn.md
@ -2,8 +2,10 @@

 English Version（[docs](./readme.md)）

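-FunASR is a fundamental speech recognition framework open-sourced by the Speech Lab of DAMO Academy. It integrates industrial-grade models for voice activity detection, speech recognition, punctuation restoration, and related tasks, and has attracted many developers to try it out and build on it. To cover the last mile of industrial deployment and integrate the models into business applications, we developed the FunASR runtime-SDK.
-The SDK supports deployment of the following services:
+FunASR is a fundamental speech recognition framework open-sourced by the speech team of Alibaba Tongyi Lab. It integrates industrial-grade models for voice activity detection, speech recognition, punctuation restoration, and related tasks, and has attracted many developers to try it out and build on it. To cover the last mile of industrial deployment and integrate the models into business applications, we developed the community software package.
+Deployment of the following services is supported: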
+
+<img src="docs/images/sdk_roadmap.jpg"  width="900"/>

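 - Chinese offline file transcription service (CPU version), done
 - Chinese streaming speech recognition service (CPU version), done
@ -17,20 +19,13 @@ The SDK supports deployment of the following services:
 To meet the needs of different users, illustrated tutorials have been prepared for different scenarios:

 ### What's new
- 2023/11/08:   English offline file transcription service 1.1 released, adapted to the runtime structure change (FunASR/funasr/runtime -> FunASR/runtime), docker image version funasr-runtime-sdk-en-cpu-0.1.1 ()
- 2023/10/16:   English offline file transcription service 1.0 released, docker image version funasr-runtime-sdk-en-cpu-0.1.0 (e0de03eb0163); for detailed documentation, see ([click here](https://mp.weixin.qq.com/s/DZZUTj-6xwFfi-96ml--4A))
+- 2023/11/08:   English offline file transcription service 1.1 released, adapted to the runtime structure change (FunASR/funasr/runtime -> FunASR/runtime), docker image version funasr-runtime-sdk-en-cpu-0.1.1 (27017f70f72a)
+- 2023/10/16:   English offline file transcription service 1.0 released, docker image version funasr-runtime-sdk-en-cpu-0.1.0 (e0de03eb0163); for the technical principles document, see ([click here](https://mp.weixin.qq.com/s/DZZUTj-6xwFfi-96ml--4A))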

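-### Quick deployment tutorial

-Applies when the service deployment SDK needs no modification and the deployed model comes from ModelScope or from user fine-tuning; for the detailed tutorial, see ([click here](./docs/SDK_tutorial_en_zh.md))
+### Deployment and development documentation

-### Development guide
-
-Applies when the service deployment SDK needs to be modified and the deployed model comes from ModelScope or from user fine-tuning; for detailed documentation, see ([click here](./docs/SDK_advanced_guide_offline_en_zh.md))
-
-### Technical principles
-
-The document covers the underlying technical principles, recognition accuracy, and computational efficiency, and introduces the core advantages: convenience, high accuracy, high efficiency, and the long-audio pipeline; for detailed documentation, see ([click here](https://mp.weixin.qq.com/s/DZZUTj-6xwFfi-96ml--4A))
+The deployed model comes from ModelScope or from user fine-tuning, and user-customized services are supported; for detailed documentation, see ([click here](./docs/SDK_advanced_guide_offline_en_zh.md))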


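 ## Chinese real-time transcription service (CPU version)
@ -38,23 +33,16 @@ The FunASR real-time transcription service software package can both perform real-time speech-to-tex
 To meet the needs of different users, illustrated tutorials have been prepared for different scenarios:

 ### What's new
- 2023/11/08:   Chinese real-time transcription service 1.4 released, supporting server-side loading of hotwords (updated hotword communication protocol), adapted to the runtime structure change (FunASR/funasr/runtime -> FunASR/runtime), docker image version funasr-runtime-sdk-online-cpu-0.1.4 ()
+- 2023/11/08:   Chinese real-time transcription service 1.4 released, supporting server-side loading of hotwords (updated hotword communication protocol), adapted to the runtime structure change (FunASR/funasr/runtime -> FunASR/runtime), docker image version funasr-runtime-sdk-online-cpu-0.1.4 (691974017c38)
 - 2023/09/19:   Chinese real-time transcription service 1.2 released, with hotwords, timestamps, and the ITN model supported in 2pass mode, docker image version funasr-runtime-sdk-online-cpu-0.1.2 (7222c5319bcf)
 - 2023/08/11:   Chinese real-time transcription service 1.1 released, fixing some known bugs (including server crashes), docker image version funasr-runtime-sdk-online-cpu-0.1.1 (bdbdd0b27dee)
- 2023/08/07:   Chinese real-time transcription service 1.0 released, docker image version funasr-runtime-sdk-online-cpu-0.1.0 (bdbdd0b27dee); for detailed documentation, see ([click here](https://mp.weixin.qq.com/s/8He081-FM-9IEI4D-lxZ9w))
-
-### Quick deployment tutorial
-
-Applies when the service deployment SDK needs no modification and the deployed model comes from ModelScope or from user fine-tuning; for the detailed tutorial, see ([click here](./docs/SDK_tutorial_online_zh.md))
+- 2023/08/07:   Chinese real-time transcription service 1.0 released, docker image version funasr-runtime-sdk-online-cpu-0.1.0 (bdbdd0b27dee); for the technical principles document, see ([click here](https://mp.weixin.qq.com/s/8He081-FM-9IEI4D-lxZ9w))


-### Development guide
+### Deployment and development documentation

-Applies when the service deployment SDK needs to be modified and the deployed model comes from ModelScope or from user fine-tuning; for detailed documentation, see ([click here](./docs/SDK_advanced_guide_online_zh.md))
+The deployed model comes from ModelScope or from user fine-tuning, and user-customized services are supported; for detailed documentation, see ([click here](./docs/SDK_advanced_guide_online_zh.md))

-### Technical principles
-
-The document covers the underlying technical principles, recognition accuracy, and computational efficiency, and introduces the core advantages: convenience, high accuracy, high efficiency, and the long-audio pipeline; for detailed documentation, see ([click here](https://mp.weixin.qq.com/s/8He081-FM-9IEI4D-lxZ9w))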


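 ## Chinese offline file transcription service (CPU version)
@ -63,20 +51,14 @@ The FunASR real-time transcription service software package can both perform real-time speech-to-tex
 To meet the needs of different users, illustrated tutorials have been prepared for different scenarios:

 ### What's new
- 2023/11/08:   Chinese offline file transcription service 3.0 released, supporting the large punctuation model, the Ngram model, fst hotwords (updated hotword communication protocol), and server-side loading of hotwords, adapted to the runtime structure change (FunASR/funasr/runtime -> FunASR/runtime), docker image version funasr-runtime-sdk-cpu-0.3.0 (); for detailed documentation, see ([click here]())
+
+- 2023/11/08:   Chinese offline file transcription service 3.0 released, supporting the large punctuation model, the Ngram model, fst hotwords (updated hotword communication protocol), and server-side loading of hotwords, adapted to the runtime structure change (FunASR/funasr/runtime -> FunASR/runtime), docker image version funasr-runtime-sdk-cpu-0.3.0 (caa64bddbb43); for the technical principles document, see ([click here](https://mp.weixin.qq.com/s/jSbnKw_m31BUUbTukPSOIw))
 - 2023/09/19:   Chinese offline file transcription service 2.2 released, supporting the ITN model, docker image version funasr-runtime-sdk-cpu-0.2.2 (2c5286be13e9)
- 2023/08/22:   Chinese offline file transcription service 2.0 released, integrating ffmpeg to support a variety of audio and video inputs, supporting the hotword model and the timestamp model, docker image version funasr-runtime-sdk-cpu-0.2.0 (1ad3d19e0707); for detailed documentation, see ([click here](https://mp.weixin.qq.com/s/oJHe0MKDqTeuIFH-F7GHMg))
- 2023/07/03:   Chinese offline file transcription service 1.0 released, docker image version funasr-runtime-sdk-cpu-0.1.0 (1ad3d19e0707); for detailed documentation, see ([click here](https://mp.weixin.qq.com/s/DHQwbgdBWcda0w_L60iUww))
+- 2023/08/22:   Chinese offline file transcription service 2.0 released, integrating ffmpeg to support a variety of audio and video inputs, supporting the hotword model and the timestamp model, docker image version funasr-runtime-sdk-cpu-0.2.0 (1ad3d19e0707); for the technical principles document, see ([click here](https://mp.weixin.qq.com/s/oJHe0MKDqTeuIFH-F7GHMg))
+- 2023/07/03:   Chinese offline file transcription service 1.0 released, docker image version funasr-runtime-sdk-cpu-0.1.0 (1ad3d19e0707); for the technical principles document, see ([click here](https://mp.weixin.qq.com/s/DHQwbgdBWcda0w_L60iUww))

-### Quick deployment tutorial
+### Deployment and development documentation

-Applies when the service deployment SDK needs no modification and the deployed model comes from ModelScope or from user fine-tuning; for the detailed tutorial, see ([click here](./docs/SDK_tutorial_zh.md))
+The deployed model comes from ModelScope or from user fine-tuning, and user-customized services are supported; for detailed documentation, see ([click here](./docs/SDK_advanced_guide_offline_zh.md))

-### Development guide
-
-Applies when the service deployment SDK needs to be modified and the deployed model comes from ModelScope or from user fine-tuning; for detailed documentation, see ([click here](./docs/SDK_advanced_guide_offline_zh.md))

-### Technical principles
-
-The document covers the underlying technical principles, recognition accuracy, and computational efficiency, and introduces the core advantages: convenience, high accuracy, high efficiency, and the long-audio pipeline; for detailed documentation, see ([click here](https://mp.weixin.qq.com/s/DHQwbgdBWcda0w_L60iUww))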
 @ -1 +0,0 @@
 ../../funasr/runtime/python/libtorch/README.md
 @ -1 +0,0 @@
 ../../funasr/runtime/python/onnxruntime/README.md
 @ -1 +0,0 @@
 ../../funasr/runtime/python/websocket/README.md
 @ -1 +1 @@
 .8.2
 .8.3