diff --git a/docs/huggingface_models.md b/docs/huggingface_models.md
index 1568dd1e0..ad367dea5 100644
--- a/docs/huggingface_models.md
+++ b/docs/huggingface_models.md
@@ -9,11 +9,9 @@ Here we provided several pretrained models on different datasets. The details of
 ### Speech Recognition Models
 #### Paraformer Models
 
-[//]: # (|                                                                     Model Name                                                                     | Language |          Training Data           | Vocab Size | Parameter | Offline/Online | Notes                                                                                                                           |)
-
-[//]: # (|:--------------------------------------------------------------------------------------------------------------------------------------------------:|:--------:|:--------------------------------:|:----------:|:---------:|:--------------:|:--------------------------------------------------------------------------------------------------------------------------------|)
-
-[//]: # (|        [Paraformer-large]&#40;https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary&#41;        | CN & EN  | Alibaba Speech Data &#40;60000hours&#41; |    8404    |   220M    |    Offline     | Duration of input wav <= 20s                                                                                                    |)
+|                               Model Name                                | Language |           Training Data            | Vocab Size | Parameter | Offline/Online | Notes                                                                                                                           |
+|:-----------------------------------------------------------------------:|:--------:|:----------------------------------:|:----------:|:---------:|:--------------:|:--------------------------------------------------------------------------------------------------------------------------------|
+| [Paraformer-large](https://huggingface.co/funasr/paraformer-large)      | CN & EN  | Alibaba Speech Data (60000hours)   |    8404    |   220M    |    Offline     | Duration of input wav <= 20s                                                                                                    |
 
 [//]: # (| [Paraformer-large-long]&#40;https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary&#41; | CN & EN  | Alibaba Speech Data &#40;60000hours&#41; |    8404    |   220M    |    Offline     | Which ould deal with arbitrary length input wav                                                                                 |)
 
@@ -77,21 +75,17 @@ Here we provided several pretrained models on different datasets. The details of
 
 ### Voice Activity Detection Models
 
-[//]: # (|                                           Model Name                                           |        Training Data         | Parameters | Sampling Rate | Notes |)
-
-[//]: # (|:----------------------------------------------------------------------------------------------:|:----------------------------:|:----------:|:-------------:|:------|)
-
-[//]: # (| [FSMN-VAD]&#40;https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary&#41; | Alibaba Speech Data &#40;5000hours&#41; |    0.4M    |     16000     |       |)
+|                      Model Name                      |        Training Data         | Parameters | Sampling Rate | Notes |
+|:----------------------------------------------------:|:----------------------------:|:----------:|:-------------:|:------|
+| [FSMN-VAD](https://huggingface.co/funasr/FSMN-VAD)   | Alibaba Speech Data (5000hours) |    0.4M    |     16000     |       |
 
 [//]: # (|   [FSMN-VAD]&#40;https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-8k-common/summary&#41;        | Alibaba Speech Data &#40;5000hours&#41; |    0.4M    |     8000      |       |)
 
 ### Punctuation Restoration Models
 
-[//]: # (|                                                         Model Name                                                         |        Training Data         | Parameters | Vocab Size| Offline/Online | Notes |)
-
-[//]: # (|:--------------------------------------------------------------------------------------------------------------------------:|:----------------------------:|:----------:|:----------:|:--------------:|:------|)
-
-[//]: # (|      [CT-Transformer]&#40;https://modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary&#41;      | Alibaba Text Data |    70M     |    272727     |    Offline     |   offline punctuation model    |)
+|                              Model Name                              |        Training Data         | Parameters | Vocab Size| Offline/Online | Notes |
+|:--------------------------------------------------------------------:|:----------------------------:|:----------:|:----------:|:--------------:|:------|
+| [CT-Transformer](https://huggingface.co/funasr/CT-Transformer-punc)  | Alibaba Text Data |    70M     |    272727     |    Offline     |   offline punctuation model    |
 
 [//]: # (| [CT-Transformer]&#40;https://modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727/summary&#41;      | Alibaba Text Data |    70M     |    272727     |     Online     |  online punctuation model     |)
 
diff --git a/docs/index.rst b/docs/index.rst
index f7afe809e..73f57fd1f 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -31,12 +31,7 @@ Overview
    ./academic_recipe/sd_recipe.md
 
 
-.. toctree::
-   :maxdepth: 1
-   :caption: Model Zoo
 
-   ./modelscope_models.md
-   ./huggingface_models.md
 
 .. toctree::
    :maxdepth: 1
@@ -56,11 +51,13 @@ Overview
 
    Undo
 
+
 .. toctree::
    :maxdepth: 1
-   :caption: Funasr Library
+   :caption: Model Zoo
 
-   ./build_task.md
+   ./modelscope_models.md
+   ./huggingface_models.md
 
 .. toctree::
    :maxdepth: 1
@@ -82,6 +79,13 @@ Overview
    ./benchmark/benchmark_onnx_cpp.md
    ./benchmark/benchmark_libtorch.md
 
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Funasr Library
+
+   ./build_task.md
+
 .. toctree::
    :maxdepth: 1
    :caption: Papers
diff --git a/docs/modelscope_models.md b/docs/modelscope_models.md
index 3538ae0d3..b000fcaea 100644
--- a/docs/modelscope_models.md
+++ b/docs/modelscope_models.md
@@ -13,7 +13,7 @@ Here we provided several pretrained models on different datasets. The details of
 |:--------------------------------------------------------------------------------------------------------------------------------------------------:|:--------:|:--------------------------------:|:----------:|:---------:|:--------------:|:--------------------------------------------------------------------------------------------------------------------------------|
 |        [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)        | CN & EN  | Alibaba Speech Data (60000hours) |    8404    |   220M    |    Offline     | Duration of input wav <= 20s                                                                                                    |
 | [Paraformer-large-long](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) | CN & EN  | Alibaba Speech Data (60000hours) |    8404    |   220M    |    Offline     | Which ould deal with arbitrary length input wav                                                                                 |
-| [paraformer-large-contextual](https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary) | CN & EN  | Alibaba Speech Data (60000hours) |    8404    |   220M    |    Offline     | Which supports the hotword customization based on the incentive enhancement, and improves the recall and precision of hotwords. |
+| [Paraformer-large-contextual](https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary) | CN & EN  | Alibaba Speech Data (60000hours) |    8404    |   220M    |    Offline     | Which supports the hotword customization based on the incentive enhancement, and improves the recall and precision of hotwords. |
 |              [Paraformer](https://modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8358-tensorflow1/summary)              | CN & EN  | Alibaba Speech Data (50000hours) |    8358    |    68M    |    Offline     | Duration of input wav <= 20s                                                                                                    |
 |          [Paraformer-online](https://modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8358-tensorflow1/summary)           | CN & EN  | Alibaba Speech Data (50000hours) |    8404    |    68M    |     Online     | Which could deal with streaming input                                                                                           |
 |       [Paraformer-tiny](https://www.modelscope.cn/models/damo/speech_paraformer-tiny-commandword_asr_nat-zh-cn-16k-vocab544-pytorch/summary)       |    CN    |  Alibaba Speech Data (200hours)  |    544     |   5.2M    |    Offline     | Lightweight Paraformer model which supports Mandarin command words recognition                                                  |
diff --git a/egs_modelscope/asr/TEMPLATE/README.md b/egs_modelscope/asr/TEMPLATE/README.md
index 19acefeb9..c64503389 100644
--- a/egs_modelscope/asr/TEMPLATE/README.md
+++ b/egs_modelscope/asr/TEMPLATE/README.md
@@ -1,7 +1,7 @@
 # Speech Recognition
 
 > **Note**: 
-> The modelscope pipeline supports all the models in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope) to inference and finetine. Here we take typic model as example to demonstrate the usage.
+> The modelscope pipeline supports inference and fine-tuning for all the models in the [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope). Here we take typical models as examples to demonstrate the usage.
 
 ## Inference
 
@@ -62,10 +62,10 @@ Undo
 ##### Define pipeline
 - `task`: `Tasks.auto_speech_recognition`
 - `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path in local disk
-- `ngpu`: `1` (Defalut), decoding on GPU. If ngpu=0, decoding on CPU
-- `ncpu`: `1` (Defalut), sets the number of threads used for intraop parallelism on CPU 
-- `output_dir`: `None` (Defalut), the output path of results if set
-- `batch_size`: `1` (Defalut), batch size when decoding
+- `ngpu`: `1` (Default), decoding on GPU. If ngpu=0, decoding on CPU
+- `ncpu`: `1` (Default), sets the number of threads used for intraop parallelism on CPU 
+- `output_dir`: `None` (Default), the output path of results if set
+- `batch_size`: `1` (Default), batch size when decoding
 ##### Infer pipeline
 - `audio_in`: the input to decode, which could be: 
   - wav_path, `e.g.`: asr_example.wav,
@@ -79,7 +79,7 @@ Undo
   ```
   In this case of `wav.scp` input, `output_dir` must be set to save the output results
 - `audio_fs`: audio sampling rate, only set when audio_in is pcm audio
-- `output_dir`: None (Defalut), the output path of results if set
+- `output_dir`: None (Default), the output path of results if set
 
 ### Inference with multi-thread CPUs or multi GPUs
 FunASR also offer recipes [infer.sh](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr/TEMPLATE/infer.sh) to decode with multi-thread CPUs, or multi GPUs.
diff --git a/egs_modelscope/vad/TEMPLATE/README.md b/egs_modelscope/vad/TEMPLATE/README.md
index df45b35e7..a4b5e795f 100644
--- a/egs_modelscope/vad/TEMPLATE/README.md
+++ b/egs_modelscope/vad/TEMPLATE/README.md
@@ -1,7 +1,7 @@
 # Voice Activity Detection
 
 > **Note**: 
-> The modelscope pipeline supports all the models in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope) to inference and finetine. Here we take model of FSMN-VAD as example to demonstrate the usage.
+> The modelscope pipeline supports inference and fine-tuning for all the models in the [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope). Here we take the FSMN-VAD model as an example to demonstrate the usage.
 
 ## Inference
 
@@ -47,10 +47,10 @@ Full code of demo, please ref to [demo](https://github.com/alibaba-damo-academy/
 ##### Define pipeline
 - `task`: `Tasks.voice_activity_detection`
 - `model`: model name in [model zoo](https://alibaba-damo-academy.github.io/FunASR/en/modelscope_models.html#pretrained-models-on-modelscope), or model path in local disk
-- `ngpu`: `1` (Defalut), decoding on GPU. If ngpu=0, decoding on CPU
-- `ncpu`: `1` (Defalut), sets the number of threads used for intraop parallelism on CPU 
-- `output_dir`: `None` (Defalut), the output path of results if set
-- `batch_size`: `1` (Defalut), batch size when decoding
+- `ngpu`: `1` (Default), decoding on GPU. If ngpu=0, decoding on CPU
+- `ncpu`: `1` (Default), sets the number of threads used for intraop parallelism on CPU 
+- `output_dir`: `None` (Default), the output path of results if set
+- `batch_size`: `1` (Default), batch size when decoding
 ##### Infer pipeline
 - `audio_in`: the input to decode, which could be: 
   - wav_path, `e.g.`: asr_example.wav,
@@ -64,7 +64,7 @@ Full code of demo, please ref to [demo](https://github.com/alibaba-damo-academy/
   ```
   In this case of `wav.scp` input, `output_dir` must be set to save the output results
 - `audio_fs`: audio sampling rate, only set when audio_in is pcm audio
-- `output_dir`: None (Defalut), the output path of results if set
+- `output_dir`: None (Default), the output path of results if set
 
 ### Inference with multi-thread CPUs or multi GPUs
 FunASR also offer recipes [infer.sh](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/vad/TEMPLATE/infer.sh) to decode with multi-thread CPUs, or multi GPUs.
diff --git a/funasr/runtime/python/onnxruntime/README.md b/funasr/runtime/python/onnxruntime/README.md
index 1f7fcaa68..ed3deb6d3 100644
--- a/funasr/runtime/python/onnxruntime/README.md
+++ b/funasr/runtime/python/onnxruntime/README.md
@@ -19,7 +19,7 @@ python -m funasr.export.export_model --model-name damo/speech_paraformer-large_a
 ```
 
 
-## Install the `funasr_onnx`
+## Install `funasr_onnx`
 
 install from pip
 ```shell
@@ -46,16 +46,22 @@ pip install -e ./
  from funasr_onnx import Paraformer
 
  model_dir = "./export/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
- model = Paraformer(model_dir, batch_size=1)
+ model = Paraformer(model_dir, batch_size=1, quantize=True)
 
  wav_path = ['./export/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/example/asr_example.wav']
 
  result = model(wav_path)
  print(result)
  ```
-- Model_dir: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
-- Input: wav formt file, support formats: `str, np.ndarray, List[str]`
-- Output: `List[str]`: recognition result
+- `model_dir`: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
+- `batch_size`: `1` (Default), the batch size during inference
+- `device_id`: `-1` (Default), infer on CPU. To infer on GPU, set it to the gpu_id (make sure that `onnxruntime-gpu` is installed)
+- `quantize`: `False` (Default), loads `model.onnx` from `model_dir`. If set to `True`, loads `model_quant.onnx` from `model_dir`
+- `intra_op_num_threads`: `4` (Default), sets the number of threads used for intraop parallelism on CPU
+
+Input: wav format file; supported input types: `str, np.ndarray, List[str]`
+
+Output: `List[str]`: recognition result
 
 #### Paraformer-online
 
@@ -71,9 +77,16 @@ model = Fsmn_vad(model_dir)
 result = model(wav_path)
 print(result)
 ```
-- Model_dir: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
-- Input: wav formt file, support formats: `str, np.ndarray, List[str]`
-- Output: `List[str]`: recognition result
+- `model_dir`: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
+- `batch_size`: `1` (Default), the batch size during inference
+- `device_id`: `-1` (Default), infer on CPU. To infer on GPU, set it to the gpu_id (make sure that `onnxruntime-gpu` is installed)
+- `quantize`: `False` (Default), loads `model.onnx` from `model_dir`. If set to `True`, loads `model_quant.onnx` from `model_dir`
+- `intra_op_num_threads`: `4` (Default), sets the number of threads used for intraop parallelism on CPU
+
+Input: wav format file; supported input types: `str, np.ndarray, List[str]`
+
+Output: `List[str]`: recognition result
+
 
 #### FSMN-VAD-online
 ```python
@@ -105,9 +118,16 @@ for sample_offset in range(0, speech_length, min(step, speech_length - sample_of
     if segments_result:
         print(segments_result)
 ```
-- Model_dir: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
-- Input: wav formt file, support formats: `str, np.ndarray, List[str]`
-- Output: `List[str]`: recognition result
+- `model_dir`: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
+- `batch_size`: `1` (Default), the batch size during inference
+- `device_id`: `-1` (Default), infer on CPU. To infer on GPU, set it to the gpu_id (make sure that `onnxruntime-gpu` is installed)
+- `quantize`: `False` (Default), loads `model.onnx` from `model_dir`. If set to `True`, loads `model_quant.onnx` from `model_dir`
+- `intra_op_num_threads`: `4` (Default), sets the number of threads used for intraop parallelism on CPU
+
+Input: wav format file; supported input types: `str, np.ndarray, List[str]`
+
+Output: `List[str]`: recognition result
+
 
 ### Punctuation Restoration
 #### CT-Transformer
@@ -121,9 +141,15 @@ text_in="跨境河流是养育沿岸人民的生命之源长期以来为帮助
 result = model(text_in)
 print(result[0])
 ```
-- Model_dir: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
-- Input: wav formt file, support formats: `str, np.ndarray, List[str]`
-- Output: `List[str]`: recognition result
+- `model_dir`: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
+- `device_id`: `-1` (Default), infer on CPU. To infer on GPU, set it to the gpu_id (make sure that `onnxruntime-gpu` is installed)
+- `quantize`: `False` (Default), loads `model.onnx` from `model_dir`. If set to `True`, loads `model_quant.onnx` from `model_dir`
+- `intra_op_num_threads`: `4` (Default), sets the number of threads used for intraop parallelism on CPU
+
+Input: `str`, raw text of the ASR result
+
+Output: `List[str]`: recognition result
+
 
 #### CT-Transformer-online
 ```python
@@ -143,9 +169,14 @@ for vad in vads:
 
 print(rec_result_all)
 ```
-- Model_dir: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
-- Input: wav formt file, support formats: `str, np.ndarray, List[str]`
-- Output: `List[str]`: recognition result
+- `model_dir`: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
+- `device_id`: `-1` (Default), infer on CPU. To infer on GPU, set it to the gpu_id (make sure that `onnxruntime-gpu` is installed)
+- `quantize`: `False` (Default), loads `model.onnx` from `model_dir`. If set to `True`, loads `model_quant.onnx` from `model_dir`
+- `intra_op_num_threads`: `4` (Default), sets the number of threads used for intraop parallelism on CPU
+
+Input: `str`, raw text of the ASR result
+
+Output: `List[str]`: recognition result
 
 ## Performance benchmark