Mirror of https://github.com/FunAudioLLM/SenseVoice.git, synced 2025-09-15 15:08:35 +08:00
python runtime

parent 63f1588980
commit 39c02d1694

README.md (45 lines changed)
@@ -42,6 +42,7 @@ Online Demo:

 <a name="What's News"></a>
 # What's New 🔥
+- 2024/7: Added export features for [ONNX](./demo_onnx.py) and [libtorch](./demo_libtorch.py), as well as Python runtimes: [funasr-onnx-0.4.0](https://pypi.org/project/funasr-onnx/), [funasr-torch-0.1.1](https://pypi.org/project/funasr-torch/)
 - 2024/7: The [SenseVoice-Small](https://www.modelscope.cn/models/iic/SenseVoiceSmall) voice understanding model is open-sourced, offering high-precision multilingual speech recognition, emotion recognition, and audio event detection for Mandarin, Cantonese, English, Japanese, and Korean, with exceptionally low inference latency.
 - 2024/7: CosyVoice, a model for natural speech generation with multilingual, timbre, and emotion control, is released. CosyVoice excels at multilingual voice generation, zero-shot voice generation, cross-lingual voice cloning, and instruction following. [CosyVoice repo](https://github.com/FunAudioLLM/CosyVoice) and [CosyVoice space](https://www.modelscope.cn/studios/iic/CosyVoice-300M).
 - 2024/7: [FunASR](https://github.com/modelscope/FunASR) is a fundamental speech recognition toolkit offering speech recognition (ASR), voice activity detection (VAD), punctuation restoration, language models, speaker verification, speaker diarization, and multi-talker ASR.
@@ -180,20 +181,47 @@ text = rich_transcription_postprocess(res[0][0]["text"])
 print(text)
 ```

-### Export and Test (*On going*)
+### Export and Test
+<details><summary>ONNX and Libtorch Export</summary>

 #### ONNX
 ```python
-# pip3 install -U funasr-onnx
+# pip3 install -U funasr funasr-onnx
+from pathlib import Path
 from funasr_onnx import SenseVoiceSmall
 from funasr_onnx.utils.postprocess_utils import rich_transcription_postprocess

-model_dir = "iic/SenseVoiceCTC"
-model = SenseVoiceSmall(model_dir, batch_size=1, quantize=True)
-
-wav_path = [f'~/.cache/modelscope/hub/{model_dir}/example/asr_example.wav']
-
-result = model(wav_path)
-print(result)
+model_dir = "iic/SenseVoiceSmall"
+
+model = SenseVoiceSmall(model_dir, batch_size=10, quantize=True)
+
+# inference
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
 ```
+Note: the ONNX model is exported to the original model directory.

 #### Libtorch
+```python
+from pathlib import Path
+from funasr_torch import SenseVoiceSmall
+from funasr_torch.utils.postprocess_utils import rich_transcription_postprocess
+
+model_dir = "iic/SenseVoiceSmall"
+
+model = SenseVoiceSmall(model_dir, batch_size=10, device="cuda:0")
+
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
+```
+Note: the Libtorch model is exported to the original model directory.
+</details>
 ## Service

@@ -235,6 +263,9 @@ python webui.py

 <div align="center"><img src="image/webui.png" width="700"/> </div>

+
+
+<a name="Community"></a>
 # Community
 If you encounter problems in use, you can raise issues directly on the GitHub page.
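The runtime snippets in this diff each rebuild the ModelScope cache path by hand via `Path.home()`. As an illustrative sketch only (this helper is hypothetical and not part of the repo), that layout could be kept in one place:

```python
from pathlib import Path

def cached_example(model_dir: str, name: str = "en.mp3") -> str:
    # Resolve a bundled example file inside the ModelScope hub cache,
    # assuming the default layout used by the snippets in this diff:
    #   ~/.cache/modelscope/hub/<model_dir>/example/<name>
    return str(Path.home() / ".cache" / "modelscope" / "hub" / model_dir / "example" / name)

# usage (hypothetical): wav_or_scp = [cached_example("iic/SenseVoiceSmall")]
```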
README_ja.md (41 lines changed)
@@ -41,6 +41,7 @@ SenseVoice supports speech recognition (ASR), language identification (LID), speech emotion recognition…

 <a name="最新动态"></a>
 # What's New 🔥
+- 2024/7: Added new export features for [ONNX](./demo_onnx.py) and [libtorch](./demo_libtorch.py), together with Python runtimes: [funasr-onnx-0.4.0](https://pypi.org/project/funasr-onnx/), [funasr-torch-0.1.1](https://pypi.org/project/funasr-torch/).
 - 2024/7: The [SenseVoice-Small](https://www.modelscope.cn/models/iic/SenseVoiceSmall) multilingual speech understanding model is open-sourced, supporting multilingual speech recognition for Chinese, Cantonese, English, Japanese, and Korean, plus emotion recognition and event detection, with very low inference latency.
 - 2024/7: CosyVoice focuses on natural speech generation, supporting multilingual, timbre, and emotion control; it excels at multilingual voice generation, zero-shot voice generation, cross-lingual voice cloning, and instruction following. [CosyVoice repo](https://github.com/FunAudioLLM/CosyVoice) and [CosyVoice online demo](https://www.modelscope.cn/studios/iic/CosyVoice-300M).
 - 2024/7: [FunASR](https://github.com/modelscope/FunASR) is a fundamental speech recognition toolkit providing speech recognition (ASR), voice activity detection (VAD), punctuation restoration, language models, speaker verification, speaker diarization, and multi-talker ASR.
@@ -184,20 +185,48 @@ print(text)
 ```

-Not yet finished
-
-### Export and Test (*in progress*)
+### Export and Test
+<details><summary>ONNX and Libtorch Export</summary>

 #### ONNX
 ```python
-# pip3 install -U funasr-onnx
+# pip3 install -U funasr funasr-onnx
+from pathlib import Path
 from funasr_onnx import SenseVoiceSmall
 from funasr_onnx.utils.postprocess_utils import rich_transcription_postprocess

 model_dir = "iic/SenseVoiceSmall"
-model = SenseVoiceSmall(model_dir, batch_size=1, quantize=True)
-
-wav_path = [f'~/.cache/modelscope/hub/{model_dir}/example/asr_example.wav']
-
-result = model(wav_path)
-print(result)
+
+model = SenseVoiceSmall(model_dir, batch_size=10, quantize=True)
+
+# inference
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
 ```
+Note: the ONNX model is exported to the original model directory.

 #### Libtorch
+```python
+from pathlib import Path
+from funasr_torch import SenseVoiceSmall
+from funasr_torch.utils.postprocess_utils import rich_transcription_postprocess
+
+model_dir = "iic/SenseVoiceSmall"
+
+model = SenseVoiceSmall(model_dir, batch_size=10, device="cuda:0")
+
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
+```
+Note: the Libtorch model is exported to the original model directory.
+
+</details>

 ### Deployment
README_zh.md (42 lines changed)
@@ -41,6 +41,7 @@ SenseVoice is an audio foundation model with audio understanding capabilities, including speech recognition…

 <a name="最新动态"></a>
 # What's New 🔥
+- 2024/7: Added export to [ONNX](./demo_onnx.py) and [libtorch](./demo_libtorch.py), along with Python runtimes: [funasr-onnx-0.4.0](https://pypi.org/project/funasr-onnx/), [funasr-torch-0.1.1](https://pypi.org/project/funasr-torch/)
 - 2024/7: The [SenseVoice-Small](https://www.modelscope.cn/models/iic/SenseVoiceSmall) multilingual audio understanding model is open-sourced, supporting multilingual speech recognition for Chinese, Cantonese, English, Japanese, and Korean, together with emotion recognition and event detection, at very low inference latency.
 - 2024/7: CosyVoice is dedicated to natural speech generation, supporting multilingual, timbre, and emotion control; it excels at multilingual voice generation, zero-shot voice generation, cross-lingual voice cloning, and instruction following. [CosyVoice repo](https://github.com/FunAudioLLM/CosyVoice) and [CosyVoice online demo](https://www.modelscope.cn/studios/iic/CosyVoice-300M).
 - 2024/7: [FunASR](https://github.com/modelscope/FunASR) is a fundamental speech recognition toolkit providing speech recognition (ASR), voice activity detection (VAD), punctuation restoration, language models, speaker verification, speaker diarization, and multi-talker ASR.
@@ -188,21 +189,48 @@ print(text)
 ```

-Undo
-
-### Export and Test (*in progress*)
+### Export and Test
+<details><summary>ONNX and Libtorch Export</summary>

 #### ONNX
 ```python
-# pip3 install -U funasr-onnx
+# pip3 install -U funasr funasr-onnx
+from pathlib import Path
 from funasr_onnx import SenseVoiceSmall
 from funasr_onnx.utils.postprocess_utils import rich_transcription_postprocess

 model_dir = "iic/SenseVoiceSmall"
-model = SenseVoiceSmall(model_dir, batch_size=1, quantize=True)
-
-wav_path = [f'~/.cache/modelscope/hub/{model_dir}/example/asr_example.wav']
-
-result = model(wav_path)
-print(result)
+
+model = SenseVoiceSmall(model_dir, batch_size=10, quantize=True)
+
+# inference
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
 ```
+Note: the ONNX model is exported to the original model directory.

 #### Libtorch
+```python
+from pathlib import Path
+from funasr_torch import SenseVoiceSmall
+from funasr_torch.utils.postprocess_utils import rich_transcription_postprocess
+
+model_dir = "iic/SenseVoiceSmall"
+
+model = SenseVoiceSmall(model_dir, batch_size=10, device="cuda:0")
+
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
+```
+Note: the Libtorch model is exported to the original model directory.
+
+</details>

 ### Deployment
demo_libtorch.py (new file, 18 lines)
@@ -0,0 +1,18 @@
+#!/usr/bin/env python3
+# -*- encoding: utf-8 -*-
+# Copyright FunASR (https://github.com/FunAudioLLM/SenseVoice). All Rights Reserved.
+# MIT License (https://opensource.org/licenses/MIT)
+
+from pathlib import Path
+from funasr_torch import SenseVoiceSmall
+from funasr_torch.utils.postprocess_utils import rich_transcription_postprocess
+
+
+model_dir = "iic/SenseVoiceSmall"
+
+model = SenseVoiceSmall(model_dir, batch_size=10, device="cuda:0")
+
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
demo_onnx.py (new file, 19 lines)
@@ -0,0 +1,19 @@
+#!/usr/bin/env python3
+# -*- encoding: utf-8 -*-
+# Copyright FunASR (https://github.com/FunAudioLLM/SenseVoice). All Rights Reserved.
+# MIT License (https://opensource.org/licenses/MIT)
+
+from pathlib import Path
+from funasr_onnx import SenseVoiceSmall
+from funasr_onnx.utils.postprocess_utils import rich_transcription_postprocess
+
+
+model_dir = "iic/SenseVoiceSmall"
+
+model = SenseVoiceSmall(model_dir, batch_size=10, quantize=True)
+
+# inference
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
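Both demo files pipe the raw model output through `rich_transcription_postprocess` before printing. The raw transcription carries inline markers of the form `<|...|>` (for example a language tag such as `<|en|>`); the exact tag set and the emoji mapping done by the real function are not shown in this diff, so the following is only a rough, hypothetical sketch of the tag-stripping part:

```python
import re

# Matches inline <|...|> markers in a raw SenseVoice transcription.
_TAG = re.compile(r"<\|[^|]*\|>")

def strip_tags(raw: str) -> str:
    # Crude stand-in for rich_transcription_postprocess: drop the
    # markers and trim whitespace, with no emoji mapping.
    return _TAG.sub("", raw).strip()
```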
@@ -3,6 +3,6 @@ torchaudio
 modelscope
-huggingface
+huggingface_hub
-funasr>=1.1.2
+funasr>=1.1.3
 numpy<=1.26.4
 gradio