Mirror of https://github.com/FunAudioLLM/SenseVoice.git, synced 2025-09-15 15:08:35 +08:00
python runtime

parent 63f1588980
commit 39c02d1694

README.md (45 lines changed)
@@ -42,6 +42,7 @@ Online Demo:

 <a name="What's News"></a>
 # What's New 🔥
+- 2024/7: Added export features for [ONNX](./demo_onnx.py) and [libtorch](./demo_libtorch.py), as well as Python runtimes: [funasr-onnx-0.4.0](https://pypi.org/project/funasr-onnx/), [funasr-torch-0.1.1](https://pypi.org/project/funasr-torch/)
 - 2024/7: The [SenseVoice-Small](https://www.modelscope.cn/models/iic/SenseVoiceSmall) voice understanding model is open-sourced, offering high-precision multilingual speech recognition, emotion recognition, and audio event detection for Mandarin, Cantonese, English, Japanese, and Korean, with exceptionally low inference latency.
 - 2024/7: CosyVoice, a model for natural speech generation with multilingual, timbre, and emotion control, is released. CosyVoice excels at multilingual voice generation, zero-shot voice generation, cross-lingual voice cloning, and instruction following. [CosyVoice repo](https://github.com/FunAudioLLM/CosyVoice) and [CosyVoice space](https://www.modelscope.cn/studios/iic/CosyVoice-300M).
 - 2024/7: [FunASR](https://github.com/modelscope/FunASR) is a fundamental speech recognition toolkit offering speech recognition (ASR), voice activity detection (VAD), punctuation restoration, language models, speaker verification, speaker diarization, and multi-talker ASR.
@@ -180,20 +181,47 @@ text = rich_transcription_postprocess(res[0][0]["text"])
 print(text)
 ```

-### Export and Test (*On going*)
+### Export and Test
+<details><summary>ONNX and Libtorch Export</summary>

 #### ONNX
 ```python
-# pip3 install -U funasr-onnx
+# pip3 install -U funasr funasr-onnx
+from pathlib import Path
 from funasr_onnx import SenseVoiceSmall
 from funasr_onnx.utils.postprocess_utils import rich_transcription_postprocess

-model_dir = "iic/SenseVoiceCTC"
-model = SenseVoiceSmall(model_dir, batch_size=1, quantize=True)
-
-wav_path = [f'~/.cache/modelscope/hub/{model_dir}/example/asr_example.wav']
-
-result = model(wav_path)
-print(result)
+model_dir = "iic/SenseVoiceSmall"
+
+model = SenseVoiceSmall(model_dir, batch_size=10, quantize=True)
+
+# inference
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
 ```
+Note: the ONNX model is exported to the original model directory.

 #### Libtorch
+```python
+from pathlib import Path
+from funasr_torch import SenseVoiceSmall
+from funasr_torch.utils.postprocess_utils import rich_transcription_postprocess
+
+model_dir = "iic/SenseVoiceSmall"
+
+model = SenseVoiceSmall(model_dir, batch_size=10, device="cuda:0")
+
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
+```
+Note: the Libtorch model is exported to the original model directory.
+</details>
 ## Service

@@ -235,6 +263,9 @@ python webui.py

 <div align="center"><img src="image/webui.png" width="700"/> </div>

+
+
+<a name="Community"></a>
 # Community
 If you encounter problems in use, you can raise issues directly on the GitHub page.
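The runtime snippets in this diff each rebuild the ModelScope cache path by hand via `Path.home()`. As an illustrative sketch only (this helper is hypothetical and not part of the repo), that layout could be kept in one place:

```python
from pathlib import Path

def cached_example(model_dir: str, name: str = "en.mp3") -> str:
    # Resolve a bundled example file inside the ModelScope hub cache,
    # assuming the default layout used by the snippets in this diff:
    #   ~/.cache/modelscope/hub/<model_dir>/example/<name>
    return str(Path.home() / ".cache" / "modelscope" / "hub" / model_dir / "example" / name)

# usage (hypothetical): wav_or_scp = [cached_example("iic/SenseVoiceSmall")]
```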
README_ja.md (41 lines changed)
@@ -41,6 +41,7 @@ SenseVoice supports speech recognition (ASR), language identification (LID), speech emotion recognition…

 <a name="最新动态"></a>
 # What's New 🔥
+- 2024/7: Added new export features for [ONNX](./demo_onnx.py) and [libtorch](./demo_libtorch.py), together with Python runtimes: [funasr-onnx-0.4.0](https://pypi.org/project/funasr-onnx/), [funasr-torch-0.1.1](https://pypi.org/project/funasr-torch/).
 - 2024/7: The [SenseVoice-Small](https://www.modelscope.cn/models/iic/SenseVoiceSmall) multilingual speech understanding model is open-sourced, supporting multilingual speech recognition for Chinese, Cantonese, English, Japanese, and Korean, plus emotion recognition and event detection, with very low inference latency.
 - 2024/7: CosyVoice focuses on natural speech generation, supporting multilingual, timbre, and emotion control; it excels at multilingual voice generation, zero-shot voice generation, cross-lingual voice cloning, and instruction following. [CosyVoice repo](https://github.com/FunAudioLLM/CosyVoice) and [CosyVoice online demo](https://www.modelscope.cn/studios/iic/CosyVoice-300M).
 - 2024/7: [FunASR](https://github.com/modelscope/FunASR) is a fundamental speech recognition toolkit providing speech recognition (ASR), voice activity detection (VAD), punctuation restoration, language models, speaker verification, speaker diarization, and multi-talker ASR.
@@ -184,20 +185,48 @@ print(text)
 ```

-Not yet finished
-
-### Export and Test (*in progress*)
+### Export and Test
+<details><summary>ONNX and Libtorch Export</summary>

 #### ONNX
 ```python
-# pip3 install -U funasr-onnx
+# pip3 install -U funasr funasr-onnx
+from pathlib import Path
 from funasr_onnx import SenseVoiceSmall
 from funasr_onnx.utils.postprocess_utils import rich_transcription_postprocess

 model_dir = "iic/SenseVoiceSmall"
-model = SenseVoiceSmall(model_dir, batch_size=1, quantize=True)
-
-wav_path = [f'~/.cache/modelscope/hub/{model_dir}/example/asr_example.wav']
-
-result = model(wav_path)
-print(result)
+
+model = SenseVoiceSmall(model_dir, batch_size=10, quantize=True)
+
+# inference
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
 ```
+Note: the ONNX model is exported to the original model directory.

 #### Libtorch
+```python
+from pathlib import Path
+from funasr_torch import SenseVoiceSmall
+from funasr_torch.utils.postprocess_utils import rich_transcription_postprocess
+
+model_dir = "iic/SenseVoiceSmall"
+
+model = SenseVoiceSmall(model_dir, batch_size=10, device="cuda:0")
+
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
+```
+Note: the Libtorch model is exported to the original model directory.
+
+</details>

 ### Deployment
README_zh.md (42 lines changed)
@@ -41,6 +41,7 @@ SenseVoice is an audio foundation model with audio understanding capabilities, including speech recognition…

 <a name="最新动态"></a>
 # What's New 🔥
+- 2024/7: Added export to [ONNX](./demo_onnx.py) and [libtorch](./demo_libtorch.py), along with Python runtimes: [funasr-onnx-0.4.0](https://pypi.org/project/funasr-onnx/), [funasr-torch-0.1.1](https://pypi.org/project/funasr-torch/)
 - 2024/7: The [SenseVoice-Small](https://www.modelscope.cn/models/iic/SenseVoiceSmall) multilingual audio understanding model is open-sourced, supporting multilingual speech recognition for Chinese, Cantonese, English, Japanese, and Korean, together with emotion recognition and event detection, at very low inference latency.
 - 2024/7: CosyVoice is dedicated to natural speech generation, supporting multilingual, timbre, and emotion control; it excels at multilingual voice generation, zero-shot voice generation, cross-lingual voice cloning, and instruction following. [CosyVoice repo](https://github.com/FunAudioLLM/CosyVoice) and [CosyVoice online demo](https://www.modelscope.cn/studios/iic/CosyVoice-300M).
 - 2024/7: [FunASR](https://github.com/modelscope/FunASR) is a fundamental speech recognition toolkit providing speech recognition (ASR), voice activity detection (VAD), punctuation restoration, language models, speaker verification, speaker diarization, and multi-talker ASR.
@@ -188,21 +189,48 @@ print(text)
 ```

-Undo
-
-### Export and Test (*in progress*)
+### Export and Test
+<details><summary>ONNX and Libtorch Export</summary>

 #### ONNX
 ```python
-# pip3 install -U funasr-onnx
+# pip3 install -U funasr funasr-onnx
+from pathlib import Path
 from funasr_onnx import SenseVoiceSmall
 from funasr_onnx.utils.postprocess_utils import rich_transcription_postprocess

 model_dir = "iic/SenseVoiceSmall"
-model = SenseVoiceSmall(model_dir, batch_size=1, quantize=True)
-
-wav_path = [f'~/.cache/modelscope/hub/{model_dir}/example/asr_example.wav']
-
-result = model(wav_path)
-print(result)
+
+model = SenseVoiceSmall(model_dir, batch_size=10, quantize=True)
+
+# inference
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
 ```
+Note: the ONNX model is exported to the original model directory.

 #### Libtorch
+```python
+from pathlib import Path
+from funasr_torch import SenseVoiceSmall
+from funasr_torch.utils.postprocess_utils import rich_transcription_postprocess
+
+model_dir = "iic/SenseVoiceSmall"
+
+model = SenseVoiceSmall(model_dir, batch_size=10, device="cuda:0")
+
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
+```
+Note: the Libtorch model is exported to the original model directory.
+
+</details>

 ### Deployment
demo_libtorch.py (new file, 18 lines)
@@ -0,0 +1,18 @@
+#!/usr/bin/env python3
+# -*- encoding: utf-8 -*-
+# Copyright FunASR (https://github.com/FunAudioLLM/SenseVoice). All Rights Reserved.
+# MIT License (https://opensource.org/licenses/MIT)
+
+from pathlib import Path
+from funasr_torch import SenseVoiceSmall
+from funasr_torch.utils.postprocess_utils import rich_transcription_postprocess
+
+
+model_dir = "iic/SenseVoiceSmall"
+
+model = SenseVoiceSmall(model_dir, batch_size=10, device="cuda:0")
+
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
demo_onnx.py (new file, 19 lines)
@@ -0,0 +1,19 @@
+#!/usr/bin/env python3
+# -*- encoding: utf-8 -*-
+# Copyright FunASR (https://github.com/FunAudioLLM/SenseVoice). All Rights Reserved.
+# MIT License (https://opensource.org/licenses/MIT)
+
+from pathlib import Path
+from funasr_onnx import SenseVoiceSmall
+from funasr_onnx.utils.postprocess_utils import rich_transcription_postprocess
+
+
+model_dir = "iic/SenseVoiceSmall"
+
+model = SenseVoiceSmall(model_dir, batch_size=10, quantize=True)
+
+# inference
+wav_or_scp = ["{}/.cache/modelscope/hub/{}/example/en.mp3".format(Path.home(), model_dir)]
+
+res = model(wav_or_scp, language="auto", use_itn=True)
+print([rich_transcription_postprocess(i) for i in res])
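Both demo files pipe the raw model output through `rich_transcription_postprocess` before printing. The raw transcription carries inline markers of the form `<|...|>` (for example a language tag such as `<|en|>`); the exact tag set and the emoji mapping done by the real function are not shown in this diff, so the following is only a rough, hypothetical sketch of the tag-stripping part:

```python
import re

# Matches inline <|...|> markers in a raw SenseVoice transcription.
_TAG = re.compile(r"<\|[^|]*\|>")

def strip_tags(raw: str) -> str:
    # Crude stand-in for rich_transcription_postprocess: drop the
    # markers and trim whitespace, with no emoji mapping.
    return _TAG.sub("", raw).strip()
```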
@@ -3,6 +3,6 @@ torchaudio
 modelscope
-huggingface
+huggingface_hub
-funasr>=1.1.2
+funasr>=1.1.3
 numpy<=1.26.4
 gradio