游雁 2023-08-09 10:55:34 +08:00
parent 386bfbec76
commit 8e2b7a67b9
4 changed files with 91 additions and 2 deletions


@ -29,7 +29,7 @@ curl -O https://raw.githubusercontent.com/alibaba-damo-academy/FunASR/main/funas
# curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/funasr-runtime-deploy-online-cpu-en.sh;
```
Execute the deployment tool and press the Enter key at the prompt to complete the installation and deployment of the server. Currently, the convenient deployment tool only supports Linux environments. For other environments, please refer to the development guide ([docs](./SDK_advanced_guide_offline.md)).
Execute the deployment tool and press the Enter key at the prompt to complete the installation and deployment of the server. Currently, the convenient deployment tool only supports Linux environments. For other environments, please refer to the development guide ([docs](./SDK_advanced_guide_online.md)).
```shell
sudo bash funasr-runtime-deploy-online-cpu-zh.sh install --workspace ./funasr-runtime-resources
```


@ -30,7 +30,7 @@ curl -O https://raw.githubusercontent.com/alibaba-damo-academy/FunASR/main/funas
# curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/funasr-runtime-deploy-online-cpu-zh.sh;
```
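Run the deployment tool and press Enter at the prompt to complete server installation and deployment. The convenience deployment tool currently supports only Linux; for other environments, see the development guide ([click here](#客户端用法详解))
Run the deployment tool and press Enter at the prompt to complete server installation and deployment. The convenience deployment tool currently supports only Linux; for other environments, see the development guide ([click here](./SDK_advanced_guide_online_zh.md))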
```shell
sudo bash funasr-runtime-deploy-online-cpu-zh.sh install --workspace ./funasr-runtime-resources
```


@ -0,0 +1,88 @@
([简体中文](./websocket_protocol_zh.md)|English)
# WebSocket/gRPC Communication Protocol
## Offline File Transcription
### Sending Data from Client to Server
#### Message Format
Configuration parameters and meta information are in JSON format, while audio data is in bytes.
#### Initial Communication
The message (which needs to be serialized in JSON) is:
```text
{"mode": "offline", "wav_name": "wav_name", "is_speaking": true, "wav_format": "pcm"}
```
Parameter explanation:
```text
`mode`: `offline`, indicating the inference mode for offline file transcription
`wav_name`: the name of the audio file to be transcribed
`wav_format`: the audio and video file extension, such as pcm, mp3, mp4, etc.
`is_speaking`: False indicates the end of a sentence, such as a VAD segmentation point or the end of a WAV file
`audio_fs`: when the input audio is in PCM format, the audio sampling rate parameter needs to be added
```
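As a minimal sketch of the initial message, the following Python builds the configuration and serializes it with the standard `json` module. The helper name `build_offline_config` and the 16000 Hz value are illustrative assumptions, not part of the protocol text. Note that Python's `True` is emitted as the JSON literal `true`.

```python
import json

def build_offline_config(wav_name, wav_format="pcm", audio_fs=None):
    """Return the initial offline-mode message as a JSON string.

    Hypothetical helper for illustration; the field names follow the
    parameter list above.
    """
    message = {
        "mode": "offline",
        "wav_name": wav_name,
        "is_speaking": True,  # serialized as the JSON literal `true`
        "wav_format": wav_format,
    }
    # The parameter list asks for audio_fs when the input is PCM.
    if audio_fs is not None:
        message["audio_fs"] = audio_fs
    return json.dumps(message)

config = build_offline_config("demo", wav_format="pcm", audio_fs=16000)
print(config)
```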
#### Sending Audio Data
For PCM, send the raw audio data directly. For other formats, send the header information together with the audio or video bytes. Multiple sampling rates and audio and video formats are supported.
#### Sending End of Audio Flag
After sending the audio data, an end-of-audio flag needs to be sent (which needs to be serialized in JSON):
```text
{"is_speaking": false}
```
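The three client steps (configuration, audio, end flag) can be sketched as a generator that yields frames in protocol order. This is an illustration only: `offline_session_frames` and the `chunk_bytes` size are hypothetical, and no network library is involved. `str` frames stand for JSON text messages and `bytes` frames for audio.

```python
import json

def offline_session_frames(wav_name, pcm_bytes, audio_fs=16000, chunk_bytes=3200):
    """Yield the frames a client would send, in protocol order.

    Hypothetical helper; chunk_bytes is an arbitrary illustrative size.
    """
    # 1. Initial configuration message (JSON text).
    yield json.dumps({
        "mode": "offline",
        "wav_name": wav_name,
        "is_speaking": True,
        "wav_format": "pcm",
        "audio_fs": audio_fs,
    })
    # 2. Audio data as raw bytes, split into fixed-size pieces.
    for start in range(0, len(pcm_bytes), chunk_bytes):
        yield pcm_bytes[start:start + chunk_bytes]
    # 3. End-of-audio flag (JSON text).
    yield json.dumps({"is_speaking": False})

audio = b"\x00" * 8000
frames = list(offline_session_frames("demo", audio))
print([type(frame).__name__ for frame in frames])
```

With a WebSocket client library, each `str` frame would presumably be sent as a text message and each `bytes` frame as a binary message.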
### Sending Data from Server to Client
#### Sending Recognition Results
The message (serialized in JSON) is:
```text
{"mode": "offline", "wav_name": "wav_name", "text": "asr outputs", "is_final": true}
```
Parameter explanation:
```text
`mode`: `offline`, indicating the inference mode for offline file transcription
`wav_name`: the name of the audio file to be transcribed
`text`: the text output of speech recognition
`is_final`: indicates the end of recognition
```
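On the client side, a result message can be decoded with the standard `json` module. The sketch below is a hypothetical helper (`parse_result` is not part of the protocol) and the sample payload is invented for illustration.

```python
import json

def parse_result(raw):
    """Decode a server result message into (wav_name, text, is_final)."""
    message = json.loads(raw)
    # Missing fields fall back to neutral defaults in this sketch.
    return (
        message.get("wav_name"),
        message.get("text", ""),
        bool(message.get("is_final", False)),
    )

raw = '{"mode": "offline", "wav_name": "demo", "text": "hello world", "is_final": true}'
wav_name, text, is_final = parse_result(raw)
print(wav_name, text, is_final)
```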
## Real-time Speech Recognition
### System Architecture Diagram
<div align="left"><img src="images/2pass.jpg" width="400"/></div>
### Sending Data from Client to Server
#### Message Format
Configuration parameters and meta information are in JSON format, while audio data is in bytes.
#### Initial Communication
The message (which needs to be serialized in JSON) is:
```text
{"mode": "2pass", "wav_name": "wav_name", "is_speaking": true, "wav_format": "pcm", "chunk_size": [5,10,5]}
```
Parameter explanation:
```text
`mode`: `offline` indicates the inference mode for single-sentence recognition; `online` indicates the inference mode for real-time speech recognition; `2pass` indicates real-time speech recognition and offline model correction for sentence endings.
`wav_name`: the name of the audio file to be transcribed
`wav_format`: the audio and video file extension, such as pcm, mp3, mp4, etc. (Note: only PCM audio streams are supported in version 1.0)
`is_speaking`: False indicates the end of a sentence, such as a VAD segmentation point or the end of a WAV file
`chunk_size`: the latency configuration of the streaming model; `[5,10,5]` means the current chunk is 600ms long, with 300ms of look-ahead and 300ms of look-back context.
`audio_fs`: when the input audio is in PCM format, the audio sampling rate parameter needs to be added
```
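The `[5,10,5]` example implies that each `chunk_size` unit corresponds to 60 ms (10 units for 600 ms, 5 units for 300 ms). The sketch below makes that arithmetic explicit. The 60 ms unit is inferred from the example rather than stated by the protocol, and the bytes-per-chunk figure additionally assumes 16-bit mono PCM.

```python
UNIT_MS = 60  # inferred from the example: 10 units -> 600 ms

def chunk_timing(chunk_size, audio_fs=16000, sample_width=2):
    """Return (durations_ms, bytes_per_chunk) for a chunk_size triple.

    Hypothetical helper. bytes_per_chunk covers only the middle entry
    (the current chunk) and assumes mono PCM with `sample_width` bytes
    per sample.
    """
    durations_ms = [units * UNIT_MS for units in chunk_size]
    chunk_ms = durations_ms[1]
    bytes_per_chunk = audio_fs * chunk_ms // 1000 * sample_width
    return durations_ms, bytes_per_chunk

durations_ms, bytes_per_chunk = chunk_timing([5, 10, 5])
print(durations_ms, bytes_per_chunk)
```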
#### Sending Audio Data
Send the raw audio bytes directly, with any header information removed. Supported sampling rates are 8000 Hz (which must be specified as `audio_fs` in the initial message) and 16000 Hz.
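One way to remove the header, assuming the source audio is a WAV file, is Python's standard `wave` module, which returns the raw sample frames and the sampling rate. The helper name `wav_to_pcm` is hypothetical, and the rate check simply mirrors the two rates listed above.

```python
import io
import wave

def wav_to_pcm(wav_bytes):
    """Return (pcm_bytes, sample_rate) with the WAV header removed."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        sample_rate = reader.getframerate()
        pcm = reader.readframes(reader.getnframes())
    if sample_rate not in (8000, 16000):
        raise ValueError(f"unsupported sampling rate: {sample_rate}")
    return pcm, sample_rate

# Build a tiny in-memory WAV file so the example is self-contained.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)      # mono
    writer.setsampwidth(2)      # 16-bit samples
    writer.setframerate(8000)
    writer.writeframes(b"\x01\x00" * 80)

pcm, rate = wav_to_pcm(buffer.getvalue())
print(len(pcm), rate)
```

The returned rate could then be passed as `audio_fs` in the initial message.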
#### Sending End of Audio Flag
After sending the audio data, an end-of-audio flag needs to be sent (which needs to be serialized in JSON):
```text
{"is_speaking": false}
```
### Sending Data from Server to Client
#### Sending Recognition Results
The message (serialized in JSON) is:
```text
{"mode": "2pass-online", "wav_name": "wav_name", "text": "asr outputs", "is_final": true}
```
Parameter explanation:
```text
`mode`: indicates the inference mode, divided into `2pass-online` for real-time recognition results and `2pass-offline` for 2-pass corrected recognition results.
`wav_name`: the name of the audio file to be transcribed
`text`: the text output of speech recognition
`is_final`: indicates the end of recognition
```
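A client has to combine the two result types into one transcript. The sketch below shows one plausible strategy, under the assumption that each `2pass-online` message carries new text to append and that each `2pass-offline` message carries the corrected text for the sentence just finished. The class `TwoPassTranscript` and the sample messages are hypothetical.

```python
import json

class TwoPassTranscript:
    """Accumulate 2pass results into one display string (illustrative)."""

    def __init__(self):
        self.confirmed = ""  # text corrected by 2pass-offline results
        self.partial = ""    # streaming text not yet corrected

    def update(self, raw):
        """Apply one server message and return the current transcript."""
        message = json.loads(raw)
        if message["mode"] == "2pass-online":
            # Assumption: streaming results are incremental.
            self.partial += message["text"]
        elif message["mode"] == "2pass-offline":
            # The corrected sentence replaces the streaming partials.
            self.confirmed += message["text"]
            self.partial = ""
        return self.confirmed + self.partial

transcript = TwoPassTranscript()
first = transcript.update('{"mode": "2pass-online", "text": "hello"}')
second = transcript.update('{"mode": "2pass-online", "text": " wrld"}')
third = transcript.update('{"mode": "2pass-offline", "text": "hello world."}')
print(third)
```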


@ -1,3 +1,4 @@
(简体中文|[English](./websocket_protocol.md))
# WebSocket/gRPC Communication Protocol
## Offline File Transcription
### Sending Data from Client to Server