(简体中文|[English](./websocket_protocol.md))
# WebSocket/gRPC Communication Protocol
This is the communication protocol of the FunASR software package. It covers offline file transcription ([deployment guide](./SDK_tutorial_zh.md)) and real-time speech recognition ([deployment guide](./SDK_tutorial_online_zh.md)).
## Offline File Transcription
### Data Sent from Client to Server
#### Message Format
Configuration parameters and meta information are sent as JSON; audio data is sent as raw bytes.
#### Initial Message
The first message must be JSON-serialized:
```text
{"mode": "offline", "wav_name": "wav_name","wav_format":"pcm","is_speaking": True,"hotwords":"阿里巴巴 达摩院 阿里云","itn":True}
```
Parameter description:
```text
`mode`: `offline`, the inference mode is offline file transcription
`wav_name`: name of the audio file to be transcribed
`wav_format`: audio/video file extension, e.g. pcm, mp3, mp4
`is_speaking`: False marks a sentence endpoint, e.g. a VAD segmentation point or the end of a wav file
`audio_fs`: audio sampling rate, required when the input audio is raw PCM data
`hotwords`: if the AM is a hotword model, hotwords must be sent to the server as a space-separated string, e.g. "阿里巴巴 达摩院 阿里云"
`itn`: whether to apply ITN, default True
```
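The handshake above can be reproduced with a short client. The following is a minimal sketch, not the official client: it assumes the Python `websockets` package and a server reachable at `ws://127.0.0.1:10095`; the address, port and `wav_name` are placeholders.

```python
# Minimal sketch: open the connection and send the first (configuration) message.
# Assumes the `websockets` package; server address and wav_name are placeholders.
import asyncio
import json
import websockets

async def offline_handshake():
    async with websockets.connect("ws://127.0.0.1:10095") as ws:
        first_message = {
            "mode": "offline",
            "wav_name": "demo.wav",
            "wav_format": "pcm",
            "is_speaking": True,
            "audio_fs": 16000,              # required when sending raw PCM
            "hotwords": "阿里巴巴 达摩院 阿里云",
            "itn": True,
        }
        # Configuration and meta information are JSON-serialized text.
        await ws.send(json.dumps(first_message, ensure_ascii=False))
        # ... audio bytes and the end-of-audio flag follow (see the next sketch).

asyncio.run(offline_handshake())
```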
#### Sending Audio Data
For PCM, send the audio data directly; for other formats, send the audio/video bytes together with their header information. Multiple sampling rates and audio/video formats are supported.
#### Sending the End-of-Audio Flag
After all audio data has been sent, an end flag must be sent, JSON-serialized:
```text
{"is_speaking": False}
```
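A possible continuation of the sketch above: stream the raw audio bytes, then send the end-of-audio flag. `ws` is the open connection from the previous sketch; the file path and read size are placeholders.

```python
# Continuation of the handshake sketch: `ws` is the open websocket connection.
import json

async def send_offline_audio(ws, path="demo.pcm", step=102400):
    with open(path, "rb") as f:
        while True:
            chunk = f.read(step)
            if not chunk:
                break
            await ws.send(chunk)                           # audio payload is raw bytes
    await ws.send(json.dumps({"is_speaking": False}))      # end-of-audio flag
```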
### Data Sent from Server to Client
#### Sending Recognition Results
The result message is JSON-serialized:
```text
{"mode": "offline", "wav_name": "wav_name", "text": "asr ouputs", "is_final": True,"timestamp":"[[100,200], [200,500]]"}
```
Parameter description:
```text
`mode`: `offline`, the inference mode is offline file transcription
`wav_name`: name of the transcribed audio file
`text`: recognized text output
`is_final`: indicates that recognition has finished
`timestamp`: returned only if the AM is a timestamp model; format is "[[100,200], [200,500]]" (ms)
```
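A minimal sketch of reading the result, again assuming `ws` is the connection from the sketches above; in offline mode the server returns one result message per audio file.

```python
# Continuation: read the single result message returned in offline mode.
import json

async def receive_offline_result(ws):
    message = json.loads(await ws.recv())
    print(message["wav_name"], message["text"])
    if "timestamp" in message:          # only present for timestamp-capable AMs
        print("timestamps:", message["timestamp"])
    return message
```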
## Real-Time Speech Recognition
### System Architecture
<div align="left"><img src="images/2pass.jpg" width="600"/></div>
### Data Sent from Client to Server
#### Message Format
Configuration parameters and meta information are sent as JSON; audio data is sent as raw bytes.
#### Initial Message
The first message must be JSON-serialized:
```text
{"mode": "2pass", "wav_name": "wav_name", "is_speaking": True, "wav_format":"pcm", "chunk_size":[5,10,5], "hotwords":"阿里巴巴 达摩院 阿里云","itn":True}
```
Parameter description:
```text
`mode`: `offline` for single-utterance recognition; `online` for real-time speech recognition; `2pass` for real-time recognition with offline-model correction at sentence endpoints
`wav_name`: name of the audio file (or stream) to be transcribed
`wav_format`: audio/video file extension, e.g. pcm, mp3, mp4 (note: version 1.0 supports PCM audio streams only)
`is_speaking`: marks a sentence endpoint, e.g. a VAD segmentation point or the end of a wav file
`chunk_size`: latency configuration of the streaming model; `[5,10,5]` means the current chunk is 600 ms, with 300 ms look-back and 300 ms look-ahead
`audio_fs`: audio sampling rate, required when the input audio is raw PCM data
`hotwords`: if the AM is a hotword model, hotwords must be sent to the server as a space-separated string, e.g. "阿里巴巴 达摩院 阿里云"
`itn`: whether to apply ITN, default True
```
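The initial message for real-time recognition can be sent the same way as in the offline example. A minimal sketch, reusing the placeholder server address; `"mic"` is an arbitrary stream name and the values follow the parameter description above.

```python
# Minimal sketch: open a streaming session and send the first message.
# Server address is a placeholder; "mic" is an arbitrary stream name.
import json
import websockets

async def start_streaming_session(uri="ws://127.0.0.1:10095"):
    ws = await websockets.connect(uri)
    await ws.send(json.dumps({
        "mode": "2pass",               # or "online" / "offline"
        "wav_name": "mic",
        "wav_format": "pcm",
        "is_speaking": True,
        "chunk_size": [5, 10, 5],      # 600 ms chunks, 300 ms look-back / look-ahead
        "audio_fs": 16000,
        "hotwords": "阿里巴巴 达摩院 阿里云",
        "itn": True,
    }, ensure_ascii=False))
    return ws
```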
#### Sending Audio Data
Send the audio data directly as bytes with the header information removed. Supported sampling rates are 8000 and 16000 Hz; `audio_fs` in the `message` must be set to 8000 or 16000 accordingly.
#### Sending the End Flag
After all audio data has been sent, an end flag must be sent, JSON-serialized:
```text
{"is_speaking": False}
```
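A minimal sketch of feeding the stream, assuming 16 kHz 16-bit mono PCM read from a file and the `chunk_size` above (10 frames of 60 ms, i.e. 600 ms per chunk); the file path and the real-time pacing are placeholder choices.

```python
# Continuation: `ws` is the streaming connection from the sketch above.
import asyncio
import json

async def stream_pcm(ws, path="demo.pcm", sample_rate=16000):
    chunk_bytes = int(10 * 60 * sample_rate * 2 / 1000)     # 600 ms of 16-bit mono PCM
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk_bytes)
            if not data:
                break
            await ws.send(data)                              # raw bytes, header removed
            await asyncio.sleep(0.6)                         # pace roughly in real time
    await ws.send(json.dumps({"is_speaking": False}))        # end-of-stream flag
```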
### Data Sent from Server to Client
#### Sending Recognition Results
The result message is JSON-serialized:
```text
{"mode": "2pass-online", "wav_name": "wav_name", "text": "asr ouputs", "is_final": True, "timestamp":"[[100,200], [200,500]]"}
```
Parameter description:
```text
`mode`: inference mode of this result: `2pass-online` for real-time (partial) results, `2pass-offline` for second-pass corrected results
`wav_name`: name of the transcribed audio file
`text`: recognized text output
`is_final`: indicates that recognition has finished
`timestamp`: returned only if the AM is a timestamp model; format is "[[100,200], [200,500]]" (ms)
```
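A minimal sketch of consuming the result stream, assuming the `2pass` session from the sketches above: `2pass-online` messages carry incremental hypotheses, `2pass-offline` messages carry the corrected segment.

```python
# Continuation: read results until the final message arrives.
import json

async def read_streaming_results(ws):
    async for raw in ws:
        message = json.loads(raw)
        if message["mode"] == "2pass-online":
            print("partial:", message["text"])      # incremental real-time hypothesis
        elif message["mode"] == "2pass-offline":
            print("final  :", message["text"])      # second-pass corrected segment
        if message.get("is_final"):
            break
```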