Merge branch 'release/v1.2.0' into 'master'

Release/v1.2.0

See merge request speech-recognition-framework/esp-sr!24
Sun Xiang Yu 2023-03-08 16:00:31 +08:00
commit 018ed41024
5 changed files with 24 additions and 87 deletions


@ -1,13 +1,15 @@
# Change log for esp-sr
## Unreleased
## 1.2.0
- ESP-DSP dependency is now installed from the component registry
- Add an English MultiNet6 model which is trained by RNNT and CTC
- Add a Chinese MultiNet6 model which is trained by RNNT and CTC
- Fixed CMake errors when esp-sr was installed from component registry
- Fixed the list of supported chips displayed in the component registry
## 1.1.0
- Support esp32c3 for Chinese TTS
- Update document of ESP-SR


@ -47,14 +47,8 @@ Format of Speech Commands
Different MultiNet versions support different formats:
- Chinese
MultiNet5 and MultiNet6 use Pinyin for Chinese speech commands. Please use :project_file:`tool/multinet_pinyin.py` to convert Chinese characters to Pinyin.
- English
MultiNet5 use phonemes for English speech commands. Simplicity, we use chats to denote different phoneme.Please use :project_file:`tool/multinet_g2p.py` to do the convention.
MultiNet6 use grapheme for English speech commands. You do not need any convention.
- MultiNet5 uses phonemes for English speech commands. For simplicity, we use characters to denote different phonemes. Please use :project_file:`tool/multinet_g2p.py` to do the conversion.
- MultiNet6 uses graphemes for English speech commands. You do not need any conversion.
Suggestions on Customizing Speech Commands
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -63,19 +57,16 @@ When customizing speech command words, please pay attention to the following sug
.. list::
- The recommended length of Chinese speech commands is generally 4-6 Chinese characters. Commands that are too short lead to a high false recognition rate, and commands that are too long are hard for users to remember
:esp32s3: - The recommended length of English speech commands is generally 4-6 words
- Mixed Chinese and English is not supported in command words
- The command word cannot contain Arabic numerals and special characters
- Avoid common command words like "hello"
- The more the Chinese characters or words within a command differ in pronunciation, the better the recognition performance
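The suggestions above can be turned into a quick pre-check before a command list is flashed. The following is a minimal sketch, not part of ESP-SR: the function name ``check_command`` is hypothetical, and treating every character other than letters, CJK characters, digits, and spaces as "special" is an assumption made for illustration.

```python
import re

# Hypothetical helper, not an ESP-SR API: checks one candidate command
# against the customization suggestions listed above.
CJK = r"\u4e00-\u9fff"


def check_command(text):
    """Return a list of warnings for a candidate speech command."""
    warnings = []
    has_cjk = re.search(f"[{CJK}]", text) is not None
    has_latin = re.search(r"[A-Za-z]", text) is not None

    if has_cjk and has_latin:
        warnings.append("mixed Chinese and English")
    if re.search(r"[0-9]", text):
        warnings.append("contains Arabic numerals")
    # Assumption: anything besides letters, CJK, digits, and spaces is "special".
    if re.search(f"[^A-Za-z0-9{CJK} ]", text):
        warnings.append("contains special characters")

    # Chinese commands are counted in characters, everything else in words.
    if has_cjk and not has_latin:
        count, unit = len(re.findall(f"[{CJK}]", text)), "characters"
    else:
        count, unit = len(text.split()), "words"
    if not 4 <= count <= 6:
        warnings.append(f"length {count} {unit} is outside the recommended 4-6")
    return warnings
```

An empty list means the command passes every check. The length check only reflects the recommendation, so a shorter command is flagged but may still be usable.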
Speech Commands Customization Methods
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
MultiNet6 customize speech commands:
- For English, words are used as units. Please modify a text file :project_file:`model/multinet_model/fst/commands_en.txt` by the following format:
Customize Speech Commands for MultiNet6
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Words are used as units. Please modify the text file :project_file:`model/multinet_model/fst/commands_en.txt` using the following format:
::
@ -83,25 +74,11 @@ MultiNet6 customize speech commands:
1 TELL ME A JOKE
2 MAKE A COFFEE
- For Chinese, pinyin are used as units. Please modify a text file :project_file:`model/multinet_model/fst/commands_cn.txt` by the following format. :project_file:`tool/multinet_pinyin.py` help tp get Pinyin of Chinese.
::
Customize Speech Commands for MultiNet5
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# command_id command_sentence
1 da kai kong tiao
2 guan bi kong tiao
MultiNet5 supports flexible methods to customize speech commands. Users can do so either online or offline, and can also add, delete, or modify speech commands dynamically.
.. only:: latex
.. figure:: ../../_static/QR_multinet_g2p.png
:alt: menuconfig_add_speech_commands
Customize Speech Commands Offline
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
There are two methods for users to customize speech commands offline:
There are two methods to customize speech commands offline:
- Via ``menuconfig``
@ -114,7 +91,7 @@ There are two methods for users to customize speech commands offline:
Please note that a single ``Command ID`` can correspond to more than one command. For example, "da kai kong tiao" and "kai kong tiao" have the same meaning. Therefore, users can assign the same ``Command ID`` to these two commands and separate them with "," (no space required before or after).
1. Call the following API:
2. Call the following API:
::
@ -135,19 +112,12 @@ There are two methods for users to customize speech commands offline:
- Via modifying code
Users directly customize the speech commands in the code and pass these commands to the MultiNet. In the actual user scenarios, users can pass these commands via various interfaces including network / UART / SPI. For details, see the example described in ESP-Skainet.
Customize speech commands online
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
MultiNet allows users to add/delete/modify speech commands dynamically during the operation, without the need to change models or modifying parameters. For details, see the example described in ESP-Skainet.
For detailed description of APIs, please refer to :project_file:`src/esp_mn_speech_commands.c` .
Users customize the speech commands directly in the code and pass them to MultiNet. In real-world scenarios, users can pass these commands via various interfaces, including network / UART / SPI. For a detailed description of the APIs, please refer to :project_file:`src/esp_mn_speech_commands.c` and the examples in ESP-Skainet.
Use MultiNet
------------
MultiNet speech commands recognition must be used together with audio front-end (AFE) in ESP-SR (What's more, AFE must be used together with WakeNet). For details, see Section :doc:`AFE Introduction and Use <../audio_front_end/README>` .
We recommend using MultiNet together with the audio front-end (AFE) in ESP-SR. For details, see Section :doc:`AFE Introduction and Use <../audio_front_end/README>` .
After configuring AFE, users can follow the steps below to configure and run MultiNet.
@ -187,11 +157,6 @@ Users can start MultiNet after enabling AFE and WakeNet, but must pay attention
MultiNet Output
~~~~~~~~~~~~~~~
Speech commands recognition supports two basic modes:
* Single recognition
* Continuous recognition
Speech command recognition must be used with WakeNet. After wake-up, MultiNet detection can start.
While running, MultiNet returns the recognition state of the current frame, ``mn_state``, in real time. The following states are currently defined:
@ -228,13 +193,13 @@ Afer running, MultiNet returns the recognition output of the current frame in re
Users can use ``phrase_id[0]`` and ``prob[0]`` to get the recognition result with the highest probability.
- ESP_MN_STATE_TIMEOUT
- ESP_MN_STATE_TIMEOUT
Indicates that no speech command has been detected for a long time. MultiNet exits automatically and waits to be woken up again.
Therefore:
Single recognition mode and Continuous recognition mode:
* Single recognition mode: exit the speech recognition when the return status is ``ESP_MN_STATE_DETECTED``
* Continuous recognition: exit the speech recognition when the return status is ``ESP_MN_STATE_TIMEOUT``
* Continuous recognition mode: exit the speech recognition when the return status is ``ESP_MN_STATE_TIMEOUT``
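The difference between the two modes comes down to which returned state ends the detection loop. The following is a minimal simulation sketch in Python, not the ESP-SR C API: ``MnState`` is a hypothetical stand-in for the ``ESP_MN_STATE_*`` values, and ``run_recognition`` is an illustrative name.

```python
from enum import Enum


class MnState(Enum):
    # Hypothetical Python mirror of the ESP_MN_STATE_* values named in the text.
    DETECTING = 0
    DETECTED = 1
    TIMEOUT = 2


def run_recognition(frame_states, continuous):
    """Consume per-frame states; return how many commands were detected before exit."""
    detected = 0
    for state in frame_states:
        if state is MnState.DETECTED:
            detected += 1
            if not continuous:
                break  # single recognition mode: exit on the first detection
        elif state is MnState.TIMEOUT:
            break  # exit on timeout and wait for the next wake-up
    return detected
```

In this sketch a timeout ends the loop in both modes, while a detection ends it only in single recognition mode.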
Resource Occupancy
------------------


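@ -47,16 +47,8 @@ The input of MultiNet is audio processed by the audio front-end (AFE) algorithm
Different versions of MultiNet use different speech command formats. Speech commands must follow the specific formats below:
- Chinese
MultiNet5 and MultiNet6 use Pinyin as the basic recognition unit, with the Pinyin of each character separated by a space. For example, "打开空调" (turn on the air conditioner) should be written as "da kai kong tiao". Please use the following tool to convert Chinese characters to Pinyin: :project_file:`tool/multinet_pinyin.py`
- English
MultiNet5: uses phonemes as the basic recognition unit. For simplicity, each phoneme is mapped to a single letter. For example, "turn on the light" should be written as "TkN nN jc LiT". Please use the tool we provide for the conversion. For details, see :project_file:`tool/multinet_g2p.py`
MultiNet6: uses subwords as the recognition unit, so users can enter the desired phrase directly. For example, "turn on the light" is simply written as "turn on the light".
Customization Requirements
~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -96,17 +88,7 @@ MultiNet6 method for setting speech commands offline: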
1 da kai kong tiao
2 guan bi kong tiao
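- For English, modify :project_file:`model/multinet_model/fst/commands_en.txt`
The format is as follows: the first number is the command ID, followed by the English phrase of the command. The two are separated by a space, and words are also separated by spaces.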
::
# command_id command_sentence
1 TELL ME A JOKE
2 MAKE A COFFEE
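MultiNet5 supports two methods for setting speech commands offline:
MultiNet5 methods for setting speech commands offline:
- Via ``menuconfig``
@ -119,7 +101,7 @@ MultiNet5 supports two methods for setting speech commands offline:
Note that a single Command ID can support multiple phrases. For example, "打开空调" (turn on the air conditioner) and "开空调" (switch on the air conditioner) have the same meaning, so they can be written in the entry of the same Command ID, with adjacent phrases separated by the ASCII comma "," (no spaces are needed before or after the comma).
1. Call the following API in the code:
2. Call the following API in the code: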
::
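@ -140,19 +122,12 @@ MultiNet5 supports two methods for setting speech commands offline:
- Via modifying code
With this method, users write the speech commands directly in the code and pass them to MultiNet. In actual product development and use, users can pass the required speech commands through various interfaces such as network / UART / SPI, and change the speech commands at any time. For details, see the examples in ESP-Skainet.
Customize Speech Commands Online
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
MultiNet also supports setting speech commands dynamically online (add/delete/modify) while running, without changing the model or adjusting parameters. For details, see the examples in ESP-Skainet.
For a detailed description of the APIs, please refer to :project_file:`src/esp_mn_speech_commands.c`
With this method, users write the speech commands directly in the code and pass them to MultiNet. In actual product development and use, users can pass the required speech commands through various interfaces such as network / UART / SPI, and change the speech commands at any time. For a detailed description of the APIs, please refer to :project_file:`src/esp_mn_speech_commands.c` and the examples in ESP-Skainet.
Use MultiNet
------------
MultiNet speech command recognition must run together with the AFE acoustic algorithm module in ESP-SR (in addition, running AFE requires WakeNet to be enabled). For details, see :doc:`AFE Introduction and Use <../audio_front_end/README>` .
We recommend running MultiNet speech command recognition together with the AFE acoustic algorithm module in ESP-SR. For details, see :doc:`AFE Introduction and Use <../audio_front_end/README>` .
After configuring AFE, please follow the steps below to configure and run MultiNet.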
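@ -192,11 +167,6 @@ Run MultiNet
MultiNet Output
~~~~~~~~~~~~~~~
MultiNet speech command recognition supports two basic modes:
* Single recognition
* Continuous recognition
Speech command recognition must be used together with wake-up. After wake-up, speech command detection can run.
While running, the speech command model returns the recognition state of the current frame, ``mn_state``, in real time. The following recognition states are currently defined:
@ -237,7 +207,7 @@ MultiNet speech command recognition supports two basic modes:
This state indicates that no speech command has been detected for a long time. MultiNet exits automatically and waits for the next wake-up.
Therefore:
Single recognition mode and continuous recognition mode:
Exiting speech command recognition when the returned state is ``ESP_MN_STATE_DETECTED`` gives single recognition mode;
Exiting speech command recognition when the returned state is ``ESP_MN_STATE_TIMEOUT`` gives continuous recognition mode;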


@ -1,4 +1,4 @@
version: "1.1.0"
version: "1.2.0"
description: esp_sr provides basic algorithms for Speech Recognition applications
url: https://github.com/espressif/esp-sr
dependencies:


@ -10,7 +10,7 @@ For English, words are used as units. Please prepare a list of commands written
2 MAKE A COFFEE
```
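The list format above is simple enough to check with a few lines of script before building. The following is a minimal sketch, assuming that lines starting with `#` are comments and that each remaining line is `command_id` followed by the command sentence. The function name `parse_commands` is hypothetical and is not part of the esp-sr tools.

```python
def parse_commands(text):
    """Map each command_id to the list of command sentences assigned to it."""
    commands = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and lines starting with '#' carry no command.
        if not line or line.startswith("#"):
            continue
        # The first field is the numeric ID; the rest of the line is the sentence.
        command_id, sentence = line.split(maxsplit=1)
        commands.setdefault(int(command_id), []).append(sentence)
    return commands


example = """# command_id command_sentence
1 TELL ME A JOKE
2 MAKE A COFFEE
"""
commands = parse_commands(example)
```

Because sentences are collected per ID, a malformed line (a missing ID or an empty sentence) raises a `ValueError` early instead of producing a silently wrong list.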
For Chinese, pinyin are used as units. [multinet_pinyin.py](./multinet_pinyin.py) help tp get Pinyin of Chinese. Please prepare a list of commands written in a text file `commands_cn.txt` of the following format:
For Chinese, Pinyin is used as the unit. [multinet_pinyin.py](./multinet_pinyin.py) helps convert Chinese characters to Pinyin. Please prepare a list of commands in a text file `commands_cn.txt` in the following format:
```
# command_id command_sentence
1 da kai kong tiao