mirror of https://github.com/espressif/esp-sr.git

commit ff334691dc (parent ff9ffa5985)

    update libs and README
.gitignore (vendored): 2 changes

@@ -1,3 +1,3 @@
 build/
-sdkconfig.old
+sdkconfig
Makefile: 4 changes
@@ -3,8 +3,8 @@ PROJECT_NAME := esp_sr_public

 MODULE_PATH := $(abspath $(shell pwd))

 EXTRA_COMPONENT_DIRS += $(MODULE_PATH)/lib
-EXTRA_COMPONENT_DIRS += $(MODULE_PATH)/wake_words_engine
-EXTRA_COMPONENT_DIRS += $(MODULE_PATH)/speech_commands_recognition
+EXTRA_COMPONENT_DIRS += $(MODULE_PATH)/wake_word_engine
+EXTRA_COMPONENT_DIRS += $(MODULE_PATH)/speech_command_recognition

 include $(IDF_PATH)/make/project.mk
README.md: 10 changes
@@ -2,20 +2,20 @@

 Espressif esp_sr provides basic algorithms for **Speech Recognition** applications. Currently, this framework has two models:

-* The wake word detection model [WakeNet](wake_words_engine/README.md)
-* The speech command recognition model [MultiNet](speech_commands_recognition/README.md)
+* The wake word detection model [WakeNet](wake_word_engine/README.md)
+* The speech command recognition model [MultiNet](speech_command_recognition/README.md)

 These algorithms are provided in the form of a component, so they can be integrated into your projects with minimum effort.

 ## Wake Word Engine

-Espressif's wake word engine [WakeNet](wake_words_engine/README.md) is specially designed to provide a high-performance, low-memory-footprint wake word detection algorithm, which enables devices to continuously listen for wake words such as “Alexa”, “天猫精灵” (Tian Mao Jing Ling) and “小爱同学” (Xiao Ai Tong Xue).
+Espressif's wake word engine [WakeNet](wake_word_engine/README.md) is specially designed to provide a high-performance, low-memory-footprint wake word detection algorithm, which enables devices to continuously listen for wake words such as “Alexa”, “天猫精灵” (Tian Mao Jing Ling) and “小爱同学” (Xiao Ai Tong Xue).

-Currently, Espressif not only provides the official wake word "Hi, Lexin" to the public for free, but also supports customized wake words. For details on how to customize your own wake words, please see [Espressif Speech Wake Words Customization Process](wake_words_engine/ESP_Wake_Words_Customization.md).
+Currently, Espressif not only provides the official wake word "Hi, Lexin" to the public for free, but also supports customized wake words. For details on how to customize your own wake words, please see [Espressif Speech Wake Words Customization Process](wake_word_engine/ESP_Wake_Words_Customization.md).

 ## Speech Command Recognition

-Espressif's speech command recognition model [MultiNet](speech_commands_recognition/README.md) is specially designed to provide a flexible, offline speech command recognition model. With this model, you can easily add your own speech commands, eliminating the need to retrain the model.
+Espressif's speech command recognition model [MultiNet](speech_command_recognition/README.md) is specially designed to provide a flexible, offline speech command recognition model. With this model, you can easily add your own speech commands, eliminating the need to retrain the model.

 Currently, Espressif **MultiNet** supports up to 100 Chinese speech commands, such as “打开空调” (Turn on the air conditioner) and “打开卧室灯” (Turn on the bedroom light).
@@ -3,4 +3,4 @@
 #
 # (Uses default behaviour of compiling all source files in directory, adding 'include' to include path.)

-COMPONENT_DEPENDS := wake_words_engine
+COMPONENT_DEPENDS := wake_word_engine
@@ -24,7 +24,7 @@ Please see the flow diagram below:

 ### User-defined Command

-Currently, users can define their own speech commands in the `menuconfig`. You can refer to the method of adding speech commands in menuconfig->Component config > ESP Speech Recognition->Add speech commands, there are already 20 commands pre-stored in sdkconfig.
+Currently, users can define their own speech commands with `make menuconfig`: see `menuconfig -> Component config -> ESP Speech Recognition -> Add speech commands`. 20 commands are already pre-stored in sdkconfig.

 |Command ID|Command|Command ID|Command|Command ID|Command|Command ID|Command|
 |:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
@@ -40,11 +40,11 @@ MultiNet supports user-defined commands. You can add your own commands to MultiN…

 Users can define their own speech commands in the `menuconfig` in Pinyin. For example:

-the command of “打开空调”, which means turn on the air conditioner, should be provided to the blank as "dai kai kong tiao".
+the command “打开空调”, which means turn on the air conditioner, should be provided in the blank as "da kai kong tiao".

 - One speech command ID can correspond to multiple speech command phrases;
 - Up to 100 speech command IDs or speech command phrases, including customized commands, are supported;
-- The corresponding multiple phrases in an ID need to be used ',' separated.
+- The corresponding multiple phrases for one Command ID need to be separated by ',' (see the example below).
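For illustration, multiple phrases for one command ID look like this in sdkconfig. The `CONFIG_CN_SPEECH_COMMAND_ID*` symbol names are an assumption here, not taken from this diff; check the project's Kconfig for the exact names:

```
CONFIG_CN_SPEECH_COMMAND_ID0="da kai kong tiao"
CONFIG_CN_SPEECH_COMMAND_ID1="guan bi kong tiao,guan diao kong tiao"
```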
 ### Basic Configuration
@@ -91,11 +91,11 @@ Define the following two variables before using the command recognition model:

 **Parameter**

-model: The model object to query
+model: The model object to query.

 **Return**

-The amount of samples to feed the detection function
+The amount of samples to feed the detection function.

 - `typedef int (*esp_mn_iface_op_get_samp_chunknum_t)(model_iface_data_t *model);`
@@ -106,25 +106,25 @@ Define the following two variables before using the command recognition model:

 **Parameter**

-model: The model object to query
+model: The model object to query.

 **Return**

-The number of the frames recognized by the speech command
+The number of frames recognized by the speech command.

 - `typedef int (*esp_mn_iface_op_get_samp_rate_t)(model_iface_data_t *model);`

 **Definition**

-Get the sample rate of the samples to feed to the detection function
+Get the sample rate of the samples to feed to the detection function.

 **Parameter**

-model: The model object to query
+model: The model object to query.

 **Return**

-The sample rate, in Hz
+The sample rate, in Hz.

 - `typedef float* (*esp_mn_iface_op_detect_t)(model_iface_data_t *model, int16_t *samples);`
||||
@ -138,15 +138,15 @@ Define the following two variables before using the command recognition model:
|
||||
|
||||
**Return**
|
||||
|
||||
* The command id, if a matching command is found
|
||||
* -1, if no matching command is found
|
||||
* The command id, if a matching command is found.
|
||||
* -1, if no matching command is found.
|
||||
|
||||
- `typedef void (*esp_mn_iface_op_destroy_t)(model_iface_data_t *model);`
|
||||
|
||||
**Definition**
|
||||
|
||||
Destroy a voiceprint recognition model
|
||||
Destroy a voiceprint recognition model.
|
||||
|
||||
**Parameters**
|
||||
|
||||
model: Model object to destroy
|
||||
model: Model object to destroy.
|
||||
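Taken together, the ops documented above form the typical MultiNet call sequence: query the model's requirements, feed 16-bit audio to the detect op frame by frame, then destroy the model. Below is a minimal sketch of that flow; the `read_frame` audio source, the way the ops are obtained, and the result handling are hypothetical stand-ins for the real esp-sr API surface. `frame_len` would come from the chunk-size op documented just above (its typedef name falls outside this excerpt):

```
#include <stdint.h>
#include <stdlib.h>

/* Opaque model handle and the op typedefs, as documented above. */
typedef struct model_iface_data_t model_iface_data_t;
typedef int    (*esp_mn_iface_op_get_samp_chunknum_t)(model_iface_data_t *model);
typedef float *(*esp_mn_iface_op_detect_t)(model_iface_data_t *model, int16_t *samples);
typedef void   (*esp_mn_iface_op_destroy_t)(model_iface_data_t *model);

/* Hypothetical audio source: fills buf with n 16-bit, 16 kHz mono samples. */
extern void read_frame(int16_t *buf, int n);

/* One recognition pass: feed the model frame by frame, then clean up. */
void recognize_once(model_iface_data_t *model,
                    esp_mn_iface_op_get_samp_chunknum_t get_chunknum,
                    esp_mn_iface_op_detect_t detect,
                    esp_mn_iface_op_destroy_t destroy,
                    int frame_len)
{
    int chunks = get_chunknum(model);        /* frames to feed per pass */
    int16_t *buf = malloc(frame_len * sizeof(int16_t));
    for (int i = 0; i < chunks && buf; i++) {
        read_frame(buf, frame_len);
        float *res = detect(model, buf);     /* a command id or -1, per the
                                                Return notes above; exact
                                                decoding is model-specific */
        (void)res;
    }
    free(buf);
    destroy(model);                          /* release the model's resources */
}
```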
@@ -6,7 +6,6 @@

 char *get_id_name(int i)
 {
-    // char command_phrase[128];
     if (i == 0)
         return MN_SPEECH_COMMAND_ID0;
     else if (i == 1)
@@ -1,131 +0,0 @@

# Recognizing Speech Commands with ESP32-LyraT-Mini
Currently, Espressif's ESP32-based speech command recognition model [MultiNet](README.md) supports up to 100 Chinese speech commands (support for English speech commands will be added in the next release of [esp-sr](../README.md)).

This demo demonstrates the basic process of recognizing Chinese speech commands with ESP32-LyraT-Mini, as shown in the flow diagram below.



For more information about ESP32-LyraT-Mini, please see the [ESP32-LyraT-Mini Getting Started Guide]().
# 1. Quick Start

### 1.1 Basic Configuration

Run `make menuconfig`, and complete the following configuration:

- Basic hardware configuration

  Navigate to `Audio Media HAL`, and configure the following parameters as instructed.
  - `Audio hardware board`: select `ESP32-Lyrat Mini V1.1`;
  - `Audio codec chip`: select `CODEC IS ES8311`;
  - `use external adc`: select `use es7243`;
  - `Audio DSP chip`: select `No DSP chip`.



- Basic software configuration

  Navigate to `ESP32 Hotword Detection`, and configure the following parameters as instructed.
  - `Speech recognition audio source`: select `Live microphone on LyraT-board`;
  - `wake word model to use`: select `WakeNet 6 (quantized)`;
  - `wake word name`: select `hilexin (WakeNet6)`;
  - `LVCSR model to use`: select `MultiNet 1 (quantized)`;
  - `language`: select `chinese (MultiNet1)`.



Then save the configuration and exit.
### 1.2 Compiling and Running

Run `make flash monitor` to compile, flash and run this example, then check the output log:

```
...
I (126) MSC_DSP: CONFIG_CODEC_CHIP_IS_ES8311
wake word number = 1, word1 name = hilexin
-----------awaits to be waken up-----------
```
### 1.3 Waking up the Board

Find the pre-defined wake word of the board in the printed log. In this example, the wake word is “Hi Lexin” ([Ləsɪ:n]).

Then say “Hi Lexin” to wake up the board, which prints the following log:

```
hilexin DETECTED.
-----------LISTENING-----------
```
### 1.4 Recognizing Speech Commands

The board then enters the Listening status, waiting for new speech commands.

Currently, the MultiNet model already defines 20 speech commands, which can be seen in [MultiNet](README.md).

Now, you can give a speech command, for example, “打开空调” (turn on the air conditioner):

* If this command exists in the supported speech command list, the board prints the command id of this command in its log:

```
-----------LISTENING-----------
phrase:d a k ai k ong ti ao, prob:0.423639
command_id:0
--------------END--------------
```

* If this command does not exist in the supported speech command list, the board prints the error message "cannot recognize any speech commands" in its log:

```
-----------LISTENING-----------
cannot recognize any speech commands
--------------END--------------
```

Also, the board prints `--------------END--------------` when it ends the current recognition cycle and re-enters the Waiting-for-Wakeup status.

**Notice:**

The board can only stay in the Listening status for up to six seconds. After that, it ends the current recognition cycle and re-enters the Waiting-for-Wakeup status. Therefore, you must give speech commands within six seconds after the board wakes up.
### 1.5 Adding Customized Speech Commands

Now, the MultiNet model supports 20 pre-defined speech commands, and also allows adding customized speech commands through an easy-to-use `add_speech_commands` API.

Note that you should use Mandarin syllables when creating your speech commands, and each syllable should be provided to the API in the form of **one Type A element** and **one Type B element**, which can be seen below:

* Type A elements: `b bi c ch chu cu d di du f g gu h hu j ji ju k ku l li lu m mi n ni nu p pi q qi qu r ru s sh shu su t ti tu w x xi xu y yu z zhu zu`

* Type B elements: `a ai an ang ao e ei en eng er i ie in ing iu o ong ou u ue ui un v ve`

For example, the Type A and Type B elements for "tiao" are "ti" and "ao", so the syllable "tiao" should be provided to the API as "ti ao". Similarly, the command "da kai kong tiao", which means turn on the air conditioner, should be provided to the API as "d a k ai k ong ti ao".

For details on how to use the `add_speech_commands` API, please see [MultiNet](./README.md).
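As a quick illustration of this encoding, here are two command phrases written as Type A + Type B elements. The constant names are hypothetical and the second phrase reuses the “打开卧室灯” (turn on the bedroom light) example from the top-level README:

```
/* Command phrases encoded as Type A + Type B elements, as described above. */
const char *CMD_TURN_ON_AC   = "d a k ai k ong ti ao";    /* 打开空调: da kai kong tiao */
const char *CMD_TURN_ON_LAMP = "d a k ai w o sh i d eng"; /* 打开卧室灯: da kai wo shi deng */
```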
# 2. Workflow Walkthrough

### 2.1 Hardware Initialization

You don't need any special-purpose boards to run the **WakeNet** and **MultiNet** examples. Currently, Espressif has launched several audio boards; one of them is ESP32-LyraT-Mini, which is what we use in this example.

For details on the initialization of the ESP32-LyraT-Mini board, please see the code in `components/hardware_driver`.

If you want to use development boards other than ESP32-LyraT-Mini, please go to [esp-adf](https://github.com/espressif/esp-adf), Espressif's development framework for building audio applications based on ESP32 products, for more detailed information on hardware drivers.
### 2.2 Wake-up by Wake Word

After start-up, the board enters the Waiting-for-Wakeup status, during which it picks up audio data with the on-board microphone and feeds it to the **WakeNet** model frame by frame (30 ms, 16 kHz, 16 bit, mono).

Currently, you cannot customize the wake word yourself; please contact us for such requests.
### 2.3 Recognizing Speech Commands

During recognition, the board feeds data frame by frame (30 ms, 16 kHz, 16 bit, mono) to the **MultiNet** model for six seconds. The model then compares the received speech command against the pre-defined commands in the list, and returns the command id or an error message depending on the recognition result.

Please see Section 1.5 on how to customize your speech commands.
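The two stages above amount to a simple state machine: stream frames into **WakeNet** until the wake word fires, then stream frames into **MultiNet** for the six-second listening window. A rough sketch of that loop follows; the `wakenet_detect`, `multinet_detect` and `mic_read` helpers are hypothetical stand-ins for the real esp-sr and driver calls:

```
#include <stdbool.h>
#include <stdint.h>

#define FRAME_SAMPLES (16000 / 1000 * 30)  /* 30 ms of 16 kHz, 16-bit mono */
#define LISTEN_FRAMES (6000 / 30)          /* six-second listening window  */

/* Hypothetical wrappers around the WakeNet/MultiNet detect ops. */
extern bool wakenet_detect(const int16_t *frame);  /* true once the wake word fires */
extern int  multinet_detect(const int16_t *frame); /* command id, or -1 if none     */
extern void mic_read(int16_t *frame, int n);       /* on-board microphone input     */

void speech_loop(void)
{
    int16_t frame[FRAME_SAMPLES];
    for (;;) {
        /* Waiting-for-Wakeup: stream frames into WakeNet. */
        do {
            mic_read(frame, FRAME_SAMPLES);
        } while (!wakenet_detect(frame));

        /* Listening: give MultiNet up to six seconds of frames. */
        for (int i = 0; i < LISTEN_FRAMES; i++) {
            mic_read(frame, FRAME_SAMPLES);
            int id = multinet_detect(frame);
            if (id >= 0)
                break;                     /* matched: handle id, end this cycle */
        }
        /* recognition cycle ends; back to Waiting-for-Wakeup */
    }
}
```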