mirror of
https://github.com/espressif/esp-sr.git
synced 2025-09-15 15:28:44 +08:00
bugfix(doc): modify some incorrent link
This commit is contained in:
parent 6b917e383b
commit ac5bd8a6c4
README.md (18 lines changed)
@@ -1,30 +1,30 @@
 # esp_sr

-Espressif `esp_sr` provides basic algorithms for **Speech Recognition** applications. Now, this framework has four modules:
+Espressif esp_sr provides basic algorithms for **Speech Recognition** applications. Now, this framework has four modules:

-* The wake word detection model [WakeNet](doc/wake_word_engine/README.md)
-* The speech command recognition model [MultiNet](doc/speech_command_recognition/README.md)
-* Audio Front-End [AFE](doc/audio_front_end/README.md)
-* The text-to-speech model [esp-tts](doc/audio_front_end/README.md)
+* The wake word detection model [WakeNet](docs/wake_word_engine/README.md)
+* The speech command recognition model [MultiNet](docs/speech_command_recognition/README.md)
+* Audio Front-End [AFE](docs/audio_front_end/README.md)
+* The text-to-speech model [esp-tts](esp-tts/README.md)

 These algorithms are provided in the form of a component, so they can be integrated into your projects with minimal effort.

 ## Wake Word Engine

-Espressif's wake word engine [WakeNet](doc/wake_word_engine/README.md) is specially designed to provide a high-performance, low-memory-footprint wake word detection algorithm, which enables devices to always listen for wake words such as “Alexa”, “天猫精灵” (Tian Mao Jing Ling), and “小爱同学” (Xiao Ai Tong Xue).
+Espressif's wake word engine [WakeNet](docs/wake_word_engine/README.md) is specially designed to provide a high-performance, low-memory-footprint wake word detection algorithm, which enables devices to always listen for wake words such as “Alexa”, “天猫精灵” (Tian Mao Jing Ling), and “小爱同学” (Xiao Ai Tong Xue).

-Currently, Espressif not only provides the official wake words "Hi, Lexin" and "Hi, ESP" to the public for free, but also allows customized wake words. For details on how to customize your own wake words, please see [Espressif Speech Wake Words Customization Process](wake_word_engine/ESP_Wake_Words_Customization.md).
+Currently, Espressif not only provides the official wake words "Hi, Lexin" and "Hi, ESP" to the public for free, but also allows customized wake words. For details on how to customize your own wake words, please see [Espressif Speech Wake Words Customization Process](docs/wake_word_engine/ESP_Wake_Words_Customization.md).

 ## Speech Command Recognition

-Espressif's speech command recognition model [MultiNet](doc/speech_command_recognition/README.md) is specially designed to provide a flexible, offline speech command recognition model. With this model, you can easily add your own speech commands, eliminating the need to train the model again.
+Espressif's speech command recognition model [MultiNet](docs/speech_command_recognition/README.md) is specially designed to provide a flexible, offline speech command recognition model. With this model, you can easily add your own speech commands, eliminating the need to train the model again.

 Currently, Espressif **MultiNet** supports up to 200 Chinese or English speech commands, such as “打开空调” (Turn on the air conditioner) and “打开卧室灯” (Turn on the bedroom light).

 ## Audio Front End

-Espressif Audio Front-End [AFE](doc/audio_front_end/README.md) integrates AEC (Acoustic Echo Cancellation), VAD (Voice Activity Detection), MASE (Mic Array Speech Enhancement), and NS (Noise Suppression).
+Espressif Audio Front-End [AFE](docs/audio_front_end/README.md) integrates AEC (Acoustic Echo Cancellation), VAD (Voice Activity Detection), MASE (Mic Array Speech Enhancement), and NS (Noise Suppression).

 Our two-mic Audio Front-End (AFE) has been qualified as a “Software Audio Front-End Solution” for [Amazon Alexa Built-in devices](https://developer.amazon.com/en-US/alexa/solution-providers/dev-kits#software-audio-front-end-dev-kits).
Binary file not shown. (Before: 31 KiB, after: 25 KiB)

Binary file not shown. (Before: 46 KiB, after: 43 KiB)
@@ -1,4 +1,4 @@
-# MultiNet Introduction [[中文]](./README_cn.md)
+# MultiNet Introduction

 MultiNet is a lightweight model specially designed based on [CRNN](https://arxiv.org/pdf/1703.05390.pdf) and [CTC](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.75.6306&rep=rep1&type=pdf) for the implementation of multi-command recognition. Now, up to 200 speech commands, including customized commands, are supported.
@@ -39,18 +39,23 @@ Define the following two variables before using the command recognition model:

 ### Modify Speech Commands

 For Chinese MultiNet, we use Pinyin without tone as the unit.
 For English MultiNet, we use the International Phonetic Alphabet as the unit. [multinet_g2p.py](../../tool/multinet_g2p.py) is used to convert an English phrase into phonemes that can be recognized by MultiNet.
 Now, MultiNet supports two methods to modify speech commands.
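As a rough illustration of what the grapheme-to-phoneme step does before a phrase reaches MultiNet: `multinet_g2p.py`'s actual interface and phone set are not shown in this document, so the lexicon and phone symbols below are made-up stand-ins, not the real tool's output.

```python
# Toy grapheme-to-phoneme conversion sketching the idea behind
# multinet_g2p.py. The real tool emits International Phonetic Alphabet
# symbols; this lexicon and its phones are invented for illustration.
TOY_LEXICON = {
    "turn": "t er n",
    "on": "aa n",
    "the": "dh ax",
    "light": "l ay t",
}

def phrase_to_phonemes(phrase: str) -> str:
    """Map each word through the lexicon; unknown words raise KeyError."""
    return " ".join(TOY_LEXICON[w] for w in phrase.lower().split())

print(phrase_to_phonemes("Turn on the light"))
```

A real G2P tool also handles out-of-vocabulary words (e.g. by letter-to-sound rules), which this dictionary lookup deliberately omits.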
-##### 1. menuconfig (before compilation)
+- 1. menuconfig (before compilation)

 Users can define their own speech commands by `idf.py menuconfig -> ESP Speech Recognition -> add speech commands`.

 Chinese predefined commands:

 

 English predefined commands:

 

-##### 2. reset API
+- 2. reset API (after compilation)

 Users can also modify speech commands in the code.

@@ -69,7 +74,7 @@ multinet->reset(model_data, en_commands_en, err_id);

 - One speech command ID can correspond to multiple speech command phrases;
 - Up to 200 speech command IDs or speech command phrases, including customized commands, are supported;
-- Different Command IDs need to be separated by by ';'. The corresponding multiple phrases for one Command ID need to be separated by ','.
+- Different Command IDs need to be separated by ';'. The corresponding multiple phrases for one Command ID need to be separated by ','.
 - `err_id` returns the spellings that do not meet the requirements.
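The separator rules above can be sketched as a small parser. This is an illustration only: the `"id phrase,phrase"` pairing and overall string layout are assumptions made for the sketch, not the exact format consumed by MultiNet's `reset` call.

```python
# Illustrative parser for the command-string convention described above:
# command entries are separated by ';', and the multiple phrases belonging
# to one command ID are separated by ','. The "ID followed by phrases"
# layout is assumed for this sketch, not taken from the esp-sr API.

def parse_commands(spec: str) -> dict[int, list[str]]:
    """Parse '1 phrase,phrase;2 phrase' into {command_id: [phrases]}."""
    commands: dict[int, list[str]] = {}
    for entry in spec.split(";"):          # ';' separates command IDs
        entry = entry.strip()
        if not entry:
            continue
        id_str, _, phrases = entry.partition(" ")
        commands[int(id_str)] = [p.strip() for p in phrases.split(",") if p.strip()]
    return commands

# Pinyin-style phrases, as used by the Chinese MultiNet described above:
cmds = parse_commands("1 da kai kong tiao;2 guan bi kong tiao,guan diao kong tiao")
```

Here ID 2 maps to two alternative phrases, matching the rule that one Command ID may correspond to multiple speech command phrases.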
### API Reference
@@ -1,4 +1,4 @@
-# WakeNet [[中文]](./README_cn.md)
+# WakeNet

 WakeNet, a wake word engine built on a neural network, is specially designed for low-power embedded MCUs. Now, the WakeNet model supports up to 5 wake words.
@@ -80,6 +80,8 @@ Please see the flow diagram of WakeNet below:

 |Quantised WakeNet5X3|371 K|24 KB|18 ms|30 ms|

 ### 2. Resource Occupancy (ESP32S3)
 |Model Type|Parameter Num|RAM|Average Running Time per Frame|Frame Length|
 |:---:|:---:|:---:|:---:|:---:|
+|Quantised WakeNet7_2CH|810 K|45 KB|10 ms|32 ms|
+|Quantised WakeNet8_2CH|821 K|50 KB|10 ms|32 ms|
@@ -88,11 +90,11 @@ Please see the flow diagram of WakeNet below:

 |Distance|Quiet|Stationary Noise (SNR = 4 dB)|Speech Noise (SNR = 4 dB)|AEC Interruption (-10 dB)|
 |:---:|:---:|:---:|:---:|:---:|
-|1 m|98%|96%|95%|95%|
-|3 m|98%|95%|94%|94%|
+|1 m|98%|96%|94%|96%|
+|3 m|98%|94%|92%|94%|

 False triggering rate: 1 time in 12 hours

 **Note**: We use the ESP32-S3-Korvo V4.0 development board and the WakeNet8 (Alexa) model in our test.

 ## Wake Word Customization