update README

2025-09-15 15:28:44 +08:00 · 2025-05-06 19:55:49 +08:00 · 2025-05-06 19:55:49 +08:00 · cf2d1c6443
commit cf2d1c6443
parent d42180b411
23 changed files with 25 additions and 14 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,9 @@
 # Change log for esp-sr

+## 2.1.1
+- Add 8 kHz AEC for VoIP
+- Add more wakenet9 models
+
 ## 2.1.0
 - esp32c3 supports wakenet9s and aec
 - esp32c5 supports wakenet9s and aec
--- a/README.md
+++ b/README.md
@@ -3,7 +3,7 @@
 [![Documentation Status](./docs/_static/sr_doc_latest.svg)](https://docs.espressif.com/projects/esp-sr/en/latest/esp32s3/index.html)
 [![Component Registry](https://components.espressif.com/components/espressif/esp-sr/badge.svg)](https://components.espressif.com/components/espressif/esp-sr)

-Espressif [ESP-SR](https://github.com/espressif/esp-sr) helps users build AI speech solutions based on ESP32-S3 or ESP32-P4 chips.
+Espressif [ESP-SR](https://github.com/espressif/esp-sr) helps users build AI speech solutions.

 Overview
 --------
@@ -18,18 +18,19 @@ ESP-SR framework includes the following modules:

 These algorithms are provided in the form of a component, so they can be integrated into your projects with minimum effort.

-ESP32-S3/ESP32-P4 are recommended, which support AI instructions and larger, high-speed octal SPI PSRAM.
-The new algorithms will no longer support ESP32 chips.

 News
 ----
-[21/4/2025]: We add a new model WakeNet9s, which can run on chips that do not have PSRAM and do not support SIMD, such as ESP32C3 and ESP32C5.     
+[21/4/2025]: We add a new model WakeNet9s, which can run on chips that do not have PSRAM and do not support SIMD, such as ESP32C3 and ESP32C5. [examples](https://github.com/espressif/esp-skainet/tree/master/examples/wake_word_detection)  
 [17/4/2025]: We add a new DOA (Direction of Arrival) algorithm.  
 [14/2/2025]: We release **ESP-SR V2.0**. [Migration from ESP-SR V1.* to ESP-SR V2.*](https://docs.espressif.com/projects/esp-sr/en/latest/esp32s3/audio_front_end/migration_guide.html)   
 [13/2/2025]: We release **VADNet**, a voice activity detection model. You can use it in place of the WebRTC VAD to improve performance.

 ## Wake Word Engine

+| Supported Targets | ESP32    | ESP32-S2 | ESP32-S3 | ESP32-P4 | ESP32-C3 | ESP32-C5 | ESP32-C6 | 
+| ----------------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
+
 Espressif's wake word engine **WakeNet** is designed to provide high-performance, low-memory-footprint wake word detection, enabling devices to listen continuously for wake words such as “Alexa”, “Hi,lexin” and “Hi,ESP”. Both WakeNet9 and WakeNet9s models are supported. WakeNet9s is a cost-down version of WakeNet9, with fewer parameters and lower computational requirements. 

 Espressif offers two ways to customize the wake word. Please refer to the following document to choose the one that meets your needs:   
@@ -80,6 +81,9 @@ The following wake words are supported in esp-sr:

 ## Speech Command Recognition

+| Supported Targets | ESP32    | ESP32-S3 | ESP32-P4 | 
+| ----------------- | -------- | -------- | -------- |
+
 Espressif's speech command recognition model **MultiNet** is designed for flexible offline speech command recognition. You can add your own speech commands without retraining the model. 

 Currently, Espressif **MultiNet** supports up to 300 Chinese or English speech commands, such as “打开空调” (Turn on the air conditioner) and “打开卧室灯” (Turn on the bedroom light).
@@ -93,6 +97,9 @@ The following MultiNet models are supported in esp-sr:

 ## Audio Front End

+| Supported Targets | ESP32    | ESP32-S3 | ESP32-P4 | 
+| ----------------- | -------- | -------- | -------- |
+
 Espressif Audio Front-End **AFE** integrates AEC (Acoustic Echo Cancellation), VAD (Voice Activity Detection), BSS (Blind Source Separation), NS (Noise Suppression), NSNET (deep noise suppression) and other functions. It is designed to be used with the ESP-SR library.

 Our two-mic Audio Front-End (AFE) has been qualified as a “Software Audio Front-End Solution” for [Amazon Alexa Built-in devices](https://developer.amazon.com/en-US/alexa/solution-providers/alexa-connect-kit).
--- a/idf_component.yml
+++ b/idf_component.yml
@@ -1,4 +1,4 @@
-version: "2.1.0"
+version: "2.1.1"
 description: esp_sr provides basic algorithms for Speech Recognition applications
 url: https://github.com/espressif/esp-sr
 dependencies:
--- a/include/esp32/esp_afe_aec.h
+++ b/include/esp32/esp_afe_aec.h
@@ -40,11 +40,10 @@ typedef struct {
 * esp32c5.
 * @param type             The type of afe, AFE_TYPE_SR or AFE_TYPE_VC
 * @param mode             The mode of afe, AFE_MODE_LOW_COST or AFE_MODE_HIGH_PERF
- * @param sample_rate      The sample rate of input data
 *
 * @return afe_aec_handle_t*  The handle of the created AEC instance
 */
-afe_aec_handle_t *afe_aec_create(const char *input_format, int filter_length, afe_type_t type, afe_mode_t mode, int sample_rate);
+afe_aec_handle_t *afe_aec_create(const char *input_format, int filter_length, afe_type_t type, afe_mode_t mode);

 /**
 * @brief Performs echo cancellation on a frame, based on the audio sent to the speaker and the frame from the mic.
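This commit removes the `sample_rate` parameter from `afe_aec_create()`; the input rate is now carried by `afe_type_t` (16 kHz for `AFE_TYPE_VC`, 8 kHz for the new `AFE_TYPE_VC_8K`). Code that used to pass a sample rate therefore has to choose the type instead. Below is a minimal sketch of that migration step, assuming only the enum values shown in this diff. `vc_type_from_sample_rate()` is a hypothetical helper, not part of ESP-SR, and the enum is mirrored locally so the snippet compiles without the ESP-SR headers.

```c
#include <stddef.h>

/* Local mirror of afe_type_t as shown in esp_afe_config.h after this commit.
 * In a real project, include "esp_afe_config.h" instead of redefining it. */
typedef enum {
    AFE_TYPE_SR = 0,    /* speech recognition */
    AFE_TYPE_VC = 1,    /* voice communication, 16 kHz input */
    AFE_TYPE_VC_8K = 2, /* voice communication, 8 kHz input */
} afe_type_t;

/* Hypothetical helper (not an ESP-SR API): maps the sample rate that used to be
 * the last argument of afe_aec_create() to the afe_type_t that now encodes it.
 * Returns 0 on success, -1 if out_type is NULL or the rate has no VC type. */
static int vc_type_from_sample_rate(int sample_rate, afe_type_t *out_type)
{
    if (out_type == NULL) {
        return -1;
    }
    switch (sample_rate) {
    case 16000:
        *out_type = AFE_TYPE_VC;    /* 16 kHz voice communication */
        return 0;
    case 8000:
        *out_type = AFE_TYPE_VC_8K; /* 8 kHz voice communication (VoIP) */
        return 0;
    default:
        return -1;                  /* not covered by this sketch */
    }
}
```

Under this sketch, a call that previously ended in `..., mode, 8000)` would instead pass `AFE_TYPE_VC_8K` as `type` and drop the trailing rate. The diff does not say which rates `AFE_TYPE_SR` accepts, so the helper deliberately covers only the two VC types.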
--- a/include/esp32/esp_afe_config.h
+++ b/include/esp32/esp_afe_config.h
@@ -33,7 +33,8 @@ typedef enum {
 // Set AFE type
 typedef enum {
    AFE_TYPE_SR = 0, // Speech recognition scenarios, excluding nonlinear noise suppression
-    AFE_TYPE_VC = 1, // Voice communication scenarios, including nonlinear noise suppression
+    AFE_TYPE_VC = 1, // Voice communication scenarios, 16KHz input, including nonlinear noise suppression
+    AFE_TYPE_VC_8K = 2, // Voice communication scenarios, 8KHz input, note that the input data must be 8KHz
 } afe_type_t;

 typedef enum {
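The new header comment says `AFE_TYPE_VC_8K` input must be 8 kHz, so a frame of a given duration holds half as many samples as it would at 16 kHz. The following minimal sketch sizes a frame from the type, assuming a caller-chosen frame duration. `afe_type_input_rate()` and `samples_per_frame()` are hypothetical helpers, the 16 kHz figure for `AFE_TYPE_SR` is an assumption (this diff does not state it), and the frame length the library actually consumes may differ, so check the ESP-SR AEC documentation for the real chunk size.

```c
#include <stddef.h>

/* Local mirror of afe_type_t from this commit (use the real header in practice). */
typedef enum {
    AFE_TYPE_SR = 0,
    AFE_TYPE_VC = 1,
    AFE_TYPE_VC_8K = 2,
} afe_type_t;

/* Hypothetical helper: input sample rate implied by each afe_type_t.
 * VC and VC_8K follow the header comments; SR at 16 kHz is an assumption.
 * Returns 0 for an unknown type. */
static int afe_type_input_rate(afe_type_t type)
{
    switch (type) {
    case AFE_TYPE_SR:    return 16000; /* assumption, not stated in the diff */
    case AFE_TYPE_VC:    return 16000;
    case AFE_TYPE_VC_8K: return 8000;
    default:             return 0;
    }
}

/* Hypothetical helper: samples per channel in a frame of frame_ms milliseconds.
 * Returns 0 for an unknown type or a non-positive duration. */
static size_t samples_per_frame(afe_type_t type, int frame_ms)
{
    int rate = afe_type_input_rate(type);
    if (rate <= 0 || frame_ms <= 0) {
        return 0;
    }
    return (size_t)rate * (size_t)frame_ms / 1000u;
}
```

With an illustrative 32 ms frame, `samples_per_frame(AFE_TYPE_VC, 32)` gives 512 samples per channel and `samples_per_frame(AFE_TYPE_VC_8K, 32)` gives 256.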
--- a/include/esp32p4/esp_afe_aec.h
+++ b/include/esp32p4/esp_afe_aec.h
@@ -40,11 +40,10 @@ typedef struct {
 * esp32c5.
 * @param type             The type of afe, AFE_TYPE_SR or AFE_TYPE_VC
 * @param mode             The mode of afe, AFE_MODE_LOW_COST or AFE_MODE_HIGH_PERF
- * @param sample_rate      The sample rate of input data
 *
 * @return afe_aec_handle_t*  The handle of the created AEC instance
 */
-afe_aec_handle_t *afe_aec_create(const char *input_format, int filter_length, afe_type_t type, afe_mode_t mode, int sample_rate);
+afe_aec_handle_t *afe_aec_create(const char *input_format, int filter_length, afe_type_t type, afe_mode_t mode);

 /**
 * @brief Performs echo cancellation on a frame, based on the audio sent to the speaker and the frame from the mic.
--- a/include/esp32p4/esp_afe_config.h
+++ b/include/esp32p4/esp_afe_config.h
@@ -33,7 +33,8 @@ typedef enum {
 // Set AFE type
 typedef enum {
    AFE_TYPE_SR = 0, // Speech recognition scenarios, excluding nonlinear noise suppression
-    AFE_TYPE_VC = 1, // Voice communication scenarios, including nonlinear noise suppression
+    AFE_TYPE_VC = 1, // Voice communication scenarios, 16KHz input, including nonlinear noise suppression
+    AFE_TYPE_VC_8K = 2, // Voice communication scenarios, 8KHz input, note that the input data must be 8KHz
 } afe_type_t;

 typedef enum {
--- a/include/esp32s3/esp_afe_aec.h
+++ b/include/esp32s3/esp_afe_aec.h
@@ -40,11 +40,10 @@ typedef struct {
 * esp32c5.
 * @param type             The type of afe, AFE_TYPE_SR or AFE_TYPE_VC
 * @param mode             The mode of afe, AFE_MODE_LOW_COST or AFE_MODE_HIGH_PERF
- * @param sample_rate      The sample rate of input data
 *
 * @return afe_aec_handle_t*  The handle of the created AEC instance
 */
-afe_aec_handle_t *afe_aec_create(const char *input_format, int filter_length, afe_type_t type, afe_mode_t mode, int sample_rate);
+afe_aec_handle_t *afe_aec_create(const char *input_format, int filter_length, afe_type_t type, afe_mode_t mode);

 /**
 * @brief Performs echo cancellation on a frame, based on the audio sent to the speaker and the frame from the mic.
--- a/include/esp32s3/esp_afe_config.h
+++ b/include/esp32s3/esp_afe_config.h
@@ -33,7 +33,8 @@ typedef enum {
 // Set AFE type
 typedef enum {
    AFE_TYPE_SR = 0, // Speech recognition scenarios, excluding nonlinear noise suppression
-    AFE_TYPE_VC = 1, // Voice communication scenarios, including nonlinear noise suppression
+    AFE_TYPE_VC = 1, // Voice communication scenarios, 16KHz input, including nonlinear noise suppression
+    AFE_TYPE_VC_8K = 2, // Voice communication scenarios, 8KHz input, note that the input data must be 8KHz
 } afe_type_t;

 typedef enum {
--- a/lib/esp32/libesp_audio_front_end.a
+++ b/lib/esp32/libesp_audio_front_end.a
--- a/lib/esp32/libesp_audio_processor.a
+++ b/lib/esp32/libesp_audio_processor.a
--- a/lib/esp32/libmultinet.a
+++ b/lib/esp32/libmultinet.a
--- a/lib/esp32/libwakenet.a
+++ b/lib/esp32/libwakenet.a
--- a/lib/esp32p4/libesp_audio_front_end.a
+++ b/lib/esp32p4/libesp_audio_front_end.a
--- a/lib/esp32p4/libesp_audio_processor.a
+++ b/lib/esp32p4/libesp_audio_processor.a
--- a/lib/esp32p4/libmultinet.a
+++ b/lib/esp32p4/libmultinet.a
--- a/lib/esp32p4/libvadnet.a
+++ b/lib/esp32p4/libvadnet.a
--- a/lib/esp32p4/libwakenet.a
+++ b/lib/esp32p4/libwakenet.a
--- a/lib/esp32s3/libesp_audio_front_end.a
+++ b/lib/esp32s3/libesp_audio_front_end.a
--- a/lib/esp32s3/libmultinet.a
+++ b/lib/esp32s3/libmultinet.a
--- a/lib/esp32s3/libnsnet.a
+++ b/lib/esp32s3/libnsnet.a
--- a/lib/esp32s3/libvadnet.a
+++ b/lib/esp32s3/libvadnet.a
--- a/lib/esp32s3/libwakenet.a
+++ b/lib/esp32s3/libwakenet.a