Merge branch 'rename_webrtc' into 'master'

Rename webrtc

See merge request speech-recognition-framework/esp-sr!141
Sun Xiang Yu, 2025-02-18 17:28:58 +08:00
Commit 1c0503e7e2
32 changed files with 24,411 additions and 538 deletions

docs/_static/kconfig.png: new binary file, 28 KiB (content not shown)

@@ -1,216 +1,80 @@

Model Selection and Loading
===========================

:link_to_translation:`zh_CN:[中文]`

This document explains how to select and load models for ESP-SR.

Model Selection
---------------

ESP-SR allows you to choose the required models through the ``menuconfig`` interface. To configure models:

1. Run ``idf.py menuconfig``.
2. Navigate to **ESP Speech Recognition**.
3. Configure the following options:

   - **Noise Suppression Model**
   - **VAD Model**
   - **WakeNet Model**
   - **MultiNet Model**

.. figure:: ../../_static/kconfig.png
    :alt: kconfig

Updating the Partition Table
----------------------------

Add the following line to your project's ``partitions.csv`` file to allocate space for the models:

.. code-block::

    model, data, , , 6000K

- Replace ``6000K`` with a partition size suited to the selected models.
- ``model`` is the partition label (fixed value).

Model Loading
-------------

ESP-IDF Framework
~~~~~~~~~~~~~~~~~

ESP-SR automatically handles model loading through its CMake scripts:

1. Flash the device with all components: ``idf.py flash``. This command automatically flashes the selected models.
2. For code debugging (without re-flashing the models): ``idf.py app-flash``.

.. note::

    The model loading script is defined in ``esp-sr/CMakeLists.txt``. Models are flashed to the partition labeled ``model`` during the initial flashing.

Arduino Framework
~~~~~~~~~~~~~~~~~

To manually generate and load models:

1. Use the provided Python script to generate ``srmodels.bin``:

   .. code-block:: bash

       python {esp-sr_path}/movemodel.py -d1 {sdkconfig_path} -d2 {esp-sr_path} -d3 {build_path}

   **Parameters:**

   - ``esp-sr_path``: path to your ESP-SR component directory
   - ``sdkconfig_path``: path to the project's ``sdkconfig`` file
   - ``build_path``: the project's build directory (typically ``your_project_path/build``)

2. The generated ``srmodels.bin`` is located at ``{build_path}/srmodels/srmodels.bin``.
3. Flash the generated binary to your device.

Model Initialization and Usage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    //
    // step 1: return the models in flash or on the SD card
    //
    char *model_path = your_model_path; // partition_label, or the model path on the SD card
    srmodel_list_t *models = esp_srmodel_init(model_path);

    //
    // step 2: select a specific model by keywords
    //
    char *wn_name = esp_srmodel_filter(models, ESP_WN_PREFIX, NULL);              // select a WakeNet model
    char *mn_name = esp_srmodel_filter(models, ESP_MN_PREFIX, NULL);              // select a MultiNet model
    char *alexa_wn_name = esp_srmodel_filter(models, ESP_WN_PREFIX, "alexa");     // select the WakeNet with the "alexa" wake word
    char *en_mn_name = esp_srmodel_filter(models, ESP_MN_PREFIX, ESP_MN_ENGLISH); // select an English MultiNet model
    char *cn_mn_name = esp_srmodel_filter(models, ESP_MN_PREFIX, ESP_MN_CHINESE); // select a Chinese MultiNet model

    // Using the model name directly in your code also works.
    char *my_wn_name = "wn9_hilexin";
    // We recommend checking that the model was loaded correctly:
    if (!esp_srmodel_exists(models, my_wn_name))
        printf("%s can not be loaded correctly\n", my_wn_name);

    //
    // step 3: initialize the models
    //
    esp_wn_iface_t *wakenet = esp_wn_handle_from_name(wn_name);
    model_iface_data_t *wn_model_data = wakenet->create(wn_name, DET_MODE_2CH_90);

    esp_mn_iface_t *multinet = esp_mn_handle_from_name(mn_name);
    model_iface_data_t *mn_model_data = multinet->create(mn_name, 6000);

.. important::

    Regenerate ``srmodels.bin`` after changing model configurations in ``menuconfig``.


@ -19,7 +19,7 @@ ESP-SR User Guide
VAD Model vadnet <vadnet/README>
Speech Command Word MultiNet <speech_command_recognition/README>
Speech Synthesis (Only Supports Chinese Language) <speech_synthesis/readme>
Model Selection and Loading <flash_model/README>
Benchmark <benchmark/README>
Test Methods <test_report/README>
Glossary <glossary/glossary>


@@ -1,215 +1,78 @@

Model Selection and Loading
===========================

:link_to_translation:`en:[English]`

This document explains how to select and load models for ESP-SR.

Model Selection
---------------

ESP-SR allows you to choose the required models through the ``menuconfig`` interface. To configure models:

1. Run ``idf.py menuconfig``.
2. Navigate to **ESP Speech Recognition**.
3. Configure the following options:

   - **Noise Suppression Model**
   - **VAD Model**
   - **WakeNet Model**
   - **MultiNet Model**

.. figure:: ../../_static/kconfig.png
    :alt: kconfig

Updating the Partition Table
----------------------------

Add the following line to your project's ``partitions.csv`` file to allocate space for the models:

.. code-block::

    model, data, , , 6000K

- Replace ``6000K`` with a partition size suited to the selected models.
- ``model`` is the partition label (fixed value).

Model Loading
-------------

ESP-IDF Framework
~~~~~~~~~~~~~~~~~

ESP-SR automatically handles model loading through its CMake scripts:

1. Flash the device with all components: ``idf.py flash``. This command automatically flashes the selected models.
2. For code debugging (without re-flashing the models): ``idf.py app-flash``.

.. note::

    The model loading script is defined in ``esp-sr/CMakeLists.txt``. Models are flashed to the partition labeled ``model`` during the initial flashing.

Arduino Framework
~~~~~~~~~~~~~~~~~

To manually generate and load models:

1. Use the provided Python script to generate ``srmodels.bin``:

   .. code-block:: bash

       python {esp-sr_path}/movemodel.py -d1 {sdkconfig_path} -d2 {esp-sr_path} -d3 {build_path}

   **Parameters:**

   - ``esp-sr_path``: path to your ESP-SR component directory
   - ``sdkconfig_path``: path to the project's ``sdkconfig`` file
   - ``build_path``: the project's build directory (typically ``your_project_path/build``)

2. The generated ``srmodels.bin`` is located at ``{build_path}/srmodels/srmodels.bin``.
3. Flash the generated binary to your device.

Model Initialization and Usage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    //
    // step 1: return the models in flash
    //
    char *model_path = your_model_path; // partition_label, or the model path on the SD card
    srmodel_list_t *models = esp_srmodel_init(model_path);

    //
    // step 2: select a specific model by keywords
    //
    char *wn_name = esp_srmodel_filter(models, ESP_WN_PREFIX, NULL);              // select a WakeNet model
    char *mn_name = esp_srmodel_filter(models, ESP_MN_PREFIX, NULL);              // select a MultiNet model
    char *alexa_wn_name = esp_srmodel_filter(models, ESP_WN_PREFIX, "alexa");     // select the WakeNet with the "alexa" wake word
    char *en_mn_name = esp_srmodel_filter(models, ESP_MN_PREFIX, ESP_MN_ENGLISH); // select an English MultiNet model
    char *cn_mn_name = esp_srmodel_filter(models, ESP_MN_PREFIX, ESP_MN_CHINESE); // select a Chinese MultiNet model

    // Using the model name directly in your code also works.
    char *my_wn_name = "wn9_hilexin";
    // We recommend checking that the model was loaded correctly:
    if (!esp_srmodel_exists(models, my_wn_name))
        printf("%s can not be loaded correctly\n", my_wn_name);

    //
    // step 3: initialize the models
    //
    esp_wn_iface_t *wakenet = esp_wn_handle_from_name(wn_name);
    model_iface_data_t *wn_model_data = wakenet->create(wn_name, DET_MODE_2CH_90);

    esp_mn_iface_t *multinet = esp_mn_handle_from_name(mn_name);
    model_iface_data_t *mn_model_data = multinet->create(mn_name, 6000);

.. important::

    Regenerate ``srmodels.bin`` after changing model configurations in ``menuconfig``.


@@ -20,7 +20,7 @@ ESP-SR User Guide
VAD Model vadnet <vadnet/README>
Speech Command Word MultiNet <speech_command_recognition/README>
Speech Synthesis (Only Supports Chinese Language) <speech_synthesis/readme>
Model Selection and Loading <flash_model/README>
Benchmark <benchmark/README>
Test Methods <test_report/README>
Glossary <glossary/glossary>


@ -1,4 +1,4 @@
version: "2.0.1"
description: esp_sr provides basic algorithms for Speech Recognition applications
url: https://github.com/espressif/esp-sr
dependencies:


@@ -2,9 +2,8 @@

#ifndef _ESP_AFE_AEC_H_
#define _ESP_AFE_AEC_H_

#include "esp_aec.h"
#include "esp_afe_config.h"
#include <stdint.h>

@@ -13,19 +12,19 @@ extern "C" {
#endif

typedef struct {
    aec_handle_t *handle;
    aec_mode_t mode;
    afe_pcm_config_t pcm_config;
    int frame_size;
    int16_t *data;
} afe_aec_handle_t;

/**
 * @brief Creates an instance of the AEC structure.
 *
 * @warning Currently only 1 microphone channel and 1 playback channel are supported.
 *          If the input has multiple microphone and playback channels, only the first microphone channel and
 *          playback channel will be selected.
 *
 * The input format, same as the AFE config:
 * M to represent the microphone channel

@@ -37,7 +36,8 @@
 *
 * @param input_format The input format
 * @param filter_length The length of the filter. The larger the filter, the higher the CPU load.
 *                      Recommended filter_length = 4 for esp32s3 and esp32p4, and filter_length = 2 for esp32c5.
 * @param type The type of AFE, AFE_TYPE_SR or AFE_TYPE_VC
 * @param mode The mode of AFE, AFE_MODE_LOW_COST or AFE_MODE_HIGH_PERF
 *

@@ -45,17 +45,17 @@
 */
afe_aec_handle_t *afe_aec_create(const char *input_format, int filter_length, afe_type_t type, afe_mode_t mode);

/**
 * @brief Performs echo cancellation on one frame, based on the audio sent to the speaker and the frame from the mic.
 *
 * @param inst The instance of AEC.
 * @param indata Input audio data; the format is defined by input_format.
 * @param outdata Near-end signal with the echo removed. outdata must be 16-byte aligned;
 *                please use heap_caps_aligned_calloc(16, n, size, caps) to allocate an aligned chunk of memory.
 * @return The size of outdata in bytes.
 */
size_t afe_aec_process(afe_aec_handle_t *handel, const int16_t *indata, int16_t *outdata);

/**
 * @brief Get the frame size of AEC (the number of samples in one frame)

@@ -64,7 +64,6 @@
 */
int afe_aec_get_chunksize(afe_aec_handle_t *handle);

/**
 * @brief Free the AEC instance
 *


@@ -14,31 +14,30 @@

#ifndef _ESP_WEBRTC_H_
#define _ESP_WEBRTC_H_

#ifdef __cplusplus
extern "C" {
#endif

#include "esp_agc.h"
#include "esp_heap_caps.h"
#include "esp_log.h"
#include "esp_ns.h"
#include "sr_ringbuf.h"
#include <stdint.h>

typedef struct {
    void *ns_handle;
    void *agc_handle;
    int frame_size;
    int sample_rate;
    int16_t *buff;
    int16_t *out_data;
    sr_ringbuf_handle_t rb;
} webrtc_handle_t;

/**
 * @brief Creates an instance of webrtc.
 *
 * @warning The supported frame lengths are 10 ms, 20 ms, 30 ms, and 32 ms.
 *
 * @param frame_length_ms The length of the audio processing frame, in milliseconds

@@ -46,19 +45,14 @@ typedef struct {
 * @param agc_mode The mode of AGC
 * @param agc_gain The gain of AGC. Default is 9
 * @param agc_target_level The target level of AGC. Default is -3 dBFS
 * @param sample_rate The sample rate of the audio
 *
 * @return
 *     - NULL: Create failed
 *     - Others: The instance of webrtc
 */
webrtc_handle_t *webrtc_create(
    int frame_length_ms, int ns_mode, agc_mode_t agc_mode, int agc_gain, int agc_target_level, int sample_rate);

/**
 * @brief Feed samples of an audio stream to webrtc and get the audio stream after noise suppression.

@@ -71,7 +65,7 @@ webrtc_handle_t *webrtc_create(
 *
 * @return Data after noise suppression
 */
int16_t *webrtc_process(webrtc_handle_t *handle, int16_t *indata, int *size, bool enable_ns, bool enable_agc);

/**
 * @brief Free the webrtc instance





@ -0,0 +1,84 @@
// Copyright 2015-2019 Espressif Systems (Shanghai) PTE LTD
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License
#ifndef _ESP_WEBRTC_H_
#define _ESP_WEBRTC_H_
#ifdef __cplusplus
extern "C" {
#endif
#include "esp_agc.h"
#include "esp_log.h"
#include "esp_ns.h"
#include "sr_ringbuf.h"
#include <stdint.h>
#include "esp_heap_caps.h"
typedef struct {
void *ns_handle;
void *agc_handle;
int frame_size;
int sample_rate;
int16_t *buff;
int16_t *out_data;
sr_ringbuf_handle_t rb;
} webrtc_handle_t;
/**
* @brief Creates an instance of webrtc.
*
* @warning frame_length can supports be 10 ms, 20 ms, 30 ms, 32 ms.
*
* @param frame_length_ms The length of the audio processing
* @param ns_mode The mode of NS. -1 means NS is disabled. 0: Mild, 1: Medium, 2: Aggressive
* @param agc_mode The model of AGC
* @param agc_gain The gain of AGC. default is 9
* @param agc_target_level The target level of AGC. default is -3 dbfs
* @param sample_rate The sample rate of the audio.
*
* @return
* - NULL: Create failed
* - Others: The instance of webrtc
*/
webrtc_handle_t *webrtc_create(
int frame_length_ms, int ns_mode, agc_mode_t agc_mode, int agc_gain, int agc_target_level, int sample_rate);
/**
* @brief Feed samples of an audio stream to the webrtc and get the audio stream after Noise suppression.
*
* @param handle The instance of NS.
* @param in_data An array of 16-bit signed audio samples.
* @param out_size The sample size of output data
* @param enable_ns Enable noise suppression
* @param enable_agc Enable automatic gain control
*
* @return data after noise suppression
*/
int16_t *webrtc_process(webrtc_handle_t *handle, int16_t *indata, int *size, bool enable_ns, bool enable_agc);
/**
* @brief Free the webrtc instance
*
* @param handle The instance of webrtc.
*
* @return None
*
*/
void webrtc_destroy(webrtc_handle_t *handle);
#ifdef __cplusplus
}
#endif
#endif //_ESP_WEBRTC_H_
@@ -6,35 +6,33 @@
software is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied.
*/
#include <stdio.h>
#include <stdlib.h>
#include "string.h"
#include <limits.h>
#include "unity.h"
#include "esp_log.h"
#include "esp_timer.h"
#include "model_path.h"
#include "esp_wn_iface.h"
#include "esp_wn_models.h"
#include "esp_afe_sr_models.h"
#include "audio_test_file.h"
#include "dl_lib_convq_queue.h"
#include "esp_afe_aec.h"
#include "esp_afe_sr_models.h"
#include "esp_heap_caps.h"
#include "esp_log.h"
#include "esp_nsn_iface.h"
#include "esp_nsn_models.h"
#include "esp_timer.h"
#include "esp_wn_iface.h"
#include "esp_wn_models.h"
#include "model_path.h"
#include "string.h"
#include "unity.h"
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#if (CONFIG_IDF_TARGET_ESP32S3 || CONFIG_IDF_TARGET_ESP32P4)
#include "esp_nsn_models.h"
#include "esp_nsn_iface.h"
#endif
#define ARRAY_SIZE_OFFSET 8 // Increase this if audio_sys_get_real_time_stats returns ESP_ERR_INVALID_SIZE
#define AUDIO_SYS_TASKS_ELAPSED_TIME_MS 1000 // Period of stats measurement
#define ARRAY_SIZE_OFFSET 8 // Increase this if audio_sys_get_real_time_stats returns ESP_ERR_INVALID_SIZE
#define AUDIO_SYS_TASKS_ELAPSED_TIME_MS 1000 // Period of stats measurement
static const char *TAG = "AFE_TEST";
static int detect_cnt = 0;
static int fetch_task_flag = 0;
void test_afe_by_config(afe_config_t *afe_config, int frame_num, int* memory, float* cpu, int idx)
void test_afe_by_config(afe_config_t *afe_config, int frame_num, int *memory, float *cpu, int idx)
{
int start_size = heap_caps_get_free_size(MALLOC_CAP_8BIT);
int start_internal_size = heap_caps_get_free_size(MALLOC_CAP_INTERNAL);
@@ -43,13 +41,13 @@ void test_afe_by_config(afe_config_t *afe_config, int frame_num, int* memory, fl
int mem_leak = 0;
uint32_t feed_cpu_time = 0;
uint32_t fetch_cpu_time = 0;
uint32_t start=0, end = 0;
uint32_t start = 0, end = 0;
int loop = 3;
int feed_chunksize = 0;
int create_size = 0;
int create_internal_size = 0;
for (int i=0; i<loop; i++) {
for (int i = 0; i < loop; i++) {
// init config and handle
esp_afe_sr_iface_t *afe_handle = esp_afe_handle_from_config(afe_config);
// afe_config_print(afe_config);
@@ -62,17 +60,17 @@ void test_afe_by_config(afe_config_t *afe_config, int frame_num, int* memory, fl
feed_chunksize = afe_handle->get_feed_chunksize(afe_data);
int feed_nch = afe_handle->get_feed_channel_num(afe_data);
int16_t *feed_buff = (int16_t *) malloc(feed_chunksize * sizeof(int16_t) * feed_nch);
int16_t *feed_buff = (int16_t *)malloc(feed_chunksize * sizeof(int16_t) * feed_nch);
start = esp_timer_get_time();
for (int j=0; j<frame_num; j++) {
for (int j = 0; j < frame_num; j++) {
afe_handle->feed(afe_data, feed_buff);
}
end = esp_timer_get_time();
feed_cpu_time += end - start;
//run afe fetch
// run afe fetch
start = esp_timer_get_time();
while(1) {
while (1) {
afe_fetch_result_t *res = afe_handle->fetch_with_delay(afe_data, 1 / portTICK_PERIOD_MS);
if (res->ret_value != ESP_OK) {
break;
@@ -84,19 +82,22 @@ void test_afe_by_config(afe_config_t *afe_config, int frame_num, int* memory, fl
afe_handle->destroy(afe_data);
end_size = heap_caps_get_free_size(MALLOC_CAP_8BIT);
if (i==0) {
if (i == 0) {
first_end_size = end_size;
}
}
mem_leak = start_size - end_size;
ESP_LOGI(TAG, "create&destroy times:%d, memory leak:%d\n", i, mem_leak);
}
uint32_t feed_data_time = loop * frame_num * feed_chunksize / 16 * 1000; // us
memory[idx*2] = create_internal_size;
memory[idx*2+1] = create_size - create_internal_size;
cpu[idx*2] = feed_cpu_time*1.0/feed_data_time;
cpu[idx*2+1] = fetch_cpu_time*1.0/feed_data_time;
printf("Internal RAM: %d, PSRAM:%d, feed cpu loading:%f, fetch cpu loading:%f\n",
memory[idx*2], memory[idx*2+1], cpu[idx*2], cpu[idx*2+1]);
memory[idx * 2] = create_internal_size;
memory[idx * 2 + 1] = create_size - create_internal_size;
cpu[idx * 2] = feed_cpu_time * 1.0 / feed_data_time;
cpu[idx * 2 + 1] = fetch_cpu_time * 1.0 / feed_data_time;
printf("Internal RAM: %d, PSRAM:%d, feed cpu loading:%f, fetch cpu loading:%f\n",
memory[idx * 2],
memory[idx * 2 + 1],
cpu[idx * 2],
cpu[idx * 2 + 1]);
TEST_ASSERT_EQUAL(true, mem_leak < 1000 && end_size == first_end_size);
}
@@ -111,17 +112,22 @@ TEST_CASE(">>>>>>>> AFE create/destroy API & memory leak <<<<<<<<", "[afe]")
// test all setting
srmodel_list_t *models = esp_srmodel_init("model");
for (int format_id=0; format_id<2; format_id++) {
for (int type_id=0; type_id<2; type_id++) {
for (int mode_id=0; mode_id<2; mode_id++) {
for (int format_id = 0; format_id < 2; format_id++) {
for (int type_id = 0; type_id < 2; type_id++) {
for (int mode_id = 0; mode_id < 2; mode_id++) {
for (int aec_init = 0; aec_init < 2; aec_init++) {
for (int se_init = 0; se_init < 2; se_init++) {
for (int ns_init = 0; ns_init < 2; ns_init++) {
for (int vad_init = 0; vad_init < 2; vad_init++) {
for (int wakenet_init = 0; wakenet_init < 2; wakenet_init++) {
printf("format: %s, type: %d, mode: %d, memory size:%d %d\n",
input_format[format_id], afe_type[type_id], afe_mode[mode_id], heap_caps_get_free_size(MALLOC_CAP_8BIT), count);
afe_config_t *afe_config = afe_config_init(input_format[format_id], models, afe_type[type_id], afe_mode[mode_id]);
printf("format: %s, type: %d, mode: %d, memory size:%d %d\n",
input_format[format_id],
afe_type[type_id],
afe_mode[mode_id],
heap_caps_get_free_size(MALLOC_CAP_8BIT),
count);
afe_config_t *afe_config = afe_config_init(
input_format[format_id], models, afe_type[type_id], afe_mode[mode_id]);
afe_config->aec_init = aec_init;
afe_config->se_init = se_init;
afe_config->ns_init = ns_init;
@@ -138,9 +144,12 @@ TEST_CASE(">>>>>>>> AFE create/destroy API & memory leak <<<<<<<<", "[afe]")
}
}
}
for (int idx=0; idx<256; idx++) {
printf("Internal RAM: %d, PSRAM:%d, feed cpu loading:%f, fetch cpu loading:%f\n",
memory[idx*2], memory[idx*2+1], cpu[idx*2], cpu[idx*2+1]);
for (int idx = 0; idx < 256; idx++) {
printf("Internal RAM: %d, PSRAM:%d, feed cpu loading:%f, fetch cpu loading:%f\n",
memory[idx * 2],
memory[idx * 2 + 1],
cpu[idx * 2],
cpu[idx * 2 + 1]);
}
printf("AFE create/destroy API & memory leak test done\n");
}
@@ -156,12 +165,17 @@ TEST_CASE(">>>>>>>> AFE default setting <<<<<<<<", "[afe_benchmark]")
// test all setting
srmodel_list_t *models = esp_srmodel_init("model");
for (int format_id=0; format_id<2; format_id++) {
for (int type_id=0; type_id<2; type_id++) {
for (int mode_id=0; mode_id<2; mode_id++) {
printf("format: %s, type: %d, mode: %d, memory size:%d %d\n",
input_format[format_id], afe_type[type_id], afe_mode[mode_id], heap_caps_get_free_size(MALLOC_CAP_8BIT), count);
afe_config_t *afe_config = afe_config_init(input_format[format_id], models, afe_type[type_id], afe_mode[mode_id]);
for (int format_id = 0; format_id < 2; format_id++) {
for (int type_id = 0; type_id < 2; type_id++) {
for (int mode_id = 0; mode_id < 2; mode_id++) {
printf("format: %s, type: %d, mode: %d, memory size:%d %d\n",
input_format[format_id],
afe_type[type_id],
afe_mode[mode_id],
heap_caps_get_free_size(MALLOC_CAP_8BIT),
count);
afe_config_t *afe_config =
afe_config_init(input_format[format_id], models, afe_type[type_id], afe_mode[mode_id]);
test_afe_by_config(afe_config, 8, memory, cpu, count);
afe_config_free(afe_config);
count++;
@@ -169,13 +183,18 @@ TEST_CASE(">>>>>>>> AFE default setting <<<<<<<<", "[afe_benchmark]")
}
}
count = 0;
for (int format_id=0; format_id<2; format_id++) {
for (int type_id=0; type_id<2; type_id++) {
for (int mode_id=0; mode_id<2; mode_id++) {
printf("--------format: %s, type: %s, mode: %s------------\n", input_format[format_id], type_id==0? "SR": "VC", mode_id==0? "LOW_COST": "HIGH_PERF");
printf("Internal RAM: %d, PSRAM:%d, feed cpu loading:%f, fetch cpu loading:%f\n",
memory[count*2], memory[count*2+1], cpu[count*2], cpu[count*2+1]);
for (int format_id = 0; format_id < 2; format_id++) {
for (int type_id = 0; type_id < 2; type_id++) {
for (int mode_id = 0; mode_id < 2; mode_id++) {
printf("--------format: %s, type: %s, mode: %s------------\n",
input_format[format_id],
type_id == 0 ? "SR" : "VC",
mode_id == 0 ? "LOW_COST" : "HIGH_PERF");
printf("Internal RAM: %d, PSRAM:%d, feed cpu loading:%f, fetch cpu loading:%f\n",
memory[count * 2],
memory[count * 2 + 1],
cpu[count * 2],
cpu[count * 2 + 1]);
count++;
}
}
@@ -183,7 +202,6 @@ TEST_CASE(">>>>>>>> AFE default setting <<<<<<<<", "[afe_benchmark]")
printf("test done\n");
}
void test_feed_Task(void *arg)
{
afe_task_into_t *afe_task_info = (afe_task_into_t *)arg;
@@ -193,13 +211,13 @@ void test_feed_Task(void *arg)
int feed_chunksize = afe_handle->get_feed_chunksize(afe_data);
int feed_nch = afe_handle->get_feed_channel_num(afe_data);
int sample_per_ms = afe_handle->get_samp_rate(afe_data) / 1000;
int16_t *i2s_buff = (int16_t *) malloc(feed_chunksize * sizeof(int16_t) * feed_nch);
int16_t *i2s_buff = (int16_t *)malloc(feed_chunksize * sizeof(int16_t) * feed_nch);
assert(i2s_buff);
ESP_LOGI(TAG, "feed task start\n");
int count = 0;
while (1) {
count ++;
count++;
afe_handle->feed(afe_data, i2s_buff);
vTaskDelay((feed_chunksize / sample_per_ms) / portTICK_PERIOD_MS);
if (count > 100) {
@@ -222,7 +240,7 @@ void test_fetch_Task(void *arg)
detect_cnt = 0;
fetch_task_flag = 1;
while (1) {
afe_fetch_result_t* res = afe_handle->fetch(afe_data);
afe_fetch_result_t *res = afe_handle->fetch(afe_data);
if (!res || res->ret_value == ESP_FAIL) {
break;
}
@@ -247,7 +265,7 @@ TEST_CASE("afe performance test (1ch)", "[afe_perf]")
// test all setting
srmodel_list_t *models = esp_srmodel_init("model");
for (int mode_id=0; mode_id<2; mode_id++) {
for (int mode_id = 0; mode_id < 2; mode_id++) {
afe_config_t *afe_config = afe_config_init(input_format, models, afe_type, afe_model[mode_id]);
if (afe_config->wakenet_init && afe_config->wakenet_model_name) {
esp_afe_sr_iface_t *afe_handle = esp_afe_handle_from_config(afe_config);
@@ -258,8 +276,10 @@ TEST_CASE("afe performance test (1ch)", "[afe_perf]")
task_info.feed_task = NULL;
task_info.fetch_task = NULL;
fetch_task_flag = 1;
xTaskCreatePinnedToCore(test_feed_Task, "feed_task", 8 * 1024, (void *)(&task_info), 5, &task_info.feed_task, 0);
xTaskCreatePinnedToCore(test_fetch_Task, "fetch_task", 8 * 1024, (void *)(&task_info), 5, &task_info.fetch_task, 0);
xTaskCreatePinnedToCore(
test_feed_Task, "feed_task", 8 * 1024, (void *)(&task_info), 5, &task_info.feed_task, 0);
xTaskCreatePinnedToCore(
test_fetch_Task, "fetch_task", 8 * 1024, (void *)(&task_info), 5, &task_info.fetch_task, 0);
while (fetch_task_flag) {
vTaskDelay(32 / portTICK_PERIOD_MS);
}
@@ -278,7 +298,7 @@ TEST_CASE("afe performance test (2ch)", "[afe_perf]")
// test all setting
srmodel_list_t *models = esp_srmodel_init("model");
for (int mode_id=0; mode_id<2; mode_id++) {
for (int mode_id = 0; mode_id < 2; mode_id++) {
afe_config_t *afe_config = afe_config_init(input_format, models, afe_type, afe_model[mode_id]);
if (afe_config->wakenet_init && afe_config->wakenet_model_name) {
esp_afe_sr_iface_t *afe_handle = esp_afe_handle_from_config(afe_config);
@@ -289,8 +309,10 @@ TEST_CASE("afe performance test (2ch)", "[afe_perf]")
task_info.feed_task = NULL;
task_info.fetch_task = NULL;
fetch_task_flag = 1;
xTaskCreatePinnedToCore(&test_feed_Task, "feed_task", 8 * 1024, (void *)(&task_info), 5, &task_info.feed_task, 0);
xTaskCreatePinnedToCore(&test_fetch_Task, "fetch_task", 8 * 1024, (void *)(&task_info), 5, &task_info.fetch_task, 0);
xTaskCreatePinnedToCore(
&test_feed_Task, "feed_task", 8 * 1024, (void *)(&task_info), 5, &task_info.feed_task, 0);
xTaskCreatePinnedToCore(
&test_fetch_Task, "fetch_task", 8 * 1024, (void *)(&task_info), 5, &task_info.fetch_task, 0);
while (fetch_task_flag) {
vTaskDelay(32 / portTICK_PERIOD_MS);
}
@@ -300,23 +322,62 @@ TEST_CASE("afe performance test (2ch)", "[afe_perf]")
esp_srmodel_deinit(models);
}
TEST_CASE("test afe aec interface", "[afe]")
{
int start_size = heap_caps_get_free_size(MALLOC_CAP_8BIT);
afe_aec_handle_t *handle = afe_aec_create("MNR", 4, AFE_TYPE_SR, AFE_MODE_HIGH_PERF);
int frame_bytes = handle->frame_size * sizeof(int16_t);
int16_t *indata = (int16_t *) malloc(frame_bytes*handle->pcm_config.total_ch_num);
int16_t *outdata = (int16_t *) malloc(frame_bytes);
afe_aec_handle_t *afe_aec_handle = afe_aec_create("MNR", 4, AFE_TYPE_SR, AFE_MODE_LOW_COST);
aec_handle_t *aec_handle = aec_create(16000, 4, 1, AEC_MODE_SR_LOW_COST);
int frame_size = afe_aec_handle->frame_size;
int nch = afe_aec_handle->pcm_config.total_ch_num;
int mic_idx = afe_aec_handle->pcm_config.mic_ids[0];
int ref_idx = afe_aec_handle->pcm_config.ref_ids[0];
int frame_bytes = frame_size * sizeof(int16_t);
int16_t *afe_indata = (int16_t *)heap_caps_calloc(1, frame_bytes * nch, MALLOC_CAP_SPIRAM);
int16_t *indata = (int16_t *)heap_caps_aligned_calloc(16, 1, frame_bytes, MALLOC_CAP_SPIRAM);
int16_t *refdata = (int16_t *)heap_caps_aligned_calloc(16, 1, frame_bytes, MALLOC_CAP_SPIRAM);
int16_t *outdata1 = (int16_t *)heap_caps_aligned_calloc(16, 1, frame_bytes, MALLOC_CAP_SPIRAM);
int16_t *outdata2 = (int16_t *)heap_caps_aligned_calloc(16, 1, frame_bytes, MALLOC_CAP_SPIRAM);
int chunks = 0;
uint32_t c0, c1, t_aec = 0, t_afe_aec = 0;
afe_aec_process(handle, indata, outdata);
afe_aec_process(handle, indata, outdata);
afe_aec_process(handle, indata, outdata);
while (1) {
if ((chunks + 1) * frame_bytes <= sizeof(audio_mic_file)) {
memcpy(indata, audio_mic_file + chunks * frame_size, frame_bytes);
memcpy(refdata, audio_ref_file + chunks * frame_size, frame_bytes);
afe_aec_destroy(handle);
for (int i = 0; i < frame_size; i++) {
afe_indata[i * nch + mic_idx] = indata[i];
afe_indata[i * nch + ref_idx] = refdata[i];
}
} else {
break;
}
c0 = esp_timer_get_time();
afe_aec_process(afe_aec_handle, afe_indata, outdata1);
c1 = esp_timer_get_time();
t_afe_aec += c1 - c0;
c0 = esp_timer_get_time();
aec_process(aec_handle, indata, refdata, outdata2);
c1 = esp_timer_get_time();
t_aec += c1 - c0;
chunks++;
}
for (int i = 0; i < frame_size; i++) {
assert(outdata1[i] == outdata2[i]);
}
printf("afe aec interface:%d, aec interface:%d\n", t_afe_aec, t_aec);
afe_aec_destroy(afe_aec_handle);
aec_destroy(aec_handle);
free(afe_indata);
free(indata);
free(outdata);
free(refdata);
free(outdata1);
free(outdata2);
int end_size = heap_caps_get_free_size(MALLOC_CAP_8BIT);
TEST_ASSERT_EQUAL(true, end_size == start_size);
}
}