Mirror of https://github.com/espressif/esp-sr.git (synced 2025-09-15 15:28:44 +08:00)

docs(AFE): Update AFE docs

This commit is contained in:
parent a2bc8a64d9
commit 04ae02fea2
docs/audio_front_end/README.md (Normal file → Executable file)
@ -20,10 +20,10 @@ The workflow of Espressif AFE can be divided into four parts:

- AFE creation and initialization
- AFE feed: input audio data; AEC runs inside the feed function
- Internal BSS/NS algorithms
- AFE fetch: returns the processed audio data and an output value. AFE fetch performs VAD internally; if WakeNet is enabled, it also performs wake-word detection

**Note:** `afe->feed()` and `afe->fetch()` are visible to users, while the `internal BSS/NS task` is invisible to users.

> AEC runs in the `afe->feed()` function;
> BSS/NS is an independent task in AFE;
@ -41,45 +41,12 @@ Espressif AFE supports both single MIC and dual MIC scenarios. The internal task

    esp_afe_sr_iface_t *afe_handle = &esp_afe_sr_2mic;

### Select AFE mode

- Single MIC

  Espressif AFE single MIC supports 2 working modes: SR_MODE_MONO_LOW_COST and SR_MODE_MONO_MEDIUM_COST.

  - SR_MODE_MONO_LOW_COST

    Suitable for mono audio data plus one reference channel, with very low memory and CPU consumption. It runs a low-complexity AEC and a low-complexity mono NS algorithm.

  - SR_MODE_MONO_MEDIUM_COST

    Suitable for mono audio data plus one reference channel, with low memory and CPU consumption. It runs a low-complexity AEC and a medium-complexity mono NS algorithm.

- Dual MIC

  Espressif AFE dual MIC supports 3 working modes: SR_MODE_STEREO_LOW_COST, SR_MODE_STEREO_MEDIUM, and SR_MODE_STEREO_HIGH_PERF.

  - SR_MODE_STEREO_LOW_COST

    Suitable for two-channel audio data plus one reference channel. It runs a low-complexity AEC and a low-complexity BSS.

  - SR_MODE_STEREO_MEDIUM

    Suitable for two-channel audio data plus one reference channel. It runs a high-complexity AEC and a low-complexity BSS.

  - SR_MODE_STEREO_HIGH_PERF

    Suitable for two-channel audio data plus one reference channel. It runs a high-complexity AEC and a high-complexity BSS.
### Input Audio data

- AFE single MIC

  - Input audio data format: 16 KHz, 16-bit, two channels (one is mic data, the other is reference data)
  - The data frame length is 32 ms. Users can use `afe->get_feed_chunksize()` to get the number of sampling points needed (the data type of sampling points is int16).

    **Note**: the number of sampling points returned by `afe->get_feed_chunksize()` covers one channel only.

  The input data is arranged as follows:

@ -88,16 +55,17 @@ Espressif AFE supports both single MIC and dual MIC scenarios. The internal task

- AFE dual MIC

  - Input audio data format: 16 KHz, 16-bit, three channels (two are mic data, the other is reference data)
  - The data frame length is 32 ms. Users can use `afe->get_feed_chunksize()` to get the number of sampling points needed (the data type of sampling points is int16).

  The input data is arranged as follows:

  <img src="../img/AFE_mode_other.png" height = "70" align=center />

  Note: the total data size in bytes is `afe->get_feed_chunksize() * channel number * sizeof(short)`.
### AEC Introduction

The AEC (Acoustic Echo Cancellation) algorithm supports a maximum of two-mic processing. It can effectively remove the echo in the mic input signal, which helps with further speech recognition.

### NS (noise suppression)
@ -125,7 +93,7 @@ The output audio of AFE is single-channel data. When WakeNet is enabled, AFE wil

### 1. Define afe_handle

`afe_handle` is the function handle through which the user calls the AFE interfaces. Users need to select the corresponding AFE handle according to single MIC or dual MIC applications.

- Single MIC

@ -135,102 +103,162 @@ The output audio of AFE is single-channel data. When WakeNet is enabled, AFE wil

    esp_afe_sr_iface_t *afe_handle = &esp_afe_sr_2mic;
### 2. Configure AFE

Get the default configuration of AFE:

    afe_config_t afe_config = AFE_CONFIG_DEFAULT();

Users can adjust the switch of each algorithm module and its corresponding parameters in the macro `AFE_CONFIG_DEFAULT()`:

```
#define AFE_CONFIG_DEFAULT() { \
    .aec_init = true, \
    .se_init = true, \
    .vad_init = true, \
    .wakenet_init = true, \
    .vad_mode = 3, \
    .wakenet_model = &WAKENET_MODEL, \
    .wakenet_coeff = &WAKENET_COEFF, \
    .wakenet_mode = DET_MODE_2CH_90, \
    .afe_mode = SR_MODE_HIGH_PERF, \
    .afe_perferred_core = 0, \
    .afe_perferred_priority = 5, \
    .afe_ringbuf_size = 50, \
    .alloc_from_psram = 1, \
    .agc_mode = 2, \
}
```
- aec_init: whether the AEC algorithm is enabled.

- se_init: whether the BSS/NS algorithm is enabled.

- vad_init: whether the VAD algorithm is enabled.

- wakenet_init: whether the wake-word algorithm is enabled.

- vad_mode: the VAD operating mode. A higher mode is more aggressive.

- wakenet_model/wakenet_coeff/wakenet_mode: use `make menuconfig` to choose the WakeNet model. Please refer to: [WakeNet](https://github.com/espressif/esp-sr/tree/b9504e35485b60524977a8df9ff448ca89cd9d62/wake_word_engine)

- afe_mode: Espressif AFE supports two working modes: SR_MODE_LOW_COST and SR_MODE_HIGH_PERF. See the afe_sr_mode_t enumeration for details.

  - SR_MODE_LOW_COST: the quantized version, which occupies fewer resources.

  - SR_MODE_HIGH_PERF: the non-quantized version, which occupies more resources.

  **Note: ESP32 only supports SR_MODE_HIGH_PERF; ESP32-S3 supports both modes.**

- afe_perferred_core: the CPU core on which the internal BSS/NS algorithm of AFE runs.

- afe_ringbuf_size: the size of the internal ring buffer.

- alloc_from_psram: whether to allocate memory from external PSRAM first. Three values can be configured:

  - 0: allocate from internal RAM.

  - 1: allocate part of the memory from external PSRAM.

  - 2: allocate most of the memory from external PSRAM.

- agc_mode: the linear audio amplification level ([0, 3]; 0 means no amplification).
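A typical adjustment of the defaults, sketched as a configuration fragment (field names and values are the ones listed above; which fields you change depends on your board):

```c
afe_config_t afe_config = AFE_CONFIG_DEFAULT();
afe_config.aec_init = false;             /* e.g. no reference (loopback) channel wired up */
afe_config.afe_mode = SR_MODE_LOW_COST;  /* quantized mode for memory-tight targets */
afe_config.alloc_from_psram = 2;         /* put most AFE buffers in external PSRAM */
```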
### 3. Create afe_data

The user uses the `afe_handle->create_from_config(&afe_config)` function to obtain the data handle, which will be used internally in AFE. The parameter passed in is the configuration obtained in step 2 above.

```
/**
 * @brief Function to initialize an AFE_SR instance
 *
 * @param afe_config The config of AFE_SR
 * @returns Handle to the AFE_SR data
 */
typedef esp_afe_sr_data_t* (*esp_afe_sr_iface_op_create_from_config_t)(afe_config_t *afe_config);
```
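Steps 1 through 3 combine into a short device-side sketch. This fragment assumes ESP-IDF with the esp-sr component on target and is not compilable on its own:

```c
/* Sketch only: requires ESP-IDF + esp-sr headers on target. */
esp_afe_sr_iface_t *afe_handle = &esp_afe_sr_2mic;   /* step 1: dual-MIC handle       */
afe_config_t afe_config = AFE_CONFIG_DEFAULT();      /* step 2: default configuration */
esp_afe_sr_data_t *afe_data =
    afe_handle->create_from_config(&afe_config);     /* step 3: data handle           */
```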
### 4. feed audio data

After initializing AFE and WakeNet, users need to input audio data into AFE with the `afe_handle->feed()` function for processing.

The input audio size and layout format can be found in the step **Input Audio data**.

```
/**
 * @brief Feed samples of an audio stream to the AFE_SR
 *
 * @param afe The AFE_SR data handle
 * @param in  The input microphone signal, only support signed 16-bit @ 16 KHZ. The frame size can be queried by the
 *            `get_samp_chunksize`. The channel number can be queried by `get_channel_num`.
 * @return    The size of input
 */
typedef int (*esp_afe_sr_iface_op_feed_t)(esp_afe_sr_data_t *afe, const int16_t* in);
```
Get the number of audio channels:

The `afe_handle->get_channel_num()` function returns the number of MIC data channels (without the reference channel) that need to be put into the `afe_handle->feed()` function.

```
/**
 * @brief Get the number of channels that need to be passed to the feed function
 *
 * @param afe The AFE_SR object to query
 * @return The channel number
 */
typedef int (*esp_afe_sr_iface_op_get_channel_num_t)(esp_afe_sr_data_t *afe);
```
### 5. fetch audio data

Users can get the processed single-channel audio with the `afe_handle->fetch()` function.

The number of data sampling points per fetch (the data type of a sampling point is int16) can be obtained with `afe_handle->get_fetch_chunksize`.

```
/**
 * @brief Get the amount of each channel samples per frame that need to be passed to the function
 *
 * Every speech enhancement AFE_SR processes a certain number of samples at the same time. This function
 * can be used to query that amount. Note that the returned amount is in 16-bit samples, not in bytes.
 *
 * @param afe The AFE_SR object to query
 * @return The amount of samples to feed the fetch function
 */
typedef int (*esp_afe_sr_iface_op_get_samp_chunksize_t)(esp_afe_sr_data_t *afe);
```
Please pay attention to the return value of `afe_handle->fetch()`:

- AFE_FETCH_CHANNEL_VERIFIED: audio channel confirmed (this value is not returned for single-microphone wake-up)
- AFE_FETCH_NOISE: noise detected
- AFE_FETCH_SPEECH: speech detected
- AFE_FETCH_WWE_DETECTED: wake word detected
- ...

```
/**
 * @brief fetch enhanced samples of an audio stream from the AFE_SR
 *
 * @Warning The output is single channel data, no matter how many channels the input is.
 *
 * @param afe The AFE_SR object to query
 * @param out The output enhanced signal. The frame size can be queried by the `get_samp_chunksize`.
 * @return The state of output, please refer to the definition of `afe_fetch_mode_t`
 */
typedef afe_fetch_mode_t (*esp_afe_sr_iface_op_fetch_t)(esp_afe_sr_data_t *afe, int16_t* out);
```
### 6. Usage of WakeNet

WakeNet in AFE can be used in three ways:

- No WakeNet

  Users can choose not to initialize WakeNet by not calling:

      afe_handle->set_wakenet(afe_data, &WAKENET_MODEL, &WAKENET_COEFF);

- Use WakeNet

  Users need to configure the wake word by `make menuconfig` first, then call:

      afe_handle->set_wakenet(afe_data, &WAKENET_MODEL, &WAKENET_COEFF);

  In this way, the return value of `afe_handle->fetch()` indicates the wake-up status.

- Disable WakeNet after wake-up

  When users need to perform other operations after wake-up, such as offline or online speech recognition, they can pause the operation of WakeNet to reduce CPU resource consumption.

  Users can call `afe_handle->disable_wakenet(afe_data)` to stop WakeNet, and later call `afe_handle->enable_wakenet(afe_data)` to enable it again.
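The pause/resume flow, as a device-side sketch (assumes esp-sr on target; `start_asr` is a hypothetical application hook, and the fragment is not compilable on its own):

```c
/* Sketch only: requires esp-sr on target. */
afe_fetch_mode_t res = afe_handle->fetch(afe_data, out_buf);
if (res == AFE_FETCH_WWE_DETECTED) {
    afe_handle->disable_wakenet(afe_data);  /* free CPU for the recognizer */
    start_asr();                            /* hypothetical application hook */
}
/* ... later, when recognition finishes: */
afe_handle->enable_wakenet(afe_data);       /* resume wake-word detection */
```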
### 7. Usage of AEC
docs/audio_front_end/README_CN.md (Normal file → Executable file; Chinese translation of the README.md changes above)