Mirror of https://github.com/espressif/esp-sr.git (synced 2025-09-15 15:28:44 +08:00)

Commit fa9cfb1849: testing

Changed: .gitignore (vendored), 6 lines
@@ -6,6 +6,7 @@ include/sdkconfig.h
build/
sdkconfig.old
sdkconfig
<<<<<<< HEAD
.DS_Store
*.pyc
@@ -24,3 +25,8 @@ docs/doxygen_sqlite3.db
# Downloaded font files
docs/_static/DejaVuSans.ttf
docs/_static/NotoSansSC-Regular.otf
=======
model/target/*
.vscode
docs/_build/*
>>>>>>> 0981bc8425d6cace35ebb73789265a1c2e14dc92
BIN docs/_static/QR_Dilated_Convolution.png (vendored, new file, 6.1 KiB; binary file not shown)
BIN docs/_static/QR_MFCC.png (vendored, new file, 6.5 KiB; binary file not shown)
BIN docs/_static/QR_multinet_g2p.png (vendored, new file, 6.7 KiB; binary file not shown)
@@ -23,7 +23,6 @@ ESP32_DOCS = ['audio_front_end/README.rst',
              'wake_word_engine/README.rst',
              'wake_word_engine/ESP_Wake_Words_Customization.rst',
              'speech_command_recognition/README.rst',
              'acoustic_algorithm/README.rst',
              'flash_model/README.rst',
              'audio_front_end/Espressif_Microphone_Design_Guidelines.rst',
              'test_report/README.rst',
@@ -1,27 +0,0 @@
#!/bin/bash

# Convert the Markdown sources under <language>/<directory> to reStructuredText with pandoc.
function convert_md2rst(){
    for files in $1/$2/*
    do
        filename="$(basename -- $files)"
        echo $filename
        fname="${filename%.*}"
        echo $fname
        echo "converting $fname"
        pandoc $1/$2/$filename -f markdown -t rst -s -o "$1/$2/${fname}".rst
    done
}

convert_md2rst en acoustic_algorithm
convert_md2rst en audio_front_end
convert_md2rst en flash_model
convert_md2rst en performance_test
convert_md2rst en speech_command_recognition
convert_md2rst en wake_word_engine

convert_md2rst zh_cn acoustic_algorithm
convert_md2rst zh_cn audio_front_end
convert_md2rst zh_cn flash_model
convert_md2rst zh_cn performance_test
convert_md2rst zh_cn speech_command_recognition
convert_md2rst zh_cn wake_word_engine
@@ -1,248 +0,0 @@
Acoustic Algorithm Introduction
===============================

:link_to_translation:`zh_CN:[中文]`

Acoustic algorithms provided in esp-sr include voice activity detection (VAD), adaptive gain control (AGC), acoustic echo cancellation (AEC), noise suppression (NS), and mic-array speech enhancement (MASE). VAD, AGC, AEC, and NS are supported on both single-mic and multi-mic development boards; MASE is supported on multi-mic boards only.

VAD
---

Overview
~~~~~~~~

VAD takes an audio stream as input and outputs, for each frame of the stream, a prediction of whether that frame contains voice or not.

API Reference
~~~~~~~~~~~~~

Header
^^^^^^

- esp_vad.h

Function
^^^^^^^^

- ``vad_handle_t vad_create(vad_mode_t vad_mode)``

  **Definition**

  Initialization of a VAD handle.

  **Parameter**

  - vad_mode: operating mode of VAD, VAD_MODE_0 to VAD_MODE_4; a larger value indicates a more aggressive VAD.

  **Return**

  Handle to VAD.

- ``vad_state_t vad_process(vad_handle_t inst, int16_t *data, int sample_rate_hz, int one_frame_ms);``

  **Definition**

  Processing of VAD for one frame.

  **Parameter**

  - inst: VAD handle.
  - data: buffer that holds both the input and output audio stream.
  - sample_rate_hz: the sampling frequency (Hz); can be 32000, 16000, or 8000, default: 16000.
  - one_frame_ms: the length of the audio frame to process; can be 10 ms, 20 ms, or 30 ms, default: 30.

  **Return**

  - VAD_SILENCE if no voice
  - VAD_SPEECH if voice is detected

- ``void vad_destroy(vad_handle_t inst)``

  **Definition**

  Destruction of a VAD handle.

  **Parameter**

  - inst: the VAD handle to be destroyed.
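A minimal usage sketch of the three calls above, assuming 16 kHz mono input and a hypothetical ``get_audio_frame()`` helper that supplies one 30 ms frame:

::

    #include <stdint.h>
    #include <stdbool.h>
    #include "esp_vad.h"

    #define VAD_SAMPLE_RATE_HZ 16000
    #define VAD_FRAME_MS       30
    // number of 16-bit samples in one 30 ms frame at 16 kHz
    #define VAD_FRAME_SAMPLES  (VAD_SAMPLE_RATE_HZ / 1000 * VAD_FRAME_MS)

    // hypothetical audio source, not part of esp-sr
    extern bool get_audio_frame(int16_t *buf, int samples);

    void vad_example(void)
    {
        vad_handle_t vad = vad_create(VAD_MODE_3);   // more aggressive detection
        int16_t frame[VAD_FRAME_SAMPLES];

        while (get_audio_frame(frame, VAD_FRAME_SAMPLES)) {
            vad_state_t state = vad_process(vad, frame, VAD_SAMPLE_RATE_HZ, VAD_FRAME_MS);
            if (state == VAD_SPEECH) {
                // voice detected in this frame
            }
        }
        vad_destroy(vad);
    }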
AGC
---

.. _overview-1:

Overview
~~~~~~~~

AGC keeps the volume of the audio signal at a stable level, to avoid the signal being so loud that it gets clipped or so quiet that it fails to trigger the speech recognizer.

.. _api-reference-1:

API Reference
~~~~~~~~~~~~~

- ``void *esp_agc_open(int agc_mode, int sample_rate)``

  **Definition**

  Initialization of an AGC handle.

  **Parameter**

  - agc_mode: operating mode of AGC; 3 to enable AGC and 0 to disable it.
  - sample_rate: sampling rate of the audio signal.

  **Return**

  - AGC handle.

- ``int esp_agc_process(void *agc_handle, short *in_pcm, short *out_pcm, int frame_size, int sample_rate)``

  **Definition**

  Processing of AGC for one frame.

  **Parameter**

  - agc_handle: AGC handle.
  - in_pcm: input audio stream.
  - out_pcm: output audio stream.
  - frame_size: signal frame length in ms.
  - sample_rate: signal sampling rate in Hz.

  **Return**

  Returns 0 if AGC processing succeeds and -1 if it fails; -2 and -3 indicate an invalid sample_rate and frame_size input, respectively.

- ``void esp_agc_clse(void *agc_handle)``

  **Definition**

  Destruction of an AGC handle.

  **Parameter**

  - agc_handle: the AGC handle to be destroyed.
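A minimal usage sketch of the AGC calls; the header name ``esp_agc.h`` is an assumption, since this reference only lists the prototypes:

::

    #include <stdio.h>
    #include "esp_agc.h"   // assumed header name

    #define AGC_SAMPLE_RATE 16000
    #define AGC_FRAME_MS    10

    void agc_example(short *in_pcm, short *out_pcm)
    {
        void *agc = esp_agc_open(3, AGC_SAMPLE_RATE);   // 3 enables AGC, 0 disables it
        int ret = esp_agc_process(agc, in_pcm, out_pcm, AGC_FRAME_MS, AGC_SAMPLE_RATE);
        if (ret != 0) {
            // -1: processing failed, -2: invalid sample_rate, -3: invalid frame_size
            printf("AGC processing failed: %d\n", ret);
        }
        esp_agc_clse(agc);   // destroy the handle (function name as documented)
    }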
AEC
---

.. _overview-2:

Overview
~~~~~~~~

AEC suppresses the echo of the sound played by the speaker of the board.

.. _api-reference-2:

API Reference
~~~~~~~~~~~~~

- ``aec_handle_t aec_create(int sample_rate, int frame_length, int filter_length)``

  **Definition**

  Initialization of an AEC handle.

  **Parameter**

  - sample_rate: audio signal sampling rate.
  - frame_length: audio frame length in ms.
  - filter_length: the length of the adaptive filter in AEC.

  **Return**

  Handle to AEC.

- ``aec_create_t aec_create_multimic(int sample_rate, int frame_length, int filter_length, int nch)``

  **Definition**

  Initialization of an AEC handle (multi-mic version).

  **Parameter**

  - sample_rate: audio signal sampling rate.
  - frame_length: audio frame length in ms.
  - filter_length: the length of the adaptive filter in AEC.
  - nch: number of channels of the signal to be processed.

  **Return**

  Handle to AEC.

- ``void aec_process(aec_handle_t inst, int16_t *indata, int16_t *refdata, int16_t *outdata)``

  **Definition**

  Processing of AEC for one frame.

  **Parameter**

  - inst: AEC handle.
  - indata: input audio stream, which can be single- or multi-channel, depending on the channel number defined at initialization.
  - refdata: reference signal to be cancelled from the input.
  - outdata: output audio stream; the number of channels is the same as indata.

- ``void aec_destroy(aec_handle_t inst)``

  **Definition**

  Destruction of an AEC handle.

  **Parameter**

  - inst: the AEC handle to be destroyed.
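A minimal usage sketch of the single-mic AEC calls; the header name and the frame/filter length values are assumptions chosen only for illustration:

::

    #include <stdint.h>
    #include "esp_aec.h"   // assumed header name

    #define AEC_SAMPLE_RATE   16000
    #define AEC_FRAME_MS      16     // example frame length in ms
    #define AEC_FILTER_LENGTH 1600   // example adaptive filter length

    void aec_example(int16_t *mic_frame, int16_t *ref_frame, int16_t *out_frame)
    {
        aec_handle_t aec = aec_create(AEC_SAMPLE_RATE, AEC_FRAME_MS, AEC_FILTER_LENGTH);
        // cancel the reference (speaker) signal from the microphone input
        aec_process(aec, mic_frame, ref_frame, out_frame);
        aec_destroy(aec);
    }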
NS
--

.. _overview-3:

Overview
~~~~~~~~

Single-channel speech enhancement. If multiple mics are available on the board, MASE is recommended for noise suppression.

.. _api-reference-3:

API Reference
~~~~~~~~~~~~~

- ``ns_handle_t ns_pro_create(int frame_length, int mode)``

  **Definition**

  Creates an instance of the more powerful noise suppression algorithm.

  **Parameter**

  - frame_length: audio frame length in ms.
  - mode: 0: Mild, 1: Medium, 2: Aggressive

  **Return**

  Handle to NS.

- ``void ns_process(ns_handle_t inst, int16_t *indata, int16_t *outdata)``

  **Definition**

  Processing of NS for one frame.

  **Parameter**

  - inst: NS handle.
  - indata: input audio stream.
  - outdata: output audio stream.

- ``void ns_destroy(ns_handle_t inst)``

  **Definition**

  Destruction of an NS handle.

  **Parameter**

  - inst: the NS handle to be destroyed.
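A minimal usage sketch of the NS calls; the header name is an assumption, and 10 ms frames with the aggressive mode are used only as an example:

::

    #include <stdint.h>
    #include "esp_ns.h"   // assumed header name

    void ns_example(int16_t *in_frame, int16_t *out_frame)
    {
        ns_handle_t ns = ns_pro_create(10, 2);   // 10 ms frames, mode 2 (aggressive)
        ns_process(ns, in_frame, out_frame);
        ns_destroy(ns);
    }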
@@ -21,7 +21,6 @@ Based on years of hardware design and development experience, Loxin can provide
Wake word model <wake_word_engine/README>
Customized wake words <wake_word_engine/ESP_Wake_Words_Customization>
Speech commands <speech_command_recognition/README>
Acoustic algorithm introduction <acoustic_algorithm/README>
Model loading method <flash_model/README>
Microphone Design Guidelines <audio_front_end/Espressif_Microphone_Design_Guidelines>
Test Reports <test_report/README>
@@ -1,248 +0,0 @@
Acoustic Algorithm Introduction
===============================

:link_to_translation:`en:[English]`

The acoustic algorithms provided in esp-sr include voice activity detection (VAD), adaptive gain control (AGC), acoustic echo cancellation (AEC), noise suppression (NS), and mic-array speech enhancement (MASE). VAD, AGC, AEC, and NS support both single-mic and multi-mic development boards; MASE supports multi-mic boards only.

VAD
---

Overview
~~~~~~~~

VAD takes an audio stream as input and outputs a prediction of whether a given frame of the stream contains voice or not.

API Reference
~~~~~~~~~~~~~

Header
^^^^^^

- esp_vad.h

Function
^^^^^^^^

- ``vad_handle_t vad_create(vad_mode_t vad_mode)``

  **Definition**

  Initialization of a VAD handle.

  **Parameter**

  - vad_mode: operating mode of VAD, VAD_MODE_0 to VAD_MODE_4; a larger value indicates a more aggressive VAD.

  **Return**

  vad_handle_t

- ``vad_state_t vad_process(vad_handle_t inst, int16_t *data, int sample_rate_hz, int one_frame_ms);``

  **Definition**

  Processing of VAD for one frame.

  **Parameter**

  - inst: VAD handle.
  - data: buffer that holds both the input and output audio stream.
  - sample_rate_hz: the sampling frequency (Hz); can be 32000, 16000, or 8000, default: 16000.
  - one_frame_ms: the length of the audio frame to process; can be 10 ms, 20 ms, or 30 ms, default: 30.

  **Return**

  - VAD_SILENCE if no voice
  - VAD_SPEECH if voice is detected

- ``void vad_destroy(vad_handle_t inst)``

  **Definition**

  Destruction of a VAD handle.

  **Parameter**

  - inst: the VAD handle to be destroyed.

AGC
---

.. _overview-1:

Overview
~~~~~~~~

AGC keeps the volume of the audio signal at a stable level, to avoid the signal being so loud that it gets clipped or so quiet that it fails to trigger the speech recognizer.

.. _api-reference-1:

API Reference
~~~~~~~~~~~~~

- ``void *esp_agc_open(int agc_mode, int sample_rate)``

  **Definition**

  Initialization of an AGC handle.

  **Parameter**

  - agc_mode: operating mode of AGC; 3 enables AGC and 0 disables it.
  - sample_rate: sampling rate of the audio signal.

  **Return**

  - AGC handle.

- ``int esp_agc_process(void *agc_handle, short *in_pcm, short *out_pcm, int frame_size, int sample_rate)``

  **Definition**

  Processing of AGC for one frame.

  **Parameter**

  - agc_handle: AGC handle.
  - in_pcm: input audio stream.
  - out_pcm: output audio stream.
  - frame_size: signal frame length in ms.
  - sample_rate: signal sampling rate in Hz.

  **Return**

  - Returns 0 if AGC processing succeeds and -1 if it fails; -2 and -3 indicate an invalid sample rate and frame size input, respectively.

- ``void esp_agc_clse(void *agc_handle)``

  **Definition**

  Destruction of an AGC handle.

  **Parameter**

  - agc_handle: the AGC handle to be destroyed.

AEC
---

.. _overview-2:

Overview
~~~~~~~~

AEC suppresses the echo of the sound played by the speaker of the board.

.. _api-reference-2:

API Reference
~~~~~~~~~~~~~

- ``aec_handle_t aec_create(int sample_rate, int frame_length, int filter_length)``

  **Definition**

  Initialization of an AEC handle.

  **Parameter**

  - sample_rate: audio signal sampling rate.
  - frame_length: audio frame length in ms.
  - filter_length: the length of the adaptive filter in AEC.

  **Return**

  Handle to AEC.

- ``aec_create_t aec_create_multimic(int sample_rate, int frame_length, int filter_length, int nch)``

  **Definition**

  Initialization of an AEC handle (multi-mic version).

  **Parameter**

  - sample_rate: audio signal sampling rate.
  - frame_length: audio frame length in ms.
  - filter_length: the length of the adaptive filter in AEC.
  - nch: number of channels of the signal to be processed.

  **Return**

  Handle to AEC.

- ``void aec_process(aec_handle_t inst, int16_t *indata, int16_t *refdata, int16_t *outdata)``

  **Definition**

  Processing of AEC for one frame.

  **Parameter**

  - inst: AEC handle.
  - indata: input audio stream, which can be single- or multi-channel, depending on the channel number defined at initialization.
  - refdata: reference signal to be cancelled from the input.
  - outdata: output audio stream; the number of channels is the same as indata.

- ``void aec_destroy(aec_handle_t inst)``

  **Definition**

  Destruction of an AEC handle.

  **Parameter**

  - inst: the AEC handle to be destroyed.

NS
--

.. _overview-3:

Overview
~~~~~~~~

Single-channel speech enhancement. If multiple mics are available on the board, MASE is recommended for noise suppression.

.. _api-reference-3:

API Reference
~~~~~~~~~~~~~

- ``ns_handle_t ns_pro_create(int frame_length, int mode)``

  **Definition**

  Creates an instance of the more powerful noise suppression algorithm.

  **Parameter**

  - frame_length: audio frame length in ms.
  - mode: 0: Mild, 1: Medium, 2: Aggressive

  **Return**

  Handle to NS.

- ``void ns_process(ns_handle_t inst, int16_t *indata, int16_t *outdata)``

  **Definition**

  Processing of NS for one frame.

  **Parameter**

  - inst: NS handle.
  - indata: input audio stream.
  - outdata: output audio stream.

- ``void ns_destroy(ns_handle_t inst)``

  **Definition**

  Destruction of an NS handle.

  **Parameter**

  - inst: the NS handle to be destroyed.
@@ -164,236 +164,238 @@ WakeNet or Bypass Introduction

The audio output by AFE is single-channel data. In the speech recognition scenario, if WakeNet is enabled, AFE outputs the single-channel data that contains the target voice. In the voice communication scenario, it outputs the single-channel data with the higher signal-to-noise ratio.

Quick start
-----------

Define afe_handle
~~~~~~~~~~~~~~~~~~

``afe_handle`` is the function handle through which the user later calls the AFE interfaces, so the first step is to obtain ``afe_handle``.

- Speech recognition

  ::

      esp_afe_sr_iface_t *afe_handle = &ESP_AFE_SR_HANDLE;

- Voice communication

  ::

      esp_afe_sr_iface_t *afe_handle = &ESP_AFE_VC_HANDLE;

Configure AFE
~~~~~~~~~~~~~

Get the AFE configuration:

::

    afe_config_t afe_config = AFE_CONFIG_DEFAULT();

The enable switch and the parameters of each algorithm module can be adjusted through ``afe_config``; the fields are described in the list below, and a short configuration sketch follows the list:

::

    #define AFE_CONFIG_DEFAULT() { \
        .aec_init = true, \
        .se_init = true, \
        .vad_init = true, \
        .wakenet_init = true, \
        .voice_communication_init = false, \
        .voice_communication_agc_init = false, \
        .voice_communication_agc_gain = 15, \
        .vad_mode = VAD_MODE_3, \
        .wakenet_model_name = NULL, \
        .wakenet_mode = DET_MODE_2CH_90, \
        .afe_mode = SR_MODE_LOW_COST, \
        .afe_perferred_core = 0, \
        .afe_perferred_priority = 5, \
        .afe_ringbuf_size = 50, \
        .memory_alloc_mode = AFE_MEMORY_ALLOC_MORE_PSRAM, \
        .agc_mode = AFE_MN_PEAK_AGC_MODE_2, \
        .pcm_config.total_ch_num = 3, \
        .pcm_config.mic_num = 2, \
        .pcm_config.ref_num = 1, \
    }

- aec_init: whether the AEC algorithm is enabled.

- se_init: whether the BSS/NS algorithm is enabled.

- vad_init: whether VAD is enabled (only usable in the speech recognition scenario).

- wakenet_init: whether wake word detection is enabled.

- voice_communication_init: whether voice communication is enabled. It cannot be enabled together with wakenet_init.

- voice_communication_agc_init: whether AGC is enabled in voice communication.

- voice_communication_agc_gain: the gain of AGC, in dB.

- vad_mode: the operating mode of VAD detection; the larger the value, the more aggressive it is.

- wakenet_model_name: defaults to NULL in the macro ``AFE_CONFIG_DEFAULT()``. After selecting the wake word model with ``idf.py menuconfig``, assign the concrete model name (as a string) to this field before calling ``afe_handle->create_from_config``. For details about wake word models, see `flash_model <../flash_model/README_cn.md>`__ (note: the example code uses esp_srmodel_filter() to obtain the model name; if several models are selected to coexist in menuconfig, this function returns one of the model names at random).

- wakenet_mode: the wake-up mode, i.e. how many channels are used for wake-up; choose it according to the number of mic channels.

- afe_mode: Espressif AFE currently supports 2 operating modes: SR_MODE_LOW_COST and SR_MODE_HIGH_PERF. See the afe_sr_mode_t enum for details.

  - SR_MODE_LOW_COST: quantized version, which uses fewer resources.

  - SR_MODE_HIGH_PERF: non-quantized version, which uses more resources.

  **The ESP32 chip supports only SR_MODE_HIGH_PERF; the ESP32S3 chip supports both modes.**

- afe_perferred_core: the CPU core on which the internal BSS/NS/MISO algorithms of AFE run.

- afe_perferred_priority: the task priority at which the internal BSS/NS/MISO algorithms of AFE run.

- afe_ringbuf_size: the size of the internal ring buffer.

- memory_alloc_mode: the memory allocation mode. Three values can be configured:

  - AFE_MEMORY_ALLOC_MORE_INTERNAL: allocate more from internal RAM.

  - AFE_MEMORY_ALLOC_INTERNAL_PSRAM_BALANCE: allocate part from internal RAM.

  - AFE_MEMORY_ALLOC_MORE_PSRAM: allocate most from external PSRAM.

- agc_mode: the level to which the audio is linearly amplified. This setting takes effect in the speech recognition scenario and only when wake-up is enabled. Four values can be configured:

  - AFE_MN_PEAK_AGC_MODE_1: linearly amplify the audio fed to the subsequent MultiNet so that the peak is at -5 dB.

  - AFE_MN_PEAK_AGC_MODE_2: linearly amplify the audio fed to the subsequent MultiNet so that the peak is at -4 dB.

  - AFE_MN_PEAK_AGC_MODE_3: linearly amplify the audio fed to the subsequent MultiNet so that the peak is at -3 dB.

  - AFE_MN_PEAK_NO_AGC: no linear amplification.

- pcm_config: configure it according to the audio layout fed by ``afe->feed()``. Three member variables need to be set:

  - total_ch_num: the total number of audio channels, total_ch_num = mic_num + ref_num.

  - mic_num: the number of microphone channels. Currently only 1 or 2 is supported.

  - ref_num: the number of reference (loopback) channels. Currently only 0 or 1 is supported.
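A short configuration sketch; the values below are examples only, and ``wn_name`` is assumed to come from ``esp_srmodel_filter()`` as described for wakenet_model_name:

::

    afe_config_t afe_config = AFE_CONFIG_DEFAULT();
    afe_config.wakenet_model_name = wn_name;   // model name string selected earlier
    afe_config.aec_init = false;               // example: no reference (loopback) channel on this board
    afe_config.pcm_config.total_ch_num = 2;
    afe_config.pcm_config.mic_num = 2;
    afe_config.pcm_config.ref_num = 0;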
Create afe_data
~~~~~~~~~~~~~~~~

The user obtains the data handle with ``afe_handle->create_from_config(&afe_config)``; it is used internally by AFE, and the argument passed in is the configuration obtained in step 2 above.

::

    /**
     * @brief Function to initialze a AFE_SR instance
     *
     * @param afe_config        The config of AFE_SR
     * @returns Handle to the AFE_SR data
     */
    typedef esp_afe_sr_data_t* (*esp_afe_sr_iface_op_create_from_config_t)(afe_config_t *afe_config);

Feed audio data
~~~~~~~~~~~~~~~~

After AFE initialization is complete, the user feeds audio data into AFE for processing with the ``afe_handle->feed()`` function.

For the size and layout of the input audio, refer to the **Input audio** step.

::

    /**
     * @brief Feed samples of an audio stream to the AFE_SR
     *
     * @Warning  The input data should be arranged in the format of channel interleaving.
     *           The last channel is reference signal if it has reference data.
     *
     * @param afe   The AFE_SR object to query
     *
     * @param in    The input microphone signal, only support signed 16-bit @ 16 KHZ. The frame size can be queried by the
     *              `get_feed_chunksize`.
     * @return      The size of input
     */
    typedef int (*esp_afe_sr_iface_op_feed_t)(esp_afe_sr_data_t *afe, const int16_t* in);

Get the number of audio channels:

``afe_handle->get_total_channel_num()`` returns the total number of data channels that must be passed to ``afe_handle->feed()``. Its return value equals ``pcm_config.mic_num + pcm_config.ref_num`` configured in AFE_CONFIG_DEFAULT().

::

    /**
     * @brief Get the total channel number which be config
     *
     * @param afe   The AFE_SR object to query
     * @return      The amount of total channels
     */
    typedef int (*esp_afe_sr_iface_op_get_total_channel_num_t)(esp_afe_sr_data_t *afe);

Fetch audio data
~~~~~~~~~~~~~~~~

The user calls ``afe_handle->fetch()`` to obtain the processed single-channel audio and the related processing information.

The number of data samples fetched (the sample data type is int16) can be obtained via ``afe_handle->get_fetch_chunksize``.

::

    /**
     * @brief Get the amount of each channel samples per frame that need to be passed to the function
     *
     * Every speech enhancement AFE_SR processes a certain number of samples at the same time. This function
     * can be used to query that amount. Note that the returned amount is in 16-bit samples, not in bytes.
     *
     * @param afe   The AFE_SR object to query
     * @return      The amount of samples to feed the fetch function
     */
    typedef int (*esp_afe_sr_iface_op_get_samp_chunksize_t)(esp_afe_sr_data_t *afe);

The declaration of ``afe_handle->fetch()`` is as follows:

::

    /**
     * @brief fetch enhanced samples of an audio stream from the AFE_SR
     *
     * @Warning  The output is single channel data, no matter how many channels the input is.
     *
     * @param afe   The AFE_SR object to query
     * @return      The result of output, please refer to the definition of `afe_fetch_result_t`. (The frame size of output audio can be queried by the `get_fetch_chunksize`.)
     */
    typedef afe_fetch_result_t* (*esp_afe_sr_iface_op_fetch_t)(esp_afe_sr_data_t *afe);

Its return value is a pointer to a structure defined as follows:

::

    /**
     * @brief The result of fetch function
     */
    typedef struct afe_fetch_result_t
    {
        int16_t *data;            // the data of audio.
        int data_size;            // the size of data. The unit is byte.
        int wakeup_state;         // the value is wakenet_state_t
        int wake_word_index;      // if the wake word is detected. It will store the wake word index which start from 1.
        int vad_state;            // the value is afe_vad_state_t
        int trigger_channel_id;   // the channel index of output
        int wake_word_length;     // the length of wake word. It's unit is the number of samples.
        int ret_value;            // the return state of fetch function
        void* reserved;           // reserved for future use
    } afe_fetch_result_t;
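A minimal feed/fetch sketch based on the interfaces above. ``read_i2s_frame()`` is a hypothetical audio source, and checking ``wakeup_state`` against ``WAKENET_DETECTED`` assumes the corresponding ``wakenet_state_t`` value; in a real application, feed() and fetch() usually run in two separate tasks:

::

    esp_afe_sr_data_t *afe_data = afe_handle->create_from_config(&afe_config);

    int feed_chunk = afe_handle->get_feed_chunksize(afe_data);     // samples per channel per feed()
    int ch_num     = afe_handle->get_total_channel_num(afe_data);  // mic_num + ref_num
    int16_t *feed_buf = malloc(feed_chunk * ch_num * sizeof(int16_t));

    while (1) {
        read_i2s_frame(feed_buf, feed_chunk * ch_num);   // channel-interleaved input
        afe_handle->feed(afe_data, feed_buf);

        afe_fetch_result_t *res = afe_handle->fetch(afe_data);
        if (res && res->wakeup_state == WAKENET_DETECTED) {
            // wake word detected; res->data holds the enhanced single-channel audio
        }
    }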
Using WakeNet
~~~~~~~~~~~~~

When the user needs to perform other operations after wake-up, such as offline or online speech recognition, WakeNet can be paused to reduce CPU load.

The user can call ``afe_handle->disable_wakenet(afe_data)`` to stop WakeNet, and call ``afe_handle->enable_wakenet(afe_data)`` again to re-enable it once the subsequent application has finished.

In addition, the ESP32S3 chip supports wake word switching (note: the ESP32 chip supports only one wake word and cannot switch). After AFE initialization, the ESP32S3 can switch the wake word through the ``set_wakenet()`` function; for example, ``afe_handle->set_wakenet(afe_data, "wn9_hilexin")`` switches to the "Hi Lexin" wake word. For how to configure multiple wake words, see `flash_model <../flash_model/README_CN.md>`__
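A small sketch of the pattern described above, assuming the wake word has just been reported by fetch():

::

    // pause WakeNet while running follow-up recognition, then resume it
    afe_handle->disable_wakenet(afe_data);

    // ... run MultiNet or other processing here ...

    afe_handle->enable_wakenet(afe_data);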
Using AEC
~~~~~~~~~
@@ -153,18 +153,20 @@ ESP32S3 supports:

- Custom path: if the user wants to place the models in a specific folder, the ``get_model_base_path()`` function in ``ESP-SR_PATH/model/model_path.c`` can be modified accordingly. For example, to use the ``espmodel`` directory on the SD card, change the function to:

  ::

      char *get_model_base_path(void)
      {
      #if defined CONFIG_MODEL_IN_SDCARD
          return "sdcard/espmodel";
      #elif defined CONFIG_MODEL_IN_SPIFFS
          return "srmodel";
      #else
          return NULL;
      #endif
      }

- Initialize the SD card
@@ -172,39 +174,41 @@ ESP32S3 supports:

After completing the steps above, the project can be flashed.

Model initialization and usage in code
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

    //
    // step1: initialize spiffs and return models in spiffs
    //
    srmodel_list_t *models = esp_srmodel_init();

    //
    // step2: select the specific model by keywords
    //
    char *wn_name = esp_srmodel_filter(models, ESP_WN_PREFIX, NULL);               // select wakenet model
    char *mn_name = esp_srmodel_filter(models, ESP_MN_PREFIX, NULL);               // select multinet model
    char *alexa_wn_name = esp_srmodel_filter(models, ESP_WN_PREFIX, "alexa");      // select wakenet with "alexa" wake word.
    char *en_mn_name = esp_srmodel_filter(models, ESP_MN_PREFIX, ESP_MN_ENGLISH);  // select english multinet model
    char *cn_mn_name = esp_srmodel_filter(models, ESP_MN_PREFIX, ESP_MN_CHINESE);  // select chinese multinet model

    // It also works if you use the model name directly in your code.
    char *my_wn_name = "wn9_hilexin";
    // we recommend you to check that it is loaded correctly
    if (!esp_srmodel_exists(models, my_wn_name))
        printf("%s can not be loaded correctly\n", my_wn_name);

    //
    // step3: initialize model
    //
    esp_wn_iface_t *wakenet = esp_wn_handle_from_name(wn_name);
    model_iface_data_t *wn_model_data = wakenet->create(wn_name, DET_MODE_2CH_90);

    esp_mn_iface_t *multinet = esp_mn_handle_from_name(mn_name);
    model_iface_data_t *mn_model_data = multinet->create(mn_name, 6000);

.. |select wake wake| image:: ../../_static/wn_menu1.png
.. |multi wake wake| image:: ../../_static/wn_menu2.png
@@ -24,7 +24,6 @@ ESP-SR User Guide

Wake word model <wake_word_engine/README>
Customized wake words <wake_word_engine/ESP_Wake_Words_Customization>
Speech commands <speech_command_recognition/README>
Acoustic algorithm introduction <acoustic_algorithm/README>
Model loading method <flash_model/README>
Microphone Design Guidelines <audio_front_end/Espressif_Microphone_Design_Guidelines>
Test Reports <test_report/README>
@@ -85,6 +85,11 @@ MultiNet places no restriction on how command words are customized; users can use any

**We also provide a corresponding tool that converts Chinese characters to pinyin; for details, see:** `English-to-phoneme tool <../../tool/multinet_g2p.py>`__.

.. only:: latex

   .. figure:: ../../_static/QR_multinet_g2p.png
      :alt: menuconfig_add_speech_commands

Setting command words offline
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -138,18 +138,21 @@

Wake-up rate test
------------------

+-------------------------+-------------------+-----------------+---------+-------------+-------------+--------------+------------------+
| Test item               | Ambient noise     | Noise level     | SNR     | Angle       | Distance    | Wake-up rate | Recognition rate |
+=========================+===================+=================+=========+=============+=============+==============+==================+
| Local wake-up rate test | Quiet             | Speech: 59 dBA  | NA      | Speech: 90° | Speech: 3 m | 99%          | 91.5%            |
|                         |                   |                 |         |             |             |              |                  |
|                         |                   | Noise: NA       |         | Noise: 45°  | Noise: 2 m  |              |                  |
|                         +-------------------+-----------------+---------+             |             +--------------+------------------+
|                         | White noise       | Speech: 59 dBA  | ≥4 dBA  |             |             | 99%          | 78.25%           |
|                         |                   |                 |         |             |             |              |                  |
|                         |                   | Noise: 55 dBA   |         |             |             |              |                  |
|                         +-------------------+-----------------+---------+             |             +--------------+------------------+
|                         | Speech-like noise | Speech: 59 dBA  | ≥4 dBA  |             |             | 99%          | 82.77%           |
|                         |                   |                 |         |             |             |              |                  |
|                         |                   | Noise: 55 dBA   |         |             |             |              |                  |
+-------------------------+-------------------+-----------------+---------+-------------+-------------+--------------+------------------+

False wake-up test
------------------
@@ -168,11 +171,11 @@

+---------------------------+---------------+----------------+----------+--------------+--------------------------+
| Test item                 | Ambient noise | Noise level    | SNR      | Wake-up rate | Command recognition rate |
+===========================+===============+================+==========+==============+==========================+
| Wake-up interruption test | Music         | Speech: 59 dBA | ≥ 10 dBA | 100%         | 96%                      |
|                           |               | Noise: 69 dBA  |          |              |                          |
|                           +---------------+----------------+----------+--------------+--------------------------+
|                           | TTS           | Speech: 59 dBA | ≥ 10 dBA | 100%         | 96%                      |
|                           |               | Noise: 69 dBA  |          |              |                          |
+---------------------------+---------------+----------------+----------+--------------+--------------------------+

Response time test
------------------
@@ -181,8 +184,8 @@

+--------------------+---------------+----------------+-----+---------------+
| Test item          | Ambient noise | Noise level    | SNR | Response time |
+====================+===============+================+=====+===============+
| Response time test | Quiet         | Speech: 59 dBA | NA  | <500 ms       |
|                    |               | Noise: NA      |     |               |
+--------------------+---------------+----------------+-----+---------------+

.. figure:: ../../_static/test_response_time.png
@@ -24,6 +24,11 @@ The flow diagram of WakeNet is as follows:

- Speech Features:
  We use the `MFCC <https://en.wikipedia.org/wiki/Mel-frequency_cepstrum>`__ method to extract speech spectral features. The input audio is sampled at 16 kHz, mono, and encoded as signed 16-bit. The window width and step of each frame are both 30 ms.

.. only:: latex

   .. figure:: ../../_static/QR_MFCC.png
      :alt: overview

- Neural Network:
  The neural network architecture has been updated to version 9, where:

@@ -31,6 +36,11 @@ The flow diagram of WakeNet is as follows:

- wakeNet5 is used on the ESP32 chip.
- wakeNet8 and wakeNet9 are used on the ESP32S3 chip; these models are based on the `Dilated Convolution <https://arxiv.org/pdf/1609.03499.pdf>`__ structure.

.. only:: latex

   .. figure:: ../../_static/QR_Dilated_Convolution.png
      :alt: overview

Note that WakeNet5, WakeNet5X2, and WakeNet5X3 share the same network structure, but WakeNet5X2 and WakeNet5X3 have more parameters than WakeNet5. Please refer to `Performance test <#性能测试>`__ for more details.

- Keyword Trigger Method:

@@ -71,9 +81,7 @@ Using WakeNet

- Running WakeNet

  WakeNet is currently included in the audio front-end algorithm `AFE <../audio_front_end/README_CN.md>`__; it runs by default and returns the detection result through the AFE fetch interface.

If the user does not need to initialize WakeNet, choose the following during AFE configuration: