esp-sr/docs/zh_CN/wake_word_engine/README.rst
liying@espressif.com fa9cfb1849 testing
2022-11-23 20:01:24 +08:00

108 lines
4.8 KiB
ReStructuredText
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

WakeNet
========
:link_to_translation:`en:[English]`
WakeNet是一个基于神经网络为低功耗嵌入式MCU设计的的唤醒词模型目前支持5个以内的唤醒词识别。
Overview
--------
WakeNet的流程图如下
.. figure:: ../../_static/wakenet_workflow.png
:alt: overview
.. raw:: html
<center>
.. raw:: html
</center>
- Speech Features
我们使用 `MFCC <https://en.wikipedia.org/wiki/Mel-frequency_cepstrum>`__ 方法提取语音频谱特征。输入的音频文件采样率为16KHz单声道编码方式为signed 16-bit。每帧窗宽和步长均为30ms。
.. only:: latex
.. figure:: ../../_static/QR_MFCC.png
:alt: overview
- Neural Network
神经网络结构已经更新到第9版其中
- wakeNet1,wakeNet2,wakeNet3,wakeNet4已经停止使用。
- wakeNet5应用于ESP32芯片。
- wakeNet8和wakeNet9应用于ESP32S3芯片模型基于 `Dilated Convolution <https://arxiv.org/pdf/1609.03499.pdf>`__ 结构。
.. only:: latex
.. figure:: ../../_static/QR_Dilated_Convolution.png
:alt: overview
注意WakeNet5,WakeNet5X2 和 WakeNet5X3 的网络结构一致,但是 WakeNet5X2 和 WakeNet5X3 的参数比 WakeNet5 要多。请参考 `性能测试 <#性能测试>`__ 来获取更多细节。
- Keyword Trigger Method
对连续的音频流为准确判断关键词的触发我们通过计算若干帧内识别结果的平均值M来判断触发。当M大于大于指定阈值发出触发的命令。
以下表格展示在不同芯片上的模型支持:
+-----------------+-----------+-------------+-------------+-----------+-----------+-----------+-----------+
| Chip | ESP32 | ESP32S3 |
+=================+===========+=============+=============+===========+===========+===========+===========+
| model | WakeNet 5 | WakeNet 8 | WakeNet 9 |
| +-----------+-------------+-------------+-----------+-----------+-----------+-----------+
| | WakeNet 5 | WakeNet 5X2 | WakeNet 5X3 | Q16 | Q8 | Q16 | Q8 |
+-----------------+-----------+-------------+-------------+-----------+-----------+-----------+-----------+
| Hi,Lexin | √ | √ | √ | | | | √ |
+-----------------+-----------+-------------+-------------+-----------+-----------+-----------+-----------+
| nihaoxiaozhi | √ | | √ | | | | √ |
+-----------------+-----------+-------------+-------------+-----------+-----------+-----------+-----------+
| nihaoxiaoxin | | | √ | | | | |
+-----------------+-----------+-------------+-------------+-----------+-----------+-----------+-----------+
| xiaoaitongxue | | | | | | | √ |
+-----------------+-----------+-------------+-------------+-----------+-----------+-----------+-----------+
| Alexa | | | | √ | | | √ |
+-----------------+-----------+-------------+-------------+-----------+-----------+-----------+-----------+
| Hi,ESP | | | | | | | √ |
+-----------------+-----------+-------------+-------------+-----------+-----------+-----------+-----------+
| Customized word | | | | | | | √ |
+-----------------+-----------+-------------+-------------+-----------+-----------+-----------+-----------+
WakeNet使用
-----------
- WakeNet 模型选择
WakeNet 模型选择请参考 `flash model 介绍 <../flash_model/README_CN.md>`__
自定义的唤醒词,请参考 `乐鑫语音唤醒词定制流程 <乐鑫语音唤醒词定制流程.md>`__
- WakeNet 运行
WakeNet 目前包含在语音前端算法 `AFE <../audio_front_end/README_CN.md>`__中,默认为运行状态,并将识别结果通过 AFE fetch 接口返回。
如果用户不需要初始化 WakeNet请在 AFE 配置时选择:
::
afe_config.wakenet_init = False.
如果用户想临时关闭/打开 WakeNet, 请在运行过程中调用:
::
afe_handle->disable_wakenet(afe_data)
afe_handle->enable_wakenet(afe_data)
性能测试
--------
具体请参考 `Performance Test <../performance_test/README.md>`__
唤醒词定制
----------
如果需要定制唤醒词,请参考 `乐鑫语音唤醒词定制流程 <乐鑫语音唤醒词定制流程.md>`__