ESP Chinese TTS

Espressif Chinese TTS is a lightweight TTS system designed for embedded systems.

Overview

The Chinese TTS is based on the concatenative method. The system flow is shown in the diagram below:

[Figure: Chinese TTS system flow diagram]

  • Parser: a Chinese grapheme-to-phoneme module that takes UTF-8 text as input and outputs a list of Chinese pinyin.
  • Synthesizer: a concatenative synthesizer that takes the pinyin list as input and outputs raw waveform data. By default, the raw data is mono, 16-bit, 16000 Hz.
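The Synthesizer's output is headerless PCM. When dumping it to a file to listen to on a PC, it can help to prepend a standard WAV header. The sketch below is not part of esp_tts: wav_header_fill and the TTS_* constants are hypothetical names, and it assumes the default output format stated above (mono, 16-bit, 16000 Hz) and the canonical 44-byte RIFF/WAVE PCM header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed TTS output format (default per the Synthesizer description). */
#define TTS_SAMPLE_RATE 16000u
#define TTS_CHANNELS    1u
#define TTS_BITS        16u

/* Store little-endian 16-/32-bit values, as required by RIFF. */
static void put_le16(uint8_t *buf, uint16_t v) {
    buf[0] = (uint8_t)(v & 0xff);
    buf[1] = (uint8_t)(v >> 8);
}
static void put_le32(uint8_t *buf, uint32_t v) {
    for (int i = 0; i < 4; i++) buf[i] = (uint8_t)(v >> (8 * i));
}

/* Hypothetical helper: fill a 44-byte WAV header for num_samples PCM samples. */
void wav_header_fill(uint8_t hdr[44], uint32_t num_samples) {
    uint32_t block_align = TTS_CHANNELS * TTS_BITS / 8;   /* bytes per frame: 2 */
    uint32_t byte_rate   = TTS_SAMPLE_RATE * block_align; /* 32000 bytes/s */
    assert(num_samples <= (UINT32_MAX - 36u) / block_align);
    uint32_t data_bytes  = num_samples * block_align;

    memcpy(hdr, "RIFF", 4);
    put_le32(hdr + 4, 36u + data_bytes);     /* size of everything after this field */
    memcpy(hdr + 8, "WAVE", 4);
    memcpy(hdr + 12, "fmt ", 4);
    put_le32(hdr + 16, 16u);                 /* fmt chunk size for plain PCM */
    put_le16(hdr + 20, 1u);                  /* audio format 1 = PCM */
    put_le16(hdr + 22, (uint16_t)TTS_CHANNELS);
    put_le32(hdr + 24, TTS_SAMPLE_RATE);
    put_le32(hdr + 28, byte_rate);
    put_le16(hdr + 32, (uint16_t)block_align);
    put_le16(hdr + 34, (uint16_t)TTS_BITS);
    memcpy(hdr + 36, "data", 4);
    put_le32(hdr + 40, data_bytes);          /* PCM payload size in bytes */
}
```

Write the header once the total sample count is known (or patch the two size fields afterwards), then append the samples returned by the synthesizer unchanged.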

Features

  • UTF-8 encoded text input

  • Streaming output

  • Pronunciation of polyphonic characters (characters with multiple readings)

  • Adjustable speech rate

  • Optimized reading of numbers (digit broadcasting)

  • Customizable voice set
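Because the Parser expects UTF-8 input, an application that receives text from outside (for example over a network) may want to sanity-check it before calling esp_tts_parse_chinese(). The following is a minimal sketch, not part of esp_tts: utf8_count_codepoints is a hypothetical helper that counts code points and rejects structurally malformed sequences. For brevity it skips overlong, surrogate, and out-of-range checks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count Unicode code points in a NUL-terminated UTF-8
 * string. Returns -1 on a bad lead byte, a missing continuation byte, or a
 * sequence truncated by the terminating NUL. */
long utf8_count_codepoints(const char *s) {
    const unsigned char *p = (const unsigned char *)s;
    long count = 0;
    assert(s != NULL);
    while (*p) {
        int extra;
        if (*p < 0x80)                extra = 0; /* ASCII */
        else if ((*p & 0xE0) == 0xC0) extra = 1; /* 2-byte sequence */
        else if ((*p & 0xF0) == 0xE0) extra = 2; /* 3-byte, covers common CJK */
        else if ((*p & 0xF8) == 0xF0) extra = 3; /* 4-byte sequence */
        else return -1;                          /* stray continuation or invalid byte */
        p++;
        for (int i = 0; i < extra; i++, p++) {
            /* Continuation bytes look like 10xxxxxx; the NUL also fails here. */
            if ((*p & 0xC0) != 0x80) return -1;
        }
        count++;
    }
    return count;
}
```

For the sample text 欢迎使用乐鑫语音合成 used in the User Guide, this returns 10, while strlen() reports 30 bytes, since each of these CJK characters occupies 3 bytes in UTF-8.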

Performance Test

Resource Usage

Flash image size: 2.2 MB

Runtime RAM: 20 KB

CPU loading test (ESP32 @ 240 MHz):

Speech rate                    0     1     2     3     4     5
Times faster than real time    4.5   3.2   2.9   2.5   2.2   1.8

Note: a larger rate value gives faster speech. 0 is the slowest speaking speed and 5 the fastest.
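Assuming "times faster than real time" means the duration of the generated audio divided by the time spent synthesizing it, the table translates directly into a CPU budget. The sketch below hard-codes the measured ESP32 figures from the table; estimate_synthesis_seconds and estimate_cpu_load are hypothetical names, and the results are estimates derived from the table, not additional measurements.

```c
#include <assert.h>

/* "Times faster than real time" on ESP32 @ 240 MHz, indexed by speech rate 0..5
 * (copied from the CPU loading table). */
static const double kSpeedup[6] = {4.5, 3.2, 2.9, 2.5, 2.2, 1.8};

/* Estimated synthesis time in seconds for audio_seconds of output speech,
 * assuming speedup = audio duration / synthesis time. */
double estimate_synthesis_seconds(double audio_seconds, int rate) {
    assert(rate >= 0 && rate <= 5);
    return audio_seconds / kSpeedup[rate];
}

/* Estimated fraction of the benchmarked CPU kept busy while audio is
 * streamed out in real time: 1 / speedup. */
double estimate_cpu_load(int rate) {
    assert(rate >= 0 && rate <= 5);
    return 1.0 / kSpeedup[rate];
}
```

For example, 9 s of speech at rate 0 would need roughly 9 / 4.5 = 2 s of synthesis time, a load of about 22% while playing back in real time; at rate 5 the load rises to about 1 / 1.8, or 56%.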


User Guide

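#include <stdio.h>
#include "esp_tts.h"
#include "esp_tts_voice_female.h"
#include "esp_partition.h"
#include "driver/i2s.h"

/*** 1. create esp tts handle ***/

//// Method 1: use the pre-defined voice library (xiaole) linked into the app.
//// Not recommended: the voice data may make the app binary exceed the ESP32 app size limit.
// esp_tts_handle_t *tts_handle=esp_tts_create(esp_tts_voice_female);

// Method 2: initialize the voice set from a separate "voice_data" flash partition
const esp_partition_t* part=esp_partition_find_first(ESP_PARTITION_TYPE_DATA, ESP_PARTITION_SUBTYPE_DATA_FAT, "voice_data");
if (part==NULL) {
	printf("Couldn't find voice data partition!\n");
	return;   // abort initialization
}
spi_flash_mmap_handle_t mmap;
uint16_t* voicedata;
// map the whole partition (instead of a fixed 3 MB, which may exceed its size)
esp_err_t err=esp_partition_mmap(part, 0, part->size, SPI_FLASH_MMAP_DATA, (const void**)&voicedata, &mmap);
if (err!=ESP_OK) {
	printf("Couldn't map voice data partition!\n");
	return;
}
esp_tts_voice_t *voice=esp_tts_voice_set_init(&esp_tts_voice_template, voicedata);
esp_tts_handle_t *tts_handle=esp_tts_create(voice);   // handle used by the calls below

// 2. parse text and synthesize wave data
const char *text="欢迎使用乐鑫语音合成";   // "Welcome to Espressif speech synthesis"
if (esp_tts_parse_chinese(tts_handle, text)) {  // parse text into a pinyin list
	int len[1]={0};
	do {
		// streaming synthesis at speech rate 4; len[0] receives the number of 16-bit samples
		short *data=esp_tts_stream_play(tts_handle, len, 4);
		// board-specific I2S playback helper (not part of esp_tts); size in bytes = samples * 2
		i2s_audio_play(data, len[0]*2, portMAX_DELAY);
	} while(len[0]>0);
	i2s_zero_dma_buffer(0);   // clear the DMA buffer of I2S port 0 after playback
}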
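The streaming loop above relies on two conventions visible in the code: len[0] holds the number of 16-bit samples in the returned chunk (hence the len[0]*2 byte count), and a value of 0 ends the loop. The host-side sketch below reproduces that pattern without an ESP32. mock_stream_play is a hypothetical stand-in for esp_tts_stream_play(); it hands out a fixed 1000-sample buffer in 256-sample chunks, a chunk size chosen arbitrarily.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_TOTAL_SAMPLES 1000u
#define FAKE_CHUNK_SAMPLES 256u

static short g_fake_pcm[FAKE_TOTAL_SAMPLES];  /* silence; content is irrelevant */
static size_t g_fake_pos = 0;

/* HYPOTHETICAL stand-in for esp_tts_stream_play(): returns the next chunk and
 * sets *len to its sample count, or to 0 once everything has been delivered. */
static short *mock_stream_play(int *len) {
    size_t left = FAKE_TOTAL_SAMPLES - g_fake_pos;
    size_t n = left < FAKE_CHUNK_SAMPLES ? left : FAKE_CHUNK_SAMPLES;
    short *chunk = &g_fake_pcm[g_fake_pos];
    g_fake_pos += n;
    *len = (int)n;
    return chunk;
}

/* Drain the stream with the same do/while pattern as the guide and return
 * the total number of bytes that would be sent to the audio output. */
size_t drain_stream(void) {
    size_t total_bytes = 0;
    int len[1] = {0};
    do {
        short *data = mock_stream_play(len);
        assert(len[0] >= 0);
        (void)data;                        /* on the device: pass data to the I2S output */
        total_bytes += (size_t)len[0] * 2; /* 16-bit samples: 2 bytes each */
    } while (len[0] > 0);
    return total_bytes;
}
```

Here drain_stream() returns 2000: 1000 samples at 2 bytes each, delivered as chunks of 256, 256, 256 and 232 samples followed by a final zero-length call that terminates the loop.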

Please refer to esp_tts.h for API details, or to the chinese_tts example in esp-skainet.