update repo
commit 4227722165
parent d5fa81d4dc
@@ -1,7 +1,11 @@
# Speech Recognition

Here we take "training a Paraformer model from scratch on the AISHELL-1 dataset" as an example to introduce how to use FunASR. Following this example, users can similarly train other models (such as Conformer or Transformer) on other datasets (such as AISHELL-2).

## Overall Introduction

## Quick Start

## Introduction

We provide a recipe `egs/aishell/paraformer/run.sh` for training a Paraformer model on the AISHELL-1 dataset. The recipe consists of five stages and supports training on multiple GPUs as well as decoding on either CPU or GPU. Before introducing each stage in detail, we first explain several parameters that users should set.

- `CUDA_VISIBLE_DEVICES`: the list of visible GPUs
- `gpu_num`: the number of GPUs used for training

@@ -14,7 +18,7 @@ We provide a recipe `egs/aishell/paraformer/run.sh` for training a paraformer mo

- `exp_dir`: the path for saving experimental results
- `tag`: the suffix of the experimental result directory

-## Stage 0: Data preparation
+### Stage 0: Data preparation

This stage processes the raw AISHELL-1 dataset `$raw_data` and generates the corresponding `wav.scp` and `text` files in `$feats_dir/data/xxx`, where `xxx` is one of `train/dev/test`. Here we assume users have already downloaded the AISHELL-1 dataset; if not, the data can be downloaded [here](https://www.openslr.org/33/) and `$raw_data` set to its path. Examples of `wav.scp` and `text` are as follows:

* `wav.scp`

```
@@ -32,10 +36,10 @@ BAC009S0002W0124 自 六 月 底 呼 和 浩 特 市 率 先 宣 布 取 消 限
```

Both files have two columns: the first column contains the wav IDs, and the second contains the corresponding wav paths (in `wav.scp`) or label tokens (in `text`).
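
For illustration only, here is a minimal Python sketch of parsing such two-column files into a dict; the helper name `load_scp` is hypothetical and not part of the recipe:

```
def load_scp(path):
    """Parse a Kaldi-style two-column file (wav.scp or text) into a dict.

    Each line is "<wav-id> <rest-of-line>"; the first whitespace separates
    the id from the wav path (wav.scp) or the token sequence (text).
    """
    table = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            wav_id, value = line.rstrip("\n").split(maxsplit=1)
            table[wav_id] = value
    return table

# wavs = load_scp("data/train/wav.scp")   # wav-id -> wav path
# texts = load_scp("data/train/text")     # wav-id -> space-separated tokens
```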

-## Stage 1: Feature and CMVN Generation
+### Stage 1: Feature and CMVN Generation

This stage computes CMVN statistics on the `train` set, which are used in the following stages. Users can set `nj` to control the number of parallel jobs for computing CMVN. The generated CMVN file is saved as `$feats_dir/data/train/cmvn/cmvn.mvn`.
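
For intuition, global CMVN boils down to accumulating per-dimension mean and variance statistics over all training frames. Below is a minimal NumPy sketch; the recipe's own tooling computes the statistics and writes `cmvn.mvn` in its own file format:

```
import numpy as np

def compute_cmvn(feature_arrays):
    """Accumulate per-dimension mean/variance over a list of [T, D] arrays."""
    dim = feature_arrays[0].shape[1]
    total_sum = np.zeros(dim)
    total_sq = np.zeros(dim)
    count = 0
    for feats in feature_arrays:
        total_sum += feats.sum(axis=0)
        total_sq += (feats ** 2).sum(axis=0)
        count += feats.shape[0]
    mean = total_sum / count
    std = np.sqrt(np.maximum(total_sq / count - mean ** 2, 1e-20))
    return mean, std

# At training time, features are normalized as (feats - mean) / std.
```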

-## Stage 2: Dictionary Preparation
+### Stage 2: Dictionary Preparation

This stage prepares the dictionary, which is used as a mapping between label characters and integer indices during ASR training. The processed dictionary file is saved as `$feats_dir/data/$lang_token_list/$token_type/tokens.txt`. An example of `tokens.txt` is as follows (a sketch of how such a list can be built appears after the token descriptions below):

* `tokens.txt`

```
@@ -54,7 +58,7 @@ This stage processes the dictionary, which is used as a mapping between label ch
```

* `</s>`: indicates the end-of-sentence token
* `<unk>`: indicates the out-of-vocabulary token
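
For intuition, building such a character dictionary amounts to collecting every token in the training `text` file and letting each token's line number serve as its integer index. A minimal sketch follows; the placement of the special tokens here is illustrative only, as the recipe's own script defines the actual layout of `tokens.txt`:

```
def build_token_list(text_path, out_path):
    """Collect every token in the label file; line number = integer index."""
    vocab = set()
    with open(text_path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split(maxsplit=1)
            if len(parts) == 2:
                vocab.update(parts[1].split())
    with open(out_path, "w", encoding="utf-8") as f:
        # Special-token placement is illustrative, not the recipe's layout.
        for token in sorted(vocab) + ["</s>", "<unk>"]:
            f.write(token + "\n")
```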

-## Stage 3: Training
+### Stage 3: Training

This stage trains the specified model. To start training, users should manually set `exp_dir`, `CUDA_VISIBLE_DEVICES`, and `gpu_num`, which have been explained above. By default, the best `$keep_nbest_models` checkpoints on the validation set are averaged to obtain a better model, which is then adopted for decoding.
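
For intuition, checkpoint averaging is just a parameter-wise mean over the selected state dicts. A minimal PyTorch sketch follows; it assumes flat state dicts of floating-point tensors, while the recipe's own averaging script also handles checkpoint selection by validation score:

```
import torch

def average_checkpoints(paths):
    """Parameter-wise mean over several checkpoints of the same model.

    Assumes each file holds a flat state dict of floating-point tensors;
    integer buffers would need special handling in practice.
    """
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].float()
    return {k: v / len(paths) for k, v in avg.items()}
```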

* DDP Training

@@ -80,7 +84,7 @@ Users can use tensorboard to observe the loss, learning rate, etc. Please run th

```
tensorboard --logdir ${exp_dir}/exp/${model_dir}/tensorboard/train
```

-## Stage 4: Decoding
+### Stage 4: Decoding

This stage generates the recognition results and computes the `CER` (character error rate) to evaluate the performance of the trained model.
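
For reference, `CER` is the character-level Levenshtein distance (substitutions, insertions, deletions) between hypothesis and reference, divided by the reference length. A minimal dynamic-programming sketch follows; actual scoring tools may differ in text normalization:

```
def cer(ref, hyp):
    """Character error rate: Levenshtein distance / reference length."""
    m, n = len(ref), len(hyp)
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n] / max(m, 1)

# cer("构建良好的旅游市场环境", "构建良好的旅游市场环境") == 0.0
```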

* Mode Selection

@@ -107,3 +111,4 @@ res: 构 建 良 好 的 旅 游 市 场 环 境

```
...
```

## Change settings

@@ -173,8 +173,8 @@ class ESPnetDataset(Dataset):
                         raise RuntimeError(f"{k} is duplicated ({path}:{linenum})")
                     text_loader[k] = v
             return text_loader
-        elif loader_type == "text_in":
-            text_in_loader = {}
+        elif loader_type == "text_int":
+            text_int_loader = {}
             with open(path, "r", encoding="utf-8") as f:
                 for linenum, line in enumerate(f, 1):
                     sps = line.rstrip().split(maxsplit=1)
@@ -182,10 +182,10 @@ class ESPnetDataset(Dataset):
                         k, v = sps[0], ""
                     else:
                         k, v = sps
-                    if k in text_in_loader:
+                    if k in text_int_loader:
                         raise RuntimeError(f"{k} is duplicated ({path}:{linenum})")
-                    text_in_loader[k] = [int(i) for i in v.split()]
-            return text_in_loader
+                    text_int_loader[k] = [int(i) for i in v.split()]
+            return text_int_loader
         else:
             raise RuntimeError(f"Not supported: loader_type={loader_type}")