FunASR/funasr/frontends/whisper_frontend.py
Last commit d19f48e174 "Dev gzf exp" (#1593) by zhifu gao, 2024-04-08 18:51:53 +08:00 · 116 lines · 3.7 KiB · Python

from typing import Tuple

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

from funasr.register import tables


@tables.register("frontend_classes", "WhisperFrontend")
class WhisperFrontend(nn.Module):
    """Speech Representation Using Encoder Outputs from OpenAI's Whisper Model:
    URL: https://github.com/openai/whisper
    """

    def __init__(
        self,
        fs: int = 16000,
        whisper_model: str = None,
        do_pad_trim: bool = True,
        n_mels: int = 80,
        permute: bool = False,
        **kwargs,
    ):
        super().__init__()
        assert fs == 16000
        self.fs = fs

        import whisper
        from whisper.audio import HOP_LENGTH, N_FFT, N_SAMPLES

        # framing parameters follow Whisper's defaults
        self.n_fft = N_FFT
        self.win_length = N_FFT
        self.hop_length = HOP_LENGTH
        self.pad_samples = N_SAMPLES
        self.frame_shift = self.hop_length
        self.lfr_n = 1
        self.n_mels = n_mels
        # the large / large-v3 checkpoints use 128 mel bins instead of 80
        if whisper_model == "large-v3" or whisper_model == "large":
            self.n_mels = 128

        filters_path = kwargs.get("filters_path", None)
        self.filters_path = filters_path
        if filters_path is not None:
            from funasr.models.sense_voice.whisper_lib.audio import mel_filters

            self.mel_filters = mel_filters
        else:
            self.mel_filters = whisper.audio.mel_filters

        self.do_pad_trim = do_pad_trim
        if do_pad_trim:
            self.pad_or_trim = whisper.pad_or_trim
        self.permute = permute
        # assert whisper_model in whisper.available_models()

    def output_size(self) -> int:
        return self.n_mels

    def log_mel_spectrogram(
        self,
        audio: torch.Tensor,
        ilens: torch.Tensor = None,
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        window = torch.hann_window(self.win_length).to(audio.device)
        stft = torch.stft(
            audio, self.n_fft, self.hop_length, window=window, return_complex=True
        )
        # whisper deletes the last frame by default (Shih-Lun)
        magnitudes = stft[..., :-1].abs() ** 2

        if self.filters_path is not None:
            filters = self.mel_filters(audio.device, self.n_mels, self.filters_path)
        else:
            filters = self.mel_filters(audio.device, self.n_mels)
        mel_spec = filters @ magnitudes

        log_spec = torch.clamp(mel_spec, min=1e-10).log10()

        if ilens is not None:
            olens = ilens // self.hop_length
        else:
            olens = None

        # Whisper-style dynamic-range compression: clip to 8 dB below the peak, then rescale
        log_spec = torch.maximum(
            log_spec,
            log_spec.view(audio.size(0), -1).max(dim=-1)[0][:, None, None] - 8.0,
        )
        log_spec = (log_spec + 4.0) / 4.0

        return log_spec, olens
    def forward(
        self,
        input: torch.Tensor,
        input_lengths: torch.Tensor,
        **kwargs,
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        batch_size = input.size(0)
        feats = []
        feats_lens = []
        input = input.to(torch.float32)
        for i in range(batch_size):
            if self.do_pad_trim:
                feat = self.pad_or_trim(input[i], self.pad_samples)
            else:
                feat = input[i]
            # compute features and the frame count from the i-th utterance's own length
            feat, feat_len = self.log_mel_spectrogram(feat[None, :], input_lengths[i])
            feats.append(feat[0])
            feats_lens.append(feat_len)

        feats_lens = torch.as_tensor(feats_lens)
        if batch_size == 1:
            feats_pad = feats[0][None, :, :]
        else:
            feats_pad = pad_sequence(feats, batch_first=True, padding_value=0.0)
        if self.permute:
            feats_pad = feats_pad.permute(0, 2, 1)

        return feats_pad, feats_lens
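
For context, a minimal usage sketch (not part of the file above): it instantiates the frontend and runs a two-utterance batch through forward. It assumes the openai-whisper package is installed and that inputs are 16 kHz waveforms; the shapes and values are illustrative only.

# usage_sketch.py -- illustrative only, assumes `pip install openai-whisper`
import torch

from funasr.frontends.whisper_frontend import WhisperFrontend

frontend = WhisperFrontend(fs=16000, do_pad_trim=True, n_mels=80)

# two utterances of 1 s and 2 s at 16 kHz, zero-padded to a common length
waveforms = torch.zeros(2, 32000)
waveforms[0, :16000] = torch.randn(16000)
waveforms[1, :32000] = torch.randn(32000)
lengths = torch.tensor([16000, 32000])

feats, feat_lens = frontend(waveforms, lengths)
# with do_pad_trim=True every item is padded/trimmed to 30 s before the STFT,
# so feats is (batch, n_mels, n_frames); feat_lens holds per-utterance frame counts
print(feats.shape, feat_lens)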