mirror of
https://github.com/modelscope/FunASR
synced 2025-09-15 14:48:36 +08:00
* update * update with main (#1582) * update * Expose the max_end_silence_time to the user (#1532) * update * finetune * fix: resolve IndexError when using spk model and the audio contains only 1 segment (#1535) * install requirements automatically * v1.0.19 * train * docs * update * train update * bugfix seg_dict_file * train * train (#1548) * Dev gzf new (#1551) * train * <funasr>: <punc online> (#1552) 1. Fix the issue where the first and second English words were incorrectly merged when adding punctuation. Co-authored-by: carl.che <carl.che@cloudminds.com> * Dev gzf new (#1553) * train * Dev gzf new (#1554) * train * Dev gzf new (#1555) * train * Fix the issue introduced by commit 87b62d6895 where whole-sentence English punctuation prediction deleted the space between the last two words. (#1556) * <funasr>: <punc online> 1. Fix the issue where the first and second English words were incorrectly merged when adding punctuation. * <funasr>: <punc online> 1. Fix the issue introduced by commit 87b62d6895 where whole-sentence English punctuation prediction deleted the space between the last two words. --------- Co-authored-by: carl.che <carl.che@cloudminds.com> * Dev gzf new (#1557) * train * Dev gzf new (#1559) * train * Dev gzf new (#1561) * train * Dev gzf new (#1562) * train * Dev gzf new (#1567) * train * whisper_lib for sense voice * aishell recipe * sense voice (#1568) * train * whisper_lib for sense voice * aishell recipe * sense voice * Dev gzf new (#1574) * train * whisper_lib for sense voice * aishell recipe * sense voice * docs * bugfix (#1580) * train * whisper_lib for sense voice * aishell recipe * sense voice * docs * bugfix * v1.0.20 --------- Co-authored-by: BOBOTANG <tzfjobmail@gmail.com> Co-authored-by: Atomie CHEN <atomic_cwh@163.com> Co-authored-by: Carl <415692979@qq.com> Co-authored-by: carl.che <carl.che@cloudminds.com> * ctc * update with main (#1592) * update * Expose the max_end_silence_time to the user (#1532) * update * finetune * fix: resolve IndexError when using spk model and the audio contains only 1 segment (#1535) * install requirements automatically * v1.0.19 * train * docs * update * train update * bugfix seg_dict_file * train * train (#1548) * Dev gzf new (#1551) * train * <funasr>: <punc online> (#1552) 1. Fix the issue where the first and second English words were incorrectly merged when adding punctuation. Co-authored-by: carl.che <carl.che@cloudminds.com> * Dev gzf new (#1553) * train * Dev gzf new (#1554) * train * Dev gzf new (#1555) * train * Fix the issue introduced by commit 87b62d6895 where whole-sentence English punctuation prediction deleted the space between the last two words. (#1556) * <funasr>: <punc online> 1. Fix the issue where the first and second English words were incorrectly merged when adding punctuation. * <funasr>: <punc online> 1. Fix the issue introduced by commit 87b62d6895 where whole-sentence English punctuation prediction deleted the space between the last two words. --------- Co-authored-by: carl.che <carl.che@cloudminds.com> * Dev gzf new (#1557) * train * Dev gzf new (#1559) * train * Dev gzf new (#1561) * train * Dev gzf new (#1562) * train * Dev gzf new (#1567) * train * whisper_lib for sense voice * aishell recipe * sense voice (#1568) * train * whisper_lib for sense voice * aishell recipe * sense voice * Dev gzf new (#1574) * train * whisper_lib for sense voice * aishell recipe * sense voice * docs * bugfix (#1580) * train * whisper_lib for sense voice * aishell recipe * sense voice * docs * bugfix * v1.0.20 * update demo page (#1585) * commit web page vue * optimize web page * remove other private component * modify web page * Update index.vue * Update lxwjzxfw.vue * Update sstx.vue * update static file --------- Co-authored-by: BOBOTANG <tzfjobmail@gmail.com> Co-authored-by: Atomie CHEN <atomic_cwh@163.com> Co-authored-by: Carl <415692979@qq.com> Co-authored-by: carl.che <carl.che@cloudminds.com> Co-authored-by: bltcn <blt@tom.com> * sensevoice --------- Co-authored-by: BOBOTANG <tzfjobmail@gmail.com> Co-authored-by: Atomie CHEN <atomic_cwh@163.com> Co-authored-by: Carl <415692979@qq.com> Co-authored-by: carl.che <carl.che@cloudminds.com> Co-authored-by: bltcn <blt@tom.com>
116 lines
3.7 KiB
Python
from typing import Optional, Tuple

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

from funasr.register import tables


@tables.register("frontend_classes", "WhisperFrontend")
class WhisperFrontend(nn.Module):
    """Log-Mel spectrogram frontend that reproduces OpenAI Whisper's feature extraction.

    URL: https://github.com/openai/whisper
    """

    def __init__(
        self,
        fs: int = 16000,
        whisper_model: Optional[str] = None,
        do_pad_trim: bool = True,
        n_mels: int = 80,
        permute: bool = False,
        **kwargs,
    ):
        super().__init__()
        # Whisper's feature extraction is defined only for 16 kHz audio.
        assert fs == 16000, "WhisperFrontend only supports 16 kHz audio"
        self.fs = fs
        import whisper
        from whisper.audio import HOP_LENGTH, N_FFT, N_SAMPLES

        self.n_fft = N_FFT
        self.win_length = N_FFT
        self.hop_length = HOP_LENGTH
        self.pad_samples = N_SAMPLES  # 30 s of audio at 16 kHz
        self.frame_shift = self.hop_length
        self.lfr_n = 1
        self.n_mels = n_mels
        # The "large-v3" / "large" checkpoints expect 128 Mel bins instead of 80.
        if whisper_model == "large-v3" or whisper_model == "large":
            self.n_mels = 128

        # With an explicit filterbank file, use FunASR's bundled whisper_lib loader;
        # otherwise use the filterbank shipped with the whisper package.
        filters_path = kwargs.get("filters_path", None)
        self.filters_path = filters_path
        if filters_path is not None:
            from funasr.models.sense_voice.whisper_lib.audio import mel_filters

            self.mel_filters = mel_filters
        else:
            self.mel_filters = whisper.audio.mel_filters
        self.do_pad_trim = do_pad_trim
        if do_pad_trim:
            self.pad_or_trim = whisper.pad_or_trim
        self.permute = permute

        # assert whisper_model in whisper.available_models()

    def output_size(self) -> int:
        return self.n_mels

    def log_mel_spectrogram(
        self,
        audio: torch.Tensor,
        ilens: Optional[torch.Tensor] = None,
    ) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:
        """Compute Whisper-style log-Mel features for a (batch, samples) waveform."""
        window = torch.hann_window(self.win_length).to(audio.device)
        stft = torch.stft(
            audio, self.n_fft, self.hop_length, window=window, return_complex=True
        )

        # whisper deletes the last frame by default (Shih-Lun)
        magnitudes = stft[..., :-1].abs() ** 2
        if self.filters_path is not None:
            filters = self.mel_filters(audio.device, self.n_mels, self.filters_path)
        else:
            filters = self.mel_filters(audio.device, self.n_mels)
        mel_spec = filters @ magnitudes

        log_spec = torch.clamp(mel_spec, min=1e-10).log10()

        if ilens is not None:
            olens = ilens // self.hop_length
        else:
            olens = None

        # Clip to at most 80 dB (8 log10 units) below each utterance's peak, then rescale.
        log_spec = torch.maximum(
            log_spec,
            log_spec.view(audio.size(0), -1).max(dim=-1)[0][:, None, None] - 8.0,
        )
        log_spec = (log_spec + 4.0) / 4.0

        return log_spec, olens

    def forward(
        self, input: torch.Tensor, input_lengths: torch.Tensor, **kwargs,
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        batch_size = input.size(0)
        feats = []
        feats_lens = []
        input = input.to(torch.float32)
        for i in range(batch_size):
            if self.do_pad_trim:
                feat = self.pad_or_trim(input[i], self.pad_samples)
            else:
                feat = input[i]
            # Each utterance uses its own length to compute its output length.
            feat, feat_len = self.log_mel_spectrogram(feat[None, :], input_lengths[i])
            feats.append(feat[0])  # (n_mels, T_i)
            feats_lens.append(feat_len)
        feats_lens = torch.as_tensor(feats_lens)

        if batch_size == 1:
            feats_pad = feats[0][None, :, :]
        else:
            # pad_sequence pads the first dimension, so pad along time as (T_i, n_mels)
            # and transpose back to (batch, n_mels, T_max).
            feats_pad = pad_sequence(
                [feat.transpose(0, 1) for feat in feats],
                batch_first=True,
                padding_value=0.0,
            ).transpose(1, 2)
        if self.permute:
            feats_pad = feats_pad.permute(0, 2, 1)
        return feats_pad, feats_lens
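With `do_pad_trim=True`, `forward` pads or trims every utterance to `N_SAMPLES` before the STFT, so the number of feature frames is fixed while `olens` still follows the unpadded length (`ilens // hop_length`). The arithmetic can be checked with a small pure-Python sketch. It assumes the constants currently defined in `whisper.audio` (16 kHz audio, `N_FFT = 400`, `HOP_LENGTH = 160`, 30 s buffers) and the default `center=True` padding of `torch.stft`; the helper names are hypothetical.

```python
# Assumed values of the constants in whisper.audio (openai-whisper):
# 16 kHz audio, 400-sample FFT window, 160-sample hop, 30-second buffers.
SAMPLE_RATE = 16_000
N_FFT = 400
HOP_LENGTH = 160
N_SAMPLES = 30 * SAMPLE_RATE  # 480000


def stft_frames(num_samples: int) -> int:
    """Frames from torch.stft with center=True (signal padded by N_FFT // 2 on both sides)."""
    padded = num_samples + 2 * (N_FFT // 2)
    return 1 + (padded - N_FFT) // HOP_LENGTH


def frontend_frames(num_samples: int) -> int:
    """Frames kept by log_mel_spectrogram, which drops the last STFT frame."""
    return stft_frames(num_samples) - 1


def frontend_olens(num_samples: int) -> int:
    """Length reported for an utterance: ilens // hop_length."""
    return num_samples // HOP_LENGTH


# With do_pad_trim=True every utterance becomes N_SAMPLES long.
print(stft_frames(N_SAMPLES))      # 3001
print(frontend_frames(N_SAMPLES))  # 3000
# A 5-second utterance padded to 30 s still yields 3000 frames, but olens reports 500.
print(frontend_olens(5 * SAMPLE_RATE))  # 500
```

Because `olens` is computed from the untrimmed length, an utterance longer than 30 s reports more frames than the 3000 that the trimmed buffer actually produces.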
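`log_mel_spectrogram` compresses the dynamic range in three steps: take `log10` with a `1e-10` floor, clip every value more than 8 log10 units (80 dB) below the utterance's peak, then map the result with `(x + 4) / 4`. The sketch below replays those steps on a plain nested list so the numbers are easy to follow. `whisper_log_mel_normalize` and the input grid are hypothetical and not part of FunASR or whisper.

```python
import math
from typing import List


def whisper_log_mel_normalize(mel_energies: List[List[float]]) -> List[List[float]]:
    """Whisper-style dynamic-range compression for one utterance.

    mel_energies is a hypothetical (n_mels x frames) grid of Mel power values.
    """
    # 1) log10 with a 1e-10 floor, mirroring torch.clamp(mel_spec, min=1e-10).log10()
    log_spec = [[math.log10(max(v, 1e-10)) for v in row] for row in mel_energies]
    # 2) keep values at most 8 log10 units (80 dB) below this utterance's peak
    peak = max(max(row) for row in log_spec)
    log_spec = [[max(v, peak - 8.0) for v in row] for row in log_spec]
    # 3) shift and scale: (x + 4) / 4
    return [[(v + 4.0) / 4.0 for v in row] for row in log_spec]


example = [[1.0, 1e-12], [10000.0, 10.0]]
print(whisper_log_mel_normalize(example))  # [[1.0, 0.0], [2.0, 1.25]]
```

Here the peak is `log10(10000) = 4`, so the near-silent `1e-12` bin is lifted from -10 to the floor of -4 before rescaling, which is why it ends up at `0.0`.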
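When `do_pad_trim=False`, the per-utterance features `feat[0]` have shape `(n_mels, T_i)` with different `T_i`. `torch.nn.utils.rnn.pad_sequence` pads only the first dimension and needs the remaining sizes to agree, so the time axis has to be moved to the front before padding; passing the `(n_mels, T_i)` tensors directly fails once the lengths differ. A minimal sketch with random, hypothetical features:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical feature maps shaped (n_mels, T_i), as produced per utterance
# when do_pad_trim=False, so the frame counts differ.
n_mels = 80
feats = [torch.randn(n_mels, 120), torch.randn(n_mels, 95)]

# pad_sequence pads the first dimension and requires the rest to match, so move
# time to the front, pad, then restore the (batch, n_mels, T_max) layout.
feats_pad = pad_sequence(
    [f.transpose(0, 1) for f in feats], batch_first=True, padding_value=0.0
).transpose(1, 2)

print(feats_pad.shape)  # torch.Size([2, 80, 120])
# The shorter utterance is zero-padded at the end of the time axis.
print(torch.count_nonzero(feats_pad[1, :, 95:]).item())  # 0
```

Transposing keeps the output layout identical to the single-utterance branch (`feats[0][None, :, :]`), so `permute` behaves the same for both batch sizes.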