mirror of
https://github.com/modelscope/FunASR
synced 2025-09-15 14:48:36 +08:00
update m2met docs
This commit is contained in:
parent
97ed4fada4
commit
5a48c1cb7f
@@ -1,5 +1,5 @@
# Organizers
***Lei Xie, Professor, Northwestern Polytechnical University, China***
***Lei Xie, Professor, AISHELL foundation, China***
Email: [lxie@nwpu.edu.cn](mailto:lxie@nwpu.edu.cn)
Binary file not shown.
Binary file not shown.
@@ -124,7 +124,7 @@
<section id="organizers">
<h1>Organizers<a class="headerlink" href="#organizers" title="Permalink to this heading">¶</a></h1>
<p><em><strong>Lei Xie, Professor, Northwestern Polytechnical University, China</strong></em></p>
<p><em><strong>Lei Xie, Professor, AISHELL foundation, China</strong></em></p>
<p>Email: <a class="reference external" href="mailto:lxie%40nwpu.edu.cn">lxie<span>@</span>nwpu<span>.</span>edu<span>.</span>cn</a></p>
<a class="reference internal image-reference" href="_images/lxie.jpeg"><img alt="lxie" src="_images/lxie.jpeg" style="width: 20%;" /></a>
<p><em><strong>Kong Aik Lee, Senior Scientist at Institute for Infocomm Research, A*Star, Singapore</strong></em></p>
@@ -1,5 +1,5 @@
# Organizers
***Lei Xie, Professor, Northwestern Polytechnical University, China***
***Lei Xie, Professor, AISHELL foundation, China***
Email: [lxie@nwpu.edu.cn](mailto:lxie@nwpu.edu.cn)
File diff suppressed because one or more lines are too long
Binary file not shown.
Binary file not shown.
@@ -4,7 +4,7 @@
To advance speech recognition in the meeting scenario, many related challenges have been held, such as the Rich Transcription evaluation and the CHIME (Computational Hearing in Multisource Environments) challenge. The latest CHIME challenge focuses on distant automatic speech recognition and on building systems that generalize across arrays of different topologies and across application scenarios. However, differences between languages have limited progress on non-English meeting transcription. The MISP (Multimodal Information Based Speech Processing) and M2MeT (Multi-Channel Multi-Party Meeting Transcription) challenges have helped advance Mandarin meeting-scenario speech recognition: MISP focuses on audio-visual multimodal approaches to distant multi-microphone signal processing in everyday home environments, while M2MeT focuses on handling overlapping speech when transcribing meetings in offline meeting rooms.

The ASSP2022 M2MeT challenge focused on the meeting scenario and comprised two tracks: speaker diarization and multi-speaker automatic speech recognition. The former involves determining "who spoke when", while the latter aims to recognize speech from multiple speakers simultaneously, where overlapping speech and various kinds of noise pose great technical difficulty.

The IASSP2022 M2MeT challenge focused on the meeting scenario and comprised two tracks: speaker diarization and multi-speaker automatic speech recognition. The former involves determining "who spoke when", while the latter aims to recognize speech from multiple speakers simultaneously, where overlapping speech and various kinds of noise pose great technical difficulty.
|
||||
|
||||
Building on the success of the previous M2MeT challenge, we will continue to hold the M2MeT 2.0 challenge at ASRU2023. In the previous M2MeT challenge the evaluation metric was speaker-independent: only the recognized text could be obtained, without determining the corresponding speaker.
To address this limitation and to push current multi-speaker speech recognition systems toward practical use, the M2MeT 2.0 challenge will be evaluated on a speaker-dependent task and will offer two sub-tracks, one with constrained training data and one with unconstrained training data. By attributing speech to specific speakers, this task aims to improve the accuracy and applicability of multi-speaker ASR systems in real-world environments.
@@ -1 +1 @@
Search.setIndex({"docnames": ["index", "\u57fa\u7ebf", "\u6570\u636e\u96c6", "\u7b80\u4ecb", "\u7ec4\u59d4\u4f1a", "\u8054\u7cfb\u65b9\u5f0f", "\u89c4\u5219", "\u8d5b\u9053\u8bbe\u7f6e\u4e0e\u8bc4\u4f30"], "filenames": ["index.rst", "\u57fa\u7ebf.md", "\u6570\u636e\u96c6.md", "\u7b80\u4ecb.md", "\u7ec4\u59d4\u4f1a.md", "\u8054\u7cfb\u65b9\u5f0f.md", "\u89c4\u5219.md", "\u8d5b\u9053\u8bbe\u7f6e\u4e0e\u8bc4\u4f30.md"], "titles": ["ASRU 2023 \u591a\u901a\u9053\u591a\u65b9\u4f1a\u8bae\u8f6c\u5f55\u6311\u6218 2.0", "\u57fa\u7ebf", "\u6570\u636e\u96c6", "\u7b80\u4ecb", "\u7ec4\u59d4\u4f1a", "\u8054\u7cfb\u65b9\u5f0f", "\u7ade\u8d5b\u89c4\u5219", "\u8d5b\u9053\u8bbe\u7f6e\u4e0e\u8bc4\u4f30"], "terms": {"m2met": [0, 3, 5, 7], "asru2023": [0, 3], "m2met2": [0, 3, 5, 7], "funasr": 1, "sa": 1, "asr": [1, 3, 7], "speakerencod": 1, "modelscop": [1, 7], "todo": 1, "fill": 1, "with": 1, "the": 1, "readm": 1, "md": 1, "of": 1, "baselin": [1, 2], "aishel": [2, 7], "cn": [2, 4, 7], "celeb": [2, 7], "test": [2, 6, 7], "2023": [2, 3, 6, 7], "118": 2, "75": 2, "104": 2, "train": 2, "eval": [2, 6], "10": [2, 3, 7], "212": 2, "15": [2, 3], "30": 2, "456": 2, "25": 2, "13": [2, 3], "55": 2, "42": 2, "27": 2, "34": 2, "76": 2, "20": 2, "textgrid": 2, "id": 2, "openslr": 2, "automat": 3, "speech": 3, "recognit": 3, "speaker": 3, "diariz": 3, "rich": 3, "transcript": 3, "evalu": 3, "chime": 3, "comput": 3, "hear": 3, "in": 3, "multisourc": 3, "environ": 3, "misp": 3, "multimod": 3, "inform": 3, "base": 3, "process": 3, "multi": 3, "channel": 3, "parti": 3, "meet": 3, "assp2022": 3, "29": 3, "19": 3, "12": 3, "asru": 3, "workshop": 3, "challeng": 3, "session": 3, "lxie": 4, "nwpu": 4, "edu": 4, "kong": 4, "aik": 4, "lee": 4, "star": 4, "kongaik": 4, "ieee": 4, "org": 4, "zhiji": 4, "yzj": 4, "alibaba": 4, "inc": 4, "com": [4, 5], "sli": 4, "zsl": 4, "yanminqian": 4, "sjtu": 4, "zhuc": 4, "microsoft": 4, "wujian": 4, "ceo": 4, "buhui": 4, "aishelldata": 4, "alimeet": [5, 7], "gmail": 5, 
"cpcer": [6, 7], "las": 6, "rnnt": 6, "transform": 6, "aishell4": 7, "vad": 7, "cer": 7, "ins": 7, "sub": 7, "del": 7, "text": 7, "frac": 7, "mathcal": 7, "n_": 7, "total": 7, "time": 7, "100": 7, "hug": 7, "face": 7}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"asru": 0, "2023": 0, "alimeet": 2, "aoe": 3}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 57}, "alltitles": {"ASRU 2023 \u591a\u901a\u9053\u591a\u65b9\u4f1a\u8bae\u8f6c\u5f55\u6311\u6218 2.0": [[0, "asru-2023-2-0"]], "\u76ee\u5f55:": [[0, null]], "\u57fa\u7ebf": [[1, "id1"]], "\u57fa\u7ebf\u6982\u8ff0": [[1, "id2"]], "\u5feb\u901f\u5f00\u59cb": [[1, "id3"]], "\u57fa\u7ebf\u7ed3\u679c": [[1, "id4"]], "\u6570\u636e\u96c6": [[2, "id1"]], "\u6570\u636e\u96c6\u6982\u8ff0": [[2, "id2"]], "Alimeeting\u6570\u636e\u96c6\u4ecb\u7ecd": [[2, "alimeeting"]], "\u83b7\u53d6\u6570\u636e": [[2, "id3"]], "\u7b80\u4ecb": [[3, "id1"]], "\u7ade\u8d5b\u4ecb\u7ecd": [[3, "id2"]], "\u65f6\u95f4\u5b89\u6392(AOE\u65f6\u95f4)": [[3, "aoe"]], "\u7ade\u8d5b\u62a5\u540d": [[3, "id3"]], "\u7ec4\u59d4\u4f1a": [[4, "id1"]], "\u8054\u7cfb\u65b9\u5f0f": [[5, "id1"]], "\u7ade\u8d5b\u89c4\u5219": [[6, "id1"]], "\u8d5b\u9053\u8bbe\u7f6e\u4e0e\u8bc4\u4f30": [[7, "id1"]], "\u8bf4\u8bdd\u4eba\u76f8\u5173\u7684\u8bed\u97f3\u8bc6\u522b": [[7, "id2"]], "\u8bc4\u4f30\u65b9\u6cd5": [[7, "id3"]], "\u5b50\u8d5b\u9053\u8bbe\u7f6e": [[7, "id4"]], "\u5b50\u8d5b\u9053\u4e00 (\u9650\u5b9a\u8bad\u7ec3\u6570\u636e):": [[7, "id5"]], "\u5b50\u8d5b\u9053\u4e8c (\u5f00\u653e\u8bad\u7ec3\u6570\u636e):": [[7, "id6"]]}, "indexentries": {}})
Search.setIndex({"docnames": ["index", "\u57fa\u7ebf", "\u6570\u636e\u96c6", "\u7b80\u4ecb", "\u7ec4\u59d4\u4f1a", "\u8054\u7cfb\u65b9\u5f0f", "\u89c4\u5219", "\u8d5b\u9053\u8bbe\u7f6e\u4e0e\u8bc4\u4f30"], "filenames": ["index.rst", "\u57fa\u7ebf.md", "\u6570\u636e\u96c6.md", "\u7b80\u4ecb.md", "\u7ec4\u59d4\u4f1a.md", "\u8054\u7cfb\u65b9\u5f0f.md", "\u89c4\u5219.md", "\u8d5b\u9053\u8bbe\u7f6e\u4e0e\u8bc4\u4f30.md"], "titles": ["ASRU 2023 \u591a\u901a\u9053\u591a\u65b9\u4f1a\u8bae\u8f6c\u5f55\u6311\u6218 2.0", "\u57fa\u7ebf", "\u6570\u636e\u96c6", "\u7b80\u4ecb", "\u7ec4\u59d4\u4f1a", "\u8054\u7cfb\u65b9\u5f0f", "\u7ade\u8d5b\u89c4\u5219", "\u8d5b\u9053\u8bbe\u7f6e\u4e0e\u8bc4\u4f30"], "terms": {"m2met": [0, 3, 5, 7], "asru2023": [0, 3], "m2met2": [0, 3, 5, 7], "funasr": 1, "sa": 1, "asr": [1, 3, 7], "speakerencod": 1, "modelscop": [1, 7], "todo": 1, "fill": 1, "with": 1, "the": 1, "readm": 1, "md": 1, "of": 1, "baselin": [1, 2], "aishel": [2, 7], "cn": [2, 4, 7], "celeb": [2, 7], "test": [2, 6, 7], "2023": [2, 3, 6, 7], "118": 2, "75": 2, "104": 2, "train": 2, "eval": [2, 6], "10": [2, 3, 7], "212": 2, "15": [2, 3], "30": 2, "456": 2, "25": 2, "13": [2, 3], "55": 2, "42": 2, "27": 2, "34": 2, "76": 2, "20": 2, "textgrid": 2, "id": 2, "openslr": 2, "automat": 3, "speech": 3, "recognit": 3, "speaker": 3, "diariz": 3, "rich": 3, "transcript": 3, "evalu": 3, "chime": 3, "comput": 3, "hear": 3, "in": 3, "multisourc": 3, "environ": 3, "misp": 3, "multimod": 3, "inform": 3, "base": 3, "process": 3, "multi": 3, "channel": 3, "parti": 3, "meet": 3, "iassp2022": 3, "29": 3, "19": 3, "12": 3, "asru": 3, "workshop": 3, "challeng": 3, "session": 3, "lxie": 4, "nwpu": 4, "edu": 4, "kong": 4, "aik": 4, "lee": 4, "star": 4, "kongaik": 4, "ieee": 4, "org": 4, "zhiji": 4, "yzj": 4, "alibaba": 4, "inc": 4, "com": [4, 5], "sli": 4, "zsl": 4, "yanminqian": 4, "sjtu": 4, "zhuc": 4, "microsoft": 4, "wujian": 4, "ceo": 4, "buhui": 4, "aishelldata": 4, "alimeet": [5, 7], "gmail": 5, 
"cpcer": [6, 7], "las": 6, "rnnt": 6, "transform": 6, "aishell4": 7, "vad": 7, "cer": 7, "ins": 7, "sub": 7, "del": 7, "text": 7, "frac": 7, "mathcal": 7, "n_": 7, "total": 7, "time": 7, "100": 7, "hug": 7, "face": 7}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"asru": 0, "2023": 0, "alimeet": 2, "aoe": 3}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 57}, "alltitles": {"ASRU 2023 \u591a\u901a\u9053\u591a\u65b9\u4f1a\u8bae\u8f6c\u5f55\u6311\u6218 2.0": [[0, "asru-2023-2-0"]], "\u76ee\u5f55:": [[0, null]], "\u57fa\u7ebf": [[1, "id1"]], "\u57fa\u7ebf\u6982\u8ff0": [[1, "id2"]], "\u5feb\u901f\u5f00\u59cb": [[1, "id3"]], "\u57fa\u7ebf\u7ed3\u679c": [[1, "id4"]], "\u6570\u636e\u96c6": [[2, "id1"]], "\u6570\u636e\u96c6\u6982\u8ff0": [[2, "id2"]], "Alimeeting\u6570\u636e\u96c6\u4ecb\u7ecd": [[2, "alimeeting"]], "\u83b7\u53d6\u6570\u636e": [[2, "id3"]], "\u7b80\u4ecb": [[3, "id1"]], "\u7ade\u8d5b\u4ecb\u7ecd": [[3, "id2"]], "\u65f6\u95f4\u5b89\u6392(AOE\u65f6\u95f4)": [[3, "aoe"]], "\u7ade\u8d5b\u62a5\u540d": [[3, "id3"]], "\u7ec4\u59d4\u4f1a": [[4, "id1"]], "\u8054\u7cfb\u65b9\u5f0f": [[5, "id1"]], "\u7ade\u8d5b\u89c4\u5219": [[6, "id1"]], "\u8d5b\u9053\u8bbe\u7f6e\u4e0e\u8bc4\u4f30": [[7, "id1"]], "\u8bf4\u8bdd\u4eba\u76f8\u5173\u7684\u8bed\u97f3\u8bc6\u522b": [[7, "id2"]], "\u8bc4\u4f30\u65b9\u6cd5": [[7, "id3"]], "\u5b50\u8d5b\u9053\u8bbe\u7f6e": [[7, "id4"]], "\u5b50\u8d5b\u9053\u4e00 (\u9650\u5b9a\u8bad\u7ec3\u6570\u636e):": [[7, "id5"]], "\u5b50\u8d5b\u9053\u4e8c (\u5f00\u653e\u8bad\u7ec3\u6570\u636e):": [[7, "id6"]]}, "indexentries": {}})
@@ -130,7 +130,7 @@
<h2>Challenge Introduction<a class="headerlink" href="#id2" title="Permalink to this heading">¶</a></h2>
<p>Recent advances in speech processing technologies such as automatic speech recognition (ASR) and speaker diarization have enabled a wide range of intelligent speech applications. However, the meeting scenario remains a highly challenging task due to its complex acoustic conditions and diverse speaking styles, including overlapping speech, varying numbers of speakers, far-field signals in large conference rooms, and environmental noise and reverberation.</p>
<p>To advance speech recognition in the meeting scenario, many related challenges have been held, such as the Rich Transcription evaluation and the CHIME (Computational Hearing in Multisource Environments) challenge. The latest CHIME challenge focuses on distant automatic speech recognition and on building systems that generalize across arrays of different topologies and across application scenarios. However, differences between languages have limited progress on non-English meeting transcription. The MISP (Multimodal Information Based Speech Processing) and M2MeT (Multi-Channel Multi-Party Meeting Transcription) challenges have helped advance Mandarin meeting-scenario speech recognition: MISP focuses on audio-visual multimodal approaches to distant multi-microphone signal processing in everyday home environments, while M2MeT focuses on handling overlapping speech when transcribing meetings in offline meeting rooms.</p>
<p>The ASSP2022 M2MeT challenge focused on the meeting scenario and comprised two tracks: speaker diarization and multi-speaker automatic speech recognition. The former involves determining "who spoke when", while the latter aims to recognize speech from multiple speakers simultaneously, where overlapping speech and various kinds of noise pose great technical difficulty.</p>
<p>The IASSP2022 M2MeT challenge focused on the meeting scenario and comprised two tracks: speaker diarization and multi-speaker automatic speech recognition. The former involves determining "who spoke when", while the latter aims to recognize speech from multiple speakers simultaneously, where overlapping speech and various kinds of noise pose great technical difficulty.</p>
<p>Building on the success of the previous M2MeT challenge, we will continue to hold the M2MeT 2.0 challenge at ASRU2023. In the previous M2MeT challenge the evaluation metric was speaker-independent: only the recognized text could be obtained, without determining the corresponding speaker.
To address this limitation and to push current multi-speaker speech recognition systems toward practical use, the M2MeT 2.0 challenge will be evaluated on a speaker-dependent task and will offer two sub-tracks, one with constrained training data and one with unconstrained training data. By attributing speech to specific speakers, this task aims to improve the accuracy and applicability of multi-speaker ASR systems in real-world environments.
We give a detailed description of the dataset, rules, baseline system, and evaluation methods to further promote research in multi-speaker speech recognition. In addition, we will release a brand-new test set of about 10 hours of audio according to the timeline.</p>
@@ -4,7 +4,7 @@
To advance speech recognition in the meeting scenario, many related challenges have been held, such as the Rich Transcription evaluation and the CHIME (Computational Hearing in Multisource Environments) challenge. The latest CHIME challenge focuses on distant automatic speech recognition and on building systems that generalize across arrays of different topologies and across application scenarios. However, differences between languages have limited progress on non-English meeting transcription. The MISP (Multimodal Information Based Speech Processing) and M2MeT (Multi-Channel Multi-Party Meeting Transcription) challenges have helped advance Mandarin meeting-scenario speech recognition: MISP focuses on audio-visual multimodal approaches to distant multi-microphone signal processing in everyday home environments, while M2MeT focuses on handling overlapping speech when transcribing meetings in offline meeting rooms.

The ASSP2022 M2MeT challenge focused on the meeting scenario and comprised two tracks: speaker diarization and multi-speaker automatic speech recognition. The former involves determining "who spoke when", while the latter aims to recognize speech from multiple speakers simultaneously, where overlapping speech and various kinds of noise pose great technical difficulty.

The IASSP2022 M2MeT challenge focused on the meeting scenario and comprised two tracks: speaker diarization and multi-speaker automatic speech recognition. The former involves determining "who spoke when", while the latter aims to recognize speech from multiple speakers simultaneously, where overlapping speech and various kinds of noise pose great technical difficulty.
Building on the success of the previous M2MeT challenge, we will continue to hold the M2MeT 2.0 challenge at ASRU2023. In the previous M2MeT challenge the evaluation metric was speaker-independent: only the recognized text could be obtained, without determining the corresponding speaker.
To address this limitation and to push current multi-speaker speech recognition systems toward practical use, the M2MeT 2.0 challenge will be evaluated on a speaker-dependent task and will offer two sub-tracks, one with constrained training data and one with unconstrained training data. By attributing speech to specific speakers, this task aims to improve the accuracy and applicability of multi-speaker ASR systems in real-world environments.
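Speaker-attributed evaluation of this kind is commonly scored with a concatenated minimum-permutation character error rate (the search index above mentions "cpcer"). The sketch below is a minimal illustration of that idea, not the official scoring tool: the function names and the pure-Python edit distance are my own, and it assumes each string is one speaker's concatenated transcript and that reference and hypothesis have the same number of speakers.

```python
from itertools import permutations

def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (insertions + substitutions + deletions)."""
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                              # deletion
                        dp[j - 1] + 1,                          # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))      # substitution or match
            prev = cur
    return dp[-1]

def cp_cer(refs: list[str], hyps: list[str]) -> float:
    """cp-CER in percent: try every mapping of hypothesis speakers to
    reference speakers and keep the permutation with the fewest errors."""
    errors = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return errors / sum(len(r) for r in refs) * 100
```

Because the metric minimizes over speaker permutations, swapping two speakers' outputs wholesale is not penalized; only genuine recognition and attribution errors raise the score.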