From 40f1f529b0ebc7a571af6be3ca9df67cd5480d66 Mon Sep 17 00:00:00 2001
From: yhliang <429259365@qq.com>
Date: Thu, 20 Apr 2023 16:48:26 +0800
Subject: [PATCH] add m2met2 registration form

---
 docs_m2met2/Dataset.md | 2 +-
 docs_m2met2/Introduction.md | 4 +++-
 docs_m2met2/Track_setting_and_evaluation.md | 4 ++--
 docs_m2met2/_build/doctrees/Dataset.doctree | Bin 14369 -> 14409 bytes
 .../_build/doctrees/Introduction.doctree | Bin 14733 -> 15176 bytes
 .../Track_setting_and_evaluation.doctree | Bin 12729 -> 12655 bytes
 .../_build/doctrees/environment.pickle | Bin 25953 -> 25894 bytes
 docs_m2met2/_build/html/Baseline.html | 2 +-
 docs_m2met2/_build/html/Contact.html | 2 +-
 docs_m2met2/_build/html/Dataset.html | 4 ++--
 docs_m2met2/_build/html/Introduction.html | 5 +++--
 docs_m2met2/_build/html/Organizers.html | 2 +-
 docs_m2met2/_build/html/Rules.html | 2 +-
 .../html/Track_setting_and_evaluation.html | 8 ++++----
 .../_build/html/_sources/Dataset.md.txt | 2 +-
 .../_build/html/_sources/Introduction.md.txt | 4 +++-
 .../Track_setting_and_evaluation.md.txt | 4 ++--
 docs_m2met2/_build/html/genindex.html | 2 +-
 docs_m2met2/_build/html/index.html | 2 +-
 docs_m2met2/_build/html/search.html | 2 +-
 docs_m2met2/_build/html/searchindex.js | 2 +-
 .../_build/doctrees/environment.pickle | Bin 25645 -> 25266 bytes
 docs_m2met2_cn/_build/doctrees/数据集.doctree | Bin 12226 -> 12277 bytes
 docs_m2met2_cn/_build/doctrees/简介.doctree | Bin 12422 -> 12888 bytes
 .../_build/doctrees/赛道设置与评估.doctree | Bin 11464 -> 11286 bytes
 .../_build/html/_sources/数据集.md.txt | 4 ++--
 .../_build/html/_sources/简介.md.txt | 2 ++
 .../html/_sources/赛道设置与评估.md.txt | 6 +++---
 docs_m2met2_cn/_build/html/genindex.html | 2 +-
 docs_m2met2_cn/_build/html/index.html | 2 +-
 docs_m2met2_cn/_build/html/search.html | 2 +-
 docs_m2met2_cn/_build/html/searchindex.js | 2 +-
 docs_m2met2_cn/_build/html/基线.html | 2 +-
 docs_m2met2_cn/_build/html/数据集.html | 6 +++---
 docs_m2met2_cn/_build/html/简介.html | 3 ++-
 docs_m2met2_cn/_build/html/组委会.html | 2 +-
 docs_m2met2_cn/_build/html/联系方式.html | 2 +-
 docs_m2met2_cn/_build/html/规则.html | 2 +-
 .../_build/html/赛道设置与评估.html | 8 ++++----
 docs_m2met2_cn/数据集.md | 4 ++--
 docs_m2met2_cn/简介.md | 2 ++
 docs_m2met2_cn/赛道设置与评估.md | 6 +++---
 42 files changed, 60 insertions(+), 50 deletions(-)

diff --git a/docs_m2met2/Dataset.md b/docs_m2met2/Dataset.md
index c987acac9..a897d673b 100644
--- a/docs_m2met2/Dataset.md
+++ b/docs_m2met2/Dataset.md
@@ -2,7 +2,7 @@
 ## Overview of training data
 In the fixed training condition, the training dataset is restricted to three publicly available corpora, namely, AliMeeting, AISHELL-4, and CN-Celeb. To evaluate the performance of the models trained on these datasets, we will release a new Test set called Test-2023 for scoring and ranking. We will describe the AliMeeting dataset and the Test-2023 set in detail.
 ## Detail of AliMeeting corpus
-AliMeeting contains 118.75 hours of speech data in total. The dataset is divided into 104.75 hours for training (Train), 4 hours for evaluation (Eval) and 10 hours as test set (Test) for scoring and ranking. Specifically, the Train and Eval sets contain 212 and 8 sessions, respectively. Each session consists of a 15 to 30-minute discussion by a group of participants. The total number of participants in Train and Eval sets is 456 and 25, respectively, with balanced gender coverage.
+AliMeeting contains 118.75 hours of speech data in total. The dataset is divided into 104.75 hours for training (Train), 4 hours for evaluation (Eval) and 10 hours as test set (Test) for scoring and ranking. Specifically, the Train, Eval and Test sets contain 212, 8 and 20 sessions, respectively. Each session consists of a 15 to 30-minute discussion by a group of participants. The total number of participants in Train, Eval and Test sets is 456, 25 and 60, respectively, with balanced gender coverage.
 The dataset is collected in 13 meeting venues, which are categorized into three types: small, medium, and large rooms with sizes ranging from 8 m$^{2}$ to 55 m$^{2}$. Different rooms give us a variety of acoustic properties and layouts. The detailed parameters of each meeting venue will be released together with the Train data. The type of wall material of the meeting venues covers cement, glass, etc. Other furnishings in meeting venues include sofa, TV, blackboard, fan, air conditioner, plants, etc. During recording, the participants of the meeting sit around the microphone array which is placed on the table and conduct a natural conversation. The microphone-speaker distance ranges from 0.3 m to 5.0 m. All participants are native Chinese speakers speaking Mandarin without strong accents. During the meeting, various kinds of indoor noise including but not limited to clicking, keyboard, door opening/closing, fan, bubble noise, etc., are made naturally. For both Train and Eval sets, the participants are required to remain in the same position during recording. There is no speaker overlap between the Train and Eval set. An example of the recording venue from the Train set is shown in Fig 1.

diff --git a/docs_m2met2/Introduction.md b/docs_m2met2/Introduction.md
index e1f9fc792..b7f783ebb 100644
--- a/docs_m2met2/Introduction.md
+++ b/docs_m2met2/Introduction.md
@@ -20,6 +20,8 @@ Building on the success of the previous M2MeT challenge, we are excited to propo
 ## Guidelines
-Interested participants, whether from academia or industry, must register for the challenge by completing a Google form, which will be available here. The deadline for registration is May 5, 2023.
+Interested participants, whether from academia or industry, must register for the challenge by completing the Google form below. The deadline for registration is May 5, 2023.
+
+[M2MET2.0 Registration](https://docs.google.com/forms/d/e/1FAIpQLSf77T9vAl7Ym-u5g8gXu18SBofoWRaFShBo26Ym0-HDxHW9PQ/viewform?usp=sf_link)
 Within three working days, the challenge organizer will send email invitations to eligible teams to participate in the challenge. All qualified teams are required to adhere to the challenge rules, which will be published on the challenge page. Prior to the ranking release time, each participant must submit a system description document detailing their approach and methods. The organizer will select the top three submissions to be included in the ASRU2023 Proceedings.

diff --git a/docs_m2met2/Track_setting_and_evaluation.md b/docs_m2met2/Track_setting_and_evaluation.md
index b90c17af1..2b75fcac5 100644
--- a/docs_m2met2/Track_setting_and_evaluation.md
+++ b/docs_m2met2/Track_setting_and_evaluation.md
@@ -1,6 +1,6 @@
 # Track & Evaluation
-## Speaker-Attributed ASR (Main Track)
-The speaker-attributed ASR task poses a unique challenge of transcribing speech from multiple speakers and assigning a speaker label to the transcription. Figure 2 illustrates the difference between the speaker-attributed ASR task and the multi-speaker ASR task. This track allows for the use of the AliMeeting, Aishell4, and Cn-Celeb datasets as constrained data sources during both training and evaluation. The AliMeeting dataset, which was used in the M2MeT challenge, includes Train, Eval, and Test sets. Additionally, a new Test-2023 set, consisting of approximately 10 hours of meeting data recorded in an identical acoustic setting as the AliMeeting corpus, will be released soon for challenge scoring and ranking. It's worth noting that the organizers will not provide the near-field audio, transcriptions, or oracle timestamps. Instead, segments containing multiple speakers will be provided on the Test-2023 set, which can be obtained using a simple voice activity detection (VAD) model.
+## Speaker-Attributed ASR
+The speaker-attributed ASR task poses a unique challenge of transcribing speech from multiple speakers and assigning a speaker label to the transcription. Figure 2 illustrates the difference between the speaker-attributed ASR task and the multi-speaker ASR task. This track allows for the use of the AliMeeting, Aishell4, and Cn-Celeb datasets as constrained data sources during both training and evaluation. The AliMeeting dataset, which was used in the M2MeT challenge, includes Train, Eval, and Test sets. Additionally, a new Test-2023 set, consisting of approximately 10 hours of meeting data recorded in an identical acoustic setting as the AliMeeting corpus, will be released soon for challenge scoring and ranking. It's worth noting that the organizers will not provide the near-field audio, transcriptions, or oracle timestamps of the Test-2023 set. Instead, segments containing multiple speakers will be provided, which can be obtained using a simple voice activity detection (VAD) model.
 ![task difference](images/task_diff.png)

diff --git a/docs_m2met2/_build/doctrees/Dataset.doctree b/docs_m2met2/_build/doctrees/Dataset.doctree
index b5d9e8bc4b457209d61d60ac1bd28030016f16d3..49021e640576be63b66749ac6e774bee263c2b2b 100644
GIT binary patch
[base85 binary delta omitted]

diff --git a/docs_m2met2/_build/doctrees/Introduction.doctree b/docs_m2met2/_build/doctrees/Introduction.doctree
index e80c8f44a34609bd2cc3a325948348d918a4590f..d8e58afa12cef5868fdc749983cbb523f9680753 100644
GIT binary patch
[base85 binary delta omitted]

diff --git a/docs_m2met2/_build/doctrees/Track_setting_and_evaluation.doctree b/docs_m2met2/_build/doctrees/Track_setting_and_evaluation.doctree
GIT binary patch
[base85 binary delta omitted; index line lost in extraction]

diff --git a/docs_m2met2/_build/doctrees/environment.pickle b/docs_m2met2/_build/doctrees/environment.pickle
index 3fd14ad19ec4be8816551b37d35de5ab14f64144..137f599ab0a1ee2cac85303137e638e2bd2d2aea 100644
GIT binary patch
[base85 binary delta omitted]

diff --git a/docs_m2met2/_build/html/Baseline.html b/docs_m2met2/_build/html/Baseline.html
index 8893a1692..e52d32275 100644
--- a/docs_m2met2/_build/html/Baseline.html
+++ b/docs_m2met2/_build/html/Baseline.html
@@ -88,7 +88,7 @@
  • Track & Evaluation

diff --git a/docs_m2met2/_build/html/Contact.html b/docs_m2met2/_build/html/Contact.html
index 44546d590..eafd2d5b8 100644
--- a/docs_m2met2/_build/html/Contact.html
+++ b/docs_m2met2/_build/html/Contact.html
@@ -84,7 +84,7 @@
  • Track & Evaluation

diff --git a/docs_m2met2/_build/html/Dataset.html b/docs_m2met2/_build/html/Dataset.html
index 88531f6d0..43bf8a121 100644
--- a/docs_m2met2/_build/html/Dataset.html
+++ b/docs_m2met2/_build/html/Dataset.html
@@ -89,7 +89,7 @@
  • Track & Evaluation

@@ -131,7 +131,7 @@
 Detail of AliMeeting corpus
-AliMeeting contains 118.75 hours of speech data in total. The dataset is divided into 104.75 hours for training (Train), 4 hours for evaluation (Eval) and 10 hours as test set (Test) for scoring and ranking. Specifically, the Train and Eval sets contain 212 and 8 sessions, respectively. Each session consists of a 15 to 30-minute discussion by a group of participants. The total number of participants in Train and Eval sets is 456 and 25, respectively, with balanced gender coverage.
+AliMeeting contains 118.75 hours of speech data in total. The dataset is divided into 104.75 hours for training (Train), 4 hours for evaluation (Eval) and 10 hours as test set (Test) for scoring and ranking. Specifically, the Train, Eval and Test sets contain 212, 8 and 20 sessions, respectively. Each session consists of a 15 to 30-minute discussion by a group of participants. The total number of participants in Train, Eval and Test sets is 456, 25 and 60, respectively, with balanced gender coverage.
 The dataset is collected in 13 meeting venues, which are categorized into three types: small, medium, and large rooms with sizes ranging from 8 m\(^{2}\) to 55 m\(^{2}\). Different rooms give us a variety of acoustic properties and layouts. The detailed parameters of each meeting venue will be released together with the Train data. The type of wall material of the meeting venues covers cement, glass, etc. Other furnishings in meeting venues include sofa, TV, blackboard, fan, air conditioner, plants, etc. During recording, the participants of the meeting sit around the microphone array which is placed on the table and conduct a natural conversation. The microphone-speaker distance ranges from 0.3 m to 5.0 m. All participants are native Chinese speakers speaking Mandarin without strong accents. During the meeting, various kinds of indoor noise including but not limited to clicking, keyboard, door opening/closing, fan, bubble noise, etc., are made naturally. For both Train and Eval sets, the participants are required to remain in the same position during recording. There is no speaker overlap between the Train and Eval set. An example of the recording venue from the Train set is shown in Fig 1.
 meeting room
 The number of participants within one meeting session ranges from 2 to 4. To ensure the coverage of different overlap ratios, we select various meeting topics during recording, including medical treatment, education, business, organization management, industrial production and other daily routine meetings. The average speech overlap ratio of Train, Eval and Test sets are 42.27%, 34.76% and 42.8%, respectively. More details of AliMeeting are shown in Table 1. A detailed overlap ratio distribution of meeting sessions with different numbers of speakers in the Train, Eval and Test set is shown in Table 2.
diff --git a/docs_m2met2/_build/html/Introduction.html b/docs_m2met2/_build/html/Introduction.html
index d75fa954e..a1950685f 100644
--- a/docs_m2met2/_build/html/Introduction.html
+++ b/docs_m2met2/_build/html/Introduction.html
@@ -89,7 +89,7 @@
  • Track & Evaluation

@@ -146,7 +146,8 @@
 Guidelines
-Interested participants, whether from academia or industry, must register for the challenge by completing a Google form, which will be available here. The deadline for registration is May 5, 2023.
+Interested participants, whether from academia or industry, must register for the challenge by completing the Google form below. The deadline for registration is May 5, 2023.
+M2MET2.0 registration
 Within three working days, the challenge organizer will send email invitations to eligible teams to participate in the challenge. All qualified teams are required to adhere to the challenge rules, which will be published on the challenge page. Prior to the ranking release time, each participant must submit a system description document detailing their approach and methods. The organizer will select the top three submissions to be included in the ASRU2023 Proceedings.
diff --git a/docs_m2met2/_build/html/Organizers.html b/docs_m2met2/_build/html/Organizers.html
index f66c037f9..0a8811e78 100644
--- a/docs_m2met2/_build/html/Organizers.html
+++ b/docs_m2met2/_build/html/Organizers.html
@@ -88,7 +88,7 @@
  • Track & Evaluation

diff --git a/docs_m2met2/_build/html/Rules.html b/docs_m2met2/_build/html/Rules.html
index 3d2bddebe..8eef8aad7 100644
--- a/docs_m2met2/_build/html/Rules.html
+++ b/docs_m2met2/_build/html/Rules.html
@@ -88,7 +88,7 @@
  • Track & Evaluation

diff --git a/docs_m2met2/_build/html/Track_setting_and_evaluation.html b/docs_m2met2/_build/html/Track_setting_and_evaluation.html
index f377ee5ea..859f4444a 100644
--- a/docs_m2met2/_build/html/Track_setting_and_evaluation.html
+++ b/docs_m2met2/_build/html/Track_setting_and_evaluation.html
@@ -89,7 +89,7 @@
  • Track & Evaluation

@@ -125,9 +125,9 @@
 Track & Evaluation
-
-Speaker-Attributed ASR (Main Track)
-The speaker-attributed ASR task poses a unique challenge of transcribing speech from multiple speakers and assigning a speaker label to the transcription. Figure 2 illustrates the difference between the speaker-attributed ASR task and the multi-speaker ASR task. This track allows for the use of the AliMeeting, Aishell4, and Cn-Celeb datasets as constrained data sources during both training and evaluation. The AliMeeting dataset, which was used in the M2MeT challenge, includes Train, Eval, and Test sets. Additionally, a new Test-2023 set, consisting of approximately 10 hours of meeting data recorded in an identical acoustic setting as the AliMeeting corpus, will be released soon for challenge scoring and ranking. It’s worth noting that the organizers will not provide the near-field audio, transcriptions, or oracle timestamps. Instead, segments containing multiple speakers will be provided on the Test-2023 set, which can be obtained using a simple voice activity detection (VAD) model.
+
+Speaker-Attributed ASR
+The speaker-attributed ASR task poses a unique challenge of transcribing speech from multiple speakers and assigning a speaker label to the transcription. Figure 2 illustrates the difference between the speaker-attributed ASR task and the multi-speaker ASR task. This track allows for the use of the AliMeeting, Aishell4, and Cn-Celeb datasets as constrained data sources during both training and evaluation. The AliMeeting dataset, which was used in the M2MeT challenge, includes Train, Eval, and Test sets. Additionally, a new Test-2023 set, consisting of approximately 10 hours of meeting data recorded in an identical acoustic setting as the AliMeeting corpus, will be released soon for challenge scoring and ranking. It’s worth noting that the organizers will not provide the near-field audio, transcriptions, or oracle timestamps of the Test-2023 set. Instead, segments containing multiple speakers will be provided, which can be obtained using a simple voice activity detection (VAD) model.
 task difference
diff --git a/docs_m2met2/_build/html/_sources/Dataset.md.txt b/docs_m2met2/_build/html/_sources/Dataset.md.txt
index c987acac9..a897d673b 100644
--- a/docs_m2met2/_build/html/_sources/Dataset.md.txt
+++ b/docs_m2met2/_build/html/_sources/Dataset.md.txt
@@ -2,7 +2,7 @@
 ## Overview of training data
 In the fixed training condition, the training dataset is restricted to three publicly available corpora, namely, AliMeeting, AISHELL-4, and CN-Celeb. To evaluate the performance of the models trained on these datasets, we will release a new Test set called Test-2023 for scoring and ranking. We will describe the AliMeeting dataset and the Test-2023 set in detail.
 ## Detail of AliMeeting corpus
-AliMeeting contains 118.75 hours of speech data in total. The dataset is divided into 104.75 hours for training (Train), 4 hours for evaluation (Eval) and 10 hours as test set (Test) for scoring and ranking. Specifically, the Train and Eval sets contain 212 and 8 sessions, respectively. Each session consists of a 15 to 30-minute discussion by a group of participants. The total number of participants in Train and Eval sets is 456 and 25, respectively, with balanced gender coverage.
+AliMeeting contains 118.75 hours of speech data in total. The dataset is divided into 104.75 hours for training (Train), 4 hours for evaluation (Eval) and 10 hours as test set (Test) for scoring and ranking. Specifically, the Train, Eval and Test sets contain 212, 8 and 20 sessions, respectively. Each session consists of a 15 to 30-minute discussion by a group of participants. The total number of participants in Train, Eval and Test sets is 456, 25 and 60, respectively, with balanced gender coverage.
 The dataset is collected in 13 meeting venues, which are categorized into three types: small, medium, and large rooms with sizes ranging from 8 m$^{2}$ to 55 m$^{2}$. Different rooms give us a variety of acoustic properties and layouts.
 The detailed parameters of each meeting venue will be released together with the Train data. The type of wall material of the meeting venues covers cement, glass, etc. Other furnishings in meeting venues include sofa, TV, blackboard, fan, air conditioner, plants, etc. During recording, the participants of the meeting sit around the microphone array which is placed on the table and conduct a natural conversation. The microphone-speaker distance ranges from 0.3 m to 5.0 m. All participants are native Chinese speakers speaking Mandarin without strong accents. During the meeting, various kinds of indoor noise including but not limited to clicking, keyboard, door opening/closing, fan, bubble noise, etc., are made naturally. For both Train and Eval sets, the participants are required to remain in the same position during recording. There is no speaker overlap between the Train and Eval set. An example of the recording venue from the Train set is shown in Fig 1.

diff --git a/docs_m2met2/_build/html/_sources/Introduction.md.txt b/docs_m2met2/_build/html/_sources/Introduction.md.txt
index e1f9fc792..27322dece 100644
--- a/docs_m2met2/_build/html/_sources/Introduction.md.txt
+++ b/docs_m2met2/_build/html/_sources/Introduction.md.txt
@@ -20,6 +20,8 @@ Building on the success of the previous M2MeT challenge, we are excited to propo
 ## Guidelines
-Interested participants, whether from academia or industry, must register for the challenge by completing a Google form, which will be available here. The deadline for registration is May 5, 2023.
+Interested participants, whether from academia or industry, must register for the challenge by completing the Google form below. The deadline for registration is May 5, 2023.
+
+[M2MET2.0 registration](https://docs.google.com/forms/d/e/1FAIpQLSf77T9vAl7Ym-u5g8gXu18SBofoWRaFShBo26Ym0-HDxHW9PQ/viewform?usp=sf_link)
 Within three working days, the challenge organizer will send email invitations to eligible teams to participate in the challenge.
 All qualified teams are required to adhere to the challenge rules, which will be published on the challenge page. Prior to the ranking release time, each participant must submit a system description document detailing their approach and methods. The organizer will select the top three submissions to be included in the ASRU2023 Proceedings.

diff --git a/docs_m2met2/_build/html/_sources/Track_setting_and_evaluation.md.txt b/docs_m2met2/_build/html/_sources/Track_setting_and_evaluation.md.txt
index b90c17af1..2b75fcac5 100644
--- a/docs_m2met2/_build/html/_sources/Track_setting_and_evaluation.md.txt
+++ b/docs_m2met2/_build/html/_sources/Track_setting_and_evaluation.md.txt
@@ -1,6 +1,6 @@
 # Track & Evaluation
-## Speaker-Attributed ASR (Main Track)
-The speaker-attributed ASR task poses a unique challenge of transcribing speech from multiple speakers and assigning a speaker label to the transcription. Figure 2 illustrates the difference between the speaker-attributed ASR task and the multi-speaker ASR task. This track allows for the use of the AliMeeting, Aishell4, and Cn-Celeb datasets as constrained data sources during both training and evaluation. The AliMeeting dataset, which was used in the M2MeT challenge, includes Train, Eval, and Test sets. Additionally, a new Test-2023 set, consisting of approximately 10 hours of meeting data recorded in an identical acoustic setting as the AliMeeting corpus, will be released soon for challenge scoring and ranking. It's worth noting that the organizers will not provide the near-field audio, transcriptions, or oracle timestamps. Instead, segments containing multiple speakers will be provided on the Test-2023 set, which can be obtained using a simple voice activity detection (VAD) model.
+## Speaker-Attributed ASR
+The speaker-attributed ASR task poses a unique challenge of transcribing speech from multiple speakers and assigning a speaker label to the transcription. Figure 2 illustrates the difference between the speaker-attributed ASR task and the multi-speaker ASR task. This track allows for the use of the AliMeeting, Aishell4, and Cn-Celeb datasets as constrained data sources during both training and evaluation. The AliMeeting dataset, which was used in the M2MeT challenge, includes Train, Eval, and Test sets. Additionally, a new Test-2023 set, consisting of approximately 10 hours of meeting data recorded in an identical acoustic setting as the AliMeeting corpus, will be released soon for challenge scoring and ranking. It's worth noting that the organizers will not provide the near-field audio, transcriptions, or oracle timestamps of the Test-2023 set. Instead, segments containing multiple speakers will be provided, which can be obtained using a simple voice activity detection (VAD) model.
 ![task difference](images/task_diff.png)

diff --git a/docs_m2met2/_build/html/genindex.html b/docs_m2met2/_build/html/genindex.html
index 0071a1845..e7e17b6ed 100644
--- a/docs_m2met2/_build/html/genindex.html
+++ b/docs_m2met2/_build/html/genindex.html
@@ -79,7 +79,7 @@
  • Track & Evaluation

diff --git a/docs_m2met2/_build/html/index.html b/docs_m2met2/_build/html/index.html
index acd0d32c4..dcbb8cb03 100644
--- a/docs_m2met2/_build/html/index.html
+++ b/docs_m2met2/_build/html/index.html
@@ -84,7 +84,7 @@
  • Track & Evaluation

diff --git a/docs_m2met2/_build/html/search.html b/docs_m2met2/_build/html/search.html
index 344ad958a..71adf366d 100644
--- a/docs_m2met2/_build/html/search.html
+++ b/docs_m2met2/_build/html/search.html
@@ -72,7 +72,7 @@
  • Track & Evaluation diff --git a/docs_m2met2/_build/html/searchindex.js b/docs_m2met2/_build/html/searchindex.js index 1542f8e9c..6fd42560b 100644 --- a/docs_m2met2/_build/html/searchindex.js +++ b/docs_m2met2/_build/html/searchindex.js @@ -1 +1 @@ -Search.setIndex({"docnames": ["Baseline", "Contact", "Dataset", "Introduction", "Organizers", "Rules", "Track_setting_and_evaluation", "index"], "filenames": ["Baseline.md", "Contact.md", "Dataset.md", "Introduction.md", "Organizers.md", "Rules.md", "Track_setting_and_evaluation.md", "index.rst"], "titles": ["Baseline", "Contact", "Datasets", "Introduction", "Organizers", "Rules", "Track & Evaluation", "ASRU 2023 MULTI-CHANNEL MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE 2.0 (M2MeT2.0)"], "terms": {"we": [0, 2, 3, 7], "releas": [0, 2, 3, 6], "an": [0, 2, 3, 6], "e2": 0, "sa": 0, "asr": [0, 3, 7], "cite": 0, "kanda21b_interspeech": 0, "conduct": [0, 2], "funasr": 0, "time": [0, 6], "accord": [0, 3], "timelin": [0, 2], "The": [0, 2, 3, 5, 6], "model": [0, 2, 3, 5, 6], "architectur": 0, "i": [0, 2, 3, 5], "shown": [0, 2], "figur": [0, 6], "3": [0, 2, 3], "speakerencod": 0, "initi": 0, "pre": [0, 6], "train": [0, 3, 5, 7], "speaker": [0, 2, 3, 7], "verif": 0, "from": [0, 2, 3, 5, 6], "modelscop": [0, 6], "thi": [0, 3, 5, 6], "also": [0, 2, 6], "us": [0, 2, 5, 6], "extract": 0, "embed": 0, "profil": 0, "todo": 0, "fill": 0, "readm": 0, "md": 0, "system": [0, 3, 5, 6, 7], "ar": [0, 2, 3, 5, 6, 7], "tabl": [0, 2], "adopt": 0, "oracl": [0, 6], "dure": [0, 2, 6], "howev": [0, 3, 6], "due": [0, 3], "lack": 0, "label": [0, 5, 6], "evalu": [0, 2, 3, 7], "provid": [0, 2, 6, 7], "addit": [0, 6], "spectral": 0, "cluster": 0, "meanwhil": 0, "eval": [0, 2, 5, 6], "test": [0, 2, 3, 5, 6], "set": [0, 2, 3, 5, 6], "show": 0, "impact": 0, "accuraci": [0, 6], "If": [1, 5, 6], "you": 1, "have": [1, 3], "ani": [1, 5, 6], "question": 1, "about": 1, "m2met2": [1, 3], "0": [1, 2, 3], "challeng": [1, 3, 5, 6], "pleas": 1, "u": [1, 2], "email": 
[1, 3, 4], "m2met": [1, 3, 6, 7], "alimeet": [1, 6], "gmail": 1, "com": [1, 4], "wechat": 1, "group": [1, 2], "In": [2, 3, 5], "fix": [2, 3, 7], "condit": [2, 3, 7], "restrict": 2, "three": [2, 3, 6], "publicli": [2, 6], "avail": [2, 3, 6], "corpora": 2, "name": 2, "aishel": [2, 4, 6], "4": [2, 6], "cn": [2, 4, 6], "celeb": [2, 6], "To": [2, 3, 7], "perform": [2, 3], "new": [2, 3, 6], "call": 2, "2023": [2, 3, 5, 6], "score": [2, 6], "rank": [2, 3, 6], "describ": 2, "contain": [2, 6], "118": 2, "75": 2, "hour": [2, 3, 6], "speech": [2, 3, 6, 7], "total": [2, 6], "divid": [2, 6], "104": 2, "10": [2, 3, 6], "specif": [2, 6], "212": 2, "8": 2, "session": [2, 3, 6, 7], "respect": 2, "each": [2, 3, 6], "consist": [2, 6], "15": 2, "30": 2, "minut": 2, "discuss": 2, "particip": [2, 5, 6], "number": [2, 3, 6], "456": 2, "25": 2, "balanc": 2, "gender": 2, "coverag": 2, "collect": 2, "13": [2, 3], "meet": [2, 3, 6], "venu": 2, "which": [2, 3, 6], "categor": 2, "type": 2, "small": 2, "medium": 2, "larg": [2, 3], "room": [2, 3], "size": 2, "rang": 2, "m": 2, "2": [2, 6], "55": 2, "differ": [2, 3, 5, 6], "give": 2, "varieti": 2, "acoust": [2, 3, 6], "properti": 2, "layout": 2, "paramet": [2, 5], "togeth": 2, "wall": 2, "materi": 2, "cover": 2, "cement": 2, "glass": 2, "etc": 2, "other": 2, "furnish": 2, "includ": [2, 3, 5, 6], "sofa": 2, "tv": 2, "blackboard": 2, "fan": 2, "air": 2, "condition": 2, "plant": 2, "record": [2, 6], "sit": 2, "around": 2, "microphon": [2, 3], "arrai": [2, 3], "place": 2, "natur": 2, "convers": 2, "distanc": 2, "5": [2, 3], "all": [2, 3, 5, 6], "nativ": 2, "chines": 2, "speak": [2, 3], "mandarin": [2, 3], "without": 2, "strong": 2, "accent": 2, "variou": [2, 3], "kind": 2, "indoor": 2, "nois": [2, 3, 5], "limit": [2, 3, 5], "click": 2, "keyboard": 2, "door": 2, "open": [2, 3, 7], "close": 2, "bubbl": 2, "made": [2, 3], "For": 2, "both": [2, 6], "requir": [2, 3, 6], "remain": [2, 3], "same": [2, 5], "posit": 2, "There": 2, "overlap": [2, 3], 
"between": [2, 6], "exampl": 2, "fig": 2, "1": 2, "within": [2, 3], "one": [2, 5], "ensur": 2, "ratio": 2, "select": [2, 3, 5, 6], "topic": 2, "medic": 2, "treatment": 2, "educ": 2, "busi": 2, "organ": [2, 3, 5, 6, 7], "manag": 2, "industri": [2, 3], "product": 2, "daili": 2, "routin": 2, "averag": 2, "42": 2, "27": 2, "34": 2, "76": 2, "more": 2, "A": [2, 4], "distribut": 2, "20": 2, "were": 2, "ident": [2, 6], "compris": [2, 3, 7], "therebi": 2, "share": 2, "similar": 2, "configur": 2, "field": [2, 3, 6], "signal": [2, 3], "headset": 2, "onli": [2, 5, 6], "": [2, 6], "own": 2, "transcrib": [2, 3, 6], "It": [2, 6], "worth": [2, 6], "note": [2, 6], "far": [2, 3], "audio": [2, 3, 6], "synchron": 2, "common": 2, "transcript": [2, 3, 5, 6], "prepar": 2, "textgrid": 2, "format": 2, "inform": [2, 3], "durat": 2, "id": 2, "segment": [2, 6], "timestamp": [2, 6], "mention": 2, "abov": 2, "can": [2, 3, 5, 6], "download": 2, "openslr": 2, "via": 2, "follow": [2, 5], "link": 2, "particularli": 2, "baselin": [2, 3, 7], "conveni": 2, "script": 2, "automat": [3, 7], "recognit": [3, 7], "diariz": 3, "signific": 3, "stride": 3, "recent": 3, "year": 3, "result": 3, "surg": 3, "technologi": 3, "applic": 3, "across": 3, "domain": 3, "present": 3, "uniqu": [3, 6], "complex": [3, 5], "divers": 3, "style": 3, "variabl": 3, "confer": 3, "environment": 3, "reverber": [3, 5], "over": 3, "sever": 3, "been": 3, "advanc": [3, 7], "develop": [3, 6], "rich": 3, "comput": [3, 5], "hear": 3, "multisourc": 3, "environ": 3, "chime": 3, "latest": 3, "iter": 3, "ha": 3, "particular": 3, "focu": 3, "distant": 3, "gener": 3, "topologi": 3, "scenario": 3, "while": 3, "progress": 3, "english": 3, "languag": [3, 5], "barrier": 3, "achiev": 3, "compar": 3, "non": 3, "multimod": 3, "base": 3, "process": [3, 6], "misp": 3, "multi": [3, 5, 6], "channel": 3, "parti": [3, 6], "instrument": 3, "seek": 3, "address": 3, "problem": 3, "visual": 3, "everydai": 3, "home": 3, "focus": 3, "tackl": 3, "issu": 3, 
"offlin": 3, "icassp2022": 3, "two": [3, 5, 7], "main": 3, "task": [3, 6, 7], "former": 3, "involv": [3, 6], "identifi": 3, "who": 3, "spoke": 3, "when": 3, "latter": 3, "aim": 3, "multipl": [3, 6], "simultan": 3, "pose": [3, 6], "technic": 3, "difficulti": 3, "interfer": 3, "build": [3, 6, 7], "success": [3, 7], "previou": 3, "excit": 3, "propos": [3, 7], "asru2023": [3, 7], "special": [3, 5, 7], "origin": [3, 5], "metric": [3, 7], "wa": [3, 6], "independ": 3, "meant": 3, "could": 3, "determin": 3, "correspond": [3, 5], "further": 3, "current": [3, 7], "talker": [3, 7], "toward": 3, "practic": 3, "attribut": [3, 7], "sub": [3, 5, 7], "track": [3, 5, 7], "By": [], "improv": [], "real": [], "world": [], "detail": [3, 6], "dataset": [3, 5, 6, 7], "rule": [3, 7], "method": 3, "facilit": [3, 7], "reproduc": [3, 7], "research": [3, 4, 7], "what": 3, "offer": 3, "comprehens": [3, 7], "overview": [3, 7], "furthermor": 3, "carefulli": 3, "curat": 3, "approxim": [3, 6], "design": 3, "enabl": 3, "valid": 3, "state": [3, 6, 7], "art": [3, 7], "area": 3, "mai": 3, "th": 3, "registr": 3, "deadlin": 3, "date": 3, "join": 3, "june": 3, "9": 3, "data": [3, 5, 6], "rd": 3, "final": [3, 5, 6], "submiss": 3, "19": 3, "juli": 3, "paper": [3, 6], "decemb": 3, "12": 3, "nd": 3, "16": 3, "asru": 3, "workshop": 3, "possibl": 6, "version": [], "interest": 3, "whether": 3, "academia": 3, "must": [3, 5, 6], "regist": 3, "complet": 3, "googl": 3, "form": 3, "here": 3, "work": 3, "dai": 3, "send": 3, "invit": 3, "elig": [3, 5], "team": 3, "qualifi": 3, "adher": [3, 5], "publish": 3, "page": 3, "prior": 3, "submit": 3, "descript": [3, 6], "document": 3, "approach": [3, 5], "top": 3, "proceed": 3, "lei": 4, "xie": 4, "professor": 4, "northwestern": 4, "polytechn": 4, "univers": 4, "china": 4, "lxie": 4, "nwpu": 4, "edu": 4, "receiv": [], "ph": [], "d": [], "degre": [], "scienc": [], "xi": [], "2004": [], "2001": [], "2002": [], "he": [], "depart": [], "electron": [], "vrije": [], "universiteit": 
[], "brussel": [], "vub": [], "belgium": [], "visit": [], "scientist": 4, "2006": [], "senior": 4, "associ": [], "center": [], "media": [], "school": [], "creativ": [], "citi": [], "hong": [], "kong": 4, "2007": [], "postdoctor": [], "fellow": [], "human": [], "commun": [], "laboratori": [], "hccl": [], "xian": [], "lead": [], "aslp": [], "npu": [], "200": [], "refer": 6, "journal": [], "ieee": 4, "acm": [], "transact": [], "multimedia": [], "interspeech": [], "icassp": [], "acl": [], "best": [], "award": [], "flagship": [], "hi": [], "interact": [], "dr": [], "editor": [], "ae": [], "tran": [], "activ": 6, "serv": [], "chair": [], "mani": [], "committe": [], "member": [], "aik": 4, "lee": 4, "institut": 4, "infocomm": 4, "star": 4, "singapor": 4, "kongaik": 4, "org": 4, "start": [], "off": [], "him": [], "career": [], "leader": [], "strateg": [], "plan": [], "2018": [], "2020": [], "spent": [], "half": [], "nec": [], "corpor": [], "japan": [], "veri": [], "much": [], "voic": 6, "biometr": [], "modal": [], "proud": [], "great": [], "featur": [], "bio": [], "idiom": [], "platform": [], "return": [], "now": [], "analyt": [], "pi": [], "elsevi": [], "sinc": [], "2016": [], "2017": [], "2021": [], "am": [], "elect": [], "2019": [], "zhiji": 4, "yan": 4, "princip": 4, "engin": 4, "alibaba": 4, "yzj": 4, "inc": 4, "hold": [], "phd": [], "electr": [], "expert": [], "review": [], "academ": [], "synthesi": [], "voiceprint": [], "appli": 4, "servic": [], "ant": [], "financi": [], "titl": [], "One": [], "100": 6, "grassroot": [], "shiliang": 4, "zhang": 4, "sly": 4, "zsl": 4, "graduat": [], "mainli": [], "understand": [], "machin": [], "learn": [], "40": [], "mainstream": [], "dozen": [], "patent": [], "after": [], "obtain": [5, 6], "doctor": [], "intellig": [], "direct": [], "fundament": [], "damo": [], "academi": [], "yanmin": 4, "qian": 4, "shanghai": 4, "jiao": 4, "tong": 4, "yanminqian": 4, "sjtu": 4, "b": [], "huazhong": [], "wuhan": [], "tsinghua": [], "beij": [], 
"2012": [], "2013": [], "where": 6, "2015": [], "cambridg": [], "k": [], "isca": [], "found": [], "kaldi": [], "toolkit": [], "than": [], "110": [], "4000": [], "citat": [], "kei": [], "word": [], "spot": [], "zhuo": 4, "chen": 4, "microsoft": 4, "usa": 4, "zhuc": 4, "columbia": [], "york": [], "ny": [], "author": [], "coauthor": [], "80": [], "peer": [], "6000": [], "ten": [], "separ": [], "diaris": [], "event": [], "won": [], "contribut": [], "sourc": 6, "wsj0": [], "2mix": [], "libricss": [], "benchmark": [], "jelinek": [], "student": [], "push": [], "jian": 4, "wu": 4, "wujian": 4, "master": [], "robust": [], "enhanc": [], "dereverber": [], "public": [], "1200": [], "chime5": [], "dn": [], "ffsvc": [], "slt": [], "taslp": [], "spl": [], "hui": 4, "bu": 4, "ceo": 4, "foundat": 4, "buhui": 4, "aishelldata": 4, "artifici": [], "korea": [], "2014": [], "founder": [], "dmash": [], "mia": [], "databas": [], "project": [], "co": [], "forum": [], "should": 5, "augment": 5, "allow": [5, 6], "ad": 5, "speed": 5, "perturb": 5, "tone": 5, "chang": 5, "permit": 5, "purpos": 5, "instead": [5, 6], "util": [5, 6], "tune": 5, "violat": 5, "strictli": [5, 6], "prohibit": [5, 6], "fine": 5, "fusion": 5, "structur": 5, "encourag": 5, "cpcer": [5, 6], "lower": 5, "judg": 5, "superior": 5, "forc": 5, "align": 5, "frame": 5, "level": 5, "classif": 5, "basi": 5, "shallow": 5, "end": 5, "e": [5, 6], "g": 5, "la": 5, "rnnt": 5, "transform": [5, 6], "come": 5, "right": 5, "interpret": 5, "belong": 5, "case": 5, "circumst": 5, "coordin": 5, "assign": 6, "illustr": 6, "aishell4": 6, "constrain": 6, "addition": 6, "corpu": 6, "soon": 6, "simpl": 6, "detect": 6, "vad": 6, "concaten": 6, "minimum": 6, "permut": 6, "charact": 6, "error": 6, "rate": 6, "calcul": 6, "step": 6, "firstli": 6, "hypothesi": 6, "chronolog": 6, "order": 6, "secondli": 6, "cer": 6, "repeat": 6, "lowest": 6, "tthe": 6, "insert": 6, "Ins": 6, "substitut": 6, "delet": 6, "del": 6, "output": 6, "text": 6, "frac": 6, 
"mathcal": 6, "n_": 6, "usag": 6, "third": 6, "hug": 6, "face": 6, "list": 6, "clearli": 6, "privat": 6, "manual": 6, "simul": 6, "thei": 6, "mandatori": 6, "clear": 6, "scheme": 6, "delight": 7, "introduct": 7, "contact": 7, "index": [], "modul": [], "search": []}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"baselin": 0, "overview": [0, 2], "quick": 0, "start": 0, "result": 0, "contact": 1, "dataset": 2, "train": [2, 6], "data": 2, "detail": 2, "alimeet": 2, "corpu": 2, "get": 2, "introduct": 3, "call": 3, "particip": 3, "timelin": 3, "aoe": 3, "time": 3, "guidelin": 3, "organ": 4, "rule": 5, "track": 6, "evalu": 6, "speaker": 6, "attribut": 6, "asr": 6, "main": 6, "metric": 6, "sub": 6, "arrang": 6, "i": 6, "fix": 6, "condit": 6, "ii": 6, "open": 6, "asru": 7, "2023": 7, "multi": 7, "channel": 7, "parti": 7, "meet": 7, "transcript": 7, "challeng": 7, "2": 7, "0": 7, "m2met2": 7, "content": 7, "indic": [], "tabl": []}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 57}, "alltitles": {"Baseline": [[0, "baseline"]], "Overview": [[0, "overview"]], "Quick start": [[0, "quick-start"]], "Baseline results": [[0, "baseline-results"]], "Contact": [[1, "contact"]], "Datasets": [[2, "datasets"]], "Overview of training data": [[2, "overview-of-training-data"]], "Detail of AliMeeting corpus": [[2, "detail-of-alimeeting-corpus"]], "Get the data": [[2, "get-the-data"]], "Introduction": [[3, "introduction"]], "Call for participation": [[3, "call-for-participation"]], "Timeline(AOE Time)": [[3, "timeline-aoe-time"]], "Guidelines": [[3, "guidelines"]], "Organizers": [[4, "organizers"]], "Rules": [[5, "rules"]], "Track & Evaluation": [[6, "track-evaluation"]], "Speaker-Attributed ASR (Main Track)": [[6, 
"speaker-attributed-asr-main-track"]], "Evaluation metric": [[6, "evaluation-metric"]], "Sub-track arrangement": [[6, "sub-track-arrangement"]], "Sub-track I (Fixed Training Condition):": [[6, "sub-track-i-fixed-training-condition"]], "Sub-track II (Open Training Condition):": [[6, "sub-track-ii-open-training-condition"]], "ASRU 2023 MULTI-CHANNEL MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE 2.0 (M2MeT2.0)": [[7, "asru-2023-multi-channel-multi-party-meeting-transcription-challenge-2-0-m2met2-0"]], "Contents:": [[7, null]]}, "indexentries": {}}) \ No newline at end of file +Search.setIndex({"docnames": ["Baseline", "Contact", "Dataset", "Introduction", "Organizers", "Rules", "Track_setting_and_evaluation", "index"], "filenames": ["Baseline.md", "Contact.md", "Dataset.md", "Introduction.md", "Organizers.md", "Rules.md", "Track_setting_and_evaluation.md", "index.rst"], "titles": ["Baseline", "Contact", "Datasets", "Introduction", "Organizers", "Rules", "Track & Evaluation", "ASRU 2023 MULTI-CHANNEL MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE 2.0 (M2MeT2.0)"], "terms": {"we": [0, 2, 3, 7], "releas": [0, 2, 3, 6], "an": [0, 2, 3, 6], "e2": 0, "sa": 0, "asr": [0, 3, 7], "cite": 0, "kanda21b_interspeech": 0, "conduct": [0, 2], "funasr": 0, "time": [0, 6], "accord": [0, 3], "timelin": [0, 2], "The": [0, 2, 3, 5, 6], "model": [0, 2, 3, 5, 6], "architectur": 0, "i": [0, 2, 3, 5], "shown": [0, 2], "figur": [0, 6], "3": [0, 2, 3], "speakerencod": 0, "initi": 0, "pre": [0, 6], "train": [0, 3, 5, 7], "speaker": [0, 2, 3, 7], "verif": 0, "from": [0, 2, 3, 5, 6], "modelscop": [0, 6], "thi": [0, 3, 5, 6], "also": [0, 2, 6], "us": [0, 2, 5, 6], "extract": 0, "embed": 0, "profil": 0, "todo": 0, "fill": 0, "readm": 0, "md": 0, "system": [0, 3, 5, 6, 7], "ar": [0, 2, 3, 5, 6, 7], "tabl": [0, 2], "adopt": 0, "oracl": [0, 6], "dure": [0, 2, 6], "howev": [0, 3, 6], "due": [0, 3], "lack": 0, "label": [0, 5, 6], "evalu": [0, 2, 3, 7], "provid": [0, 2, 6, 7], "addit": [0, 6], "spectral": 0, 
"cluster": 0, "meanwhil": 0, "eval": [0, 2, 5, 6], "test": [0, 2, 3, 5, 6], "set": [0, 2, 3, 5, 6], "show": 0, "impact": 0, "accuraci": [0, 6], "If": [1, 5, 6], "you": 1, "have": [1, 3], "ani": [1, 5, 6], "question": 1, "about": 1, "m2met2": [1, 3], "0": [1, 2, 3], "challeng": [1, 3, 5, 6], "pleas": 1, "u": [1, 2], "email": [1, 3, 4], "m2met": [1, 3, 6, 7], "alimeet": [1, 6], "gmail": 1, "com": [1, 4], "wechat": 1, "group": [1, 2], "In": [2, 3, 5], "fix": [2, 3, 7], "condit": [2, 3, 7], "restrict": 2, "three": [2, 3, 6], "publicli": [2, 6], "avail": [2, 6], "corpora": 2, "name": 2, "aishel": [2, 4, 6], "4": [2, 6], "cn": [2, 4, 6], "celeb": [2, 6], "To": [2, 3, 7], "perform": [2, 3], "new": [2, 3, 6], "call": 2, "2023": [2, 3, 5, 6], "score": [2, 6], "rank": [2, 3, 6], "describ": 2, "contain": [2, 6], "118": 2, "75": 2, "hour": [2, 3, 6], "speech": [2, 3, 6, 7], "total": [2, 6], "divid": [2, 6], "104": 2, "10": [2, 3, 6], "specif": [2, 6], "212": 2, "8": 2, "20": 2, "session": [2, 3, 6, 7], "respect": 2, "each": [2, 3, 6], "consist": [2, 6], "15": 2, "30": 2, "minut": 2, "discuss": 2, "particip": [2, 5, 6], "number": [2, 3, 6], "456": 2, "25": 2, "60": 2, "balanc": 2, "gender": 2, "coverag": 2, "collect": 2, "13": [2, 3], "meet": [2, 3, 6], "venu": 2, "which": [2, 3, 6], "categor": 2, "type": 2, "small": 2, "medium": 2, "larg": [2, 3], "room": [2, 3], "size": 2, "rang": 2, "m": 2, "2": [2, 6], "55": 2, "differ": [2, 3, 5, 6], "give": 2, "varieti": 2, "acoust": [2, 3, 6], "properti": 2, "layout": 2, "paramet": [2, 5], "togeth": 2, "wall": 2, "materi": 2, "cover": 2, "cement": 2, "glass": 2, "etc": 2, "other": 2, "furnish": 2, "includ": [2, 3, 5, 6], "sofa": 2, "tv": 2, "blackboard": 2, "fan": 2, "air": 2, "condition": 2, "plant": 2, "record": [2, 6], "sit": 2, "around": 2, "microphon": [2, 3], "arrai": [2, 3], "place": 2, "natur": 2, "convers": 2, "distanc": 2, "5": [2, 3], "all": [2, 3, 5, 6], "nativ": 2, "chines": 2, "speak": [2, 3], "mandarin": [2, 3], "without": 
2, "strong": 2, "accent": 2, "variou": [2, 3], "kind": 2, "indoor": 2, "nois": [2, 3, 5], "limit": [2, 3, 5], "click": 2, "keyboard": 2, "door": 2, "open": [2, 3, 7], "close": 2, "bubbl": 2, "made": [2, 3], "For": 2, "both": [2, 6], "requir": [2, 3, 6], "remain": [2, 3], "same": [2, 5], "posit": 2, "There": 2, "overlap": [2, 3], "between": [2, 6], "exampl": 2, "fig": 2, "1": 2, "within": [2, 3], "one": [2, 5], "ensur": 2, "ratio": 2, "select": [2, 3, 5, 6], "topic": 2, "medic": 2, "treatment": 2, "educ": 2, "busi": 2, "organ": [2, 3, 5, 6, 7], "manag": 2, "industri": [2, 3], "product": 2, "daili": 2, "routin": 2, "averag": 2, "42": 2, "27": 2, "34": 2, "76": 2, "more": 2, "A": [2, 4], "distribut": 2, "were": 2, "ident": [2, 6], "compris": [2, 3, 7], "therebi": 2, "share": 2, "similar": 2, "configur": 2, "field": [2, 3, 6], "signal": [2, 3], "headset": 2, "onli": [2, 5, 6], "": [2, 6], "own": 2, "transcrib": [2, 3, 6], "It": [2, 6], "worth": [2, 6], "note": [2, 6], "far": [2, 3], "audio": [2, 3, 6], "synchron": 2, "common": 2, "transcript": [2, 3, 5, 6], "prepar": 2, "textgrid": 2, "format": 2, "inform": [2, 3], "durat": 2, "id": 2, "segment": [2, 6], "timestamp": [2, 6], "mention": 2, "abov": 2, "can": [2, 3, 5, 6], "download": 2, "openslr": 2, "via": 2, "follow": [2, 5], "link": 2, "particularli": 2, "baselin": [2, 3, 7], "conveni": 2, "script": 2, "automat": [3, 7], "recognit": [3, 7], "diariz": 3, "signific": 3, "stride": 3, "recent": 3, "year": 3, "result": 3, "surg": 3, "technologi": 3, "applic": 3, "across": 3, "domain": 3, "present": 3, "uniqu": [3, 6], "complex": [3, 5], "divers": 3, "style": 3, "variabl": 3, "confer": 3, "environment": 3, "reverber": [3, 5], "over": 3, "sever": 3, "been": 3, "advanc": [3, 7], "develop": [3, 6], "rich": 3, "comput": [3, 5], "hear": 3, "multisourc": 3, "environ": 3, "chime": 3, "latest": 3, "iter": 3, "ha": 3, "particular": 3, "focu": 3, "distant": 3, "gener": 3, "topologi": 3, "scenario": 3, "while": 3, "progress": 3, 
"english": 3, "languag": [3, 5], "barrier": 3, "achiev": 3, "compar": 3, "non": 3, "multimod": 3, "base": 3, "process": [3, 6], "misp": 3, "multi": [3, 5, 6], "channel": 3, "parti": [3, 6], "instrument": 3, "seek": 3, "address": 3, "problem": 3, "visual": 3, "everydai": 3, "home": 3, "focus": 3, "tackl": 3, "issu": 3, "offlin": 3, "icassp2022": 3, "two": [3, 5, 7], "main": 3, "task": [3, 6, 7], "former": 3, "involv": [3, 6], "identifi": 3, "who": 3, "spoke": 3, "when": 3, "latter": 3, "aim": 3, "multipl": [3, 6], "simultan": 3, "pose": [3, 6], "technic": 3, "difficulti": 3, "interfer": 3, "build": [3, 6, 7], "success": [3, 7], "previou": 3, "excit": 3, "propos": [3, 7], "asru2023": [3, 7], "special": [3, 5, 7], "origin": [3, 5], "metric": [3, 7], "wa": [3, 6], "independ": 3, "meant": 3, "could": 3, "determin": 3, "correspond": [3, 5], "further": 3, "current": [3, 7], "talker": [3, 7], "toward": 3, "practic": 3, "attribut": [3, 7], "sub": [3, 5, 7], "track": [3, 5, 7], "what": 3, "facilit": [3, 7], "reproduc": [3, 7], "research": [3, 4, 7], "offer": 3, "comprehens": [3, 7], "overview": [3, 7], "dataset": [3, 5, 6, 7], "rule": [3, 7], "furthermor": 3, "carefulli": 3, "curat": 3, "approxim": [3, 6], "design": 3, "enabl": 3, "valid": 3, "state": [3, 6, 7], "art": [3, 7], "area": 3, "mai": 3, "th": 3, "registr": 3, "deadlin": 3, "date": 3, "join": 3, "june": 3, "9": 3, "data": [3, 5, 6], "rd": 3, "final": [3, 5, 6], "submiss": 3, "19": 3, "juli": 3, "paper": [3, 6], "decemb": 3, "12": 3, "nd": 3, "16": 3, "asru": 3, "workshop": 3, "interest": 3, "whether": 3, "academia": 3, "must": [3, 5, 6], "regist": 3, "complet": 3, "googl": 3, "form": 3, "below": 3, "work": 3, "dai": 3, "send": 3, "invit": 3, "elig": [3, 5], "team": 3, "qualifi": 3, "adher": [3, 5], "publish": 3, "page": 3, "prior": 3, "submit": 3, "descript": [3, 6], "document": 3, "detail": [3, 6], "approach": [3, 5], "method": 3, "top": 3, "proceed": 3, "lei": 4, "xie": 4, "professor": 4, "northwestern": 4, 
"polytechn": 4, "univers": 4, "china": 4, "lxie": 4, "nwpu": 4, "edu": 4, "kong": 4, "aik": 4, "lee": 4, "senior": 4, "scientist": 4, "institut": 4, "infocomm": 4, "star": 4, "singapor": 4, "kongaik": 4, "ieee": 4, "org": 4, "zhiji": 4, "yan": 4, "princip": 4, "engin": 4, "alibaba": 4, "yzj": 4, "inc": 4, "shiliang": 4, "zhang": 4, "sly": 4, "zsl": 4, "yanmin": 4, "qian": 4, "shanghai": 4, "jiao": 4, "tong": 4, "yanminqian": 4, "sjtu": 4, "zhuo": 4, "chen": 4, "appli": 4, "microsoft": 4, "usa": 4, "zhuc": 4, "jian": 4, "wu": 4, "wujian": 4, "hui": 4, "bu": 4, "ceo": 4, "foundat": 4, "buhui": 4, "aishelldata": 4, "should": 5, "augment": 5, "allow": [5, 6], "ad": 5, "speed": 5, "perturb": 5, "tone": 5, "chang": 5, "permit": 5, "purpos": 5, "instead": [5, 6], "util": [5, 6], "tune": 5, "violat": 5, "strictli": [5, 6], "prohibit": [5, 6], "fine": 5, "fusion": 5, "structur": 5, "encourag": 5, "cpcer": [5, 6], "lower": 5, "judg": 5, "superior": 5, "forc": 5, "align": 5, "obtain": [5, 6], "frame": 5, "level": 5, "classif": 5, "basi": 5, "shallow": 5, "end": 5, "e": [5, 6], "g": 5, "la": 5, "rnnt": 5, "transform": [5, 6], "come": 5, "right": 5, "interpret": 5, "belong": 5, "case": 5, "circumst": 5, "coordin": 5, "assign": 6, "illustr": 6, "aishell4": 6, "constrain": 6, "sourc": 6, "addition": 6, "corpu": 6, "soon": 6, "simpl": 6, "voic": 6, "activ": 6, "detect": 6, "vad": 6, "concaten": 6, "minimum": 6, "permut": 6, "charact": 6, "error": 6, "rate": 6, "calcul": 6, "step": 6, "firstli": 6, "refer": 6, "hypothesi": 6, "chronolog": 6, "order": 6, "secondli": 6, "cer": 6, "repeat": 6, "possibl": 6, "lowest": 6, "tthe": 6, "insert": 6, "Ins": 6, "substitut": 6, "delet": 6, "del": 6, "output": 6, "text": 6, "frac": 6, "mathcal": 6, "n_": 6, "100": 6, "where": 6, "usag": 6, "third": 6, "hug": 6, "face": 6, "list": 6, "clearli": 6, "privat": 6, "manual": 6, "simul": 6, "thei": 6, "mandatori": 6, "clear": 6, "scheme": 6, "delight": 7, "introduct": 7, "contact": 7}, "objects": {}, 
"objtypes": {}, "objnames": {}, "titleterms": {"baselin": 0, "overview": [0, 2], "quick": 0, "start": 0, "result": 0, "contact": 1, "dataset": 2, "train": [2, 6], "data": 2, "detail": 2, "alimeet": 2, "corpu": 2, "get": 2, "introduct": 3, "call": 3, "particip": 3, "timelin": 3, "aoe": 3, "time": 3, "guidelin": 3, "organ": 4, "rule": 5, "track": 6, "evalu": 6, "speaker": 6, "attribut": 6, "asr": 6, "metric": 6, "sub": 6, "arrang": 6, "i": 6, "fix": 6, "condit": 6, "ii": 6, "open": 6, "asru": 7, "2023": 7, "multi": 7, "channel": 7, "parti": 7, "meet": 7, "transcript": 7, "challeng": 7, "2": 7, "0": 7, "m2met2": 7, "content": 7}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 57}, "alltitles": {"Baseline": [[0, "baseline"]], "Overview": [[0, "overview"]], "Quick start": [[0, "quick-start"]], "Baseline results": [[0, "baseline-results"]], "Contact": [[1, "contact"]], "Datasets": [[2, "datasets"]], "Overview of training data": [[2, "overview-of-training-data"]], "Detail of AliMeeting corpus": [[2, "detail-of-alimeeting-corpus"]], "Get the data": [[2, "get-the-data"]], "Introduction": [[3, "introduction"]], "Call for participation": [[3, "call-for-participation"]], "Timeline(AOE Time)": [[3, "timeline-aoe-time"]], "Guidelines": [[3, "guidelines"]], "Organizers": [[4, "organizers"]], "Rules": [[5, "rules"]], "Track & Evaluation": [[6, "track-evaluation"]], "Speaker-Attributed ASR": [[6, "speaker-attributed-asr"]], "Evaluation metric": [[6, "evaluation-metric"]], "Sub-track arrangement": [[6, "sub-track-arrangement"]], "Sub-track I (Fixed Training Condition):": [[6, "sub-track-i-fixed-training-condition"]], "Sub-track II (Open Training Condition):": [[6, "sub-track-ii-open-training-condition"]], "ASRU 2023 MULTI-CHANNEL 
MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE 2.0 (M2MeT2.0)": [[7, "asru-2023-multi-channel-multi-party-meeting-transcription-challenge-2-0-m2met2-0"]], "Contents:": [[7, null]]}, "indexentries": {}})
\ No newline at end of file
diff --git a/docs_m2met2_cn/_build/doctrees/environment.pickle b/docs_m2met2_cn/_build/doctrees/environment.pickle
index 2034d4eda665530ceaa252aac2882db361ccea6e..ef89e893a05ecbed07ca2145eb9c47c07632138e 100644
GIT binary patch
diff --git a/docs_m2met2_cn/_build/doctrees/数据集.doctree b/docs_m2met2_cn/_build/doctrees/数据集.doctree
index 26addbfc8c91bbce668c3a09b5cd6c6a9dc6f2df..da11ffcf8d4d51532c99e321c868b62a0c718312 100644
GIT binary patch
diff --git a/docs_m2met2_cn/_build/html/_sources/数据集.md.txt b/docs_m2met2_cn/_build/html/_sources/数据集.md.txt
index 52965a1c5..24bfaf3d9 100644
--- a/docs_m2met2_cn/_build/html/_sources/数据集.md.txt
+++ b/docs_m2met2_cn/_build/html/_sources/数据集.md.txt
@@ -3,13 +3,13 @@
Under the fixed-data condition, the training data is restricted to three publicly available corpora: AliMeeting, AISHELL-4, and CN-Celeb. To evaluate the performance of the submitted models, we will release a new test set (Test-2023) for scoring and ranking. Below we describe the AliMeeting dataset and the Test-2023 test set in detail.

## The AliMeeting dataset
-AliMeeting contains 118.75 hours of speech in total: a 104.75-hour training set (Train), a 4-hour evaluation set (Eval), and a 10-hour test set (Test). The Train and Eval sets contain 212 and 8 meetings respectively, each a 15- to 30-minute discussion among multiple speakers. A total of 456 and 25 participants took part in the Train and Eval meetings respectively, with a balanced gender ratio.
+AliMeeting contains 118.75 hours of speech in total: a 104.75-hour training set (Train), a 4-hour evaluation set (Eval), and a 10-hour test set (Test). The Train, Eval, and Test sets contain 212, 8, and 20 meetings respectively, each a 15- to 30-minute discussion among multiple speakers. A total of 456, 25, and 60 participants took part in the Train, Eval, and Test meetings respectively, with a balanced gender ratio.

The dataset was collected in 13 different meeting rooms, classified by size as small, medium, and large, with floor areas ranging from 8 to 55 square meters. The rooms differ in layout and acoustic properties, and detailed parameters of each room will also be sent to participants. Wall materials include cement, glass, and so on; furnishings include sofas, TVs, blackboards, fans, air conditioners, plants, and so on. During recording, the microphone array was placed on the table while multiple speakers sat around it holding a natural conversation, at distances of roughly 0.3 to 5.0 meters from the array. All speakers are native speakers of Chinese and speak Mandarin without strong accents. Various indoor noises may occur during a recording, including keyboard clicks, door opening/closing, fan noise, bubble noise, and so on. All speakers kept the same position throughout each recording and did not walk around. There is no speaker overlap between the training and validation sets. Figure 1 shows the layout of a meeting room and the microphone topology.

![meeting room](images/meeting_room.png)

-The number of speakers in each meeting ranges from 2 to 4. To cover meetings with a variety of content, we selected a range of topics, including routine meetings on medicine, education, business, organizational management, industrial production, and so on. The average speech overlap ratios of the Train, Eval, and Test sets are 42.27\% and 34.76\% respectively. Detailed statistics of the AliMeeting Train, Eval, and Test sets are given in Table 1. Table 2 shows the speech overlap ratio and the number of meetings, broken down by the number of speakers per meeting, for the Train, Eval, and Test sets.
+The number of speakers in each meeting ranges from 2 to 4. To cover meetings with a variety of content, we selected a range of topics, including routine meetings on medicine, education, business, organizational management, industrial production, and so on. The average speech overlap ratios of the Train, Eval, and Test sets are 42.27\%, 34.76\%, and 42.8\% respectively. Detailed statistics of the AliMeeting Train, Eval, and Test sets are given in Table 1. Table 2 shows the speech overlap ratio and the number of meetings, broken down by the number of speakers per meeting, for the Train, Eval, and Test sets.

![dataset detail](images/dataset_detail.png)
Test-2023 consists of 20 meetings recorded in the same acoustic environment as the AliMeeting dataset. Each Test-2023 session involves 2 to 4 participants, in a configuration similar to that of the AliMeeting Test set.
diff --git a/docs_m2met2_cn/_build/html/_sources/简介.md.txt b/docs_m2met2_cn/_build/html/_sources/简介.md.txt
index cf4dfb86a..4ae9b6759 100644
--- a/docs_m2met2_cn/_build/html/_sources/简介.md.txt
+++ b/docs_m2met2_cn/_build/html/_sources/简介.md.txt
@@ -25,4 +25,6 @@ The ICASSP 2022 M2MeT challenge focused on the meeting scenario and comprised two tracks: speaker
Interested participants from both academia and industry should fill in the Google form below by May 5, 2023:

+[M2MET2.0 registration](https://docs.google.com/forms/d/e/1FAIpQLSf77T9vAl7Ym-u5g8gXu18SBofoWRaFShBo26Ym0-HDxHW9PQ/viewform?usp=sf_link)
+
The organizers will notify eligible teams by email within 3 working days, and teams must comply with the challenge rules, which will be published on the challenge website. Before the rankings are released, every participant must submit a system description document detailing the methods and models used. The organizers will select the top three systems for inclusion in the ASRU 2023 proceedings.
\ No newline at end of file
diff --git a/docs_m2met2_cn/_build/html/_sources/赛道设置与评估.md.txt b/docs_m2met2_cn/_build/html/_sources/赛道设置与评估.md.txt
index a99c1eddc..94a623690 100644
--- a/docs_m2met2_cn/_build/html/_sources/赛道设置与评估.md.txt
+++ b/docs_m2met2_cn/_build/html/_sources/赛道设置与评估.md.txt
@@ -1,6 +1,6 @@
# Track settings and evaluation
-## Speaker-attributed speech recognition (main track)
-The speaker-attributed ASR task requires recognizing each speaker's speech from overlapped audio and assigning a speaker label to the recognized content. Figure 2 illustrates the main difference between the speaker-attributed ASR task and the multi-speaker ASR task. In this challenge, the AliMeeting, AISHELL-4, and CN-Celeb datasets serve as the constrained data sources. The AliMeeting dataset used in the M2MeT challenge contains training, evaluation, and test sets, which may be used for training and evaluation in M2MET2.0. In addition, a new Test-2023 set containing about 10 hours of meeting data will be released according to the schedule and used for scoring and ranking. Note that the organizers will not provide near-field headset audio, transcriptions, or oracle timestamps. Instead of per-speaker oracle timestamps, the organizers will provide segments containing multiple speakers on the Test-2023 set; these segments can be obtained with a simple VAD model.
+## Speaker-attributed speech recognition
+The speaker-attributed ASR task requires recognizing each speaker's speech from overlapped audio and assigning a speaker label to the recognized content. Figure 2 illustrates the main difference between the speaker-attributed ASR task and the multi-speaker ASR task. In this challenge, the AliMeeting, AISHELL-4, and CN-Celeb datasets serve as the constrained data sources. The AliMeeting dataset used in the M2MeT challenge contains training, evaluation, and test sets, which may be used for training and evaluation in M2MET2.0. In addition, a new Test-2023 set containing about 10 hours of meeting data will be released according to the schedule and used for scoring and ranking. Note that for the Test-2023 set, the organizers will not provide near-field headset audio, transcriptions, or oracle timestamps; instead, they will provide segments containing multiple speakers, which can be obtained with a simple VAD model.

![task difference](images/task_diff.png)

@@ -12,6 +12,6 @@ $$ \text{CER} = \frac {\mathcal N_{\text{Ins}} + \mathcal N_{\text{Sub}} + \mathcal N_{\text{Del}}}{\mathcal N_{\text{Total}}} \times 100\% $$
where $\mathcal N_{\text{Ins}}$, $\mathcal N_{\text{Sub}}$, and $\mathcal N_{\text{Del}}$ are the numbers of insertion, substitution, and deletion errors in characters, and $\mathcal N_{\text{Total}}$ is the total number of characters.
## Sub-track settings
### Sub-track 1 (fixed training data):
-Participants may use only AliMeeting, AISHELL-4, and CN Celeb when building their systems; the use of additional data is strictly prohibited. Participants may use any third-party open-source pre-trained model, such as those provided on [Hugging Face](https://huggingface.co/models) and [ModelScope](https://www.modelscope.cn/models). Participants must list the names of and links to the pre-trained models used in the final system description document.
+Participants may use only AliMeeting, AISHELL-4, and CN-Celeb when building their systems; the use of additional data is strictly prohibited. Participants may use any third-party open-source pre-trained model, such as those provided on [Hugging Face](https://huggingface.co/models) and [ModelScope](https://www.modelscope.cn/models). Participants must list the names of and links to the pre-trained models used in the final system description document.
### Sub-track 2 (open training data):
Besides the fixed data, participants may use any publicly available, privately recorded, or simulated datasets, but must clearly list the data used. Likewise, participants may use any third-party open-source pre-trained model, but must explicitly list the data used and the model links in the final system description document; if simulated data is used, please describe the simulation recipe in detail.
\ No newline at end of file
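The CPCER metric edited in the track-settings diff above reduces, per reference/hypothesis pair, to a character-level edit distance. A minimal sketch of that computation via standard Levenshtein dynamic programming (illustrative only; the official scoring additionally handles speaker assignment, which is not shown here):

```python
# Character error rate (CER) via Levenshtein distance, matching
#   CER = (N_ins + N_sub + N_del) / N_total * 100
# "ref" is the reference transcript, "hyp" the recognized hypothesis.
def cer(ref: str, hyp: str) -> float:
    m, n = len(ref), len(hyp)
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # i deletions
    for j in range(n + 1):
        d[0][j] = j  # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + 1,                               # deletion
                d[i][j - 1] + 1,                               # insertion
                d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),  # substitution or match
            )
    return 100.0 * d[m][n] / m if m else 0.0

print(round(cer("今天天气不错", "今天气很不错"), 2))  # 2 errors over 6 chars -> 33.33
```

Because Mandarin is scored at the character level, no tokenization is needed beyond iterating over the string.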
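The "average speech overlap ratio" figures changed in the dataset diff can be given an operational reading: the share of speech-active time during which at least two speakers talk simultaneously. A rough Python sketch under that assumed definition, using a hypothetical per-speaker segment format (lists of `(start, end)` pairs in seconds) rather than the AliMeeting TextGrid tooling:

```python
# Estimate a meeting's speech overlap ratio from per-speaker segments.
# Assumed definition: time where >= 2 speakers are active, divided by
# time where >= 1 speaker is active. The segment format is hypothetical,
# not the AliMeeting TextGrid layout.
def overlap_ratio(speaker_segments, step=0.01):
    end = max(e for segs in speaker_segments.values() for _, e in segs)
    speech = overlapped = 0.0
    t = 0.0
    while t < end:  # sample the timeline on a fixed grid
        active = sum(
            any(s <= t < e for s, e in segs)
            for segs in speaker_segments.values()
        )
        if active >= 1:
            speech += step
            if active >= 2:
                overlapped += step
        t += step
    return overlapped / speech if speech else 0.0

demo = {
    "spk1": [(0.0, 4.0)],
    "spk2": [(2.0, 6.0)],  # overlaps spk1 during [2.0, 4.0)
}
print(round(overlap_ratio(demo), 2))  # 2 s overlapped / 6 s speech -> 0.33
```

The fixed-step sampling trades accuracy for simplicity; an interval-sweep over segment boundaries would give exact durations.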