diff --git a/docs/m2met2/Baseline.md b/docs/m2met2/Baseline.md index cdaff8a14..4e12162a5 100644 --- a/docs/m2met2/Baseline.md +++ b/docs/m2met2/Baseline.md @@ -31,4 +31,4 @@ For more details you can see [here](https://github.com/alibaba-damo-academy/FunA ## Baseline results The results of the baseline system are shown in Table 3. The speaker profile adopts the oracle speaker embedding during training. However, due to the lack of oracle speaker label during evaluation, the speaker profile provided by an additional spectral clustering is used. Meanwhile, the results of using the oracle speaker profile on Eval and Test Set are also provided to show the impact of speaker profile accuracy. -![baseline result](images/baseline_result.png) \ No newline at end of file +![baseline_result](images/baseline_result.png) \ No newline at end of file diff --git a/docs/m2met2/Introduction.md b/docs/m2met2/Introduction.md index 06c27b239..fc7c356d7 100644 --- a/docs/m2met2/Introduction.md +++ b/docs/m2met2/Introduction.md @@ -10,14 +10,14 @@ Building on the success of the previous M2MeT challenge, we are excited to propo ## Timeline(AOE Time) - $ April~29, 2023: $ Challenge and registration open. -- $ May~8, 2023: $ Baseline release. -- $ May~15, 2023: $ Registration deadline, the due date for participants to join the Challenge. -- $ June~9, 2023: $ Test data release and leaderboard open. -- $ June~13, 2023: $ Final submission deadline and leaderboar close. -- $ June~19, 2023: $ Evaluation result and ranking release. +- $ May~11, 2023: $ Baseline release. +- $ May~22, 2023: $ Registration deadline, the due date for participants to join the Challenge. +- $ June~16, 2023: $ Test data release and leaderboard open. +- $ June~20, 2023: $ Final submission deadline and leaderboar close. +- $ June~26, 2023: $ Evaluation result and ranking release. - $ July~3, 2023: $ Deadline for paper submission. - $ July~10, 2023: $ Deadline for final paper submission. 
-- $ December~12\ to\ 16, 2023: $ ASRU Workshop and challenge Session +- $ December~12\ to\ 16, 2023: $ ASRU Workshop and Challenge Session. ## Guidelines diff --git a/docs/m2met2/_build/doctrees/Baseline.doctree b/docs/m2met2/_build/doctrees/Baseline.doctree index a99a9aff0..f6ea62f86 100644 Binary files a/docs/m2met2/_build/doctrees/Baseline.doctree and b/docs/m2met2/_build/doctrees/Baseline.doctree differ diff --git a/docs/m2met2/_build/doctrees/Introduction.doctree b/docs/m2met2/_build/doctrees/Introduction.doctree index 96e4276b7..6ffceef26 100644 Binary files a/docs/m2met2/_build/doctrees/Introduction.doctree and b/docs/m2met2/_build/doctrees/Introduction.doctree differ diff --git a/docs/m2met2/_build/doctrees/environment.pickle b/docs/m2met2/_build/doctrees/environment.pickle index aae3d365d..fe6805987 100644 Binary files a/docs/m2met2/_build/doctrees/environment.pickle and b/docs/m2met2/_build/doctrees/environment.pickle differ diff --git a/docs/m2met2/_build/html/Baseline.html b/docs/m2met2/_build/html/Baseline.html index 7fef3293c..62c656cca 100644 --- a/docs/m2met2/_build/html/Baseline.html +++ b/docs/m2met2/_build/html/Baseline.html @@ -157,7 +157,7 @@ Before running run.

Baseline results

The results of the baseline system are shown in Table 3. The speaker profile adopts the oracle speaker embedding during training. However, due to the lack of oracle speaker label during evaluation, the speaker profile provided by an additional spectral clustering is used. Meanwhile, the results of using the oracle speaker profile on Eval and Test Set are also provided to show the impact of speaker profile accuracy.

-baseline result
+baseline_result

diff --git a/docs/m2met2/_build/html/Introduction.html b/docs/m2met2/_build/html/Introduction.html index b3079bd62..82394fc96 100644 --- a/docs/m2met2/_build/html/Introduction.html +++ b/docs/m2met2/_build/html/Introduction.html @@ -136,14 +136,14 @@

Timeline(AOE Time)

diff --git a/docs/m2met2/_build/html/_images/baseline_result.png b/docs/m2met2/_build/html/_images/baseline_result.png index d51d7753c..6b7636192 100644 Binary files a/docs/m2met2/_build/html/_images/baseline_result.png and b/docs/m2met2/_build/html/_images/baseline_result.png differ diff --git a/docs/m2met2/_build/html/_images/qrcode.png b/docs/m2met2/_build/html/_images/qrcode.png index 54b2f55d2..fc4c3498c 100644 Binary files a/docs/m2met2/_build/html/_images/qrcode.png and b/docs/m2met2/_build/html/_images/qrcode.png differ diff --git a/docs/m2met2/_build/html/_sources/Baseline.md.txt b/docs/m2met2/_build/html/_sources/Baseline.md.txt index cdaff8a14..4e12162a5 100644 --- a/docs/m2met2/_build/html/_sources/Baseline.md.txt +++ b/docs/m2met2/_build/html/_sources/Baseline.md.txt @@ -31,4 +31,4 @@ For more details you can see [here](https://github.com/alibaba-damo-academy/FunA ## Baseline results The results of the baseline system are shown in Table 3. The speaker profile adopts the oracle speaker embedding during training. However, due to the lack of oracle speaker label during evaluation, the speaker profile provided by an additional spectral clustering is used. Meanwhile, the results of using the oracle speaker profile on Eval and Test Set are also provided to show the impact of speaker profile accuracy. -![baseline result](images/baseline_result.png) \ No newline at end of file +![baseline_result](images/baseline_result.png) \ No newline at end of file diff --git a/docs/m2met2/_build/html/_sources/Introduction.md.txt b/docs/m2met2/_build/html/_sources/Introduction.md.txt index 06c27b239..fc7c356d7 100644 --- a/docs/m2met2/_build/html/_sources/Introduction.md.txt +++ b/docs/m2met2/_build/html/_sources/Introduction.md.txt @@ -10,14 +10,14 @@ Building on the success of the previous M2MeT challenge, we are excited to propo ## Timeline(AOE Time) - $ April~29, 2023: $ Challenge and registration open. -- $ May~8, 2023: $ Baseline release. 
-- $ May~15, 2023: $ Registration deadline, the due date for participants to join the Challenge. -- $ June~9, 2023: $ Test data release and leaderboard open. -- $ June~13, 2023: $ Final submission deadline and leaderboar close. -- $ June~19, 2023: $ Evaluation result and ranking release. +- $ May~11, 2023: $ Baseline release. +- $ May~22, 2023: $ Registration deadline, the due date for participants to join the Challenge. +- $ June~16, 2023: $ Test data release and leaderboard open. +- $ June~20, 2023: $ Final submission deadline and leaderboar close. +- $ June~26, 2023: $ Evaluation result and ranking release. - $ July~3, 2023: $ Deadline for paper submission. - $ July~10, 2023: $ Deadline for final paper submission. -- $ December~12\ to\ 16, 2023: $ ASRU Workshop and challenge Session +- $ December~12\ to\ 16, 2023: $ ASRU Workshop and Challenge Session. ## Guidelines diff --git a/docs/m2met2/_build/html/searchindex.js b/docs/m2met2/_build/html/searchindex.js index 6481ef293..3387db5b5 100644 --- a/docs/m2met2/_build/html/searchindex.js +++ b/docs/m2met2/_build/html/searchindex.js @@ -1 +1 @@ -Search.setIndex({"docnames": ["Baseline", "Contact", "Dataset", "Introduction", "Organizers", "Rules", "Track_setting_and_evaluation", "index"], "filenames": ["Baseline.md", "Contact.md", "Dataset.md", "Introduction.md", "Organizers.md", "Rules.md", "Track_setting_and_evaluation.md", "index.rst"], "titles": ["Baseline", "Contact", "Datasets", "Introduction", "Organizers", "Rules", "Track & Evaluation", "ASRU 2023 MULTI-CHANNEL MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE 2.0 (M2MeT2.0)"], "terms": {"we": [0, 2, 3, 7], "releas": [0, 2, 3, 6], "an": [0, 2, 3, 6], "e2": 0, "sa": 0, "asr": [0, 3, 7], "conduct": [0, 2], "funasr": 0, "time": [0, 6], "accord": [0, 3], "timelin": [0, 2], "The": [0, 2, 3, 5, 6], "model": [0, 2, 3, 5, 6], "architectur": 0, "i": [0, 2, 3, 5], "shown": [0, 2], "figur": [0, 6], "3": [0, 2, 3], "speakerencod": 0, "initi": 0, "pre": [0, 6], "train": [0, 3, 
5, 7], "speaker": [0, 2, 3, 7], "verif": 0, "from": [0, 2, 3, 5, 6], "modelscop": [0, 6], "thi": [0, 3, 5, 6], "also": [0, 2, 3, 6], "us": [0, 2, 5, 6], "extract": 0, "embed": 0, "profil": 0, "To": [0, 2, 3, 7], "run": 0, "first": 0, "you": [0, 1], "need": 0, "instal": 0, "There": [0, 2], "ar": [0, 2, 3, 5, 6, 7], "two": [0, 3, 5, 7], "startup": 0, "script": [0, 2], "sh": 0, "evalu": [0, 2, 3, 7], "old": 0, "eval": [0, 2, 5, 6], "test": [0, 2, 3, 5, 6], "set": [0, 2, 3, 5, 6], "run_m2met_2023_inf": 0, "infer": 0, "new": [0, 2, 3, 6], "multi": [0, 3, 6], "channel": [0, 3], "parti": [0, 3, 6], "meet": [0, 2, 3, 6], "transcript": [0, 2, 3, 5, 6], "2": [0, 2, 6], "0": [0, 1, 2, 3], "m2met2": [0, 1, 3], "challeng": [0, 1, 3, 5, 6], "befor": 0, "must": [0, 3, 5, 6], "manual": [0, 6], "download": [0, 2], "unpack": 0, "alimeet": [0, 1, 6], "corpu": [0, 6], "place": [0, 2], "dataset": [0, 3, 5, 6, 7], "directori": 0, "eval_ali_far": 0, "eval_ali_near": 0, "test_ali_far": 0, "test_ali_near": 0, "train_ali_far": 0, "train_ali_near": 0, "test_2023_ali_far": 0, "after": 0, "which": [0, 2, 3, 6], "contain": [0, 2, 6], "onli": [0, 2, 5, 6], "raw": 0, "audio": [0, 2, 3, 6], "Then": 0, "put": 0, "given": 0, "wav": 0, "scp": 0, "wav_raw": 0, "segment": [0, 2, 6], "utt2spk": 0, "spk2utt": 0, "data": [0, 3, 5, 6], "For": [0, 2], "more": [0, 2], "detail": [0, 3, 6], "can": [0, 2, 3, 5, 6], "see": 0, "here": 0, "system": [0, 3, 5, 6, 7], "tabl": [0, 2], "adopt": 0, "oracl": [0, 6], "dure": [0, 2, 6], "howev": [0, 3, 6], "due": [0, 3], "lack": 0, "label": [0, 5, 6], "provid": [0, 2, 6, 7], "addit": [0, 6], "spectral": 0, "cluster": 0, "meanwhil": 0, "show": 0, "impact": 0, "accuraci": [0, 6], "If": [1, 5, 6], "have": [1, 3], "ani": [1, 5, 6], "question": 1, "about": [1, 3], "pleas": 1, "u": [1, 2], "email": [1, 3, 4], "m2met": [1, 3, 6, 7], "gmail": 1, "com": [1, 4], "wechat": [1, 3], "group": [1, 2, 3], "In": [2, 3, 5], "fix": [2, 3, 7], "condit": [2, 3, 7], "restrict": 2, "three": [2, 
3, 6], "publicli": [2, 6], "avail": [2, 6], "corpora": 2, "name": 2, "aishel": [2, 4, 6], "4": [2, 6], "cn": [2, 4, 6], "celeb": [2, 6], "perform": [2, 3], "call": 2, "2023": [2, 3, 5, 6], "score": [2, 6], "rank": [2, 3, 6], "describ": 2, "118": 2, "75": 2, "hour": [2, 3, 6], "speech": [2, 3, 6, 7], "total": [2, 6], "divid": [2, 6], "104": 2, "10": [2, 3, 6], "specif": [2, 6], "212": 2, "8": [2, 3], "20": 2, "session": [2, 3, 6, 7], "respect": 2, "each": [2, 3, 6], "consist": [2, 6], "15": [2, 3], "30": 2, "minut": 2, "discuss": 2, "particip": [2, 5, 6], "number": [2, 3, 6], "456": 2, "25": 2, "60": 2, "balanc": 2, "gender": 2, "coverag": 2, "collect": 2, "13": [2, 3], "venu": 2, "categor": 2, "type": 2, "small": 2, "medium": 2, "larg": [2, 3], "room": [2, 3], "size": 2, "rang": 2, "m": 2, "55": 2, "differ": [2, 3, 6], "give": 2, "varieti": 2, "acoust": [2, 3, 6], "properti": 2, "layout": 2, "paramet": [2, 5], "togeth": 2, "wall": 2, "materi": 2, "cover": 2, "cement": 2, "glass": 2, "etc": 2, "other": 2, "furnish": 2, "includ": [2, 3, 5, 6], "sofa": 2, "tv": 2, "blackboard": 2, "fan": 2, "air": 2, "condition": 2, "plant": 2, "record": [2, 6], "sit": 2, "around": 2, "microphon": [2, 3], "arrai": [2, 3], "natur": 2, "convers": 2, "distanc": 2, "5": 2, "all": [2, 3, 5, 6], "nativ": 2, "chines": 2, "speak": [2, 3], "mandarin": [2, 3], "without": 2, "strong": 2, "accent": 2, "variou": [2, 3], "kind": 2, "indoor": 2, "nois": [2, 3, 5], "limit": [2, 3, 5], "click": 2, "keyboard": 2, "door": 2, "open": [2, 3, 7], "close": [2, 3], "bubbl": 2, "made": [2, 3], "both": [2, 6], "requir": [2, 3, 6], "remain": [2, 3], "same": [2, 5], "posit": 2, "overlap": [2, 3], "between": [2, 6], "exampl": 2, "fig": 2, "1": 2, "within": [2, 3], "one": [2, 5], "ensur": 2, "ratio": 2, "select": [2, 3, 5, 6], "topic": 2, "medic": 2, "treatment": 2, "educ": 2, "busi": 2, "organ": [2, 3, 5, 6, 7], "manag": 2, "industri": [2, 3], "product": 2, "daili": 2, "routin": 2, "averag": 2, "42": 2, "27": 2, 
"34": 2, "76": 2, "A": [2, 4], "distribut": 2, "were": 2, "ident": [2, 6], "compris": [2, 3, 7], "therebi": 2, "share": 2, "similar": 2, "configur": 2, "field": [2, 3, 6], "signal": [2, 3], "headset": 2, "": [2, 6], "own": 2, "transcrib": [2, 3, 6], "It": [2, 6], "worth": [2, 6], "note": [2, 6], "far": [2, 3], "synchron": 2, "common": 2, "prepar": 2, "textgrid": 2, "format": 2, "inform": [2, 3], "durat": 2, "id": 2, "timestamp": [2, 6], "mention": 2, "abov": 2, "openslr": 2, "via": 2, "follow": [2, 5], "link": 2, "particularli": 2, "baselin": [2, 3, 7], "conveni": 2, "automat": [3, 7], "recognit": [3, 7], "diariz": 3, "signific": 3, "stride": 3, "recent": 3, "year": 3, "result": 3, "surg": 3, "technologi": 3, "applic": 3, "across": 3, "domain": 3, "present": 3, "uniqu": [3, 6], "complex": [3, 5], "divers": 3, "style": 3, "variabl": 3, "confer": 3, "environment": 3, "reverber": [3, 5], "over": 3, "sever": 3, "been": 3, "advanc": [3, 7], "develop": [3, 6], "rich": 3, "comput": [3, 5], "hear": 3, "multisourc": 3, "environ": 3, "chime": 3, "latest": 3, "iter": 3, "ha": 3, "particular": 3, "focu": 3, "distant": 3, "gener": 3, "topologi": 3, "scenario": 3, "while": 3, "progress": 3, "english": 3, "languag": [3, 5], "barrier": 3, "achiev": 3, "compar": 3, "non": 3, "multimod": 3, "base": 3, "process": [3, 6], "misp": 3, "instrument": 3, "seek": 3, "address": 3, "problem": 3, "visual": 3, "everydai": 3, "home": 3, "focus": 3, "tackl": 3, "issu": 3, "offlin": 3, "icassp2022": 3, "main": 3, "task": [3, 6, 7], "former": 3, "involv": [3, 6], "identifi": 3, "who": 3, "spoke": 3, "when": 3, "latter": 3, "aim": 3, "multipl": [3, 6], "simultan": 3, "pose": [3, 6], "technic": 3, "difficulti": 3, "interfer": 3, "build": [3, 6, 7], "success": [3, 7], "previou": 3, "excit": 3, "propos": [3, 7], "asru": 3, "special": [3, 5, 7], "origin": [3, 5], "metric": [3, 7], "wa": [3, 6], "independ": 3, "meant": 3, "could": 3, "determin": 3, "correspond": [3, 5], "further": 3, "current": [3, 7], 
"talker": [3, 7], "toward": 3, "practic": 3, "attribut": [3, 7], "sub": [3, 5, 7], "track": [3, 5, 7], "what": 3, "facilit": [3, 7], "reproduc": [3, 7], "research": [3, 4, 7], "offer": 3, "comprehens": [3, 7], "overview": [3, 7], "rule": [3, 7], "furthermor": 3, "carefulli": 3, "curat": 3, "approxim": [3, 6], "design": 3, "enabl": 3, "valid": 3, "state": [3, 6, 7], "art": [3, 7], "area": 3, "april": 3, "29": 3, "registr": 3, "mai": 3, "deadlin": 3, "date": 3, "join": 3, "june": 3, "9": 3, "leaderboard": 3, "final": [3, 5, 6], "submiss": 3, "leaderboar": 3, "19": 3, "juli": 3, "paper": [3, 6], "decemb": 3, "12": 3, "16": 3, "workshop": 3, "interest": 3, "whether": 3, "academia": 3, "regist": 3, "complet": 3, "googl": 3, "form": 3, "below": 3, "22": 3, "welcom": 3, "keep": 3, "up": 3, "updat": 3, "work": 3, "dai": 3, "send": 3, "invit": 3, "elig": [3, 5], "team": 3, "qualifi": 3, "adher": [3, 5], "publish": 3, "page": 3, "prior": 3, "submit": 3, "descript": [3, 6], "document": 3, "approach": [3, 5], "method": 3, "top": 3, "asru2023": [3, 7], "proceed": 3, "lei": 4, "xie": 4, "professor": 4, "foundat": 4, "china": 4, "lxie": 4, "nwpu": 4, "edu": 4, "kong": 4, "aik": 4, "lee": 4, "senior": 4, "scientist": 4, "institut": 4, "infocomm": 4, "star": 4, "singapor": 4, "kongaik": 4, "ieee": 4, "org": 4, "zhiji": 4, "yan": 4, "princip": 4, "engin": 4, "alibaba": 4, "yzj": 4, "inc": 4, "shiliang": 4, "zhang": 4, "sly": 4, "zsl": 4, "yanmin": 4, "qian": 4, "shanghai": 4, "jiao": 4, "tong": 4, "univers": 4, "yanminqian": 4, "sjtu": 4, "zhuo": 4, "chen": 4, "appli": 4, "microsoft": 4, "usa": 4, "zhuc": 4, "jian": 4, "wu": 4, "wujian": 4, "hui": 4, "bu": 4, "ceo": 4, "buhui": 4, "aishelldata": 4, "should": 5, "augment": 5, "allow": [5, 6], "ad": 5, "speed": 5, "perturb": 5, "tone": 5, "chang": 5, "permit": 5, "purpos": 5, "instead": [5, 6], "util": [5, 6], "tune": 5, "violat": 5, "strictli": [5, 6], "prohibit": [5, 6], "fine": 5, "cpcer": [5, 6], "lower": 5, "judg": 5, "superior": 
5, "forc": 5, "align": 5, "obtain": [5, 6], "frame": 5, "level": 5, "classif": 5, "basi": 5, "shallow": 5, "fusion": 5, "end": 5, "e": [5, 6], "g": 5, "la": 5, "rnnt": 5, "transform": [5, 6], "come": 5, "right": 5, "interpret": 5, "belong": 5, "case": 5, "circumst": 5, "coordin": 5, "assign": 6, "illustr": 6, "aishell4": 6, "constrain": 6, "sourc": 6, "addition": 6, "soon": 6, "simpl": 6, "voic": 6, "activ": 6, "detect": 6, "vad": 6, "concaten": 6, "minimum": 6, "permut": 6, "charact": 6, "error": 6, "rate": 6, "calcul": 6, "step": 6, "firstli": 6, "refer": 6, "hypothesi": 6, "chronolog": 6, "order": 6, "secondli": 6, "cer": 6, "repeat": 6, "possibl": 6, "lowest": 6, "tthe": 6, "insert": 6, "Ins": 6, "substitut": 6, "delet": 6, "del": 6, "output": 6, "text": 6, "frac": 6, "mathcal": 6, "n_": 6, "100": 6, "where": 6, "usag": 6, "third": 6, "hug": 6, "face": 6, "list": 6, "clearli": 6, "privat": 6, "simul": 6, "thei": 6, "mandatori": 6, "clear": 6, "scheme": 6, "delight": 7, "introduct": 7, "contact": 7}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"baselin": 0, "overview": [0, 2], "quick": 0, "start": 0, "result": 0, "contact": 1, "dataset": 2, "train": [2, 6], "data": 2, "detail": 2, "alimeet": 2, "corpu": 2, "get": 2, "introduct": 3, "call": 3, "particip": 3, "timelin": 3, "aoe": 3, "time": 3, "guidelin": 3, "organ": 4, "rule": 5, "track": 6, "evalu": 6, "speaker": 6, "attribut": 6, "asr": 6, "metric": 6, "sub": 6, "arrang": 6, "i": 6, "fix": 6, "condit": 6, "ii": 6, "open": 6, "asru": 7, "2023": 7, "multi": 7, "channel": 7, "parti": 7, "meet": 7, "transcript": 7, "challeng": 7, "2": 7, "0": 7, "m2met2": 7, "content": 7}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 57}, "alltitles": {"Baseline": 
[[0, "baseline"]], "Overview": [[0, "overview"]], "Quick start": [[0, "quick-start"]], "Baseline results": [[0, "baseline-results"]], "Contact": [[1, "contact"]], "Datasets": [[2, "datasets"]], "Overview of training data": [[2, "overview-of-training-data"]], "Detail of AliMeeting corpus": [[2, "detail-of-alimeeting-corpus"]], "Get the data": [[2, "get-the-data"]], "Introduction": [[3, "introduction"]], "Call for participation": [[3, "call-for-participation"]], "Timeline(AOE Time)": [[3, "timeline-aoe-time"]], "Guidelines": [[3, "guidelines"]], "Organizers": [[4, "organizers"]], "Rules": [[5, "rules"]], "Track & Evaluation": [[6, "track-evaluation"]], "Speaker-Attributed ASR": [[6, "speaker-attributed-asr"]], "Evaluation metric": [[6, "evaluation-metric"]], "Sub-track arrangement": [[6, "sub-track-arrangement"]], "Sub-track I (Fixed Training Condition):": [[6, "sub-track-i-fixed-training-condition"]], "Sub-track II (Open Training Condition):": [[6, "sub-track-ii-open-training-condition"]], "ASRU 2023 MULTI-CHANNEL MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE 2.0 (M2MeT2.0)": [[7, "asru-2023-multi-channel-multi-party-meeting-transcription-challenge-2-0-m2met2-0"]], "Contents:": [[7, null]]}, "indexentries": {}}) \ No newline at end of file +Search.setIndex({"docnames": ["Baseline", "Contact", "Dataset", "Introduction", "Organizers", "Rules", "Track_setting_and_evaluation", "index"], "filenames": ["Baseline.md", "Contact.md", "Dataset.md", "Introduction.md", "Organizers.md", "Rules.md", "Track_setting_and_evaluation.md", "index.rst"], "titles": ["Baseline", "Contact", "Datasets", "Introduction", "Organizers", "Rules", "Track & Evaluation", "ASRU 2023 MULTI-CHANNEL MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE 2.0 (M2MeT2.0)"], "terms": {"we": [0, 2, 3, 7], "releas": [0, 2, 3, 6], "an": [0, 2, 3, 6], "e2": 0, "sa": 0, "asr": [0, 3, 7], "conduct": [0, 2], "funasr": 0, "time": [0, 6], "accord": [0, 3], "timelin": [0, 2], "The": [0, 2, 3, 5, 6], "model": [0, 2, 3, 5, 6], 
"architectur": 0, "i": [0, 2, 3, 5], "shown": [0, 2], "figur": [0, 6], "3": [0, 2, 3], "speakerencod": 0, "initi": 0, "pre": [0, 6], "train": [0, 3, 5, 7], "speaker": [0, 2, 3, 7], "verif": 0, "from": [0, 2, 3, 5, 6], "modelscop": [0, 6], "thi": [0, 3, 5, 6], "also": [0, 2, 3, 6], "us": [0, 2, 5, 6], "extract": 0, "embed": 0, "profil": 0, "To": [0, 2, 3, 7], "run": 0, "first": 0, "you": [0, 1], "need": 0, "instal": 0, "There": [0, 2], "ar": [0, 2, 3, 5, 6, 7], "two": [0, 3, 5, 7], "startup": 0, "script": [0, 2], "sh": 0, "evalu": [0, 2, 3, 7], "old": 0, "eval": [0, 2, 5, 6], "test": [0, 2, 3, 5, 6], "set": [0, 2, 3, 5, 6], "run_m2met_2023_inf": 0, "infer": 0, "new": [0, 2, 3, 6], "multi": [0, 3, 6], "channel": [0, 3], "parti": [0, 3, 6], "meet": [0, 2, 3, 6], "transcript": [0, 2, 3, 5, 6], "2": [0, 2, 6], "0": [0, 1, 2, 3], "m2met2": [0, 1, 3], "challeng": [0, 1, 3, 5, 6], "befor": 0, "must": [0, 3, 5, 6], "manual": [0, 6], "download": [0, 2], "unpack": 0, "alimeet": [0, 1, 6], "corpu": [0, 6], "place": [0, 2], "dataset": [0, 3, 5, 6, 7], "directori": 0, "eval_ali_far": 0, "eval_ali_near": 0, "test_ali_far": 0, "test_ali_near": 0, "train_ali_far": 0, "train_ali_near": 0, "test_2023_ali_far": 0, "after": 0, "which": [0, 2, 3, 6], "contain": [0, 2, 6], "onli": [0, 2, 5, 6], "raw": 0, "audio": [0, 2, 3, 6], "Then": 0, "put": 0, "given": 0, "wav": 0, "scp": 0, "wav_raw": 0, "segment": [0, 2, 6], "utt2spk": 0, "spk2utt": 0, "data": [0, 3, 5, 6], "For": [0, 2], "more": [0, 2], "detail": [0, 3, 6], "can": [0, 2, 3, 5, 6], "see": 0, "here": 0, "system": [0, 3, 5, 6, 7], "tabl": [0, 2], "adopt": 0, "oracl": [0, 6], "dure": [0, 2, 6], "howev": [0, 3, 6], "due": [0, 3], "lack": 0, "label": [0, 5, 6], "provid": [0, 2, 6, 7], "addit": [0, 6], "spectral": 0, "cluster": 0, "meanwhil": 0, "show": 0, "impact": 0, "accuraci": [0, 6], "If": [1, 5, 6], "have": [1, 3], "ani": [1, 5, 6], "question": 1, "about": [1, 3], "pleas": 1, "u": [1, 2], "email": [1, 3, 4], "m2met": [1, 3, 6, 7], 
"gmail": 1, "com": [1, 4], "wechat": [1, 3], "group": [1, 2, 3], "In": [2, 3, 5], "fix": [2, 3, 7], "condit": [2, 3, 7], "restrict": 2, "three": [2, 3, 6], "publicli": [2, 6], "avail": [2, 6], "corpora": 2, "name": 2, "aishel": [2, 4, 6], "4": [2, 6], "cn": [2, 4, 6], "celeb": [2, 6], "perform": [2, 3], "call": 2, "2023": [2, 3, 5, 6], "score": [2, 6], "rank": [2, 3, 6], "describ": 2, "118": 2, "75": 2, "hour": [2, 3, 6], "speech": [2, 3, 6, 7], "total": [2, 6], "divid": [2, 6], "104": 2, "10": [2, 3, 6], "specif": [2, 6], "212": 2, "8": 2, "20": [2, 3], "session": [2, 3, 6, 7], "respect": 2, "each": [2, 3, 6], "consist": [2, 6], "15": 2, "30": 2, "minut": 2, "discuss": 2, "particip": [2, 5, 6], "number": [2, 3, 6], "456": 2, "25": 2, "60": 2, "balanc": 2, "gender": 2, "coverag": 2, "collect": 2, "13": 2, "venu": 2, "categor": 2, "type": 2, "small": 2, "medium": 2, "larg": [2, 3], "room": [2, 3], "size": 2, "rang": 2, "m": 2, "55": 2, "differ": [2, 3, 6], "give": 2, "varieti": 2, "acoust": [2, 3, 6], "properti": 2, "layout": 2, "paramet": [2, 5], "togeth": 2, "wall": 2, "materi": 2, "cover": 2, "cement": 2, "glass": 2, "etc": 2, "other": 2, "furnish": 2, "includ": [2, 3, 5, 6], "sofa": 2, "tv": 2, "blackboard": 2, "fan": 2, "air": 2, "condition": 2, "plant": 2, "record": [2, 6], "sit": 2, "around": 2, "microphon": [2, 3], "arrai": [2, 3], "natur": 2, "convers": 2, "distanc": 2, "5": 2, "all": [2, 3, 5, 6], "nativ": 2, "chines": 2, "speak": [2, 3], "mandarin": [2, 3], "without": 2, "strong": 2, "accent": 2, "variou": [2, 3], "kind": 2, "indoor": 2, "nois": [2, 3, 5], "limit": [2, 3, 5], "click": 2, "keyboard": 2, "door": 2, "open": [2, 3, 7], "close": [2, 3], "bubbl": 2, "made": [2, 3], "both": [2, 6], "requir": [2, 3, 6], "remain": [2, 3], "same": [2, 5], "posit": 2, "overlap": [2, 3], "between": [2, 6], "exampl": 2, "fig": 2, "1": 2, "within": [2, 3], "one": [2, 5], "ensur": 2, "ratio": 2, "select": [2, 3, 5, 6], "topic": 2, "medic": 2, "treatment": 2, "educ": 2, 
"busi": 2, "organ": [2, 3, 5, 6, 7], "manag": 2, "industri": [2, 3], "product": 2, "daili": 2, "routin": 2, "averag": 2, "42": 2, "27": 2, "34": 2, "76": 2, "A": [2, 4], "distribut": 2, "were": 2, "ident": [2, 6], "compris": [2, 3, 7], "therebi": 2, "share": 2, "similar": 2, "configur": 2, "field": [2, 3, 6], "signal": [2, 3], "headset": 2, "": [2, 6], "own": 2, "transcrib": [2, 3, 6], "It": [2, 6], "worth": [2, 6], "note": [2, 6], "far": [2, 3], "synchron": 2, "common": 2, "prepar": 2, "textgrid": 2, "format": 2, "inform": [2, 3], "durat": 2, "id": 2, "timestamp": [2, 6], "mention": 2, "abov": 2, "openslr": 2, "via": 2, "follow": [2, 5], "link": 2, "particularli": 2, "baselin": [2, 3, 7], "conveni": 2, "automat": [3, 7], "recognit": [3, 7], "diariz": 3, "signific": 3, "stride": 3, "recent": 3, "year": 3, "result": 3, "surg": 3, "technologi": 3, "applic": 3, "across": 3, "domain": 3, "present": 3, "uniqu": [3, 6], "complex": [3, 5], "divers": 3, "style": 3, "variabl": 3, "confer": 3, "environment": 3, "reverber": [3, 5], "over": 3, "sever": 3, "been": 3, "advanc": [3, 7], "develop": [3, 6], "rich": 3, "comput": [3, 5], "hear": 3, "multisourc": 3, "environ": 3, "chime": 3, "latest": 3, "iter": 3, "ha": 3, "particular": 3, "focu": 3, "distant": 3, "gener": 3, "topologi": 3, "scenario": 3, "while": 3, "progress": 3, "english": 3, "languag": [3, 5], "barrier": 3, "achiev": 3, "compar": 3, "non": 3, "multimod": 3, "base": 3, "process": [3, 6], "misp": 3, "instrument": 3, "seek": 3, "address": 3, "problem": 3, "visual": 3, "everydai": 3, "home": 3, "focus": 3, "tackl": 3, "issu": 3, "offlin": 3, "icassp2022": 3, "main": 3, "task": [3, 6, 7], "former": 3, "involv": [3, 6], "identifi": 3, "who": 3, "spoke": 3, "when": 3, "latter": 3, "aim": 3, "multipl": [3, 6], "simultan": 3, "pose": [3, 6], "technic": 3, "difficulti": 3, "interfer": 3, "build": [3, 6, 7], "success": [3, 7], "previou": 3, "excit": 3, "propos": [3, 7], "asru": 3, "special": [3, 5, 7], "origin": [3, 5], 
"metric": [3, 7], "wa": [3, 6], "independ": 3, "meant": 3, "could": 3, "determin": 3, "correspond": [3, 5], "further": 3, "current": [3, 7], "talker": [3, 7], "toward": 3, "practic": 3, "attribut": [3, 7], "sub": [3, 5, 7], "track": [3, 5, 7], "what": 3, "facilit": [3, 7], "reproduc": [3, 7], "research": [3, 4, 7], "offer": 3, "comprehens": [3, 7], "overview": [3, 7], "rule": [3, 7], "furthermor": 3, "carefulli": 3, "curat": 3, "approxim": [3, 6], "design": 3, "enabl": 3, "valid": 3, "state": [3, 6, 7], "art": [3, 7], "area": 3, "april": 3, "29": 3, "registr": 3, "mai": 3, "11": 3, "22": 3, "deadlin": 3, "date": 3, "join": 3, "june": 3, "16": 3, "leaderboard": 3, "final": [3, 5, 6], "submiss": 3, "leaderboar": 3, "26": 3, "juli": 3, "paper": [3, 6], "decemb": 3, "12": 3, "workshop": 3, "interest": 3, "whether": 3, "academia": 3, "regist": 3, "complet": 3, "googl": 3, "form": 3, "below": 3, "welcom": 3, "keep": 3, "up": 3, "updat": 3, "work": 3, "dai": 3, "send": 3, "invit": 3, "elig": [3, 5], "team": 3, "qualifi": 3, "adher": [3, 5], "publish": 3, "page": 3, "prior": 3, "submit": 3, "descript": [3, 6], "document": 3, "approach": [3, 5], "method": 3, "top": 3, "asru2023": [3, 7], "proceed": 3, "lei": 4, "xie": 4, "professor": 4, "foundat": 4, "china": 4, "lxie": 4, "nwpu": 4, "edu": 4, "kong": 4, "aik": 4, "lee": 4, "senior": 4, "scientist": 4, "institut": 4, "infocomm": 4, "star": 4, "singapor": 4, "kongaik": 4, "ieee": 4, "org": 4, "zhiji": 4, "yan": 4, "princip": 4, "engin": 4, "alibaba": 4, "yzj": 4, "inc": 4, "shiliang": 4, "zhang": 4, "sly": 4, "zsl": 4, "yanmin": 4, "qian": 4, "shanghai": 4, "jiao": 4, "tong": 4, "univers": 4, "yanminqian": 4, "sjtu": 4, "zhuo": 4, "chen": 4, "appli": 4, "microsoft": 4, "usa": 4, "zhuc": 4, "jian": 4, "wu": 4, "wujian": 4, "hui": 4, "bu": 4, "ceo": 4, "buhui": 4, "aishelldata": 4, "should": 5, "augment": 5, "allow": [5, 6], "ad": 5, "speed": 5, "perturb": 5, "tone": 5, "chang": 5, "permit": 5, "purpos": 5, "instead": [5, 6], 
"util": [5, 6], "tune": 5, "violat": 5, "strictli": [5, 6], "prohibit": [5, 6], "fine": 5, "cpcer": [5, 6], "lower": 5, "judg": 5, "superior": 5, "forc": 5, "align": 5, "obtain": [5, 6], "frame": 5, "level": 5, "classif": 5, "basi": 5, "shallow": 5, "fusion": 5, "end": 5, "e": [5, 6], "g": 5, "la": 5, "rnnt": 5, "transform": [5, 6], "come": 5, "right": 5, "interpret": 5, "belong": 5, "case": 5, "circumst": 5, "coordin": 5, "assign": 6, "illustr": 6, "aishell4": 6, "constrain": 6, "sourc": 6, "addition": 6, "soon": 6, "simpl": 6, "voic": 6, "activ": 6, "detect": 6, "vad": 6, "concaten": 6, "minimum": 6, "permut": 6, "charact": 6, "error": 6, "rate": 6, "calcul": 6, "step": 6, "firstli": 6, "refer": 6, "hypothesi": 6, "chronolog": 6, "order": 6, "secondli": 6, "cer": 6, "repeat": 6, "possibl": 6, "lowest": 6, "tthe": 6, "insert": 6, "Ins": 6, "substitut": 6, "delet": 6, "del": 6, "output": 6, "text": 6, "frac": 6, "mathcal": 6, "n_": 6, "100": 6, "where": 6, "usag": 6, "third": 6, "hug": 6, "face": 6, "list": 6, "clearli": 6, "privat": 6, "simul": 6, "thei": 6, "mandatori": 6, "clear": 6, "scheme": 6, "delight": 7, "introduct": 7, "contact": 7}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"baselin": 0, "overview": [0, 2], "quick": 0, "start": 0, "result": 0, "contact": 1, "dataset": 2, "train": [2, 6], "data": 2, "detail": 2, "alimeet": 2, "corpu": 2, "get": 2, "introduct": 3, "call": 3, "particip": 3, "timelin": 3, "aoe": 3, "time": 3, "guidelin": 3, "organ": 4, "rule": 5, "track": 6, "evalu": 6, "speaker": 6, "attribut": 6, "asr": 6, "metric": 6, "sub": 6, "arrang": 6, "i": 6, "fix": 6, "condit": 6, "ii": 6, "open": 6, "asru": 7, "2023": 7, "multi": 7, "channel": 7, "parti": 7, "meet": 7, "transcript": 7, "challeng": 7, "2": 7, "0": 7, "m2met2": 7, "content": 7}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, 
"sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 57}, "alltitles": {"Baseline": [[0, "baseline"]], "Overview": [[0, "overview"]], "Quick start": [[0, "quick-start"]], "Baseline results": [[0, "baseline-results"]], "Contact": [[1, "contact"]], "Datasets": [[2, "datasets"]], "Overview of training data": [[2, "overview-of-training-data"]], "Detail of AliMeeting corpus": [[2, "detail-of-alimeeting-corpus"]], "Get the data": [[2, "get-the-data"]], "Introduction": [[3, "introduction"]], "Call for participation": [[3, "call-for-participation"]], "Timeline(AOE Time)": [[3, "timeline-aoe-time"]], "Guidelines": [[3, "guidelines"]], "Organizers": [[4, "organizers"]], "Rules": [[5, "rules"]], "Track & Evaluation": [[6, "track-evaluation"]], "Speaker-Attributed ASR": [[6, "speaker-attributed-asr"]], "Evaluation metric": [[6, "evaluation-metric"]], "Sub-track arrangement": [[6, "sub-track-arrangement"]], "Sub-track I (Fixed Training Condition):": [[6, "sub-track-i-fixed-training-condition"]], "Sub-track II (Open Training Condition):": [[6, "sub-track-ii-open-training-condition"]], "ASRU 2023 MULTI-CHANNEL MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE 2.0 (M2MeT2.0)": [[7, "asru-2023-multi-channel-multi-party-meeting-transcription-challenge-2-0-m2met2-0"]], "Contents:": [[7, null]]}, "indexentries": {}}) \ No newline at end of file diff --git a/docs/m2met2/images/baseline_result.png b/docs/m2met2/images/baseline_result.png index d51d7753c..6b7636192 100644 Binary files a/docs/m2met2/images/baseline_result.png and b/docs/m2met2/images/baseline_result.png differ diff --git a/docs/m2met2/images/qrcode.png b/docs/m2met2/images/qrcode.png index 54b2f55d2..fc4c3498c 100644 Binary files a/docs/m2met2/images/qrcode.png and b/docs/m2met2/images/qrcode.png differ diff --git a/docs/m2met2_cn/_build/doctrees/environment.pickle b/docs/m2met2_cn/_build/doctrees/environment.pickle index 178fe1817..8426df67c 100644 Binary files 
a/docs/m2met2_cn/_build/doctrees/environment.pickle and b/docs/m2met2_cn/_build/doctrees/environment.pickle differ diff --git a/docs/m2met2_cn/_build/doctrees/基线.doctree b/docs/m2met2_cn/_build/doctrees/基线.doctree index 92159865d..e9e895ce2 100644 Binary files a/docs/m2met2_cn/_build/doctrees/基线.doctree and b/docs/m2met2_cn/_build/doctrees/基线.doctree differ diff --git a/docs/m2met2_cn/_build/html/_images/baseline_result.png b/docs/m2met2_cn/_build/html/_images/baseline_result.png index d51d7753c..6b7636192 100644 Binary files a/docs/m2met2_cn/_build/html/_images/baseline_result.png and b/docs/m2met2_cn/_build/html/_images/baseline_result.png differ diff --git a/docs/m2met2_cn/_build/html/_images/qrcode.png b/docs/m2met2_cn/_build/html/_images/qrcode.png index 54b2f55d2..fc4c3498c 100644 Binary files a/docs/m2met2_cn/_build/html/_images/qrcode.png and b/docs/m2met2_cn/_build/html/_images/qrcode.png differ diff --git a/docs/m2met2_cn/_build/html/_sources/基线.md.txt b/docs/m2met2_cn/_build/html/_sources/基线.md.txt index fab780b0c..e8fc32c2c 100644 --- a/docs/m2met2_cn/_build/html/_sources/基线.md.txt +++ b/docs/m2met2_cn/_build/html/_sources/基线.md.txt @@ -29,4 +29,5 @@ data/Test_2023_Ali_far 更多基线系统详情见[此处](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs/alimeeting/sa-asr/README.md) ## 基线结果 基线系统的结果如表3所示。在训练期间,说话人档案采用了真实说话人嵌入。然而由于在评估过程中缺乏真实说话人标签,因此使用了由额外的谱聚类提供的说话人特征。同时我们还提供了在评估和测试集上使用真实说话人档案的结果,以显示说话人档案准确性的影响。 -![baseline result](images/baseline_result.png) \ No newline at end of file + +![baseline_result](images/baseline_result.png) \ No newline at end of file diff --git a/docs/m2met2_cn/_build/html/基线.html b/docs/m2met2_cn/_build/html/基线.html index 7bfec9309..f1afb2dd3 100644 --- a/docs/m2met2_cn/_build/html/基线.html +++ b/docs/m2met2_cn/_build/html/基线.html @@ -157,8 +157,8 @@

基线结果

-

基线系统的结果如表3所示。在训练期间,说话人档案采用了真实说话人嵌入。然而由于在评估过程中缺乏真实说话人标签,因此使用了由额外的谱聚类提供的说话人特征。同时我们还提供了在评估和测试集上使用真实说话人档案的结果,以显示说话人档案准确性的影响。 -baseline result

+

基线系统的结果如表3所示。在训练期间,说话人档案采用了真实说话人嵌入。然而由于在评估过程中缺乏真实说话人标签,因此使用了由额外的谱聚类提供的说话人特征。同时我们还提供了在评估和测试集上使用真实说话人档案的结果,以显示说话人档案准确性的影响。

+

baseline_result

diff --git a/docs/m2met2_cn/images/baseline_result.png b/docs/m2met2_cn/images/baseline_result.png index d51d7753c..6b7636192 100644 Binary files a/docs/m2met2_cn/images/baseline_result.png and b/docs/m2met2_cn/images/baseline_result.png differ diff --git a/docs/m2met2_cn/images/qrcode.png b/docs/m2met2_cn/images/qrcode.png index 54b2f55d2..fc4c3498c 100644 Binary files a/docs/m2met2_cn/images/qrcode.png and b/docs/m2met2_cn/images/qrcode.png differ diff --git a/docs/m2met2_cn/基线.md b/docs/m2met2_cn/基线.md index fab780b0c..e8fc32c2c 100644 --- a/docs/m2met2_cn/基线.md +++ b/docs/m2met2_cn/基线.md @@ -29,4 +29,5 @@ data/Test_2023_Ali_far 更多基线系统详情见[此处](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs/alimeeting/sa-asr/README.md) ## 基线结果 基线系统的结果如表3所示。在训练期间,说话人档案采用了真实说话人嵌入。然而由于在评估过程中缺乏真实说话人标签,因此使用了由额外的谱聚类提供的说话人特征。同时我们还提供了在评估和测试集上使用真实说话人档案的结果,以显示说话人档案准确性的影响。 -![baseline result](images/baseline_result.png) \ No newline at end of file + +![baseline_result](images/baseline_result.png) \ No newline at end of file diff --git a/egs/alimeeting/sa-asr/README.md b/egs/alimeeting/sa-asr/README.md index 882345c25..bc6d04c39 100644 --- a/egs/alimeeting/sa-asr/README.md +++ b/egs/alimeeting/sa-asr/README.md @@ -19,7 +19,7 @@ stage 6: Generate speaker profiles (Stage 6 takes a lot of time). stage 7 - 9: Language model training (Optional). stage 10 - 11: ASR training (SA-ASR requires loading the pre-trained ASR model). stage 12: SA-ASR training. -stage 13 - 18: Inference and evaluation. +stage 13 - 16: Inference and evaluation. ``` Before running `run_m2met_2023_infer.sh`, you need to place the new test set `Test_2023_Ali_far` (to be released after the challenge starts) in the `./dataset` directory, which contains only raw audios. Then put the given `wav.scp`, `wav_raw.scp`, `segments`, `utt2spk` and `spk2utt` in the `./data/Test_2023_Ali_far` directory. ```shell @@ -37,6 +37,10 @@ stage 2: Generate speaker profiles for inference. stage 3: Inference. 
stage 4: Generation of SA-ASR results required for final submission. ``` + +The baseline model is available on [ModelScope](https://www.modelscope.cn/models/damo/speech_saasr_asr-zh-cn-16k-alimeeting/summary). +After generating the statistics of the AliMeeting corpus (stage 10 in `run.sh`), you can set `infer_with_pretrained_model=true` in `run.sh` to run inference with our official baseline model released on ModelScope, without training. + # Format of Final Submission Finally, you need to submit a file called `text_spk_merge` with the following format: ```shell diff --git a/egs/alimeeting/sa-asr/asr_local.sh b/egs/alimeeting/sa-asr/asr_local.sh index f8cdcd3b6..543352efb 100755 --- a/egs/alimeeting/sa-asr/asr_local.sh +++ b/egs/alimeeting/sa-asr/asr_local.sh @@ -107,8 +107,8 @@ inference_asr_model=valid.acc.ave.pb # ASR model path for decoding. # inference_asr_model=valid.acc.best.pth # inference_asr_model=valid.loss.ave.pth inference_sa_asr_model=valid.acc_spk.ave.pb -download_model= # Download a model from Model Zoo and use it for decoding. - +infer_with_pretrained_model=false # Use a pretrained model for decoding. +download_sa_asr_model= # Download the SA-ASR model from ModelScope and use it for decoding. # [Task dependent] Set the datadir name created by local/data.sh train_set= # Name of training set. valid_set= # Name of validation set used for monitoring/tuning network training. @@ -203,7 +203,8 @@ Options: # Note that it will overwrite args in inference config. --inference_lm # Language model path for decoding (default="${inference_lm}"). --inference_asr_model # ASR model path for decoding (default="${inference_asr_model}"). - --download_model # Download a model from Model Zoo and use it for decoding (default="${download_model}"). + --infer_with_pretrained_model # Use a pretrained model for decoding (default="${infer_with_pretrained_model}"). + --download_sa_asr_model # Download the SA-ASR model from ModelScope and use it for decoding (default="${download_sa_asr_model}").
# [Task dependent] Set the datadir name created by local/data.sh --train_set # Name of training set (required). @@ -304,6 +305,9 @@ else lm_token_type="${token_type}" fi +if ${infer_with_pretrained_model}; then + skip_train=true +fi # Set tag for naming of model directory if [ -z "${asr_tag}" ]; then @@ -1220,119 +1224,20 @@ else log "Skip the training stages" fi +if ${infer_with_pretrained_model}; then + log "Use ${download_sa_asr_model} for decoding and evaluation" + + sa_asr_exp="${expdir}/${download_sa_asr_model}" + mkdir -p "${sa_asr_exp}" + + python local/download_pretrained_model_from_modelscope.py $download_sa_asr_model ${expdir} + inference_sa_asr_model="model.pb" + inference_config=${sa_asr_exp}/decoding.yaml +fi if ! "${skip_eval}"; then if [ ${stage} -le 13 ] && [ ${stop_stage} -ge 13 ]; then - log "Stage 13: Decoding multi-talker ASR: training_dir=${asr_exp}" - - if ${gpu_inference}; then - _cmd="${cuda_cmd}" - inference_nj=$[${ngpu}*${njob_infer}] - _ngpu=1 - - else - _cmd="${decode_cmd}" - inference_nj=$inference_nj - _ngpu=0 - fi - - _opts= - if [ -n "${inference_config}" ]; then - _opts+="--config ${inference_config} " - fi - if "${use_lm}"; then - if "${use_word_lm}"; then - _opts+="--word_lm_train_config ${lm_exp}/config.yaml " - _opts+="--word_lm_file ${lm_exp}/${inference_lm} " - else - _opts+="--lm_train_config ${lm_exp}/config.yaml " - _opts+="--lm_file ${lm_exp}/${inference_lm} " - fi - fi - - # 2. Generate run.sh - log "Generate '${asr_exp}/${inference_tag}/run.sh'. You can resume the process from stage 13 using this script" - mkdir -p "${asr_exp}/${inference_tag}"; echo "${run_args} --stage 13 \"\$@\"; exit \$?" 
> "${asr_exp}/${inference_tag}/run.sh"; chmod +x "${asr_exp}/${inference_tag}/run.sh" - - for dset in ${test_sets}; do - _data="${data_feats}/${dset}" - _dir="${asr_exp}/${inference_tag}/${dset}" - _logdir="${_dir}/logdir" - mkdir -p "${_logdir}" - - _feats_type="$(<${_data}/feats_type)" - if [ "${_feats_type}" = raw ]; then - _scp=wav.scp - if [[ "${audio_format}" == *ark* ]]; then - _type=kaldi_ark - else - _type=sound - fi - else - _scp=feats.scp - _type=kaldi_ark - fi - - # 1. Split the key file - key_file=${_data}/${_scp} - split_scps="" - _nj=$(min "${inference_nj}" "$(<${key_file} wc -l)") - echo $_nj - for n in $(seq "${_nj}"); do - split_scps+=" ${_logdir}/keys.${n}.scp" - done - # shellcheck disable=SC2086 - utils/split_scp.pl "${key_file}" ${split_scps} - - # 2. Submit decoding jobs - log "Decoding started... log: '${_logdir}/asr_inference.*.log'" - - ${_cmd} --gpu "${_ngpu}" --max-jobs-run "${_nj}" JOB=1:"${_nj}" "${_logdir}"/asr_inference.JOB.log \ - python -m funasr.bin.asr_inference_launch \ - --batch_size 1 \ - --mc True \ - --nbest 1 \ - --ngpu "${_ngpu}" \ - --njob ${njob_infer} \ - --gpuid_list ${device} \ - --data_path_and_name_and_type "${_data}/${_scp},speech,${_type}" \ - --key_file "${_logdir}"/keys.JOB.scp \ - --asr_train_config "${asr_exp}"/config.yaml \ - --asr_model_file "${asr_exp}"/"${inference_asr_model}" \ - --output_dir "${_logdir}"/output.JOB \ - --mode asr \ - ${_opts} - - # 3. 
Concatenates the output files from each jobs - for f in token token_int score text; do - for i in $(seq "${_nj}"); do - cat "${_logdir}/output.${i}/1best_recog/${f}" - done | LC_ALL=C sort -k1 >"${_dir}/${f}" - done - done - fi - - - if [ ${stage} -le 14 ] && [ ${stop_stage} -ge 14 ]; then - log "Stage 14: Scoring multi-talker ASR" - - for dset in ${test_sets}; do - _data="${data_feats}/${dset}" - _dir="${asr_exp}/${inference_tag}/${dset}" - - python utils/proce_text.py ${_data}/text ${_data}/text.proc - python utils/proce_text.py ${_dir}/text ${_dir}/text.proc - - python utils/compute_wer.py ${_data}/text.proc ${_dir}/text.proc ${_dir}/text.cer - tail -n 3 ${_dir}/text.cer > ${_dir}/text.cer.txt - cat ${_dir}/text.cer.txt - - done - - fi - - if [ ${stage} -le 15 ] && [ ${stop_stage} -ge 15 ]; then - log "Stage 15: Decoding SA-ASR (oracle profile): training_dir=${sa_asr_exp}" + log "Stage 13: Decoding SA-ASR (oracle profile): training_dir=${sa_asr_exp}" if ${gpu_inference}; then _cmd="${cuda_cmd}" @@ -1423,8 +1328,8 @@ if ! "${skip_eval}"; then done fi - if [ ${stage} -le 16 ] && [ ${stop_stage} -ge 16 ]; then - log "Stage 16: Scoring SA-ASR (oracle profile)" + if [ ${stage} -le 14 ] && [ ${stop_stage} -ge 14 ]; then + log "Stage 14: Scoring SA-ASR (oracle profile)" for dset in ${test_sets}; do _data="${data_feats}/${dset}" @@ -1448,8 +1353,8 @@ if ! "${skip_eval}"; then fi - if [ ${stage} -le 17 ] && [ ${stop_stage} -ge 17 ]; then - log "Stage 17: Decoding SA-ASR (cluster profile): training_dir=${sa_asr_exp}" + if [ ${stage} -le 15 ] && [ ${stop_stage} -ge 15 ]; then + log "Stage 15: Decoding SA-ASR (cluster profile): training_dir=${sa_asr_exp}" if ${gpu_inference}; then _cmd="${cuda_cmd}" @@ -1539,8 +1444,8 @@ if ! 
"${skip_eval}"; then done fi - if [ ${stage} -le 18 ] && [ ${stop_stage} -ge 18 ]; then - log "Stage 18: Scoring SA-ASR (cluster profile)" + if [ ${stage} -le 16 ] && [ ${stop_stage} -ge 16 ]; then + log "Stage 16: Scoring SA-ASR (cluster profile)" for dset in ${test_sets}; do _data="${data_feats}/${dset}" diff --git a/egs/alimeeting/sa-asr/local/download_pretrained_model_from_modelscope.py b/egs/alimeeting/sa-asr/local/download_pretrained_model_from_modelscope.py new file mode 100644 index 000000000..b4b54127a --- /dev/null +++ b/egs/alimeeting/sa-asr/local/download_pretrained_model_from_modelscope.py @@ -0,0 +1,7 @@ +from modelscope.hub.snapshot_download import snapshot_download +import sys + +if __name__ == "__main__": + model_tag = sys.argv[1] + local_model_dir = sys.argv[2] + model_dir = snapshot_download(model_tag, cache_dir=local_model_dir, revision='1.0.0') \ No newline at end of file diff --git a/egs/alimeeting/sa-asr/run.sh b/egs/alimeeting/sa-asr/run.sh index e5297b8b3..2869164a4 100755 --- a/egs/alimeeting/sa-asr/run.sh +++ b/egs/alimeeting/sa-asr/run.sh @@ -8,8 +8,8 @@ set -o pipefail ngpu=4 device="0,1,2,3" -stage=1 -stop_stage=18 +stage=12 +stop_stage=13 train_set=Train_Ali_far @@ -18,6 +18,8 @@ test_sets="Test_Ali_far" asr_config=conf/train_asr_conformer.yaml sa_asr_config=conf/train_sa_asr_conformer.yaml inference_config=conf/decode_asr_rnn.yaml +infer_with_pretrained_model=true +download_sa_asr_model="damo/speech_saasr_asr-zh-cn-16k-alimeeting" lm_config=conf/train_lm_transformer.yaml use_lm=false @@ -29,6 +31,8 @@ use_wordlm=false --stop_stage ${stop_stage} \ --gpu_inference true \ --njob_infer 4 \ + --infer_with_pretrained_model ${infer_with_pretrained_model} \ + --download_sa_asr_model $download_sa_asr_model \ --asr_exp exp/asr_train_multispeaker_conformer_raw_zh_char_data_alimeeting \ --sa_asr_exp exp/sa_asr_train_conformer_raw_zh_char_data_alimeeting \ --asr_stats_dir exp/asr_stats_multispeaker_conformer_raw_zh_char_data_alimeeting \
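The `infer_with_pretrained_model` branch in `asr_local.sh` assumes that, after the download, `${expdir}/${download_sa_asr_model}` contains `model.pb` and `decoding.yaml`. Whether that holds depends on how the installed ModelScope version lays out its `cache_dir`, so a post-download check may be worth considering. The following is a minimal sketch only: the helper names `pretrained_exp_dir` and `missing_files` are hypothetical and not part of this change, and the sketch deliberately makes no ModelScope calls.

```python
from pathlib import Path
from typing import List

# File names taken from the asr_local.sh hunk: after the download the script
# sets inference_sa_asr_model="model.pb" and
# inference_config=${sa_asr_exp}/decoding.yaml.
EXPECTED_FILES = ("model.pb", "decoding.yaml")


def pretrained_exp_dir(expdir: str, model_tag: str) -> Path:
    """Mirror sa_asr_exp="${expdir}/${download_sa_asr_model}" from asr_local.sh."""
    return Path(expdir) / model_tag


def missing_files(expdir: str, model_tag: str) -> List[str]:
    """Return the expected files that are absent from the experiment directory."""
    exp = pretrained_exp_dir(expdir, model_tag)
    return [name for name in EXPECTED_FILES if not (exp / name).is_file()]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_pretrained.py <model_tag> <expdir>
    if len(sys.argv) != 3:
        sys.exit("usage: check_pretrained.py <model_tag> <expdir>")
    absent = missing_files(sys.argv[2], sys.argv[1])
    if absent:
        sys.exit("missing files in pretrained model dir: " + ", ".join(absent))
    print("pretrained model directory looks complete")
```

If such a check were run right after `local/download_pretrained_model_from_modelscope.py`, a layout mismatch would surface before stage 13 starts decoding rather than inside the inference job logs.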