From b1c0815c7959fed6e9e1840799c0a7eae8db095c Mon Sep 17 00:00:00 2001
From: Brett Balquist <113616657+brett-b112@users.noreply.github.com>
Date: Fri, 5 May 2023 01:47:45 -0500
Subject: [PATCH] Updated README.md to provide more insight on BLEU and
 specific appendices (#1236)

* Updated README.md to provide more insight on BLEU and specific appendices in the research paper

* Update README.md

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index b4d3998..2053257 100644
--- a/README.md
+++ b/README.md
@@ -70,7 +70,7 @@ There are five model sizes, four with English-only versions, offering speed and
 
 The `.en` models for English-only applications tend to perform better, especially for the `tiny.en` and `base.en` models. We observed that the difference becomes less significant for the `small.en` and `medium.en` models.
 
-Whisper's performance varies widely depending on the language. The figure below shows a WER (Word Error Rate) breakdown by languages of the Fleurs dataset using the `large-v2` model. More WER and BLEU scores corresponding to the other models and datasets can be found in Appendix D in [the paper](https://arxiv.org/abs/2212.04356). The smaller, the better.
+Whisper's performance varies widely depending on the language. The figure below shows a WER (Word Error Rate) breakdown by language on the Fleurs dataset using the `large-v2` model (the smaller the number, the better the performance). Additional WER scores for the other models and datasets can be found in Appendices D.1, D.2, and D.4 of [the paper](https://arxiv.org/abs/2212.04356), and BLEU (Bilingual Evaluation Understudy) scores can be found in Appendix D.3.
 
 ![WER breakdown by language](https://raw.githubusercontent.com/openai/whisper/main/language-breakdown.svg)