Calculate BLEU score in Python

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow.

Original question: http://stackoverflow.com/questions/32395880/
Asked by Alapan Kuila
There is a test sentence and a reference sentence. How can I write a Python script that measures the similarity between these two sentences in the form of the BLEU metric used in automatic machine translation evaluation?
Answered by Semih Yagcioglu
You are actually asking for two different things. I will try to shed light on each of the questions.
Part I: Computing the BLEU score
You can calculate the BLEU score using the BLEU module under nltk; see here.
From there you can easily compute the alignment score between the candidate and reference sentences.
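A minimal sketch of this (assuming NLTK is installed and both sentences are already tokenized):

```python
from nltk.translate.bleu_score import sentence_bleu

# The first argument is a list of references, so even a single
# reference must be wrapped in an outer list.
reference = [['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']]
candidate = ['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']

# Default weights use n-gram orders 1 through 4.
score = sentence_bleu(reference, candidate)
print(score)
```

The score is a float between 0 and 1; here it is below 1 because the candidate substitutes "jumped" for "jumps", which removes several matching higher-order n-grams.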
Part II: Computing the similarity
I would not suggest using the BLEU score as a similarity measure between the first candidate and the second candidate if you aim to measure similarity based on the reference sentence.
Now, let me elaborate on this. If you calculate a BLEU score for each candidate against a reference, those scores are only comparable to each other with respect to that same reference; they do not directly measure how similar the two candidates are to each other.
If you intend to measure the similarity between two sentences, word2vec would be a better method. You can compute the angular cosine distance between the two sentence vectors to understand their similarity.
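A rough sketch of this idea, using made-up toy word vectors in place of real pretrained word2vec embeddings (which you would normally load from a trained model), and averaging word vectors to get a sentence vector:

```python
import numpy as np

# Toy vectors for illustration only; in practice, load pretrained
# embeddings (e.g. word2vec or GloVe) instead.
vectors = {
    "open": np.array([0.9, 0.1, 0.0]),
    "the":  np.array([0.1, 0.8, 0.1]),
    "a":    np.array([0.2, 0.7, 0.1]),
    "file": np.array([0.0, 0.2, 0.9]),
}

def sentence_vector(tokens):
    # Average the vectors of the tokens we have embeddings for.
    vecs = [vectors[t] for t in tokens if t in vectors]
    return np.mean(vecs, axis=0)

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

s1 = sentence_vector(["open", "the", "file"])
s2 = sentence_vector(["open", "a", "file"])
print(cosine_similarity(s1, s2))
```

Identical sentences yield a similarity of 1.0; unrelated sentences drift toward 0.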
For a thorough understanding of what the BLEU metric does, I'd suggest reading this, as well as this for word2vec similarity.
Answered by ccy
The BLEU score consists of two parts, modified precision and brevity penalty. Details can be seen in the paper. You can use the nltk.translate.bleu_score module inside NLTK (in older NLTK versions it lived under nltk.align.bleu_score). One code example is shown below:
import nltk
hypothesis = ['It', 'is', 'a', 'cat', 'at', 'room']
reference = ['It', 'is', 'a', 'cat', 'inside', 'the', 'room']
#there may be several references
BLEUscore = nltk.translate.bleu_score.sentence_bleu([reference], hypothesis)
print(BLEUscore)
Note that the default BLEU score uses n=4, which includes unigrams up to 4-grams. If your sentence has fewer than 4 tokens, you need to reset the weights, otherwise a ZeroDivisionError: Fraction(0, 0) error will be returned. So you should reset the weights like this:
import nltk
hypothesis = ["open", "the", "file"]
reference = ["open", "file"]
# the maximum n-gram order is bigram, so split the weight into two halves
BLEUscore = nltk.translate.bleu_score.sentence_bleu([reference], hypothesis, weights = (0.5, 0.5))
print(BLEUscore)
Answered by Franck Dernoncourt
You may want to use the Python package SacréBLEU (Python 3 only):
SacréBLEU provides hassle-free computation of shareable, comparable, and reproducible BLEU scores. Inspired by Rico Sennrich's multi-bleu-detok.perl, it produces the official WMT scores but works with plain text. It also knows all the standard test sets and handles downloading, processing, and tokenization for you.

Why use this version of BLEU?

- It automatically downloads common WMT test sets and processes them to plain text
- It produces a short version string that facilitates cross-paper comparisons
- It properly computes scores on detokenized outputs, using WMT (Conference on Machine Translation) standard tokenization
- It produces the same values as the official script (mteval-v13a.pl) used by WMT
- It outputs the BLEU score without the comma, so you don't have to remove it with sed (looking at you, multi-bleu.perl)
To install: pip install sacrebleu
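A rough sketch of command-line usage (the file names here are placeholders): sacrebleu takes the reference file as a positional argument and reads the system output from stdin:

```shell
# Score a hypothesis file against a reference file of plain, detokenized text.
cat hypotheses.txt | sacrebleu references.txt
```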
Answered by Ameet Deshpande
The following is code for calculating the BLEU score between two files:
from nltk.translate.bleu_score import sentence_bleu
import argparse

def argparser():
    Argparser = argparse.ArgumentParser()
    Argparser.add_argument('--reference', type=str, default='summaries.txt', help='Reference File')
    Argparser.add_argument('--candidate', type=str, default='candidates.txt', help='Candidate file')
    args = Argparser.parse_args()
    return args

args = argparser()

reference = open(args.reference, 'r').readlines()
candidate = open(args.candidate, 'r').readlines()

if len(reference) != len(candidate):
    raise ValueError('The number of sentences in both files do not match.')

score = 0.
for i in range(len(reference)):
    score += sentence_bleu([reference[i].strip().split()], candidate[i].strip().split())

score /= len(reference)
print("The bleu score is: " + str(score))
Use the command python file_name.py --reference file1.txt --candidate file2.txt
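Note that averaging per-sentence scores, as the script above does, differs from standard corpus-level BLEU. NLTK also provides corpus_bleu, which aggregates n-gram counts over the whole corpus before computing precisions. A minimal sketch with made-up token lists standing in for the two files' contents:

```python
from nltk.translate.bleu_score import corpus_bleu

# One list of reference(s) per candidate sentence.
references = [
    [['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']],
    [['it', 'is', 'a', 'cat', 'inside', 'the', 'room']],
]
candidates = [
    ['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog'],
    ['it', 'is', 'a', 'cat', 'at', 'room'],
]

# corpus_bleu sums n-gram counts across all sentence pairs before taking
# precisions, which matches the standard BLEU definition.
score = corpus_bleu(references, candidates)
print(score)
```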
Answered by Aryan Singh
I can show some examples of how to calculate BLEU score if test and reference sentences are known.
You can even take both sentences as input in the form of a string and convert to lists.
from nltk.translate.bleu_score import sentence_bleu
reference = [['the', 'cat', 'is', 'sitting', 'on', 'the', 'mat']]
test = ['on', 'the', 'mat', 'is', 'a', 'cat']
score = sentence_bleu(reference, test)
print(score)
from nltk.translate.bleu_score import sentence_bleu
reference = [['the', 'cat', 'is', 'sitting', 'on', 'the', 'mat']]
test = ['there', 'is', 'cat', 'sitting', 'cat']
score = sentence_bleu(reference, test)
print(score)