Calculate BLEU score in Python

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow.

Original question: http://stackoverflow.com/questions/32395880/
Asked by Alapan Kuila
There is a test sentence and a reference sentence. How can I write a Python script that measures the similarity between these two sentences in the form of the BLEU metric used in automatic machine translation evaluation?
Answered by Semih Yagcioglu
You are actually asking for two different things. I will try to shed light on each of the questions.
Part I: Computing the BLEU score
You can calculate the BLEU score using the BLEU module under nltk; see here.
From there you can easily compute the alignment score between the candidate and reference sentences.
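A minimal sketch of this (assuming NLTK is installed and both sentences are already tokenized):

```python
from nltk.translate.bleu_score import sentence_bleu

# The first argument is a list of references, so even a single
# reference must be wrapped in an outer list.
reference = [['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']]
candidate = ['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']

# Default weights use n-gram orders 1 through 4.
score = sentence_bleu(reference, candidate)
print(score)
```

The score is a float between 0 and 1; here it is below 1 because the candidate substitutes "jumped" for "jumps", which removes several matching higher-order n-grams.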
Part II: Computing the similarity
I would not suggest using the BLEU score as a similarity measure between the first candidate and the second candidate if you aim to measure similarity based on the reference sentence.
Now, let me elaborate on this. If you calculate a BLEU score for each candidate against a reference, those scores are only comparable to each other with respect to that same reference; they do not directly measure how similar the two candidates are to each other.
If you intend to measure the similarity between two sentences, word2vec would be a better method. You can compute the angular cosine distance between the two sentence vectors to understand their similarity.
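A rough sketch of this idea, using made-up toy word vectors in place of real pretrained word2vec embeddings (which you would normally load from a trained model), and averaging word vectors to get a sentence vector:

```python
import numpy as np

# Toy vectors for illustration only; in practice, load pretrained
# embeddings (e.g. word2vec or GloVe) instead.
vectors = {
    "open": np.array([0.9, 0.1, 0.0]),
    "the":  np.array([0.1, 0.8, 0.1]),
    "a":    np.array([0.2, 0.7, 0.1]),
    "file": np.array([0.0, 0.2, 0.9]),
}

def sentence_vector(tokens):
    # Average the vectors of the tokens we have embeddings for.
    vecs = [vectors[t] for t in tokens if t in vectors]
    return np.mean(vecs, axis=0)

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

s1 = sentence_vector(["open", "the", "file"])
s2 = sentence_vector(["open", "a", "file"])
print(cosine_similarity(s1, s2))
```

Identical sentences yield a similarity of 1.0; unrelated sentences drift toward 0.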
For a thorough understanding of what the BLEU metric does, I'd suggest reading this, as well as this for word2vec similarity.
Answered by ccy
The BLEU score consists of two parts, modified precision and brevity penalty. Details can be seen in the paper. You can use the nltk.translate.bleu_score module inside NLTK (in older NLTK versions it lived under nltk.align.bleu_score). One code example is shown below:
import nltk
hypothesis = ['It', 'is', 'a', 'cat', 'at', 'room']
reference = ['It', 'is', 'a', 'cat', 'inside', 'the', 'room']
#there may be several references
BLEUscore = nltk.translate.bleu_score.sentence_bleu([reference], hypothesis)
print(BLEUscore)
Note that the default BLEU score uses n=4, which includes unigrams up to 4-grams. If your sentence has fewer than 4 tokens, you need to reset the weights, otherwise a ZeroDivisionError: Fraction(0, 0) error will be returned. So you should reset the weights like this:
import nltk
hypothesis = ["open", "the", "file"]
reference = ["open", "file"]
# the maximum n-gram order is bigram, so split the weight into two halves
BLEUscore = nltk.translate.bleu_score.sentence_bleu([reference], hypothesis, weights = (0.5, 0.5))
print(BLEUscore)
Answered by Franck Dernoncourt
You may want to use the Python package SacréBLEU (Python 3 only):
SacréBLEU provides hassle-free computation of shareable, comparable, and reproducible BLEU scores. Inspired by Rico Sennrich's multi-bleu-detok.perl, it produces the official WMT scores but works with plain text. It also knows all the standard test sets and handles downloading, processing, and tokenization for you.

Why use this version of BLEU?

- It automatically downloads common WMT test sets and processes them to plain text
- It produces a short version string that facilitates cross-paper comparisons
- It properly computes scores on detokenized outputs, using WMT (Conference on Machine Translation) standard tokenization
- It produces the same values as the official script (mteval-v13a.pl) used by WMT
- It outputs the BLEU score without the comma, so you don't have to remove it with sed (looking at you, multi-bleu.perl)
To install: pip install sacrebleu
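A rough sketch of command-line usage (the file names here are placeholders): sacrebleu takes the reference file as a positional argument and reads the system output from stdin:

```shell
# Score a hypothesis file against a reference file of plain, detokenized text.
cat hypotheses.txt | sacrebleu references.txt
```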
Answered by Ameet Deshpande
The following is code for calculating the BLEU score between two files:
from nltk.translate.bleu_score import sentence_bleu
import argparse

def argparser():
    Argparser = argparse.ArgumentParser()
    Argparser.add_argument('--reference', type=str, default='summaries.txt', help='Reference File')
    Argparser.add_argument('--candidate', type=str, default='candidates.txt', help='Candidate file')
    args = Argparser.parse_args()
    return args

args = argparser()

reference = open(args.reference, 'r').readlines()
candidate = open(args.candidate, 'r').readlines()

if len(reference) != len(candidate):
    raise ValueError('The number of sentences in both files do not match.')

score = 0.
for i in range(len(reference)):
    score += sentence_bleu([reference[i].strip().split()], candidate[i].strip().split())

score /= len(reference)
print("The bleu score is: " + str(score))
Use the command python file_name.py --reference file1.txt --candidate file2.txt
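Note that averaging per-sentence scores, as the script above does, differs from standard corpus-level BLEU. NLTK also provides corpus_bleu, which aggregates n-gram counts over the whole corpus before computing precisions. A minimal sketch with made-up token lists standing in for the two files' contents:

```python
from nltk.translate.bleu_score import corpus_bleu

# One list of reference(s) per candidate sentence.
references = [
    [['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']],
    [['it', 'is', 'a', 'cat', 'inside', 'the', 'room']],
]
candidates = [
    ['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog'],
    ['it', 'is', 'a', 'cat', 'at', 'room'],
]

# corpus_bleu sums n-gram counts across all sentence pairs before taking
# precisions, which matches the standard BLEU definition.
score = corpus_bleu(references, candidates)
print(score)
```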
Answered by Aryan Singh
I can show some examples of how to calculate BLEU score if test and reference sentences are known.
You can even take both sentences as input in the form of a string and convert to lists.
from nltk.translate.bleu_score import sentence_bleu
reference = [['the', 'cat', 'is', 'sitting', 'on', 'the', 'mat']]
test = ['on', 'the', 'mat', 'is', 'a', 'cat']
score = sentence_bleu(reference, test)
print(score)
from nltk.translate.bleu_score import sentence_bleu
reference = [['the', 'cat', 'is', 'sitting', 'on', 'the', 'mat']]
test = ['there', 'is', 'cat', 'sitting', 'cat']
score = sentence_bleu(reference, test)
print(score)