Checking the similarity between two words with NLTK and Python

Question

Asked by Punuth

I have two lists, and I want to check the similarity between each pair of words across the two lists and find the maximum similarity. Here is my code:

from nltk.corpus import wordnet

list1 = ['Compare', 'require']
list2 = ['choose', 'copy', 'define', 'duplicate', 'find', 'how', 'identify', 'label', 'list', 'listen', 'locate', 'match', 'memorise', 'name', 'observe', 'omit', 'quote', 'read', 'recall', 'recite', 'recognise', 'record', 'relate', 'remember', 'repeat', 'reproduce', 'retell', 'select', 'show', 'spell', 'state', 'tell', 'trace', 'write']
list = []

for word1 in list1:
    for word2 in list2:
        wordFromList1 = wordnet.synsets(word1)[0]
        wordFromList2 = wordnet.synsets(word2)[0]
        s = wordFromList1.wup_similarity(wordFromList2)
        list.append(s)

print(max(list))

But this results in an error:

wordFromList2 = wordnet.synsets(word2)[0]
        IndexError: list index out of range

Please help me fix this.
Thank you.

Answer 1

Answered by omerbp

Try checking whether the lists returned by wordnet.synsets() are empty before you use them:

from nltk.corpus import wordnet

list1 = ['Compare', 'require']
list2 = ['choose', 'copy', 'define', 'duplicate', 'find', 'how', 'identify', 'label', 'list', 'listen', 'locate', 'match', 'memorise', 'name', 'observe', 'omit', 'quote', 'read', 'recall', 'recite', 'recognise', 'record', 'relate', 'remember', 'repeat', 'reproduce', 'retell', 'select', 'show', 'spell', 'state', 'tell', 'trace', 'write']
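similarities = []  # named to avoid shadowing the built-in `list`

for word1 in list1:
    for word2 in list2:
        wordFromList1 = wordnet.synsets(word1)
        wordFromList2 = wordnet.synsets(word2)
        if wordFromList1 and wordFromList2:  # Thanks to @alexis' note
            s = wordFromList1[0].wup_similarity(wordFromList2[0])
            if s is not None:  # wup_similarity can return None when the senses are not connected
                similarities.append(s)

print(max(similarities))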
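
As a possible variation on the check above, the sketch below records which words have no synsets instead of silently skipping them, and keeps the word pair next to each score. The names first_sense_similarities, lookup and score are hypothetical and not part of NLTK; the wrapper at the bottom assumes NLTK and its WordNet corpus are installed.

```python
def first_sense_similarities(words1, words2, lookup, score):
    """Compare the first sense of every word pair.

    lookup(word) -> list of senses (may be empty)
    score(a, b)  -> float, or None when the senses cannot be compared
    Returns (scores, missing): scores is a list of (score, word1, word2),
    missing is the set of words that were seen to have no senses.
    """
    scores = []
    missing = set()
    for w1 in words1:
        senses1 = lookup(w1)
        if not senses1:
            missing.add(w1)
            continue
        for w2 in words2:
            senses2 = lookup(w2)
            if not senses2:
                missing.add(w2)
                continue
            s = score(senses1[0], senses2[0])
            if s is not None:
                scores.append((s, w1, w2))
    return scores, missing


def wordnet_first_sense_similarities(words1, words2):
    # Imported lazily so the generic helper above works without NLTK.
    from nltk.corpus import wordnet
    return first_sense_similarities(
        words1, words2,
        lookup=wordnet.synsets,
        score=lambda a, b: a.wup_similarity(b),
    )
```

Calling wordnet_first_sense_similarities(list1, list2) and then max() on the returned scores (when the list is not empty) should give the same maximum as the loop above, while the second return value shows which words WordNet does not know.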

Answer 2

Answered by alexis

You get an error when a synset list is empty and you try to get the element at the (non-existent) index zero. But why check only the zeroth element? If you want to check everything, try all pairs of elements in the returned synsets. You can use itertools.product() to save yourself two for-loops:

from itertools import product
from nltk.corpus import wordnet

sims = []

# list1 and list2 are the word lists from the question
for word1, word2 in product(list1, list2):
    syns1 = wordnet.synsets(word1)
    syns2 = wordnet.synsets(word2)
    for sense1, sense2 in product(syns1, syns2):
        d = wordnet.wup_similarity(sense1, sense2)
        sims.append((d, sense1, sense2))  # (score, sense of word1, sense of word2)

This is inefficient because the same synsets are looked up again and again, but it is the closest to the logic of your code. If you have enough data to make speed an issue, you can speed it up by collecting the synsets for all words in list1 and list2 once, and taking the product of the synsets.

>>> allsyns1 = set(ss for word in list1 for ss in wordnet.synsets(word))
>>> allsyns2 = set(ss for word in list2 for ss in wordnet.synsets(word))
>>> best = max((wordnet.wup_similarity(s1, s2) or 0, s1, s2) for s1, s2 in
...         product(allsyns1, allsyns2))
>>> print(best)
(0.9411764705882353, Synset('command.v.02'), Synset('order.v.01'))
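
The result above names the two best-matching synsets but not the words they came from. A minimal sketch of one way to keep that link, assuming it is acceptable to look up each distinct word once and carry the word alongside each sense, follows. The names best_sense_pair, lookup and score are hypothetical and not part of NLTK; the wrapper assumes NLTK and its WordNet corpus are installed.

```python
from itertools import product


def best_sense_pair(words1, words2, lookup, score):
    """Return (score, word1, word2, sense1, sense2) for the best-scoring
    pair of senses, or None if no pair could be scored.

    lookup(word) -> list of senses (may be empty)
    score(a, b)  -> float, or None when the senses cannot be compared
    """
    # Look up every distinct word once; dict.fromkeys drops repeats
    # while preserving the original order.
    senses1 = [(w, s) for w in dict.fromkeys(words1) for s in lookup(w)]
    senses2 = [(w, s) for w in dict.fromkeys(words2) for s in lookup(w)]

    best = None
    for (w1, s1), (w2, s2) in product(senses1, senses2):
        sim = score(s1, s2)
        if sim is None:
            continue
        # Compare scores only, so the sense objects never need to be ordered.
        if best is None or sim > best[0]:
            best = (sim, w1, w2, s1, s2)
    return best


def wordnet_best_sense_pair(words1, words2):
    # Imported lazily so the generic helper above works without NLTK.
    from nltk.corpus import wordnet
    return best_sense_pair(
        words1, words2,
        lookup=wordnet.synsets,
        score=lambda a, b: a.wup_similarity(b),
    )
```

On ties this keeps the first pair encountered, which may differ from the pair that max() over tuples would pick.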
