Python: Finding proper nouns using NLTK WordNet

Question

Asked by Backue

Is there any way to find proper nouns using NLTK WordNet? I.e., can I tag possessive nouns using NLTK WordNet?

Answer 1

Accepted answer by alvas

I don't think you need WordNet to find proper nouns; I suggest using the part-of-speech tagger pos_tag instead.

To find proper nouns, look for the NNP tag:

from nltk.tag import pos_tag

sentence = "Michael Hymanson likes to eat at McDonalds"
tagged_sent = pos_tag(sentence.split())
# [('Michael', 'NNP'), ('Hymanson', 'NNP'), ('likes', 'VBZ'), ('to', 'TO'), ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP')]

propernouns = [word for word,pos in tagged_sent if pos == 'NNP']
# ['Michael','Hymanson', 'McDonalds']

You may not be fully satisfied, since Michael and Hymanson are split into 2 tokens; in that case you might need something more complex, such as a named entity tagger.
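
If all you need is to glue adjacent proper-noun tokens back together, a full named entity tagger may be more than necessary. The following is a minimal sketch, not part of NLTK: merge_proper_nouns is a hypothetical helper, and the tagged list is copied from the example above rather than produced by a live pos_tag call. It also treats NNPS (plural proper noun) as a proper-noun tag, which is an assumption beyond the original NNP-only filter.

```python
from itertools import groupby

# Tagged output copied from the example above (normally produced by pos_tag).
tagged_sent = [('Michael', 'NNP'), ('Hymanson', 'NNP'), ('likes', 'VBZ'),
               ('to', 'TO'), ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP')]

PROPER_TAGS = ('NNP', 'NNPS')  # assumption: plural proper nouns count too


def merge_proper_nouns(tagged):
    """Join each run of adjacent proper-noun tokens into one string (hypothetical helper)."""
    names = []
    # groupby splits the sentence into alternating runs of proper / non-proper tokens.
    for is_proper, group in groupby(tagged, key=lambda pair: pair[1] in PROPER_TAGS):
        if is_proper:
            names.append(' '.join(word for word, _ in group))
    return names


print(merge_proper_nouns(tagged_sent))
# ['Michael Hymanson', 'McDonalds']
```

This only merges tokens that are directly adjacent, so two different names that happen to sit next to each other would be fused into one.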

Strictly speaking, as documented in the Penn Treebank tagset (http://www.mozart-oz.org/mogul/doc/lager/brill-tagger/penn.html), you could find possessive nouns simply by looking for the POS tag. In practice, though, the tagger often doesn't assign POS when the word is an NNP.

To find Possessive Nouns, look for str.endswith("'s") or str.endswith("s'"):

from nltk.tag import pos_tag

sentence = "Michael Hymanson took Daniel Hymanson's hamburger and Agnes' fries"
tagged_sent = pos_tag(sentence.split())
# [('Michael', 'NNP'), ('Hymanson', 'NNP'), ('took', 'VBD'), ('Daniel', 'NNP'), ("Hymanson's", 'NNP'), ('hamburger', 'NN'), ('and', 'CC'), ("Agnes'", 'NNP'), ('fries', 'NNS')]

possessives = [word for word in sentence.split() if word.endswith("'s") or word.endswith("s'")]
# ["Hymanson's", "Agnes'"]
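
The suffix check and the tagger output can also be combined, so that only noun-tagged tokens are considered and the possessive marker is stripped to recover the owner. This is a rough sketch under stated assumptions: possessive_owners and NOUN_TAGS are hypothetical names, and the tagged list is copied from the example above instead of being computed.

```python
# Tagged output copied from the example above (normally produced by pos_tag).
tagged_sent = [('Michael', 'NNP'), ('Hymanson', 'NNP'), ('took', 'VBD'),
               ('Daniel', 'NNP'), ("Hymanson's", 'NNP'), ('hamburger', 'NN'),
               ('and', 'CC'), ("Agnes'", 'NNP'), ('fries', 'NNS')]

NOUN_TAGS = {'NN', 'NNS', 'NNP', 'NNPS'}  # Penn Treebank noun tags


def possessive_owners(tagged):
    """Return noun tokens that carry a possessive suffix, with the suffix removed."""
    owners = []
    for word, tag in tagged:
        if tag not in NOUN_TAGS:
            continue  # skip non-nouns such as pronoun contractions
        if word.endswith("'s"):
            owners.append(word[:-2])  # drop the trailing 's
        elif word.endswith("s'"):
            owners.append(word[:-1])  # drop only the trailing apostrophe
    return owners


print(possessive_owners(tagged_sent))
# ['Hymanson', 'Agnes']
```

A suffix test cannot tell a possessive from a contraction on a noun ("Michael's going home"), so treat the result as a heuristic.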

Alternatively, you can use NLTK's ne_chunk, but it doesn't seem to add much unless you care about what kind of proper noun you get from the sentence:

>>> from itertools import chain
>>> from nltk.tree import Tree; from nltk.chunk import ne_chunk
>>> [chunk for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]
[Tree('PERSON', [('Michael', 'NNP')]), Tree('PERSON', [('Hymanson', 'NNP')]), Tree('PERSON', [('Daniel', 'NNP')])]
>>> [i[0] for i in list(chain(*[chunk.leaves() for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]))]
['Michael', 'Hymanson', 'Daniel']

Using ne_chunk is a little verbose, and it doesn't get you the possessives.

Answer 2

Answered by turdus-merula

I think what you need is a part-of-speech tagger. This tool assigns a part-of-speech tag (e.g., proper noun, possessive pronoun, etc.) to each word in a sentence.
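
To make the tags concrete, here is a small illustrative lookup of the Penn Treebank tags most relevant to this question. It is only a sketch: PENN_TAGS and describe are hypothetical names, the table covers just a handful of tags, and the tagged list is hand-written for illustration rather than real tagger output.

```python
# A few Penn Treebank tags relevant to proper nouns and possessives.
PENN_TAGS = {
    'NN': 'noun, singular or mass',
    'NNS': 'noun, plural',
    'NNP': 'proper noun, singular',
    'NNPS': 'proper noun, plural',
    'POS': 'possessive ending',
    'PRP$': 'possessive pronoun',
}


def describe(tagged):
    """Attach a human-readable description to each (word, tag) pair."""
    return [(word, tag, PENN_TAGS.get(tag, 'other')) for word, tag in tagged]


# Hand-written (word, tag) pairs, assumed for illustration only.
tagged = [('Daniel', 'NNP'), ('took', 'VBD'), ('his', 'PRP$'), ('fries', 'NNS')]
for word, tag, meaning in describe(tagged):
    print(f'{word:8} {tag:5} {meaning}')
```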

NLTK includes some taggers: http://nltk.org/book/ch05.html

There's also the Stanford Part-Of-Speech Tagger (also open source, with better performance).
