Finding Proper Nouns using NLTK WordNet (Python)
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/17669952/
Asked by Backue
Is there any way to find proper nouns using NLTK WordNet? I.e., can I tag possessive nouns using NLTK WordNet?
Accepted answer by alvas
I don't think you need WordNet to find proper nouns; I suggest using the part-of-speech tagger pos_tag.
To find proper nouns, look for the NNP tag:
from nltk.tag import pos_tag

sentence = "Michael Hymanson likes to eat at McDonalds"
# Tag each whitespace-separated token with its part of speech.
tagged_sent = pos_tag(sentence.split())
# [('Michael', 'NNP'), ('Hymanson', 'NNP'), ('likes', 'VBZ'), ('to', 'TO'), ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP')]
# Keep only the tokens tagged as proper nouns (NNP).
propernouns = [word for word, pos in tagged_sent if pos == 'NNP']
# ['Michael', 'Hymanson', 'McDonalds']
You may not be very satisfied, since Michael and Hymanson are split into 2 tokens; in that case you might need something more complex, such as a named-entity tagger.
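Short of a full named-entity tagger, one simple workaround is to join runs of adjacent NNP tokens into a single name. The following is only a minimal sketch: the helper name group_proper_nouns is made up for illustration, and it assumes that neighbouring NNP tokens always belong to the same name, which will not hold for every sentence.

```python
from itertools import groupby

def group_proper_nouns(tagged_sent):
    """Join runs of adjacent NNP tokens into single multi-word names."""
    names = []
    # groupby splits the sentence into consecutive runs of NNP / non-NNP tokens.
    for is_nnp, run in groupby(tagged_sent, key=lambda pair: pair[1] == 'NNP'):
        if is_nnp:
            names.append(' '.join(word for word, _ in run))
    return names

# Tagged output copied from the example above, so no tagger model is needed here.
tagged_sent = [('Michael', 'NNP'), ('Hymanson', 'NNP'), ('likes', 'VBZ'),
               ('to', 'TO'), ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP')]
print(group_proper_nouns(tagged_sent))
# ['Michael Hymanson', 'McDonalds']
```

This works directly on the list that pos_tag returns, so it can be dropped in after the tagging step.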
By right, as documented by the Penn Treebank tagset (http://www.mozart-oz.org/mogul/doc/lager/brill-tagger/penn.html), for possessive nouns you can simply look for the POS tag. But often the tagger doesn't assign POS when the word is an NNP.
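For the case where the possessive marker does show up as its own POS-tagged token, the lookup could be sketched roughly as follows. The function name possessors_from_pos_tag and the tagged input are hypothetical: the input is written by hand to show the shape such output would have, and is not real tagger output.

```python
def possessors_from_pos_tag(tagged_tokens):
    """Return each token that is immediately followed by a POS-tagged token."""
    return [word
            for (word, _), (_, next_tag) in zip(tagged_tokens, tagged_tokens[1:])
            if next_tag == 'POS']

# Hypothetical tagged input: assumes the tokenizer split "'s" into its own token
# and the tagger labelled it POS. This is illustrative, not real tagger output.
tagged_tokens = [('Daniel', 'NNP'), ('Hymanson', 'NNP'), ("'s", 'POS'),
                 ('hamburger', 'NN')]
print(possessors_from_pos_tag(tagged_tokens))
# ['Hymanson']
```

With whitespace-split tokens such as "Hymanson's" there is no separate marker token to find, so this approach returns nothing in that case.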
To find Possessive Nouns, look for str.endswith("'s") or str.endswith("s'"):
from nltk.tag import pos_tag

sentence = "Michael Hymanson took Daniel Hymanson's hamburger and Agnes' fries"
tagged_sent = pos_tag(sentence.split())
# [('Michael', 'NNP'), ('Hymanson', 'NNP'), ('took', 'VBD'), ('Daniel', 'NNP'), ("Hymanson's", 'NNP'), ('hamburger', 'NN'), ('and', 'CC'), ("Agnes'", 'NNP'), ('fries', 'NNS')]
# Iterate over the tokens, not over the characters of the string.
possessives = [word for word in sentence.split() if word.endswith("'s") or word.endswith("s'")]
# ["Hymanson's", "Agnes'"]
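The suffix check can also be combined with the NNP tag, so that only possessive proper nouns are kept and the possessive marker is stripped. This is a rough sketch rather than a robust solution: possessive_proper_nouns is a name invented here, and the suffix test will also match plural proper nouns that happen to end in s'.

```python
def possessive_proper_nouns(tagged_sent):
    """Return NNP tokens that carry a possessive suffix, with the suffix removed."""
    owners = []
    for word, pos in tagged_sent:
        if pos != 'NNP':
            continue
        if word.endswith("'s"):
            owners.append(word[:-2])   # drop the trailing 's
        elif word.endswith("s'"):
            owners.append(word[:-1])   # drop only the trailing apostrophe
    return owners

# Tagged output copied from the example above, so no tagger model is needed here.
tagged_sent = [('Michael', 'NNP'), ('Hymanson', 'NNP'), ('took', 'VBD'),
               ('Daniel', 'NNP'), ("Hymanson's", 'NNP'), ('hamburger', 'NN'),
               ('and', 'CC'), ("Agnes'", 'NNP'), ('fries', 'NNS')]
print(possessive_proper_nouns(tagged_sent))
# ['Hymanson', 'Agnes']
```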
Alternatively, you can use NLTK's ne_chunk, but it doesn't seem to add much unless you care about what kind of proper noun you get from the sentence:
>>> from itertools import chain
>>> from nltk.tree import Tree; from nltk.chunk import ne_chunk
>>> [chunk for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]
[Tree('PERSON', [('Michael', 'NNP')]), Tree('PERSON', [('Hymanson', 'NNP')]), Tree('PERSON', [('Daniel', 'NNP')])]
>>> [i[0] for i in list(chain(*[chunk.leaves() for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]))]
['Michael', 'Hymanson', 'Daniel']
Using ne_chunk is a little verbose and it doesn't get you the possessives.
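If the kind of proper noun does matter, the entity label can be read off each subtree together with its words. The sketch below is an illustration only: named_entities is a made-up helper, and FakeTree is a hypothetical stand-in for nltk.tree.Tree so that the example runs without NLTK or its models. With real ne_chunk output you would pass the returned tree directly, and isinstance(chunk, Tree) would be the more precise check.

```python
def named_entities(chunked):
    """Return (label, text) pairs for every subtree in an ne_chunk-style result."""
    entities = []
    for chunk in chunked:
        # Plain (word, tag) tuples have no label(); NLTK 3 Tree subtrees do.
        if hasattr(chunk, 'label'):
            text = ' '.join(word for word, _ in chunk.leaves())
            entities.append((chunk.label(), text))
    return entities

class FakeTree(list):
    """Hypothetical stand-in for nltk.tree.Tree, used only to run this sketch."""
    def __init__(self, label, children):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label

    def leaves(self):
        return list(self)

# Hand-built input mirroring the shape of the ne_chunk output shown above.
chunked = [FakeTree('PERSON', [('Michael', 'NNP')]),
           FakeTree('PERSON', [('Hymanson', 'NNP')]),
           ('took', 'VBD'),
           FakeTree('PERSON', [('Daniel', 'NNP')])]
print(named_entities(chunked))
# [('PERSON', 'Michael'), ('PERSON', 'Hymanson'), ('PERSON', 'Daniel')]
```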
Answer by turdus-merula
I think what you need is a tagger, specifically a part-of-speech tagger. This tool assigns a part-of-speech tag (e.g., proper noun, possessive pronoun, etc.) to each word in a sentence.
NLTK includes some taggers: http://nltk.org/book/ch05.html
There's also the Stanford Part-Of-Speech Tagger (open source too, better performance).