Java: Extracting noun phrases from a text file using the Stanford typed parser
Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverFlow
Original question: http://stackoverflow.com/questions/10974532/
Asked by S Gaber
I have a text from which I want to extract the noun phrases. I can easily get the typed parser output for the text, but I am wondering how to extract the noun phrases from it.
Answered by alan turing
You can extract noun phrases from a Tree with the following code. It assumes you already have the parsed sentence in parse (i.e. parse is the output of the apply method of the LexicalizedParser class).
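import edu.stanford.nlp.trees.Tree;
import java.util.ArrayList;
import java.util.List;

// The class name is illustrative; parse is the Tree returned by the parser for one sentence.
public class NounPhraseExtractor
{
    public static List<Tree> GetNounPhrases(Tree parse)
    {
        List<Tree> phraseList = new ArrayList<Tree>();
        // Iterating over a Tree visits every subtree, including the root itself.
        for (Tree subtree : parse)
        {
            // Keep only the subtrees labelled NP (noun phrase).
            if (subtree.label().value().equals("NP"))
            {
                phraseList.add(subtree);
                System.out.println(subtree);
            }
        }
        return phraseList;
    }
}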
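To see what collecting the NP subtrees amounts to without the Stanford jars on the classpath, here is a minimal library-free sketch. It assumes the parse is available as a Penn Treebank style bracketed string in which every node has a label; the class and method names (BracketedNounPhrases, nounPhrases) are hypothetical and not part of the Stanford API. Like the loop over all subtrees, it also reports NPs nested inside other NPs.

```java
import java.util.ArrayList;
import java.util.List;

/** Library-free sketch: collect the words under every NP node of a bracketed parse. */
final class BracketedNounPhrases {

    /** Returns the space-joined words of each NP node, in preorder (outer NPs first). */
    static List<String> nounPhrases(String bracketed) {
        // Put spaces around parentheses so that a whitespace split yields clean tokens.
        String[] tokens = bracketed.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        List<String> phrases = new ArrayList<>();
        collect(tokens, new int[] {0}, phrases);
        return phrases;
    }

    /** Reads one "(LABEL child...)" node starting at pos[0] and returns the words it covers. */
    private static String collect(String[] tok, int[] pos, List<String> out) {
        pos[0]++;                          // skip "("
        String label = tok[pos[0]++];      // node label, e.g. NP or NN
        int slot = -1;
        if (label.equals("NP")) {          // reserve a slot now to keep preorder
            slot = out.size();
            out.add("");
        }
        StringBuilder words = new StringBuilder();
        while (!tok[pos[0]].equals(")")) {
            // A child is either a nested node or a leaf word.
            String part = tok[pos[0]].equals("(") ? collect(tok, pos, out) : tok[pos[0]++];
            if (words.length() > 0) {
                words.append(' ');
            }
            words.append(part);
        }
        pos[0]++;                          // skip ")"
        if (slot >= 0) {
            out.set(slot, words.toString());
        }
        return words.toString();
    }

    public static void main(String[] args) {
        // Hand-written example parse, not actual parser output.
        String parse = "(ROOT (S (NP (DT The) (JJ quick) (NN fox)) "
                + "(VP (VBD saw) (NP (DT a) (NN dog))) (. .)))";
        System.out.println(nounPhrases(parse)); // prints [The quick fox, a dog]
    }
}
```

With a real Tree from the parser you would not need any of this, since the Tree object already exposes its subtrees; the sketch only makes the traversal explicit.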
Answered by MARK
Try this link as well. I am not sure whether the Stanford POS tagger and the tagger available in CoreNLP are the same, but I found this link more useful.
After POS tagging, you will have to detect patterns like this: (Adjective | Noun)* (Noun Preposition)? (Adjective | Noun)* Noun
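One possible way to detect that pattern is to map each tag to a single letter and run a regular expression over the resulting string. The sketch below does this under a few assumptions of mine: tags follow the Penn Treebank set, JJ* counts as Adjective, NN* as Noun and IN as Preposition, and the class and method names (PosPatternChunker, chunk) are hypothetical. The words and tags in main are written by hand rather than produced by a tagger.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: find noun phrase candidates by matching a pattern over POS tag classes. */
final class PosPatternChunker {

    // (Adjective | Noun)* (Noun Preposition)? (Adjective | Noun)* Noun, one letter per token.
    private static final Pattern NOUN_PHRASE = Pattern.compile("[AN]*(?:NP)?[AN]*N");

    /** Maps a Penn Treebank tag to A (adjective), N (noun), P (preposition) or O (other). */
    private static char tagClass(String tag) {
        if (tag.startsWith("JJ")) return 'A';
        if (tag.startsWith("NN")) return 'N';
        if (tag.equals("IN")) return 'P';   // IN also covers subordinating conjunctions
        return 'O';
    }

    /** Returns the word sequences whose tag classes match the noun phrase pattern. */
    static List<String> chunk(String[] words, String[] tags) {
        if (words.length != tags.length) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        StringBuilder classes = new StringBuilder();
        for (String tag : tags) {
            classes.append(tagClass(tag));
        }
        List<String> phrases = new ArrayList<>();
        Matcher m = NOUN_PHRASE.matcher(classes);
        while (m.find()) {
            // One letter per token, so match offsets are also token indices.
            phrases.add(String.join(" ", Arrays.asList(words).subList(m.start(), m.end())));
        }
        return phrases;
    }

    public static void main(String[] args) {
        String[] words = {"The", "statistical", "test", "has", "many", "degrees", "of", "freedom"};
        String[] tags = {"DT", "JJ", "NN", "VBZ", "JJ", "NNS", "IN", "NN"};
        System.out.println(chunk(words, tags)); // prints [statistical test, many degrees of freedom]
    }
}
```

Encoding one letter per token keeps the mapping from regex matches back to words trivial; a pattern over the raw tag strings would work too, but would need extra bookkeeping for offsets.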
Try this link for some details on noun phrase detection.
Answered by zingler
You can use Stanford CoreNLP for POS tagging. You can find sample code at http://nlp.stanford.edu/software/corenlp.shtml#Usage, which might be a good starting point for experimentation. You would need to give tokenize, ssplit and pos as the annotators property. This outputs a list of tokens with their corresponding tags.
The entire tag list can be viewed at http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html. All the noun tags start with NN, so checking for that prefix gives you the required tokens.
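The prefix check itself is small enough to sketch without the tagger. The example below assumes the tagged output has already been split into parallel lists of words and tags; the class and method names (NounTagFilter, nouns) are hypothetical, and the sample data is hand-written.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: keep only the tokens whose Penn Treebank tag starts with NN. */
final class NounTagFilter {

    /** Returns the words tagged NN, NNS, NNP or NNPS, in their original order. */
    static List<String> nouns(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            if (tags.get(i).startsWith("NN")) {
                result.add(words.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = List.of("Stanford", "parsers", "extract", "noun", "phrases");
        List<String> tags = List.of("NNP", "NNS", "VBP", "NN", "NNS");
        System.out.println(nouns(words, tags)); // prints [Stanford, parsers, noun, phrases]
    }
}
```

Note that this yields individual noun tokens; grouping adjacent tokens into phrases would still be a separate step.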