java - How to extract the noun phrases using OpenNLP's chunking parser
Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/14708047/
Question by user2024234
I am a newbie to natural language processing. I need to extract the noun phrases from a text. So far I have used OpenNLP's chunking parser to parse my text into a tree structure, but I am not able to extract the noun phrases from that tree. Is there any regular expression pattern in OpenNLP that I can use to extract the noun phrases?
Below is the code that I am using:
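import java.io.*;
import opennlp.tools.cmdline.parser.ParserTool;
import opennlp.tools.parser.*;
// `line` holds the sentence to parse; only the single best parse is requested
try (InputStream is = new FileInputStream("en-parser-chunking.bin")) {
    ParserModel model = new ParserModel(is);
    Parser parser = ParserFactory.create(model);
    Parse[] topParses = ParserTool.parseLine(line, parser, 1);
    for (Parse p : topParses) {
        p.show(); // prints the bracketed parse tree
    }
}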
Here is the output I am getting:
(TOP (S (S (ADJP (JJ welcome) (PP (TO to) (NP (NNP Big) (NNP Data.))))) (S (NP (PRP We)) (VP (VP (VBP are) (VP (VBG working) (PP (IN on) (NP (NNP Natural) (NNP Language) (NNP Processing.can))))) (NP (DT some) (CD one) (NN help)) (NP (PRP us)) (PP (IN in) (S (VP (VBG extracting) (NP (DT the) (NN noun) (NNS phrases)) (PP (IN from) (NP (DT the) (NN tree) (WP stucture.))))))))))
Can someone please help me get the noun phrases, such as NP, NNP, NN, etc.? Do I need to use another NP chunker to get the noun phrases? Is there any regex pattern that achieves the same thing?
Please help me with this.
Thanks in advance
Gouse.
Accepted answer by icecream
The Parse object is a tree; you can use getParent(), getChildren() and getType() to navigate the tree.
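import java.util.ArrayList;
import java.util.List;
import opennlp.tools.parser.Parse;

// Must be initialized: a bare declaration leaves it null and add() would throw
List<Parse> nounPhrases = new ArrayList<>();

// Depth-first walk: record every NP node, then recurse into its children
public void getNounPhrases(Parse p) {
    if (p.getType().equals("NP")) {
        nounPhrases.add(p);
    }
    for (Parse child : p.getChildren()) {
        getNounPhrases(child);
    }
}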
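The walk above needs the parser model to run. As a self-contained sketch of the same depth-first idea, the code below works on the bracketed string that p.show() prints (like the output in the question) instead of on a Parse object. Node and BracketNpExtractor are hypothetical names, not OpenNLP classes, and the tokenizer assumes no token contains a raw parenthesis. It uses a small recursive parser rather than a regular expression, because Java regular expressions cannot match arbitrarily nested brackets.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: extracts NP text from a bracketed parse string such as the one Parse.show() prints.
// Node is a hypothetical stand-in for OpenNLP's Parse (a type plus children).
class BracketNpExtractor {

    static final class Node {
        final String type;                          // e.g. "NP", "VP", "NN"
        final List<Node> children = new ArrayList<>();
        String word;                                // set only on POS-tag nodes

        Node(String type) { this.type = type; }

        // Words covered by this node, joined with spaces
        String text() {
            if (word != null) return word;
            List<String> parts = new ArrayList<>();
            for (Node c : children) parts.add(c.text());
            return String.join(" ", parts);
        }
    }

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    BracketNpExtractor(String bracketed) {
        // Split into "(", ")" and symbols; assumes no token contains a raw parenthesis
        String spaced = bracketed.replace("(", " ( ").replace(")", " ) ").trim();
        for (String t : spaced.split("\\s+")) tokens.add(t);
    }

    // node := "(" TYPE ( WORD | node* ) ")"
    Node parse() {
        expect("(");
        Node n = new Node(tokens.get(pos++));
        if (tokens.get(pos).equals("(")) {
            while (tokens.get(pos).equals("(")) n.children.add(parse());
        } else if (!tokens.get(pos).equals(")")) {
            n.word = tokens.get(pos++);
        }
        expect(")");
        return n;
    }

    private void expect(String s) {
        if (!tokens.get(pos).equals(s)) {
            throw new IllegalArgumentException("expected " + s + " at token " + pos);
        }
        pos++;
    }

    // Same depth-first walk as getNounPhrases(): record NP nodes, then recurse
    static void collectNps(Node n, List<String> out) {
        if (n.type.equals("NP")) out.add(n.text());
        for (Node c : n.children) collectNps(c, out);
    }

    static List<String> nounPhrases(String bracketed) {
        List<String> out = new ArrayList<>();
        collectNps(new BracketNpExtractor(bracketed).parse(), out);
        return out;
    }

    public static void main(String[] args) {
        String tree = "(TOP (S (NP (PRP We)) (VP (VBG extracting) (NP (DT the) (NN noun) (NNS phrases)))))";
        System.out.println(nounPhrases(tree)); // prints [We, the noun phrases]
    }
}
```

Nested noun phrases are reported once per level: for "the tree of life" parsed as an NP containing the NP "the tree" and a PP with the NP "life", you get all three strings, outermost first. If you only want the innermost (base) noun phrases, skip NP nodes that have an NP descendant.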
Answer by markgiaconia
If you only want noun phrases, use the sentence chunker rather than the tree parser. The code is something like this (you need to get the chunker model from the same place you got the parser model):
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;

// Returns one chunk tag per token (e.g. B-NP, I-NP, B-VP, O)
public String[] chunk() throws IOException {
    ChunkerModel model;
    // try-with-resources closes the stream; a loading failure is propagated
    // instead of continuing with a null model
    try (InputStream modelIn = new FileInputStream("en-chunker.bin")) {
        model = new ChunkerModel(modelIn);
    }
    // After the model is loaded, a Chunker can be instantiated.
    ChunkerME chunker = new ChunkerME(model);
    // The chunker needs the tokens plus their POS tags (e.g. from a POS tagger)
    String[] sent = new String[]{"Rockwell", "International", "Corp.", "'s",
        "Tulsa", "unit", "said", "it", "signed", "a", "tentative", "agreement",
        "extending", "its", "contract", "with", "Boeing", "Co.", "to",
        "provide", "structural", "parts", "for", "Boeing", "'s", "747",
        "jetliners", "."};
    String[] pos = new String[]{"NNP", "NNP", "NNP", "POS", "NNP", "NN",
        "VBD", "PRP", "VBD", "DT", "JJ", "NN", "VBG", "PRP$", "NN", "IN",
        "NNP", "NNP", "TO", "VB", "JJ", "NNS", "IN", "NNP", "POS", "CD", "NNS",
        "."};
    return chunker.chunk(sent, pos);
}
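Then look at the tag array for the chunk types you want. A noun phrase is a B-NP tag on its first token, followed by I-NP tags on the rest of its tokens.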
http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.parser.chunking.api
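The chunker returns one tag per token rather than the phrases themselves. A minimal sketch of grouping that array into noun-phrase strings, assuming the usual B-NP / I-NP tagging (B- marks the first token of a chunk, I- the following ones). NpChunks and nounPhrases are hypothetical names, and the tags in main are hand-written for illustration, not actual output of en-chunker.bin.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: converts per-token chunk tags (B-NP, I-NP, B-VP, O, ...) into NP strings.
class NpChunks {

    static List<String> nounPhrases(String[] tokens, String[] tags) {
        List<String> phrases = new ArrayList<>();
        StringBuilder current = null; // NP being built, or null when outside an NP
        for (int i = 0; i < tokens.length; i++) {
            String tag = tags[i];
            if (tag.equals("B-NP") || (tag.equals("I-NP") && current == null)) {
                // Start a new NP (a stray I-NP is treated like B-NP), flushing any open one
                if (current != null) phrases.add(current.toString());
                current = new StringBuilder(tokens[i]);
            } else if (tag.equals("I-NP")) {
                // Continue the open NP
                current.append(' ').append(tokens[i]);
            } else if (current != null) {
                // Any other tag closes the open NP
                phrases.add(current.toString());
                current = null;
            }
        }
        if (current != null) phrases.add(current.toString()); // NP at end of sentence
        return phrases;
    }

    public static void main(String[] args) {
        // Hand-written tags for illustration only, not real en-chunker.bin output
        String[] tokens = {"Rockwell", "International", "Corp.", "'s", "Tulsa", "unit", "said", "it", "signed"};
        String[] tags = {"B-NP", "I-NP", "I-NP", "B-NP", "I-NP", "I-NP", "B-VP", "B-NP", "B-VP"};
        System.out.println(nounPhrases(tokens, tags));
        // prints [Rockwell International Corp., 's Tulsa unit, it]
    }
}
```

In practice you would pass the same token array you gave to chunker.chunk(sent, pos) together with the tag array it returned.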
Answer by Sinoy Siby
This continues from your own code and prints the noun tokens in the sentence. Use the getTagNodes() method to get the tokens and their types.
Parse[] topParses = ParserTool.parseLine(line, parser, 1);
// Loop through every parse and get its tag nodes (one node per token)
for (Parse top : topParses) {
    for (Parse word : top.getTagNodes()) {
        String type = word.getType();
        // Change the types according to your desired types
        if (type.equals("NN") || type.equals("NNP") || type.equals("NNS")) {
            System.out.println(word.getCoveredText());
        }
    }
}
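getTagNodes() yields one node per token, so a multi-word name such as "Natural Language Processing.can" in the question's parse is printed as three separate nouns. If you want adjacent nouns kept together, one simple heuristic (not an OpenNLP feature) is to merge runs of consecutive noun tags; the result is still not a full noun phrase, since determiners and adjectives are left out. The sketch works on parallel token and tag arrays, which you could fill from word.getCoveredText() and word.getType(). NounRuns is a hypothetical name, and Set.of needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch: merges runs of consecutive noun-tagged tokens (a heuristic, not an OpenNLP API).
class NounRuns {
    // Penn Treebank noun tags; adjust to the types you want
    static final Set<String> NOUN_TAGS = Set.of("NN", "NNS", "NNP", "NNPS");

    static List<String> nounRuns(String[] tokens, String[] tags) {
        List<String> runs = new ArrayList<>();
        List<String> current = new ArrayList<>(); // nouns in the run being built
        for (int i = 0; i < tokens.length; i++) {
            if (NOUN_TAGS.contains(tags[i])) {
                current.add(tokens[i]);
            } else if (!current.isEmpty()) {
                // A non-noun tag ends the current run
                runs.add(String.join(" ", current));
                current.clear();
            }
        }
        if (!current.isEmpty()) runs.add(String.join(" ", current)); // run at end of sentence
        return runs;
    }

    public static void main(String[] args) {
        // Tokens and tags taken from the parse shown in the question
        String[] tokens = {"on", "Natural", "Language", "Processing.can", "some", "one", "help"};
        String[] tags = {"IN", "NNP", "NNP", "NNP", "DT", "CD", "NN"};
        System.out.println(nounRuns(tokens, tags)); // prints [Natural Language Processing.can, help]
    }
}
```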