Java: How to extract noun phrases using OpenNLP's chunking parser

Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/14708047/

Time: 2020-10-31 17:07:08  Source: igfitidea

How to extract noun phrases using OpenNLP's chunking parser

Tags: java, nlp, stanford-nlp, opennlp

Asked by user2024234

I am new to natural language processing and need to extract the noun phrases from a text. So far I have used OpenNLP's chunking parser to parse the text into a tree structure, but I cannot work out how to extract the noun phrases from that tree. Is there a regular expression pattern in OpenNLP that I can use to extract them?

Below is the code I am using:

    // Load the pre-trained chunking parser model and parse one line of text
    InputStream is = new FileInputStream("en-parser-chunking.bin");
    ParserModel model = new ParserModel(is);
    Parser parser = ParserFactory.create(model);
    Parse[] topParses = ParserTool.parseLine(line, parser, 1);
    for (Parse p : topParses) {
        p.show(); // prints the bracketed parse tree
    }

This is the output I get:

(TOP (S (S (ADJP (JJ welcome) (PP (TO to) (NP (NNP Big) (NNP Data.))))) (S (NP (PRP We)) (VP (VP (VBP are) (VP (VBG working) (PP (IN on) (NP (NNP Natural) (NNP Language) (NNP Processing.can))))) (NP (DT some) (CD one) (NN help)) (NP (PRP us)) (PP (IN in) (S (VP (VBG extracting) (NP (DT the) (NN noun) (NNS phrases)) (PP (IN from) (NP (DT the) (NN tree) (WP stucture.))))))))))

Can someone please help me get the noun phrases, i.e. nodes such as NP, NNP, NN, etc.? Do I need to use a separate NP chunker to get the noun phrases? Is there a regex pattern that achieves the same thing?

Please help me with this.

Thanks in advance,

Gouse.

Accepted answer by icecream

The Parse object is a tree; you can use getParent(), getChildren(), and getType() to navigate it.

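// Collects every NP node found in the tree (must be initialized before use)
List<Parse> nounPhrases = new ArrayList<>();

// Recursively walks the parse tree, recording each node whose type is "NP"
public void getNounPhrases(Parse p) {
    if (p.getType().equals("NP")) {
        nounPhrases.add(p);
    }
    for (Parse child : p.getChildren()) {
        getNounPhrases(child);
    }
}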
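To try the recursion without the model file, here is a minimal sketch that works on the bracketed string printed in the question. The `NpFromBrackets` class and its `Node` type are hypothetical stand-ins for OpenNLP's `Parse`, not part of the library, and the sketch assumes well-formed input with balanced parentheses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: rebuilds a small tree from the bracketed output, then walks it.
class NpFromBrackets {

    // Stand-in for opennlp.tools.parser.Parse (assumption, for illustration only).
    static final class Node {
        final String type;                       // label such as "NP" or "NN"
        String word;                             // set only on POS-tag nodes like (NN tree)
        final List<Node> children = new ArrayList<>();

        Node(String type) { this.type = type; }

        // Text covered by this node: its tokens joined by single spaces.
        String text() {
            if (word != null) return word;
            List<String> parts = new ArrayList<>();
            for (Node c : children) parts.add(c.text());
            return String.join(" ", parts);
        }
    }

    private final String s;
    private int pos = 0;

    NpFromBrackets(String s) { this.s = s; }

    // Parses one "(LABEL child child ...)" or "(LABEL word)" group.
    Node parse() {
        skipSpaces();
        pos++; // consume '('
        Node node = new Node(readToken());
        skipSpaces();
        while (s.charAt(pos) != ')') {
            if (s.charAt(pos) == '(') node.children.add(parse());
            else node.word = readToken();
            skipSpaces();
        }
        pos++; // consume ')'
        return node;
    }

    private String readToken() {
        int start = pos;
        while (pos < s.length() && " ()".indexOf(s.charAt(pos)) < 0) pos++;
        return s.substring(start, pos);
    }

    private void skipSpaces() {
        while (pos < s.length() && s.charAt(pos) == ' ') pos++;
    }

    // Same recursion as the answer: record NP nodes, then descend into the children.
    static void collect(Node n, List<String> out) {
        if (n.type.equals("NP")) out.add(n.text());
        for (Node c : n.children) collect(c, out);
    }

    static List<String> nounPhrases(String bracketed) {
        List<String> out = new ArrayList<>();
        collect(new NpFromBrackets(bracketed).parse(), out);
        return out;
    }

    public static void main(String[] args) {
        String tree = "(TOP (S (NP (PRP We)) (VP (VBP like) (NP (DT the) (NN tree) (NN structure)))))";
        System.out.println(nounPhrases(tree)); // prints [We, the tree structure]
    }
}
```

Because the walk keeps descending after it finds an NP, nested noun phrases are reported as well as the phrases that contain them. With a real `Parse`, the phrase text can typically be read with `getCoveredText()`; check the Javadoc for your OpenNLP version.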

Answer by markgiaconia

If you only want noun phrases, use the sentence chunker rather than the tree parser. The code looks something like this (you need to get the chunker model from the same place you got the parser model):

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;

public void chunk() {
    ChunkerModel model;

    // Load the chunker model; try-with-resources closes the stream afterwards.
    try (InputStream modelIn = new FileInputStream("en-chunker.bin")) {
        model = new ChunkerModel(modelIn);
    } catch (IOException e) {
        // Model loading failed: report it and stop, since the chunker needs a model.
        e.printStackTrace();
        return;
    }

    // After the model is loaded, a chunker can be instantiated.
    ChunkerME chunker = new ChunkerME(model);

    // Tokens of the sentence to chunk.
    String[] sent = new String[]{"Rockwell", "International", "Corp.", "'s",
      "Tulsa", "unit", "said", "it", "signed", "a", "tentative", "agreement",
      "extending", "its", "contract", "with", "Boeing", "Co.", "to",
      "provide", "structural", "parts", "for", "Boeing", "'s", "747",
      "jetliners", "."};

    // One part-of-speech tag per token, in the same order.
    String[] pos = new String[]{"NNP", "NNP", "NNP", "POS", "NNP", "NN",
      "VBD", "PRP", "VBD", "DT", "JJ", "NN", "VBG", "PRP$", "NN", "IN",
      "NNP", "NNP", "TO", "VB", "JJ", "NNS", "IN", "NNP", "POS", "CD", "NNS",
      "."};

    // One chunk tag per token.
    String[] tag = chunker.chunk(sent, pos);
}

Then look at the tag array for the chunk types you want; for noun phrases, those are the tokens tagged B-NP (start of a phrase) and I-NP (continuation).
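As a rough sketch of what "looking at the tag array" can mean in practice, the helper below groups consecutive B-NP/I-NP tags into phrase strings. `NounPhraseChunks` is a hypothetical class, not part of OpenNLP, and the tags in `main` are hand-written for illustration rather than real chunker output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of OpenNLP): groups B-NP/I-NP chunk tags into phrases.
class NounPhraseChunks {

    static List<String> extract(String[] tokens, String[] tags) {
        List<String> phrases = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (tags[i].equals("B-NP")) {
                // A new noun phrase starts: flush the previous one, if any.
                flush(current, phrases);
                current.append(tokens[i]);
            } else if (tags[i].equals("I-NP") && current.length() > 0) {
                // Continuation of the noun phrase currently being built.
                current.append(' ').append(tokens[i]);
            } else {
                // Any other tag (B-VP, O, ...) or a stray I-NP ends the current phrase.
                flush(current, phrases);
            }
        }
        flush(current, phrases);
        return phrases;
    }

    private static void flush(StringBuilder current, List<String> phrases) {
        if (current.length() > 0) {
            phrases.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        // Hand-written tags for illustration; real tags come from chunker.chunk(sent, pos).
        String[] tokens = {"it", "signed", "a", "tentative", "agreement", "."};
        String[] tags = {"B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "O"};
        System.out.println(extract(tokens, tags)); // prints [it, a tentative agreement]
    }
}
```

In the answer's code this would be called as `NounPhraseChunks.extract(sent, tag)`. The chunker also appears to offer a `chunkAsSpans` method that returns token spans directly, which may make the manual grouping unnecessary; check the manual for your OpenNLP version.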

http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.parser.chunking.api

Answer by Sinoy Siby

Continuing from your own code: this block prints all the noun tokens in the sentence. It uses the getTagNodes() method to get the tokens and their types.

Parse[] topParses = ParserTool.parseLine(line, parser, 1);

for (Parse parse : topParses) {
    // getTagNodes() returns the part-of-speech nodes, one per token
    for (Parse word : parse.getTagNodes()) {
        // Change the types below to the ones you want
        String type = word.getType();
        if (type.equals("NN") || type.equals("NNP") || type.equals("NNS")) {
            System.out.println(word);
        }
    }
}