Java: extracting noun phrases from a text file using the Stanford typed parser

Question

Asked by S Gaber

I have a text from which I want to extract the noun phrases. I can easily get the typed parser output for my text, but I am wondering how I can extract the noun phrases from it.

Answer 1

Answered by alan turing

You can extract noun phrases from a Tree using the following code. It assumes the parsed sentence is stored in parse (i.e. parse is the output of the LexicalizedParser class's apply method).

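import java.util.ArrayList;
import java.util.List;
import edu.stanford.nlp.trees.Tree;

// Collects every subtree labeled "NP" from the given parse tree.
public static List<Tree> GetNounPhrases(Tree parse)
{
    List<Tree> phraseList = new ArrayList<Tree>();
    // Iterating over a Tree visits the tree itself and all of its subtrees.
    for (Tree subtree : parse)
    {
        if (subtree.label().value().equals("NP"))
        {
            phraseList.add(subtree);
            System.out.println(subtree);
        }
    }
    return phraseList;
}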

Answer 2

Answered by MARK

Try this link as well. I am not sure whether the Stanford POS tagger and the tagger available in CoreNLP are the same, but I found this link more useful.

After POS tagging, you will have to detect patterns like this: (Adjective | Noun)* (Noun Preposition)? (Adjective | Noun)* Noun

Try this link for some details on noun phrase detection.
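
The tag pattern described in this answer can be sketched in plain Java, without any Stanford classes, by encoding each POS tag as one character and running a regular expression over the encoded string. This is only an illustration under some assumptions: the class name NounPhraseChunker and its methods are hypothetical, the tags are assumed to be Penn Treebank tags (JJ* for adjectives, NN* for nouns, IN for prepositions), and the words and tags are assumed to come from whatever tagger you already use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any Stanford library.
final class NounPhraseChunker {

    // One character per token: A = adjective, N = noun, P = preposition, O = other.
    // Encodes (Adjective | Noun)* (Noun Preposition)? (Adjective | Noun)* Noun.
    private static final Pattern NP_PATTERN =
            Pattern.compile("[AN]*(?:NP)?[AN]*N");

    // Maps a Penn Treebank tag (assumed tag set) to its one-letter class.
    static char classify(String tag) {
        if (tag.startsWith("JJ")) return 'A'; // JJ, JJR, JJS
        if (tag.startsWith("NN")) return 'N'; // NN, NNS, NNP, NNPS
        if (tag.equals("IN")) return 'P';     // preposition
        return 'O';
    }

    // Returns the word sequences whose tags match NP_PATTERN, left to right.
    static List<String> chunk(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags must have equal length");
        }
        StringBuilder encoded = new StringBuilder();
        for (String tag : tags) {
            encoded.append(classify(tag));
        }
        List<String> phrases = new ArrayList<>();
        Matcher m = NP_PATTERN.matcher(encoded);
        while (m.find()) {
            // Character offsets equal token indices: one character per token.
            phrases.add(String.join(" ", words.subList(m.start(), m.end())));
        }
        return phrases;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog");
        List<String> tags = Arrays.asList(
                "DT", "JJ", "JJ", "NN", "VBZ", "IN", "DT", "JJ", "NN");
        System.out.println(chunk(words, tags)); // prints [quick brown fox, lazy dog]
    }
}
```

Because the regular expression is greedy, each match is the longest candidate starting at a given token; determiners such as "the" are tagged O here and therefore fall outside the phrase.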

Answer 3

Answered by zingler

You can use Stanford CoreNLP for POS tagging. You can find sample code at http://nlp.stanford.edu/software/corenlp.shtml#Usage, which might be a good starting point for experimentation. You would need to specify tokenize, ssplit, and pos as the annotators property. This outputs a list of tokens with their corresponding tags.

The entire tag list can be viewed at http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html. All noun tags start with NN. Performing this check gives you the required tokens.
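
The NN-prefix check described above might look like the following sketch. It deliberately avoids any CoreNLP classes so that it runs on its own: the class NounTokenFilter and its method are hypothetical names, and the parallel lists of words and tags are assumed to have been read from the tagger's output beforehand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Stanford CoreNLP.
final class NounTokenFilter {

    // Returns the words whose Penn Treebank tag starts with "NN"
    // (NN, NNS, NNP, NNPS), preserving their original order.
    static List<String> nounTokens(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags must have equal length");
        }
        List<String> nouns = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            if (tags.get(i).startsWith("NN")) {
                nouns.add(words.get(i));
            }
        }
        return nouns;
    }

    public static void main(String[] args) {
        // Assumed tagger output for "The dog chased cats in Paris".
        List<String> words = Arrays.asList("The", "dog", "chased", "cats", "in", "Paris");
        List<String> tags = Arrays.asList("DT", "NN", "VBD", "NNS", "IN", "NNP");
        System.out.println(nounTokens(words, tags)); // prints [dog, cats, Paris]
    }
}
```

Note that this returns individual noun tokens rather than whole noun phrases; grouping adjacent tokens into phrases needs a tag pattern or a parse tree, as the other answers describe.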
