Python Stanford Parser and NLTK

Note: the content on this page is taken from a popular StackOverflow question and its answers and is provided under the CC BY-SA 4.0 license. If you use or share it, you must attribute it to the original authors (not this site). Original question: http://stackoverflow.com/questions/13883277/


Stanford Parser and NLTK

python, parsing, nlp, nltk, stanford-nlp

Asked by ThanaDaray

Is it possible to use Stanford Parser in NLTK? (I am not talking about Stanford POS.)

Answer by bob dope

If I remember correctly, the Stanford parser is a Java library, so you must have a Java runtime available on your server/computer.

I used it once on a server, combined with a PHP script. The script used PHP's exec() function to make a command-line call to the parser, like so:

<?php

exec( "java -cp /pathTo/stanford-parser.jar -mx100m edu.stanford.nlp.process.DocumentPreprocessor /pathTo/fileToParse > /pathTo/resultFile 2>/dev/null" );

?>

I don't remember all the details of this command; it basically opened fileToParse, parsed it, and wrote the output to resultFile. PHP would then open the result file for further use.

The end of the command redirects the parser's verbose output to /dev/null, to prevent unnecessary command-line information from disturbing the script.

I don't know much about Python, but there might be a way to make command-line calls (one possible sketch follows).

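For example, here is a minimal sketch of the same approach from Python using the standard subprocess module (not part of the original answer); the jar, input-file and result-file paths are placeholders copied from the PHP example above:

import os
import subprocess

# Rough Python equivalent of the PHP exec() call above; adjust the
# placeholder paths to your own installation.
cmd = [
    "java", "-cp", "/pathTo/stanford-parser.jar", "-mx100m",
    "edu.stanford.nlp.process.DocumentPreprocessor",
    "/pathTo/fileToParse",
]

with open("/pathTo/resultFile", "w") as out, open(os.devnull, "w") as devnull:
    # stdout goes to the result file; stderr is discarded,
    # just like "> resultFile 2>/dev/null" in the PHP example
    subprocess.call(cmd, stdout=out, stderr=devnull)

with open("/pathTo/resultFile") as f:
    result = f.read()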

It might not be the exact route you were hoping for, but hopefully it'll give you some inspiration. Best of luck.

Answer by Rohith

There is a Python interface for the Stanford parser:

http://projects.csail.mit.edu/spatial/Stanford_Parser

Answer by alvas

Deprecated Answer

The answer below is deprecated; please use the solution at https://stackoverflow.com/a/51981566/610569 for NLTK v3.3 and above.

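For reference, a minimal sketch of the CoreNLP-server route used by NLTK v3.3+ (not part of the original answer; see the linked solution for details). It assumes you have downloaded Stanford CoreNLP and already started a server locally on port 9000:

# Start the server first from the CoreNLP directory, for example:
#   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
from nltk.parse.corenlp import CoreNLPParser

parser = CoreNLPParser(url='http://localhost:9000')
tree = next(parser.raw_parse("Is it possible to use Stanford Parser in NLTK?"))
tree.pretty_print()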



Edited

As of the current Stanford parser (2015-04-20), the default output of lexparser.sh has changed, so the script below will not work.

But this answer is kept for legacy's sake; it will still work with http://nlp.stanford.edu/software/stanford-parser-2012-11-12.zip though.



Original Answer

I suggest you don't mess with Jython or JPype. Let Python do Python stuff and let Java do Java stuff; get the Stanford Parser output through the console.

After you've installed the Stanford Parser in your home directory ~/, just use this Python recipe to get the flat bracketed parse:

import os

sentence = "this is a foo bar i want to parse."

# write the sentence to a temporary file and run lexparser.sh on it
os.popen("echo '"+sentence+"' > ~/stanfordtemp.txt")
parser_out = os.popen("~/stanford-parser-2012-11-12/lexparser.sh ~/stanfordtemp.txt").readlines()

# keep only the lines that start with "(", i.e. the bracketed parse
bracketed_parse = " ".join( [i.strip() for i in parser_out if i.strip() and i.strip()[0] == "("] )
print bracketed_parse

Answer by Sadik

You can use the Stanford Parser's output to create a Tree in NLTK (nltk.tree.Tree).

Assuming the Stanford parser gives you a file in which there is exactly one parse tree for every sentence, this example works, though it might not look very Pythonic:

import sys
import nltk

# the input file contains exactly one parse tree per sentence,
# with trees separated by blank lines
f = open(sys.argv[1] + ".output" + ".30" + ".stp", "r")
parse_trees_text = []
tree = ""
for line in f:
    if line.isspace():
        parse_trees_text.append(tree)
        tree = ""
    elif "(. ...))" in line:
        tree = tree + ')'
        parse_trees_text.append(tree)
        tree = ""
    else:
        tree = tree + line

parse_trees = []
for t in parse_trees_text:
    tree = nltk.Tree(t)  # in NLTK 3 this would be nltk.Tree.fromstring(t)
    tree.__delitem__(len(tree) - 1)  # delete "(. .))" from the tree (you don't need that)
    s = traverse(tree)  # traverse() is the answerer's own helper and is not shown here
    parse_trees.append(tree)
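
As a side note (not part of the original answer): if you already have a single bracketed parse as a string, NLTK 3 can turn it into a tree directly with Tree.fromstring; the bracketed string below is a toy example, not real Stanford Parser output:

from nltk.tree import Tree

# a toy bracketed parse in the style the Stanford parser prints
bracketed = "(ROOT (S (NP (DT This)) (VP (VBZ is) (NP (DT a) (NN test))) (. .)))"

tree = Tree.fromstring(bracketed)
print(tree.label())   # ROOT
tree.pretty_print()   # draws the tree as ASCII art in the console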

Answer by silverasm

The Stanford CoreNLP software page has a list of Python wrappers:

http://nlp.stanford.edu/software/corenlp.shtml#Extensions

Answer by danger89

Note that this answer applies to NLTK v 3.0, and not to more recent versions.

Sure, try the following in Python:

import os
from nltk.parse import stanford
os.environ['STANFORD_PARSER'] = '/path/to/stanford/jars'
os.environ['STANFORD_MODELS'] = '/path/to/stanford/jars'

parser = stanford.StanfordParser(model_path="/location/of/the/englishPCFG.ser.gz")
sentences = parser.raw_parse_sents(("Hello, My name is Melroy.", "What is your name?"))
print sentences

# GUI
for line in sentences:
    for sentence in line:
        sentence.draw()

Output:

[Tree('ROOT', [Tree('S', [Tree('INTJ', [Tree('UH', ['Hello'])]), Tree(',', [',']), Tree('NP', [Tree('PRP$', ['My']), Tree('NN', ['name'])]), Tree('VP', [Tree('VBZ', ['is']), Tree('ADJP', [Tree('JJ', ['Melroy'])])]), Tree('.', ['.'])])]), Tree('ROOT', [Tree('SBARQ', [Tree('WHNP', [Tree('WP', ['What'])]), Tree('SQ', [Tree('VBZ', ['is']), Tree('NP', [Tree('PRP$', ['your']), Tree('NN', ['name'])])]), Tree('.', ['?'])])])]

Note 1: In this example both the parser and model jars are in the same folder.

Note 2:

  • File name of stanford parser is: stanford-parser.jar
  • File name of stanford models is: stanford-parser-x.x.x-models.jar

Note 3: The englishPCFG.ser.gz file can be found inside the models jar (/edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz). Please use an archive manager to 'unzip' the models jar file.

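If you prefer a scripted alternative to a GUI archive manager (this is not from the original answer), a jar is just a zip archive, so Python's zipfile module can pull the model out; the paths below are placeholders:

import zipfile

models_jar = "/path/to/jars/stanford-parser-3.x.x-models.jar"  # placeholder path

with zipfile.ZipFile(models_jar) as jar:
    # extract englishPCFG.ser.gz into /path/to/models, keeping the internal folder structure
    jar.extract("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz",
                path="/path/to/models")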

Note 4: Be sure you are using Java JRE (Runtime Environment) 1.8, also known as Oracle JDK 8. Otherwise you will get: Unsupported major.minor version 52.0.

Installation

  1. Download NLTK v3 from https://github.com/nltk/nltk and install NLTK:

    sudo python setup.py install

  2. You can use the NLTK downloader to get Stanford Parser, using Python:

    import nltk
    nltk.download()
    
  3. Try my example! (don't forget to change the jar paths and to change the model path to the ser.gz location)

OR:

  1. Download and install NLTK v3, same as above.

  2. Download the latest version (the current version's filename is stanford-parser-full-2015-01-29.zip) from: http://nlp.stanford.edu/software/lex-parser.shtml#Download

  3. Extract the stanford-parser-full-20xx-xx-xx.zip.

  4. Create a new folder ('jars' in my example). Place the extracted files into this jar folder: stanford-parser-3.x.x-models.jar and stanford-parser.jar.

    As shown above you can use the environment variables (STANFORD_PARSER & STANFORD_MODELS) to point to this 'jars' folder. I'm using Linux, so if you use Windows please use something like: C://folder//jars.

  5. Open the stanford-parser-3.x.x-models.jar using an Archive manager (7zip).

  6. Browse inside the jar file to edu/stanford/nlp/models/lexparser and extract the file called 'englishPCFG.ser.gz'. Remember the location where you extracted this ser.gz file.

  7. When creating a StanfordParser instance, you can provide the model path as parameter. This is the complete path to the model, in our case /location/of/englishPCFG.ser.gz.

  8. Try my example! (don't forget to change the jar paths and to change the model path to the ser.gz location)

Answer by Avery Andrews

Note that this answer applies to NLTK v 3.0, and not to more recent versions.

Here is an adaptation of danger89's code that works with nltk 3.0.0 on windoze, and presumably on other platforms as well; adjust the directory names as appropriate for your setup:

import os
from nltk.parse import stanford
os.environ['STANFORD_PARSER'] = 'd:/stanford-parser'
os.environ['STANFORD_MODELS'] = 'd:/stanford-parser'
os.environ['JAVAHOME'] = 'c:/Program Files/java/jre7/bin'

parser = stanford.StanfordParser(model_path="d:/stanford-grammars/englishPCFG.ser.gz")
sentences = parser.raw_parse_sents(("Hello, My name is Melroy.", "What is your name?"))
print sentences

Note that the parsing command has changed (see the source code at www.nltk.org/_modules/nltk/parse/stanford.html), and that you need to define the JAVAHOME variable. I tried to get it to read the grammar file in situ in the jar, but have so far failed to do that.

Answer by redreamality

Note that this answer applies to NLTK v 3.0, and not to more recent versions.

Here is the Windows version of alvas's answer:

import os
from nltk.tree import ParentedTree

sentences = ('. '.join(['this is sentence one without a period','this is another foo bar sentence '])+'.').encode('ascii',errors = 'ignore')
catpath = r"YOUR CURRENT FILE PATH"

f = open('stanfordtemp.txt','w')
f.write(sentences)
f.close()

# run lexparser.bat on the temporary file and collect its output lines
parse_out = os.popen(catpath+r"\nlp_tools\stanford-parser-2010-08-20\lexparser.bat "+catpath+r"\stanfordtemp.txt").readlines()

# keep the bracketed parse lines, then split them back into one parse per sentence
bracketed_parse = " ".join( [i.strip() for i in parse_out if i.strip() if i.strip()[0] == "("] )
bracketed_parse = "\n(ROOT".join(bracketed_parse.split(" (ROOT")).split('\n')
aa = map(lambda x: ParentedTree.fromstring(x), bracketed_parse)

NOTES:

  • In lexparser.bat you need to change all the paths into absolute paths to avoid Java errors such as "class not found"

  • I strongly recommend you apply this method under Windows, since I tried several answers on the page and all the methods that communicate between Python and Java failed.

  • I would like to hear from you if you succeed on Windows, and hope you can tell me how you overcame all these problems.

  • Search for a Python wrapper for Stanford CoreNLP to get the Python version.

Answer by Ted Petrou

I am on a Windows machine, and you can simply run the parser normally as you would from the command line, but from a different directory, so you don't need to edit the lexparser.bat file. Just put in the full path.

import os

cmd = r'java -cp \Documents\stanford_nlp\stanford-parser-full-2015-01-30 edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat "typedDependencies" \Documents\stanford_nlp\stanford-parser-full-2015-01-30\stanford-parser-3.5.1-models\edu\stanford\nlp\models\lexparser\englishFactored.ser.gz stanfordtemp.txt'
parse_out = os.popen(cmd).readlines()

The tricky part for me was realizing how to run a Java program from a different path. There must be a better way, but this works.

Answer by SYK

Note that this answer applies to NLTK v 3.0, and not to more recent versions.

A slight update (or simply an alternative) to danger89's comprehensive answer on using the Stanford Parser in NLTK and Python.

With stanford-parser-full-2015-04-20, JRE 1.8 and nltk 3.0.4 (python 2.7.6), it appears that you no longer need to extract englishPCFG.ser.gz from stanford-parser-x.x.x-models.jar or set up any os.environ variables:

from nltk.parse.stanford import StanfordParser

english_parser = StanfordParser('path/stanford-parser.jar', 'path/stanford-parser-3.5.2-models.jar')

s = "The real voyage of discovery consists not in seeking new landscapes, but in having new eyes."

sentences = english_parser.raw_parse_sents((s,))
print sentences #only print <listiterator object> for this version

#draw the tree
for line in sentences:
    for sentence in line:
        sentence.draw()