Stanford nlp for python
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/32879532/
Asked by 90abyss
All I want to do is find the sentiment (positive/negative/neutral) of any given string. On researching I came across Stanford NLP. But sadly its in Java. Any ideas on how can I make it work for python?
Accepted answer by sds
Use py-corenlp
Download Stanford CoreNLP
The latest version at this time (2020-05-25) is 4.0.0:
wget https://nlp.stanford.edu/software/stanford-corenlp-4.0.0.zip https://nlp.stanford.edu/software/stanford-corenlp-4.0.0-models-english.jar
If you do not have wget, you probably have curl:
curl https://nlp.stanford.edu/software/stanford-corenlp-4.0.0.zip -O https://nlp.stanford.edu/software/stanford-corenlp-4.0.0-models-english.jar -O
If all else fails, use the browser ;-)
Install the package
unzip stanford-corenlp-4.0.0.zip
mv stanford-corenlp-4.0.0-models-english.jar stanford-corenlp-4.0.0
Start the server
cd stanford-corenlp-4.0.0
java -mx5g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -timeout 10000
Notes:
- timeout is in milliseconds; I set it to 10 sec above. You should increase it if you pass huge blobs to the server.
- There are more options; you can list them with --help.
- -mx5g should allocate enough memory, but YMMV and you may need to modify the option if your box is underpowered.
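If you'd rather not pull in a client package at all, the running server can also be queried directly over HTTP. The sketch below is a standard-library-only illustration (the endpoint and the JSON-encoded "properties" query parameter follow the CoreNLP server's documented HTTP API); it is an added example, not part of the original answer.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def build_url(host="http://localhost:9000", annotators="sentiment", timeout_ms=10000):
    """The CoreNLP server takes its settings as a JSON 'properties' query parameter."""
    props = {"annotators": annotators, "outputFormat": "json", "timeout": timeout_ms}
    return host + "/?" + urlencode({"properties": json.dumps(props)})

def annotate(text, host="http://localhost:9000"):
    """POST raw UTF-8 text to a running CoreNLP server and parse the JSON reply."""
    with urlopen(build_url(host), data=text.encode("utf-8")) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server from the previous step running, annotate("I love you.") should return the same JSON structure that the client library below produces.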
Install the python package
pip install pycorenlp
(See also the official list).
Use it
from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')
res = nlp.annotate("I love you. I hate him. You are nice. He is dumb",
                   properties={
                       'annotators': 'sentiment',
                       'outputFormat': 'json',
                       'timeout': 1000,
                   })
for s in res["sentences"]:
    print("%d: '%s': %s %s" % (
        s["index"],
        " ".join([t["word"] for t in s["tokens"]]),
        s["sentimentValue"], s["sentiment"]))
and you will get:
0: 'I love you .': 3 Positive
1: 'I hate him .': 1 Negative
2: 'You are nice .': 3 Positive
3: 'He is dumb': 1 Negative
Notes
- You pass the whole text to the server and it splits it into sentences. It also splits sentences into tokens.
- The sentiment is ascribed to each sentence, not the whole text. The mean sentimentValue across sentences can be used to estimate the sentiment of the whole text.
- The average sentiment of a sentence is between Neutral (2) and Negative (1); the range is from VeryNegative (0) to VeryPositive (4), which appear to be quite rare.
- You can stop the server either by typing Ctrl-C at the terminal you started it from or using the shell command kill $(lsof -ti tcp:9000). 9000 is the default port; you can change it using the -port option when starting the server.
- Increase timeout (in milliseconds) in server or client if you get timeout errors.
- sentiment is just one annotator; there are many more, and you can request several, separating them by comma: 'annotators': 'sentiment,lemma'.
- Beware that the sentiment model is somewhat idiosyncratic (e.g., the result is different depending on whether you mention David or Bill).
PS. I cannot believe that I added a 9th answer, but, I guess, I had to, since none of the existing answers helped me (some of the 8 previous answers have now been deleted, some others have been converted to comments).
Answered by cutteeth
TextBlob is a great package for sentiment analysis written in Python. You can find the docs here. Sentiment analysis of any given sentence is carried out by inspecting words and their corresponding emotional score (sentiment). You can start with
$ pip install -U textblob
$ python -m textblob.download_corpora
The first pip install command will give you the latest version of textblob installed in your (virtualenv) system, since passing -U upgrades the package to its latest available version. The next command will download all the required data, the corpus.
Answered by Hao Lyu
I also faced a similar situation. Most of my projects are in Python and the sentiment part is in Java. Luckily it's quite easy to learn how to use the Stanford CoreNLP jar.
Here is one of my scripts and you can download jars and run it.
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations.SentimentAnnotatedTree;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.ArrayCoreMap;
import edu.stanford.nlp.util.CoreMap;

public class Simple_NLP {

    static StanfordCoreNLP pipeline;

    public static void init() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
        pipeline = new StanfordCoreNLP(props);
    }

    public static String findSentiment(String tweet) {
        String SentiReturn = "";
        String[] SentiClass = {"very negative", "negative", "neutral", "positive", "very positive"};

        // Sentiment is an integer, ranging from 0 to 4.
        // 0 is very negative, 1 negative, 2 neutral, 3 positive and 4 very positive.
        int sentiment = 2;

        if (tweet != null && tweet.length() > 0) {
            Annotation annotation = pipeline.process(tweet);
            List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
            if (sentences != null && sentences.size() > 0) {
                ArrayCoreMap sentence = (ArrayCoreMap) sentences.get(0);
                Tree tree = sentence.get(SentimentAnnotatedTree.class);
                sentiment = RNNCoreAnnotations.getPredictedClass(tree);
                SentiReturn = SentiClass[sentiment];
            }
        }
        return SentiReturn;
    }
}
Answered by Arnaud
I am facing the same problem: maybe a solution with stanford_corenlp_py that uses Py4j, as pointed out by @roopalgarg.
stanford_corenlp_py
This repo provides a Python interface for calling the "sentiment" and "entitymentions" annotators of Stanford's CoreNLP Java package, current as of v. 3.5.1. It uses py4j to interact with the JVM; as such, in order to run a script like scripts/runGateway.py, you must first compile and run the Java classes creating the JVM gateway.
Answered by adam shamsudeen
Use stanfordcore-nlp python library
stanford-corenlp is a really good wrapper on top of the stanfordcore-nlp to use it in python.
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip
Usage
# Simple usage
from stanfordcorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP('/Users/name/stanford-corenlp-full-2018-10-05')
sentence = 'Guangdong University of Foreign Studies is located in Guangzhou.'
print('Tokenize:', nlp.word_tokenize(sentence))
print('Part of Speech:', nlp.pos_tag(sentence))
print('Named Entities:', nlp.ner(sentence))
print('Constituency Parsing:', nlp.parse(sentence))
print('Dependency Parsing:', nlp.dependency_parse(sentence))
nlp.close() # Do not forget to close! The backend server will consume a lot of memory.
Answered by Géo Jolly
I would suggest using the TextBlob library. A sample implementation goes like this:
from textblob import TextBlob

def sentiment(message):
    # create TextBlob object of passed tweet text
    analysis = TextBlob(message)
    # set sentiment
    return analysis.sentiment.polarity
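Note that analysis.sentiment.polarity is a float in [-1.0, 1.0], not a discrete label. To get the positive/negative/neutral split the question asks for, you could threshold it; the 0.1 cutoff below is an arbitrary assumption, not anything TextBlob prescribes:

```python
def label(polarity, cutoff=0.1):
    """Map a TextBlob polarity score to a coarse three-way label."""
    if polarity > cutoff:
        return "positive"
    if polarity < -cutoff:
        return "negative"
    return "neutral"

print(label(0.8))    # positive
print(label(-0.35))  # negative
print(label(0.0))    # neutral
```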
Answered by zwlayer
There is very new progress on this issue:

Now you can use the stanfordnlp package inside python:
From the README:
>>> import stanfordnlp
>>> stanfordnlp.download('en') # This downloads the English models for the neural pipeline
>>> nlp = stanfordnlp.Pipeline() # This sets up a default neural pipeline in English
>>> doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
>>> doc.sentences[0].print_dependencies()
Answered by Aleksander Pohl
Native Python implementation of NLP tools from Stanford
Recently Stanford has released a new Python package implementing neural network (NN) based algorithms for the most important NLP tasks:
- tokenization
- multi-word token (MWT) expansion
- lemmatization
- part-of-speech (POS) and morphological features tagging
- dependency parsing
It is implemented in Python and uses PyTorch as the NN library. The package contains accurate models for more than 50 languages.
To install you can use PIP:
pip install stanfordnlp
To perform basic tasks you can use native Python interface with many NLP algorithms:
import stanfordnlp
stanfordnlp.download('en') # This downloads the English models for the neural pipeline
nlp = stanfordnlp.Pipeline() # This sets up a default neural pipeline in English
doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
doc.sentences[0].print_dependencies()
EDIT:
So far, the library does not support sentiment analysis, yet I'm not deleting the answer, since it directly answers the "Stanford nlp for python" part of the question.
Answered by Syauqi Haris
Right now they have STANZA.
https://stanfordnlp.github.io/stanza/
Release History: Note that prior to version 1.0.0, the Stanza library was named "StanfordNLP". To install historical versions prior to v1.0.0, you'll need to run pip install stanfordnlp.
So, it confirms that Stanza is the full python version of stanford NLP.
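Unlike the older stanfordnlp package, Stanza does ship a sentiment processor (a classifier that labels each sentence 0/1/2). The helper below is a sketch based on Stanza's documented API; it imports stanza lazily, since running the pipeline requires pip install stanza plus a one-time model download via stanza.download.

```python
# 0/1/2 is the label scheme used by Stanza's English sentiment model.
SENTIMENT_LABELS = {0: "negative", 1: "neutral", 2: "positive"}

def stanza_sentiment(text, lang="en"):
    """Return (sentence_text, label) pairs using Stanza's sentiment processor."""
    import stanza  # lazy import: requires `pip install stanza` and stanza.download(lang)
    nlp = stanza.Pipeline(lang=lang, processors="tokenize,sentiment")
    doc = nlp(text)
    return [(s.text, SENTIMENT_LABELS[s.sentiment]) for s in doc.sentences]
```

For example, stanza_sentiment("I love you. I hate him.") should yield one label per sentence, answering the original question natively in Python.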