Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/32879532/

Date: 2020-08-19 12:25:39  Source: igfitidea

Stanford NLP for Python

Tags: python, stanford-nlp, sentiment-analysis

Asked by 90abyss

All I want to do is find the sentiment (positive/negative/neutral) of any given string. While researching I came across Stanford NLP, but sadly it's in Java. Any ideas on how I can make it work for Python?


Accepted answer by sds

Use py-corenlp


Download Stanford CoreNLP


The latest version at this time (2020-05-25) is 4.0.0:


wget https://nlp.stanford.edu/software/stanford-corenlp-4.0.0.zip https://nlp.stanford.edu/software/stanford-corenlp-4.0.0-models-english.jar

If you do not have wget, you probably have curl:


curl https://nlp.stanford.edu/software/stanford-corenlp-4.0.0.zip -O https://nlp.stanford.edu/software/stanford-corenlp-4.0.0-models-english.jar -O

If all else fails, use the browser ;-)


Install the package


unzip stanford-corenlp-4.0.0.zip
mv stanford-corenlp-4.0.0-models-english.jar stanford-corenlp-4.0.0

Start the server


cd stanford-corenlp-4.0.0
java -mx5g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -timeout 10000

Notes:


  1. timeout is in milliseconds; I set it to 10 seconds above. Increase it if you pass huge blobs to the server.
  2. There are more options; you can list them with --help.
  3. -mx5g should allocate enough memory, but YMMV; you may need to modify the option if your box is underpowered.

Install the python package


pip install pycorenlp

(See also the official list).


Use it


from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')
res = nlp.annotate("I love you. I hate him. You are nice. He is dumb",
                   properties={
                       'annotators': 'sentiment',
                       'outputFormat': 'json',
                       'timeout': 1000,
                   })
for s in res["sentences"]:
    print("%d: '%s': %s %s" % (
        s["index"],
        " ".join([t["word"] for t in s["tokens"]]),
        s["sentimentValue"], s["sentiment"]))

and you will get:


0: 'I love you .': 3 Positive
1: 'I hate him .': 1 Negative
2: 'You are nice .': 3 Positive
3: 'He is dumb': 1 Negative
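pycorenlp is a thin wrapper around the server's HTTP API. If you would rather avoid the extra dependency, roughly the same request can be made with just the standard library. This is only a sketch (the `build_url` and `annotate` helper names are my own), assuming the server above is running on port 9000:

```python
import json
import urllib.parse
import urllib.request

def build_url(host, properties):
    # The CoreNLP server takes its configuration as a URL-encoded
    # JSON "properties" query parameter.
    return host + "/?properties=" + urllib.parse.quote(json.dumps(properties))

def annotate(text, host="http://localhost:9000"):
    # POST the raw text to the server and decode the JSON reply.
    url = build_url(host, {"annotators": "sentiment", "outputFormat": "json"})
    req = urllib.request.Request(url, data=text.encode("utf-8"))
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))

# With the server running:
#   res = annotate("I love you. I hate him.")
#   for s in res["sentences"]:
#       print(s["index"], s["sentimentValue"], s["sentiment"])
```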

Notes


  1. You pass the whole text to the server and it splits it into sentences; it also splits sentences into tokens.
  2. The sentiment is ascribed to each sentence, not to the whole text. The mean sentimentValue across sentences can be used to estimate the sentiment of the whole text.
  3. The average sentiment of a sentence is between Neutral (2) and Negative (1); the range is from VeryNegative (0) to VeryPositive (4), which appear to be quite rare.
  4. You can stop the server either by typing Ctrl-C at the terminal you started it from or by using the shell command kill $(lsof -ti tcp:9000). 9000 is the default port; you can change it with the -port option when starting the server.
  5. Increase timeout (in milliseconds) in the server or the client if you get timeout errors.
  6. sentiment is just one annotator; there are many more, and you can request several, separating them by commas: 'annotators': 'sentiment,lemma'.
  7. Beware that the sentiment model is somewhat idiosyncratic (e.g., the result differs depending on whether you mention David or Bill).

PS. I cannot believe that I added a 9th answer, but, I guess, I had to, since none of the existing answers helped me (some of the 8 previous answers have since been deleted; others have been converted to comments).


Answer by cutteeth

TextBlob is a great package for sentiment analysis written in Python. You can find the docs here. Sentiment analysis of any given sentence is carried out by inspecting words and their corresponding emotional scores (sentiment). You can start with:


$ pip install -U textblob
$ python -m textblob.download_corpora

The first pip command installs the latest version of TextBlob in your (virtualenv) system; passing -U upgrades the package to its latest available version. The second command downloads all the required data, i.e. the corpora.


Answer by Hao Lyu

I also faced a similar situation. Most of my projects are in Python, and the sentiment part is Java. Luckily it's quite easy to learn how to use the Stanford CoreNLP jar.


Here is one of my scripts; you can download the jars and run it.


import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations.SentimentAnnotatedTree;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.ArrayCoreMap;
import edu.stanford.nlp.util.CoreMap;

public class Simple_NLP {
static StanfordCoreNLP pipeline;

    public static void init() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
        pipeline = new StanfordCoreNLP(props);
    }

    public static String findSentiment(String tweet) {
        String SentiReturn = "";
        String[] SentiClass ={"very negative", "negative", "neutral", "positive", "very positive"};

        //Sentiment is an integer, ranging from 0 to 4. 
        //0 is very negative, 1 negative, 2 neutral, 3 positive and 4 very positive.
        int sentiment = 2;

        if (tweet != null && tweet.length() > 0) {
            Annotation annotation = pipeline.process(tweet);

            List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
            if (sentences != null && sentences.size() > 0) {

                ArrayCoreMap sentence = (ArrayCoreMap) sentences.get(0);                
                Tree tree = sentence.get(SentimentAnnotatedTree.class);  
                sentiment = RNNCoreAnnotations.getPredictedClass(tree);             
                SentiReturn = SentiClass[sentiment];
            }
        }
        return SentiReturn;
    }

}
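The 0-4 class mapping used in the Java snippet can be mirrored on the Python side when you consume the server's output. A small sketch (the helper name is mine):

```python
# Sentiment classes in CoreNLP's 0-4 scheme, mirroring the Java SentiClass array.
SENTI_CLASS = ["very negative", "negative", "neutral", "positive", "very positive"]

def sentiment_label(value, default="neutral"):
    # Map an integer prediction (0..4) to its human-readable label;
    # fall back to "neutral", like the Java code's initial value of 2.
    if 0 <= value < len(SENTI_CLASS):
        return SENTI_CLASS[value]
    return default

print(sentiment_label(3))  # positive
```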

Answer by Arnaud

I am facing the same problem: maybe a solution is stanford_corenlp_py, which uses Py4j, as pointed out by @roopalgarg.


stanford_corenlp_py

This repo provides a Python interface for calling the "sentiment" and "entitymentions" annotators of Stanford's CoreNLP Java package, current as of v. 3.5.1. It uses py4j to interact with the JVM; as such, in order to run a script like scripts/runGateway.py, you must first compile and run the Java classes creating the JVM gateway.


Answer by adam shamsudeen

Use the stanfordcorenlp Python library


stanfordcorenlp (installed with pip install stanfordcorenlp) is a really good wrapper on top of Stanford CoreNLP that lets you use it in Python.


wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip


Usage


# Simple usage
from stanfordcorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('/Users/name/stanford-corenlp-full-2018-10-05')

sentence = 'Guangdong University of Foreign Studies is located in Guangzhou.'
print('Tokenize:', nlp.word_tokenize(sentence))
print('Part of Speech:', nlp.pos_tag(sentence))
print('Named Entities:', nlp.ner(sentence))
print('Constituency Parsing:', nlp.parse(sentence))
print('Dependency Parsing:', nlp.dependency_parse(sentence))

nlp.close() # Do not forget to close! The backend server will consume a lot of memory.

More info


Answer by Géo Jolly

I would suggest using the TextBlob library. A sample implementation goes like this:


from textblob import TextBlob

def sentiment(message):
    # Create a TextBlob object from the passed message text
    analysis = TextBlob(message)
    # Return the polarity score, a float in [-1.0, 1.0]
    return analysis.sentiment.polarity
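TextBlob's polarity is a float in [-1.0, 1.0]; to get the positive/negative/neutral answer the question asks for, you can threshold it. A minimal sketch; the ±0.1 cutoff is my own arbitrary choice, not part of TextBlob:

```python
def classify(polarity, threshold=0.1):
    # Bucket a TextBlob polarity score (-1.0 .. 1.0) into the three
    # classes the question asks about; the threshold is arbitrary.
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"

print(classify(0.8))   # positive
print(classify(-0.5))  # negative
print(classify(0.0))   # neutral
```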

Answer by zwlayer

There is very recent progress on this issue:


Now you can use the stanfordnlp package inside Python:


From the README:


>>> import stanfordnlp
>>> stanfordnlp.download('en')   # This downloads the English models for the neural pipeline
>>> nlp = stanfordnlp.Pipeline() # This sets up a default neural pipeline in English
>>> doc = nlp("Barack Obama was born in Hawaii.  He was elected president in 2008.")
>>> doc.sentences[0].print_dependencies()

Answer by Aleksander Pohl

Native Python implementation of NLP tools from Stanford


Recently Stanford has released a new Python package implementing neural network (NN) based algorithms for the most important NLP tasks:


  • tokenization
  • multi-word token (MWT) expansion
  • lemmatization
  • part-of-speech (POS) and morphological features tagging
  • dependency parsing

It is implemented in Python and uses PyTorch as the NN library. The package contains accurate models for more than 50 languages.


To install it, you can use pip:


pip install stanfordnlp

To perform basic tasks, you can use the native Python interface with many NLP algorithms:


import stanfordnlp

stanfordnlp.download('en')   # This downloads the English models for the neural pipeline
nlp = stanfordnlp.Pipeline() # This sets up a default neural pipeline in English
doc = nlp("Barack Obama was born in Hawaii.  He was elected president in 2008.")
doc.sentences[0].print_dependencies()

EDIT:


So far, the library does not support sentiment analysis, yet I'm not deleting the answer, since it directly answers the "Stanford nlp for python" part of the question.


Answer by Syauqi Haris

Right now they have STANZA.


https://stanfordnlp.github.io/stanza/


Release History: Note that prior to version 1.0.0, the Stanza library was named "StanfordNLP". To install historical versions prior to v1.0.0, you'll need to run pip install stanfordnlp.


So, this confirms that Stanza is the full Python version of Stanford NLP.
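And unlike the stanfordnlp package above, Stanza does ship a sentiment processor for English. A hedged sketch, assuming Stanza assigns each sentence an integer in its 3-class scheme (0 = negative, 1 = neutral, 2 = positive; check the Stanza docs before relying on this):

```python
# Stanza's English sentiment processor labels sentences 0/1/2,
# unlike CoreNLP's 5-class scheme in the accepted answer.
LABELS = {0: "negative", 1: "neutral", 2: "positive"}

def label(sentiment_id):
    # Map Stanza's integer sentiment to a readable class name.
    return LABELS.get(sentiment_id, "unknown")

# With Stanza installed (pip install stanza):
#   import stanza
#   stanza.download("en")
#   nlp = stanza.Pipeline(lang="en", processors="tokenize,sentiment")
#   doc = nlp("I love you. I hate him.")
#   for sent in doc.sentences:
#       print(sent.text, "->", label(sent.sentiment))
```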
