Notice: This content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/38263039/

Time: 2020-08-19 20:34:58  Source: igfitidea

sentiwordnet scoring with python

python nltk senti-wordnet

Asked by pechdara

I have been working on research related to Twitter sentiment analysis. I have a little knowledge of how to code in Python. Since my research involves coding, I have looked into how to analyze sentiment using Python, and this is how far I have come: 1. tokenization of tweets, 2. POS tagging of the tokens. What remains is calculating the positive and negative sentiment scores, which is the issue I am facing now and need your help with.

Below is my code example:

import nltk

sentence = "Iphone6 camera is awesome for low light "
token = nltk.word_tokenize(sentence)   # split the tweet into word tokens
tagged = nltk.pos_tag(token)           # list of (word, Penn Treebank tag) tuples

So, could anybody show me or guide me through an example of using SentiWordNet in Python to calculate the positive and negative scores of tweets that have already been POS tagged? Thanks in advance.

Answer by Saravana Kumar

It's a little unclear what exactly your question is. Do you need a guide to using SentiWordNet? If so, check out this link:

http://www.nltk.org/howto/sentiwordnet.html

Since you've already tokenized and POS tagged the words, all you need to do now is use this syntax:

swn.senti_synset('breakdown.n.03')

Breaking down the argument:

  • 'breakdown' = word you need scores for.
  • 'n' = part of speech
  • '03' = sense number (01 is the most common usage; higher numbers indicate less common usages)

So, for each (word, tag) tuple in your tagged list, build a string like the one above and pass it to the senti_synset function to get the positive, negative and objective scores for that word.
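A minimal sketch of that string-building step is below. The helpers `synset_name` and `senti_scores` are hypothetical names used only for illustration. The lookup assumes NLTK is installed with its `wordnet` and `sentiwordnet` data downloaded, and `senti_synset` is likely to raise an error if no synset with the generated name exists.

```python
# Hypothetical helpers: build the name string that swn.senti_synset() expects
VALID_POS = {"n", "v", "a", "s", "r"}


def synset_name(word, pos, sense=1):
    """Return e.g. 'breakdown.n.03' for ('breakdown', 'n', 3)."""
    if pos not in VALID_POS:
        raise ValueError(f"unknown WordNet POS: {pos!r}")
    # WordNet sense numbers are zero-padded to two digits
    return f"{word}.{pos}.{sense:02d}"


def senti_scores(word, pos, sense=1):
    """Return (pos, neg, obj) scores; needs NLTK plus the wordnet/sentiwordnet data."""
    # Imported lazily so synset_name() also works without NLTK installed
    from nltk.corpus import sentiwordnet as swn
    s = swn.senti_synset(synset_name(word, pos, sense))
    return s.pos_score(), s.neg_score(), s.obj_score()


print(synset_name("breakdown", "n", 3))  # breakdown.n.03
```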

Caveat: the POS tagger gives you Penn Treebank tags (e.g. NN, JJ), which differ from the tags senti_synset accepts. Convert them to the following synset notation:

n - NOUN 
v - VERB 
a - ADJECTIVE 
s - ADJECTIVE SATELLITE 
r - ADVERB 

(Credit to "Using Sentiwordnet 3.0" for the above notation.)
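As a rough sketch of that conversion, the snippet below maps Penn Treebank tags to synset letters by their first character and builds one name per tagged word, always using sense 01. The first-letter mapping, the helper names and the example tags are assumptions for illustration; your tagger's actual output may differ. Because the words are not lemmatized and Penn tags do not distinguish head adjectives from satellites (`s`), some generated names (for example `is.v.01`) may not resolve, and a non-dictionary token such as `iphone6.n.01` will not be in WordNet at all.

```python
# Assumed mapping: first letter of a Penn Treebank tag -> WordNet synset POS letter
PENN_TO_WN = {"N": "n", "V": "v", "J": "a", "R": "r"}


def penn_to_synset_pos(tag):
    """Map a tag such as 'NN' or 'VBZ' to 'n', 'v', 'a' or 'r'; None for DT, IN, CD, ..."""
    return PENN_TO_WN.get(tag[:1])


def to_synset_names(tagged, sense=1):
    """Turn (word, Penn tag) pairs into 'word.pos.NN' names, skipping unmapped tags."""
    names = []
    for word, tag in tagged:
        wn_pos = penn_to_synset_pos(tag)
        if wn_pos is not None:
            names.append(f"{word.lower()}.{wn_pos}.{sense:02d}")
    return names


# Shaped like nltk.pos_tag() output for the question's sentence (illustrative tags)
tagged = [("Iphone6", "NN"), ("camera", "NN"), ("is", "VBZ"), ("awesome", "JJ"),
          ("for", "IN"), ("low", "JJ"), ("light", "NN")]
print(to_synset_names(tagged))
# ['iphone6.n.01', 'camera.n.01', 'is.v.01', 'awesome.a.01', 'low.a.01', 'light.n.01']
```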

That being said, it is generally not a great idea to use SentiWordNet for Twitter sentiment analysis, and here's why:

Tweets are full of typos and non-dictionary words that SentiWordNet often does not recognize. To counter this problem, either lemmatize/stem your tweets before you POS tag them, or use a machine learning classifier such as Naive Bayes, for which NLTK has built-in functions. For the classifier's training dataset, either annotate a dataset manually or use a pre-labelled set such as the Sentiment140 corpus.
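A minimal sketch of the Naive Bayes route with NLTK's built-in `NaiveBayesClassifier` is below. The six labelled sentences are made up purely for illustration and stand in for a real annotated set such as Sentiment140; the `features` helper is a hypothetical bag-of-words extractor, not part of NLTK.

```python
from nltk.classify import NaiveBayesClassifier


def features(text):
    """Hypothetical bag-of-words extractor: one 'contains(word)' entry per token."""
    return {f"contains({w})": True for w in text.lower().split()}


# Tiny invented sample standing in for a real labelled corpus such as Sentiment140
train_texts = [
    ("awesome camera for low light", "pos"),
    ("love this phone", "pos"),
    ("great battery life", "pos"),
    ("terrible camera in low light", "neg"),
    ("hate this phone", "neg"),
    ("awful battery life", "neg"),
]
train_set = [(features(text), label) for text, label in train_texts]

classifier = NaiveBayesClassifier.train(train_set)
print(classifier.classify(features("awesome camera")))  # pos
print(classifier.classify(features("terrible phone")))  # neg
```

With real data you would train on many more tweets and measure performance on a held-out split, for example with `nltk.classify.accuracy`.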

If you are not interested in performing the sentiment analysis yourself but just need a sentiment label for a given tweet, you can use the Sentiment140 API for this purpose.

Answer by shantanu pathak

@Saravana Kumar has a wonderful answer.

I am writing this to add detailed code to it. I referred to this link: https://nlpforhackers.io/sentiment-analysis-intro/

import nltk
from nltk.corpus import wordnet as wn
from nltk.corpus import sentiwordnet as swn
from nltk.stem import PorterStemmer, WordNetLemmatizer

# The first run may need nltk.download() for 'wordnet', 'sentiwordnet' and the POS tagger model

lemmatizer = WordNetLemmatizer()

def penn_to_wn(tag):
    """
    Convert a Penn Treebank tag to a simple WordNet tag (None if there is no match)
    """
    if tag.startswith('J'):
        return wn.ADJ
    elif tag.startswith('N'):
        return wn.NOUN
    elif tag.startswith('R'):
        return wn.ADV
    elif tag.startswith('V'):
        return wn.VERB
    return None

def get_sentiment(word, tag):
    """ Returns a list of pos, neg and objective scores, or an empty list if the word is not in SentiWordNet. """

    wn_tag = penn_to_wn(tag)
    # Only nouns, adjectives and adverbs are scored; verbs and other tags are skipped
    if wn_tag not in (wn.NOUN, wn.ADJ, wn.ADV):
        return []

    lemma = lemmatizer.lemmatize(word, pos=wn_tag)
    if not lemma:
        return []

    # Look up the lemma (not the raw word) in WordNet
    synsets = wn.synsets(lemma, pos=wn_tag)
    if not synsets:
        return []

    # Take the first sense, the most common
    synset = synsets[0]
    swn_synset = swn.senti_synset(synset.name())

    return [swn_synset.pos_score(), swn_synset.neg_score(), swn_synset.obj_score()]


ps = PorterStemmer()
words_data = ['this', 'movie', 'is', 'wonderful']
# words_data = [ps.stem(x) for x in words_data]  # if you want to further stem the words

pos_val = nltk.pos_tag(words_data)
senti_val = [get_sentiment(x, y) for (x, y) in pos_val]

print(f"pos_val is {pos_val}")
print(f"senti_val is {senti_val}")

Output

pos_val is [('this', 'DT'), ('movie', 'NN'), ('is', 'VBZ'), ('wonderful', 'JJ')]
senti_val is [[], [0.0, 0.0, 1.0], [], [0.75, 0.0, 0.25]]

Answer by Nilkanth Shirodkar

For positive and negative sentiment, you first need labelled training data and have to train a model. To train the model you can use an SVM; there is an open library called LibSVM that you can use.
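The answer names LibSVM directly; as a sketch, the snippet below uses scikit-learn instead (an assumption, not part of the original answer), whose `SVC` class is built on LibSVM. The toy sentences and labels are invented for illustration only, and bag-of-words counts from `CountVectorizer` are just one simple choice of features.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Tiny invented sample; a real model needs a properly annotated tweet corpus
texts = [
    "awesome camera", "love this screen", "great photos",
    "awful battery", "hate that update", "terrible lag",
]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# Word counts fed to a linear-kernel SVM (scikit-learn's SVC wraps LibSVM)
model = make_pipeline(CountVectorizer(), SVC(kernel="linear"))
model.fit(texts, labels)

print(model.predict(["awesome photos", "terrible battery"]))
```

On this toy data the two test phrases should come out as `pos` and `neg`; on real tweets you would want far more training data and a held-out evaluation.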
