
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/23839701/

Date: 2020-08-19 03:30:51  Source: igfitidea

Python text processing: AttributeError: 'list' object has no attribute 'lower'

python, csv, text-classification

Asked by user3670554

I am new to Python and to Stackoverflow (please be gentle) and am trying to learn how to do a sentiment analysis. I am using a combination of code I found in a tutorial and here: Python - AttributeError: 'list' object has no attribute. However, I keep getting

Traceback (most recent call last):
    File "C:/Python27/training", line 111, in <module>
    processedTestTweet = processTweet(row)
  File "C:/Python27/training", line 19, in processTweet
    tweet = tweet.lower()
AttributeError: 'list' object has no attribute 'lower'
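(A minimal way to reproduce the same error outside the script, since csv.reader hands back each row as a list and lists have no lower method:)

```python
import csv
import io

# csv.reader yields each row as a list of field strings
reader = csv.reader(io.StringIO('positive,some tweet text\n'))
row = next(reader)
print(type(row).__name__)  # list

try:
    row.lower()  # same mistake as processTweet(row) below
except AttributeError as exc:
    print(exc)  # 'list' object has no attribute 'lower'
```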

This is my code:


import csv
#import regex
import re
import pprint
import nltk.classify


#start replaceTwoOrMore
def replaceTwoOrMore(s):
    #look for 2 or more repetitions of a character and replace with two
    pattern = re.compile(r"(.)\1{1,}", re.DOTALL)
    return pattern.sub(r"\1\1", s)

# process the tweets
def processTweet(tweet):
    #Convert to lower case
    tweet = tweet.lower()
    #Convert www.* or https?://* to URL
    tweet = re.sub('((www\.[^\s]+)|(https?://[^\s]+))','URL',tweet)
    #Convert @username to AT_USER
    tweet = re.sub('@[^\s]+','AT_USER',tweet)
    #Remove additional white spaces
    tweet = re.sub('[\s]+', ' ', tweet)
    #Replace #word with word
    tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
    #trim
    tweet = tweet.strip('\'"')
    return tweet

#start getStopWordList
def getStopWordList(stopWordListFileName):
    #read the stopwords file and build a list
    stopWords = []
    stopWords.append('AT_USER')
    stopWords.append('URL')

    fp = open(stopWordListFileName, 'r')
    line = fp.readline()
    while line:
        word = line.strip()
        stopWords.append(word)
        line = fp.readline()
    fp.close()
    return stopWords

def getFeatureVector(tweet, stopWords):
    featureVector = []
    words = tweet.split()
    for w in words:
        #replace two or more with two occurrences
        w = replaceTwoOrMore(w)
        #strip punctuation
        w = w.strip('\'"?,.')
        #check if it consists of only words
        val = re.search(r"^[a-zA-Z][a-zA-Z0-9]*[a-zA-Z]+[a-zA-Z0-9]*$", w)
        #ignore if it is a stopWord
        if(w in stopWords or val is None):
            continue
        else:
            featureVector.append(w.lower())
    return featureVector

def extract_features(tweet):
    tweet_words = set(tweet)
    features = {}
    for word in featureList:
        features['contains(%s)' % word] = (word in tweet_words)
    return features


#Read the tweets one by one and process it
inpTweets = csv.reader(open('C:/GsTraining.csv', 'rb'),
                       delimiter=',',
                       quotechar='|')
stopWords = getStopWordList('C:/stop.txt')
count = 0
featureList = []
tweets = []

for row in inpTweets:
    sentiment = row[0]
    tweet = row[1]
    processedTweet = processTweet(tweet)
    featureVector = getFeatureVector(processedTweet, stopWords)
    featureList.extend(featureVector)
    tweets.append((featureVector, sentiment))

# Remove featureList duplicates
featureList = list(set(featureList))

# Generate the training set
training_set = nltk.classify.util.apply_features(extract_features, tweets)

# Train the Naive Bayes classifier
NBClassifier = nltk.NaiveBayesClassifier.train(training_set)

# Test the classifier
with open('C:/CleanedNewGSMain.txt', 'r') as csvinput:
    with open('GSnewmain.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput, lineterminator='\n')
        reader = csv.reader(csvinput)

        all=[]
        row = next(reader)

        for row in reader:
            processedTestTweet = processTweet(row)
            sentiment = NBClassifier.classify(
                extract_features(getFeatureVector(processedTestTweet, stopWords)))
            row.append(sentiment)
            processTweet(row[1])

        writer.writerows(all)

Any help would be massively appreciated.


Answered by Slater Victoroff

The result from the csv reader is a list; the lower method only works on strings. Presumably it is a list of strings, so there are two options. Either you can call lower on each element, or you can turn the list into a string and then call lower on it.

# the first approach
[item.lower() for item in tweet]

# the second approach
' '.join(tweet).lower()
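For instance, with a made-up row (just to illustrate the two options):

```python
# a made-up row, shaped like what csv.reader would return
tweet = ['Hello', 'WORLD']

# the first approach: lower-case each element, keeping a list
print([item.lower() for item in tweet])   # ['hello', 'world']

# the second approach: join into one string, then lower-case
print(' '.join(tweet).lower())            # hello world
```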

But more reasonably (hard to tell without more information), you probably only want one item out of your list. Something along the lines of:

for row in reader:
    processedTestTweet = processTweet(row[0]) # Again, can't know if this is actually correct without seeing the file

Also, I am guessing that you aren't using the csv reader quite like you think you are, because right now you are training a naive Bayes classifier on a single example every time and then having it predict the one example it was trained on. Maybe explain what you're trying to do?
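To sketch the read-classify-write loop the last block in the question seems to be aiming for (the file contents here are made up, and the classifier call is replaced by a placeholder string, since the real model and files aren't available):

```python
import csv
import io

# stand-ins for the real files on disk
csvinput = io.StringIO('tweet one\ntweet two\n')
csvoutput = io.StringIO()

reader = csv.reader(csvinput)
writer = csv.writer(csvoutput, lineterminator='\n')

all_rows = []
for row in reader:
    tweet_text = row[0]          # pick the field out of the list before processing
    sentiment = 'positive'       # placeholder for NBClassifier.classify(...)
    row.append(sentiment)
    all_rows.append(row)         # without this, writerows gets an empty list

writer.writerows(all_rows)       # write once, after the loop
print(csvoutput.getvalue())
```

Note that the question's loop never appends to its `all` list, so `writer.writerows(all)` writes nothing; collecting each row before the final write is the missing step.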