Python AttributeError: 'list' object has no attribute 'lower' gensim

Question

Asked by tom

I have a list of 10k words in a text file like so:

G15 KDN C30A Action Standard Air Brush Air Dilution

I am trying to convert them into lowercase tokens using this code, for subsequent processing with GenSim:

data = [line.strip() for line in open("C:\corpus\TermList.txt", 'r')]
texts = [[word for word in data.lower().split()] for word in data]

and I get the following traceback:

AttributeError                            Traceback (most recent call last)
<ipython-input-84-33bbe380449e> in <module>()
      1 data = [line.strip() for line in open("C:\corpus\TermList.txt", 'r')]
----> 2 texts = [[word for word in data.lower().split()] for word in data]
      3 
AttributeError: 'list' object has no attribute 'lower'

Any suggestions on what I am doing wrong and how to correct it would be greatly appreciated!!! Thank you!!

Answer 1

Answered by epattaro

Try:

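# A raw string (r"...") keeps the backslashes in the Windows path literal
data = [line.strip() for line in open(r"C:\corpus\TermList.txt", 'r')]
# Call .lower() on each word (a str), not on the list itself
texts = [[word.lower() for word in text.split()] for text in data]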

You were trying to apply .lower() to data, which is a list.
.lower() is a string method, so it has to be called on each string inside the list.
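
A quick way to confirm this diagnosis is to inspect the objects involved. The sketch below uses a small hypothetical sample in place of the lines read from the term list; the variable names are illustrative only.

```python
# Hypothetical sample standing in for the lines read from the file.
data = ["G15 KDN C30A", "Action Standard", "Air Brush Air Dilution"]

# The container is a list, and list defines no .lower() method.
print(type(data).__name__, hasattr(data, "lower"))

# Each element is a str, and str does define .lower().
print(type(data[0]).__name__, hasattr(data[0], "lower"))

# Calling .lower() on the list reproduces the error from the question.
try:
    data.lower()
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

Whenever an AttributeError names a container type such as 'list', the method most likely needs to be called on the elements instead.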

Answer 2

Answered by kvorobiev

You need

texts = [[word.lower() for word in line.split()] for line in data]

For each line in data ([... for line in data]), this code generates a list of lowercase words ([word.lower() for word in line.split()]). Each str line contains a sequence of space-separated words. line.split() turns this sequence into a list, and word.lower() converts each word to lowercase.
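
If the nested comprehension is hard to read, it can be unrolled into explicit loops. This is only a sketch of the same logic; the function name tokenize_lower and the sample line are made up for illustration.

```python
def tokenize_lower(lines):
    """Loop form of [[word.lower() for word in line.split()] for line in lines]."""
    texts = []
    for line in lines:                  # outer part: for line in data
        tokens = []
        for word in line.split():       # inner part: for word in line.split()
            tokens.append(word.lower())
        texts.append(tokens)
    return texts


# Hypothetical sample line in the same format as the term list.
sample = ["G15 KDN C30A Action Standard Air Brush Air Dilution"]
texts = tokenize_lower(sample)
print(texts)
```

Both forms build one inner list per line, so the result is a list of token lists, which is the shape usually passed on to GenSim.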

Answer 3

Answered by marmeladze

What you are doing wrong is calling a string method (lower()) on a list (in your case, data):

data = [line.strip() for line in open('corpus.txt', 'r')]

What you should do after reading the lines into a list is:

texts = [[words for words in sentences.lower().split()] for sentences in data]
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^*********^^^^^^^^^^^^^^^^^^^^^^*********^^^^
# call lower() on the iteration variable, which in our case is "sentences"

This will give you a list of lists. Each inner list contains the lowercased words from one line.

$ tail -n 10 corpus.txt 
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution
G15 KDN C30A Action Standard Air Brush Air Dilution


$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> data = [line.strip() for line in open('corpus.txt', 'r')]
>>> texts = [[words for words in sentences.lower().split()] for sentences in data]
>>> texts[:5]
[['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution'], ['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution'], ['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution'], ['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution'], ['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution']]
>>>
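
The session above opens the file without closing it explicitly. A variant using a with block might look like the sketch below. The helper name load_tokens is hypothetical, a temporary file stands in for corpus.txt, and, unlike the original one-liner, this version skips blank lines.

```python
import os
import tempfile


def load_tokens(path):
    """Return one list of lowercased tokens per non-empty line of the file."""
    with open(path, "r") as handle:     # the file is closed when the block exits
        return [line.lower().split() for line in handle if line.strip()]


# Hypothetical stand-in for corpus.txt, written to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("G15 KDN C30A Action Standard Air Brush Air Dilution\n\n")
    tmp_path = tmp.name

texts = load_tokens(tmp_path)
os.remove(tmp_path)
print(texts)
```

split() with no arguments also discards the trailing newline, so a separate strip() call is not needed here.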

Of course, you can flatten the result or just keep it as it is.

>>> flattened = reduce(lambda x,y: x+y, texts)
>>> flattened[:30]
['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution', 'g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution', 'g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution', 'g15', 'kdn', 'c30a']
>>>
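
The reduce call above works as a builtin in Python 2. In Python 3, reduce has to be imported from functools, and itertools.chain.from_iterable is a common alternative for flattening one level. A minimal sketch with a hypothetical sample in place of texts:

```python
from itertools import chain

# Hypothetical sample in the same list-of-lists shape as texts above.
texts = [["g15", "kdn", "c30a"], ["action", "standard"], ["air", "brush"]]

# chain.from_iterable concatenates the inner lists without building
# a new intermediate list at every step, as repeated x + y would.
flattened = list(chain.from_iterable(texts))
print(flattened)
```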

Answer 4

Answered by Viraj Wadate

We can simply convert each item in the list to lowercase like this:

>>> words = ["PYTHON", "PROGRAMMING"]
>>> type(words)
>>> for i in words:
...     print(i.lower())

Output:

python
programming
