pandas: 'Series' object has no attribute 'decode'

Question

Asked by Kabilesh

I am trying to decode UTF-8 encoded text in Python. The data is loaded into a pandas DataFrame and then I decode it. This produces an error: AttributeError: 'Series' object has no attribute 'decode'. How can I properly decode the text in a pandas column?

>>> preparedData.head(5).to_dict()
{'id': {0: 1042616899408945154, 1: 1042592536769044487, 2: 1042587702040903680, 3: 1042587263643930626, 4: 1042586780292276230}, 'date': {0: '2018-09-20', 1: '2018-09-20', 2: '2018-09-20', 3: '2018-09-20', 4: '2018-09-20'}, 'time': {0: '03:30:14', 1: '01:53:25', 2: '01:34:13', 3: '01:32:28', 4: '01:30:33'}, 'text': {0: "b'\xf0\x9f\x8c\xb9 are red, violets are blue, if you want to buy us \xf0\x9f\x92\x90, here is a CLUE \xf0\x9f\x98\x89 Our #flowerpowered eye &amp; cheek palette is AL\xe2\x80\xa6 '", 1: "b'\xf0\x9f\x8e\xb5Is it too late now to say sorry\xf0\x9f\x8e\xb5 #tartetalk #memes'", 2: "b'@JillianJChase Oh no! Please email your order # to [email protected] &amp; we can help \xf0\x9f\x92\x95'", 3: 'b"@Danikins__ It\'s best applied with our buffer brush! \xf0\x9f\x92\x9c\xc2\xa0"', 4: "b'@AdelaineMorin DEAD \xf0\x9f\xa4\xa3\xf0\x9f\xa4\xa3\xf0\x9f\xa4\xa3'"}, 'hasMedia': {0: 0, 1: 1, 2: 0, 3: 0, 4: 0}, 'hasHashtag': {0: 1, 1: 1, 2: 0, 3: 0, 4: 0}, 'followers_count': {0: 801745, 1: 801745, 2: 801745, 3: 801745, 4: 801745}, 'retweet_count': {0: 17, 1: 94, 2: 0, 3: 0, 4: 0}, 'favourite_count': {0: 181, 1: 408, 2: 0, 3: 0, 4: 14}}

My data looks like the above. I want to decode the 'text' column.

ExampleText = b'\xf0\x9f\x8c\xb9 are red, violets are blue, if you want to buy us \xf0\x9f\x92\x90, here is a CLUE \xf0\x9f\x98\x89 Our #flowerpowered eye & cheek palette is AL\xe2\x80\xa6'

I could decode the text above as

ExampleText = ExampleText.decode('utf8')

However, when I try to decode text from a pandas DataFrame column, I get the error. This is what I tried:

preparedData['text'] = preparedData['text'].decode('utf8')

Then the error I get is,

Traceback (most recent call last):
  File "F:/Level 4 Research Project/makeViral/main.py", line 23, in <module>
    main()
  File "F:/Level 4 Research Project/makeViral/main.py", line 19, in main
    preprocessedData = preprocessData(preparedData)
  File "F:\Level 4 Research Project\makeViral\preprocess.py", line 34, in preprocessData
    preparedData['text'] = preparedData['text'].decode('utf8')
  File "C:\Users\Kabilesh\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\generic.py", line 4376, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'decode'

I also tried

preparedData['text'] = preparedData['text'].str.decode('utf8', errors='strict')

This does not produce an error, but every value in the resulting 'text' column is NaN:

'text': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}

Answer 1

Answered by Sven Harris

I could be wrong, but I would guess that what you have are actual byte strings rather than ordinary strings containing the text of a bytes literal, i.e. b"XXXXX" instead of "b'XXXXX'" as you've posted in your question. In that case you could do the following (you need to use the string accessor):

preparedData['text'] = preparedData['text'].str.decode('utf8')
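
A minimal sketch of the string-accessor approach, assuming the column really holds bytes objects. The DataFrame name `df` and its sample values are made up for illustration and are not the asker's data:

```python
import pandas as pd

# Hypothetical sample: a column whose cells are real bytes objects.
df = pd.DataFrame({"text": [b"\xf0\x9f\x8c\xb9 are red", b"DEAD \xf0\x9f\xa4\xa3"]})

# A Series has no .decode method of its own; the .str accessor
# applies bytes.decode to each element instead.
df["text"] = df["text"].str.decode("utf8")

print(df["text"].tolist())
```

The `.str` accessor is what makes the call element-wise, which is why calling `.decode` directly on the Series raises the AttributeError.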

Edit: Looks like my assumption was wrong, in which case you can do a pre-processing step:

import ast
preparedData['text'] = preparedData['text'].apply(ast.literal_eval).str.decode("utf-8")
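
To show what that pre-processing step does to a single cell, here is a rough sketch in plain Python. It assumes each cell is an ordinary str whose text looks like a bytes literal, which would also explain why `.str.decode` found nothing to decode. The helper name `decode_bytes_literal` is hypothetical, introduced only for this example:

```python
import ast


def decode_bytes_literal(cell, encoding="utf-8"):
    """Convert text such as "b'...'" into the decoded str it represents.

    Hypothetical helper for illustration; not part of pandas.
    """
    value = ast.literal_eval(cell)  # str holding a bytes literal -> bytes
    return value.decode(encoding)   # bytes -> str


# One cell modelled on the question's data: a str, not bytes.
cell = "b'\\xf0\\x9f\\x8e\\xb5Is it too late now to say sorry\\xf0\\x9f\\x8e\\xb5 #tartetalk #memes'"

print(type(ast.literal_eval(cell)))
print(decode_bytes_literal(cell))
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for text that came from an external source. A function like this could also be passed to `Series.apply` in place of the two chained calls.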
