Original URL: http://stackoverflow.com/questions/52484199/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
'Series' object has no attribute 'decode' in pandas
Asked by Kabilesh
I am trying to decode UTF-8 encoded text in Python. The data is loaded into a pandas DataFrame and then I decode it. This produces an error: AttributeError: 'Series' object has no attribute 'decode'. How can I properly decode the text that is in a pandas column?
>>> preparedData.head(5).to_dict()
{'id': {0: 1042616899408945154, 1: 1042592536769044487, 2: 1042587702040903680, 3: 1042587263643930626, 4: 1042586780292276230}, 'date': {0: '2018-09-20', 1: '2018-09-20', 2: '2018-09-20', 3: '2018-09-20', 4: '2018-09-20'}, 'time': {0: '03:30:14', 1: '01:53:25', 2: '01:34:13', 3: '01:32:28', 4: '01:30:33'}, 'text': {0: "b'\xf0\x9f\x8c\xb9 are red, violets are blue, if you want to buy us \xf0\x9f\x92\x90, here is a CLUE \xf0\x9f\x98\x89 Our #flowerpowered eye & cheek palette is AL\xe2\x80\xa6 '", 1: "b'\xf0\x9f\x8e\xb5Is it too late now to say sorry\xf0\x9f\x8e\xb5 #tartetalk #memes'", 2: "b'@JillianJChase Oh no! Please email your order # to [email protected] & we can help \xf0\x9f\x92\x95'", 3: 'b"@Danikins__ It\'s best applied with our buffer brush! \xf0\x9f\x92\x9c\xc2\xa0"', 4: "b'@AdelaineMorin DEAD \xf0\x9f\xa4\xa3\xf0\x9f\xa4\xa3\xf0\x9f\xa4\xa3'"}, 'hasMedia': {0: 0, 1: 1, 2: 0, 3: 0, 4: 0}, 'hasHashtag': {0: 1, 1: 1, 2: 0, 3: 0, 4: 0}, 'followers_count': {0: 801745, 1: 801745, 2: 801745, 3: 801745, 4: 801745}, 'retweet_count': {0: 17, 1: 94, 2: 0, 3: 0, 4: 0}, 'favourite_count': {0: 181, 1: 408, 2: 0, 3: 0, 4: 14}}
My data looks like the above. I want to decode the 'text' column.
ExampleText = b'\xf0\x9f\x8c\xb9 are red, violets are blue, if you want to buy us \xf0\x9f\x92\x90, here is a CLUE \xf0\x9f\x98\x89 Our #flowerpowered eye & cheek palette is AL\xe2\x80\xa6'
I can decode the text above with:
ExampleText = ExampleText.decode('utf8')
However, when I try to decode the text in a pandas DataFrame column, I get the error. This is what I tried:
preparedData['text'] = preparedData['text'].decode('utf8')
The error I get is:
Traceback (most recent call last):
  File "F:/Level 4 Research Project/makeViral/main.py", line 23, in <module>
    main()
  File "F:/Level 4 Research Project/makeViral/main.py", line 19, in main
    preprocessedData = preprocessData(preparedData)
  File "F:\Level 4 Research Project\makeViral\preprocess.py", line 34, in preprocessData
    preparedData['text'] = preparedData['text'].decode('utf8')
  File "C:\Users\Kabilesh\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\generic.py", line 4376, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'decode'
I also tried:
preparedData['text'] = preparedData['text'].str.decode('utf8', errors='strict')
This does not raise an error, but every value in the resulting 'text' column is NaN:
'text': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}
Answered by Sven Harris
I could be wrong, but I would guess that what you have are byte strings rather than strings containing the repr of byte strings (b"XXXXX" instead of "b'XXXXX'") as you've posted in your question, in which case you could do the following (you need to use the string accessor):
preparedData['text'] = preparedData['text'].str.decode('utf8')
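The difference between b"XXXXX" and "b'XXXXX'" can be checked directly in plain Python. The sketch below uses a shortened sample value (an assumption for illustration, not the asker's full data) and assumes the column holds str values with literal backslash escapes, which is what the posted to_dict() output suggests. The names real_bytes, bytes_repr and recovered are hypothetical.

```python
import ast

# Shortened sample values, assumed for illustration only.
real_bytes = b'\xf0\x9f\x8c\xb9 are red'        # an actual bytes object
bytes_repr = "b'\\xf0\\x9f\\x8c\\xb9 are red'"  # a str that looks like a bytes literal

print(type(real_bytes).__name__)  # bytes
print(type(bytes_repr).__name__)  # str

# In Python 3, bytes has a decode() method and str does not.
print(hasattr(real_bytes, "decode"))  # True
print(hasattr(bytes_repr, "decode"))  # False

# ast.literal_eval parses the str back into the bytes object it describes.
recovered = ast.literal_eval(bytes_repr)
print(recovered == real_bytes)  # True

# Only now can the value be decoded as UTF-8.
decoded = recovered.decode("utf-8")
print(decoded)
```

If the second hasattr check prints False for values taken from the column, the values are str rather than bytes, so they have to be parsed before they can be decoded.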
Edit: Looks like my assumption was wrong, in which case you can do a pre-processing step:
import ast
preparedData['text'] = preparedData['text'].apply(ast.literal_eval).str.decode("utf-8")
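For reference, here is a self-contained sketch of the same two-step approach on a small frame. The DataFrame df and its two rows are hypothetical stand-ins for preparedData, shortened from the tweets in the question, and the sketch assumes every value in the column is a str holding a valid bytes literal.

```python
import ast

import pandas as pd

# Hypothetical two-row frame standing in for preparedData.
df = pd.DataFrame({
    "text": [
        "b'\\xf0\\x9f\\x8e\\xb5Is it too late now to say sorry\\xf0\\x9f\\x8e\\xb5'",
        "b'@AdelaineMorin DEAD \\xf0\\x9f\\xa4\\xa3'",
    ]
})

# Step 1: parse each str into the bytes object it represents.
as_bytes = df["text"].apply(ast.literal_eval)

# Step 2: decode the bytes element-wise through the .str accessor.
df["text"] = as_bytes.str.decode("utf-8")

decoded_rows = df["text"].tolist()
print(decoded_rows)
```

Note that ast.literal_eval raises an exception for values that are not valid Python literals, so a column containing missing values or ordinary text would probably need those rows filtered out or handled separately first.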