Source: Stack Overflow, http://stackoverflow.com/questions/48177573/
License: CC BY-SA 4.0. You are free to use and share this content, but you must attribute it to the original authors on Stack Overflow and keep it under the same license.
Pandas convert object column to str - column contains unicode, float etc
Asked by add-semi-colons
I have a pandas DataFrame in which the column's dtype shows as object, but when I try to convert it to string,
df['column'] = df['column'].astype('str')
a UnicodeEncodeError gets thrown:
*** UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)
My next approach was to handle the encoding part:
df['column'] = filtered_df['column'].apply(lambda x: x.encode('utf-8').strip())
But that gives the following error:
*** AttributeError: 'float' object has no attribute 'encode'
What's the best approach to convert this column to string?
Sample of strings in the column:
Thank you :)
Thank You !!!
responsibilities/assigned job.
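The AttributeError above suggests that the object column mixes text with float values. A common cause is NaN marking missing entries, though that is an assumption here, since the question does not show the float values. A minimal sketch for checking which Python types a column holds, written for Python 3 and using made-up data in place of the asker's column:

```python
from collections import Counter

import numpy as np
import pandas as pd

# Hypothetical data standing in for the asker's column: text plus one float NaN.
df = pd.DataFrame({'column': ['Thank you :)', 'Thank You !!!', np.nan]})

# Count the values by Python type name to see what the object column contains.
type_counts = Counter(type(value).__name__ for value in df['column'])
print(type_counts)

# Calling a string method on every value fails as soon as it reaches the float.
error_message = None
try:
    df['column'].apply(lambda x: x.encode('utf-8').strip())
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

On Python 2 the text values would be reported as str or unicode depending on how they were created, so the counts would be labeled differently there.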
Answered by Nigel
I had the same problem in Python 2.7 when trying to run a script that was originally intended for Python 3. In Python 2.7, str is a byte string, and converting unicode text to it encodes with the ASCII codec by default, which will not work with your data. This can be replicated in a simple example:
import pandas as pd
df = pd.DataFrame({'column': ['asdf', u'uh \u2122 oh', 123]})  # \u2122 is the trademark sign
df['column'] = df['column'].astype('str')
This results in:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 3: ordinal not in range(128)
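The same codec failure can be reproduced without pandas by encoding the text to ASCII explicitly, which is roughly what Python 2 does implicitly when it turns a unicode object into str. A small sketch (it runs on Python 3 as well; the variable names are only illustrative):

```python
text = u'uh \u2122 oh'

# Encoding to ASCII fails at the first character outside the 0-127 range.
failed_at = None
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    failed_at = exc.start  # index of the offending character
    print(exc)

# An encoding that covers the character, such as UTF-8, succeeds.
encoded = text.encode('utf-8')
print(repr(encoded))
```

The reported position matches the index of the trademark sign in the string, just as in the traceback above.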
Instead, you can specify unicode:
df['column'] = df['column'].astype('unicode')
Verify that the number has been converted to a string:
df['column'][2]
This outputs u'123', so the number has been converted to a unicode string. The special character ™ has been properly preserved as well.
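For comparison, here is a sketch of the same conversion on Python 3, where str is already a unicode type and no ASCII encoding step is involved. It assumes a reasonably recent pandas and is not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'column': ['asdf', 'uh \u2122 oh', 123]})

# On Python 3, converting to str keeps non-ASCII characters intact.
df['column'] = df['column'].astype(str)

converted = df['column'][2]
all_text = all(isinstance(value, str) for value in df['column'])
print(repr(converted))
print(all_text)
```

Note that missing values (NaN) are also turned into text by this kind of conversion, so they may need separate handling if the column contains any.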