Python: Unicode encode error when writing a Pandas df to csv

Question

Asked by I am not George

I cleaned 400 Excel files, read them into Python using pandas, and appended all the raw data into one big df.

Then when I try to export it to a csv:

df.to_csv("path",header=True,index=False)

I get this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xc7' in position 20: ordinal not in range(128)

Can someone suggest a way to fix this and explain what it means?

Thanks

Answer 1

Accepted answer by unutbu

You have unicode values in your DataFrame. Files store bytes, which means all unicode values have to be encoded into bytes before they can be stored in a file. You have to specify an encoding, such as utf-8. For example,

df.to_csv('path', header=True, index=False, encoding='utf-8')

If you don't specify an encoding, then the encoding used by df.to_csv defaults to ascii in Python2, or utf-8 in Python3.
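
To make the point about bytes concrete, here is a minimal sketch, assuming Python 3 and a recent pandas. The sample DataFrame, the column names, and the temporary file path are made up for illustration; only the to_csv call mirrors the one above. It shows that the character from the traceback (u'\xc7', i.e. 'Ç') cannot be encoded as ascii, that utf-8 can encode it, and that a file written with an explicit encoding reads back unchanged.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data containing 'Ç' (U+00C7), the character named in the error.
df = pd.DataFrame({"name": ["Ça va", "plain"], "value": [1, 2]})

# 'Ç' has no ASCII representation, which is what the UnicodeEncodeError reports.
try:
    "Ç".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 can represent it; here it becomes two bytes.
utf8_bytes = "Ç".encode("utf-8")

# Write with an explicit encoding, then read back with the same encoding.
path = os.path.join(tempfile.mkdtemp(), "out.csv")  # illustrative path
df.to_csv(path, header=True, index=False, encoding="utf-8")
round_trip = pd.read_csv(path, encoding="utf-8")
```

Whatever reads the file later has to use the same encoding that was used to write it, which is why read_csv is given encoding="utf-8" here as well.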

Answer 2

Answered by tangfucius

Adding an answer to help myself google it later:

One trick that helped me is to encode a problematic series first, then decode it back to utf-8. Like:

df['crumbs'] = df['crumbs'].map(lambda x: x.encode('unicode-escape').decode('utf-8'))

This would get the dataframe to print correctly too.
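
It may help to see what this trick does to a single value. The sketch below assumes Python 3 and applies the same lambda to one plain string; the names escape, original, escaped and restored are illustrative only. Note that the non-ASCII character is not preserved: it is replaced by a backslash escape sequence, so the stored text changes.

```python
# The same transformation as in the answer, applied to one string.
escape = lambda x: x.encode("unicode-escape").decode("utf-8")

original = "Ça va"
escaped = escape(original)

# 'Ç' has become the four ASCII characters \ x c 7, so the result is pure ASCII.
is_ascii = all(ord(ch) < 128 for ch in escaped)

# The original text can be recovered by reversing the two steps.
restored = escaped.encode("ascii").decode("unicode-escape")
```

Because the escaped column contains only ASCII, it can be written with any codec, but anyone reading the CSV sees \xc7 instead of Ç unless they reverse the escaping. If the goal is to keep the real characters in the file, passing encoding='utf-8' to to_csv is probably the better fit.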

