Pandas：保存到 excel 编码问题

Question

提问by jjj

I have a similar problem to the one mentioned herebut none of the suggested methods work for me.

我有一个与这里提到的问题类似的问题，但没有一种建议的方法对我有用。

I have a medium size utf-8.csv file with a lot of non-ascii characters. I am splitting the file by a particular value from one of the columns, and then I'd like to save each of the obtained dataframes as an .xlsx file with the characters preserved.

我有一个中等大小的utf-8.csv 文件，其中包含许多非 ascii 字符。我正在按列中的一个特定值拆分文件，然后我想将每个获得的数据帧保存为一个 .xlsx 文件，并保留字符。

This doesn't work, as I am getting an error:

这不起作用，因为我收到一个错误：

UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 7: ordinal not in range(128)

Here is what I tried:

这是我尝试过的：

Using xlsxwriterengine explicitly. This doesn't seem to change anything.

Defining a function (below) to change encoding and throw away bad characters. This also doesn't change anything.

def changeencode(data):
cols = data.columns
for col in cols:
if data[col].dtype == 'O':
    data[col] = data[col].str.decode('utf-8').str.encode('ascii', 'ignore')
return data

Changing by hand all the offensive chars to some others. Still no effect (the quoted error was obtained after this change).
Encoding the file as utf-16(which, I believe, is the correct encoding since I want to be able to manipulate the file from within the excel afterwards) doesn't help either.

xlsxwriter显式使用引擎。这似乎没有改变任何事情。

定义一个函数（如下）来改变编码并丢弃坏字符。这也不会改变任何事情。

def changeencode(data):
cols = data.columns
for col in cols:
if data[col].dtype == 'O':
    data[col] = data[col].str.decode('utf-8').str.encode('ascii', 'ignore')
return data

手动将所有攻击性字符更改为其他字符。仍然没有效果（引用的错误是在此更改后获得的）。
将文件编码为utf-16（我相信这是正确的编码，因为我希望之后能够在 excel 中操作文件）也无济于事。

I believe that the problem is in the file itself (because of 2 and 3) but I have no idea how to get around it. I'd appreciate any help. The beginning of the file is pasted below.

我相信问题出在文件本身（因为 2 和 3），但我不知道如何解决它。我很感激任何帮助。文件的开头粘贴在下面。

"Submitted","your-name","youremail","phone","miasto","cityCF","innemiasto","languagesCF","morelanguages","wiek","partnerCF","messageCF","acceptance-795","Submitted Login","Submitted From","2015-12-25 14:07:58 +00:00","Zózia kry?","[email protected]","4444444","Wroc?aw","","testujemy polskie znaki","Polski","testujemy polskie znaki","44","test","test","1","Justyna","99.111.155.132",

EDIT

编辑

Some code (one of the versions, without the splitting part):

一些代码（其中一个版本，没有拆分部分）：

import pandas as pd
import string
import xlsxwriter

df = pd.read_csv('path-to-file.csv')

with pd.ExcelWriter ('test.xlsx') as writer:
                df.to_excel(writer, sheet_name = 'sheet1',engine='xlsxwriter')

Answer 1

采纳答案by jjj

Supposedly this was a bug in the version of pandaswhich I was using back then. Right now, in pandasver. 0.19.2, the code below saves the csv from the question without any trouble (and with correct encoding).
NB: openpyxlmodule have to be installed on your system.

据说这是pandas我当时使用的版本中的一个错误。现在，在pandasver. 0.19.2，下面的代码从问题中保存了 csv 没有任何问题（并且具有正确的编码）。
注意：openpyxl模块必须安装在您的系统上。

import pandas as pd
df = pd.read_csv('Desktop/test.csv')
df.to_excel('Desktop/test.xlsx', encoding='utf8')

Answer 2

回答by Siva Arasu

Try encoding the columns with non-ascii characters as

尝试将非 ascii 字符的列编码为

df['col'] = df['col'].apply(lambda x: unicode(x))

and then save the file to xlsx format with encoding 'utf8'

然后将文件保存为 xlsx 格式，编码为“utf8”

Answer 3

回答by rbinnun

What if you save the csv files from pandas and then use win32com to convert to Excel. It would look something like this...

如果您从 Pandas 保存 csv 文件，然后使用 win32com 转换为 Excel 会怎样。它看起来像这样......

import win32com.client
excel = win32com.client.Dispatch("Excel.Application")
excel.Visible = 0

for x in range(10): 
    f = path + str(x)
    # not showing the pandas dataframe creation
    df.to_csv(f+'.csv')
    wb = excel.Workbooks.Open(f+'.csv')
    wb.SaveAs(f+'.xlsx', 51) #xlOpenXMLWorkbook=51

Pandas：保存到 excel 编码问题

提问by jjj

采纳答案by jjj

回答by Siva Arasu

回答by rbinnun

相关推荐

最近更新

标签

Pandas：保存到 excel 编码问题

提问by jjj

采纳答案by jjj

回答by Siva Arasu

回答by rbinnun

相关推荐

pandas 将年份和年份中的日期转换为熊猫中的日期时间索引

pandas 熊猫：具有不同列名的连接数据框

在 Pandas DataFrame 的字符串中漂亮地打印换行符

Pandas dropna - 存储删除的行

相关推荐

最近更新

标签