Pandas:保存到 excel 编码问题
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/34485982/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Pandas: save to excel encoding issue
提问by jjj
I have a similar problem to the one mentioned herebut none of the suggested methods work for me.
我有一个与这里提到的问题类似的问题,但没有一种建议的方法对我有用。
I have a medium size utf-8
.csv file with a lot of non-ascii characters.
I am splitting the file by a particular value from one of the columns, and then I'd like to save each of the obtained dataframes as an .xlsx file with the characters preserved.
我有一个中等大小的utf-8
.csv 文件,其中包含许多非 ascii 字符。我正在按列中的一个特定值拆分文件,然后我想将每个获得的数据帧保存为一个 .xlsx 文件,并保留字符。
This doesn't work, as I am getting an error:
这不起作用,因为我收到一个错误:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 7: ordinal not in range(128)
Here is what I tried:
这是我尝试过的:
- Using
xlsxwriter
engine explicitly. This doesn't seem to change anything. Defining a function (below) to change encoding and throw away bad characters. This also doesn't change anything.
def changeencode(data): cols = data.columns for col in cols: if data[col].dtype == 'O': data[col] = data[col].str.decode('utf-8').str.encode('ascii', 'ignore') return data
Changing by hand all the offensive chars to some others. Still no effect (the quoted error was obtained after this change).
Encoding the file as
utf-16
(which, I believe, is the correct encoding since I want to be able to manipulate the file from within the excel afterwards) doesn't help either.
xlsxwriter
显式使用引擎。这似乎没有改变任何事情。定义一个函数(如下)来改变编码并丢弃坏字符。这也不会改变任何事情。
def changeencode(data): cols = data.columns for col in cols: if data[col].dtype == 'O': data[col] = data[col].str.decode('utf-8').str.encode('ascii', 'ignore') return data
手动将所有攻击性字符更改为其他字符。仍然没有效果(引用的错误是在此更改后获得的)。
将文件编码为
utf-16
(我相信这是正确的编码,因为我希望之后能够在 excel 中操作文件)也无济于事。
I believe that the problem is in the file itself (because of 2 and 3) but I have no idea how to get around it. I'd appreciate any help. The beginning of the file is pasted below.
我相信问题出在文件本身(因为 2 和 3),但我不知道如何解决它。我很感激任何帮助。文件的开头粘贴在下面。
"Submitted","your-name","youremail","phone","miasto","cityCF","innemiasto","languagesCF","morelanguages","wiek","partnerCF","messageCF","acceptance-795","Submitted Login","Submitted From","2015-12-25 14:07:58 +00:00","Zózia kry?","[email protected]","4444444","Wroc?aw","","testujemy polskie znaki","Polski","testujemy polskie znaki","44","test","test","1","Justyna","99.111.155.132",
EDIT
编辑
Some code (one of the versions, without the splitting part):
一些代码(其中一个版本,没有拆分部分):
import pandas as pd
import string
import xlsxwriter
df = pd.read_csv('path-to-file.csv')
with pd.ExcelWriter ('test.xlsx') as writer:
df.to_excel(writer, sheet_name = 'sheet1',engine='xlsxwriter')
采纳答案by jjj
Supposedly this was a bug in the version of pandas
which I was using back then.
Right now, in pandas
ver. 0.19.2, the code below saves the csv from the question without any trouble (and with correct encoding).
NB: openpyxl
module have to be installed on your system.
据说这是pandas
我当时使用的版本中的一个错误。现在,在pandas
ver. 0.19.2,下面的代码从问题中保存了 csv 没有任何问题(并且具有正确的编码)。
注意:openpyxl
模块必须安装在您的系统上。
import pandas as pd
df = pd.read_csv('Desktop/test.csv')
df.to_excel('Desktop/test.xlsx', encoding='utf8')
回答by Siva Arasu
Try encoding the columns with non-ascii characters as
尝试将非 ascii 字符的列编码为
df['col'] = df['col'].apply(lambda x: unicode(x))
and then save the file to xlsx format with encoding 'utf8'
然后将文件保存为 xlsx 格式,编码为“utf8”
回答by rbinnun
What if you save the csv files from pandas and then use win32com to convert to Excel. It would look something like this...
如果您从 Pandas 保存 csv 文件,然后使用 win32com 转换为 Excel 会怎样。它看起来像这样......
import win32com.client
excel = win32com.client.Dispatch("Excel.Application")
excel.Visible = 0
for x in range(10):
f = path + str(x)
# not showing the pandas dataframe creation
df.to_csv(f+'.csv')
wb = excel.Workbooks.Open(f+'.csv')
wb.SaveAs(f+'.xlsx', 51) #xlOpenXMLWorkbook=51