Original URL: http://stackoverflow.com/questions/28837057/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas - Writing an excel file containing unicode - IllegalCharacterError
Question by killajoule
I have the following code:
import pandas as pd
x = [u'string with some unicode: \x16']
df = pd.DataFrame(x)
If I try to write this dataframe as an excel file:
df.to_excel("test.xlsx")
Or, if I try to write this dataframe as an excel file, with utf-8 encoding:
ew = pd.ExcelWriter('test.xlsx',options={'encoding':'utf-8'})
df.to_excel(ew)
I get the following error:
IllegalCharacterError Traceback (most recent call last)
<ipython-input-4-62adec25ae8d> in <module>()
1 ew = pd.ExcelWriter('test.xlsx',options={'encoding':'utf-8'})
2 #df.to_excel("test.xlsx")
----> 3 df.to_excel(ew)
/usr/local/lib/python2.7/dist-packages/pandas/util/decorators.pyc in wrapper(*args, **kwargs)
86 else:
87 kwargs[new_arg_name] = new_arg_value
---> 88 return func(*args, **kwargs)
89 return wrapper
90 return _deprecate_kwarg
/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in to_excel(self, excel_writer, sheet_name, na_rep, float_format, columns, header, index, index_label, startrow, startcol, engine, merge_cells, encoding, inf_rep)
1258 formatted_cells = formatter.get_formatted_cells()
1259 excel_writer.write_cells(formatted_cells, sheet_name,
-> 1260 startrow=startrow, startcol=startcol)
1261 if need_save:
1262 excel_writer.save()
/usr/local/lib/python2.7/dist-packages/pandas/io/excel.pyc in write_cells(self, cells, sheet_name, startrow, startcol)
679 colletter = get_column_letter(startcol + cell.col + 1)
680 xcell = wks.cell("%s%s" % (colletter, startrow + cell.row + 1))
--> 681 xcell.value = _conv_value(cell.val)
682 style_kwargs = {}
683
/usr/local/lib/python2.7/dist-packages/openpyxl/cell/cell.pyc in value(self, value)
360 def value(self, value):
361 """Set the value and infer type and display options."""
--> 362 self._bind_value(value)
363
364 @property
/usr/local/lib/python2.7/dist-packages/openpyxl/cell/cell.pyc in _bind_value(self, value)
269 elif self.guess_types:
270 value = self._infer_value(value)
--> 271 self.set_explicit_value(value, self.data_type)
272
273
/usr/local/lib/python2.7/dist-packages/openpyxl/cell/cell.pyc in set_explicit_value(self, value, data_type)
235 raise ValueError('Invalid data type: %s' % data_type)
236 if isinstance(value, STRING_TYPES):
--> 237 value = self.check_string(value)
238 self._value = value
239 self.data_type = data_type
/usr/local/lib/python2.7/dist-packages/openpyxl/cell/cell.pyc in check_string(self, value)
220 value = value[:32767]
221 if next(ILLEGAL_CHARACTERS_RE.finditer(value), None):
--> 222 raise IllegalCharacterError
223 return value
224
IllegalCharacterError:
How can I write a pandas dataframe containing unicode to an excel file?
Accepted answer by bobince
Not a Unicode issue as such... \x16 (or, in Unicode strings, \u0016, which refers to the same character) is ASCII control code 22 (SYN). Pandas (through its openpyxl writer, as the traceback shows) says it's invalid to have control codes (other than tab and newlines) in an Excel file, and though I don't know much about Excel files, it would certainly be impossible to include them in an XML 1.0 file, which is what's inside an xlsx.
So most likely there is no way to include arbitrary character sequences (with control codes) in an Excel file. You should filter them out before writing, or, if you really need to preserve the original data, use some form of ad hoc encoding recognised only by your application.
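A minimal sketch of both options in plain Python, assuming the offending characters are exactly those XML 1.0 forbids (the same set I believe openpyxl's ILLEGAL_CHARACTERS_RE, seen in the traceback, rejects). The function names and the backslash-x escape scheme are hypothetical, chosen only for illustration.

```python
import re

# Characters XML 1.0 forbids, and hence an .xlsx cell cannot hold: every
# code point below U+0020 except tab (U+0009), LF (U+000A) and CR (U+000D).
# Assumed to match the check that raised IllegalCharacterError in openpyxl.
ILLEGAL_XML_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')


def find_illegal_chars(text):
    """Return the code points of all illegal characters found in text."""
    return [ord(m.group()) for m in ILLEGAL_XML_CHARS.finditer(text)]


def remove_control_chars(value):
    """Option 1: silently drop illegal characters from string cells."""
    if isinstance(value, str):
        return ILLEGAL_XML_CHARS.sub('', value)
    return value  # numbers, dates, None, ... are left untouched


def escape_control_chars(value):
    """Option 2 (hypothetical ad hoc encoding): write each illegal character
    as a backslash-x-HH escape, doubling real backslashes so that the
    transformation can be undone later."""
    if not isinstance(value, str):
        return value
    value = value.replace('\\', '\\\\')
    return ILLEGAL_XML_CHARS.sub(lambda m: '\\x%02x' % ord(m.group()), value)


def unescape_control_chars(value):
    """Inverse of escape_control_chars, for reading the data back."""
    if not isinstance(value, str):
        return value

    def restore(m):
        token = m.group(1)
        return '\\' if token == '\\' else chr(int(token[1:], 16))

    return re.sub(r'\\(\\|x[0-9a-f]{2})', restore, value)


cell = u'string with some unicode: \x16'
print(find_illegal_chars(cell))  # [22]  (U+0016, SYN)
print(repr(remove_control_chars(cell)))
print(repr(escape_control_chars(cell)))
```

With pandas, either function can be applied to every cell before writing, for example df = df.applymap(remove_control_chars) (DataFrame.map in pandas 2.1 and later); unescape_control_chars restores escaped data after it is read back.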
Answer by mathsyouth
The same problem happened to me. I solved it as follows:
First, install the Python package xlsxwriter:
pip install xlsxwriter
Second, replace the default engine 'openpyxl' with 'xlsxwriter':
df.to_excel("test.xlsx", engine='xlsxwriter')
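One likely reason the xlsxwriter engine avoids the error: as far as I know, XlsxWriter does not reject control characters but stores them using the _xHHHH_ escape that Office Open XML defines for characters XML cannot hold, which Excel is expected to decode when it opens the file. The sketch below is a hypothetical illustration of that convention (ooxml_escape and ooxml_unescape are made-up names, not XlsxWriter's API), and the exact set of characters XlsxWriter escapes may differ.

```python
import re

# Assumption: modelled on the OOXML "_xHHHH_" convention. Characters XML
# cannot store are written as _xHHHH_ (4 hex digits), and text that already
# looks like such a sequence has its leading underscore escaped as _x005F_.
ILLEGAL_XML_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')
ESCAPE_LIKE = re.compile(r'(_x[0-9A-Fa-f]{4}_)')
ENCODED = re.compile(r'_x([0-9A-Fa-f]{4})_')


def ooxml_escape(text):
    """Encode text so that it only contains XML-legal characters."""
    # Protect literal text that merely looks like an escape sequence.
    text = ESCAPE_LIKE.sub(r'_x005F\1', text)
    # Replace each illegal control character with its _xHHHH_ form.
    return ILLEGAL_XML_CHARS.sub(lambda m: '_x%04X_' % ord(m.group()), text)


def ooxml_unescape(text):
    """Decode _xHHHH_ sequences back into the characters they stand for."""
    return ENCODED.sub(lambda m: chr(int(m.group(1), 16)), text)


print(ooxml_escape(u'string with some unicode: \x16'))
# string with some unicode: _x0016_
```

Because the control character is encoded rather than dropped, this approach keeps the original data, at the cost of depending on the reader to decode the escapes.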
Answer by Krishna Aswani
Use this to avoid the error: it converts every string cell to backslash escape sequences, so control characters such as \x16 no longer appear in the data. You can save to Excel after this.
df = df.applymap(lambda x: x.encode('unicode_escape').decode('utf-8')
                 if isinstance(x, str) else x)
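To see what that lambda does to a single cell, here it is as a plain function, plus a hypothetical inverse (unescape_cell is not part of the original answer) for when the data has to be read back. This is a sketch assuming all escaped cells are left unedited in Excel.

```python
def escape_cell(x):
    # Same transformation as the lambda above: str -> backslash escapes.
    if isinstance(x, str):
        return x.encode('unicode_escape').decode('utf-8')
    return x


def unescape_cell(x):
    # Hypothetical inverse: the escaped text is pure ASCII, so decoding it
    # with 'unicode_escape' restores the original characters.
    if isinstance(x, str):
        return x.encode('utf-8').decode('unicode_escape')
    return x


print(escape_cell(u'string with some unicode: \x16'))  # string with some unicode: \x16
print(escape_cell(u'caf\xe9'))                          # caf\xe9
print(escape_cell(u'tab\there'))                        # tab\there
```

Note that this escapes every non-ASCII character (é becomes \xe9), tabs and newlines too, not just the offending control codes, so accented text will not display as-is in the sheet.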
Answer by Samuel Nde
When I encounter this error, I usually get around it by writing the file to a '.csv' file instead of an '.xlsx' file.
So instead of
yourdataframe.to_excel('Your workbook name.xlsx')
I would do:
yourdataframe.to_csv('Your workbook name.csv')
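A quick check of why this works, using the standard-library csv module as a stand-in for DataFrame.to_csv (an assumption for the sake of a dependency-free sketch): CSV is plain text, so nothing prevents a control character such as U+0016 from being written and read back.

```python
import csv
import io

# The problematic cell from the question, written to an in-memory CSV.
row = [u'string with some unicode: \x16']

buffer = io.StringIO()
csv.writer(buffer).writerow(row)

# Read it back: the control character survives unchanged.
buffer.seek(0)
round_tripped = next(csv.reader(buffer))
print(round_tripped == row)  # True
```

The trade-off is that a CSV file loses everything specific to .xlsx (multiple sheets, formatting, typed cells).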
The default encoding pandas uses when writing .csv files appears to be:
encoding : string, optional
A string representing the encoding to use in the output file,
defaults to 'ascii' on Python 2 and 'utf-8' on Python 3.
On the other hand, the default encoding for .xlsx files is:
encoding: string, default None
encoding of the resulting excel file. Only necessary for xlwt,
other writers support unicode natively.
This difference is responsible for that error. You will also get the error when you write data with strings that start with - or + to an .xlsx file.
Answer by Louise
I don't know this particular language, but in general Excel has problems with UTF-8. If you simply open a file of UTF-8 characters with Excel programmatically, it will corrupt them (it does not seem to handle all the bits of the character, but effectively truncates it to the first 2 and last 2 of the 8 hex digits present in extended characters).
A workaround for loading a UTF-8 file correctly into Excel is to have your program insert a macro into the Excel sheet after loading it, which then imports the data. I have some code to do this in C#, if that's any help.
Does your input contain any extended characters (i.e. accented characters such as à, è, é, ê, ì, ò, ü, ù), and if you take them out, does it work?
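A small diagnostic for that question, as a hypothetical helper not taken from any of the answers: it lists every non-ASCII or control character with its position and Unicode category. For the question's string it reports only U+0016, an ASCII control code rather than an extended character.

```python
import unicodedata


def describe_unusual_chars(text):
    """List (position, code point, category) for non-ASCII or control characters.

    Tabs and newlines are control characters too, so they are reported as well.
    """
    report = []
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        if ord(ch) > 0x7F or category == 'Cc':
            report.append((i, 'U+%04X' % ord(ch), category))
    return report


print(describe_unusual_chars(u'string with some unicode: \x16'))
# [(26, 'U+0016', 'Cc')]
```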

