Original source: http://stackoverflow.com/questions/42306755/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
How to remove illegal characters so a dataframe can write to Excel
Asked by user4896331
I am trying to write a dataframe to an Excel spreadsheet using ExcelWriter, but it keeps returning an error:
openpyxl.utils.exceptions.IllegalCharacterError
I'm guessing there's some character in the dataframe that ExcelWriter doesn't like. It seems odd, because the dataframe is formed from three Excel spreadsheets, so I can't see how there could be a character that Excel doesn't like!
Is there any way to iterate through a dataframe and replace characters that ExcelWriter doesn't like? I don't even mind if it simply deletes them.
What's the best way of removing or replacing illegal characters from a dataframe?
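Before removing anything, it can help to see which cells actually trigger the error. The sketch below is an illustration, not part of the original thread: it assumes the illegal characters are the ASCII control characters other than tab, newline and carriage return (the set openpyxl's ILLEGAL_CHARACTERS_RE is understood to cover), and the names ILLEGAL_CHARS, find_illegal_cells and sample are hypothetical. It uses only the standard library and works on a list of row dicts.

```python
import re

# Assumption: this pattern mirrors openpyxl's ILLEGAL_CHARACTERS_RE, i.e. ASCII
# control characters other than tab (\x09), newline (\x0a) and carriage return (\x0d).
ILLEGAL_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')


def find_illegal_cells(rows):
    """Return (row_index, column_name, repr_of_value) for every offending cell.

    `rows` is a list of dicts, one dict per row.
    """
    hits = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            # Only strings can carry control characters; numbers are skipped.
            if isinstance(value, str) and ILLEGAL_CHARS.search(value):
                hits.append((i, column, repr(value)))
    return hits


# Hypothetical sample data: the second name contains a vertical tab (\x0b).
sample = [
    {'name': 'Alice', 'amount': 10},
    {'name': 'Bob\x0bSmith', 'amount': 20},
]
print(find_illegal_cells(sample))
```

With pandas, a list of row dicts like this can be obtained from dataframe.to_dict('records'). Using repr() makes the otherwise invisible control character show up as an escape sequence in the report.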
Answered by user4896331
Based on Haipeng Su's answer, I added a function that does this:
dataframe = dataframe.applymap(
    lambda x: x.encode('unicode_escape').decode('utf-8') if isinstance(x, str) else x)
Basically, it escapes the offending characters: control characters and non-ASCII characters in string cells are turned into visible backslash sequences such as \x0b, so nothing illegal reaches the writer. It worked and I can now write to Excel spreadsheets again!
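To make the effect of the escaping concrete, here is a small standard-library sketch of the same per-cell transformation. The helper name escape_cell is hypothetical; the encode/decode calls are the ones used in the answer above.

```python
def escape_cell(value):
    """Escape strings the way the answer's lambda does; leave other values alone."""
    if isinstance(value, str):
        return value.encode('unicode_escape').decode('utf-8')
    return value


# A vertical tab (\x0b) is a control character; after escaping it is
# replaced by the four visible characters backslash, x, 0, b.
print(escape_cell('bad\x0bcell'))   # bad\x0bcell
# Side effect: non-ASCII text is escaped too.
print(escape_cell('café'))          # caf\xe9
# Non-string values pass through unchanged.
print(escape_cell(42))              # 42
```

The second call shows the trade-off of this approach: accented and other non-ASCII characters are also rewritten as escape sequences, so it may not be suitable if the spreadsheet needs to keep such text readable.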
Answered by Jialin Zou
Trying a different Excel writer engine solved my problem.
writer = pd.ExcelWriter('file.xlsx', engine='xlsxwriter')
Answered by mathsyouth
The same problem happened to me. I solved it as follows:
- Install the Python package xlsxwriter:
pip install xlsxwriter
- Replace the default engine 'openpyxl' with 'xlsxwriter':
dataframe.to_excel("file.xlsx", engine='xlsxwriter')
Answered by Haipeng Su
I was also struggling with some weird characters in a data frame when writing it to HTML or CSV. For example, I couldn't write characters with accents to an HTML file, so I needed to convert them to their unaccented equivalents.
My method may not be the best, but it helps me convert unicode strings into ASCII-compatible ones.
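# install unidecode first (pip install unidecode)
from unidecode import unidecode

def FormatString(s):
    # only strings need cleaning (`unicode` in Python 2, `str` in Python 3)
    if isinstance(s, str):
        try:
            s.encode('ascii')
            return s
        except UnicodeEncodeError:
            # transliterate non-ASCII characters to their closest ASCII form
            return unidecode(s)
    else:
        return s

# note: pandas 2.1+ deprecates applymap in favor of DataFrame.map
df2 = df1.applymap(FormatString)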
In your situation, if you just want to get rid of the illegal characters, change return unidecode(s) to return 'StringYouWantToReplace'.
Hope this gives you some ideas for dealing with your problem.
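If installing unidecode is not an option, a similar effect for accented letters can be sketched with the standard library. This is a swapped-in technique (Unicode NFKD normalization), not what the answer above uses, and the names to_ascii and placeholder are hypothetical. Characters with no ASCII decomposition are replaced by the placeholder rather than transliterated.

```python
import unicodedata


def to_ascii(value, placeholder=''):
    """Return an ASCII-only version of a string; leave other values alone.

    Assumption: this uses the standard library instead of unidecode, so
    characters without an ASCII decomposition are replaced by `placeholder`.
    """
    if not isinstance(value, str):
        return value
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize('NFKD', value)
    kept = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent itself
        kept.append(ch if ord(ch) < 128 else placeholder)
    return ''.join(kept)


print(to_ascii('café'))        # accent removed
print(to_ascii('5€', '?'))     # euro sign has no ASCII form, so it becomes '?'
```

Note that ASCII control characters such as \x0b pass through this function unchanged, because they are already ASCII. Converting to ASCII therefore deals with accents, but on its own it does not remove the control characters that openpyxl rejects.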
Answered by REdim.Learning
If you're still struggling to clean up the characters, this worked well for me:
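import xlwings as xw
import pandas as pd

# raw strings keep the backslashes in Windows paths from being read as escapes
df = pd.read_pickle(r'C:\Users\User1\picked_DataFrame_notWriting.df')
topath = r'C:\Users\User1\tryAgain.xlsx'

# open the existing workbook in Excel and write the dataframe starting at cell A1
wb = xw.Book(topath)
ws = wb.sheets['Data']
ws.range('A1').options(index=False).value = df
wb.save()
wb.close()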
Answered by miri
Just remove the illegal characters from your dataframe before exporting it to Excel.
import pandas as pd
import re
from openpyxl.cell.cell import ILLEGAL_CHARACTERS_RE

# replace illegal characters in str values by ''
# using the compiled regex ILLEGAL_CHARACTERS_RE defined in the openpyxl.cell.cell module
# (the original Python 2 code also checked for `unicode`, which Python 3 folds into `str`)
mydataframe = mydataframe.applymap(
    lambda x: re.sub(ILLEGAL_CHARACTERS_RE, '', x) if isinstance(x, str) else x)

# [optional] mode='a' appends to your existing workbook instead of overwriting it;
# the original `writer.book = workbook` assignment is rejected by recent pandas versions
writer = pd.ExcelWriter(myexcelfilepath, engine='openpyxl', mode='a')

# export your cleaned dataframe to excel
mydataframe.to_excel(writer, sheet_name='targetsheetname', index=False)
writer.close()