pandas: How to remove illegal characters so a dataframe can be written to Excel

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/42306755/

Time: 2020-09-14 03:00:46  Source: igfitidea

How to remove illegal characters so a dataframe can write to Excel

pandas export-to-excel

Asked by user4896331

I am trying to write a dataframe to an Excel spreadsheet using ExcelWriter, but it keeps returning an error:

openpyxl.utils.exceptions.IllegalCharacterError

I'm guessing there's some character in the dataframe that ExcelWriter doesn't like. It seems odd, because the dataframe is formed from three Excel spreadsheets, so I can't see how there could be a character that Excel doesn't like!

Is there any way to iterate through a dataframe and replace characters that ExcelWriter doesn't like? I don't even mind if it simply deletes them.

What's the best way of removing or replacing illegal characters from a dataframe?

Answered by user4896331

Based on Haipeng Su's answer, I added a function that does this:

dataframe = dataframe.applymap(lambda x: x.encode('unicode_escape').
                 decode('utf-8') if isinstance(x, str) else x)

Basically, it escapes the unicode characters if they exist. It worked and I can now write to Excel spreadsheets again!
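
To see what this escaping does to individual values, here is a minimal sketch using only the standard library. The helper name escape_cell is hypothetical (it is not part of the answer); it applies the same expression as the lambda above to a few sample values. Note that the control character is not deleted: it is turned into a visible backslash sequence, and non-ASCII characters and newlines are escaped as well.

```python
def escape_cell(value):
    """Apply the same escaping as the lambda above to a single cell value."""
    if isinstance(value, str):
        return value.encode('unicode_escape').decode('utf-8')
    return value  # non-string values pass through unchanged

# Hypothetical sample values: plain text, a vertical tab (a control
# character), an accented letter, a newline, and a number.
samples = ["plain text", "vertical\x0btab", "café", "two\nlines", 42]
escaped = [escape_cell(v) for v in samples]

for before, after in zip(samples, escaped):
    print(repr(before), "->", repr(after))
```

If accented text should stay readable in the spreadsheet, removing only the control characters is probably the better fit.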

Answered by Jialin Zou

Trying a different Excel writer engine solved my problem.

writer = pd.ExcelWriter('file.xlsx', engine='xlsxwriter')

Answered by mathsyouth

The same problem happened to me. I solved it as follows:

  1. Install the Python package xlsxwriter:
pip install xlsxwriter
  2. Replace the default engine 'openpyxl' with 'xlsxwriter':
dataframe.to_excel("file.xlsx", engine='xlsxwriter')

Answered by Haipeng Su

I was also struggling with some weird characters in a data frame when writing it to HTML or CSV. For example, I couldn't write characters with accents to an HTML file, so I needed to convert them into characters without the accent.

My method may not be the best, but it helps me convert a unicode string into an ASCII-compatible one.

# install unidecode first (pip install unidecode)
from unidecode import unidecode

def FormatString(s):
    # Only strings need converting; other values pass through unchanged.
    # (The original Python 2 version checked isinstance(s, unicode).)
    if isinstance(s, str):
        try:
            # Pure-ASCII strings are returned as-is
            s.encode('ascii')
            return s
        except UnicodeEncodeError:
            # Otherwise transliterate to the closest ASCII representation
            return unidecode(s)
    else:
        return s

df2 = df1.applymap(FormatString)

In your situation, if you just want to get rid of the illegal characters, you can change return unidecode(s) to return 'StringYouWantToReplace'.

Hope this gives you some ideas for dealing with your problem.
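
If installing unidecode is not an option, a rough standard-library approximation is possible. This sketch swaps in a different technique (Unicode NFKD decomposition, then dropping whatever is still non-ASCII), so it is not equivalent to unidecode: characters without a decomposition, such as 'ø', are dropped rather than transliterated. The function name strip_accents is hypothetical.

```python
import unicodedata

def strip_accents(value):
    """Reduce a string to ASCII by decomposing accented characters.

    Not a drop-in replacement for unidecode: anything that does not
    decompose into an ASCII base character is silently removed.
    """
    if not isinstance(value, str):
        return value  # leave numbers, None, etc. untouched
    decomposed = unicodedata.normalize("NFKD", value)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(strip_accents("naïve résumé"))
print(strip_accents("søren"))
```

ASCII control characters are left untouched by this sketch, so it addresses accent problems like the ones described in this answer and does not by itself remove control characters.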

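Answered by REdim.Learning

If you're still struggling to clean up the characters, this worked well for me:

import xlwings as xw
import pandas as pd

# Raw strings keep the backslashes in Windows paths from being read as escapes
df = pd.read_pickle(r'C:\Users\User1\picked_DataFrame_notWriting.df')
topath = r'C:\Users\User1\tryAgain.xlsx'

# Open the existing workbook; it must already contain a sheet named 'Data'
wb = xw.Book(topath)
ws = wb.sheets['Data']

# Write the dataframe starting at cell A1, without the index column
ws.range('A1').options(index=False).value = df
wb.save()
wb.close()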

Answered by miri

Just remove the illegal characters from your dataframe before exporting it to Excel.

import pandas as pd
import re
import openpyxl
from openpyxl.cell.cell import ILLEGAL_CHARACTERS_RE


writer = pd.ExcelWriter(myexcelfilepath, engine='openpyxl')

# [optional] avoid pandas.DataFrame.to_excel overwriting your existing workbook
# (recent pandas versions may not allow assigning writer.book; there,
# opening the writer with mode='a' is the usual alternative)
workbook = openpyxl.load_workbook(myexcelfilepath)
writer.book = workbook

# replace illegal characters in str values by ''
# using the regex ILLEGAL_CHARACTERS_RE defined in the openpyxl.cell.cell module
mydataframe = mydataframe.applymap(
               lambda x: re.sub(ILLEGAL_CHARACTERS_RE, '', x)
               if isinstance(x, str) else x)

# export your cleaned dataframe to excel
mydataframe.to_excel(writer, sheet_name='targetsheetname', index=False)
writer.close()
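
For reference, the same cleaning can be sketched without importing openpyxl. The pattern below is an assumption: it is meant to mirror openpyxl's ILLEGAL_CHARACTERS_RE (ASCII control characters other than tab, newline and carriage return). The names ILLEGAL_CHARS, remove_illegal_chars and find_illegal_cells are hypothetical. The second helper reports which cells contain such characters, which may help track down where they came from.

```python
import re

# Assumption: mirrors openpyxl's ILLEGAL_CHARACTERS_RE, i.e. ASCII control
# characters except tab (\x09), newline (\x0a) and carriage return (\x0d).
ILLEGAL_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')

def remove_illegal_chars(value):
    """Strip the control characters from strings; pass other values through."""
    if isinstance(value, str):
        return ILLEGAL_CHARS.sub('', value)
    return value

def find_illegal_cells(rows):
    """Return (row_index, column_index, repr of value) for offending cells."""
    hits = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            if isinstance(value, str) and ILLEGAL_CHARS.search(value):
                hits.append((r, c, repr(value)))
    return hits

# Hypothetical sample data standing in for dataframe rows
rows = [["ok", 1.5], ["bad\x0bvalue", "tab\tis fine"]]
print(find_illegal_cells(rows))

cleaned = [[remove_illegal_chars(v) for v in row] for row in rows]
print(cleaned)
```

With a dataframe, remove_illegal_chars could be passed to applymap in the same way as the lambda above.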