Python Pandas read_excel dtype str: replace nan by blank ('') when reading or when writing via to_csv

Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/45148292/

Time: 2020-08-19 16:44:59  Source: igfitidea

Tags: python excel csv pandas nan

Asked by panda

Python version: Python 2.7.13 :: Anaconda custom (64-bit)
Pandas version: pandas 0.20.2

Hello,

I have a quite simple requirement. I would like to read an excel file and write a specific sheet to a csv file. Blank values in the source Excel file should be treated / written as blank when writing the csv file. However, my blank records are always written as 'nan' to the output file. (without the quotes)

I read the Excel file via method

read_excel(xlsx, sheetname='sheet1', dtype = str)

I am specifying dtype because I have some columns that are numbers but should be treated as string. (Otherwise they might lose leading 0s etc) i.e. I would like to read the exact value from every cell.

Now I write the output .csv file via to_csv(output_file,index=False,mode='wb',sep=',',encoding='utf-8')

However, my result csv file contains nan for all blank cells from the excel file.

What am I missing? I already tried .fillna('', inplace=True) function but it seems to be doing nothing to my data. I also tried to add parameter na_rep ='' to the to_csv method but without success.

Thanks for any help!

Addendum: here is a reproducible example. First create a new Excel file with 3 columns and the following content:

COLUMNA  COLUMNB  COLUMNC
01                test
02       test
03                test

(I saved this Excel file to c:\test.xls. Please note that the 1st and 3rd rows of column B, as well as the 2nd row of column C, are blank/empty.)

Now here is my code:

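import pandas as pd

# Raw strings keep '\t' in the Windows paths from being read as a tab.
xlsx = pd.ExcelFile(r'c:\test.xlsx')
# dtype=str reads every cell as text so leading zeros survive
# (sheetname is the pandas 0.20 spelling of this parameter).
df = pd.read_excel(xlsx, sheetname='Sheet1', dtype=str)
df.fillna('', inplace=True)
df.to_csv(r'c:\test.csv', index=False, mode='wb', sep=',', encoding='utf-8', na_rep='')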

My result is:
COLUMNA,COLUMNB,COLUMNC
01,nan,test
02,test,nan
03,nan,test

My desired result would be:
COLUMNA,COLUMNB,COLUMNC
01,,test
02,test,
03,,test

Accepted answer by cs95

Since you are dealing with 'nan' strings, you can use the replace function:

df = pd.DataFrame({'Col1' : ['nan', 'foo', 'bar', 'baz', 'nan', 'test']})
df.replace('nan', '')

   Col1
0      
1   foo
2   bar
3   baz
4      
5  test

All 'nan' string values will be replaced by the empty string ''. replace is not in-place, so make sure you assign the result back:

df = df.replace('nan', '')

You can then write it to your file using to_csv.
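
As a rough end-to-end sketch of the steps above: the frame below is a hypothetical stand-in for what read_excel with dtype=str appears to have produced in the question (blank cells showing up as the literal string 'nan'), and the CSV is written to an in-memory buffer instead of a file so the result can be inspected directly.

```python
import io

import pandas as pd

# Hypothetical stand-in for the frame from the question: blank cells
# have arrived as the literal string 'nan', not as a real NaN.
df = pd.DataFrame({
    'COLUMNA': ['01', '02', '03'],
    'COLUMNB': ['nan', 'test', 'nan'],
    'COLUMNC': ['test', 'nan', 'test'],
})

# replace returns a new frame, so assign the result back.
df = df.replace('nan', '')

# Write to an in-memory text buffer rather than a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

The leading zeros in COLUMNA survive because the values are strings, and the former 'nan' cells come out as empty fields.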

If you are actually looking to fill NaN values with blank, use fillna:

df = df.fillna('')
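
The difference between the two cases may explain why fillna seemed to do nothing in the question: fillna only touches real missing values, while the text 'nan' is an ordinary string. A small sketch with a made-up Series (the values are illustrative, not taken from the question's file):

```python
import numpy as np
import pandas as pd

# One literal 'nan' string, one real missing value, one normal value.
s = pd.Series(['nan', np.nan, 'foo'], dtype=object)

# Only the real NaN counts as missing.
missing_mask = s.isna().tolist()

# fillna leaves the 'nan' string untouched.
filled = s.fillna('').tolist()

# Chaining both calls covers both cases.
cleaned = s.replace('nan', '').fillna('').tolist()

print(missing_mask)
print(filled)
print(cleaned)
```

If it is unclear which kind of value a column holds, chaining replace and fillna as in the last step handles both.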