Original URL: http://stackoverflow.com/questions/45148292/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Python Pandas read_excel dtype str replace nan by blank ('') when reading or when writing via to_csv
Question by panda
Python version: Python 2.7.13 :: Anaconda custom (64-bit)
Pandas version: pandas 0.20.2
Hello,
I have a fairly simple requirement: I would like to read an Excel file and write a specific sheet to a CSV file. Blank cells in the source Excel file should be written as blank in the CSV file. However, my blank cells are always written as nan (without quotes) in the output file.
I read the Excel file with:
read_excel(xlsx, sheetname='sheet1', dtype = str)
I am specifying dtype because some columns contain numbers that should be treated as strings (otherwise they might lose leading zeros, etc.). In other words, I would like to read the exact value of every cell.
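To illustrate why dtype=str matters for leading zeros, here is a minimal sketch. It uses read_csv on a small in-memory string instead of read_excel on a real workbook (an assumption made so the example needs no Excel file or Excel engine); the sample data is hypothetical and only mirrors COLUMNA from the example below.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the Excel sheet. read_csv is used here
# only because it needs no Excel engine; the dtype idea is the same.
raw = "COLUMNA,COLUMNB\n01,test\n02,test\n"

default = pd.read_csv(io.StringIO(raw))             # COLUMNA is parsed as integers
as_text = pd.read_csv(io.StringIO(raw), dtype=str)  # every column is kept as text

print(default["COLUMNA"].tolist())  # [1, 2]: leading zeros lost
print(as_text["COLUMNA"].tolist())  # ['01', '02']: leading zeros kept
```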
Now I write the output .csv file via:
to_csv(output_file,index=False,mode='wb',sep=',',encoding='utf-8')
However, the resulting CSV file contains nan for every blank cell in the Excel file.
What am I missing? I already tried the .fillna('', inplace=True) function, but it seems to do nothing to my data. I also tried adding the parameter na_rep='' to the to_csv method, without success.
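One way to see what is going on is to compare a real missing value with the text 'nan'. The sketch below builds two small hypothetical frames by hand (it does not call read_excel) and shows that isnull(), and therefore fillna and na_rep, only recognise the real NaN, which matches the accepted answer's point that the blanks here are 'nan' strings.

```python
import numpy as np
import pandas as pd

# Real missing value: np.nan
real = pd.DataFrame({"COLUMNA": ["01", "02"], "COLUMNB": [np.nan, "test"]})
# What the question's output suggests read_excel(dtype=str) produced: the text 'nan'
text = pd.DataFrame({"COLUMNA": ["01", "02"], "COLUMNB": ["nan", "test"]})

print(real.isnull().sum().sum())  # 1: fillna / na_rep have something to act on
print(text.isnull().sum().sum())  # 0: nothing for fillna / na_rep to do

# With no path argument, to_csv returns the CSV as a string
real_csv = real.to_csv(index=False, na_rep="")  # second line is "01,"
text_csv = text.to_csv(index=False, na_rep="")  # second line is still "01,nan"
print(real_csv)
print(text_csv)
```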
Thanks for any help!
Addendum: here is a reproducible example.
Please first create a new Excel file with 3 columns and the following content:
COLUMNA  COLUMNB  COLUMNC
01                test
02       test
03                test
(I saved this Excel file to c:\test.xls. Please note that the 1st and 3rd rows of column B, as well as the 2nd row of column C, are blank/empty.)
Now here is my code:
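import pandas as pd
# Raw strings (r'...') stop '\t' in the Windows paths from being read as a tab character
xlsx = pd.ExcelFile(r'c:\test.xlsx')
# sheetname is the pandas 0.20 spelling; later versions use sheet_name
df = pd.read_excel(xlsx, sheetname='Sheet1', dtype=str)
df.fillna('', inplace=True)
df.to_csv(r'c:\test.csv', index=False, mode='wb', sep=',', encoding='utf-8', na_rep='')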
My result is:
COLUMNA,COLUMNB,COLUMNC
01,nan,test
02,test,nan
03,nan,test
My desired result would be:
COLUMNA,COLUMNB,COLUMNC
01,,test
02,test,
03,,test
Accepted answer by cs95
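Since you are dealing with 'nan' strings, you can use the replace function. With dtype=str, the blank cells ended up in the DataFrame as the literal text 'nan' rather than as real NaN values, which is why fillna('') and na_rep='' (both of which act only on real missing values) had no effect: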
df = pd.DataFrame({'Col1' : ['nan', 'foo', 'bar', 'baz', 'nan', 'test']})
df.replace('nan', '')
Col1
0
1 foo
2 bar
3 baz
4
5 test
All 'nan' string values will be replaced by the empty string ''. replace is not in-place, so make sure you assign the result back:
df = df.replace('nan', '')
You can then write it to your file using to_csv.
If you are actually looking to fill real NaN values with blanks, use fillna:
df = df.fillna('')
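If a frame might contain both forms of blank (real NaN values and 'nan' strings), the two calls can be chained. A small sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical mix: one real NaN and one 'nan' string in the same column.
df = pd.DataFrame({"COLUMNA": ["01", "02"], "COLUMNB": [np.nan, "nan"]})

# fillna('') blanks the real NaN; replace('nan', '') blanks the text 'nan'.
cleaned = df.fillna("").replace("nan", "")
print(cleaned["COLUMNB"].tolist())  # ['', '']
```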