Pandas: change the format of NaN values when saving to CSV

Question

Asked by Jerry

I am working with a df and using numpy to transform data, including setting blanks (or '') to NaN. But when I write the df to CSV, the output contains the string 'nan' as opposed to NULL.

I have looked around but can't find a workable solution. Here's the basic issue:

df
index x    y   z
0     1   NaN  2
1     NaN  3   4

CSV output:

index x    y   z
0     1   nan  2
1     nan  3   4

I have tried a few things to set 'nan' to NULL, but the CSV output ends up 'blank' rather than NULL:

dfDemographics = dfDemographics.replace('nan', np.nan)
dfDemographics.replace(r'\s+( +\.)|#', np.nan, regex=True).replace('', np.nan)
dfDemographics = dfDemographics.replace('nan', '')  # of course, this wouldn't work, but tried it anyway.

Any help would be appreciated.

Answer 1

Answered by cs95

Pandas to the rescue: use na_rep to set your own representation for NaNs.

df.to_csv('file.csv', na_rep='NULL')

file.csv

,index,x,y,z
0,0,1.0,NULL,2
1,1,NULL,3.0,4
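For a self-contained check that doesn't touch the filesystem, here is a minimal sketch (the frame below is a hypothetical stand-in for the question's df). Calling to_csv with no path returns the CSV text as a string, and index=False drops the extra unnamed index column seen in the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's df
df = pd.DataFrame({"x": [1, np.nan], "y": [np.nan, 3], "z": [2, 4]})

# With no path argument, to_csv returns the CSV as a string
csv_text = df.to_csv(index=False, na_rep="NULL")
print(csv_text)
```

x and y are written as 1.0 and 3.0 because a column containing NaN is stored as float64; z has no missing values and stays integer.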

Answer 2

Answered by Kranthi Kiran

Using df.replace may help:

import numpy as np

df = df.replace(np.nan, '')  # regex=True is not needed for a NaN target; df.fillna('') is a common alternative
df.to_csv("df.csv", index=False)

(This sets all null values to '', i.e. the empty string, which is also what to_csv writes for NaN by default.)
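A minimal sketch, using a hypothetical two-row frame, of what this produces: replacing NaN with '' yields the same CSV text as to_csv's default na_rep='', and once the cells hold '' they are no longer missing values, so a later na_rep='NULL' has nothing to act on.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame with one NaN in each float column
df = pd.DataFrame({"x": [1, np.nan], "y": [np.nan, 3], "z": [2, 4]})
blank = df.replace(np.nan, "")

# Same text as writing the original frame with the default na_rep=''
default_csv = df.to_csv(index=False)
blank_csv = blank.to_csv(index=False)

# '' is an ordinary string, not NaN, so na_rep no longer applies
blank_null_csv = blank.to_csv(index=False, na_rep="NULL")
print(blank_csv == default_csv, "NULL" in blank_null_csv)
```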

Answer 3

Answered by Good Will

User @coldspeed (Answer 1) shows how to write NaN values as NULL when saving a pd.DataFrame. If, for data analysis, you instead want to replace the "NULL" values in a pd.DataFrame with np.nan, the following code will do it:

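import numpy as np, pandas as pd

# replace "NULL" strings with np.nan (np.NAN was removed in NumPy 2.0)
colNames = mydf.columns.tolist()
dfVals = mydf.values.astype(object)  # object copy so NaN can be stored in any column
matSyb = (mydf == "NULL").values      # mask of cells holding the string "NULL" (isnull() would only find existing NaNs)
dfVals[matSyb] = np.nan

mydf = pd.DataFrame(dfVals, columns=colNames, index=mydf.index)
# one-line equivalent: mydf = mydf.replace("NULL", np.nan)
#np.nansum(mydf.values, axis=0 )
#np.nansum(dfVals, axis=0 )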
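If the "NULL" strings come from a CSV file written as in Answer 1, a simpler route (a sketch, with an inline string standing in for a real file) is to let read_csv parse them: na_values=["NULL"] marks them as missing, and "NULL" is also in pandas' default list of missing-value markers. For a frame already in memory, DataFrame.replace does the same job in one call; the small frame named raw is hypothetical.

```python
import io

import numpy as np
import pandas as pd

# Inline stand-in for a CSV file written with na_rep="NULL"
csv_text = "x,y,z\n1.0,NULL,2\nNULL,3.0,4\n"
back = pd.read_csv(io.StringIO(csv_text), na_values=["NULL"])

# In-memory alternative: swap the literal string "NULL" for np.nan
raw = pd.DataFrame({"x": ["1", "NULL"], "y": ["NULL", "3"]})
cleaned = raw.replace("NULL", np.nan)

print(back.isna())
print(cleaned.isna())
```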

Answer 4

Answered by gherka

In my situation, the culprit was np.where. When the two possible return values have different data types, your np.nan is converted to the string 'nan'.

It's hard (for me) to see exactly what's going on under the hood, but I suspect the same may happen with other NumPy array functions that combine mixed types.

A minimal example:

import numpy as np
import pandas as pd

seq = [1, 2, 3, 4, np.nan]  # np.NaN was removed in NumPy 2.0
same_type_seq = np.where("parrot" == "dead", 0, seq)
diff_type_seq = np.where("parrot" == "dead", "spam", seq)

pd.Series(seq).to_csv("vanilla_nan.csv", header=False)  # as expected, last row is blank
pd.Series(same_type_seq).to_csv("samey_nan.csv", header=False)  # also blank
pd.Series(diff_type_seq).to_csv("nany_nan.csv", header=False)  # nan instead of blank
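Inspecting the arrays suggests what is happening (a sketch that rebuilds the same two arrays): np.where returns a single array with one dtype, and NumPy's type promotion turns a mix of a string and floats into a fixed-width Unicode string dtype, so the float NaN is stored as the text 'nan'. pandas then sees an ordinary string rather than a missing value.

```python
import numpy as np
import pandas as pd

seq = [1, 2, 3, 4, np.nan]
same_type_seq = np.where("parrot" == "dead", 0, seq)
diff_type_seq = np.where("parrot" == "dead", "spam", seq)

print(same_type_seq.dtype)       # float64, so NaN survives as a float
print(diff_type_seq.dtype.kind)  # U, every element became a Unicode string
print(str(diff_type_seq[-1]))    # nan, but now just text

# pandas only treats a real NaN as missing, not the string "nan"
print(pd.Series(same_type_seq).isna().tolist())
print(pd.Series(diff_type_seq).isna().tolist())
```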

So how do you get around this? I'm not too sure, but as a hacky workaround for small datasets, you can replace NaN in your original sequence with a token string and then replace it back with np.nan afterwards:

repl = "missing"  # token string standing in for NaN
hacky_seq = np.where("parrot" == "dead", "spam", [repl if np.isnan(x) else x for x in seq])
pd.Series(hacky_seq).replace({repl: np.nan}).to_csv("hacky_nan.csv", header=False)
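A possible alternative to the token trick, sketched under the assumption that an object-dtype result is acceptable: passing object arrays to np.where avoids the string promotion, so the float NaN is kept and pandas recognises it as missing. This swaps in a different technique from the workaround above; obj_seq is an illustrative name.

```python
import numpy as np
import pandas as pd

seq = [1, 2, 3, 4, np.nan]

# Object arrays hold arbitrary Python objects, so no common string dtype is forced
obj_seq = np.where(
    "parrot" == "dead",
    np.array("spam", dtype=object),
    np.array(seq, dtype=object),
)

# NaN is still a float, so na_rep applies (the field would be blank by default)
csv_text = pd.Series(obj_seq).to_csv(header=False, na_rep="NULL")
print(csv_text)
```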
