Pandas DataFrame - Replace NULL String with Blank and NULL Numeric with 0

Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/52873804/

Time: 2020-09-14 06:05:44  Source: igfitidea

python, pandas

Asked by HMan06

I am working on a large dataset with many columns of different types. There is a mix of numeric values and strings, with some NULL values. I need to change each NULL value to a blank or 0, depending on the column type.

1   John   2    Doe   3   Mike   4    Orange   5   Stuff
9   NULL   NULL NULL  8   NULL   NULL Lemon    12  NULL

I want it to look like this:

1   John   2    Doe   3   Mike   4    Orange   5   Stuff
9          0          8          0    Lemon    12  

I can do this for each individual column, but since I am going to be pulling several extremely large datasets with hundreds of columns, I'd like to do this some other way.

Edit: dtypes from a smaller dataset:

Field1              object
Field2              object
Field3              object
Field4              object
Field5              object
Field6              object
Field7              object
Field8              object
Field9              object
Field10              float64
Field11              float64
Field12              float64
Field13              float64
Field14              float64
Field15              object
Field16              float64
Field17              object
Field18              object
Field19              float64
Field20              float64
Field21              int64

Answered by jezrael

Use DataFrame.select_dtypes to get the numeric columns, fill the missing values in that subset with 0, then replace the missing values in all other columns with an empty string:

print (df)
   0     1    2    3  4     5    6       7   8      9
0  1  John  2.0  Doe  3  Mike  4.0  Orange   5  Stuff
1  9   NaN  NaN  NaN  8   NaN  NaN   Lemon  12    NaN

print (df.dtypes)
0      int64
1     object
2    float64
3     object
4      int64
5     object
6    float64
7     object
8      int64
9     object
dtype: object

import numpy as np

# numeric columns get 0, every remaining column gets an empty string
c = df.select_dtypes(np.number).columns
df[c] = df[c].fillna(0)
df = df.fillna("")
print (df)
   0     1    2    3  4     5    6       7   8      9
0  1  John  2.0  Doe  3  Mike  4.0  Orange   5  Stuff
1  9        0.0       8        0.0   Lemon  12       
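
For reuse across several datasets, the two steps above can be wrapped in a small helper. This is only a sketch: the function name fill_by_dtype and the sample column names are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd


def fill_by_dtype(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with NaN -> 0 in numeric columns and NaN -> "" elsewhere."""
    out = df.copy()
    # Step 1: fill only the numeric subset with 0.
    num_cols = out.select_dtypes(include=np.number).columns
    out[num_cols] = out[num_cols].fillna(0)
    # Step 2: whatever is still missing sits in non-numeric columns.
    return out.fillna("")


# Hypothetical sample data, loosely modelled on the question.
sample = pd.DataFrame(
    {
        "id": [1, 9],
        "first": ["John", np.nan],
        "score": [2.0, np.nan],
        "fruit": ["Orange", "Lemon"],
    }
)
filled = fill_by_dtype(sample)
print(filled)
```

Note that a numeric column that contained NaN stays float64 after filling, which is why the output above shows 0.0 rather than 0.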

Another solution is to create a dictionary of replacement values:

num_cols = df.select_dtypes(np.number).columns
d1 = dict.fromkeys(num_cols, 0)
d2 = dict.fromkeys(df.columns.difference(num_cols), "")

d  = {**d1,  **d2}
print (d)
{0: 0, 2: 0, 4: 0, 6: 0, 8: 0, 1: '', 3: '', 5: '', 7: '', 9: ''}

df = df.fillna(d)
print (df)
   0     1    2    3  4     5    6       7   8      9
0  1  John  2.0  Doe  3  Mike  4.0  Orange   5  Stuff
1  9        0.0       8        0.0   Lemon  12       
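
As a rough end-to-end sketch of the dictionary approach, assume the data arrives as CSV text in which missing entries are the literal string NULL (an assumption, since the question does not say how the data is loaded). pandas.read_csv treats NULL as a missing value by default, so those cells become NaN and the per-column dictionary can be passed straight to fillna. The column names here are hypothetical.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV input; "NULL" is parsed as NaN by read_csv's default NA values.
csv_text = "id,first,score,fruit\n1,John,2,Orange\n9,NULL,NULL,Lemon\n"
df = pd.read_csv(io.StringIO(csv_text))

# Build one fill value per column: 0 for numeric columns, "" for the rest.
num_cols = df.select_dtypes(include=np.number).columns
fill_values = {
    **dict.fromkeys(num_cols, 0),
    **dict.fromkeys(df.columns.difference(num_cols), ""),
}

df = df.fillna(fill_values)
print(df)
```

Building the dictionary once also makes it easy to inspect or override the fill value for a specific column before calling fillna.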

Answered by Andrea

You could try this to substitute a different value for each column (A to C are numeric, while D is a string):

import pandas as pd
import numpy as np

df_pd = pd.DataFrame([[np.nan, 2, np.nan, '0'],
        [3, 4, np.nan, '1'],
        [np.nan, np.nan, np.nan, '5'],
        [np.nan, 3, np.nan, np.nan]],
        columns=list('ABCD'))

df_pd.fillna(value={'A':0.0,'B':0.0,'C':0.0,'D':''})
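
One detail worth spelling out: fillna returns a new DataFrame and leaves the original untouched unless the result is assigned. A minimal sketch using the same data follows; the variable name filled is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df_pd = pd.DataFrame(
    [[np.nan, 2, np.nan, "0"],
     [3, 4, np.nan, "1"],
     [np.nan, np.nan, np.nan, "5"],
     [np.nan, 3, np.nan, np.nan]],
    columns=list("ABCD"),
)

# Keep the result; df_pd itself still contains the NaN values afterwards.
filled = df_pd.fillna(value={"A": 0.0, "B": 0.0, "C": 0.0, "D": ""})
print(filled)
print(df_pd.isna().sum().sum())  # missing values remaining in the original
```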