Pandas fillna on multiple columns with the mode of each column

Question

Asked by Nick

Working with census data, I want to replace NaNs in two columns ("workclass" and "native-country") with the respective modes of those two columns. I can get the modes easily:

mode = df.filter(["workclass", "native-country"]).mode()

which returns a dataframe:

  workclass native-country
0   Private  United-States

However,

df.filter(["workclass", "native-country"]).fillna(mode)

does not replace the NaNs in each column with anything, let alone the mode corresponding to that column. Is there a smooth way to do this?

Answer 1

Answered by jezrael

If you want to impute missing values with the mode in some columns of a dataframe df, you can call fillna with a Series created by selecting the first row of the mode result by position with iloc:

cols = ["workclass", "native-country"]
df[cols]=df[cols].fillna(df.mode().iloc[0])

Or:

df[cols]=df[cols].fillna(mode.iloc[0])

Your solution:

df[cols]=df.filter(cols).fillna(mode.iloc[0])

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'workclass':['Private','Private',np.nan, 'another', np.nan],
                   'native-country':['United-States',np.nan,'Canada',np.nan,'United-States'],
                   'col':[2,3,7,8,9]})

print (df)
   col native-country workclass
0    2  United-States   Private
1    3            NaN   Private
2    7         Canada       NaN
3    8            NaN   another
4    9  United-States       NaN

mode = df.filter(["workclass", "native-country"]).mode()
print (mode)
  workclass native-country
0   Private  United-States

cols = ["workclass", "native-country"]
df[cols]=df[cols].fillna(df.mode().iloc[0])
print (df)
   col native-country workclass
0    2  United-States   Private
1    3  United-States   Private
2    7         Canada   Private
3    8  United-States   another
4    9  United-States   Private
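
The sketch below is one way to see why the .iloc[0] step matters. It assumes the same toy data as the sample above (minus the extra column); the names by_frame and by_series are introduced here only for illustration. When fillna receives a DataFrame, pandas matches the fill values on both row label and column, and the mode result only has a row labelled 0. When it receives a Series indexed by column name, the value is applied to every row of the matching column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "workclass": ["Private", "Private", np.nan, "another", np.nan],
    "native-country": ["United-States", np.nan, "Canada", np.nan, "United-States"],
})
cols = ["workclass", "native-country"]

# mode() returns a DataFrame; here it has a single row with index label 0.
mode = df[cols].mode()

# Filling with the DataFrame aligns on row label AND column name,
# so only NaNs located in row 0 could be filled. Row 0 has none.
by_frame = df[cols].fillna(mode)

# Filling with the first row (a Series indexed by column name) aligns on
# column name only, so every NaN in a column receives that column's mode.
by_series = df[cols].fillna(mode.iloc[0])

print(by_frame.isna().sum().sum())   # 4 NaNs remain
print(by_series.isna().sum().sum())  # 0 NaNs remain
```

This matches the behaviour reported in the question, where passing the mode DataFrame directly appeared to do nothing.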

Answer 2

Answered by Miriam Farber

You can do it like this:

df[["workclass", "native-country"]]=df[["workclass", "native-country"]].fillna(value=mode.iloc[0])

For example,

import pandas as pd
d={
    'key3': [1,4,4,4,5],
    'key2': [6,6,4],
    'key1': [6,4,4],
}

df=pd.DataFrame.from_dict(d,orient='index').transpose()

Then df is

  key3  key2    key1
0   1   6       6
1   4   6       4
2   4   4       4
3   4   NaN     NaN
4   5   NaN     NaN

Then by doing:

l=df.filter(["key1", "key2"]).mode()
df[["key1", "key2"]]=df[["key1", "key2"]].fillna(value=l.iloc[0])

we get that df is

  key3  key2    key1
0   1   6        6
1   4   6        4
2   4   4        4
3   4   6        4
4   5   6        4

Answer 3

Answered by Krishna

I think it's cleanest to pass a dict as the value parameter of fillna.

ref: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html

Create a toy df from @miriam-farber's response:

import pandas as pd
d={
    'key3': [1,4,4,4,5],
    'key2': [6,6,4],
    'key1': [6,4,4],
}

d_df=pd.DataFrame.from_dict(d,orient='index').transpose()

Create a dict that maps each column to its mode:

mode_dict = d_df.loc[:,['key2','key1']].mode().to_dict('records')[0]

Use this dict in the fillna method:

d_df.fillna(mode_dict, inplace=True)
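
A possible variant of the same idea, shown as a sketch rather than a replacement: build the dict one column at a time and return a new frame instead of modifying in place. The toy frame is written out with explicit None values, and target_cols and filled are names assumed for this example only. Series.mode() returns every tied mode in sorted order, so .iloc[0] picks the smallest one when there is a tie.

```python
import pandas as pd

# Same shape as the toy data above, with the missing entries written explicitly.
df = pd.DataFrame({
    "key3": [1, 4, 4, 4, 5],
    "key2": [6, 6, 4, None, None],
    "key1": [6, 4, 4, None, None],
})

target_cols = ["key2", "key1"]  # hypothetical name: the columns to impute

# Map each column name to its first mode (NaNs are ignored by mode() by default).
mode_dict = {col: df[col].mode().iloc[0] for col in target_cols}

# A dict passed to fillna fills each listed column with its own value;
# columns that are not keys of the dict are left untouched.
filled = df.fillna(mode_dict)

print(mode_dict)
print(filled)
```

Because fillna returns a new DataFrame here, the original df keeps its NaNs, which can make it easier to compare the data before and after imputation.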

Answer 4

Answered by bhavesh singh

This code imputes the mean into the numeric columns and the mode into the object columns. It first makes a list of the columns of each type, then fills the missing values in each column according to its type.

category_columns = df.select_dtypes(include=['object']).columns.tolist()
numeric_columns = df.select_dtypes(include=['int64', 'float64']).columns.tolist()

for column in df:
    if df[column].isnull().any():
        if column in category_columns:
            # object column: fill with the most frequent value
            df[column] = df[column].fillna(df[column].mode()[0])
        else:
            # numeric column: fill with the mean (mean must be called, not just referenced)
            df[column] = df[column].fillna(df[column].mean())
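
The same rule can also be wrapped in a function, sketched below under a few assumptions: impute_by_dtype and the toy frame are hypothetical names and data made up for illustration, the dtype check uses pandas.api.types.is_numeric_dtype instead of explicit dtype lists, and a column with no non-missing values is left unchanged because it has no mode.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def impute_by_dtype(frame):
    """Return a copy with NaNs filled: mean for numeric columns, mode otherwise."""
    out = frame.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())
        else:
            modes = out[col].mode()
            if not modes.empty:  # an all-NaN column has no mode; leave it as is
                out[col] = out[col].fillna(modes.iloc[0])
    return out


# Toy data, invented for this example.
toy = pd.DataFrame({
    "age": [20.0, np.nan, 40.0],
    "workclass": ["Private", np.nan, "Private"],
})
result = impute_by_dtype(toy)
print(result)
```

Working on a copy keeps the input frame unchanged, so the function can be called repeatedly on the same raw data.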
