Original source: http://stackoverflow.com/questions/42870536/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas Fillna of Multiple Columns with Mode of Each Column
Asked by Nick
Working with census data, I want to replace NaNs in two columns ("workclass" and "native-country") with the respective modes of those two columns. I can get the modes easily:
mode = df.filter(["workclass", "native-country"]).mode()
which returns a dataframe:
workclass native-country
0 Private United-States
However,
df.filter(["workclass", "native-country"]).fillna(mode)
does not replace the NaNs in either column with anything, let alone the mode corresponding to that column. Is there a smooth way to do this?
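A likely explanation for the behavior described in the question: when fillna receives a DataFrame, pandas aligns it with the caller on both row labels and column labels, and the mode frame only has row label 0. The sketch below illustrates this on made-up toy data (not the census data set); the names by_frame and by_series are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, not the census data from the question).
df = pd.DataFrame({
    "workclass": ["Private", np.nan, "Private", np.nan],
    "native-country": ["United-States", "Canada", np.nan, "United-States"],
})
cols = ["workclass", "native-country"]
mode = df[cols].mode()  # one-row DataFrame whose only index label is 0

# DataFrame-valued fill: aligned on index AND columns,
# so only a NaN sitting in row 0 could ever be filled.
by_frame = df[cols].fillna(mode)

# Series-valued fill: the Series index (the column names) is matched
# against the columns, so every row of each column is filled.
by_series = df[cols].fillna(mode.iloc[0])

print(by_frame.isna().sum().sum())   # NaNs left after the DataFrame fill
print(by_series.isna().sum().sum())  # NaNs left after the Series fill
```

In this toy frame none of the NaNs are in row 0, so the DataFrame-valued fill changes nothing, while the Series-valued fill removes every NaN.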
Answer by jezrael
If you want to impute missing values with the mode in some columns of a DataFrame df, you can call fillna with a Series, created by selecting the first row of the mode DataFrame by position with iloc:
cols = ["workclass", "native-country"]
df[cols]=df[cols].fillna(df.mode().iloc[0])
Or:
df[cols]=df[cols].fillna(mode.iloc[0])
Your solution:
df[cols]=df.filter(cols).fillna(mode.iloc[0])
Sample:
import numpy as np
import pandas as pd

df = pd.DataFrame({'workclass':['Private','Private',np.nan, 'another', np.nan],
'native-country':['United-States',np.nan,'Canada',np.nan,'United-States'],
'col':[2,3,7,8,9]})
print (df)
col native-country workclass
0 2 United-States Private
1 3 NaN Private
2 7 Canada NaN
3 8 NaN another
4 9 United-States NaN
mode = df.filter(["workclass", "native-country"]).mode()
print (mode)
workclass native-country
0 Private United-States
cols = ["workclass", "native-country"]
df[cols]=df[cols].fillna(df.mode().iloc[0])
print (df)
col native-country workclass
0 2 United-States Private
1 3 United-States Private
2 7 Canada Private
3 8 United-States another
4 9 United-States Private
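One detail worth keeping in mind with the iloc[0] approach: mode() can return more than one row when several values tie for most frequent, and shorter columns are padded with NaN. Taking iloc[0] then picks the first tied value in sorted order. A minimal sketch on hypothetical data (the column names a and b are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": ["x", "y", np.nan, "y", "x"],  # "x" and "y" tie for the mode
    "b": ["p", "p", "q", np.nan, "p"],  # "p" is the only mode
})

modes = df.mode()
print(modes)  # two rows: column "b" is padded with NaN in the second row

# The first row holds one mode per column; ties resolve to the
# smallest value because mode() returns its results sorted.
first_mode = modes.iloc[0]
filled = df.fillna(first_mode)
print(filled)
```

If a different tie-breaking rule matters for the data, the fill Series would need to be built by hand instead of relying on iloc[0].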
Answer by Miriam Farber
You can do it like this:
df[["workclass", "native-country"]]=df[["workclass", "native-country"]].fillna(value=mode.iloc[0])
For example,
import pandas as pd
d={
'key3': [1,4,4,4,5],
'key2': [6,6,4],
'key1': [6,4,4],
}
df=pd.DataFrame.from_dict(d,orient='index').transpose()
Then df is:
key3 key2 key1
0 1 6 6
1 4 6 4
2 4 4 4
3 4 NaN NaN
4 5 NaN NaN
Then by doing:
l=df.filter(["key1", "key2"]).mode()
df[["key1", "key2"]]=df[["key1", "key2"]].fillna(value=l.iloc[0])
we get that df is:
key3 key2 key1
0 1 6 6
1 4 6 4
2 4 4 4
3 4 6 4
4 5 6 4
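The same pattern can be wrapped in a small helper so the original frame is left untouched. This is only a sketch: the function name fill_with_mode is hypothetical, and the frame is built directly with np.nan rather than through from_dict.

```python
import numpy as np
import pandas as pd

def fill_with_mode(frame, columns):
    """Return a copy of frame with NaNs in `columns` replaced by each column's mode."""
    result = frame.copy()
    # First row of mode(): one value per column, indexed by column name.
    modes = result[columns].mode().iloc[0]
    result[columns] = result[columns].fillna(modes)
    return result

df = pd.DataFrame({
    "key3": [1, 4, 4, 4, 5],
    "key2": [6, 6, 4, np.nan, np.nan],
    "key1": [6, 4, 4, np.nan, np.nan],
})

out = fill_with_mode(df, ["key1", "key2"])
print(out)
```

Note that mode() returns an empty frame when every selected column is entirely NaN, so iloc[0] would raise an IndexError in that case; the sketch does not guard against it.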
Answer by Krishna
I think it's cleanest to use a dict as the value parameter of fillna.
ref: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html
Create a toy df from @miriam-farber's response:
import pandas as pd
d={
'key3': [1,4,4,4,5],
'key2': [6,6,4],
'key1': [6,4,4],
}
d_df=pd.DataFrame.from_dict(d,orient='index').transpose()
Create a dict:
mode_dict = d_df.loc[:,['key2','key1']].mode().to_dict('records')[0]
Use this dict in the fillna method:
d_df.fillna(mode_dict, inplace=True)
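A useful property of the dict form is that only the columns named as keys are filled; NaNs elsewhere are left alone. The sketch below shows this without inplace=True, on a hypothetical variant of the toy frame in which key3 also has a missing value.

```python
import numpy as np
import pandas as pd

# Variant of the toy frame: key3 also contains a NaN (assumption for illustration).
d_df = pd.DataFrame({
    "key3": [1, np.nan, 4, 4, 5],
    "key2": [6, 6, 4, np.nan, np.nan],
    "key1": [6, 4, 4, np.nan, np.nan],
})

# Map each target column to its mode; key3 is deliberately left out.
mode_dict = d_df[["key2", "key1"]].mode().iloc[0].to_dict()
print(mode_dict)

# Returns a new frame; d_df itself is not modified.
filled = d_df.fillna(value=mode_dict)
print(filled)
```

Here key2 and key1 are filled with their modes, while the NaN in key3 remains because key3 is not a key of mode_dict.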
Answer by bhavesh singh
This code imputes the mean into the numeric columns and the mode into the object columns. It builds a list of each type of column and fills the missing values according to which list a column belongs to.
category_columns = df.select_dtypes(include=['object']).columns.tolist()
integer_columns = df.select_dtypes(include=['int64', 'float64']).columns.tolist()
for column in df:
    if df[column].isnull().any():
        if column in category_columns:
            # Object column: fill with the most frequent value.
            df[column] = df[column].fillna(df[column].mode()[0])
        elif column in integer_columns:
            # Numeric column: fill with the mean (mean must be called).
            df[column] = df[column].fillna(df[column].mean())
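The loop above can also be expressed as a single fillna call by collecting one fill value per column into a dict. This is a sketch of that alternative, not the answer author's code; the frame and the names numeric_cols, other_cols and fill_values are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [20.0, np.nan, 40.0],
    "city": ["Paris", "Paris", np.nan],
})

numeric_cols = df.select_dtypes(include="number").columns
other_cols = df.columns.difference(numeric_cols)

fill_values = {}
if len(numeric_cols):
    # Mean of each numeric column, keyed by column name.
    fill_values.update(df[numeric_cols].mean().to_dict())
if len(other_cols):
    # First mode of each remaining column, keyed by column name.
    fill_values.update(df[other_cols].mode().iloc[0].to_dict())

filled = df.fillna(fill_values)
print(filled)
```

The guards on len(...) avoid calling iloc[0] on an empty mode frame when a frame has no columns of one kind.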