Python: row-wise average for a subset of columns with missing values
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/34734940/
Row-wise average for a subset of columns with missing values
Asked by scrollex
I've got a DataFrame that has occasional missing values and looks something like this:
       Monday  Tuesday  Wednesday
=================================
Mike       42      NaN         12
Jenna     NaN      NaN         15
Jon        21        4          1
I'd like to add a new column to my data frame in which I'd calculate the average across all columns for every row.
Meaning, for Mike, I'd need (df['Monday'] + df['Wednesday'])/2, but for Jenna, I'd simply use df['Wednesday']/1.
Does anyone know the best way to account for this variation that results from missing values and calculate the average?
Accepted answer by Stefan
You can simply do:
df['avg'] = df.mean(axis=1)
       Monday  Tuesday  Wednesday        avg
Mike       42      NaN         12  27.000000
Jenna     NaN      NaN         15  15.000000
Jon        21        4          1   8.666667
This works because .mean() ignores missing values by default (skipna=True), so each row is averaged over only the values that are present: see docs.
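For anyone who wants to run this end to end, here is a minimal sketch. The way df is built is an assumption based on the table in the question (names as the index, NaN for the missing entries), and the manual variable is only there to illustrate what skipna does; neither is part of the original answer.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's table, with names as the index.
df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)

days = ["Monday", "Tuesday", "Wednesday"]

# Hand-rolled equivalent: sum of the present values divided by their count.
# sum() skips NaN by default and count() counts only non-missing cells.
manual = df[days].sum(axis=1) / df[days].count(axis=1)

# mean(axis=1) averages across columns for each row and skips NaN by default.
df["avg"] = df[days].mean(axis=1)

print(df)
```

The manual computation matches the .mean() result row for row, which is exactly the per-row divisor the question asks about.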
To average over only a subset of the columns, select them first:
df['avg'] = df[['Monday', 'Tuesday']].mean(axis=1)
       Monday  Tuesday  Wednesday   avg
Mike       42      NaN         12  42.0
Jenna     NaN      NaN         15   NaN
Jon        21        4          1  12.5
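Note that Jenna's average is NaN here: both selected columns are missing for her, so there is nothing to average. A small sketch of the subset case follows; the DataFrame construction and the name cols are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's table, with names as the index.
df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)

# Hypothetical name for the subset of columns to average over.
cols = ["Monday", "Tuesday"]

# Rows with at least one value in cols get the mean of the present values;
# rows where every selected column is missing come out as NaN.
df["avg"] = df[cols].mean(axis=1)

print(df)
```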
Answer by Amir F
Alternatively, select the columns by position with iloc (loc would also work here):
df['avg'] = df.iloc[:,0:2].mean(axis=1)
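As a rough illustration of the iloc and loc variants side by side: iloc slices by position and excludes the end of the slice, while loc slices by label and includes it. The DataFrame construction and the names by_position and by_label are assumptions for this sketch.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's table, with names as the index.
df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)

# Positional slice: columns 0 and 1 (the end position, 2, is excluded).
by_position = df.iloc[:, 0:2].mean(axis=1)

# Label slice: "Monday" through "Tuesday" (the end label is included).
by_label = df.loc[:, "Monday":"Tuesday"].mean(axis=1)

# Both select the same two columns, so the results agree.
df["avg"] = by_position
print(by_position.equals(by_label))
```

The positional form depends on column order, so the label-based form may be the safer choice if columns could be reordered or inserted later.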