Python: how to get the average of DataFrame column values
Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/16689514/
how to get the average of dataframe column values
Asked by badideas
                 A      B
DATE
2013-05-01  473077  71333
2013-05-02   35131  62441
2013-05-03     727  27381
2013-05-04     481   1206
2013-05-05     226   1733
2013-05-06     NaN   4064
2013-05-07     NaN  41151
2013-05-08     NaN   8144
2013-05-09     NaN     23
2013-05-10     NaN     10
Say I have the dataframe above. What is the easiest way to get a series with the same index which is the average of the columns A and B? The average needs to ignore NaN values. The twist is that this solution needs to be flexible to the addition of new columns to the dataframe.
The closest I have come was:
df.sum(axis=1) / len(df.columns)
However, this does not seem to ignore the NaN values.
(Note: I am still a bit new to the pandas library, so I'm guessing there's an obvious way to do this that my limited brain is simply not seeing.)
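A possible reading of why the attempt in the question falls short: `sum(axis=1)` does skip NaN by default, but `len(df.columns)` still counts the missing cells in the divisor, so a NaN ends up being treated like a zero. The sketch below uses a tiny two-row frame built from the question's first and last rows (the variable names `naive` and `nan_aware` are illustrative, not from the original post) and divides by `df.count(axis=1)` instead.

```python
import numpy as np
import pandas as pd

# Two rows taken from the question's table: one complete, one with A missing.
df = pd.DataFrame(
    {"A": [473077.0, np.nan], "B": [71333.0, 10.0]},
    index=pd.to_datetime(["2013-05-01", "2013-05-10"]),
)

# sum() skips NaN, but len(df.columns) is always 2, so the second row
# is divided by 2 even though only one value is present.
naive = df.sum(axis=1) / len(df.columns)

# count() gives the number of non-NaN cells per row, so the divisor
# shrinks wherever values are missing.
nan_aware = df.sum(axis=1) / df.count(axis=1)

print(naive)
print(nan_aware)
```

Note that a row in which every value is NaN would give 0 / 0 here, which pandas reports as NaN.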
Accepted answer by DSM
Simply using df.mean() will Do The Right Thing(tm) with respect to NaNs:
>>> df
                 A      B
DATE
2013-05-01  473077  71333
2013-05-02   35131  62441
2013-05-03     727  27381
2013-05-04     481   1206
2013-05-05     226   1733
2013-05-06     NaN   4064
2013-05-07     NaN  41151
2013-05-08     NaN   8144
2013-05-09     NaN     23
2013-05-10     NaN     10
>>> df.mean(axis=1)
DATE
2013-05-01    272205.0
2013-05-02     48786.0
2013-05-03     14054.0
2013-05-04       843.5
2013-05-05       979.5
2013-05-06      4064.0
2013-05-07     41151.0
2013-05-08      8144.0
2013-05-09        23.0
2013-05-10        10.0
dtype: float64
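For anyone who wants to reproduce the session above, here is a minimal, self-contained sketch. The way the frame is built (`pd.date_range` and a dict of lists) is an assumption, since the original post only shows the printed frame. It also contrasts the default `skipna=True` with `skipna=False` to make the NaN handling visible.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the printed values; the construction is an assumption.
dates = pd.date_range("2013-05-01", periods=10, freq="D", name="DATE")
df = pd.DataFrame(
    {
        "A": [473077, 35131, 727, 481, 226] + [np.nan] * 5,
        "B": [71333, 62441, 27381, 1206, 1733, 4064, 41151, 8144, 23, 10],
    },
    index=dates,
)

# Row-wise mean; skipna=True is the default, so NaN cells are left out
# of both the sum and the count.
row_means = df.mean(axis=1)

# With skipna=False, any row containing a NaN yields NaN instead.
strict = df.mean(axis=1, skipna=False)

print(row_means)
print(strict)
```

The result keeps the same DATE index as the frame, which is what the question asks for.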
You can use df[["A", "B"]].mean(axis=1) if there are other columns to ignore.
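As a rough illustration of the "flexible to new columns" requirement, the sketch below adds a hypothetical column `C` (not part of the original post) to a two-row excerpt of the data and compares three options: averaging every column, averaging an explicit list of columns, and dropping the columns to ignore.

```python
import numpy as np
import pandas as pd

# Two rows from the question's table, plus a hypothetical extra column C.
df = pd.DataFrame(
    {"A": [727.0, np.nan], "B": [27381.0, 4064.0]},
    index=pd.to_datetime(["2013-05-03", "2013-05-06"]),
)
df["C"] = [10.0, 20.0]  # hypothetical new column, for illustration only

# Averages every column, so newly added columns are picked up automatically.
all_cols = df.mean(axis=1)

# Averages only the listed columns; new columns are ignored until listed.
only_ab = df[["A", "B"]].mean(axis=1)

# Alternative: name the columns to leave out instead of the ones to keep.
without_c = df.drop(columns=["C"]).mean(axis=1)

print(all_cols)
print(only_ab)
```

Which form fits best depends on whether new columns should join the average by default (plain `df.mean(axis=1)`) or only after being opted in (the explicit column list).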

