Python: how to get the average of dataframe column values

Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/16689514/

Date: 2020-08-18 23:22:45  Source: igfitidea

how to get the average of dataframe column values

python pandas dataframe

Asked by badideas

                    A        B
DATE                 
2013-05-01        473077    71333
2013-05-02         35131    62441
2013-05-03           727    27381
2013-05-04           481     1206
2013-05-05           226     1733
2013-05-06           NaN     4064
2013-05-07           NaN    41151
2013-05-08           NaN     8144
2013-05-09           NaN       23
2013-05-10           NaN       10

say i have the dataframe above. what is the easiest way to get a series with the same index whose values are the row-wise average of columns A and B? the average needs to ignore NaN values. the twist is that the solution needs to stay flexible when new columns are added to the dataframe.

the closest i have come was

df.sum(axis=1) / len(df.columns)

however, this does not seem to ignore the NaN values
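
A minimal sketch of why this attempt gives unexpected numbers, assuming the frame from the question is rebuilt by hand (the names `naive` and `fixed` are illustrative and not from the original post). `df.sum(axis=1)` does skip NaN, but `len(df.columns)` still counts every column, so a row with a missing A is divided by 2 instead of 1:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (values copied from the table).
df = pd.DataFrame(
    {
        "A": [473077, 35131, 727, 481, 226] + [np.nan] * 5,
        "B": [71333, 62441, 27381, 1206, 1733, 4064, 41151, 8144, 23, 10],
    },
    index=pd.date_range("2013-05-01", periods=10, name="DATE"),
)

# The attempt from the question: the divisor is always the column count (2).
naive = df.sum(axis=1) / len(df.columns)

# Dividing by the number of non-NaN cells in each row gives the intended mean.
fixed = df.sum(axis=1) / df.count(axis=1)

print(naive.iloc[5], fixed.iloc[5])  # row 2013-05-06, where A is NaN
```

For the 2013-05-06 row, `naive` halves the single valid value while `fixed` returns it unchanged.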

(note: i am still a bit new to the pandas library, so i'm guessing there's an obvious way to do this that my limited brain is simply not seeing)

Accepted answer by DSM

Simply using df.mean() will Do The Right Thing(tm) with respect to NaNs, since it skips them by default (skipna=True):

>>> df
                 A      B
DATE                     
2013-05-01  473077  71333
2013-05-02   35131  62441
2013-05-03     727  27381
2013-05-04     481   1206
2013-05-05     226   1733
2013-05-06     NaN   4064
2013-05-07     NaN  41151
2013-05-08     NaN   8144
2013-05-09     NaN     23
2013-05-10     NaN     10
>>> df.mean(axis=1)
DATE
2013-05-01    272205.0
2013-05-02     48786.0
2013-05-03     14054.0
2013-05-04       843.5
2013-05-05       979.5
2013-05-06      4064.0
2013-05-07     41151.0
2013-05-08      8144.0
2013-05-09        23.0
2013-05-10        10.0
dtype: float64

You can use df[["A", "B"]].mean(axis=1) if there are other columns to ignore.
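
As a rough illustration of both cases, here is a small sketch on a three-row subset of the data. The extra column `C` and its values are made up for the example, and `all_cols` and `only_ab` are hypothetical names:

```python
import numpy as np
import pandas as pd

# Three rows taken from the question's table, enough to show the behaviour.
df = pd.DataFrame(
    {"A": [473077.0, 727.0, np.nan], "B": [71333.0, 27381.0, 4064.0]},
    index=pd.DatetimeIndex(["2013-05-01", "2013-05-03", "2013-05-06"], name="DATE"),
)

# A newly added column (hypothetical) is picked up by the row-wise mean
# without changing the call; NaN cells are skipped per row.
df["C"] = [10.0, np.nan, 20.0]
all_cols = df.mean(axis=1)

# Selecting columns first restricts the mean to A and B only.
only_ab = df[["A", "B"]].mean(axis=1)

print(all_cols)
print(only_ab)
```

The unrestricted call satisfies the "flexible to new columns" requirement from the question, while the column selection is the opt-out when some columns should not take part in the average.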
