Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/25140998/

Date: 2020-08-18 19:47:10  Source: igfitidea

Pandas: compute mean or std (standard deviation) over entire dataframe

python numpy pandas

Question by jrjc

Here is my problem: I have a dataframe like this:

    Depr_1  Depr_2  Depr_3
S3       0       5       9
S2       4      11       8
S1       6      11      12
S5       0       4      11
S4       4       8       8

and I just want to calculate the mean over the full dataframe, as the following doesn't work (it returns one mean per column):

df.mean()
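To see what df.mean() actually returns, here is a minimal sketch that rebuilds the example dataframe by hand (the construction code is an assumption; only the values come from the table above). By default DataFrame.mean() reduces along axis=0, so the result is a Series with one mean per column rather than a single number.

```python
import pandas as pd

# Example dataframe from the question (values copied from the table)
df = pd.DataFrame(
    {
        "Depr_1": [0, 4, 6, 0, 4],
        "Depr_2": [5, 11, 11, 4, 8],
        "Depr_3": [9, 8, 12, 11, 8],
    },
    index=["S3", "S2", "S1", "S5", "S4"],
)

# axis=0 is the default: one mean per column, returned as a Series
col_means = df.mean()
print(col_means)  # values: Depr_1 -> 2.8, Depr_2 -> 7.8, Depr_3 -> 9.6
```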

Then I came up with:

df.mean().mean()
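The mean of the column means matches the overall mean here only because every column has the same number of values (five). The following minimal sketch uses a hypothetical copy, df_nan, with one cell blanked out. It shows that the two numbers diverge once the column counts differ, because pandas skips NaN when it computes each column mean.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Depr_1": [0, 4, 6, 0, 4],
     "Depr_2": [5, 11, 11, 4, 8],
     "Depr_3": [9, 8, 12, 11, 8]},
    index=["S3", "S2", "S1", "S5", "S4"],
)

overall = df.to_numpy().mean()        # mean of all 15 values
mean_of_means = df.mean().mean()      # mean of the 3 column means
print(overall, mean_of_means)         # equal: each column has 5 values

# Hypothetical variant: blank out one cell so Depr_1 has only 4 values
df_nan = df.astype(float)
df_nan.iloc[0, 0] = np.nan
overall_nan = np.nanmean(df_nan.to_numpy())   # 101 / 14
mean_of_means_nan = df_nan.mean().mean()      # (3.5 + 7.8 + 9.6) / 3
print(overall_nan, mean_of_means_nan)         # no longer equal
```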

But this trick won't work for computing the standard deviation. My final attempts were:

df.to_numpy().mean()    # originally df.get_values(), which was removed in pandas 1.0
df.to_numpy().std()

Except that in this case, mean() and std() are numpy's functions, not pandas'. That's not a problem for the mean, but it is for std: the pandas function uses ddof=1 by default, whereas the numpy one uses ddof=0.
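The ddof ("delta degrees of freedom") parameter sets the divisor. The variance is the sum of squared deviations from the mean divided by N - ddof, and the standard deviation is its square root. The minimal sketch below compares the two defaults on the example data. It also shows why the mean-of-means trick has no analogue for the standard deviation: averaging the per-column standard deviations ignores the spread between the column means.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Depr_1": [0, 4, 6, 0, 4],
     "Depr_2": [5, 11, 11, 4, 8],
     "Depr_3": [9, 8, 12, 11, 8]},
    index=["S3", "S2", "S1", "S5", "S4"],
)

values = df.to_numpy().ravel()          # all 15 values as a flat ndarray

np_std = values.std()                   # numpy default: ddof=0, divides by N
pd_std = pd.Series(values).std()        # pandas default: ddof=1, divides by N - 1
print(np_std, pd_std)                   # roughly 3.732 vs 3.863

# Passing ddof explicitly makes numpy agree with pandas
print(values.std(ddof=1))

# Averaging the per-column standard deviations gives a different number
print(df.std().mean())                  # roughly 2.59
```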

Accepted answer by JohnE

You could convert the dataframe to a single column with stack (this changes the shape from 5x3 to 15x1) and then take the standard deviation:

df.stack().std()         # pandas default degrees of freedom is one
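Here is a minimal sketch of what stack() does to the example data. The 5x3 frame becomes a Series of 15 values indexed by (row label, column label), and Series.std() then applies the pandas default ddof=1.

```python
import pandas as pd

df = pd.DataFrame(
    {"Depr_1": [0, 4, 6, 0, 4],
     "Depr_2": [5, 11, 11, 4, 8],
     "Depr_3": [9, 8, 12, 11, 8]},
    index=["S3", "S2", "S1", "S5", "S4"],
)

stacked = df.stack()          # Series with a (row, column) MultiIndex
print(stacked.shape)          # (15,)
print(stacked.head(3))        # (S3, Depr_1), (S3, Depr_2), (S3, Depr_3)

overall_std = stacked.std()   # sample standard deviation (ddof=1)
print(overall_std)
```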

Alternatively, you can use values to convert the pandas dataframe to a numpy array before taking the standard deviation:

df.values.std(ddof=1)    # numpy default degrees of freedom is zero

Unlike pandas, numpy will give the standard deviation of the entire array by default, so there is no need to reshape before taking the standard deviation.
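On a 2-D array, numpy's std() uses axis=None by default, which flattens the array first. Passing axis=0 reproduces the per-column behaviour of df.std(). The sketch below uses DataFrame.to_numpy(), which pandas recommends over .values since version 0.24. Both return the same array for this all-integer frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Depr_1": [0, 4, 6, 0, 4],
     "Depr_2": [5, 11, 11, 4, 8],
     "Depr_3": [9, 8, 12, 11, 8]},
    index=["S3", "S2", "S1", "S5", "S4"],
)

arr = df.to_numpy()                        # 5x3 integer array

whole = arr.std(ddof=1)                    # axis=None: all 15 values at once
per_column = arr.std(axis=0, ddof=1)       # one value per column, like df.std()

print(whole)
print(per_column)
```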

A couple of additional notes:

  • The numpy approach here is faster than the pandas one, which is generally true when you can accomplish the same thing with either numpy or pandas. The speed difference depends on the size of your data, but numpy was roughly 10x faster when I tested a few differently sized dataframes on my laptop (numpy version 1.15.4 and pandas version 0.23.4).

  • The numpy and pandas approaches here will not give exactly the same answers, but they will be extremely close (identical to several digits of precision). The discrepancy comes from slight differences in the implementations behind the scenes, which affect how the floating point values get rounded.

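You can check both notes on your own machine with a rough sketch like the following. The random 100,000 x 3 dataframe is an arbitrary, hypothetical test input, and the timings will vary with hardware and library versions. The two results may differ in the last few bits, so they are compared with np.isclose rather than ==.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 100,000 rows of standard-normal values
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100_000, 3)), columns=["a", "b", "c"])

pandas_std = df.stack().std()
numpy_std = df.to_numpy().std(ddof=1)

# Exact equality is not guaranteed; compare with a tolerance instead
print(pandas_std == numpy_std)
print(np.isclose(pandas_std, numpy_std))

# Rough timing of each approach (10 repetitions each)
t_pandas = timeit.timeit(lambda: df.stack().std(), number=10)
t_numpy = timeit.timeit(lambda: df.to_numpy().std(ddof=1), number=10)
print(f"pandas: {t_pandas:.4f} s, numpy: {t_numpy:.4f} s")
```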