pandas: How do I calculate the standard deviation for the rows of a DataFrame?

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/38361022/

Date: 2020-09-14 01:34:36  Source: igfitidea

How can I calculate the standard deviation for the rows of a DataFrame?

Tags: python, numpy, pandas

Asked by NamAshena

df:  

name   group   S1   S2  S3        
A      mn      1    2   8         
B      mn      4    3   5        
C      kl      5    8   2        
D      kl      6    5   5         
E      fh      7    1   3         

output: 

std (S1,S2,S3)
3.78
1
3
0.57
3.05

This works for getting the standard deviation of a column:

numpy.std(df['S1'])

I want to do the same for rows.

Answered by jezrael

You can use DataFrame.std, which skips non-numeric columns (in pandas 2.0 and later, pass numeric_only=True to get this behavior):

print(df.std(numeric_only=True))
S1    2.302173
S2    2.774887
S3    2.302173
dtype: float64

If you need the std of each row (computed across the columns), pass axis=1:

print(df.std(axis=1, numeric_only=True))
0    3.785939
1    1.000000
2    3.000000
3    0.577350
4    3.055050
dtype: float64
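
To make the row-wise call above reproducible, here is a minimal sketch that rebuilds the example frame from the question and computes one standard deviation per row. The variable names score_cols and row_std and the new column name "std" are illustrative assumptions, not part of the original answer. Selecting the numeric columns explicitly keeps the text columns out of the calculation regardless of pandas version.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "group": ["mn", "mn", "kl", "kl", "fh"],
    "S1": [1, 4, 5, 6, 7],
    "S2": [2, 3, 8, 5, 1],
    "S3": [8, 5, 2, 5, 3],
})

# Pick the numeric columns explicitly so name/group never take part.
score_cols = ["S1", "S2", "S3"]  # assumed list of columns to aggregate

# axis=1 aggregates across columns, giving one value per row (ddof=1 by default).
row_std = df[score_cols].std(axis=1)

# Optionally store the result next to the data ("std" is a hypothetical column name).
df["std"] = row_std
print(df)
```

The values in the new column should line up with the expected output listed in the question.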

If you need only some of the numeric columns, select a subset:

print (df[['S1','S2']].std())
S1    2.302173
S2    2.774887
dtype: float64

The results differ from numpy.std because the default value of the ddof (Delta Degrees of Freedom) parameter is different:

  • pandas uses ddof=1 by default
  • numpy uses ddof=0 by default
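
To illustrate what ddof changes, the sketch below computes the standard deviation by hand with divisor N - ddof, using the first row of the example (1, 2, 8). The helper std_with_ddof is a hypothetical name introduced only for this illustration; it is not a pandas or numpy function.

```python
import numpy as np

row = np.array([1, 2, 8])  # first row of the example: S1, S2, S3


def std_with_ddof(values, ddof):
    """Illustrative helper: standard deviation with divisor N - ddof."""
    values = np.asarray(values, dtype=float)
    squared_dev = (values - values.mean()) ** 2
    return np.sqrt(squared_dev.sum() / (len(values) - ddof))


sample_std = std_with_ddof(row, ddof=1)      # divisor N - 1, the pandas default
population_std = std_with_ddof(row, ddof=0)  # divisor N, the numpy default

print(sample_std, population_std)
```

With only three values per row, the gap between the two divisors is large, which is why the pandas and numpy results differ so visibly here.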

So the outputs differ:

# ddof=1 (pandas default)
print(df.std(axis=1, numeric_only=True))
0    3.785939
1    1.000000
2    3.000000
3    0.577350
4    3.055050
dtype: float64

# ddof=0 (numpy default)
print(np.std(df[['S1', 'S2', 'S3']], axis=1))
0    3.091206
1    0.816497
2    2.449490
3    0.471405
4    2.494438
dtype: float64

But you can change it easily:

# same output as the pandas function
print(np.std(df[['S1', 'S2', 'S3']], ddof=1, axis=1))
0    3.785939
1    1.000000
2    3.000000
3    0.577350
4    3.055050
dtype: float64

# same output as the numpy function
print(df.std(ddof=0, axis=1, numeric_only=True))
0    3.091206
1    0.816497
2    2.449490
3    0.471405
4    2.494438
dtype: float64   

Answered by Stefano Fedele

When an operation works on columns but not on rows, you can transpose the DataFrame first, so that each original row becomes a column:

# after transposing, the columns are the original row labels (0, 1, ...)
np.std(df[['S1', 'S2', 'S3']].transpose()[0])
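
A small sketch of the transpose idea, assuming the example frame from the question with its default integer index. After transposing, the original row labels become the column labels, so a row is selected by its label (0 for the first row) rather than by a column name such as 'S1'. The variable names below are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "group": ["mn", "mn", "kl", "kl", "fh"],
    "S1": [1, 4, 5, 6, 7],
    "S2": [2, 3, 8, 5, 1],
    "S3": [8, 5, 2, 5, 3],
})

# Keep only the numeric columns, then flip rows and columns.
numeric = df[["S1", "S2", "S3"]]
transposed = numeric.transpose()  # columns are now the original row labels 0..4

# Standard deviation of the first original row, numpy default (ddof=0).
first_row_std = np.std(transposed[0].to_numpy())

# Column-wise std of the transposed frame gives one value per original row.
all_rows_std = transposed.std(ddof=0)

print(first_row_std)
print(all_rows_std)
```

The column-wise result on the transposed frame matches numeric.std(axis=1, ddof=0), so in practice axis=1 is usually the shorter route; transposing is mainly useful when a function offers no axis argument.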