Original question: http://stackoverflow.com/questions/33109107/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
What is the difference between skew and kurtosis functions in pandas vs. scipy?
Asked by lin_bug
I decided to compare skew and kurtosis functions in pandas and scipy.stats, and don't understand why I'm getting different results between libraries.
As far as I can tell from the documentation, both kurtosis functions use Fisher's definition, whereas for skew the descriptions are not detailed enough to tell whether there are any major differences in how the two are computed.
import numpy as np
import pandas as pd
import scipy.stats as st
# Sample of ten heights
heights = np.array([1.46, 1.79, 2.01, 1.75, 1.56, 1.69, 1.88, 1.76, 1.88, 1.78])
print("skewness:", st.skew(heights))
print("kurtosis:", st.kurtosis(heights))
This returns:
skewness: -0.393524456473
kurtosis: -0.330672097724
Whereas if I convert to a pandas DataFrame:
heights_df = pd.DataFrame(heights)
print("skewness:", heights_df.skew())
print("kurtosis:", heights_df.kurtosis())
This returns:
skewness: 0 -0.466663
kurtosis: 0 0.379705
Apologies if I've posted this in the wrong place; not sure if it's a stats or a programming question.
Answered by BrenBarn
The difference is due to different normalizations. By default, SciPy does not correct for bias, whereas pandas does.
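To make the normalization difference concrete, here is a minimal sketch that recomputes both sets of numbers by hand for the heights from the question. It assumes the usual sample-size adjusted Fisher-Pearson formulas are what the bias correction applies; the names g1, g2, G1 and G2 are just labels chosen for this illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

heights = np.array([1.46, 1.79, 2.01, 1.75, 1.56, 1.69, 1.88, 1.76, 1.88, 1.78])
n = len(heights)

# Central moments with the 1/n (biased) normalization.
dev = heights - heights.mean()
m2 = np.mean(dev ** 2)
m3 = np.mean(dev ** 3)
m4 = np.mean(dev ** 4)

# Biased estimators: these should match scipy's defaults (bias=True).
g1 = m3 / m2 ** 1.5        # skewness
g2 = m4 / m2 ** 2 - 3.0    # Fisher (excess) kurtosis

# Sample-size adjusted estimators: these should match pandas
# and scipy called with bias=False.
G1 = np.sqrt(n * (n - 1)) / (n - 2) * g1
G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

s = pd.Series(heights)
print("biased skew:    ", g1, stats.skew(heights))
print("biased kurtosis:", g2, stats.kurtosis(heights))
print("adjusted skew:    ", G1, s.skew(), stats.skew(heights, bias=False))
print("adjusted kurtosis:", G2, s.kurtosis(), stats.kurtosis(heights, bias=False))
```

With only ten observations the adjustment factors are far from 1, which is why the two libraries disagree so visibly here; for large samples the two sets of estimates converge.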
You can tell SciPy to correct for bias by passing the bias=False argument:
>>> import numpy as np
>>> import pandas
>>> from scipy import stats
>>> x = pandas.Series(np.random.randn(10))
>>> stats.skew(x)
-0.17644348972413657
>>> x.skew()
-0.20923623968879457
>>> stats.skew(x, bias=False)
-0.2092362396887948
>>> stats.kurtosis(x)
0.6362620964462327
>>> x.kurtosis()
2.0891062062174464
>>> stats.kurtosis(x, bias=False)
2.089106206217446
There does not appear to be a way to tell pandas to remove the bias correction.
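If the uncorrected values are needed for pandas data, one possible workaround is to hand each column to SciPy instead of calling the pandas methods. The sketch below is only an illustration: biased_skew_kurtosis is a hypothetical helper name, and dropping NaN values first is an assumption made to mimic the pandas default of skipping missing data.

```python
import pandas as pd
from scipy import stats


def biased_skew_kurtosis(df):
    """Hypothetical helper: per-column biased skewness and excess kurtosis."""
    return pd.DataFrame({
        # scipy defaults to bias=True, i.e. no sample-size correction.
        "skew": df.apply(lambda col: stats.skew(col.dropna().to_numpy())),
        "kurtosis": df.apply(lambda col: stats.kurtosis(col.dropna().to_numpy())),
    })


heights_df = pd.DataFrame(
    {"height": [1.46, 1.79, 2.01, 1.75, 1.56, 1.69, 1.88, 1.76, 1.88, 1.78]}
)
result = biased_skew_kurtosis(heights_df)
print(result)
```

The result has one row per column of the input, so the values for the question's data can be read with result.loc["height", "skew"] and result.loc["height", "kurtosis"].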

