What is the difference between the skew and kurtosis functions in pandas and scipy?

Question

Asked by lin_bug

I decided to compare skew and kurtosis functions in pandas and scipy.stats, and don't understand why I'm getting different results between libraries.

As far as I can tell from the documentation, both kurtosis functions use Fisher's definition, whereas for skew the descriptions are not detailed enough to tell whether there are any major differences in how the two are computed.

import numpy as np
import pandas as pd
from scipy import stats as st

heights = np.array([1.46, 1.79, 2.01, 1.75, 1.56, 1.69, 1.88, 1.76, 1.88, 1.78])

print("skewness:", st.skew(heights))
print("kurtosis:", st.kurtosis(heights))

This returns:

skewness: -0.393524456473
kurtosis: -0.330672097724

whereas if I convert to a pandas DataFrame:

heights_df = pd.DataFrame(heights)
print("skewness:", heights_df.skew())
print("kurtosis:", heights_df.kurtosis())

This returns:

skewness: 0   -0.466663
kurtosis: 0    0.379705

Apologies if I've posted this in the wrong place; not sure if it's a stats or a programming question.

Answer 1

Answered by BrenBarn

The difference is due to different normalizations. Scipy by default does not correct for bias, whereas pandas does.
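
For reference, the two normalizations can be written out as follows. This is a sketch of the standard sample formulas, assuming that pandas and scipy with bias=False both use the usual adjusted estimators; the symbols g_1, g_2, G_1, G_2 and m_k are notation introduced here, not names from either library.

```latex
% Biased (uncorrected) estimators, built from central moments m_k
m_k = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^k, \qquad
g_1 = \frac{m_3}{m_2^{3/2}}, \qquad
g_2 = \frac{m_4}{m_2^{2}} - 3

% Bias-corrected estimators
G_1 = \frac{\sqrt{n(n-1)}}{n-2}\, g_1, \qquad
G_2 = \frac{n-1}{(n-2)(n-3)}\bigl[(n+1)\, g_2 + 6\bigr]
```

With the n = 10 heights from the question, the skewness factor is sqrt(90)/8 ≈ 1.1859, and 1.1859 × (-0.393524) ≈ -0.46666. For kurtosis, (9/56) × (11 × (-0.330672) + 6) ≈ 0.379705. Both agree with the pandas output in the question.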

You can tell scipy to correct for bias by passing the bias=False argument:

>>> import numpy as np
>>> import pandas
>>> from scipy import stats
>>> x = pandas.Series(np.random.randn(10))
>>> stats.skew(x)
-0.17644348972413657
>>> x.skew()
-0.20923623968879457
>>> stats.skew(x, bias=False)
-0.2092362396887948
>>> stats.kurtosis(x)
0.6362620964462327
>>> x.kurtosis()
2.0891062062174464
>>> stats.kurtosis(x, bias=False)
2.089106206217446
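
Because the session above uses random data, its numbers cannot be reproduced exactly. The same check can be sketched on the fixed heights array from the question; the variable names below are illustrative only.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Data taken from the question.
heights = np.array([1.46, 1.79, 2.01, 1.75, 1.56, 1.69, 1.88, 1.76, 1.88, 1.78])
series = pd.Series(heights)

# scipy defaults to bias=True, i.e. no bias correction.
scipy_biased_skew = stats.skew(heights)
scipy_biased_kurt = stats.kurtosis(heights)

# With bias=False scipy applies the correction.
scipy_corrected_skew = stats.skew(heights, bias=False)
scipy_corrected_kurt = stats.kurtosis(heights, bias=False)

# pandas applies the correction unconditionally.
pandas_skew = series.skew()
pandas_kurt = series.kurtosis()

print("scipy (default): ", scipy_biased_skew, scipy_biased_kurt)
print("scipy bias=False:", scipy_corrected_skew, scipy_corrected_kurt)
print("pandas:          ", pandas_skew, pandas_kurt)
```

The last two printed lines should agree up to floating-point rounding, while the first line reproduces the scipy values reported in the question.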

There does not appear to be a way to tell pandas to remove the bias correction.
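
If the uncorrected values are needed for data held in pandas, one workaround is to compute them outside pandas. The sketch below shows two options: passing the underlying array to scipy, or computing the central moments directly. The helper name biased_skew_kurtosis is hypothetical, and dropping NaN values is an assumption made here to mirror pandas' default skipna behaviour.

```python
import numpy as np
import pandas as pd
from scipy import stats


def biased_skew_kurtosis(series):
    """Hypothetical helper: uncorrected skewness and excess kurtosis of a Series."""
    # Drop missing values, as pandas does by default (skipna=True).
    values = series.dropna().to_numpy(dtype=float)
    deviations = values - values.mean()
    # Central moments with 1/n normalization (no bias correction).
    m2 = np.mean(deviations ** 2)
    m3 = np.mean(deviations ** 3)
    m4 = np.mean(deviations ** 4)
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # Fisher's definition
    return skewness, excess_kurtosis


s = pd.Series([1.46, 1.79, 2.01, 1.75, 1.56, 1.69, 1.88, 1.76, 1.88, 1.78])

# Option 1: hand the values to scipy, which does not correct for bias by default.
scipy_skew = stats.skew(s.dropna().to_numpy())
scipy_kurt = stats.kurtosis(s.dropna().to_numpy())

# Option 2: compute the moments by hand.
manual_skew, manual_kurt = biased_skew_kurtosis(s)

print("scipy: ", scipy_skew, scipy_kurt)
print("manual:", manual_skew, manual_kurt)
```

Both options should give the same uncorrected values, which differ from what s.skew() and s.kurtosis() return.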
