Original URL: http://stackoverflow.com/questions/37647961/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How does pandas calculate skew
Question by piRSquared
I'm calculating a coskew matrix and wanted to double-check my calculation against pandas' built-in skew method. I could not reconcile my result with how pandas performs the calculation.
I define my series as:
import pandas as pd
series = pd.Series(
{0: -0.051917457635120283,
1: -0.070071606515280632,
2: -0.11204865874074735,
3: -0.14679988245503134,
4: -0.088062467095565145,
5: 0.17579741198527793,
6: -0.10765856028420773,
7: -0.11971470229167547,
8: -0.15169210769159247,
9: -0.038616800990881606,
10: 0.16988162977411481,
11: 0.092999418364443032}
)
I compared the following calculations and expected them to be the same.
pandas
series.skew()
1.1119637586658944
me
(((series - series.mean()) / series.std(ddof=0)) ** 3).mean()
0.967840223081231
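For reference, the "me" calculation is the ordinary (biased) sample skewness g1 = m3 / m2**1.5, where m2 and m3 are the population (divide-by-n) central moments. A small sketch confirming that, assuming scipy is available; scipy.stats.skew with its default bias=True computes the same quantity:

```python
import pandas as pd
from scipy import stats

# Same 12 values as the question's series (index 0..11)
series = pd.Series([
    -0.051917457635120283, -0.070071606515280632, -0.11204865874074735,
    -0.14679988245503134, -0.088062467095565145, 0.17579741198527793,
    -0.10765856028420773, -0.11971470229167547, -0.15169210769159247,
    -0.038616800990881606, 0.16988162977411481, 0.092999418364443032,
])

# Population (divide-by-n) central moments
dev = series - series.mean()
m2 = (dev ** 2).mean()
m3 = (dev ** 3).mean()
g1 = m3 / m2 ** 1.5  # biased sample skewness

# The question's "me" calculation, for comparison
me = (((series - series.mean()) / series.std(ddof=0)) ** 3).mean()

# All three agree up to floating-point rounding
print(g1, me, stats.skew(series))
```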
me - take 2
This is significantly different. I thought it might be the Fisher-Pearson coefficient, so I did:
n = len(series)
skew = series.sub(series.mean()).div(series.std(ddof=0)).apply(lambda x: x ** 3).mean()
skew * (n * (n - 1)) ** 0.5 / (n - 1)
1.0108761442417222
Still off by quite a bit.
Question
How does pandas calculate skew?
Accepted answer by jezrael
I found that scipy.stats.skew with the parameter bias=False returns the same output, so I think pandas skew uses bias=False by default:
bias : bool
If False, then the calculations are corrected for statistical bias.
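Written out with the population central moments m_2 and m_3, the biased estimate and the bias-corrected (adjusted Fisher-Pearson) estimate are sketched below; the correction factor is read off the scipy skew code quoted later in this thread, which applies it only when n > 2 and m_2 > 0:

```latex
m_k = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^k, \qquad
g_1 = \frac{m_3}{m_2^{3/2}}, \qquad
G_1 = \frac{\sqrt{n(n-1)}}{n-2}\, g_1 \quad (n > 2,\ m_2 > 0)
```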
import pandas as pd
from scipy import stats
series = pd.Series(
{0: -0.051917457635120283,
1: -0.070071606515280632,
2: -0.11204865874074735,
3: -0.14679988245503134,
4: -0.088062467095565145,
5: 0.17579741198527793,
6: -0.10765856028420773,
7: -0.11971470229167547,
8: -0.15169210769159247,
9: -0.038616800990881606,
10: 0.16988162977411481,
11: 0.092999418364443032}
)
print(series.skew())
1.11196375867
print(stats.skew(series, bias=False))
1.1119637586658944
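To check that the match is not specific to this one series, a quick sketch compares pandas against both scipy settings on random samples of a few sizes. The seed and sample sizes are arbitrary choices for illustration; agreement here is evidence, not proof, of what pandas does internally:

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed so the run is reproducible
results = []

for n in (5, 12, 100, 1000):
    x = pd.Series(rng.standard_normal(n))
    pandas_skew = x.skew()                      # pandas default
    scipy_unbiased = stats.skew(x, bias=False)  # bias-corrected
    scipy_biased = stats.skew(x)                # scipy default, bias=True
    results.append((n, pandas_skew, scipy_unbiased, scipy_biased))
    print(n, pandas_skew, scipy_unbiased, scipy_biased)

# pandas should track scipy's bias=False result, not the bias=True one
assert all(np.isclose(p, u) for _, p, u, _ in results)
```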
I'm not 100% sure, but I think I found it in the code.
EDIT (piRSquared)
From the scipy skew code:
if not bias:
    can_correct = (n > 2) & (m2 > 0)
    if can_correct.any():
        m2 = np.extract(can_correct, m2)
        m3 = np.extract(can_correct, m3)
        nval = ma.sqrt((n-1.0)*n)/(n-2.0)*m3/m2**1.5
        np.place(vals, can_correct, nval)
return vals
The adjustment was (n * (n - 1)) ** 0.5 / (n - 2), not (n * (n - 1)) ** 0.5 / (n - 1).
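Putting the two together, multiplying the biased "me" value by (n * (n - 1)) ** 0.5 / (n - 2) should reproduce series.skew(). A minimal sketch; adjusted_skew is a hypothetical helper name for illustration, not a pandas or scipy function:

```python
import pandas as pd


def adjusted_skew(s: pd.Series) -> float:
    """Bias-corrected sample skewness (hypothetical helper, not a pandas API).

    Scales the biased estimate m3 / m2**1.5 by sqrt(n * (n - 1)) / (n - 2),
    the factor used in scipy's skew code when bias=False.
    """
    s = s.dropna()                      # mimic pandas' default skipna=True
    n = len(s)
    if n < 3:
        return float("nan")             # correction is undefined for n < 3
    z = (s - s.mean()) / s.std(ddof=0)  # standardize with the population std
    g1 = (z ** 3).mean()                # biased skewness, m3 / m2**1.5
    return float(g1 * (n * (n - 1)) ** 0.5 / (n - 2))


# Same 12 values as the question's series (index 0..11)
series = pd.Series([
    -0.051917457635120283, -0.070071606515280632, -0.11204865874074735,
    -0.14679988245503134, -0.088062467095565145, 0.17579741198527793,
    -0.10765856028420773, -0.11971470229167547, -0.15169210769159247,
    -0.038616800990881606, 0.16988162977411481, 0.092999418364443032,
])

print(adjusted_skew(series), series.skew())  # the two values should match
```

A constant series (zero variance) is not special-cased in this sketch, so check it against pandas before relying on it for that edge case.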