How do I do an F-test in Python?
Original question: http://stackoverflow.com/questions/21494141/
This content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and keep the same license.
Asked by DrewH
How do I do an F-test in Python to check whether two vectors have equal variances?
For example, if I have
a = [1,2,1,2,1,2,1,2,1,2]
b = [1,3,-1,2,1,5,-1,6,-1,2]
is there something similar to
scipy.stats.ttest_ind(a, b)
I found
sp.stats.f(a, b)
but it appears to be something different from an F-test.
Accepted answer by Joel Cornett
The test statistic for the F-test of equal variances is simply:
F = Var(X) / Var(Y)
where Var is the sample variance and, under the null hypothesis of equal variances, F follows an F-distribution with df1 = len(X) - 1 and df2 = len(Y) - 1 degrees of freedom.
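As a worked example, here is the statistic for the two vectors from the question, taking a as X and b as Y (that assignment is a choice made here for illustration) and using sample variances with n - 1 in the denominator:

```latex
\bar{a} = 1.5, \qquad s_a^2 = \frac{\sum_i (a_i - \bar{a})^2}{n_a - 1} = \frac{2.5}{9} \approx 0.278
\\
\bar{b} = 1.7, \qquad s_b^2 = \frac{\sum_i (b_i - \bar{b})^2}{n_b - 1} = \frac{54.1}{9} \approx 6.011
\\
F = \frac{s_a^2}{s_b^2} = \frac{2.5}{54.1} \approx 0.0462, \qquad df_1 = df_2 = 9
```

A ratio this far below 1 suggests that b is much more variable than a.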
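scipy.stats.f, which you mentioned in your question, has a CDF method and a survival function (sf, equal to 1 - CDF). This means you can generate a p-value for the given statistic and test whether that p-value is smaller than your chosen alpha level. For the one-sided alternative Var(X) > Var(Y), the p-value is the right-tail probability.
Thus:
import numpy as np
import scipy.stats
alpha = 0.05  # Or whatever you want your alpha to be.
F = np.var(X, ddof=1) / np.var(Y, ddof=1)  # ratio of sample variances
df1, df2 = len(X) - 1, len(Y) - 1
p_value = scipy.stats.f.sf(F, df1, df2)  # right-tail probability, 1 - cdf
if p_value < alpha:
    print("Reject the null hypothesis that Var(X) == Var(Y)")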
Note that the F-test is extremely sensitive to non-normality of X and Y, so you're probably better off with a more robust test such as Levene's test unless you're reasonably sure that X and Y are normally distributed. Bartlett's test is another alternative, although it also assumes normality. Both are available in SciPy as scipy.stats.levene and scipy.stats.bartlett.
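A minimal sketch of both tests, run on the vectors from the question. The variable names are illustrative. Both calls return a statistic and a p-value, and a small p-value is evidence against equal variances.

```python
from scipy import stats

a = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
b = [1, 3, -1, 2, 1, 5, -1, 6, -1, 2]

# Levene's test; SciPy's default center='median' is the more robust variant
levene_stat, levene_p = stats.levene(a, b)

# Bartlett's test; appropriate when the samples are close to normal
bartlett_stat, bartlett_p = stats.bartlett(a, b)

print('Levene:   W={0:.3f}, p={1:.4f}'.format(levene_stat, levene_p))
print('Bartlett: T={0:.3f}, p={1:.4f}'.format(bartlett_stat, bartlett_p))
```

Either function accepts more than two samples, so the same call works for comparing several groups at once.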
Answer by slushy
For anyone who came here searching for an ANOVA F-test, or for a way to compare models for feature selection:
sklearn.feature_selection.f_classif does ANOVA F-tests, and sklearn.feature_selection.f_regression tests individual regressors sequentially with univariate linear regression F-tests.
Answer by Ryszard Cetnarski
To do a one-way ANOVA you can use
import scipy.stats as stats
stats.f_oneway(a,b)
One-way ANOVA checks whether the variance between the groups is greater than the variance within the groups, and computes the probability of observing this variance ratio using the F-distribution. Note that this tests whether the group means differ, not whether the group variances are equal. A good tutorial can be found here:
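As a small illustration on the vectors from the question (variable names are illustrative), the one-way ANOVA looks only at the group means, which are 1.5 and 1.7 here, so it does not pick up the large difference in spread:

```python
from scipy import stats

a = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
b = [1, 3, -1, 2, 1, 5, -1, 6, -1, 2]

# One-way ANOVA: H0 is that all groups share the same mean
anova_stat, anova_p = stats.f_oneway(a, b)
print('ANOVA: F={0:.4f}, p={1:.4f}'.format(anova_stat, anova_p))
```

The means are close relative to the within-group scatter, so the F statistic is small even though the variances differ widely.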
Answer by Ala Ham
If you need a two-tailed test, you can proceed as follows; I chose alpha = 0.05:
import numpy as np
from scipy import stats
a = [1,2,1,2,1,2,1,2,1,2]
b = [1,3,-1,2,1,5,-1,6,-1,2]
print('Variance a={0:.3f}, Variance b={1:.3f}'.format(np.var(a, ddof=1), np.var(b, ddof=1)))
fstatistics = np.var(a, ddof=1)/np.var(b, ddof=1) # ddof=1 because we estimate the mean from the data
fdistribution = stats.f(len(a)-1,len(b)-1) # build an F-distribution object
# two-tailed p-value: twice the smaller of the two tail probabilities
p_value = 2*min(fdistribution.cdf(fstatistics), 1-fdistribution.cdf(fstatistics))
f_critical1 = fdistribution.ppf(0.025)
f_critical2 = fdistribution.ppf(0.975)
print(fstatistics, f_critical1, f_critical2)
if (p_value<0.05):
    print('Reject H0', p_value)
else:
    print('Cant Reject H0', p_value)
If you want an ANOVA-like test where only large values can cause rejection, you can use a right-tailed test. Pay attention to the order of the variances (fstatistics = var1/var2 or var2/var1), because the degrees of freedom of the F-distribution must follow the same order. Strictly, the numerator should be fixed by the alternative hypothesis before looking at the data; here the larger sample variance is placed on top:
import numpy as np
from scipy import stats
a = [1,2,1,2,1,2,1,2,1,2]
b = [1,3,-1,2,1,5,-1,6,-1,2]
var_a, var_b = np.var(a, ddof=1), np.var(b, ddof=1) # ddof=1 because we estimate the mean from the data
print('Variance a={0:.3f}, Variance b={1:.3f}'.format(var_a, var_b))
# larger variance in the numerator; the degrees of freedom follow the same order
if var_a >= var_b:
    fstatistics, dfn, dfd = var_a/var_b, len(a)-1, len(b)-1
else:
    fstatistics, dfn, dfd = var_b/var_a, len(b)-1, len(a)-1
fdistribution = stats.f(dfn, dfd) # build an F-distribution object
p_value = 1-fdistribution.cdf(fstatistics)
f_critical = fdistribution.ppf(0.95)
print(fstatistics, f_critical)
if (p_value<0.05):
    print('Reject H0', p_value)
else:
    print('Cant Reject H0', p_value)
The left-tailed test can be done as follows:
import numpy as np
from scipy import stats
a = [1,2,1,2,1,2,1,2,1,2]
b = [1,3,-1,2,1,5,-1,6,-1,2]
var_a, var_b = np.var(a, ddof=1), np.var(b, ddof=1) # ddof=1 because we estimate the mean from the data
print('Variance a={0:.3f}, Variance b={1:.3f}'.format(var_a, var_b))
# smaller variance in the numerator; the degrees of freedom follow the same order
if var_a <= var_b:
    fstatistics, dfn, dfd = var_a/var_b, len(a)-1, len(b)-1
else:
    fstatistics, dfn, dfd = var_b/var_a, len(b)-1, len(a)-1
fdistribution = stats.f(dfn, dfd) # build an F-distribution object
p_value = fdistribution.cdf(fstatistics)
f_critical = fdistribution.ppf(0.05)
print(fstatistics, f_critical)
if (p_value<0.05):
    print('Reject H0', p_value)
else:
    print('Cant Reject H0', p_value)
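The three variants above can be folded into one helper. This is only a sketch: the function name f_test and its alternative parameter are hypothetical (they are not part of SciPy), and the numerator is always the first sample, so the direction of the one-sided tests is fixed by the argument order rather than by the data.

```python
import numpy as np
from scipy import stats

def f_test(x, y, alternative='two-sided'):
    """Variance-ratio F-test of H0: Var(x) == Var(y); assumes normal data.

    Hypothetical helper, not a SciPy function.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    f_stat = np.var(x, ddof=1) / np.var(y, ddof=1)  # sample variances
    dist = stats.f(x.size - 1, y.size - 1)          # F-distribution under H0
    if alternative == 'greater':      # H1: Var(x) > Var(y), right tail
        p_value = dist.sf(f_stat)
    elif alternative == 'less':       # H1: Var(x) < Var(y), left tail
        p_value = dist.cdf(f_stat)
    elif alternative == 'two-sided':  # H1: Var(x) != Var(y)
        p_value = min(1.0, 2 * min(dist.cdf(f_stat), dist.sf(f_stat)))
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return f_stat, p_value

a = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
b = [1, 3, -1, 2, 1, 5, -1, 6, -1, 2]

f_stat, p_two = f_test(a, b)
_, p_less = f_test(a, b, alternative='less')
_, p_greater = f_test(a, b, alternative='greater')
print('F={0:.4f}, two-sided p={1:.5f}, less p={2:.5f}, greater p={3:.5f}'.format(
    f_stat, p_two, p_less, p_greater))
```

Using sf instead of 1 - cdf avoids losing precision when the right-tail probability is very small.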

