Note: this page is a Chinese/English side-by-side translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original address, and attribute it to the original authors (not me): StackOverFlow. Original address: http://stackoverflow.com/questions/25139326/

Time: 2020-08-18 19:46:58  Source: igfitidea

Chi squared test in Python

python numpy scipy chi-squared

Asked by Richard

I'd like to run a chi-squared test in Python. I've created code to do this, but I don't know if what I'm doing is right, because the scipy docs are quite sparse.

Background first: I have two groups of users. My null hypothesis is that there is no significant difference in whether people in either group are more likely to use desktop, mobile, or tablet.

These are the observed frequencies in the two groups:

[[u'desktop', 14452], [u'mobile', 4073], [u'tablet', 4287]]
[[u'desktop', 30864], [u'mobile', 11439], [u'tablet', 9887]]

Here is my code using scipy.stats.chi2_contingency:

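import numpy as np
from scipy import stats
# Rows are the two user groups; columns are the desktop, mobile, and tablet counts
obs = np.array([[14452, 4073, 4287], [30864, 11439, 9887]])
chi2, p, dof, expected = stats.chi2_contingency(obs)
print(p)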

This gives me a p-value of 2.02258737401e-38, which clearly is significant.

My question is: does this code look valid? In particular, I'm not sure whether I should be using scipy.stats.chi2_contingency or scipy.stats.chisquare, given the data I have.

Accepted answer by Warren Weckesser

You are using chi2_contingency correctly. If you feel uncertain about the appropriate use of a chi-squared test or how to interpret its result (i.e. your question is about statistical testing rather than coding), consider asking it over at the "CrossValidated" site: https://stats.stackexchange.com/
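
For readers wondering how the two functions relate, here is a minimal sketch (not part of the original answer). It assumes the same table as in the question, rebuilds the expected counts from the row and column totals, and then passes them to scipy.stats.chisquare with ddof=3 so that the degrees of freedom match those of the contingency test. The name manual_expected is only illustrative.

```python
import numpy as np
from scipy import stats

# Observed counts from the question: rows are groups, columns are device types
obs = np.array([[14452, 4073, 4287], [30864, 11439, 9887]])
chi2, p, dof, expected = stats.chi2_contingency(obs)

# Expected counts under independence: (row total * column total) / grand total
manual_expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / obs.sum()

# chisquare defaults to k - 1 = 5 degrees of freedom for 6 cells;
# ddof=3 lowers that to 2, i.e. (rows - 1) * (columns - 1)
chi2_gof, p_gof = stats.chisquare(obs.ravel(), f_exp=expected.ravel(), ddof=3)

print(dof)
print(np.allclose(expected, manual_expected))
print(np.isclose(chi2, chi2_gof), np.isclose(p, p_gof, rtol=1e-6, atol=0))
```

In other words, chi2_contingency does the bookkeeping (expected counts and degrees of freedom) for a table of counts, while chisquare expects you to supply the expected frequencies yourself. For a groups-by-categories table like this one, chi2_contingency is the more direct choice.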

Answer by Luca Terzio Pontiggia

I can't comment too much on the use of the function. However, the issue at hand may be statistical in nature. The very small p-value you are seeing is most likely a result of your data containing large frequencies (on the order of ten thousand). When sample sizes are very large, even small differences become statistically significant, hence the small p-value. The tests you are using are very sensitive to sample size. See here for more details.
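
The sensitivity to sample size can be illustrated with a rough sketch (an assumption-laden illustration, not part of the original answer): divide every count in the question's table by 100, which keeps the proportions nearly the same, and rerun the test. The names obs_small, p_full, and p_small are hypothetical.

```python
import numpy as np
from scipy import stats

# Full table from the question (about 75,000 users in total)
obs = np.array([[14452, 4073, 4287], [30864, 11439, 9887]])

# Hypothetical smaller study: same proportions (up to rounding), 1/100 of the users
obs_small = np.rint(obs / 100).astype(int)

_, p_full, _, _ = stats.chi2_contingency(obs)
_, p_small, _, _ = stats.chi2_contingency(obs_small)

print(p_full)   # extremely small with the full sample
print(p_small)  # far larger with the reduced sample
```

With the smaller table the same proportions no longer produce a significant result at the usual 0.05 level, which suggests that the tiny p-value reflects the amount of data as much as the size of the difference between the groups.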
