Chi-squared test in Python

Question

Asked by Richard

I'd like to run a chi-squared test in Python. I've created code to do this, but I don't know if what I'm doing is right, because the scipy docs are quite sparse.

Background first: I have two groups of users. My null hypothesis is that there is no significant difference in whether people in either group are more likely to use desktop, mobile, or tablet.

These are the observed frequencies in the two groups:

[[u'desktop', 14452], [u'mobile', 4073], [u'tablet', 4287]]
[[u'desktop', 30864], [u'mobile', 11439], [u'tablet', 9887]]

Here is my code using scipy.stats.chi2_contingency:

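import numpy as np
from scipy import stats
# Rows are the two user groups; columns are desktop, mobile, tablet.
obs = np.array([[14452, 4073, 4287], [30864, 11439, 9887]])
chi2, p, dof, expected = stats.chi2_contingency(obs)
print(p)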

This gives me a p-value of 2.02258737401e-38, which clearly is significant.

My question is: does this code look valid? In particular, I'm not sure whether I should be using scipy.stats.chi2_contingency or scipy.stats.chisquare, given the data I have.

Answer 1

Accepted answer by Warren Weckesser

You are using chi2_contingency correctly. If you feel uncertain about the appropriate use of a chi-squared test or how to interpret its result (i.e. your question is about statistical testing rather than coding), consider asking it over at the "CrossValidated" site: https://stats.stackexchange.com/
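
As a sanity check on the number in the question, here is a minimal sketch of the Pearson test of independence computed by hand with only the standard library. It is not scipy's implementation. The helper name chi2_independence is made up for illustration, and the p-value shortcut relies on the fact that, with exactly 2 degrees of freedom (as for this 2x3 table), the chi-squared survival function reduces to exp(-x / 2). For context, scipy.stats.chisquare is a one-way goodness-of-fit test against expected frequencies you supply, whereas chi2_contingency derives the expected frequencies from the table's margins, which is what this data calls for.

```python
import math

def chi2_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and device type were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof

# Rows are the two user groups; columns are desktop, mobile, tablet.
obs = [[14452, 4073, 4287], [30864, 11439, 9887]]
chi2, dof = chi2_independence(obs)

# Only valid because dof == 2 here: the survival function is exp(-x / 2).
p = math.exp(-chi2 / 2)
print(chi2, dof, p)
```

The printed p-value should be of the same order as the 2.02258737401e-38 reported in the question.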

Answer 2

Answer by Luca Terzio Pontiggia

I can't comment too much on the use of the function. However, the issue at hand may be statistical in nature. The very small p-value you are seeing is most likely a result of your data containing large frequencies (on the order of ten thousand). When sample sizes are very large, even small differences become statistically significant, hence the small p-value. The tests you are using are very sensitive to sample size. See here for more details.
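
The sample-size sensitivity can be illustrated with a rough sketch: keep the proportions from the question fixed and shrink every count by a constant factor, as if a smaller sample had shown exactly the same split. The scale factors and the helper name chi2_stat are made up for illustration, the fractional counts are not real data, and the exp(-x / 2) p-value again assumes the 2 degrees of freedom of a 2x3 table.

```python
import math

def chi2_stat(table):
    """Pearson chi-squared statistic for a table of (possibly fractional) counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return sum(
        (observed - row_totals[i] * col_totals[j] / total) ** 2
        / (row_totals[i] * col_totals[j] / total)
        for i, row in enumerate(table)
        for j, observed in enumerate(row)
    )

obs = [[14452, 4073, 4287], [30864, 11439, 9887]]

results = {}
for scale in (1, 10, 100):
    # Same proportions in every cell, but 1/scale as many observations.
    scaled = [[count / scale for count in row] for row in obs]
    chi2 = chi2_stat(scaled)
    p = math.exp(-chi2 / 2)  # chi-squared survival function for 2 dof
    results[scale] = (chi2, p)
    print(scale, chi2, p)
```

The statistic shrinks in proportion to the sample size, so the same proportions that are overwhelmingly significant at the original counts are no longer significant at the 0.05 level when the sample is a hundred times smaller.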
