Python: performing a two-sample t-test
Original question: http://stackoverflow.com/questions/22611446/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Perform 2 sample t-test
Asked by Norfeldt
I have the mean, std dev and n of sample 1 and sample 2. The samples are taken from the sample population, but measured by different labs.
n is different for sample 1 and sample 2. I want to do a weighted (take n into account) two-tailed t-test.
I tried using the scipy.stats module by creating my numbers with np.random.normal, since it only takes data and not summary statistics like mean and std dev (is there any way to use these values directly?). But it didn't work, since the data arrays have to be of equal size.
Any help on how to get the p-value would be highly appreciated.
Accepted answer by Warren Weckesser
If you have the original data as arrays a and b, you can use scipy.stats.ttest_ind with the argument equal_var=False:
t, p = ttest_ind(a, b, equal_var=False)
If you have only the summary statistics of the two data sets, you can calculate the t value using scipy.stats.ttest_ind_from_stats (added to SciPy in version 0.16) or from the formula (http://en.wikipedia.org/wiki/Welch%27s_t_test).
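For reference, the formula mentioned above can be written out as follows. The notation is chosen here for illustration and mirrors the variable names in this answer's script: \bar{a} and \bar{b} are the sample means (abar, bbar), s_a^2 and s_b^2 are the unbiased sample variances (avar, bvar), n_a and n_b are the sample sizes, and F_\nu denotes the CDF of Student's t distribution with \nu degrees of freedom.

```latex
t = \frac{\bar{a} - \bar{b}}{\sqrt{\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}}},
\qquad
\nu = \frac{\left(\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}\right)^{2}}
           {\dfrac{s_a^4}{n_a^2\,(n_a - 1)} + \dfrac{s_b^4}{n_b^2\,(n_b - 1)}},
\qquad
p = 2\,F_\nu\!\left(-\lvert t \rvert\right)
```

The factor of 2 in the last expression is what makes the test two-tailed.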
The following script shows the possibilities.
from __future__ import print_function
import numpy as np
from scipy.stats import ttest_ind, ttest_ind_from_stats
from scipy.special import stdtr
np.random.seed(1)
# Create sample data.
a = np.random.randn(40)
b = 4*np.random.randn(50)
# Use scipy.stats.ttest_ind.
t, p = ttest_ind(a, b, equal_var=False)
print("ttest_ind:            t = %g  p = %g" % (t, p))
# Compute the descriptive statistics of a and b.
abar = a.mean()
avar = a.var(ddof=1)
na = a.size
adof = na - 1
bbar = b.mean()
bvar = b.var(ddof=1)
nb = b.size
bdof = nb - 1
# Use scipy.stats.ttest_ind_from_stats.
t2, p2 = ttest_ind_from_stats(abar, np.sqrt(avar), na,
                              bbar, np.sqrt(bvar), nb,
                              equal_var=False)
print("ttest_ind_from_stats: t = %g  p = %g" % (t2, p2))
# Use the formulas directly.
tf = (abar - bbar) / np.sqrt(avar/na + bvar/nb)
dof = (avar/na + bvar/nb)**2 / (avar**2/(na**2*adof) + bvar**2/(nb**2*bdof))
pf = 2*stdtr(dof, -np.abs(tf))
print("formula:              t = %g  p = %g" % (tf, pf))
The output:
ttest_ind:            t = -1.5827  p = 0.118873
ttest_ind_from_stats: t = -1.5827  p = 0.118873
formula:              t = -1.5827  p = 0.118873
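For the situation described in the question, where only the mean, standard deviation and n of each sample are known, a minimal sketch might look like the following. The helper name welch_p_from_summary and the numbers passed to it are hypothetical and chosen only for illustration. The standard deviations are assumed to be sample standard deviations (ddof=1), and SciPy 0.16 or later is assumed so that ttest_ind_from_stats is available.

```python
from scipy.stats import ttest_ind_from_stats


def welch_p_from_summary(mean1, std1, n1, mean2, std2, n2):
    """Two-tailed Welch t-test computed from summary statistics only.

    std1 and std2 are assumed to be sample standard deviations (ddof=1).
    Returns the t statistic and the two-tailed p-value.
    """
    result = ttest_ind_from_stats(mean1, std1, n1,
                                  mean2, std2, n2,
                                  equal_var=False)
    return result.statistic, result.pvalue


# Hypothetical summaries from two labs (illustrative values, not from the question).
t_stat, p_value = welch_p_from_summary(10.2, 1.5, 40, 9.6, 2.1, 25)
print("t = %g  p = %g" % (t_stat, p_value))
```

No arrays of equal size (or any raw data at all) are needed here; the different sample sizes enter through the n1 and n2 arguments.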
Answer by rroowwllaanndd
In a recent version of SciPy (0.12.0), this functionality is built in (and does in fact operate on samples of different sizes). In scipy.stats, the ttest_ind function performs Welch's t-test when the flag equal_var is set to False.
For example:
>>> import numpy as np
>>> import scipy.stats as stats
>>> sample1 = np.random.randn(10, 1)
>>> sample2 = 1 + np.random.randn(15, 1)
>>> t_stat, p_val = stats.ttest_ind(sample1, sample2, equal_var=False)
>>> t_stat
array([-3.94339083])
>>> p_val
array([ 0.00070813])
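To see what the equal_var flag changes, the sketch below runs both variants on the same data. It is only an illustration: the seed, sample sizes and spreads are arbitrary choices, not values from the answer. It uses 1-D arrays so that the results are scalars rather than one-element arrays.

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)   # fixed seed so the run is repeatable
sample1 = rng.randn(10)          # 10 draws, unit spread
sample2 = 1 + 3 * rng.randn(15)  # 15 draws, shifted mean and larger spread

# Default (equal_var=True): Student's t-test with a pooled variance.
student = stats.ttest_ind(sample1, sample2)
# equal_var=False: Welch's t-test, which does not assume equal variances.
welch = stats.ttest_ind(sample1, sample2, equal_var=False)

print("Student: t = %g  p = %g" % (student.statistic, student.pvalue))
print("Welch:   t = %g  p = %g" % (welch.statistic, welch.pvalue))
```

Both calls accept samples of different sizes; the two results differ because the standard error of the difference in means is estimated differently.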

