Python: performing a two-sample t-test

Question

Asked by Norfeldt

I have the mean, std dev and n of sample 1 and sample 2. The samples are taken from the same population, but measured by different labs.

n is different for sample 1 and sample 2. I want to do a weighted (take n into account) two-tailed t-test.

I tried using the scipy.stats module by generating my numbers with np.random.normal, since it only takes data and not summary values like mean and std dev (is there any way to use these values directly?). But it didn't work, since the data arrays have to be of equal size.

Any help on how to get the p-value would be highly appreciated.

Answer 1

Accepted answer by Warren Weckesser

If you have the original data as arrays a and b, you can use scipy.stats.ttest_ind with the argument equal_var=False:

t, p = ttest_ind(a, b, equal_var=False)

If you have only the summary statistics of the two data sets, you can calculate the t value using scipy.stats.ttest_ind_from_stats (added to scipy in version 0.16) or from the formula (http://en.wikipedia.org/wiki/Welch%27s_t_test).
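
For reference, the formula can be written out as below: Welch's t statistic, the Welch-Satterthwaite approximation for the degrees of freedom, and the two-sided p-value. The notation is chosen here for illustration and is not from the original answer: $\bar{x}_i$ are the sample means, $s_i^2$ the unbiased sample variances, $n_i$ the sample sizes, and $F_\nu$ the CDF of Student's t distribution with $\nu$ degrees of freedom.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}
                 {\dfrac{s_1^4}{n_1^2\,(n_1 - 1)} + \dfrac{s_2^4}{n_2^2\,(n_2 - 1)}},
\qquad
p = 2\,F_\nu\!\left(-\lvert t \rvert\right)
```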

The following script shows the possibilities.

from __future__ import print_function

import numpy as np
from scipy.stats import ttest_ind, ttest_ind_from_stats
from scipy.special import stdtr

np.random.seed(1)

# Create sample data.
a = np.random.randn(40)
b = 4*np.random.randn(50)

# Use scipy.stats.ttest_ind.
t, p = ttest_ind(a, b, equal_var=False)
print("ttest_ind:            t = %g  p = %g" % (t, p))

# Compute the descriptive statistics of a and b.
abar = a.mean()
avar = a.var(ddof=1)
na = a.size
adof = na - 1

bbar = b.mean()
bvar = b.var(ddof=1)
nb = b.size
bdof = nb - 1

# Use scipy.stats.ttest_ind_from_stats.
t2, p2 = ttest_ind_from_stats(abar, np.sqrt(avar), na,
                              bbar, np.sqrt(bvar), nb,
                              equal_var=False)
print("ttest_ind_from_stats: t = %g  p = %g" % (t2, p2))

# Use the formulas directly.
tf = (abar - bbar) / np.sqrt(avar/na + bvar/nb)
dof = (avar/na + bvar/nb)**2 / (avar**2/(na**2*adof) + bvar**2/(nb**2*bdof))
pf = 2*stdtr(dof, -np.abs(tf))

print("formula:              t = %g  p = %g" % (tf, pf))

The output:

ttest_ind:            t = -1.5827  p = 0.118873
ttest_ind_from_stats: t = -1.5827  p = 0.118873
formula:              t = -1.5827  p = 0.118873
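
For the situation in the question, where only the mean, standard deviation and n of each sample are known, the same call can be wrapped in a small helper. This is a minimal sketch: the function name welch_from_summary and the numbers passed to it are made up for illustration, and the standard deviations are assumed to be sample standard deviations (computed with ddof=1).

```python
from scipy.stats import ttest_ind_from_stats


def welch_from_summary(mean1, std1, n1, mean2, std2, n2):
    """Two-sided Welch's t-test from summary statistics.

    std1 and std2 are assumed to be sample standard deviations (ddof=1).
    Returns the t statistic and the two-sided p-value as floats.
    """
    t_stat, p_val = ttest_ind_from_stats(mean1, std1, n1,
                                         mean2, std2, n2,
                                         equal_var=False)
    return float(t_stat), float(p_val)


# Hypothetical summary statistics from two labs with different n.
t_stat, p_val = welch_from_summary(10.2, 1.1, 12, 9.5, 1.6, 20)
print("t = %g  p = %g" % (t_stat, p_val))
```

Because equal_var=False is used, the two samples do not need to have the same size or the same variance.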

Answer 2

Answer by rroowwllaanndd

Using a recent version of SciPy (0.12.0), this functionality is built in (and it does in fact operate on samples of different sizes). In scipy.stats, the ttest_ind function performs Welch's t-test when the flag equal_var is set to False.

For example:

>>> import numpy as np
>>> import scipy.stats as stats
>>> sample1 = np.random.randn(10, 1)
>>> sample2 = 1 + np.random.randn(15, 1)
>>> t_stat, p_val = stats.ttest_ind(sample1, sample2, equal_var=False)
>>> t_stat
array([-3.94339083])
>>> p_val
array([ 0.00070813])
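
The results above are one-element arrays because the samples were created as column vectors of shape (n, 1), and ttest_ind runs the test along axis 0 once per column. A small sketch of the difference, assuming an arbitrary seed and variable names chosen only for illustration:

```python
import numpy as np
import scipy.stats as stats

rng = np.random.RandomState(0)  # arbitrary seed, for illustration only

sample1 = rng.randn(10)       # 1-D array, 10 observations
sample2 = 1 + rng.randn(15)   # 1-D array, 15 observations

# 1-D inputs: the statistic and p-value come back as scalars.
t_1d, p_1d = stats.ttest_ind(sample1, sample2, equal_var=False)

# Column-vector inputs: one test per column, so the results are
# arrays of length 1 holding the same values.
t_col, p_col = stats.ttest_ind(sample1[:, None], sample2[:, None],
                               equal_var=False)

print(np.ndim(t_1d), np.shape(t_col))
```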
