Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/19339305/

Date: 2020-08-19 13:30:34  Source: igfitidea

Python function to get the t-statistic

Tags: python, python-2.7, statistics, confidence-interval

Asked by ChrisProsser

I am looking for a Python function (or to write my own if there is not one) to get the t-statistic in order to use in a confidence interval calculation.

I have found tables that give answers for various probabilities / degrees of freedom like this one, but I would like to be able to calculate this for any given probability. For anyone not already familiar with this: the degrees of freedom is the number of data points (n) in your sample minus 1, and the numbers in the column headings at the top are probabilities (p). For example, a two-tailed significance level of 0.05 is used when looking up the t-score for a 95% confidence calculation, i.e. that if you repeated the n tests the result would fall within the mean +/- the confidence interval.
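
For reference, the lookup described above feeds into the usual t-based interval for a mean. The notation below is an assumption added for illustration ($\bar{x}$ for the sample mean, $s$ for the sample standard deviation, $\alpha$ for the two-tailed significance level, confidence given in percent); it does not appear in the original question:

```latex
\bar{x} \;\pm\; t_{1-\alpha/2,\; n-1} \cdot \frac{s}{\sqrt{n}},
\qquad \alpha = 1 - \frac{\text{confidence}}{100}
```

So 95% confidence with n = 1000 corresponds to $\alpha = 0.05$ and 999 degrees of freedom.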

I have looked into using various functions within scipy.stats, but none that I can see seem to allow for the simple inputs I described above.

Excel has a simple implementation of this. For example, to get the t-score for a sample of 1000, where I need to be 95% confident, I would use =TINV(0.05,999) and get a score of ~1.96.

Here is the code that I have used to implement confidence intervals so far. As you can see, I am currently using a very crude way of getting the t-score (just allowing a few values for perc_conf and warning that it is not accurate for samples < 1000):

# -*- coding: utf-8 -*-
from __future__ import division
import math

def mean(lst):
    # μ = 1/N Σ(xi)
    return sum(lst) / float(len(lst))

def variance(lst):
    """
    Uses standard variance formula (sum of each (data point - mean) squared)
    all divided by number of data points
    """
    # σ2 = 1/N Σ((xi-μ)2)
    mu = mean(lst)
    return 1.0/len(lst) * sum([(i-mu)**2 for i in lst])

def conf_int(lst, perc_conf=95):
    """
    Confidence interval - given a list of values compute the square root of
    the variance of the list (v) divided by the number of entries (n)
    multiplied by a constant factor of (c). This means that I can
    be confident of a result +/- this amount from the mean.
    The constant factor can be looked up from a table, for 95% confidence
    on a reasonable size sample (>=500) 1.96 is used.
    """
    if perc_conf == 95:
        c = 1.96
    elif perc_conf == 90:
        c = 1.64
    elif perc_conf == 99:
        c = 2.58
    else:
        c = 1.96
        print('Only 90, 95 or 99 % are allowed for, using default 95%')
    n, v = len(lst), variance(lst)
    if n < 1000:
        print('WARNING: constant factor may not be accurate for n < ~1000')
    return math.sqrt(v/n) * c

Here is an example call for the above code:

# Example: 1000 coin tosses on a fair coin. What is the range that I can be 95%
#          confident the result will fall within?

# list of 1000 perfectly distributed...
perc_conf_req = 95
n, p = 1000, 0.5 # sample_size, probability of heads for each coin
l = [0 for i in range(int(n*(1-p)))] + [1 for j in range(int(n*p))]
exp_heads = mean(l) * len(l)
c_int = conf_int(l, perc_conf_req)

print('I can be '+str(perc_conf_req)+'% confident that the result of '+str(n)+
      ' coin flips will be within +/- '+str(round(c_int*100,2))+'% of '+
      str(int(exp_heads)))
x = round(n*c_int,0)
print('i.e. between '+str(int(exp_heads-x))+' and '+str(int(exp_heads+x))+
      ' heads (assuming a probability of '+str(p)+' for each flip).')

The output for this is:

I can be 95% confident that the result of 1000 coin flips will be within +/- 3.1% of 500
i.e. between 469 and 531 heads (assuming a probability of 0.5 for each flip).

I also looked into calculating the t-distribution for a range and then returning the t-score that gave the probability closest to the one required, but I had issues implementing the formula. Let me know if this is relevant and you want to see the code, but I have assumed not, as there is probably an easier way.
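
For readers curious about that search-based route, here is a minimal pure-Python sketch of how it could work. It is an illustration only and not the asker's code: the function names (`t_pdf`, `t_cdf`, `t_inv_two_tailed`), the use of Simpson's rule, and the bisection search are all assumptions chosen for this example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma is used instead of gamma so large df (e.g. 999) does not overflow
    log_norm = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2.0 * math.log(1.0 + x * x / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0, using composite Simpson's rule on [0, t]."""
    if t == 0:
        return 0.5
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The distribution is symmetric, so half the mass lies below zero
    return 0.5 + total * h / 3.0

def t_inv_two_tailed(p, df, tol=1e-6):
    """t-score for two-tailed probability p, in the spirit of Excel's TINV(p, df)."""
    target = 1.0 - p / 2.0
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # grow the bracket until it contains the answer
        hi *= 2.0
    while hi - lo > tol:           # bisection on the monotonic CDF
        mid = (lo + hi) / 2.0
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(t_inv_two_tailed(0.05, 999))
```

This avoids any third-party dependency, at the cost of being slower and less precise than a library routine.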

Thanks in advance.

Accepted answer by henderso

Have you tried scipy?

You will need to install the scipy library. More about installing it here: http://www.scipy.org/install.html

Once installed, you can replicate the Excel functionality like this:

from scipy import stats
# Student's t, df=999 (n=1000), p<0.05, two-tailed
# equivalent to Excel TINV(0.05,999)
print(stats.t.ppf(1-0.025, 999))

# Student's t, df=999 (n=1000), p<0.05, single tail
# equivalent to Excel TINV(2*0.05,999)
print(stats.t.ppf(1-0.05, 999))
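
As a sketch of how this could slot into the question's `conf_int`, the lookup can be wrapped in a small helper that takes a percentage and a sample size. The helper name `t_score` and its parameters are hypothetical, introduced here for illustration; the variance is kept as the population variance used in the question, although the sample variance (dividing by n - 1) is what is usually paired with a t-score.

```python
import math
from scipy import stats

def t_score(perc_conf, n, two_tailed=True):
    """t-score for a confidence level in percent and a sample of size n (hypothetical helper)."""
    alpha = 1.0 - perc_conf / 100.0
    tail = alpha / 2.0 if two_tailed else alpha
    return stats.t.ppf(1.0 - tail, n - 1)  # degrees of freedom = n - 1

def conf_int(lst, perc_conf=95):
    """Half-width of the confidence interval, with the hard-coded constants replaced."""
    n = len(lst)
    mu = sum(lst) / float(n)
    v = sum((x - mu) ** 2 for x in lst) / float(n)  # population variance, as in the question
    return math.sqrt(v / n) * t_score(perc_conf, n)

# Same coin-toss data as in the question: 500 tails and 500 heads
coin = [0] * 500 + [1] * 500
print(t_score(95, 1000))
print(conf_int(coin, 95))
```

Any confidence level strictly between 0 and 100 can now be passed, so the fallback branch and the n < 1000 warning are no longer needed.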

You can also read about installing the library here: how to install scipy for python?

Answer by javac

Try the following code:

from scipy import stats
# Student's t, n=22, two-tailed
# stats.t.ppf(1-0.025, df)
# df = n-1 = 22-1 = 21
print(stats.t.ppf(1-0.025, 21))
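
To see why the fixed constant 1.96 is only safe for large samples, the t-score can be compared with the normal z-score as n grows. This is a small sketch; the function name `t_vs_z` is made up for this example.

```python
from scipy import stats

def t_vs_z(n, alpha=0.05):
    """Return (t, z) two-tailed critical values for sample size n (hypothetical helper)."""
    t = stats.t.ppf(1 - alpha / 2.0, n - 1)  # Student's t with n - 1 degrees of freedom
    z = stats.norm.ppf(1 - alpha / 2.0)      # standard normal, independent of n
    return t, z

for n in (5, 10, 22, 100, 1000):
    t, z = t_vs_z(n)
    print("n=%4d  t=%.4f  z=%.4f" % (n, t, z))
```

The t value is always larger than z and approaches it as the sample size increases, which is why the difference matters mostly for small samples.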

Answer by javac

You can try this code:

# for small samples (<50) we use t-statistics
# n = 9, degree of freedom = 9-1 = 8
# for 99% confidence interval, alpha = 1% = 0.01 and alpha/2 = 0.005
from scipy import stats

ci = 99
n = 9
alpha = (100 - ci) / 100.0  # float division, so this also works on Python 2
t = stats.t.ppf(1 - alpha/2, n-1) # 99% CI, t8,0.005
print(t) # ~3.36
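
Putting the t-score to work on actual data might look like the sketch below. The nine sample values are invented purely for illustration, and the manual calculation is cross-checked against `stats.t.interval`, which returns the interval endpoints directly.

```python
import math
from scipy import stats

# Hypothetical sample of n = 9 measurements (made-up values for illustration)
sample = [4.9, 5.1, 5.3, 4.8, 5.0, 5.2, 5.4, 4.7, 5.0]
confidence = 0.99

n = len(sample)
mean = sum(sample) / float(n)
# Sample standard deviation (divides by n - 1)
s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
std_err = s / math.sqrt(n)

# Manual route: t-score times the standard error of the mean
t = stats.t.ppf(1 - (1 - confidence) / 2.0, n - 1)
half_width = t * std_err

# Same interval computed by scipy in one call
low, high = stats.t.interval(confidence, n - 1, loc=mean, scale=std_err)

print("mean = %.3f, half-width = %.3f" % (mean, half_width))
print("99%% CI: (%.3f, %.3f)" % (low, high))
```

Both routes should agree up to floating-point error, so the one-call form is mostly a convenience.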