Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/43261747/

Time: 2020-09-14 03:21:20  Source: igfitidea

A Better Way to Calculate Odds Ratio in Pandas

python pandas

Asked by Acerace.py

I have a dataframe counts1 which looks like:

Factor            w-statin  wo-statin
AgeGroups Cancer                     
0-5       No           108       6575
          Yes            0        223
11-15     No             5       3669
          Yes            1        143
16-20     No            28       6174
          Yes            1        395
21-25     No            80       8173
          Yes            2        624
26-30     No           110       9143
          Yes            2        968
30-35     No           171       9046
          Yes            5       1225
35-40     No           338       8883
          Yes           21       1475

I wanted to calculate the odds ratio (w-statin/wo-statin). I did it the old-fashioned way, as I would on paper:

counts1['sumwwoStatin']= counts1['w-statin']+counts1['wo-statin']

counts1['oddRatio']=((counts1['w-statin']/counts1['sumwwoStatin'])/(counts1['wo-statin']/counts1['sumwwoStatin']))

Is there a better way to calculate odds ratios, relative risk, contingency tables, and chi-square tests in Pandas, just like in R? Any suggestions are appreciated. Oh, by the way, I forgot to mention what my CSV looks like:

    Frequency Cancer     Factor AgeGroups
0         223    Yes  wo-statin       0-5
1         112    Yes  wo-statin      6-10
2         143    Yes  wo-statin     11-15
3         395    Yes  wo-statin     16-20
4         624    Yes  wo-statin     21-25
5         968    Yes  wo-statin     26-30
6        1225    Yes  wo-statin     30-35
7        1475    Yes  wo-statin     35-40
8        2533    Yes  wo-statin     41-45
9        4268    Yes  wo-statin     46-50
10       5631    Yes  wo-statin     52-55
11       6656    Yes  wo-statin     56-60
12       7166    Yes  wo-statin     61-65
13       8573    Yes  wo-statin     66-70
14       8218    Yes  wo-statin     71-75
15       4614    Yes  wo-statin     76-80
16       1869    Yes  wo-statin     81-85
17        699    Yes  wo-statin     86-90
18        157    Yes  wo-statin     91-95
19         31    Yes  wo-statin    96-100
20          5    Yes  wo-statin      >100
21        108     No   w-statin       0-5
22          6     No   w-statin      6-10
23          5     No   w-statin     11-15
24         28     No   w-statin     16-20
25         80     No   w-statin     21-25
26        110     No   w-statin     26-30
27        171     No   w-statin     30-35
28        338     No   w-statin     35-40
29        782     No   w-statin     41-45
..
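
For reference, long-format rows like these can be reshaped into the counts1 layout shown earlier. The following is only a minimal sketch: the name long_df and the choice of pivot_table are assumptions, and the few rows used are copied from the two tables above rather than read from the full file.

```python
import pandas as pd

# A handful of rows copied from the tables above (not the full CSV).
# long_df is a hypothetical name for the long-format data.
long_df = pd.DataFrame({
    "Frequency": [223, 143, 108, 5, 0, 1, 6575, 3669],
    "Cancer":    ["Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No"],
    "Factor":    ["wo-statin", "wo-statin", "w-statin", "w-statin",
                  "w-statin", "w-statin", "wo-statin", "wo-statin"],
    "AgeGroups": ["0-5", "11-15", "0-5", "11-15",
                  "0-5", "11-15", "0-5", "11-15"],
})

# One row per (AgeGroups, Cancer), one column per Factor, cells = summed Frequency.
counts = long_df.pivot_table(
    index=["AgeGroups", "Cancer"],
    columns="Factor",
    values="Frequency",
    aggfunc="sum",
    fill_value=0,
)
print(counts)
```

With the full file, the same call should give one row per age group and cancer status, which is the shape the answers below work from.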

Answered by pansen

AFAIK pandas does not provide statistical computations and tests beyond basic descriptive statistics such as mean, variance, and correlation.

However, you can rely on scipy for this. You'll find most of what you need there. For instance, to calculate the odds ratio:

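import scipy.stats as stats

# df is assumed to be the count table from the question (counts1 before the extra columns are added).
# Summing over age groups gives a 2x2 table: rows Cancer No/Yes, columns w-statin/wo-statin.
table = df.groupby(level="Cancer").sum().values
print(table)

>>> array([[  840, 51663],
           [   32,  5053]])

# fisher_exact returns the sample odds ratio and, by default, a two-sided p-value.
oddsratio, pvalue = stats.fisher_exact(table)
print("OddsR: ", oddsratio, "p-Value:", pvalue)

>>> OddsR:  2.56743220487 p-Value: 2.72418938361e-09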

See here and here for more.
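
The question also asks about chi-square tests and relative risk. As a rough sketch, assuming the 2x2 table printed above is typed in by hand (the variable names are illustrative), scipy's chi2_contingency covers the test and the remaining ratios are plain arithmetic:

```python
import numpy as np
from scipy import stats

# 2x2 table from the output above:
# rows = Cancer (No, Yes), columns = (w-statin, wo-statin)
table = np.array([[840, 51663],
                  [32, 5053]])

a, b = table[0]  # no cancer: with statin, without statin
c, d = table[1]  # cancer:    with statin, without statin

# Sample odds ratio, the same quantity fisher_exact reports for a 2x2 table
odds_ratio = (a * d) / (b * c)

# Chi-square test of independence (Yates' correction is applied to 2x2 tables by default)
chi2, p_value, dof, expected = stats.chi2_contingency(table)

# Relative risk of cancer: risk among statin users divided by risk among non-users
risk_w = c / (a + c)
risk_wo = d / (b + d)
relative_risk = risk_w / risk_wo

print("odds ratio:", odds_ratio)
print("chi2:", chi2, "p-value:", p_value, "dof:", dof)
print("relative risk:", relative_risk)
```

Note that with the rows ordered No/Yes, the odds ratio of about 2.57 describes the odds of being cancer-free with versus without statin; its reciprocal is the odds ratio for cancer.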

Answered by Paul Sochacki

I don't know of a way to do this in Pandas... However, you can calculate the odds ratio(s) for a logistic regression model in Python by first using the scikit-learn library to find the corresponding beta values, as described here:

How to find beta values in Logistic Regression model with sklearn

This thread describes how you can generate and extract the beta coefficients from a logistic regression model. You can then calculate the odds ratios by exponentiating the beta values using the exp() function from Python's NumPy package. The odds ratios calculated in this way will be equivalent to those provided by R's glm() function with a binomial family, provided regularization is effectively switched off (scikit-learn's LogisticRegression applies an L2 penalty by default).
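
A minimal sketch of this approach, under stated assumptions: the aggregated counts are taken from the 2x2 table in the previous answer, each cell is passed as one weighted observation via sample_weight, and a very large C is used to make the default L2 penalty negligible. The variable names are illustrative, not from the linked thread.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One row per cell of the 2x2 table: predictor = statin use, outcome = cancer
X = np.array([[1], [0], [1], [0]])        # 1 = w-statin, 0 = wo-statin
y = np.array([0, 0, 1, 1])                # 0 = no cancer, 1 = cancer
weights = np.array([840, 51663, 32, 5053])  # cell counts used as sample weights

# Large C makes the default L2 penalty negligible (assumption: this is close
# enough to an unpenalized fit for illustration); tight tol for an accurate optimum
model = LogisticRegression(C=1e9, tol=1e-8, max_iter=1000)
model.fit(X, y, sample_weight=weights)

beta = model.coef_[0, 0]       # log odds ratio of cancer for statin use
odds_ratio = np.exp(beta)      # exponentiate the coefficient

# For a single binary predictor this should match the 2x2 cross-product ratio
expected = (32 * 51663) / (840 * 5053)
print("odds ratio from model:", odds_ratio)
print("odds ratio from table:", expected)
```

With one binary predictor the model is saturated, so the exponentiated coefficient should agree with the ratio computed directly from the table; the regression route becomes useful once further covariates such as age group are added.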
