A Better Way to Calculate the Odds Ratio in Pandas
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/43261747/
Asked by Acerace.py
I have a dataframe counts1 which looks like:
Factor             w-statin  wo-statin
AgeGroups Cancer
0-5       No            108       6575
          Yes             0        223
11-15     No              5       3669
          Yes             1        143
16-20     No             28       6174
          Yes             1        395
21-25     No             80       8173
          Yes             2        624
26-30     No            110       9143
          Yes             2        968
30-35     No            171       9046
          Yes             5       1225
35-40     No            338       8883
          Yes            21       1475
I want to calculate the odds ratio (w-statin/wo-statin). I did it the old-fashioned way, as I would on paper:
counts1['sumwwoStatin'] = counts1['w-statin'] + counts1['wo-statin']
counts1['oddRatio'] = (counts1['w-statin'] / counts1['sumwwoStatin']) / (counts1['wo-statin'] / counts1['sumwwoStatin'])
Is there a better way to calculate odds ratios, relative risk, contingency tables, and chi-square tests in Pandas, just like in R? Any suggestions are appreciated. By the way, I forgot to mention what my csv looks like:
    Frequency Cancer     Factor AgeGroups
0         223    Yes  wo-statin       0-5
1         112    Yes  wo-statin      6-10
2         143    Yes  wo-statin     11-15
3         395    Yes  wo-statin     16-20
4         624    Yes  wo-statin     21-25
5         968    Yes  wo-statin     26-30
6        1225    Yes  wo-statin     30-35
7        1475    Yes  wo-statin     35-40
8        2533    Yes  wo-statin     41-45
9        4268    Yes  wo-statin     46-50
10       5631    Yes  wo-statin     52-55
11       6656    Yes  wo-statin     56-60
12       7166    Yes  wo-statin     61-65
13       8573    Yes  wo-statin     66-70
14       8218    Yes  wo-statin     71-75
15       4614    Yes  wo-statin     76-80
16       1869    Yes  wo-statin     81-85
17        699    Yes  wo-statin     86-90
18        157    Yes  wo-statin     91-95
19         31    Yes  wo-statin    96-100
20          5    Yes  wo-statin      >100
21        108     No   w-statin       0-5
22          6     No   w-statin      6-10
23          5     No   w-statin     11-15
24         28     No   w-statin     16-20
25         80     No   w-statin     21-25
26        110     No   w-statin     26-30
27        171     No   w-statin     30-35
28        338     No   w-statin     35-40
29        782     No   w-statin     41-45
..
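For reference, going from the long csv layout to a counts table like counts1 can be sketched with pivot_table. This is only a sketch: the name long_df is made up for illustration, and the eight rows are copied from the two tables above rather than read from the real file.

```python
import pandas as pd

# A few rows in the same long layout as the csv (values copied from the
# tables above). long_df is a hypothetical name; a real run would load
# the full file with pd.read_csv instead.
long_df = pd.DataFrame({
    "Frequency": [223, 143, 6575, 3669, 108, 5, 0, 1],
    "Cancer": ["Yes", "Yes", "No", "No", "No", "No", "Yes", "Yes"],
    "Factor": ["wo-statin", "wo-statin", "wo-statin", "wo-statin",
               "w-statin", "w-statin", "w-statin", "w-statin"],
    "AgeGroups": ["0-5", "11-15", "0-5", "11-15",
                  "0-5", "11-15", "0-5", "11-15"],
})

# One row per (AgeGroups, Cancer) pair, one column per Factor level.
counts = long_df.pivot_table(
    index=["AgeGroups", "Cancer"],
    columns="Factor",
    values="Frequency",
    aggfunc="sum",
    fill_value=0,
)
print(counts)
```

The result has the same MultiIndex rows and w-statin / wo-statin columns as counts1, so it can be passed to groupby(level="Cancer") directly.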
Answered by pansen
AFAIK pandas does not provide statistical computations and tests beyond basic descriptive statistics such as mean, variance, and correlations.
However, you can rely on scipy for this requirement. You'll find most of what you need there. For instance, to calculate the odds ratio:
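import scipy.stats as stats

# df is assumed to be the counts dataframe from the question (counts1).
# Summing over age groups gives a 2x2 table:
# rows = Cancer (No, Yes), columns = (w-statin, wo-statin)
table = df.groupby(level="Cancer").sum().values
print(table)
# [[  840 51663]
#  [   32  5053]]

# Fisher's exact test returns the sample odds ratio and a p-value
oddsratio, pvalue = stats.fisher_exact(table)
print("OddsR: ", oddsratio, "p-Value:", pvalue)
# Output reported in the original answer:
# OddsR: 2.56743220487 p-Value: 2.72418938361e-09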
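The question also asks about relative risk and chi-square tests. A minimal sketch, assuming the same 2x2 table as above and using scipy.stats.chi2_contingency; the relative risk is computed by hand from the column totals, and the variable names are my own:

```python
import pandas as pd
from scipy import stats

# 2x2 table from the answer above: rows = Cancer, columns = Factor
table = pd.DataFrame(
    [[840, 51663], [32, 5053]],
    index=pd.Index(["No", "Yes"], name="Cancer"),
    columns=pd.Index(["w-statin", "wo-statin"], name="Factor"),
)

# Sample odds ratio as the cross-product of the table. With this layout it is
# the odds of "No cancer" with statin relative to without statin.
odds_ratio = (table.loc["No", "w-statin"] * table.loc["Yes", "wo-statin"]) / (
    table.loc["No", "wo-statin"] * table.loc["Yes", "w-statin"]
)

# Risk of cancer within each Factor column, then the ratio of the two risks.
risk = table.loc["Yes"] / table.sum(axis=0)
relative_risk = risk["w-statin"] / risk["wo-statin"]

# Chi-square test of independence; also returns the expected counts.
chi2, p_value, dof, expected = stats.chi2_contingency(table.values)

print("odds ratio:", odds_ratio)
print("relative risk:", relative_risk)
print("chi2:", chi2, "p-value:", p_value, "dof:", dof)
```

The odds ratio here should agree with the value from fisher_exact, since both are the plain cross-product ratio of the table. Note that these are pooled over all age groups; nothing here adjusts for age.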
Answered by Paul Sochacki
I don't know of a way to do this in Pandas. However, you can calculate the odds ratios for a logistic regression model in Python by first using the scikit-learn library to find the corresponding beta values, as described here:
How to find beta values in Logistic Regression model with sklearn
That thread describes how to fit a logistic regression model and extract its beta coefficients. You can then calculate the odds ratios by exponentiating the beta values with the exp() function from Python's NumPy package. The odds ratios calculated this way will be equivalent to those R provides with the glm() function when a binomial family is specified.
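To illustrate the exponentiation step, here is a minimal sketch that does not use scikit-learn at all: it fits an unpenalized logistic regression with a small hand-written Newton-Raphson loop in NumPy (my substitution, not the method from the linked thread) on the pooled 2x2 counts from the first answer. The outcome coding (1 = cancer) and the variable names are assumptions made for this example.

```python
import numpy as np

# Grouped data from the 2x2 table in the first answer.
# Columns of X: intercept, statin indicator (1 = w-statin, 0 = wo-statin).
X = np.array([[1.0, 1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])             # 1 = Cancer "Yes", 0 = "No"
n = np.array([32.0, 5053.0, 840.0, 51663.0])   # people in each (X, y) cell

# Newton-Raphson on the count-weighted log-likelihood.
beta = np.zeros(2)
for _ in range(50):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    gradient = X.T @ (n * (y - p))
    hessian = X.T @ (X * (n * p * (1.0 - p))[:, None])
    step = np.linalg.solve(hessian, gradient)
    beta += step
    if np.max(np.abs(step)) < 1e-12:
        break

# Exponentiating a coefficient turns a log-odds difference into an odds ratio.
odds_ratios = np.exp(beta)
statin_odds_ratio = odds_ratios[1]
print("odds ratio of cancer, w-statin vs wo-statin:", statin_odds_ratio)
```

Because the outcome here is coded as cancer = 1, the result is roughly 0.39, the reciprocal of the 2.567 reported by fisher_exact above. One caveat if you use scikit-learn instead: its LogisticRegression applies L2 regularization by default, so exp(coef_) will generally not match R's glm() exactly unless the penalty is weakened or turned off.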