Minimal example of rpy2 regression using pandas data frame
Original question: http://stackoverflow.com/questions/30922213/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by mjandrews
What is the recommended way (if any) to do linear regression on a pandas DataFrame with rpy2? I can do it, but my method seems very elaborate. Am I making things unnecessarily complicated?
The R code, for comparison:
x <- c(1,2,3,4,5)
y <- c(2,1,3,5,4)
M <- lm(y~x)
summary(M)$coefficients
            Estimate Std. Error  t value  Pr(>|t|)
(Intercept)      0.6  1.1489125 0.522233 0.6376181
x                0.8  0.3464102 2.309401 0.1040880
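For reference, the Estimate, Std. Error and t value columns above can be reproduced with ordinary least squares in NumPy. This is a minimal cross-check, not part of the original question; it assumes NumPy is installed and leaves out the p-values, which would additionally need the CDF of a t distribution (for example from SciPy).

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 1, 3, 5, 4], dtype=float)

# Design matrix with an intercept column: the same model as lm(y ~ x)
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual variance with n - p degrees of freedom (here 5 - 2 = 3)
resid = y - X @ beta
n, p = X.shape
sigma2 = resid @ resid / (n - p)

# Standard errors: square roots of the diagonal of sigma^2 * (X'X)^-1
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t = beta / se

# Columns: Estimate, Std. Error, t value (rows: intercept, x)
print(np.column_stack([beta, se, t]))
```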
Now my version, using Python 2.7.10, rpy2 2.6.0 and pandas 0.16.1:
import pandas
import pandas.rpy.common as common  # emits a FutureWarning (deprecated module)
from rpy2 import robjects
from rpy2.robjects.packages import importr
base = importr('base')
stats = importr('stats')
dataframe = pandas.DataFrame({'x': [1,2,3,4,5],
                              'y': [2,1,3,5,4]})
# Convert to an R data.frame and bind it to the name 'dataframe' in R's global environment
robjects.globalenv['dataframe'] = common.convert_to_r_dataframe(dataframe)
# Fit the model in R, referring to that data frame by name
M = stats.lm('y~x', data=base.as_symbol('dataframe'))
print(base.summary(M).rx2('coefficients'))
            Estimate Std. Error  t value  Pr(>|t|)
(Intercept)      0.6  1.1489125 0.522233 0.6376181
x                0.8  0.3464102 2.309401 0.1040880
By the way, I do get a FutureWarning on the import of pandas.rpy.common. However, when I tried pandas2ri.py2ri(dataframe) to convert a data frame from pandas to R (as mentioned here), I get:
NotImplementedError: Conversion 'py2ri' not defined for objects of type '<class 'pandas.core.series.Series'>'
Accepted answer by lgautier
The R and Python code are not strictly equivalent: in Python/rpy2 you build a data frame, whereas in R you use plain vectors (no data frame).
Otherwise, the conversion that ships with rpy2 appears to work here:
from rpy2.robjects import pandas2ri
pandas2ri.activate()  # enable automatic pandas -> R conversion
robjects.globalenv['dataframe'] = dataframe  # converted to an R data.frame on assignment
M = stats.lm('y~x', data=base.as_symbol('dataframe'))
The result:
>>> print(base.summary(M).rx2('coefficients'))
            Estimate Std. Error  t value  Pr(>|t|)
(Intercept)      0.6  1.1489125 0.522233 0.6376181
x                0.8  0.3464102 2.309401 0.1040880
Answer by unutbu
After calling pandas2ri.activate(), some conversions from pandas objects to R objects happen automatically. For example, you can use
M = R.lm('y~x', data=df)
instead of
robjects.globalenv['dataframe'] = dataframe
M = stats.lm('y~x', data=base.as_symbol('dataframe'))
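import pandas as pd
from rpy2 import robjects as ro
from rpy2.robjects import pandas2ri
pandas2ri.activate()  # pandas objects passed to R functions are converted automatically
R = ro.r
df = pd.DataFrame({'x': [1,2,3,4,5],
                   'y': [2,1,3,5,4]})
M = R.lm('y~x', data=df)  # df is passed directly; no globalenv or as_symbol needed
print(R.summary(M).rx2('coefficients'))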
yields
            Estimate Std. Error  t value  Pr(>|t|)
(Intercept)      0.6  1.1489125 0.522233 0.6376181
x                0.8  0.3464102 2.309401 0.1040880
Answer by LondonRob
I can add to unutbu's answer by outlining how to retrieve particular elements of the coefficients table, including, crucially, the p-values.
def r_matrix_to_data_frame(r_matrix):
    """Convert an R matrix into a pandas DataFrame"""
    import pandas as pd
    from rpy2.robjects import pandas2ri
    array = pandas2ri.ri2py(r_matrix)  # R matrix -> NumPy array (rpy2 2.x API)
    # r_matrix.names holds the dimnames: [row names, column names]
    return pd.DataFrame(array,
                        index=r_matrix.names[0],
                        columns=r_matrix.names[1])
# Let's start from unutbu's line retrieving the coefficients:
coeffs = R.summary(M).rx2('coefficients')
df = r_matrix_to_data_frame(coeffs)
This leaves us with a DataFrame which we can access in the normal way:
In [179]: df['Pr(>|t|)']
Out[179]:
(Intercept)    0.637618
x              0.104088
Name: Pr(>|t|), dtype: float64
In [181]: df.loc['x', 'Pr(>|t|)']
Out[181]: 0.10408803866182779

