Minimal example of rpy2 regression using a pandas data frame

Question

Asked by mjandrews

What is the recommended way (if any) for doing linear regression using a pandas dataframe? I can do it, but my method seems very elaborate. Am I making things unnecessarily complicated?

The R code, for comparison:

x <- c(1,2,3,4,5)
y <- c(2,1,3,5,4)
M <- lm(y~x)
summary(M)$coefficients
            Estimate Std. Error  t value  Pr(>|t|)
(Intercept)      0.6  1.1489125 0.522233 0.6376181
x                0.8  0.3464102 2.309401 0.1040880

Now, my Python (2.7.10), rpy2 (2.6.0), and pandas (0.16.1) version:

import pandas
import pandas.rpy.common as common
from rpy2 import robjects
from rpy2.robjects.packages import importr

base = importr('base')
stats = importr('stats')

dataframe = pandas.DataFrame({'x': [1,2,3,4,5], 
                              'y': [2,1,3,5,4]})

robjects.globalenv['dataframe'] = common.convert_to_r_dataframe(dataframe)

M = stats.lm('y~x', data=base.as_symbol('dataframe'))

print(base.summary(M).rx2('coefficients'))

            Estimate Std. Error  t value  Pr(>|t|)
(Intercept)      0.6  1.1489125 0.522233 0.6376181
x                0.8  0.3464102 2.309401 0.1040880

By the way, I do get a FutureWarning on the import of pandas.rpy.common. However, when I tried pandas2ri.py2ri(dataframe) to convert a dataframe from pandas to R, I got

NotImplementedError: Conversion 'py2ri' not defined for objects of type '<class 'pandas.core.series.Series'>'

Answer 1

Accepted answer by lgautier

The R and Python versions are not strictly identical, because you build a data frame in Python/rpy2 whereas you use plain vectors (without a data frame) in R.

Otherwise, the conversion that ships with rpy2 appears to work here:

from rpy2.robjects import pandas2ri
pandas2ri.activate()
robjects.globalenv['dataframe'] = dataframe
M = stats.lm('y~x', data=base.as_symbol('dataframe'))

The result:

>>> print(base.summary(M).rx2('coefficients'))
            Estimate Std. Error  t value  Pr(>|t|)
(Intercept)      0.6  1.1489125 0.522233 0.6376181
x                0.8  0.3464102 2.309401 0.1040880

Answer 2

Answered by unutbu

After calling pandas2ri.activate(), some conversions from pandas objects to R objects happen automatically. For example, you can use

M = R.lm('y~x', data=df)

instead of

robjects.globalenv['dataframe'] = dataframe
M = stats.lm('y~x', data=base.as_symbol('dataframe'))

import pandas as pd
from rpy2 import robjects as ro
from rpy2.robjects import pandas2ri
pandas2ri.activate()
R = ro.r

df = pd.DataFrame({'x': [1,2,3,4,5], 
                   'y': [2,1,3,5,4]})

M = R.lm('y~x', data=df)
print(R.summary(M).rx2('coefficients'))

yields

            Estimate Std. Error  t value  Pr(>|t|)
(Intercept)      0.6  1.1489125 0.522233 0.6376181
x                0.8  0.3464102 2.309401 0.1040880

Answer 3

Answered by LondonRob

I can add to unutbu's answer by outlining how to retrieve particular elements of the coefficients table, including, crucially, the p-values.

def r_matrix_to_data_frame(r_matrix):
    """Convert an R matrix into a Pandas DataFrame"""
    import pandas as pd
    from rpy2.robjects import pandas2ri
    array = pandas2ri.ri2py(r_matrix)
    return pd.DataFrame(array,
                        index=r_matrix.names[0],
                        columns=r_matrix.names[1])

# Let's start from unutbu's line retrieving the coefficients:
coeffs = R.summary(M).rx2('coefficients')
df = r_matrix_to_data_frame(coeffs)

This leaves us with a DataFrame which we can access in the normal way:

In [179]: df['Pr(>|t|)']
Out[179]:
(Intercept)    0.637618
x              0.104088
Name: Pr(>|t|), dtype: float64

In [181]: df.loc['x', 'Pr(>|t|)']
Out[181]: 0.10408803866182779
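
As a sanity check on the coefficient table that appears throughout this thread, the estimates, standard errors, and t values can be reproduced without R. The following is only a minimal sketch using NumPy (not part of any of the answers above, and not an rpy2 technique); it assumes the same five data points and an ordinary least squares fit with an intercept, as in y ~ x. The p-values are left out because they need a t-distribution CDF, which NumPy does not provide.

```python
import numpy as np

# Same data as in the question.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 1, 3, 5, 4], dtype=float)

# Design matrix with an intercept column, mirroring R's y ~ x.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares estimates: [intercept, slope].
beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

# Residual variance with n - p degrees of freedom.
residuals = y - X @ beta
dof = len(y) - X.shape[1]
sigma2 = residuals @ residuals / dof

# Standard errors from the diagonal of sigma^2 * (X'X)^-1.
std_err = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
t_values = beta / std_err

print(beta)
print(std_err)
print(t_values)
```

The three printed arrays should agree with the Estimate, Std. Error, and t value columns of the R output, up to floating-point rounding.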
