Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/40996175/

Time: 2020-08-20 00:20:05  Source: igfitidea

Loading a .rds file in Pandas

python  python-3.x  pandas

Asked by D1X

I have downloaded a file in .rds format. How can I load it with Pandas? It is supposed to be an R file, but I haven't been able to find any information on how to load it.
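
Before choosing a loader, it can help to confirm that the download really is an R serialization file and not, say, an HTML error page. The following is a minimal sketch using only the standard library. It assumes the file was written by R's saveRDS with gzip, bzip2, xz, or no compression, and that the decompressed stream starts with one of R's two-byte format markers ("X\n", "A\n" or "B\n"). The name sniff_rds is hypothetical and is not part of pandas, rpy2, or any other library.

```python
import bz2
import gzip
import lzma

# Leading bytes of the compression containers this sketch recognises.
_OPENERS = (
    (b"\x1f\x8b", "gzip", gzip.open),
    (b"BZh", "bzip2", bz2.open),
    (b"\xfd7zXZ\x00", "xz", lzma.open),
)

# Two-byte markers expected at the start of an R serialization stream.
_FORMATS = {b"X\n": "xdr", b"A\n": "ascii", b"B\n": "native binary"}


def sniff_rds(path):
    """Return (compression, serialization_format), or None if the
    file does not look like an RDS file. Hypothetical helper."""
    with open(path, "rb") as fh:
        head = fh.read(6)

    # Default to reading the file as-is (uncompressed).
    compression, opener = "none", open
    for magic, name, candidate in _OPENERS:
        if head.startswith(magic):
            compression, opener = name, candidate
            break

    try:
        with opener(path, "rb") as fh:
            marker = fh.read(2)
    except (OSError, EOFError, lzma.LZMAError):
        # Corrupt or truncated compressed data.
        return None

    fmt = _FORMATS.get(marker)
    return (compression, fmt) if fmt else None
```

A result of None suggests the file is something other than RDS (or uses a compression this sketch does not cover), in which case neither loader discussed below is likely to succeed.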

Answered by mgalardini

You could use the rpy2 interface to Pandas, in the following manner:

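import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri

# enable automatic conversion between R objects and pandas objects
pandas2ri.activate()

# look up R's readRDS function and call it on the file
readRDS = robjects.r['readRDS']
df = readRDS('my_file.rds')
# rpy2 2.x name for the conversion; rpy2 3.x renamed it (see the answer below)
df = pandas2ri.ri2py(df)
# do something with the dataframe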

Answered by Otto Fajardo

If you would prefer not to install R (rpy2 requires it), there is a new package, "pyreadr", that reads Rds and RData files very easily.

It is a wrapper around the C library librdata, so it is very fast.

You can install it easily with pip:

pip install pyreadr

Then you can read your rds file:

import pyreadr

result = pyreadr.read_r('/path/to/file.Rds')  # also works for RData

# done!
# result is a dictionary whose keys are the object names and whose values are
# Python objects. An Rds file holds a single object, stored under the key None.
df = result[None]  # extract the pandas data frame
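
Because the keys differ between Rds files (a single None key) and RData files (object names), a small lookup helper can hide that difference. The following is only a sketch: pick_frame is a hypothetical name, not part of pyreadr, and it assumes nothing beyond the dictionary shape described in the comments above.

```python
def pick_frame(result, name=None):
    """Return one object from a mapping shaped like the output of
    pyreadr.read_r. Hypothetical helper, not part of pyreadr."""
    # Exact match: covers Rds (key None) and a named RData object.
    if name in result:
        return result[name]
    # No name requested and only one object present: return that object.
    if name is None and len(result) == 1:
        return next(iter(result.values()))
    raise KeyError(
        f"{name!r} not found; available objects: {list(result)}"
    )
```

With pyreadr installed, df = pick_frame(pyreadr.read_r('/path/to/file.Rds')) would stand in for the result[None] lookup, and pick_frame(result, 'some_name') would select one object from an RData file holding several.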

The repo is here: https://github.com/ofajardo/pyreadr

Disclaimer: I am the developer of this package.

Answered by user2032994

To follow up on @mgalardini's answer: in newer versions of rpy2 (3.0.4 here), the function that converts an R data frame to a pandas DataFrame has changed:

>>> import rpy2
>>> rpy2.__version__
'3.0.4'
>>> import rpy2.robjects as robjects
>>> from rpy2.robjects import pandas2ri
>>> readRDS = robjects.r['readRDS']
>>> df = readRDS('my_file.rds')
>>> df = pandas2ri.rpy2py_dataframe(df)