Loading .RData files into Python

Question

Asked by Stu

I have a bunch of .RData time-series files and would like to load them directly into Python without first converting the files to some other extension (such as .csv). Any ideas on the best way to accomplish this?

Answer 1

Accepted answer by Spacedman

People ask this sort of thing on the R-help and R-devel lists, and the usual answer is that the code is the documentation for the .RData file format. So any other implementation in any other language is hard++.
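
A full reader is a big job, but checking whether a file looks like an .RData workspace at all is cheap. The sketch below assumes what R's save() normally writes: an optionally gzip-, bzip2- or xz-compressed stream that starts with a five-byte magic string such as RDX2 or RDX3 followed by a newline. The helper names are made up for this illustration, and it only inspects the header; it does not parse any objects.

```python
import bz2
import gzip
import lzma

# Magic prefixes written by R's save(); the digit is the workspace format version.
RDATA_MAGICS = (b"RDX2\n", b"RDA2\n", b"RDB2\n", b"RDX3\n", b"RDA3\n", b"RDB3\n")


def read_decompressed_head(path, n=5):
    """Return the first n bytes of the file after undoing gzip/bzip2/xz compression."""
    with open(path, "rb") as fh:
        raw = fh.read(6)
    if raw.startswith(b"\x1f\x8b"):          # gzip
        opener = gzip.open
    elif raw.startswith(b"BZh"):             # bzip2
        opener = bz2.open
    elif raw.startswith(b"\xfd7zXZ\x00"):    # xz
        opener = lzma.open
    else:                                    # not compressed
        opener = open
    with opener(path, "rb") as fh:
        return fh.read(n)


def looks_like_rdata(path):
    """True if the (decompressed) file starts with one of the save() magic strings."""
    return read_decompressed_head(path) in RDATA_MAGICS
```

A file written by saveRDS() does not carry this magic, so the check also helps tell .RData files apart from .rds files before choosing between load and readRDS.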

I think the only reasonable way is to install RPy2 and use R's load function from that, converting to appropriate Python objects as you go. The .RData file can contain structured objects as well as plain tables, so watch out.

Linky: http://rpy.sourceforge.net/rpy2/doc-2.4/html/

Quicky:

>>> import rpy2.robjects as robjects
>>> robjects.r['load'](".RData")

The objects are now loaded into the R workspace.


>>> robjects.r['y']
<FloatVector - Python:0x24c6560 / R:0xf1f0e0>
[0.763684, 0.086314, 0.617097, ..., 0.443631, 0.281865, 0.839317]

That's a simple numeric vector. d is a data frame, which I can subset to get columns:


>>> robjects.r['d'][0]
<IntVector - Python:0x24c9248 / R:0xbbc6c0>
[       1,        2,        3, ...,        8,        9,       10]
>>> robjects.r['d'][1]
<FloatVector - Python:0x24c93b0 / R:0xf1f230>
[0.975648, 0.597036, 0.254840, ..., 0.891975, 0.824879, 0.870136]
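
If you don't know in advance which objects a file holds, R's load() helps: it returns the names of the objects it restored. A minimal sketch built on that follows. The function name load_rdata and its r parameter are inventions of this example (the parameter only exists so a stand-in object can be passed in), and the default path needs both R and rpy2 installed.

```python
def load_rdata(path, r=None):
    """Load an .RData file and return a dict mapping object name to R object.

    `r` defaults to rpy2's embedded R instance; passing something else is an
    assumption of this sketch, not part of the rpy2 API.
    """
    if r is None:
        import rpy2.robjects as robjects  # requires R and rpy2 to be installed
        r = robjects.r
    # R's load() returns a character vector with the names of the restored objects.
    names = r["load"](path)
    return {str(name): r[str(name)] for name in names}
```

With the file above, load_rdata(".RData") would give a dict with the keys 'y' and 'd', whose values are the same rpy2 vector and data frame objects shown in the session.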

Answer 2

Answer by Games Brainiac

There is a third-party library called rpy, and you can use it to load .RData files. You can get it with pip: pip install rpy will do the trick. If you don't have rpy, then I suggest that you take a look at how to install it. Otherwise, you can simply do:


from rpy import *
r.load("file name here")

EDIT:

It seems I'm a little old school; there's rpy2 now, so you can use that.


Answer 3

Answer by rsc05

Jupyter Notebook Users

If you are using Jupyter Notebook, you need two steps:


Step 1: Go to http://www.lfd.uci.edu/~gohlke/pythonlibs/#rpy2 and download the rpy2 wheel (the Python interface to the R language, embedded R). In my case I will use rpy2-2.8.6-cp36-cp36m-win_amd64.whl.


Put this file in the same working directory you are currently in.

Step 2: Go to your Jupyter notebook and run the following commands:


# This is to install rpy2 library in Anaconda
!pip install rpy2-2.8.6-cp36-cp36m-win_amd64.whl

and then

# This is important if you will be using rpy2
import os
# Raw string, so that backslashes such as \r are not read as escape sequences
os.environ['R_USER'] = r'D:\Anaconda3\Lib\site-packages\rpy2'

and then

import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
pandas2ri.activate()

This should allow you to use R functions in Python. Now get a handle on R's readRDS function as follows:


readRDS = robjects.r['readRDS']
df = readRDS('Data1.rds')
df = pandas2ri.ri2py(df)  # rpy2 2.x name; renamed to rpy2py in rpy2 3.x
df.head()

Congratulations! Now you have the data frame you wanted.


However, I advise you to save it as a pickle file for later use in Python:


df.to_pickle('Data1')

Next time you can simply load it with:


import pandas as pd

df1 = pd.read_pickle('Data1')
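
The save-once, reload-later pattern above can be wrapped in one helper. This is only a sketch using the standard library's pickle module: load_with_cache and the loader argument are hypothetical names, and loader stands for whatever function actually reads the R file (for example a small wrapper around the readRDS steps shown earlier).

```python
import os
import pickle


def load_with_cache(source_path, cache_path, loader):
    """Return loader(source_path), reusing a pickle cache when it is up to date."""
    # Reuse the cache only if it exists and is at least as new as the source file.
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(source_path)):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    # Otherwise do the slow read once and store the result for next time.
    obj = loader(source_path)
    with open(cache_path, "wb") as fh:
        pickle.dump(obj, fh)
    return obj
```

The first call pays the cost of going through R; later calls read the pickle directly, and the cache is rebuilt automatically if the source file is modified afterwards.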

Answer 4

Answer by Otto Fajardo

As an alternative for those who would prefer not to install R just for this task (rpy2 requires it), there is a newer package, pyreadr, which reads RData and Rds files directly into Python without requiring R or other external dependencies.


It is a wrapper around the C library librdata, so it is very fast.

You can install it easily with pip:

pip install pyreadr

As an example you would do:

import pyreadr

result = pyreadr.read_r('/path/to/file.RData') # also works for Rds

# done! let's see what we got
# result is a dictionary whose keys are the object names and whose values are
# Python objects (pandas data frames)
print(result.keys()) # let's check what objects we got
df1 = result["df1"] # extract the pandas data frame for object df1
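
One detail to watch when the same code handles both file types: an .RData file yields one key per saved object, while an Rds file holds a single unnamed object, which pyreadr stores under the key None. The helper below is a small sketch for picking an object out of either kind of result; pick_object is a made-up name, and it works on any dict-like mapping, so it does not import pyreadr itself.

```python
def pick_object(result, name=None):
    """Pick one object out of the mapping returned by pyreadr.read_r."""
    if name in result:
        # Named lookup; also covers Rds results, whose single key is None.
        return result[name]
    if name is None and len(result) == 1:
        # No name requested and only one object present: return it.
        return next(iter(result.values()))
    raise KeyError(f"{name!r} not found; available objects: {list(result)}")
```

For example, pick_object(result, "df1") returns the same data frame as result["df1"] above, and pick_object(pyreadr.read_r('/path/to/file.Rds')) returns the single object of an Rds file.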

The repo is here: https://github.com/ofajardo/pyreadr

Disclaimer: I am the developer of this package.
