Loading .RData files into Python
Original question: http://stackoverflow.com/questions/21288133/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by Stu
I have a bunch of .RData time-series files and would like to load them directly into Python without first converting the files to some other extension (such as .csv). Any ideas on the best way to accomplish this?
Accepted answer by Spacedman
People ask this sort of thing on the R-help and R-dev lists, and the usual answer is that the code is the documentation for the .RData file format. So any other implementation in any other language is hard++.
I think the only reasonable way is to install RPy2 and use R's load function from that, converting to appropriate Python objects as you go. The .RData file can contain structured objects as well as plain tables, so watch out.
Linky: http://rpy.sourceforge.net/rpy2/doc-2.4/html/
Quicky:
>>> import rpy2.robjects as robjects
>>> robjects.r['load'](".RData")
objects are now loaded into the R workspace.
>>> robjects.r['y']
<FloatVector - Python:0x24c6560 / R:0xf1f0e0>
[0.763684, 0.086314, 0.617097, ..., 0.443631, 0.281865, 0.839317]
That's a simple numeric vector. d is a data frame, and I can subset it to get columns:
>>> robjects.r['d'][0]
<IntVector - Python:0x24c9248 / R:0xbbc6c0>
[ 1, 2, 3, ..., 8, 9, 10]
>>> robjects.r['d'][1]
<FloatVector - Python:0x24c93b0 / R:0xf1f230>
[0.975648, 0.597036, 0.254840, ..., 0.891975, 0.824879, 0.870136]
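R's load returns (invisibly) the names of the objects it restored, so the session above can be wrapped in a small helper that collects everything in the file into a Python dict. This is only a sketch: load_rdata is a hypothetical name, it assumes rpy2 and a working R installation are present, and the values are still rpy2 wrapper objects rather than native Python types.

```python
def load_rdata(path):
    """Load every object stored in an .RData file via rpy2 (hypothetical helper).

    Returns a dict mapping each R object name to its rpy2 wrapper.
    """
    # Imported lazily so that defining the helper does not require rpy2.
    import rpy2.robjects as robjects

    # R's load() returns a character vector with the names it created.
    names = robjects.r['load'](path)
    return {str(name): robjects.r[name] for name in names}
```

For example, objects = load_rdata(".RData") followed by objects['d'] should give the same data frame as robjects.r['d'] above.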
Answer by Games Brainiac
There is a third-party library called rpy that you can use to load .RData files. You can get it via pip: pip install rpy should do the trick. If you don't have rpy, I suggest you take a look at how to install it. Otherwise, you can simply do:
from rpy import *
r.load("file name here")
EDIT:
It seems I'm a little old school. There's rpy2 now, so you can use that.
Answer by rsc05
Jupyter Notebook Users
If you are using Jupyter notebook, you need to do 2 steps:
Step 1: Go to http://www.lfd.uci.edu/~gohlke/pythonlibs/#rpy2 and download the "Python interface to the R language (embedded R)" wheel. In my case I will use rpy2-2.8.6-cp36-cp36m-win_amd64.whl.
Put this file in the same working directory you are currently in.
Step 2: Go to your Jupyter notebook and run the following commands:
# This is to install rpy2 library in Anaconda
!pip install rpy2-2.8.6-cp36-cp36m-win_amd64.whl
and then
# This is important if you will be using rpy2
import os
os.environ['R_USER'] = r'D:\Anaconda3\Lib\site-packages\rpy2'  # raw string: keeps "\r" from becoming a carriage return
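Windows paths such as the one above need care in Python string literals: a backslash followed by r is read as a carriage return unless the literal is a raw string (or the path uses forward slashes). The short sketch below uses a shortened, made-up path purely to show the difference.

```python
# In a normal literal, "\r" is an escape sequence (carriage return).
plain = 'site-packages\rpy2'
# A raw string keeps every backslash exactly as typed.
raw = r'site-packages\rpy2'

print('\r' in plain)  # True: the backslash and "r" became one control character
print('\\' in raw)    # True: the backslash is still present
```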
and then
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
pandas2ri.activate()
This should allow you to use R functions in Python. Now bring in R's readRDS function as follows:
readRDS = robjects.r['readRDS']
df = readRDS('Data1.rds')
df = pandas2ri.ri2py(df)
df.head()
Congratulations! Now you have the DataFrame you wanted.
However, I advise you to save it as a pickle file for later use in Python:
df.to_pickle('Data1')
Next time you can simply load it with:
import pandas as pd
df1 = pd.read_pickle('Data1')
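The save-once, reload-later idea can be wrapped in a small caching helper. This is a sketch rather than part of the original answer: cached_load is a hypothetical name, it uses the standard-library pickle module instead of the pandas methods shown above, and loader stands for whatever function performs the slow conversion from R.

```python
import os
import pickle


def cached_load(cache_path, loader):
    """Return the pickled object at cache_path, creating it with loader() if missing."""
    if os.path.exists(cache_path):
        # Fast path: reuse the result saved by an earlier run.
        with open(cache_path, 'rb') as fh:
            return pickle.load(fh)
    # Slow path: run the conversion once and save the result for next time.
    obj = loader()
    with open(cache_path, 'wb') as fh:
        pickle.dump(obj, fh)
    return obj
```

With the names used in this answer, a call might look like cached_load('Data1.pkl', lambda: pandas2ri.ri2py(readRDS('Data1.rds'))). Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.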
Answer by Otto Fajardo
As an alternative for those who would prefer not to install R for this task (rpy2 requires it), there is a newer package, "pyreadr", which reads RData and Rds files directly into Python without needing R.
It is a wrapper around the C library librdata, so it is very fast.
You can install it easily with pip:
pip install pyreadr
As an example you would do:
import pyreadr
result = pyreadr.read_r('/path/to/file.RData') # also works for Rds
# done! let's see what we got
# result is a dictionary where keys are the name of objects and the values python
# objects
print(result.keys()) # let's check what objects we got
df1 = result["df1"] # extract the pandas data frame for object df1
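Since read_r returns a dictionary, a file holding several objects can be inspected with an ordinary loop. The helper below is a sketch: summarize_objects is a hypothetical name, and it only assumes that the values expose a shape attribute the way pandas data frames do, so it is shown here with a stand-in object rather than a real file.

```python
from collections import namedtuple


def summarize_objects(result):
    """Map each object name to its shape, or None if it has no shape attribute."""
    return {name: getattr(obj, 'shape', None) for name, obj in result.items()}


# Stand-in for a data frame, used only so the sketch runs without an .RData file.
FakeFrame = namedtuple('FakeFrame', ['shape'])
demo = {'df1': FakeFrame(shape=(10, 2)), 'df2': FakeFrame(shape=(5, 3))}
print(summarize_objects(demo))  # {'df1': (10, 2), 'df2': (5, 3)}
```

With pyreadr installed, summarize_objects(pyreadr.read_r('/path/to/file.RData')) should list the shape of every data frame found in the file.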
The repo is here: https://github.com/ofajardo/pyreadr
Disclaimer: I am the developer of this package.

