Reading pickle files (pandas Python DataFrame) in R

Question

Asked by Vincent

Is there an easy way to read pickle files (.pkl) containing pandas DataFrames into R?

One possibility is to export to CSV and have R read the CSV, but that seems really cumbersome to me because my dataframes are rather large. Is there an easier way to do this?

Thanks!

Answer 1

Accepted answer by russellpierce

You could load the pickle in Python and then export it to R via the Python package rpy2 (or similar). Once you've done so, your data will exist in an R session linked to Python. I suspect that what you'd want to do next would be to use that session to call R and saveRDS to a file or RAM disk. Then in RStudio you can read that file back in. Look at the R packages rJython and rPython for ways in which you could trigger the Python commands from R.

Alternatively, you could write a simple Python script to load your data in Python (probably using one of the R packages noted above) and write a formatted data stream to stdout. Then the entire system call to the script (including the argument that specifies your pickle) can be used as an argument to fread in the R package data.table. Or, if you wanted to keep to standard functions, you could use a combination of system(..., intern=TRUE) and read.table.
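As a rough illustration of the stdout approach, here is a minimal sketch of such a script. The file name pickle_to_csv.py and the function name pickle_to_csv are hypothetical, and it assumes the pickle holds a single pandas DataFrame whose row index does not need to be preserved.

```python
import sys

import pandas as pd


def pickle_to_csv(pickle_path, out=None):
    """Load a pickled pandas DataFrame and write it as CSV text to `out`.

    `out` defaults to stdout so that R can capture the stream.
    """
    if out is None:
        out = sys.stdout
    df = pd.read_pickle(pickle_path)
    # Assumption: the row index carries no information, so it is dropped.
    df.to_csv(out, index=False)


if __name__ == "__main__":
    # Hypothetical invocation: python pickle_to_csv.py dataset.pkl
    if len(sys.argv) == 2:
        pickle_to_csv(sys.argv[1])
    else:
        print("usage: python pickle_to_csv.py <file.pkl>", file=sys.stderr)
```

On the R side, something along the lines of fread(cmd = "python pickle_to_csv.py dataset.pkl") from data.table should then read the stream directly, assuming python on the PATH is the interpreter that has pandas installed.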

As usual, there are /many/ ways to skin this particular cat. The basic steps are:

1. Load the data in Python.
2. Express the data to R (e.g., exporting the object via rpy2, or writing formatted text to stdout with R ready to receive it on the other end).
3. Serialize the expressed data in R to an internal data representation (e.g., exporting the object via rpy2 or fread).
4. (Optional) Make the data in that session of R accessible to another R session (i.e., the step to close the loop with rpy2; if you've been using fread, then you're already done).

Answer 2

Answer by Ankur Sinha

Reticulate was quite easy and super smooth, as suggested by russellpierce in the comments.

install.packages('reticulate')

After that, I created a Python script like this, based on the examples given in the reticulate documentation.

Python file:

import pandas as pd

def read_pickle_file(file):
    pickle_data = pd.read_pickle(file)
    return pickle_data

And then my R file looked like:

require("reticulate")

source_python("pickle_reader.py")
pickle_data <- read_pickle_file("C:/tsa/dataset.pickle")

This gave me, in R, all the data I had stored earlier in pickle format.

You can also do all of this in-line in R without leaving your R editor (provided your system Python can reach pandas), for example:

library(reticulate)
pd <- import("pandas")
pickle_data <- pd$read_pickle("dataset.pickle")

Answer 3

Answer by generic_user

To add to the answer above: you might need to point to a different conda env to get to pandas:

use_condaenv("name_of_conda_env", conda = "<<result_of `which conda`>>")
pd <- import('pandas')

df <- pd$read_pickle(paste0(outdir, "df.pkl"))
