pandas: writing a pickle file to an s3 bucket in AWS

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original source: http://stackoverflow.com/questions/49120069/


Writing a pickle file to an s3 bucket in AWS

Tags: python, pandas, amazon-web-services, amazon-s3

Asked by himi64

I'm trying to write a pandas dataframe as a pickle file into an s3 bucket in AWS. I know that I can write the dataframe new_df as a csv to an s3 bucket as follows:


import boto3
from io import StringIO

bucket = 'mybucket'
key = 'path'

# Serialize the dataframe into an in-memory text buffer, then upload it
csv_buffer = StringIO()
s3_resource = boto3.resource('s3')

new_df.to_csv(csv_buffer, index=False)
s3_resource.Object(bucket, key).put(Body=csv_buffer.getvalue())

I've tried using the same code as above with to_pickle() but with no success.


Accepted answer by himi64

I've found the solution: the buffer needs to be a BytesIO for pickle files, instead of a StringIO (which is for CSV and other text formats).


import io
import boto3

# pickle is a binary format, so use an in-memory bytes buffer
pickle_buffer = io.BytesIO()
s3_resource = boto3.resource('s3')

new_df.to_pickle(pickle_buffer)
s3_resource.Object(bucket, key).put(Body=pickle_buffer.getvalue())
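
To check the round trip, here is a minimal sketch that reads the object back, reusing bucket, key, and s3_resource from above; passing a buffer to pd.read_pickle assumes a reasonably recent pandas:

import io
import pandas as pd

# Download the object body and unpickle it back into a dataframe
obj = s3_resource.Object(bucket, key).get()
new_df_back = pd.read_pickle(io.BytesIO(obj['Body'].read()))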

Answered by Mostafa Shabani

Further to your answer, you don't need to convert to csv. The pickle.dumps method returns a bytes object. See here: https://docs.python.org/3/library/pickle.html


import boto3
import pickle

bucket = 'your_bucket_name'
key = 'your_pickle_filename.pkl'

# pickle.dumps returns bytes, so it can be uploaded directly with no buffer
pickle_byte_obj = pickle.dumps([var1, var2, ..., varn])
s3_resource = boto3.resource('s3')
s3_resource.Object(bucket, key).put(Body=pickle_byte_obj)
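
For the reverse direction, a minimal sketch under the same assumptions (bucket, key, and s3_resource as above): the object body can be read and passed straight to pickle.loads.

# Download the raw bytes and unpickle them
body = s3_resource.Object(bucket, key).get()['Body'].read()
data = pickle.loads(body)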

Answered by Limsanity82

This worked for me with pandas 0.23.4 and boto3 1.7.80:


import boto3

bucket = 'your_bucket_name'
key = 'your_pickle_filename.pkl'
new_df.to_pickle(key)  # write the pickle to a local file first
s3_resource = boto3.resource('s3')
s3_resource.Object(bucket, key).put(Body=open(key, 'rb'))
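
As a side note, a minimal sketch of a newer alternative: recent pandas versions can read and write s3:// URLs directly, provided the optional s3fs package is installed and AWS credentials are configured. The bucket and filename below are placeholder examples.

import pandas as pd

# Requires: pip install s3fs
new_df.to_pickle('s3://your_bucket_name/your_pickle_filename.pkl')
new_df_back = pd.read_pickle('s3://your_bucket_name/your_pickle_filename.pkl')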