pandas: writing a pickle file to an s3 bucket in AWS

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original source: http://stackoverflow.com/questions/49120069/


Writing a pickle file to an s3 bucket in AWS

Tags: python, pandas, amazon-web-services, amazon-s3

Asked by himi64

I'm trying to write a pandas dataframe as a pickle file into an s3 bucket in AWS. I know that I can write the dataframe new_df as a csv to an s3 bucket as follows:


import boto3
from io import StringIO

bucket = 'mybucket'
key = 'path'

# Serialize the dataframe into an in-memory text buffer, then upload it
csv_buffer = StringIO()
s3_resource = boto3.resource('s3')

new_df.to_csv(csv_buffer, index=False)
s3_resource.Object(bucket, key).put(Body=csv_buffer.getvalue())

I've tried using the same code as above with to_pickle() but with no success.


Accepted answer by himi64

I've found the solution: the buffer needs to be a BytesIO for pickle files, instead of a StringIO (which is for CSV and other text formats).


import io
import boto3

# pickle is a binary format, so use an in-memory bytes buffer
pickle_buffer = io.BytesIO()
s3_resource = boto3.resource('s3')

new_df.to_pickle(pickle_buffer)
s3_resource.Object(bucket, key).put(Body=pickle_buffer.getvalue())
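
To check the round trip, here is a minimal sketch that reads the object back, reusing bucket, key, and s3_resource from above; passing a buffer to pd.read_pickle assumes a reasonably recent pandas:

import io
import pandas as pd

# Download the object body and unpickle it back into a dataframe
obj = s3_resource.Object(bucket, key).get()
new_df_back = pd.read_pickle(io.BytesIO(obj['Body'].read()))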

Answered by Mostafa Shabani

Further to your answer, you don't need to convert to csv. The pickle.dumps method returns a bytes object. See here: https://docs.python.org/3/library/pickle.html


import boto3
import pickle

bucket = 'your_bucket_name'
key = 'your_pickle_filename.pkl'

# pickle.dumps returns bytes, so it can be uploaded directly with no buffer
pickle_byte_obj = pickle.dumps([var1, var2, ..., varn])
s3_resource = boto3.resource('s3')
s3_resource.Object(bucket, key).put(Body=pickle_byte_obj)
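
For the reverse direction, a minimal sketch under the same assumptions (bucket, key, and s3_resource as above): the object body can be read and passed straight to pickle.loads.

# Download the raw bytes and unpickle them
body = s3_resource.Object(bucket, key).get()['Body'].read()
data = pickle.loads(body)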

Answered by Limsanity82

This worked for me with pandas 0.23.4 and boto3 1.7.80:


import boto3

bucket = 'your_bucket_name'
key = 'your_pickle_filename.pkl'
new_df.to_pickle(key)  # write the pickle to a local file first
s3_resource = boto3.resource('s3')
s3_resource.Object(bucket, key).put(Body=open(key, 'rb'))
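
As a side note, a minimal sketch of a newer alternative: recent pandas versions can read and write s3:// URLs directly, provided the optional s3fs package is installed and AWS credentials are configured. The bucket and filename below are placeholder examples.

import pandas as pd

# Requires: pip install s3fs
new_df.to_pickle('s3://your_bucket_name/your_pickle_filename.pkl')
new_df_back = pd.read_pickle('s3://your_bucket_name/your_pickle_filename.pkl')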