Writing a pickle file to an s3 bucket in AWS

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/49120069/
Asked by himi64
I'm trying to write a pandas dataframe as a pickle file into an s3 bucket in AWS. I know that I can write the dataframe new_df as a csv to an s3 bucket as follows:
import boto3
from io import StringIO

bucket = 'mybucket'
key = 'path'
csv_buffer = StringIO()
s3_resource = boto3.resource('s3')
new_df.to_csv(csv_buffer, index=False)  # write the csv into an in-memory text buffer
s3_resource.Object(bucket, key).put(Body=csv_buffer.getvalue())
I've tried using the same code as above with to_pickle() but with no success.
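Presumably the failing attempt looked something like the sketch below (an assumption, not code from the question); to_pickle() emits binary data, so writing it into a text buffer like StringIO fails with a TypeError:

from io import StringIO

pickle_buffer = StringIO()
new_df.to_pickle(pickle_buffer)  # fails: pickle output is bytes, which a text buffer cannot accept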
Accepted answer by himi64
I've found the solution: the buffer for pickle files needs to be a BytesIO rather than a StringIO (which is for text formats such as CSV).
import io
import boto3

bucket = 'mybucket'
key = 'path'
pickle_buffer = io.BytesIO()
s3_resource = boto3.resource('s3')
new_df.to_pickle(pickle_buffer)  # pickle the dataframe into an in-memory binary buffer
s3_resource.Object(bucket, key).put(Body=pickle_buffer.getvalue())
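To read the dataframe back later, a minimal sketch (assuming the same bucket and key as above): fetch the object and unpickle its bytes.

import pickle
import boto3

s3_resource = boto3.resource('s3')
obj = s3_resource.Object(bucket, key).get()
df_restored = pickle.loads(obj['Body'].read())  # deserialize the bytes back into a dataframe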
Answered by Mostafa Shabani
Further to your answer, you don't need an intermediate buffer at all: the pickle.dumps method returns a bytes object. See here: https://docs.python.org/3/library/pickle.html
import boto3
import pickle

bucket = 'your_bucket_name'
key = 'your_pickle_filename.pkl'
pickle_byte_obj = pickle.dumps([var1, var2, ..., varn])  # serialize any picklable objects to bytes
s3_resource = boto3.resource('s3')
s3_resource.Object(bucket, key).put(Body=pickle_byte_obj)
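Applied to the original question, the same idea pickles the dataframe directly, with no intermediate buffer (a sketch, assuming the new_df from the question and the bucket and key above):

import boto3
import pickle

s3_resource = boto3.resource('s3')
s3_resource.Object(bucket, key).put(Body=pickle.dumps(new_df))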
Answered by Limsanity82
This worked for me with pandas 0.23.4 and boto3 1.7.80:
import boto3

bucket = 'your_bucket_name'
key = 'your_pickle_filename.pkl'
new_df.to_pickle(key)  # write the pickle to a local file first
s3_resource = boto3.resource('s3')
with open(key, 'rb') as f:
    s3_resource.Object(bucket, key).put(Body=f)  # then upload that file
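Unlike the buffer-based answers above, this variant writes the pickle to local disk first. If the temporary file shouldn't stick around, it can be deleted once the upload succeeds (a minimal follow-up, assuming the same key variable):

import os

os.remove(key)  # remove the local copy after the upload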