Python: Move files between two AWS S3 buckets using boto3
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow.
Original question: http://stackoverflow.com/questions/30161700/
Move files between two AWS S3 buckets using boto3
Asked by Gal
I have to move files from one bucket to another with the Python Boto API. (I need it to "cut" the file from the first bucket and "paste" it into the second one.) What is the best way to do that?
** Note: Does it matter if I have two different ACCESS KEYS and SECRET KEYS?
Answered by Freek Wiekmeijer
I think the boto S3 documentation answers your question.
https://github.com/boto/boto/blob/develop/docs/source/s3_tut.rst
Moving files from one bucket to another via boto is effectively copying each key from source to destination and then removing the key from the source.
You can get access to the buckets:
import boto
c = boto.connect_s3()
src = c.get_bucket('my_source_bucket')
dst = c.get_bucket('my_destination_bucket')
and iterate the keys:
for k in src.list():
    # copy stuff to your destination here
    dst.copy_key(k.key.name, src.name, k.key.name)
    # then delete the source key
    k.delete()
See also: Is it possible to copy all files from one S3 bucket to another with s3cmd?
Answered by SathishVenkat
The bucket name must be a string, not a bucket object. The change below worked for me:
for k in src.list():
    dst.copy_key(k.key, src.name, k.key)
Answered by Artem Fedosov
awscli does the job 30 times faster for me than boto copying and deleting each key, probably due to multithreading in awscli. If you still want to run it from your Python script without calling shell commands from it, you may try something like this:
Install awscli python package:
sudo pip install awscli
And then it is as simple as this:
import os
if os.environ.get('LC_CTYPE', '') == 'UTF-8':
    os.environ['LC_CTYPE'] = 'en_US.UTF-8'

from awscli.clidriver import create_clidriver
driver = create_clidriver()
driver.main('s3 mv s3://source_bucket s3://target_bucket --recursive'.split())
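If you want to know whether the transfer succeeded, the main() call should return the command's exit status (0 on success), so the script can check it, for example:

rc = driver.main('s3 mv s3://source_bucket s3://target_bucket --recursive'.split())
if rc != 0:
    raise RuntimeError('aws s3 mv exited with status %d' % rc)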
Answered by Tomasz Wojcik
If you want to
Create a copy of an object that is already stored in Amazon S3.
then copy_object is the way to go in boto3.
How I do it:
import boto3

aws_access_key_id = ""
aws_secret_access_key = ""
bucket_from = ""
bucket_to = ""

s3 = boto3.resource(
    's3',
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key
)
src = s3.Bucket(bucket_from)

def move_files():
    for archive in src.objects.all():
        # filters on archive.key might be applied here
        s3.meta.client.copy_object(
            ACL='public-read',
            Bucket=bucket_to,
            CopySource={'Bucket': bucket_from, 'Key': archive.key},
            Key=archive.key
        )

move_files()
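Note that move_files() only copies the objects; since the question asks for a cut-and-paste move, the source objects still have to be deleted. A minimal sketch of the same loop with the delete added (reusing bucket_from/bucket_to from above, ACL argument omitted):

def move_files():
    for archive in src.objects.all():
        s3.meta.client.copy_object(
            Bucket=bucket_to,
            CopySource={'Bucket': bucket_from, 'Key': archive.key},
            Key=archive.key
        )
        # remove the source object only after copy_object returned without raising
        archive.delete()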
Answered by agrawalramakant
If you have 2 different buckets with different access credentials, store the credentials accordingly in the credentials and config files under the ~/.aws folder.
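For illustration, ~/.aws/credentials could then hold two named profiles along these lines (the profile names and key values are placeholders):

[source_profile_name]
aws_access_key_id = <source access key>
aws_secret_access_key = <source secret key>

[dest_profile_name]
aws_access_key_id = <dest access key>
aws_secret_access_key = <dest secret key>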
You can use the following to copy an object from one bucket with one set of credentials and then save the object in the other bucket with the other credentials:
import boto3
session_src = boto3.session.Session(profile_name=<source_profile_name>)
source_s3_r = session_src.resource('s3')
session_dest = boto3.session.Session(profile_name=<dest_profile_name>)
dest_s3_r = session_dest.resource('s3')
# create a reference to source image
old_obj = source_s3_r.Object(<source_s3_bucket_name>, <prefix_path> + <key_name>)
# create a reference for destination image
new_obj = dest_s3_r.Object(<dest_s3_bucket_name>, old_obj.key)
# upload the image to destination S3 object
new_obj.put(Body=old_obj.get()['Body'].read())
The two buckets do not need to grant each other access in their ACLs or bucket policies.
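One caveat: old_obj.get()['Body'].read() buffers the whole object in memory, which can be a problem for large files. If that matters, a streaming variant along these lines should also work (a sketch using boto3's upload_fileobj on the same objects):

# stream the source body into the destination instead of reading it all into memory
new_obj.upload_fileobj(old_obj.get()['Body'])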
Answered by Ganesh Kharad
This is code I used to move files within sub-folders of an S3 bucket:
# =============================================================================
# CODE TO MOVE FILES within subfolders in S3 BUCKET
# =============================================================================
from boto3.session import Session

ACCESS_KEY = 'a_key'
SECRET_KEY = 's_key'

session = Session(aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY)
s3 = session.resource('s3')  # creating session of S3 as resource
s3client = session.client('s3')

resp_dw = s3client.list_objects(Bucket='main_bucket', Prefix='sub_folder/', Delimiter="/")
forms2_dw = [x['Key'] for x in resp_dw['Contents'][1:]]  # all file keys (list_objects returns at most 1000 at a time)

reload_no = 0
while len(forms2_dw) != 0:
    total_files = len(forms2_dw)
    for i in range(total_files):
        foldername = resp_dw['Contents'][1:][i]['LastModified'].strftime('%Y%m%d')  # put your logic here for the folder name
        my_bcket = 'main_bucket'
        my_file_old = resp_dw['Contents'][1:][i]['Key']  # path of the file to be copied
        zip_filename = my_file_old.split('/')[-1]
        subpath_nw = 'new_sub_folder/' + foldername + "/" + zip_filename  # destination path
        my_file_new = subpath_nw

        print(str(reload_no) + '::: copying from====:' + my_file_old + ' to :=====' + my_file_new)
        if zip_filename[-4:] == '.zip':
            s3.Object(my_bcket, my_file_new).copy_from(CopySource=my_bcket + '/' + my_file_old)
            s3.Object(my_bcket, my_file_old).delete()
        print(str(i) + ' files moved of ' + str(total_files))

    # re-list the source prefix to pick up the next batch of keys
    resp_dw = s3client.list_objects(Bucket='main_bucket', Prefix='sub_folder/', Delimiter="/")
    forms2_dw = [x['Key'] for x in resp_dw['Contents'][1:]]
    reload_no += 1
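As the comment above notes, list_objects returns at most 1000 keys per call, which is why the while loop re-lists after every pass. An alternative sketch that walks every key under the prefix in one pass with a boto3 paginator (same client as above):

paginator = s3client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='main_bucket', Prefix='sub_folder/', Delimiter='/'):
    for obj in page.get('Contents', []):
        print(obj['Key'])  # each source key; apply the copy/delete logic here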