bash: Uploading files to S3 in parallel using s3cmd

Disclaimer: This page is a Chinese/English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse it, you must likewise follow the CC BY-SA license and attribute it to the original authors (not me). Original source: http://stackoverflow.com/questions/26934506/

Date: 2020-09-18 11:47:54  Source: igfitidea

Uploading files to s3 using s3cmd in parallel

Tags: bash, amazon-s3, parallel-processing, s3cmd, gnu-parallel

Asked by Alan Hollis

I've got a whole heap of files on a server, and I want to upload these onto S3. The files are stored with a .data extension, but really they're just a bunch of JPEGs, PNGs, ZIPs or PDFs.


I've already written a short script which finds the MIME type and uploads the files onto S3, and that works, but it's slow. Is there any way to make the script below run using GNU parallel?


```bash
#!/bin/bash

for n in $(find -name "*.data")
do
        data=".data"
        extension=`file $n | cut -d ' ' -f2 | awk '{print tolower($0)}'`
        mimetype=`file --mime-type $n | cut -d ' ' -f2`
        fullpath=`readlink -f $n`

        changed="${fullpath/.data/.$extension}"
        filePathWithExtensionChanged=${changed#*internal_data}

        s3upload="s3cmd put -m $mimetype --acl-public $fullpath s3://tff-xenforo-data"$filePathWithExtensionChanged

        response=`$s3upload`
        echo $response
done
```

Also I'm sure this code could be greatly improved in general :) Feedback tips would be greatly appreciated.

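One quick note on the feedback request (this is my own hedged sketch, not from the original thread): the unquoted `$n` and the `cut -d ' ' -f2` step both break on paths containing spaces, and `file --brief` avoids the `cut` entirely:

```shell
#!/bin/bash
# Hedged sketch of a more robust extension/MIME detection step.
# Quoting "$f" and using file --brief handles paths containing spaces;
# the awk step mirrors the original's "first word of the description" heuristic.
f="$(mktemp)"
printf 'hello\n' > "$f"

mimetype="$(file --brief --mime-type "$f")"          # e.g. text/plain
extension="$(file --brief "$f" | awk '{print tolower($1)}')"

echo "$mimetype"
rm -f "$f"
```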

Accepted answer by Ole Tange

You are clearly skilled in writing shell, and extremely close to a solution:


```bash
s3upload_single() {
    n=$1
    data=".data"
    extension=`file $n | cut -d ' ' -f2 | awk '{print tolower($0)}'`
    mimetype=`file --mime-type $n | cut -d ' ' -f2`
    fullpath=`readlink -f $n`

    changed="${fullpath/.data/.$extension}"
    filePathWithExtensionChanged=${changed#*internal_data}

    s3upload="s3cmd put -m $mimetype --acl-public $fullpath s3://tff-xenforo-data"$filePathWithExtensionChanged

    response=`$s3upload`
    echo $response
}
export -f s3upload_single
find -name "*.data" | parallel s3upload_single
```
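If GNU parallel isn't available, the same `export -f` fan-out pattern can be approximated with `xargs -P`. A minimal hedged sketch (the `work` function here is a stand-in for the real upload function, not part of the original answer):

```shell
#!/bin/bash
# Sketch of the export-a-function-and-fan-out pattern without GNU parallel:
# xargs -P 4 runs up to 4 bash workers, each invoking the exported function.
work() {
    echo "processed $1"   # stand-in for the real per-file upload
}
export -f work

# sort only to make the (otherwise nondeterministic) output order stable.
printf '%s\n' a.data b.data c.data \
    | xargs -P 4 -I {} bash -c 'work "$@"' _ {} \
    | sort
```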

Answered by Ibrahim Albarki

You can just use s3cmd-modification, which allows you to put/get/sync with multiple workers in parallel:


```bash
$ git clone https://github.com/pcorliss/s3cmd-modification.git
$ cd s3cmd-modification
$ python setup.py install
$ s3cmd --parallel --workers=4 sync /source/path s3://target/path
```


Answered by Motin

Try s3-cli: a command-line utility frontend to node-s3-client. It is inspired by s3cmd and attempts to be a drop-in replacement.


Paraphrasing from https://erikzaadi.com/2015/04/27/s3cmd-is-dead-long-live-s3-cli/:


This is an in-place replacement for s3cmd, written in Node (yaay!), which works flawlessly with the existing s3cmd configuration and (among other awesome stuff) uploads to S3 in parallel, saving LOADS of time.

```diff
-        system "s3cmd sync --delete-removed . s3://yourbucket.com/"
+        system "s3-cli sync --delete-removed . s3://yourbucket.com/"
```



Answered by Hitul

Use the AWS CLI. It supports parallel upload of files and is really fast at both uploading and downloading.


http://docs.aws.amazon.com/cli/latest/reference/s3/

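A hedged sketch of that approach (the bucket name and paths below are placeholders, and the commands need configured AWS credentials; `aws s3 sync`/`cp` parallelize transfers by default, with the worker count tunable via the `s3.max_concurrent_requests` config value):

```shell
# Placeholder bucket/paths; requires AWS credentials to actually run.
# Raise the number of concurrent transfer workers (default is 10):
aws configure set default.s3.max_concurrent_requests 20

# Recursively upload a directory, sending only new/changed files:
aws s3 sync /path/to/local/dir s3://my-example-bucket/prefix/ --acl public-read

# Or copy everything unconditionally:
aws s3 cp /path/to/local/dir s3://my-example-bucket/prefix/ --recursive
```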