bash - How to copy files as fast as possible?

Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must attribute it to the original authors (not me) and cite the original: http://stackoverflow.com/questions/22903743/

Date: 2020-09-18 10:08:09  Source: igfitidea

How to copy files as fast as possible?

linux, bash, unix, ubuntu, scp

Asked by john

I am running my shell script on machineA which copies the files from machineB and machineC to machineA.


If the file is not there in machineB, then it should be there in machineC for sure. So I will try to copy from machineB first; if it is not there in machineB, then I will go to machineC to copy the same files.


In machineB and machineC there will be a folder named like YYYYMMDD inside this folder -


/data/pe_t1_snapshot

So whatever date is the latest in this YYYYMMDD format inside the above folder - I will pick that folder as the full path from where I need to start copying the files -


So suppose the latest date folder inside /data/pe_t1_snapshot is 20140317, then this will be the full path for me -


/data/pe_t1_snapshot/20140317

from where I need to start copying the files in machineB and machineC. I need to copy around 400 files to machineA from machineB and machineC, and each file is 1.5 GB in size.


Currently I have my shell script below, which works fine as I am using scp, but somehow it takes ~2 hours to copy the 400 files to machineA, which is too long for me I guess. :(


Below is my shell script -


#!/bin/bash

readonly PRIMARY=/export/home/david/dist/primary
readonly SECONDARY=/export/home/david/dist/secondary
readonly FILERS_LOCATION=(machineB machineC)
readonly MEMORY_MAPPED_LOCATION=/data/pe_t1_snapshot
PRIMARY_PARTITION=(0 3 5 7 9) # this will have more file numbers around 200
SECONDARY_PARTITION=(1 2 4 6 8) # this will have more file numbers around 200

dir1=$(ssh -o "StrictHostKeyChecking no" david@${FILERS_LOCATION[0]} ls -dt1 "$MEMORY_MAPPED_LOCATION"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] | head -n1)
dir2=$(ssh -o "StrictHostKeyChecking no" david@${FILERS_LOCATION[1]} ls -dt1 "$MEMORY_MAPPED_LOCATION"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] | head -n1)

echo $dir1
echo $dir2

if [ "$dir1" = "$dir2" ]
then
    # delete all the files first
    find "$PRIMARY" -mindepth 1 -delete
    for el in "${PRIMARY_PARTITION[@]}"
    do
        scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/.
    done

    # delete all the files first
    find "$SECONDARY" -mindepth 1 -delete
    for sl in "${SECONDARY_PARTITION[@]}"
    do
        scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/.
    done
fi

I am copying PRIMARY_PARTITION files into the PRIMARY folder and SECONDARY_PARTITION files into the SECONDARY folder on machineA.


Is there any way to move the files to machineA faster? Can I copy 10 files at a time, or 5 files at a time in parallel, to speed up this process, or is there any other approach?


NOTE: machineA is running on SSD


UPDATE:


Here is the parallel shell script I tried; the top portion is the same as shown above.


if [ "$dir1" = "$dir2" ] && [ "$length1" -gt 0 ] && [ "$length2" -gt 0 ]
then
    find "$PRIMARY" -mindepth 1 -delete
    for el in "${PRIMARY_PARTITION[@]}"
    do
        (scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/.) &
          WAITPID="$WAITPID $!"        
    done

    find "$SECONDARY" -mindepth 1 -delete
    for sl in "${SECONDARY_PARTITION[@]}"
    do
        (scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/.) &
          WAITPID="$WAITPID $!"        
    done
     wait $WAITPID
     echo "All files done copying."
fi

Errors I got with the parallel shell script -


channel 24: open failed: administratively prohibited: open failed
channel 25: open failed: administratively prohibited: open failed
channel 26: open failed: administratively prohibited: open failed
channel 28: open failed: administratively prohibited: open failed
channel 30: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
channel 32: open failed: administratively prohibited: open failed
channel 36: open failed: administratively prohibited: open failed
channel 37: open failed: administratively prohibited: open failed
channel 38: open failed: administratively prohibited: open failed
channel 40: open failed: administratively prohibited: open failed
channel 46: open failed: administratively prohibited: open failed
channel 47: open failed: administratively prohibited: open failed
channel 49: open failed: administratively prohibited: open failed
channel 52: open failed: administratively prohibited: open failed
channel 54: open failed: administratively prohibited: open failed
channel 55: open failed: administratively prohibited: open failed
channel 56: open failed: administratively prohibited: open failed
channel 57: open failed: administratively prohibited: open failed
channel 59: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
channel 61: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
channel 64: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
channel 68: open failed: administratively prohibited: open failed
channel 72: open failed: administratively prohibited: open failed
channel 74: open failed: administratively prohibited: open failed
channel 76: open failed: administratively prohibited: open failed
channel 78: open failed: administratively prohibited: open failed

Answer by oohcode

You can try this command:


rsync

from the


man rsync

you will see that: "The rsync remote-update protocol allows rsync to transfer just the differences between two sets of files across the network connection, using an efficient checksum-search algorithm described in the technical report that accompanies this package."

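For the setup in the question, a minimal sketch of a pull with fallback could look like this (host names, paths and the filename pattern are taken from the question; the rsync flags and the hard-coded date folder are my assumptions, not part of the original answer):

#!/bin/bash
# Sketch: fetch one snapshot's files with rsync, falling back to machineC
# if machineB fails. Assumes rsync is installed on all three machines and
# that $dir1 has been resolved as in the question's script.
SRC=/data/pe_t1_snapshot/20140317   # stands in for $dir1
DEST=/export/home/david/dist/primary

rsync -a --partial "david@machineB:$SRC/t1_weekly_1680_*_200003_5.data" "$DEST"/ \
  || rsync -a --partial "david@machineC:$SRC/t1_weekly_1680_*_200003_5.data" "$DEST"/

On repeated runs rsync only transfers files that changed, which is where the delta algorithm quoted above pays off.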

Answer by osgx

You may try HPN-SSH (High Performance SSH/SCP) - http://www.psc.edu/index.php/hpn-ssh or http://hpnssh.sourceforge.net/


The HPN-SSH project is a set of patches for OpenSSH (scp is part of it) that better tune various TCP and internal buffers. There is also a "none" cipher ("None Cipher Switching") which disables encryption, and this may help you too (if you don't use public networks to send the data).

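As a sketch, and assuming both machines actually run an HPN-SSH build (the NoneEnabled/NoneSwitch option names come from the HPN-SSH documentation; stock OpenSSH rejects them, so verify against your build):

# Sketch: scp over HPN-SSH's "none" cipher. Authentication stays encrypted,
# the data stream does not -- only use this on a trusted private network.
scp -oNoneEnabled=yes -oNoneSwitch=yes \
    david@machineB:/data/pe_t1_snapshot/20140317/t1_weekly_1680_0_200003_5.data \
    /export/home/david/dist/primary/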

Both compression and encryption consume CPU time, and on 10 Gbit Ethernet it may sometimes be faster to transfer the uncompressed file than to wait for the CPU to compress and encrypt it.


You may profile your setup:


  • Measure the network bandwidth between machines using iperf or netperf (see the iperf sketch after this list). Compare with what the actual network should deliver (network card capabilities, switches). With a good setup you should get more than 80-90 percent of the declared speed.
  • Calculate the data volume and the time needed to transfer that much data over your network, using the speed measured with iperf or netperf. Compare with the actual transfer time; is there a huge difference?
    • If your CPU is fast, the data is compressible and the network is slow, compression will help you.
  • Take a look at top, vmstat, iostat.
    • Are there 100% loaded CPU cores (run top and press 1 to see cores)?
    • Are there too many interrupts (in) in vmstat 1? What about context switches (cs)?
    • What is the file reading speed in iostat 1? Are your HDDs fast enough to read the data, and to write it on the receiver?
  • You can try to do full-system profiling using perf top or perf record -a. Is a lot of computing done by scp, or by the network stack in Linux? If you can install dtrace or ktap, try off-CPU profiling as well.

Answer by hdante

You have 1.5 GB * 400 = 600 GB of data. Unrelated to the answer, I suggest that the machine setup looks incorrect if you need to transfer this amount of data; you probably needed to generate this data on machine A in the first place.


There are 600 GB of data being transferred in 2 hours; that is a ~85 MB/s transfer rate, which means you have probably reached the transfer limits of either your disk drives or (almost) the network. I believe you won't be able to transfer faster with any other command.


If the machines are close to each other, the copying method I believe is fastest is to physically remove the storage from machines B and C, put it in machine A and then copy locally without transferring via the network. The time for this is the time to move the storage around, plus disk transfer time. I'm afraid, however, that the copy won't be much faster than 85 MB/s.


The network transfer command that I believe would be fastest is netcat, because it has no encryption-related overhead. Additionally, if the files are not media files, you should compress them using a compressor that compresses faster than 85 MB/s; lzop and lz4 are known to be faster than this rate. So my command line for transferring a single directory would be (BSD netcat syntax):


machine A:


$ nc -l 2000 | lzop -d | tar x

machine B or C (can be executed from machine A with the help of ssh):


$ tar c directory | lzop | nc machineA 2000

Remove the compressor if transferring media files, which are already compressed.

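Putting the two halves together from machineA, as the answer suggests, could look like the following sketch (port 2000 is arbitrary; lzop and a BSD-style nc are assumed to exist on both hosts):

# Sketch: start the receiver locally, then kick off the sender on machineB
# over ssh. Drop the lzop stages on both sides for already-compressed files.
( nc -l 2000 | lzop -d | tar x -C /export/home/david/dist/primary ) &
receiver=$!
ssh david@machineB \
    "tar c -C /data/pe_t1_snapshot/20140317 . | lzop | nc machineA 2000"
wait "$receiver"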

The commands to organize your directory structure are irrelevant in terms of speed, so I didn't bother to write them here, but you can reuse your own code.


This is the fastest method I can think of, but, again, I don't believe this command will be much faster than what you already have.


Answer by Frédéric N.

You definitely want to give rclone a try. This thing is crazy fast:


sudo rclone sync /usr /home/fred/temp -P -L --transfers 64


Transferred:   17.929G / 17.929 GBytes, 100%, 165.692 MBytes/s, ETA 0s
Errors:        75 (retrying may help)
Checks:        691078 / 691078, 100%
Transferred:   345539 / 345539, 100%
Elapsed time:  1m50.8s


This is a local copy from and to a LITEONIT LCS-256 (256GB) SSD.

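The example above is a local copy; for the question's remote setup, a hypothetical rclone-over-sftp run could look like this (the "machineB" remote name and the transfer count are assumptions, and the remote must first be created with rclone config):

# One-time setup (interactive): rclone config
#   -> new remote "machineB", type "sftp", host machineB, user david
# Then pull the snapshot directory with parallel transfers:
rclone copy machineB:/data/pe_t1_snapshot/20140317 \
    /export/home/david/dist/primary -P --transfers 16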

Answer by Jeff Sheffield

rsync is a good answer, but if you care about security then you should consider using:


rdist

Some details on the differences between rsync and rdist can be found here: rdist vs rsync, and a blog about how to set it up using ssh can be found here: non root remote updating.


Finally, you could use the infamous tar pipe pattern, with a sprinkle of ssh.


tar zcvf - /wwwdata | ssh [email protected] "cat > /backup/wwwdata.tar.gz"

This example is discussed here: tar copy over secure network

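Adapted to the question's direction (pulling from machineB to machineA instead of pushing), a sketch might be (paths from the question; the z compression stage is optional):

# Sketch: stream the whole snapshot directory through one ssh session,
# avoiding per-file connection overhead. Drop the z flag if the network
# is fast and the CPU is the bottleneck.
ssh david@machineB "tar zc -C /data/pe_t1_snapshot/20140317 ." \
    | tar zx -C /export/home/david/dist/primary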

Answer by Tom Hale

The remote doesn't support ssh multiplexing.


To silence the message:


mux_client_request_session: session request failed: Session open refused by peer

Change your ~/.ssh/config file:


Host destination.hostname.com
  ControlMaster no

Host *
  ControlMaster auto
  ControlPersist yes
  ControlPath ~/.ssh/socket-%r@%h:%p

More details and notes can be found here.

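With multiplexing disabled as above, the parallel loop from the question can still overwhelm the server (sshd's MaxSessions defaults to 10), so it helps to cap concurrency. A sketch using GNU xargs (the limit of 8 and the omission of the machineC fallback are simplifications):

# Sketch: copy the PRIMARY partition files with at most 8 concurrent scp
# processes. $dir1, $PRIMARY and PRIMARY_PARTITION are the variables from
# the question's script.
printf '%s\n' "${PRIMARY_PARTITION[@]}" \
  | xargs -P 8 -I{} scp \
      "david@machineB:$dir1/t1_weekly_1680_{}_200003_5.data" "$PRIMARY"/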

Answer by Milad Abooali

rsync optionally compresses its data. That typically makes the transfer go much faster.


You didn't mention SCP, but SCP -C also compresses.


Do note that compression might make the transfer go faster or slower, depending upon the speed of your CPU and of your network link.


Slower links and faster CPUs make compression a good idea; faster links and slower CPUs make compression a bad idea.


As with any optimization, measure the results in your own environment.

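A quick way to measure this for one of the question's 1.5 GB files, as a sketch (file name and hosts from the question; the /tmp targets are placeholders):

# Sketch: time one file with and without compression, keep whichever wins.
# -C enables gzip compression in scp.
f=/data/pe_t1_snapshot/20140317/t1_weekly_1680_0_200003_5.data
time scp    "david@machineB:$f" /tmp/plain.data
time scp -C "david@machineB:$f" /tmp/compressed.data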

Also, I think FTP is another option for you; in my transfer-speed tests with large files (>10M), FTP worked faster than scp and even rsync (it depends on file format and compression rate).
