cpio vs. tar and cp

Disclaimer: this page is based on a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA terms and attribute it to the original authors (not me) on Stack Overflow.
Original source: http://stackoverflow.com/questions/2966409/
Asked by Tim
I just learned that cpio has three modes: copy-out, copy-in and pass-through.
I was wondering what are the advantages and disadvantages of cpio under copy-out and copy-in modes over tar. When is it better to use cpio and when to use tar?
Similar question for cpio under pass-through mode versus cp.
Thanks and regards!
Answered by Adam Katz
I see no reason to use cpio other than for ripping open RPM files, via disrpm or rpm2cpio, though there may be corner cases in which cpio is preferable to tar.
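For example, a common way to unpack an RPM without installing it (package.rpm is a placeholder name):

    # List the archive contents, then extract into the current directory,
    # creating subdirectories (-d) and preserving mtimes (-m):
    rpm2cpio package.rpm | cpio -t
    rpm2cpio package.rpm | cpio -idmv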
History and popularity
Both tar and cpio are competing archive formats that were introduced in Version 7 Unix in 1979 and then included in POSIX.1-1988, though only tar remained in the next standard, POSIX.1-2001.
Cpio's file format has changed several times and has not remained fully compatible between versions. For example, there is now an ASCII-encoded representation of binary file information data.
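With GNU cpio, for instance, you can pick a format explicitly rather than taking the local default; a minimal sketch using the portable SVR4 "newc" ASCII format (the format used inside RPM packages and initramfs images):

    # Create an archive in the "newc" ASCII format rather than
    # whatever the local cpio defaults to:
    find . -depth -print | cpio -o -H newc > archive.cpio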
Tar is more universally known, has become more versatile over the years, and is more likely to be supported on a given system. Cpio is still used in a few areas, such as the Red Hat package format (RPM), though RPM v5 (which is admittedly obscure) uses xar instead of cpio.
Both live on most Unix-like systems, though tar is more common. Here are Debian's install stats:
    #rank  name  inst    vote    old    recent  no-files  (maintainer)
    13     tar   189206  172133  3707   13298   68        (Bdale Garbee)
    61     cpio  189028  71664   96346  20920   98        (Anibal Monsalve Salazar)
Modes
Copy-out: This is for archive creation, akin to tar -pc
Copy-in: This is for archive extraction, akin to tar -px
Pass-through: This is basically both of the above, akin to tar -pc … | tar -px but in a single command (and therefore microscopically faster). It's similar to cp -pdr, though both cpio and (especially) tar have more customizability. Also consider rsync -a, which people often forget since it's more typically used across a network connection.
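As a rough illustration of the three modes side by side (src/ and dest/ are placeholder paths):

    # Copy-out: create an archive from a file list on stdin (like tar -pc)
    find src/ -depth -print | cpio -o > archive.cpio

    # Copy-in: extract an archive, creating directories and keeping mtimes (like tar -px)
    cpio -idm < archive.cpio

    # Pass-through: replicate a tree directly into another directory (like cp -pdr)
    find src/ -depth -print | cpio -pdm dest/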
I have not compared their performance, but I expect they'll be quite similar in CPU, memory, and archive size (after compression).
Answered by Ernest Montrose
tar(1) is just as good as cpio(1), if not better. One can argue that it is, in fact, better than cpio because it is ubiquitous and vetted. There's got to be a reason why we have tarballs everywhere.
Answered by Bubli Sagar
Why is cpio better than tar? A number of reasons.
- cpio preserves hard links, which is important if you're using it for backups (see the sketch just after this list).
- cpio doesn't have that annoying filename length limitation. Sure, gnutar has a "hack" that allows you to use longer filenames (it creates a temporary file in which it stores the real name), but it's inherently not portable to non-gnu tar's.
- By default, cpio preserves timestamps.
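A quick demonstration of the hard-link point, using pass-through mode on a throwaway directory (demo/ and demo-copy/ are hypothetical paths):

    # Make a file plus a hard link to it, replicate the tree with
    # cpio -p, and confirm both names still share one inode:
    mkdir demo && cd demo
    echo data > a
    ln a b
    find . -depth -print | cpio -pdm ../demo-copy
    ls -li ../demo-copy    # a and b should show the same inode number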
When scripting, cpio gives much better control over which files are and are not copied, since you must explicitly list the files you want copied. For example, which of the following is easier to read and understand?
    find . -type f -name '*.sh' -print | cpio -o | gzip > sh.cpio.gz

or on Solaris:

    find . -type f -name '*.sh' -print > /tmp/includeme
    tar -cf - . -I /tmp/includeme | gzip > sh.tar.gz

or with gnutar:

    find . -type f -name '*.sh' -print > /tmp/includeme
    tar -cf - . --files-from=/tmp/includeme | gzip > sh.tar.gz

A couple of specific notes here: for large lists of files, you can't put find in backquotes (command substitution); the command-line length will be overrun, so you must use an intermediate file. Separate find and tar commands are inherently slower, since the actions are done serially.
Consider this more complex case where you want a tree completely packaged up, but some files in one tar, and the remaining files in another.
    find . -depth -print > /tmp/files
    egrep '\.sh$' /tmp/files | cpio -o | gzip > with.cpio.gz
    egrep -v '\.sh$' /tmp/files | cpio -o | gzip > without.cpio.gz

or under Solaris, materializing each file list first:

    find . -depth -print > /tmp/files
    egrep '\.sh$' /tmp/files > /tmp/with
    egrep -v '\.sh$' /tmp/files > /tmp/without
    tar -cf - . -I /tmp/with | gzip > with.tar.gz
    tar -cf - . -I /tmp/without | gzip > without.tar.gz

or with gnutar:

    find . -depth -print > /tmp/files
    egrep '\.sh$' /tmp/files > /tmp/with
    egrep -v '\.sh$' /tmp/files > /tmp/without
    tar -cf - . -T /tmp/with | gzip > with.tar.gz
    tar -cf - . -X /tmp/without | gzip > without.tar.gz

Again, some notes: separate find and tar commands are inherently slower, and creating more intermediate files creates more clutter. gnutar feels a little cleaner, but the command-line options are inherently incompatible!
If you need to copy a lot of files from one machine to another in a hurry across a busy network, you can run multiple cpio's in parallel. For example:
    find . -depth -print > /tmp/files
    split /tmp/files /tmp/files.
    for F in /tmp/files.?? ; do
        cat $F | cpio -o | ssh destination "cd /target && cpio -idum" &
    done

Note that it would help if you could split the input into evenly sized pieces. I created a utility called 'npipe' to do this. npipe would read lines from stdin, create N output pipes, and feed lines to them as each line was consumed. That way, if the first entry was a large file that took 10 minutes to transfer and the rest were small files that took 2 minutes each, you wouldn't get stalled waiting for the large file plus another dozen small files queued up behind it. You end up splitting by demand, not strictly by the number of lines or bytes in the list of files. Similar functionality could be accomplished with GNU xargs' parallel forking capability, except that xargs puts arguments on the command line instead of streaming them to stdin.
    find . -depth -print > /tmp/files
    npipe -4 /tmp/files 'cpio -o | ssh destination "cd /target && cpio -idum"'

How is this faster? Why not use NFS? Why not use rsync? NFS is inherently very slow, but more importantly, the use of any single tool is inherently single-threaded. rsync reads in the source tree and writes to the destination tree one file at a time. If you have a multi-processor machine (at the time I was using 16 CPUs per machine), parallel writing becomes very important. I sped the copy of an 8GB tree down to 30 minutes; that's 4.6MB/sec! Sure, that sounds slow, since a 100Mbit network can easily do 5-10MB/sec, but it's the inode creation time that makes it slow; there were easily 500,000 files in this tree. So if inode creation is the bottleneck, I needed to parallelize that operation. By comparison, copying the files in a single-threaded manner would have taken 4 hours. That's 8x faster!

A secondary reason this was faster is that parallel TCP pipes are less vulnerable to a lost packet here and there. If one pipe gets stalled because of a lost packet, the others will generally not be affected. I'm not really sure how much of a difference this made, but for finely multi-threaded kernels, it can again be more efficient, since the workload can be spread across all those idle CPUs.
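For reference, a rough sketch of the same fan-out using GNU xargs -P in place of the author's npipe utility (destination and /target are placeholders, and like the original this assumes file names without embedded whitespace):

    # Fan the file list out to 4 parallel cpio-over-ssh streams,
    # handing each shell invocation a batch of 1000 names; printf
    # re-streams the batch to cpio's stdin, one name per line.
    find . -depth -print | xargs -P 4 -n 1000 sh -c '
        printf "%s\n" "$@" | cpio -o |
        ssh destination "cd /target && cpio -idum"
    ' sh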
In my experience, cpio does an overall better job than tar, and its arguments are more portable (they don't change between versions of cpio!). It may not be found on some systems (it's not installed by default on RedHat), but then again Solaris doesn't come with gzip by default either.

