Gzip with all cores

Note: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/4341442/



Tags: linux, bash, gzip

Asked by User1

I have a set of servers filled each with a bunch of files that can be gzipped. The servers all have different numbers of cores. How can I write a bash script to launch a gzip for each core and make sure the gzips are not zipping the same file?


Accepted answer by Demosthenex

If you are on Linux, you can use GNU's xargs to launch as many processes as you have cores.


CORES=$(grep -c '^processor' /proc/cpuinfo)
find /source -type f -print0 | xargs -0 -n 1 -P $CORES gzip -9
  • find -print0 / xargs -0 protects you from whitespace in filenames
  • xargs -n 1 means one gzip process per file
  • xargs -P specifies the number of jobs
  • gzip -9 means maximum compression
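
A minimal variant of the same pipeline, assuming GNU coreutils is available: nproc reports the number of processing units directly, so the grep over /proc/cpuinfo can be skipped. The /source path is the placeholder from the answer above.

CORES=$(nproc)   # number of available cores (GNU coreutils)
find /source -type f -print0 | xargs -0 -n 1 -P "$CORES" gzip -9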

Answered by Gangadhar

You might want to consider checking out GNU parallel. I also found this video on YouTube which seems to do what you are looking for.

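As a sketch, GNU parallel can take the place of xargs in the accepted answer's pipeline; by default it starts one job per core, and --null (-0) pairs with find -print0. The /source path is again a placeholder.

# one gzip job per core by default; -0 reads NUL-delimited names from find
find /source -type f -print0 | parallel -0 gzip -9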

Answered by David Yaw

There is a multithreaded implementation of gzip, pigz. Since it compresses a single file across multiple threads, it should be able to read from disk more efficiently than compressing multiple files at once.

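A minimal sketch of using pigz (the file and directory names below are placeholders): pigz spreads the compression of a single file across all available cores by default, and -p caps the thread count if needed.

# compress one file using all cores (-p 4 would limit it to 4 threads)
pigz -9 big-archive.tar

# or compress every file under /source, one file at a time, each using all cores
find /source -type f -exec pigz -9 {} \;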