Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/34176788/

Time: 2020-09-18 14:01:00  Source: igfitidea

how to pipe contents of large tar.gz file to STDOUT?

bash

Asked by 719016

I have a large.tar.gz file containing about 1 million files, of which about 1/4 are html files, and I want to parse a few lines of each of the html files within.

I want to avoid extracting the contents of large.tar.gz into a folder and then parsing the html files. Instead, I would like to know how I can pipe the contents of the html files in large.tar.gz straight to STDOUT, so that I can grep/parse out the information I want from them.

I presume there must be some magic like:

tar -special_flags large.tar.gz | grep_only_files_with_extension html | xargs -n1 head -n 99999 | ./parse_contents.pl -

Any ideas?

Answer by Cyrus

Use this with GNU tar to extract a tgz to stdout:

tar -xOzf large.tar.gz --wildcards '*.html' | grep ...


-O, --to-stdout: extract files to standard output
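With -O, all matching members are written back to back into one stream. A single head after the pipe therefore only sees the start of the first file, and the file boundaries are lost. The question asks for a few lines from *each* html file, and GNU tar's --to-command option can help with that: it runs a shell command once per extracted member, feeds the member's contents on stdin, and exports metadata such as TAR_FILENAME. The following is a minimal sketch that assumes a GNU tar recent enough to support --to-command. The function name html_heads, the LINES_PER_FILE variable, and the default of 5 lines are illustrative choices, not part of tar.

```shell
# Sketch, assuming GNU tar with --to-command support.
# html_heads and LINES_PER_FILE are hypothetical names for this example.
# Prints the first N lines (default 5) of every *.html member of a .tar.gz,
# each preceded by a "==> path <==" header, without extracting to disk.
html_heads() {
    local archive=$1 lines=${2:-5}
    # tar runs the --to-command string via /bin/sh -c once per matching
    # member, with the member's contents on stdin and its path in
    # TAR_FILENAME. The line count travels in the environment, so it is
    # never spliced into the quoted command string.
    # 'cat >/dev/null' drains whatever head leaves unread, so tar is not
    # left writing into a pipe that nobody reads.
    LINES_PER_FILE=$lines tar -xzf "$archive" --wildcards '*.html' \
        --to-command='printf "==> %s <==\n" "$TAR_FILENAME"; head -n "$LINES_PER_FILE"; cat >/dev/null'
}
```

Usage, with the question's hypothetical parser: html_heads large.tar.gz 20 | ./parse_contents.pl -
The archive is still decompressed only once, in a single pass. The cost is one short-lived shell process per html member, which can be noticeable for roughly 250,000 files.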