bash - How to pipe the contents of a large tar.gz file to STDOUT?
Original URL: http://stackoverflow.com/questions/34176788/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by 719016
I have a large.tar.gz file containing about 1 million files, about a quarter of which are HTML files, and I want to parse a few lines of each of the HTML files within.
I want to avoid extracting the contents of large.tar.gz into a folder and then parsing the HTML files. Instead, I would like to know how I can pipe the contents of the HTML files in large.tar.gz straight to STDOUT, so that I can grep/parse out the information I want from them.
I presume there must be some magic like:
tar -special_flags large.tar.gz | grep_only_files_with_extension html | xargs -n1 head -n 99999 | ./parse_contents.pl -
Any ideas?
Answered by Cyrus
Use this with GNU tar to extract a tgz to stdout:
tar -xOzf large.tar.gz --wildcards '*.html' | grep ...
-O, --to-stdout: extract files to standard output
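A minimal, self-contained demo of the command above, assuming GNU tar (for --wildcards) and gzip are installed. The directory and file names (site/, index.html, sub/about.html, notes.txt) are hypothetical. The script builds a tiny archive, deletes the originals, and then streams only the .html members to stdout, so you can check that nothing is written back to disk.

```shell
#!/bin/sh
# Demo: stream only the .html members of a .tar.gz to stdout, nothing on disk.
# Assumes GNU tar (for --wildcards) and gzip; all file names are hypothetical.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Build a tiny sample archive: two HTML files (one nested) and one text file.
mkdir -p site/sub
printf '<title>Home</title>\n<p>hello</p>\n' > site/index.html
printf '<title>About</title>\n' > site/sub/about.html
printf 'not html\n' > site/notes.txt
tar -czf large.tar.gz site
rm -rf site  # delete the originals so we can confirm nothing is extracted

# -x extract, -O write member contents to stdout, -z gunzip, -f archive file.
# --wildcards makes '*.html' a pattern; for member names GNU tar lets '*'
# match '/' by default, so nested members such as site/sub/about.html match.
html=$(tar -xOzf large.tar.gz --wildcards '*.html')

# Downstream filtering works as with any other stream.
printf '%s\n' "$html" | grep '<title>'
```

Note that -O concatenates all matching members into one stream, in archive order, with no separator between files. That is fine for a plain grep, but a parser that needs to know where one file ends and the next begins cannot recover that from the stream.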
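With -O every member is merged into one stream, so the per-file boundaries the question relies on ("a few lines of each of the HTML files") are lost. One alternative, not part of the original answer, is GNU tar's --to-command option: tar runs the given command through /bin/sh -c once per extracted regular file, feeds the member's contents on stdin, and exports the member name as $TAR_FILENAME. The sketch below assumes GNU tar with that option; the archive contents and the "header plus first two lines" handler are hypothetical stand-ins for the asker's ./parse_contents.pl.

```shell
#!/bin/sh
# Sketch: handle each .html member separately instead of one merged stream.
# Assumes GNU tar: --to-command runs the command via /bin/sh -c once per
# extracted regular file, with the member's contents on stdin and its name
# in $TAR_FILENAME. Archive contents and the handler are hypothetical.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Tiny sample archive with two HTML files and one text file.
mkdir -p site
printf 'line1\nline2\nline3\n' > site/a.html
printf 'first\nsecond\nthird\n' > site/b.html
printf 'ignored\n' > site/c.txt
tar -czf large.tar.gz site
rm -rf site  # with --to-command nothing should be recreated on disk

# Hypothetical per-file handler: print a header line, then the first 2 lines.
# The outer single quotes stop this shell from expanding $TAR_FILENAME; the
# shell that tar starts for each member expands it instead. sed reads all of
# its input (unlike head), so tar never writes into a closed pipe.
out=$(tar -xzf large.tar.gz \
        --to-command='printf "== %s\n" "$TAR_FILENAME"; sed -n "1,2p"' \
        --wildcards '*.html')
printf '%s\n' "$out"
```

In real use the handler would be replaced by the actual parser, for example --to-command='./parse_contents.pl -' (hypothetical, as in the question). The command is started once per file, so with roughly 250,000 HTML members the per-process startup cost may matter; a handler that exits before reading all of its input could also leave tar writing to a closed pipe, which is why sed is used here instead of head.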