bash: How to pipe the contents of a large tar.gz file to STDOUT?

Question

Asked by 719016

I have a large.tar.gz file containing about 1 million files, about a quarter of which are HTML files, and I want to parse a few lines of each of those HTML files.

I want to avoid extracting the contents of large.tar.gz into a folder and then parsing the HTML files. Instead, how can I pipe the contents of the HTML files in large.tar.gz straight to STDOUT so that I can grep/parse the information I want from them?

I presume there must be some magic like:

tar -special_flags large.tar.gz | grep_only_files_with_extension html | xargs -n1 head -n 99999 | ./parse_contents.pl -

Any ideas?

Answer 1

Answered by Cyrus

Use this with GNU tar to extract a tgz to stdout:

tar -xOzf large.tar.gz --wildcards '*.html' | grep ...

-O, --to-stdout: extract files to standard output
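
As a rough illustration of the command above, here is a minimal sketch that builds a tiny throwaway archive and streams only its HTML members to stdout. The file names (`site/a.html` and so on) and the `<title>` pattern are made up for the demo, and it assumes GNU tar, since `--wildcards` and `--to-command` are GNU options. Note that `-O` concatenates every matching file into one stream, so file boundaries are lost. If you need "a few lines of each file", GNU tar's `--to-command` may be a better fit: it runs a command once per extracted member, feeds the member's contents on stdin, and sets `TAR_FILENAME` in that command's environment.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a tiny sample archive in a temp directory (hypothetical file names).
workdir=$(mktemp -d)
mkdir -p "$workdir/site"
printf '<title>A</title>\n<p>one</p>\n' > "$workdir/site/a.html"
printf '<title>B</title>\n<p>two</p>\n' > "$workdir/site/b.html"
printf 'not html\n' > "$workdir/site/notes.txt"
tar -czf "$workdir/sample.tar.gz" -C "$workdir" site

# Stream only the .html members to stdout; nothing is extracted to disk.
# All matching files arrive as one concatenated stream.
titles=$(tar -xOzf "$workdir/sample.tar.gz" --wildcards '*.html' | grep -c '<title>')
echo "title lines: $titles"

# Per-file variant (GNU tar only): the command runs once per member,
# reads that member on stdin, and sees its name in TAR_FILENAME.
# sed -n 1p prints the first line but still consumes the rest of stdin.
per_file=$(tar -xzf "$workdir/sample.tar.gz" --wildcards '*.html' \
  --to-command='printf "%s: " "$TAR_FILENAME"; sed -n 1p')
echo "$per_file"

rm -rf "$workdir"
```

On the real archive, the same idea would be `tar -xzf large.tar.gz --wildcards '*.html' --to-command='...'`, with your parser in place of `sed`. This starts one process per HTML file, which may be slow for hundreds of thousands of members.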
