Linux: Using sed on a compressed file

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/6986946/

Date: 2020-08-05 05:34:48  Source: igfitidea

Using sed on a compressed file

linux, shell, sed

Asked by learner

I have written a file-processing program, and it now needs to read from a gzip-compressed file (.gz; the uncompressed file may be as large as 2 TB).

Is there a sed equivalent for compressed files, the way zcat is for cat? If not, what would be the most efficient way to do the following?

    ONE=$(zcat filename.gz | sed -n "$counts")

$counts: counter for the line to read (line by line)

The above method works, but it is quite slow for large files, because I need to read each line and match on certain fields.

Thanks


EDIT


Though not directly helpful, here is a set of zcommands:

http://www.cyberciti.biz/tips/decompress-and-expand-text-files.html


Answered by C. Ramseyer

Well, you can have either more speed (i.e. use uncompressed files) or more free space (i.e. use compressed files and the pipe you showed)... sorry. Using compressed files will always have an overhead.
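As a rough sketch of the pipe approach (not part of the original answer): if the slowness comes from restarting zcat for every line, which is only an assumption about how the asker's loop works, it may help to stop decompressing as soon as the wanted line has been printed, or to do all of the field matching in one pass over the stream. The function names nth_line and match_field and their arguments are made up for illustration; gzip -dc is used here as a portable spelling of zcat.

```shell
#!/bin/sh

# Hypothetical helper: print line N of a gzip file.
# $1 = line number, $2 = path to the .gz file.
# sed prints the requested line and then quits, so the rest of the
# file is not decompressed further than the pipe buffer allows.
nth_line() {
    gzip -dc "$2" | sed -n "${1}{p;q;}"
}

# Hypothetical helper: decompress once and print every line whose
# whitespace-separated field number $1 equals the value $2.
# $3 = path to the .gz file.
match_field() {
    gzip -dc "$3" | awk -v f="$1" -v want="$2" '$f == want'
}
```

Usage might look like `ONE=$(nth_line "$n" filename.gz)` for a single line, or `match_field 2 somevalue filename.gz` to filter the whole file in one pass. Either way the data still has to be decompressed up to the point of interest, so the overhead mentioned above does not go away.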

Answered by Chris Stratton

If you understand the internal structure of the compression format, you could possibly write a pattern matcher that operates on the compressed data without fully decompressing it, by determining from the compressed data alone whether the pattern would be present in a given piece of decompressed data.

If the pattern has any complexity at all, this sounds like quite a complicated project, as you'd have to handle cases where the pattern is satisfied only by combining the output of two (or more) separately decompressed pieces.