bash: piping find results to zcat and then to head

Question

Asked by furedde

I'm trying to search for a certain string in a lot of gzipped CSV files. The string is located in the first row, so my thought was to get the first row of each file by combining find, zcat and head. But I can't get them to work together.

$ find . -name "*.gz" -print | xargs zcat -f | head -1
20051114083300,1070074.00,0.00000000
xargs: zcat: terminated by signal 13

example file:
$ zcat 113.gz | head
20050629171845,1069335.50,-1.00000000
20050629171930,1069315.00,-1.00000000
20050629172015,1069382.50,-1.00000000
 ... and 2 million rows like these ...

Though I solved the problem by writing a bash script, iterating over the files and writing to a temp file, it would be great to know what I did wrong, how to do it, and if there might be other ways to go about it.

Answer 1

Accepted answer by Paused until further notice.

You should find that this will work:

find . -name "*.gz" | while read -r file; do zcat -f "$file" | head -n 1; done
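
The loop above prints the first line of every file. Since the stated goal was to search those first lines for a string, the same loop can be extended with grep. This is only a sketch: files_with_first_line is a hypothetical helper name, and it assumes file names contain no newlines.

```shell
# files_with_first_line PATTERN [DIR]   (hypothetical helper, not from the answers)
# Prints the names of .gz files under DIR whose first line contains PATTERN.
files_with_first_line() {
    find "${2:-.}" -type f -name '*.gz' | while IFS= read -r file; do
        # Each file gets its own zcat | head pair, so head exiting early
        # only ends that one zcat. grep -q just sets the exit status and
        # -F treats PATTERN as a fixed string rather than a regex.
        if zcat -f "$file" | head -n 1 | grep -qF -- "$1"; then
            printf '%s\n' "$file"
        fi
    done
}
```

For example, files_with_first_line 20050629 . would list the files whose first row contains that timestamp prefix.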

Answer 2

Answered by msw

It worked as you asked it to.

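head did its job, printed one line, and exited. zcat, still running under the auspices of xargs, then tried to write to a closed pipe and received a fatal SIGPIPE for its efforts. Its child having died, xargs reported the whyfor. Because xargs hands many files to a single zcat and the whole stream feeds a single head, you get one line in total, not one line per file.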
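
The mechanism can be reproduced without any gzip files. In this rough sketch, yes stands in for zcat as a writer that keeps going after head has quit; writer_status is a hypothetical helper name, and the 141 figure assumes a shell that reports death by signal N as exit status 128 + N (bash and dash both do).

```shell
# writer_status (hypothetical helper) prints the exit status of a writer
# whose reader stops after one line.
writer_status() {
    # fd 3 carries the writer's status out of the pipeline;
    # head's own output is discarded.
    { { yes; echo "$?" >&3; } | head -n 1 >/dev/null; } 3>&1
}

status=$(writer_status)   # typically 141, i.e. 128 + 13
kill -l "$status"         # maps the status back to a signal name (PIPE)
```

Signal 13 in the xargs message is the same SIGPIPE, reported by number instead of by name.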

To get the desired behaviour, you'd need a find -exec ... construction or a custom zhead to give to xargs.
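
The answer does not spell out the find -exec construction, so here is one possible shape of it, offered as a sketch only. The function name first_lines is made up for illustration; the idea is to let find start a small sh that runs a separate zcat | head pair for each file.

```shell
# first_lines [DIR]   (hypothetical helper name)
# Prints the first line of every .gz file under DIR (default: current directory).
first_lines() {
    # "{} +" passes the found files to sh in batches; inside, each file
    # gets its own pipeline, so head closing the pipe affects only the
    # zcat for that file and the loop carries on with the next one.
    find "${1:-.}" -type f -name '*.gz' -exec sh -c '
        for f in "$@"; do
            zcat -f "$f" | head -n 1
        done
    ' sh {} +
}
```

Because find passes the names as arguments rather than as text on a pipe, names with spaces or newlines need no special handling.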

added junk code I found behind the fridge:

#!/usr/bin/env python3

"""zhead - poor man's zcat file... | head -n
   no argument error checking, prefers to continue in the face of
   IO errors, with diagnostic to stderr

   sample usage: find ... | xargs zhead.py -1"""

import gzip
import sys

# An optional leading "-N" argument sets the number of lines (default 10).
if sys.argv[1].startswith('-'):
    nlines = int(sys.argv[1][1:])
    start = 2
else:
    nlines = 10
    start = 1

for zfile in sys.argv[start:]:
    try:
        # Binary mode: lines are copied byte for byte, with no decoding.
        with gzip.open(zfile, 'rb') as zin:
            for i in range(nlines):
                line = zin.readline()
                if not line:
                    break
                sys.stdout.buffer.write(line)
    except Exception as err:
        # Report the failure and carry on with the next file.
        print(zfile, err, file=sys.stderr)

It processed 10k files in /usr/share/man in about a minute.

Answer 3

Answered by Ole Tange

If you have GNU Parallel (http://www.gnu.org/software/parallel/) installed:

find . -name '*.gz' | parallel 'zcat {} | head -n1'
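
Where GNU Parallel is not available, a similar one-pipeline-per-file effect can be approximated with xargs. This is a sketch, not a drop-in replacement: first_lines_xargs is a hypothetical name, -print0, -0 and -P are GNU/BSD extensions rather than POSIX, and the job count of 4 is an arbitrary choice.

```shell
# first_lines_xargs [DIR]   (hypothetical helper name)
first_lines_xargs() {
    # -print0 / -0 pass file names NUL-delimited, -n 1 starts one sh per
    # file, and -P 4 lets up to four of them run at the same time.
    find "${1:-.}" -type f -name '*.gz' -print0 |
        xargs -0 -n 1 -P 4 sh -c 'zcat -f "$1" | head -n 1' sh
}
```

With several jobs running at once, the order of the output lines is not guaranteed to match the order in which find listed the files.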

Watch the intro video to GNU Parallel at http://www.youtube.com/watch?v=OpaiGYxkSuQ

Answer 4

Answered by ghostdog74

zcat -r * 2>/dev/null | awk -vRS= -vFS="\n" '{print $1}'
