find results piped to zcat and then to head
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/3340349/
Asked by furedde
I'm trying to search for a certain string in a lot of gzipped CSV files. The string is located in the first row, so my thought was to get the first row of each file by combining find, zcat and head. But I can't get them to work together.
$ find . -name "*.gz" -print | xargs zcat -f | head -1
20051114083300,1070074.00,0.00000000
xargs: zcat: terminated by signal 13
Example file:
$ zcat 113.gz | head
20050629171845,1069335.50,-1.00000000
20050629171930,1069315.00,-1.00000000
20050629172015,1069382.50,-1.00000000
.. and 2 million rows like these ...
Though I solved the problem by writing a bash script, iterating over the files and writing to a temp file, it would be great to know what I did wrong, how to do it, and if there might be other ways to go about it.
Accepted answer by Paused until further notice.
You should find that this will work:
find . -name "*.gz" | while read -r file; do zcat -f "$file" | head -n 1; done
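To finish the original task (checking whether the first row contains a given string), the same one-file-at-a-time idea can be sketched in Python. This is only an illustration and not part of the original answer: the function name `files_with_first_line_match` and its parameters are made up here, and it assumes every `*.gz` file is a valid gzip text file.

```python
import gzip
from pathlib import Path


def files_with_first_line_match(root, needle):
    """Return (path, first_line) pairs for *.gz files under root whose
    first line contains needle. Hypothetical helper, for illustration."""
    matches = []
    for path in sorted(Path(root).rglob("*.gz")):
        # Each file is opened on its own, so only the start of the file is
        # decompressed and nothing is left writing to a closed pipe.
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as zin:
            first_line = zin.readline().rstrip("\n")
        if needle in first_line:
            matches.append((str(path), first_line))
    return matches
```

Calling `files_with_first_line_match(".", "20051114")` would return the files under the current directory whose first row contains that string, together with the row itself.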
Answer by msw
It worked as you asked it to.
head did its job, printed one line, and exited. zcat, then running under the auspices of xargs, tried to write to a closed pipe and received a fatal SIGPIPE for its efforts. Having its child die, xargs reported the whyfor.
To get the desired behaviour, you'd need a find -exec ... construction or a custom zhead to give to xargs.
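The SIGPIPE behaviour described above can be reproduced in a few lines. The sketch below is an illustration under assumptions, not code from the answer: it uses `yes` as a stand-in for zcat (any program that keeps writing would do), the function name `run_until_head_exits` is made up, and it assumes a POSIX system with `yes` and `head` on the PATH.

```python
import signal
import subprocess


def run_until_head_exits():
    """Pipe an endless producer into `head -n 1`; return head's output
    and the producer's return code."""
    # `yes` stands in for zcat here: it keeps writing until it is stopped.
    producer = subprocess.Popen(["yes"], stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        ["head", "-n", "1"], stdin=producer.stdout, stdout=subprocess.PIPE
    )
    # Drop our copy of the pipe's read end so head is the only reader.
    producer.stdout.close()
    first_line, _ = consumer.communicate()
    producer.wait()
    return first_line, producer.returncode


if __name__ == "__main__":
    line, code = run_until_head_exits()
    # A negative return code means "killed by that signal number".
    print(line, code, code == -signal.SIGPIPE)
```

On Linux SIGPIPE is signal 13, which matches the "terminated by signal 13" message that xargs printed in the question.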
added junk code I found behind the fridge:
#!/usr/bin/env python3
"""zhead - poor man's zcat file... | head -n

no argument error checking, prefers to continue in the face of
IO errors, with diagnostic to stderr

sample usage: find ... | xargs zhead.py -1"""
import gzip
import sys

# An optional leading "-N" argument sets the line count (default 10).
if len(sys.argv) > 1 and sys.argv[1].startswith('-'):
    nlines = int(sys.argv[1][1:])
    start = 2
else:
    nlines = 10
    start = 1

for zfile in sys.argv[start:]:
    try:
        # The with block closes the file even if reading fails.
        with gzip.open(zfile) as zin:
            for i in range(nlines):
                line = zin.readline()
                if not line:
                    break
                # Lines are bytes in binary mode; write them unmodified.
                sys.stdout.buffer.write(line)
    except Exception as err:
        # Report the problem and carry on with the next file.
        print(zfile, err, file=sys.stderr)
It processed 10k files in /usr/share/man in about a minute.
Answer by Ole Tange
If you have GNU Parallel (http://www.gnu.org/software/parallel/) installed:
find . -name '*.gz' | parallel 'zcat {} | head -n1'
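For comparison, a rough Python analogue of the parallel approach is sketched below, using a thread pool from the standard library. It is not from the answer: `first_line`, `first_lines_parallel` and the `workers` parameter are illustrative names, and whether threads actually speed this up depends on the disk and file sizes, which has not been measured here.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def first_line(path):
    """Return the first line of one gzip file as text, without the newline."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as zin:
        return zin.readline().rstrip("\n")


def first_lines_parallel(root, workers=4):
    """Map first_line over every *.gz file under root using a thread pool."""
    paths = sorted(Path(root).rglob("*.gz"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as paths.
        return list(zip(map(str, paths), pool.map(first_line, paths)))
```

Unlike the shell pipeline, a corrupt file raises an exception here instead of being skipped, so a real script would want a try/except around the read.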
Watch the intro video to GNU Parallel at http://www.youtube.com/watch?v=OpaiGYxkSuQ
Answer by ghostdog74
zcat -r * 2>/dev/null | awk -vRS= -vFS="\n" '{print $1}'

