bash: Grep search strings with line breaks
Original question: http://stackoverflow.com/questions/1858312/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Vijay Dev
How can I use grep to output occurrences of the string 'export to excel' in the input files given below? Specifically, how do I handle line breaks that fall in the middle of the search string? Is there a switch in grep that can do this, or perhaps some other command?
Input files:
File a.txt:
blah blah ... export to
excel ...
blah blah..
File b.txt:
blah blah ... export to excel ...
blah blah..
Answered by Laurence Gonsalves
Do you just want to find files that contain the pattern, ignoring linebreaks, or do you want to actually see the matching lines?
If the former, you can use tr to convert newlines to spaces:
tr '\n' ' ' | grep 'export to excel'
If the latter you can do the same thing, but you may want to use the -o flag to only print the actual match. You'll then want to adjust your regex to include any extra context you want.
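The tr approach above can also be sketched in Python 3. This is only an illustration, not part of the original answer: the function names contains_phrase, matches_with_context and files_with_phrase, and the width parameter, are made up here. Like the tr pipeline, it only finds the phrase when exactly one line break (and no extra whitespace) separates the words.

```python
import re
import sys

PHRASE = re.compile(r"export to excel")

def contains_phrase(text):
    """Return True if the phrase occurs once newlines are treated as spaces."""
    flattened = text.replace("\n", " ")  # same effect as: tr '\n' ' '
    return PHRASE.search(flattened) is not None

def matches_with_context(text, width=20):
    """Return each match plus up to `width` characters of context on either
    side, similar in spirit to grep -o with a widened regex."""
    flattened = text.replace("\n", " ")
    pattern = re.compile(r".{0,%d}export to excel.{0,%d}" % (width, width))
    return pattern.findall(flattened)

def files_with_phrase(paths):
    """Return the paths whose contents contain the phrase."""
    found = []
    for path in paths:
        with open(path, "rt") as handle:
            if contains_phrase(handle.read()):
                found.append(path)
    return found

if __name__ == "__main__":
    # Hypothetical usage: python find_phrase.py a.txt b.txt
    for name in files_with_phrase(sys.argv[1:]):
        print(name)
```

Because the newlines are gone before matching, line numbers are lost; that is the trade-off of this approach.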
Answered by steveha
I don't know how to do this in grep. I checked the man page for egrep(1) and it can't match with a newline in the middle either.
I like the solution @Laurence Gonsalves suggested, of using tr(1) to wipe out the newlines. But as he noted, it will be a pain to print the matching lines if you do it that way.
If you want to match despite a newline and then print the matching line(s), I can't think of a way to do it with grep, but it would not be too hard in any of Python, AWK, Perl, or Ruby.
Here's a Python script that solves the problem. I decided that, for lines that only match when joined to the previous line, I would print a --> arrow before the second line of the match. Lines that match outright are always printed without the arrow.
This is written assuming that /usr/bin/python is Python 2.x. You can trivially change the script to work under Python 3.x if desired.
#!/usr/bin/python

import re
import sys

s_pat = "export\s+to\s+excel"
pat = re.compile(s_pat)

def print_ete(fname):
    try:
        f = open(fname, "rt")
    except IOError:
        sys.stderr.write('print_ete: unable to open file "%s"\n' % fname)
        sys.exit(2)

    prev_line = ""
    i_last = -10
    for i, line in enumerate(f):
        # is ete within current line?
        if pat.search(line):
            print "%s:%d: %s" % (fname, i+1, line.strip())
            i_last = i
        else:
            # construct extended line that included previous
            # note newline is stripped
            s = prev_line.strip("\n") + " " + line
            # is ete within extended line?
            if pat.search(s):
                # matched ete in extended so want both lines printed
                # did we print prev line?
                if not i_last == (i - 1):
                    # no so print it now
                    print "%s:%d: %s" % (fname, i, prev_line.strip())
                # print cur line with special marker
                print "--> %s:%d: %s" % (fname, i+1, line.strip())
                i_last = i
        # make sure we don't match ete twice
        prev_line = re.sub(pat, "", line)

try:
    if sys.argv[1] in ("-h", "--help"):
        raise IndexError # print help
except IndexError:
    sys.stderr.write("print_ete <filename>\n")
    sys.stderr.write('grep-like tool to print lines matching "%s"\n' %
            "export to excel")
    sys.exit(1)

print_ete(sys.argv[1])
EDIT: added comments.
I went to some trouble to make it print the correct line number on each line, using a format similar to what you would get with grep -Hn.
It could be much shorter and simpler if you don't need line numbers, and you don't mind reading in the whole file at once into memory:
#!/usr/bin/python

import re
import sys

# This pattern not compiled with re.MULTILINE on purpose.
# We *want* the \s pattern to match a newline here so it can
# match across multiple lines.
# Note the match group that gathers text around ete pattern uses a character
# class that matches anything but "\n", to grab text around ete.
s_pat = "([^\n]*export\s+to\s+excel[^\n]*)"
pat = re.compile(s_pat)

def print_ete(fname):
    try:
        text = open(fname, "rt").read()
    except IOError:
        sys.stderr.write('print_ete: unable to open file "%s"\n' % fname)
        sys.exit(2)

    for s_match in re.findall(pat, text):
        print s_match

try:
    if sys.argv[1] in ("-h", "--help"):
        raise IndexError # print help
except IndexError:
    sys.stderr.write("print_ete <filename>\n")
    sys.stderr.write('grep-like tool to print lines matching "%s"\n' %
            "export to excel")
    sys.exit(1)

print_ete(sys.argv[1])
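If you are on Python 3 and still want line numbers with the read-everything approach, one possible sketch is to count the newlines that precede each match. This is not from the original answer; the function name find_phrase and the output format are assumptions made for illustration.

```python
import re
import sys

# \s also matches "\n", so the phrase may span a line break.
PATTERN = re.compile(r"export\s+to\s+excel")

def find_phrase(text):
    """Yield (first_line, last_line) as 1-based line numbers for each match."""
    for match in PATTERN.finditer(text):
        # The line number is one more than the count of newlines before the offset.
        first = text.count("\n", 0, match.start()) + 1
        last = text.count("\n", 0, match.end()) + 1
        yield first, last

if __name__ == "__main__":
    # Hypothetical usage: python find_phrase.py a.txt b.txt
    for name in sys.argv[1:]:
        with open(name, "rt") as handle:
            for first, last in find_phrase(handle.read()):
                print("%s:%d-%d" % (name, first, last))
```

Counting newlines from the start of the text for every match is simple but rescans the text each time; for very large files with many matches a running counter would be cheaper.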
Answered by christian.buggle
grep -A1 "export to" filename | grep -B1 "excel"
Answered by ghostdog74
Use gawk. Set the record separator to "excel", then check each record for "export to".
gawk -vRS="excel" '/export.*to/{print "found export to excel at record: "NR}' file
or
gawk '/export.*to.*excel/{print}
/export to/&&!/excel/{
  s=$0
  getline line
  if (line~/excel/){
    printf "%s\n%s\n",s,line
  }
}' file
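The record-separator idea behind the first gawk command can be sketched in Python 3 as well. This is an illustrative variant rather than a translation: it is deliberately stricter than /export.*to/, requiring "export to" to sit at the very end of a record (ignoring trailing whitespace), and it skips the final piece because no "excel" follows it. The function name records_with_export_to is made up here.

```python
import re

# "export to" at the end of a record, optionally followed by whitespace
# (including a line break) before the "excel" separator.
TAIL = re.compile(r"export\s+to\s*$")

def records_with_export_to(text):
    """Return 1-based numbers of "excel"-terminated records ending in "export to"."""
    records = text.split("excel")
    hits = []
    # The last piece is whatever follows the final "excel", so it is not
    # terminated by the separator and is left out.
    for number, record in enumerate(records[:-1], start=1):
        if TAIL.search(record):
            hits.append(number)
    return hits
```

As with the gawk version, the result is a record number, not a line number, so it tells you that a match exists rather than where to find it in the file.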
Answered by Paused until further notice.
I have tested this a little and it seems to work:
sed -n '$b; /export to excel/{p; b}; N; /export to\nexcel/{p; b}; D' filename
You can allow for some extra white space at the end and beginning of the lines like this:
sed -n '$b; /export to excel/{p; b}; N; /export to\s*\n\s*excel/{p; b}; D' filename
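For readers who find the sed commands hard to follow, the two-line window idea can be sketched in Python 3. This loosely mirrors the approach and is not an exact emulation of sed's behaviour; the function name matching_chunks is hypothetical.

```python
import re

SINGLE = re.compile(r"export to excel")
# Phrase split across a line break, allowing whitespace around the break.
SPLIT = re.compile(r"export to\s*\n\s*excel")

def matching_chunks(lines):
    """Return matching lines, or line pairs joined by a newline, in order.

    `lines` is a list of lines without trailing newlines, for example the
    result of text.splitlines().
    """
    chunks = []
    i = 0
    while i < len(lines):
        if SINGLE.search(lines[i]):
            # The phrase sits entirely on one line.
            chunks.append(lines[i])
            i += 1
        elif i + 1 < len(lines) and SPLIT.search(lines[i] + "\n" + lines[i + 1]):
            # The phrase straddles this line and the next; consume both.
            chunks.append(lines[i] + "\n" + lines[i + 1])
            i += 2
        else:
            i += 1
    return chunks
```

For example, matching_chunks(open("a.txt").read().splitlines()) would return the two lines of a.txt that together contain the phrase, joined by a newline.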