bash: grep for a search string that spans a line break

Question

Asked by Vijay Dev

How to use grep to output occurrences of the string 'export to excel' in the input files given below? Specifically, how to handle the line breaks that happen in between the search strings? Is there a switch in grep that can do this or some other command probably?

Input files:

File a.txt:

blah blah ... export to
excel ...
blah blah..

File b.txt:

blah blah ... export to excel ...
blah blah..

Answer 1

Answered by Laurence Gonsalves

Do you just want to find files that contain the pattern, ignoring linebreaks, or do you want to actually see the matching lines?

If the former, you can use tr to convert newlines to spaces:

tr '\n' ' ' | grep 'export to excel'

If the latter, you can do the same thing, but you may want to use the -o flag to print only the actual match. You'll then want to adjust your regex to include any extra context you want.
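
The same idea can be sketched in Python: collapse all whitespace (including newlines) into single spaces, then search the flattened text and keep a little context around each hit, roughly what tr followed by grep -o with a widened regex would give. The function name find_across_lines and the context parameter are assumptions made for illustration, not part of the answer.

```python
import re


def find_across_lines(text, phrase="export to excel", context=15):
    """Return each occurrence of phrase with up to `context` characters of
    surrounding text, after collapsing every run of whitespace (newlines
    included) into a single space."""
    # Equivalent in spirit to: tr '\n' ' '
    flat = " ".join(text.split())
    # Equivalent in spirit to: grep -o with extra context in the regex
    pattern = re.compile(
        r".{0,%d}%s.{0,%d}" % (context, re.escape(phrase), context)
    )
    return pattern.findall(flat)
```

Because the text is flattened first, the original line numbers are lost; this only tells you that the phrase occurs and what surrounds it.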

Answer 2

Answered by steveha

I don't know how to do this in grep. I checked the man page for egrep(1) and it can't match with a newline in the middle either.

I like the solution @Laurence Gonsalves suggested, of using tr(1) to wipe out the newlines. But as he noted, it will be a pain to print the matching lines if you do it that way.

If you want to match despite a newline and then print the matching line(s), I can't think of a way to do it with grep, but it would not be too hard in any of Python, AWK, Perl, or Ruby.

Here's a Python script that solves the problem. I decided that, for lines that only match when joined to the previous line, I would print a --> arrow before the second line of the match. Lines that match outright are always printed without the arrow.

This is written assuming that /usr/bin/python is Python 2.x. You can trivially change the script to work under Python 3.x if desired.

#!/usr/bin/python

import re
import sys

s_pat = r"export\s+to\s+excel"
pat = re.compile(s_pat)

def print_ete(fname):
    try:
        f = open(fname, "rt")
    except IOError:
        sys.stderr.write('print_ete: unable to open file "%s"\n' % fname)
        sys.exit(2)

    prev_line = ""
    i_last = -10
    for i, line in enumerate(f):
        # is ete within current line?
        if pat.search(line):
            print "%s:%d: %s" % (fname, i+1, line.strip())
            i_last = i
        else:
            # construct extended line that included previous
            # note newline is stripped
            s = prev_line.strip("\n") + " " + line
            # is ete within extended line?
            if pat.search(s):
                # matched ete in extended so want both lines printed
                # did we print prev line?
                if not i_last == (i - 1):
                    # no so print it now
                    print "%s:%d: %s" % (fname, i, prev_line.strip())
                # print cur line with special marker
                print "-->  %s:%d: %s" % (fname, i+1, line.strip())
                i_last = i
        # make sure we don't match ete twice
        prev_line = re.sub(pat, "", line)

try:
    if sys.argv[1] in ("-h", "--help"):
        raise IndexError # print help
except IndexError:
    sys.stderr.write("print_ete <filename>\n")
    sys.stderr.write('grep-like tool to print lines matching "%s"\n' %
            "export to excel")
    sys.exit(1)

print_ete(sys.argv[1])

EDIT: added comments.

I went to some trouble to make it print the correct line number on each line, using a format similar to what you would get with grep -Hn.

It could be much shorter and simpler if you don't need line numbers, and you don't mind reading in the whole file at once into memory:

#!/usr/bin/python

import re
import sys

# This pattern not compiled with re.MULTILINE on purpose.
# We *want* the \s pattern to match a newline here so it can
# match across multiple lines.
# Note the match group that gathers text around ete pattern uses a character
# class that matches anything but "\n", to grab text around ete.
s_pat = r"([^\n]*export\s+to\s+excel[^\n]*)"
pat = re.compile(s_pat)

def print_ete(fname):
    try:
        text = open(fname, "rt").read()
    except IOError:
        sys.stderr.write('print_ete: unable to open file "%s"\n' % fname)
        sys.exit(2)

    for s_match in re.findall(pat, text):
        print s_match

try:
    if sys.argv[1] in ("-h", "--help"):
        raise IndexError # print help
except IndexError:
    sys.stderr.write("print_ete <filename>\n")
    sys.stderr.write('grep-like tool to print lines matching "%s"\n' %
            "export to excel")
    sys.exit(1)

print_ete(sys.argv[1])
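
Both scripts above target Python 2. As a rough Python 3 sketch of the whole-file approach, the version below also recovers a line number for each match by counting the newlines that precede it. The helper name find_matches is an assumption introduced here for illustration; print_ete keeps the name used above.

```python
import re

# \s+ also matches "\n", so the phrase may be split across lines.
PATTERN = re.compile(r"export\s+to\s+excel")


def find_matches(text):
    """Yield (line_number, matched_lines) for every occurrence in text.

    line_number is the 1-based number of the line where the match starts;
    matched_lines is the full text of every line the match touches.
    """
    for m in PATTERN.finditer(text):
        # Widen the match to whole lines.
        start = text.rfind("\n", 0, m.start()) + 1
        end = text.find("\n", m.end())
        if end == -1:
            end = len(text)
        line_no = text.count("\n", 0, start) + 1
        yield line_no, text[start:end]


def print_ete(fname):
    """Print matches in a format loosely similar to grep -Hn."""
    with open(fname, "rt") as f:
        text = f.read()
    for line_no, lines in find_matches(text):
        print("%s:%d: %s" % (fname, line_no, lines))
```

Calling print_ete("a.txt") would print the two lines that together contain the phrase, prefixed with the file name and the number of the first line. Like the short script above, this reads the whole file into memory.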

Answer 3

Answered by christian.buggle

grep -A1 "export to" filename | grep -B1 "excel"

Answer 4

Answered by ghostdog74

Use gawk. Set the record separator to "excel", then check each record for "export to".

gawk -vRS="excel" '/export.*to/{print "found export to excel at record: "NR}' file

or

gawk '/export.*to.*excel/{print}
/export to/&&!/excel/{
  s=$0

  getline line
  if (line~/excel/){
   printf "%s\n%s\n",s,line
  } 
}' file

Answer 5

Answered by Paused until further notice.

I have tested this a little and it seems to work:

sed -n '$b; /export to excel/{p; b}; N; /export to\nexcel/{p; b}; D' filename

You can allow for some extra white space at the end and beginning of the lines like this:

sed -n '$b; /export to excel/{p; b}; N; /export to\s*\n\s*excel/{p; b}; D' filename
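
For comparison, the two-line window idea behind the sed command can be sketched in Python. This is a loose analogue rather than a translation of the sed script: it checks each line on its own, and otherwise joins it with the next line and looks for the phrase straddling the break, tolerating white space on either side. The function name grep_two_lines is an assumption made for illustration.

```python
import re

SAME_LINE = re.compile(r"export to excel")
# The phrase split at a line break, with optional white space around it.
SPLIT = re.compile(r"export to\s*\n\s*excel")


def grep_two_lines(lines):
    """Return the matching lines from a list of lines (no trailing newlines).

    A match that straddles a line break contributes both lines.
    """
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if SAME_LINE.search(line):
            out.append(line)
        elif i + 1 < len(lines) and SPLIT.search(line + "\n" + lines[i + 1]):
            out.extend([line, lines[i + 1]])
            i += 1  # the next line has already been reported
        i += 1
    return out
```

Typical use would be grep_two_lines(open("a.txt").read().splitlines()), which keeps only two lines under inspection at a time, so it cannot find a phrase spread over three or more lines.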
