Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/3717772/

Time: 2020-08-03 23:33:28  Source: igfitidea

Regex (grep) for multi-line search needed

Tags: regex, linux, cygwin, grep

Asked by Ciaran Archer

Possible Duplicate:
How can I search for a multiline pattern in a file? Use pcregrep

I'm running a grep to find any *.sql file that has the word select followed by the word customerName followed by the word from. This select statement can span many lines and can contain tabs and newlines.

I've tried a few variations on the following:

$ grep -liIr --include="*.sql" --exclude-dir="\.svn*" --regexp="select[a-zA-Z0-9+\n\r]*customerName[a-zA-Z0-9+\n\r]*from"

This, however, just runs forever. Can anyone help me with the correct syntax please?

Accepted answer by albfan

Without the need to install the grep variant pcregrep, you can do multiline search with grep.

$ grep -Pzo "(?s)^(\s*)\N*main.*?{.*?^\1}" *.c

Explanation:

-P activates perl-regexp for grep (a powerful extension of regular expressions).

-z treats the input as lines terminated by a null character instead of a newline. Newlines become ordinary characters, so grep still knows where each end of line is, but sees the input as one big line.

-o prints only the matching part. Because we're using -z, the whole file is like a single big line, so without -o a match would print the entire file.

In regexp:

(?s) activates PCRE_DOTALL, which means that . matches any character, including a newline.

\N matches anything except a newline, even with PCRE_DOTALL activated.

.*? matches . in non-greedy mode, that is, it stops as soon as possible.

^ matches the start of a line.

\1 is a backreference to the first group (\s*). This is an attempt to find a closing brace with the same indentation as the start of the method.

As you can imagine, this search prints the main method in a C (*.c) source file.
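
The same flags can be pointed at the SQL search from the question. The following is only a sketch, assuming GNU grep built with PCRE support; the sample files and the variable names tmp and found are made up for the demonstration. Note that the pattern only requires the three words to appear in order somewhere in the file, so it can also match across two separate statements.

```shell
# Hypothetical sample files, created only for this demonstration.
tmp=$(mktemp -d)
printf 'select\n\tcustomerName,\n\tcity\nfrom customers;\n' > "$tmp/match.sql"
printf 'select\n\torderId\nfrom orders;\n' > "$tmp/nomatch.sql"

# -r recurse, -l list file names only, -i ignore case,
# -P Perl regex, -z treat each file as one big line.
# (?s) lets . cross newlines; .*? keeps each gap as short as possible.
found=$(grep -rliPz --include='*.sql' '(?s)select.*?customerName.*?from' "$tmp")
echo "$found"
```

Here -l replaces -o because the question only asks which files contain the statement, not for the matched text itself.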

Answer by Jonathan Leffler

Your fundamental problem is that grep works one line at a time - so it cannot find a SELECT statement spread across lines.
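
A small sketch of that line-at-a-time behaviour, using made-up input and hypothetical variable names: the same pattern is counted against a statement split over two lines and against the same statement on one line.

```shell
# Two-line statement: "select" and "from" sit on different lines.
# grep -c exits non-zero when nothing matches, hence the "|| true".
two_line_hits=$(printf 'select customerName\nfrom customers\n' | grep -c 'select.*from' || true)

# Same statement written on a single line.
one_line_hits=$(printf 'select customerName from customers\n' | grep -c 'select.*from' || true)

echo "two lines: $two_line_hits, one line: $one_line_hits"
```

The first count is 0 and the second is 1, because each line is tested against the pattern separately.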

Your second problem is that the regex you are using doesn't deal with the complexity of what can appear between SELECT and FROM - in particular, it omits commas, full stops (periods) and blanks, but also quotes and anything that can be inside a quoted string.

I would likely go with a Perl-based solution, having Perl read one 'paragraph' at a time and applying a regex to that. The downside is having to deal with the recursive search - there are modules to do that, of course, including the core module File::Find.

In outline, for a single file:

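use strict;
use warnings;

$/ = "\n\n";    # Paragraphs: read up to the next blank line at a time

while (<>)
{
    # /s lets . match newlines so the pattern can span lines; /i ignores case
    if ($_ =~ m/SELECT.*customerName.*FROM/si)
    {
        print "$ARGV\n";    # $ARGV is the name of the file being read
        close ARGV;         # skip the rest of this file; <> moves on to the next
    }
}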

To search recursively, that needs to be wrapped into a sub that is then invoked by the functions of File::Find.

Answer by Amit

I am not very good with grep, but your problem can be solved using the awk command. For example:

awk '/select/,/from/' *.sql

The above command prints the lines from each line containing select through the next line containing from. Now you need to verify whether the returned statements contain customername or not. For this you can pipe the result into awk or grep again.
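
One way the piping step might look, as a sketch only: the function name list_matches and the sample files are hypothetical, and the awk patterns are case-sensitive, so this version only finds lowercase select and from.

```shell
# Hypothetical sample files, created only for this demonstration.
tmp=$(mktemp -d)
printf 'select\n  customerName\nfrom customers;\n' > "$tmp/a.sql"
printf 'select\n  orderId\nfrom orders;\n' > "$tmp/b.sql"

# Print the name of every file whose select...from range mentions customerName.
list_matches() {
    for f in "$@"; do
        # awk prints the select-to-from line ranges; grep -q checks them quietly.
        if awk '/select/,/from/' "$f" | grep -qi 'customerName'; then
            echo "$f"
        fi
    done
}

result=$(list_matches "$tmp"/*.sql)
echo "$result"
```

Running awk once per file keeps the file name available, which a single awk call over *.sql piped into grep would lose.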