Regex (grep) for multi-line search needed
Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/3717772/
Asked by Ciaran Archer
Possible Duplicate:
How can I search for a multiline pattern in a file ? Use pcregrep
I'm running a grep to find any *.sql file that has the word select followed by the word customerName followed by the word from. This select statement can span many lines and can contain tabs and newlines.
I've tried a few variations on the following:
$ grep -liIr --include="*.sql" --exclude-dir="\.svn*" --regexp="select[a-zA-Z0-9+\n\r]*customerName[a-zA-Z0-9+\n\r]*from"
This, however, just runs forever. Can anyone help me with the correct syntax please?
Accepted answer by albfan
Without the need to install the grep variant pcregrep, you can do multiline search with grep.
$ grep -Pzo "(?s)^(\s*)\N*main.*?{.*?^\1}" *.c
Explanation:
-P: activates perl-regexp for grep (a powerful extension of regular expressions).
-z: treats the input as lines terminated by a null character instead of a newline. That is, grep still sees the newline characters, but a file with no null characters is read as one big line.
-o: prints only the matching part. Because we're using -z, the whole file is like a single big line, so a match would otherwise print the entire file; -o prevents that.
In regexp:
(?s): activates PCRE_DOTALL, which means that . matches any character, including newline.
\N: matches anything except newline, even with PCRE_DOTALL activated.
.*?: matches . in non-greedy mode, that is, stops as soon as possible.
^: matches the start of a line.
\1: backreference to the first group (\s*). This is an attempt to match the same indentation as the start of the method.
As you can imagine, this search prints the main method in a C (*.c) source file.
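Applied to the question's SQL search, the same flags might be used as in the sketch below. This is only an illustration under assumptions: it needs a GNU grep built with PCRE support, the temporary directory and the two sample files are made up, and -l is used instead of -o because the question asks for file names rather than the matched text.

```shell
# Hypothetical sample tree, created only so the example is self-contained.
demo_dir=$(mktemp -d)
printf 'select id,\n\tcustomerName\nfrom customers;\n' > "$demo_dir/customers.sql"
printf 'select id\nfrom orders;\n' > "$demo_dir/orders.sql"

# -P: Perl regex, -z: read each file as one big line, -l: list file names,
# -i: ignore case, -r: recurse. (?s) lets . match newlines; .*? is non-greedy.
matches=$(grep -Pzlir --include='*.sql' \
    '(?s)\bselect\b.*?\bcustomerName\b.*?\bfrom\b' "$demo_dir")

echo "$matches"
rm -rf "$demo_dir"
```

Only the file whose select statement mentions customerName before from should be listed.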
Answer by Jonathan Leffler
Your fundamental problem is that grep works one line at a time, so it cannot find a SELECT statement spread across lines.
Your second problem is that the regex you are using doesn't deal with the complexity of what can appear between SELECT and FROM - in particular, it omits commas, full stops (periods) and blanks, but also quotes and anything that can be inside a quoted string.
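Both problems can be reproduced on a small scale. The sketch below uses two made-up sample statements (not taken from the question) and counts matching lines with grep -c.

```shell
# Hypothetical sample statements, used only for illustration.
multi_line='select id,
  customerName
from customers;'
one_line='select id, customerName from customers;'

# Problem 1: grep matches line by line, and no single line holds all three words.
count_multi=$(printf '%s\n' "$multi_line" | grep -ci 'select.*customerName.*from' || true)

# Problem 2: the character class from the question has no blank or comma,
# so even a one-line statement is not matched.
count_class=$(printf '%s\n' "$one_line" |
    grep -c 'select[a-zA-Z0-9+\n\r]*customerName[a-zA-Z0-9+\n\r]*from' || true)

# A looser pattern does match the one-line statement.
count_loose=$(printf '%s\n' "$one_line" | grep -ci 'select.*customerName.*from' || true)

echo "multi-line: $count_multi, class: $count_class, loose: $count_loose"
```

The `|| true` is there only because grep exits with a non-zero status when nothing matches.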
I would likely go with a Perl-based solution, having Perl read a 'paragraph' at a time and applying a regex to that. The downside is having to deal with the recursive search; there are modules to do that, of course, including the core module File::Find.
In outline, for a single file:
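$/ = "\n\n";    # Paragraphs: records are separated by blank lines
while (<>)
{
    # /s lets . match newlines inside the paragraph; /i ignores case
    if ($_ =~ m/SELECT.*customerName.*FROM/si)
    {
        print "$ARGV\n";    # Name of the file currently being read
        close(ARGV);        # Skip the rest of this file; <> moves on to the next one
    }
}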
That needs to be wrapped into a sub that is then invoked by the methods of File::Find.
Answer by Amit
I am not very good with grep, but your problem can be solved using the awk command. See:
awk '/select/,/from/' *.sql
The above command prints the lines from the first occurrence of select through the next occurrence of from. You then need to verify whether the returned statements contain customerName. For this you can pipe the result into awk or grep again.
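One possible way to put the two steps together is to run the range per file and pipe it into grep, so the file name is still known when a match is found. This is a rough sketch with invented sample files. It only checks that customerName occurs somewhere on the lines of a select...from range, not its exact position between the two words.

```shell
# Hypothetical sample files, created only so the example is self-contained.
demo_dir=$(mktemp -d)
printf 'select id,\n  customerName\nfrom customers;\n' > "$demo_dir/customers.sql"
printf 'select id\nfrom orders;\n' > "$demo_dir/orders.sql"

hits=""
for f in "$demo_dir"/*.sql; do
    # awk prints each select...from range; grep -q checks it for customerName
    if awk '/select/,/from/' "$f" | grep -q 'customerName'; then
        hits="$hits$(basename "$f") "
    fi
done

echo "$hits"
rm -rf "$demo_dir"
```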