Original question: http://stackoverflow.com/questions/3570099/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverFlow.
Have regex ignore new lines and just match on a whole large string?
Asked by Zombies
I have this string here:
CREATE UNIQUE INDEX index555 ON
SOME_TABLE
(
SOME_PK ASC
);
I want to match across multiple lines and pick out each SQL statement (all of them; there will be many in one large string). I tried something like the pattern below, but I only get a match on CREATE UNIQUE INDEX index555 ON:
(CREATE\s.+;)
Note: I am trying to do this in Java, if it matters.
Answer:
You need to use the DOTALL and MULTILINE flags when compiling the regular expression. Here is a Java code example:
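import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test
{
    public static void main(String[] args)
    {
        // Two statements in one string, each spanning several lines.
        String s =
            "CREATE UNIQUE INDEX index555 ON\nSOME_TABLE\n(\n SOME_PK ASC\n);\nCREATE UNIQUE INDEX index666 ON\nOTHER_TABLE\n(\n OTHER_PK ASC\n);\n";
        // The regex \s is written "\\s" because the backslash must be doubled in a Java string literal.
        Pattern p = Pattern.compile("([^;]*?('.*?')?)*?;\\s*", Pattern.CASE_INSENSITIVE | Pattern.DOTALL | Pattern.MULTILINE);
        Matcher m = p.matcher(s);
        while (m.find())
        {
            System.out.println("--- Statement ---");
            // trim() drops the trailing whitespace consumed by \s*.
            System.out.println(m.group().trim());
        }
    }
}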
The output will be:
--- Statement ---
CREATE UNIQUE INDEX index555 ON
SOME_TABLE
(
SOME_PK ASC
);
--- Statement ---
CREATE UNIQUE INDEX index666 ON
OTHER_TABLE
(
OTHER_PK ASC
);
Answered by lowercase
Answered by Alan Moore
The DOTALL flag lets the . match newlines, but if you simply apply it to your existing regex, you'll end up matching everything from the first CREATE to the last ; in one go. If you want to match the statements individually, you'll need to do more. One option is to use a non-greedy quantifier:
Pattern p = Pattern.compile("^CREATE\\b.+?;",
    Pattern.DOTALL | Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
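To make the greedy versus non-greedy difference concrete, here is a minimal sketch that runs both quantifiers over the two-statement sample string from the first answer. The class name GreedyVsLazy and the helper findAll are made up for illustration; only java.util.regex is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    // Sample input: two statements, each spanning several lines.
    static final String SQL =
        "CREATE UNIQUE INDEX index555 ON\nSOME_TABLE\n(\n SOME_PK ASC\n);\n"
        + "CREATE UNIQUE INDEX index666 ON\nOTHER_TABLE\n(\n OTHER_PK ASC\n);\n";

    // Hypothetical helper: collect every match of the pattern in the input.
    static List<String> findAll(Pattern p, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        int flags = Pattern.DOTALL | Pattern.MULTILINE | Pattern.CASE_INSENSITIVE;

        // Greedy .+ runs to the end of the input and backtracks to the LAST semicolon.
        Pattern greedy = Pattern.compile("^CREATE\\b.+;", flags);
        // Non-greedy .+? stops at the FIRST semicolon it can reach.
        Pattern lazy = Pattern.compile("^CREATE\\b.+?;", flags);

        System.out.println("greedy matches: " + findAll(greedy, SQL).size()); // 1
        System.out.println("lazy matches:   " + findAll(lazy, SQL).size());   // 2
    }
}
```

With the greedy quantifier the single match spans both statements; with the non-greedy one each statement comes back separately.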
I also used the MULTILINE flag to let the ^ anchor match after newlines, and CASE_INSENSITIVE because SQL is case-insensitive, at least in every flavor I've heard of. Note that all three flags have "inline" forms that you can use in the regex itself:
Pattern p = Pattern.compile("(?smi)^CREATE\\b.+?;");
(The inline form of DOTALL is s for historical reasons; it was called "single-line" mode in Perl, where it originated.) Another option is to use a negated character class:
Pattern p = Pattern.compile("(?mi)^CREATE\\b[^;]+;");
[^;]+ matches one or more of any character except ; (and that includes newlines), so the s flag isn't needed.
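As a rough check of that claim, the sketch below counts matches for the negated-class pattern and for a plain dot with the same mi flags but no s. The class name NegatedClassDemo and the helper countMatches are illustrative only, and the first statement is lowercased here just to exercise the i flag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedClassDemo {
    // Sample input; the first statement is lowercase on purpose.
    static final String SQL =
        "create unique index index555 on\nSOME_TABLE\n(\n SOME_PK ASC\n);\n"
        + "CREATE UNIQUE INDEX index666 ON\nOTHER_TABLE\n(\n OTHER_PK ASC\n);\n";

    // Hypothetical helper: count how many times the regex matches in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // [^;]+ crosses line breaks on its own, so both statements are found.
        System.out.println(countMatches("(?mi)^CREATE\\b[^;]+;", SQL)); // 2
        // Without the s flag the dot cannot cross a newline, and no line
        // that starts with CREATE also contains a semicolon.
        System.out.println(countMatches("(?mi)^CREATE\\b.+;", SQL));    // 0
    }
}
```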
So far, I've assumed that every statement starts at the beginning of a line and ends with a semicolon, as in your example. I don't think either of those things is required by the SQL standard, but I expect you'll know if you can count on them in this instance. You might want to start matching at a word boundary instead of a line boundary:
Pattern p = Pattern.compile("(?i)\\bCREATE\\b[^;]+;");
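Here is a small sketch of where the two anchors differ, assuming an input in which two statements share one line. The table names a and b, the class name WordBoundaryDemo, and the helper findAll are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Hypothetical input: the second statement does not start at a line boundary.
    static final String SQL = "CREATE TABLE a (id INT); create table b (id INT);";

    // Hypothetical helper: collect every match of the regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Line-anchored: only the statement at the start of the line is found.
        System.out.println(findAll("(?mi)^CREATE\\b[^;]+;", SQL).size());  // 1
        // Word-boundary-anchored: both statements are found.
        System.out.println(findAll("(?i)\\bCREATE\\b[^;]+;", SQL).size()); // 2
    }
}
```

The trade-off is that \bCREATE\b will also match the word CREATE wherever else it appears, for instance inside a comment, so this is only safe if you know the input well.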
Finally, if you're thinking about doing anything more complicated with regexes and SQL, don't. Parsing SQL with regexes is a fool's game; it's an even worse fit than HTML and regexes.
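One way to see the problem, as a sketch on a made-up INSERT statement (the table t and its value are hypothetical): a semicolon inside a string literal is enough to cut a [^;]+; match short.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SemicolonInLiteralDemo {
    // Hypothetical statement whose string literal contains a semicolon.
    static final String SQL = "INSERT INTO t VALUES ('a;b');";

    // Hypothetical helper: return the first match of the regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The match ends at the semicolon inside the quotes, not at the
        // one that actually terminates the statement.
        System.out.println(firstMatch("(?i)\\bINSERT\\b[^;]+;", SQL));
        // prints: INSERT INTO t VALUES ('a;
    }
}
```

Quoting rules, comments, and nested constructs all add cases like this, which is why a real SQL parser is the safer tool once the input is not fully under your control.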
Answered by Don Kirkby
Check out the various flags that can be passed to Pattern.compile. I think DOTALL is the one you need.
Answered by Kibbee
You'll want to use the Pattern.DOTALL flag to match across lines.
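For reference, a minimal sketch of what Pattern.DOTALL changes, using a shortened two-line input. The class name DotallDemo and the helper findsWith are invented for the example.

```java
import java.util.regex.Pattern;

public class DotallDemo {
    // Shortened sample: the semicolon sits on the line after "ON".
    static final String INPUT = "ON\nSOME_TABLE;";

    // Hypothetical helper: does "ON.+;" match anywhere in INPUT with the given flags?
    static boolean findsWith(int flags) {
        return Pattern.compile("ON.+;", flags).matcher(INPUT).find();
    }

    public static void main(String[] args) {
        // By default the dot does not match line terminators, so .+ cannot
        // get from "ON" to the semicolon on the next line.
        System.out.println(findsWith(0));              // false
        // With DOTALL the dot matches any character, newlines included.
        System.out.println(findsWith(Pattern.DOTALL)); // true
    }
}
```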