Original question: http://stackoverflow.com/questions/3570099/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Have regex ignore new lines and just match on a whole large string?
Asked by Zombies
I have this string here:
CREATE UNIQUE INDEX index555 ON
SOME_TABLE
(
SOME_PK ASC
);
I want to match across multiple lines and capture each SQL statement (all of them; there will be many in one large string). I tried something like the pattern below, but I only get a match on CREATE UNIQUE INDEX index555 ON:
(CREATE\s.+;)
Note: I am trying to do this in Java, if it matters.
Answered by
You need to use DOTALL and MULTILINE flags when compiling a regular expression. Here is a Java code example:
import java.util.regex.*;

public class test
{
    public static void main(String[] args)
    {
        // Two statements in one string, separated by newlines.
        String s =
            "CREATE UNIQUE INDEX index555 ON\nSOME_TABLE\n(\n SOME_PK ASC\n);\nCREATE UNIQUE INDEX index666 ON\nOTHER_TABLE\n(\n OTHER_PK ASC\n);\n";
        // Backslashes must be doubled inside a Java string literal ("\\s", not "\s").
        Pattern p = Pattern.compile("([^;]*?('.*?')?)*?;\\s*", Pattern.CASE_INSENSITIVE | Pattern.DOTALL | Pattern.MULTILINE);
        Matcher m = p.matcher(s);
        // Each find() yields one statement, ending at its semicolon plus any trailing whitespace.
        while (m.find())
        {
            System.out.println("--- Statement ---");
            System.out.println(m.group());
        }
    }
}
The output will be:
--- Statement ---
CREATE UNIQUE INDEX index555 ON
SOME_TABLE
(
SOME_PK ASC
);
--- Statement ---
CREATE UNIQUE INDEX index666 ON
OTHER_TABLE
(
OTHER_PK ASC
);
Answered by lowercase
Answered by Alan Moore
The DOTALL flag lets the . match newlines, but if you simply apply it to your existing regex, you'll end up matching everything from the first CREATE to the last ; in one go. If you want to match the statements individually, you'll need to do more. One option is to use a non-greedy quantifier:
Pattern p = Pattern.compile("^CREATE\\b.+?;",
    Pattern.DOTALL | Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
I also used the MULTILINE flag to let the ^ anchor match after newlines, and CASE_INSENSITIVE because SQL is case-insensitive, at least in every flavor I've heard of. Note that all three flags have "inline" forms that you can use in the regex itself:
Pattern p = Pattern.compile("(?smi)^CREATE\\b.+?;");
(The inline form of DOTALL is s for historical reasons; it was called "single-line" mode in Perl, where it originated.) Another option is to use a negated character class:
Pattern p = Pattern.compile("(?mi)^CREATE\\b[^;]+;");
[^;]+ matches one or more of any character except ; and that includes newlines, so the s flag isn't needed.
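To compare the three variants discussed so far, here is a minimal sketch that runs each one against a two-statement input. The class name StatementMatchDemo, the helper findAll, and the sample string are assumptions made for illustration; only the regexes come from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StatementMatchDemo {
    // Hypothetical sample input: two statements, each starting at the beginning of a line.
    static final String SQL =
        "CREATE UNIQUE INDEX index555 ON\nSOME_TABLE\n(\n  SOME_PK ASC\n);\n"
      + "CREATE UNIQUE INDEX index666 ON\nOTHER_TABLE\n(\n  OTHER_PK ASC\n);\n";

    // Collects every match of the given regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] regexes = {
            "(?smi)^CREATE\\b.+;",   // greedy with DOTALL: one match spanning both statements
            "(?smi)^CREATE\\b.+?;",  // non-greedy with DOTALL: one match per statement
            "(?mi)^CREATE\\b[^;]+;"  // negated character class, no DOTALL: one match per statement
        };
        for (String regex : regexes) {
            System.out.println(regex + " -> " + findAll(regex, SQL).size() + " match(es)");
        }
    }
}
```

On this input the greedy form finds a single match that runs to the last semicolon, while the non-greedy and negated-class forms each find the two statements separately.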
So far, I've assumed that every statement starts at the beginning of a line and ends with a semicolon, as in your example. I don't think either of those things is required by the SQL standard, but I expect you'll know if you can count on them in this instance. You might want to start matching at a word boundary instead of a line boundary:
Pattern p = Pattern.compile("(?i)\\bCREATE\\b[^;]+;");
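The difference between the line anchor and the word boundary shows up when statements share a line or are indented. The following is a rough sketch; the class name WordBoundaryDemo, the helper countMatches, and the sample SQL are all invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical input: two statements on one line, then an indented lowercase one.
    static final String SQL =
        "CREATE TABLE a (id INT); CREATE TABLE b (id INT);\n  create index i on a (id);";

    // Counts how many times the regex matches in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Anchored at line starts: only the first statement begins exactly at a line start.
        System.out.println(countMatches("(?mi)^CREATE\\b[^;]+;", SQL));
        // Anchored at word boundaries: all three statements are found.
        System.out.println(countMatches("(?i)\\bCREATE\\b[^;]+;", SQL));
    }
}
```

The trade-off is that \bCREATE\b will also match the word CREATE wherever else it appears, for example inside a comment or a string literal.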
Finally, if you're thinking about doing anything more complicated with regexes and SQL, don't. Parsing SQL with regexes is a fool's game; it's an even worse fit than HTML and regexes.
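One concrete way this goes wrong, sketched under the assumption that a statement may contain a semicolon inside a quoted string literal: the negated character class knows nothing about quoting, so it stops at the first semicolon it sees. The class name QuotedSemicolonDemo, the helper firstMatch, and the sample statement are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedSemicolonDemo {
    // Hypothetical statement whose string literal contains a semicolon.
    static final String SQL = "CREATE VIEW v AS SELECT 'a;b' AS label FROM t;";

    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The match is cut off inside the quoted string:
        // prints CREATE VIEW v AS SELECT 'a;
        System.out.println(firstMatch("(?i)\\bCREATE\\b[^;]+;", SQL));
    }
}
```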
Answered by Don Kirkby
Check out the various flags that can be passed to Pattern.compile. I think DOTALL is the one you need.
Answered by Kibbee
You'll want to use the Pattern.DOTALL flag to match across lines.
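As a small illustration of what the flag changes, the sketch below runs the same pattern with and without Pattern.DOTALL. The pattern is a simplified variant of the one in the question (the trailing semicolon is dropped, which is an assumption made to keep the contrast visible), and the class name DotAllDemo and helper firstMatch are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotAllDemo {
    // The statement from the question, as a single string with embedded newlines.
    static final String SQL =
        "CREATE UNIQUE INDEX index555 ON\nSOME_TABLE\n(\n  SOME_PK ASC\n);";

    // Returns the first match of the regex compiled with the given flags, or null if none.
    static String firstMatch(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Without DOTALL, '.' stops at the line terminator, so only the first line matches.
        System.out.println(firstMatch("CREATE\\s.+", 0, SQL));
        // With DOTALL, '.' also matches newlines, so the match runs to the end of the input.
        System.out.println(firstMatch("CREATE\\s.+", Pattern.DOTALL, SQL));
    }
}
```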

