Java: have a regex ignore new lines and match across a whole large string?

Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/3570099/

Date: 2020-10-30 02:27:20  Source: igfitidea

Have regex ignore new lines and just match on a whole large string?

Tags: java, regex

Asked by Zombies

I have this string here:

CREATE UNIQUE INDEX index555 ON
SOME_TABLE
(
    SOME_PK          ASC
);

I want to match across multiple lines and capture each SQL statement (all of them; there will be many in one large string). I tried something like the pattern below, but I only get a match on CREATE UNIQUE INDEX index555 ON:

(CREATE\s.+;)

Note: I am trying to do this in Java, if it matters.

Answered by

You need to use the DOTALL and MULTILINE flags when compiling the regular expression. Here is a Java code example:

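import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class test
{
    public static void main(String[] args)
    {
        // Two statements in one string, each spanning several lines.
        String s =
            "CREATE UNIQUE INDEX index555 ON\nSOME_TABLE\n(\n    SOME_PK          ASC\n);\nCREATE UNIQUE INDEX index666 ON\nOTHER_TABLE\n(\n    OTHER_PK          ASC\n);\n";

        // Match everything up to the next semicolon (optionally stepping over
        // single-quoted literals), plus any trailing whitespace.
        // The backslash is doubled because this is a Java string literal:
        // "\\s" reaches the regex engine as \s.
        Pattern p = Pattern.compile("([^;]*?('.*?')?)*?;\\s*", Pattern.CASE_INSENSITIVE | Pattern.DOTALL | Pattern.MULTILINE);

        Matcher m = p.matcher(s);

        // Print each statement found in the input.
        while (m.find())
        {
            System.out.println("--- Statement ---");
            System.out.println(m.group());
        }
    }
}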

The output will be:

--- Statement ---
CREATE UNIQUE INDEX index555 ON
SOME_TABLE
(
    SOME_PK          ASC
);

--- Statement ---
CREATE UNIQUE INDEX index666 ON
OTHER_TABLE
(
    OTHER_PK          ASC
);

Answered by lowercase

Check the java.util.regex.Pattern documentation:

The regular expression . matches any character except a line terminator unless the DOTALL flag is specified.

So you need to do something like this:

Pattern p = Pattern.compile("your pattern", Pattern.DOTALL);
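
As a rough illustration (not part of the original answer), the sketch below runs the question's pattern with and without DOTALL. The class name DotAllDemo and the helper firstMatch are made up for this example, and the input is only the first statement from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotAllDemo {
    // Hypothetical helper: returns the first match of regex in input, or null if none.
    public static String firstMatch(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String sql = "CREATE UNIQUE INDEX index555 ON\nSOME_TABLE\n(\n    SOME_PK          ASC\n);\n";

        // Without DOTALL, '.' stops at the end of the first line, where there is no ';'.
        System.out.println(firstMatch("CREATE\\s.+;", 0, sql));

        // With DOTALL, '.' also matches line terminators, so the match can reach the ';'.
        System.out.println(firstMatch("CREATE\\s.+;", Pattern.DOTALL, sql));
    }
}
```

With this input, the first call prints null and the second prints the whole statement up to and including the semicolon.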

Answered by Alan Moore

The DOTALL flag lets the . match newlines, but if you simply apply it to your existing regex, you'll end up matching everything from the first CREATE to the last ; in one go. If you want to match the statements individually, you'll need to do more. One option is to use a non-greedy quantifier:

Pattern p = Pattern.compile("^CREATE\\b.+?;",
    Pattern.DOTALL | Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
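
To make the greedy versus non-greedy difference concrete, here is a minimal sketch under the same three flags. The class GreedyVsLazy and the helper findAll are illustrative names, and the two-statement input is modelled on the question's example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    static final int FLAGS = Pattern.DOTALL | Pattern.MULTILINE | Pattern.CASE_INSENSITIVE;

    // Hypothetical helper: collects every match of regex in input.
    public static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex, FLAGS).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String sql = "CREATE UNIQUE INDEX index555 ON\nSOME_TABLE\n(\n    SOME_PK ASC\n);\n"
                   + "CREATE UNIQUE INDEX index666 ON\nOTHER_TABLE\n(\n    OTHER_PK ASC\n);\n";

        // Greedy .+ runs to the last ';', so both statements land in one match.
        System.out.println(findAll("^CREATE\\b.+;", sql).size());

        // Non-greedy .+? stops at the first ';' after each CREATE.
        System.out.println(findAll("^CREATE\\b.+?;", sql).size());
    }
}
```

For this input the greedy pattern yields one match and the non-greedy pattern yields two.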

I also used the MULTILINE flag to let the ^ anchor match after newlines, and CASE_INSENSITIVE because SQL is case-insensitive, at least in every flavor I've heard of. Note that all three flags have "inline" forms that you can use in the regex itself:

Pattern p = Pattern.compile("(?smi)^CREATE\\b.+?;");

(The inline form of DOTALL is s for historical reasons; it was called "single-line" mode in Perl, where it originated.) Another option is to use a negated character class:

Pattern p = Pattern.compile("(?mi)^CREATE\\b[^;]+;");

[^;]+ matches one or more of any character except ; (that includes newlines), so the s flag isn't needed.
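
A small sketch of the negated-character-class variant, assuming statements start at the beginning of a line as in the question. The class NegatedClassDemo and the helper findAll are hypothetical names, and the second statement is written in lowercase only to exercise the i flag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedClassDemo {
    // No DOTALL here: [^;] already matches newlines because it excludes only ';'.
    static final Pattern STATEMENT = Pattern.compile("(?mi)^CREATE\\b[^;]+;");

    // Hypothetical helper: collects every CREATE statement found in input.
    public static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = STATEMENT.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String sql = "CREATE UNIQUE INDEX index555 ON\nSOME_TABLE\n(\n    SOME_PK ASC\n);\n"
                   + "create unique index index666 on\nother_table\n(\n    other_pk asc\n);\n";
        for (String statement : findAll(sql)) {
            System.out.println("--- Statement ---");
            System.out.println(statement);
        }
    }
}
```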

So far, I've assumed that every statement starts at the beginning of a line and ends with a semicolon, as in your example. I don't think either of those things is required by the SQL standard, but I expect you'll know whether you can count on them in this instance. You might want to start matching at a word boundary instead of a line boundary:

Pattern p = Pattern.compile("(?i)\\bCREATE\\b[^;]+;");
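
The difference between the line anchor and the word boundary shows up when two statements share a line. The sketch below uses a made-up one-line input; the class BoundaryDemo and the helper count are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Hypothetical helper: counts how many times regex matches in input.
    public static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Assumed input: two statements on the same line.
        String sql = "CREATE INDEX a ON t (x); CREATE INDEX b ON t (y);";

        // ^ only matches at the start of a line, so the second statement is missed.
        System.out.println(count("(?mi)^CREATE\\b[^;]+;", sql));

        // \b matches before any standalone CREATE, wherever it sits on the line.
        System.out.println(count("(?i)\\bCREATE\\b[^;]+;", sql));
    }
}
```

Here the anchored pattern finds one statement and the word-boundary pattern finds two.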

Finally, if you're thinking about doing anything more complicated with regexes and SQL, don't. Parsing SQL with regexes is a fool's game--it's an even worse fit than HTML and regexes.
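
One concrete way this goes wrong, sketched with a made-up statement: a semicolon inside a quoted literal ends the match early, because [^;]+ knows nothing about SQL quoting. The class QuotedSemicolon and the helper firstStatement are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedSemicolon {
    static final Pattern STATEMENT = Pattern.compile("(?i)\\bCREATE\\b[^;]+;");

    // Hypothetical helper: returns the first CREATE statement the regex finds, or null.
    public static String firstStatement(String sql) {
        Matcher m = STATEMENT.matcher(sql);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Assumed input: the default value contains a semicolon inside quotes.
        String sql = "CREATE TABLE t (c VARCHAR(10) DEFAULT 'a;b');";

        // The match is cut off at the semicolon inside the literal.
        System.out.println(firstStatement(sql));
    }
}
```

The printed match stops at 'a; instead of covering the full statement, which is the kind of case a real SQL parser handles and a regex does not.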

Answered by Don Kirkby

Check out the various flags that can be passed to Pattern.compile. I think DOTALL is the one you need.

Answered by Kibbee

You'll want to use the Pattern.DOTALL flag to match across lines.