Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/22252297/

Date: 2020-08-13 14:33:46  Source: igfitidea

Match a string against multiple regex patterns

java regex

Question by Patan

I have an input string.

I am wondering how to match this string against more than one regular expression effectively.

Example Input: ABCD

I'd like to match it against these regex patterns and return true if at least one of them matches:

[a-zA-Z]{3}

^[^\d].*

([\w&&[^b]])*

I am not sure how to match against multiple patterns at once. Can someone tell me how to do it effectively?

Accepted answer by Marko Topolnik

If you have just a few regexes, and they are all known at compile time, then this can be enough:

private static final Pattern
  rx1 = Pattern.compile("..."),
  rx2 = Pattern.compile("..."),
  ...;

return rx1.matcher(s).matches() || rx2.matcher(s).matches() || ...;
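For reference, here is a minimal runnable sketch of this approach, filled in with the three patterns from the question. The class name FixedPatterns and the method matchesAny are hypothetical. Note that inside a Java string literal the backslashes in \d and \w have to be doubled.

```java
import java.util.regex.Pattern;

class FixedPatterns {
    // Compile once; Pattern instances are immutable and safe to share between threads.
    private static final Pattern RX1 = Pattern.compile("[a-zA-Z]{3}");
    private static final Pattern RX2 = Pattern.compile("^[^\\d].*");
    private static final Pattern RX3 = Pattern.compile("([\\w&&[^b]])*");

    // matches() requires the whole input to match; || stops at the first pattern that succeeds.
    static boolean matchesAny(String s) {
        return RX1.matcher(s).matches()
            || RX2.matcher(s).matches()
            || RX3.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesAny("ABCD")); // true: RX2 and RX3 both match
        System.out.println(matchesAny("1b"));   // false: none of the three matches
    }
}
```

"ABCD" is not matched by RX1, because [a-zA-Z]{3} with matches() accepts exactly three letters, but the second pattern already succeeds.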

If there are more of them, or they are loaded at runtime, then use a list of patterns:

final List<Pattern> rxs = new ArrayList<>();
// add the patterns here, e.g. rxs.add(Pattern.compile(...));

for (Pattern rx : rxs) {
    if (rx.matcher(input).matches()) return true;
}
return false;
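A self-contained sketch of the list-based variant follows. It assumes the regexes arrive at runtime as a List<String> (for example read from a config file); the class name RuntimePatterns is hypothetical, and Stream.anyMatch is used in place of the explicit loop (it short-circuits the same way).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class RuntimePatterns {
    private final List<Pattern> rxs = new ArrayList<>();

    // Compile each regex once, at load time, rather than on every match.
    RuntimePatterns(List<String> regexes) {
        for (String regex : regexes) {
            rxs.add(Pattern.compile(regex));
        }
    }

    // anyMatch stops at the first pattern that matches the whole input.
    boolean matchesAny(String input) {
        return rxs.stream().anyMatch(rx -> rx.matcher(input).matches());
    }

    public static void main(String[] args) {
        RuntimePatterns p = new RuntimePatterns(
            List.of("[a-zA-Z]{3}", "^[^\\d].*", "([\\w&&[^b]])*"));
        System.out.println(p.matchesAny("ABCD")); // true
        System.out.println(p.matchesAny("1-"));   // false
    }
}
```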

Answer by NeplatnyUdaj

I'm not sure what "effectively" means here, but if it's about performance and you want to check a lot of strings, I'd go for this:

...
static final Pattern p1 = Pattern.compile("[a-zA-Z]{3}");
static final Pattern p2 = Pattern.compile("^[^\\d].*");      // "\d" must be written "\\d" in Java
static final Pattern p3 = Pattern.compile("([\\w&&[^b]])*");

public static boolean test(String s) {
    // matches() needs parentheses; || stops at the first pattern that matches
    return p1.matcher(s).matches()
        || p2.matcher(s).matches()
        || p3.matcher(s).matches();
}

I'm not sure how it would affect performance, but combining them all into one regex with | could also help.

Answer by Pshemo

To avoid recreating instances of the Pattern and Matcher classes, you can create one of each per regex and reuse them. To reuse a Matcher, call its reset(newInput) method. Warning: this approach is not thread-safe. Use it only when you can guarantee that only one thread will ever call this method; otherwise create a separate Matcher instance for each method call.

Here is one possible code example:

private static Matcher m1 = Pattern.compile("regex1").matcher("");
private static Matcher m2 = Pattern.compile("regex2").matcher("");
private static Matcher m3 = Pattern.compile("regex3").matcher("");

public boolean matchesAtLeastOneRegex(String input) {
    return     m1.reset(input).matches() 
            || m2.reset(input).matches()
            || m3.reset(input).matches();
}
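If the method has to be callable from several threads, one way to keep the Matcher reuse while avoiding the race is to give each thread its own Matcher through a ThreadLocal. This is a sketch that goes beyond the answer: the question's patterns stand in for regex1 to regex3, and the class name ThreadSafeMatchers is hypothetical. Whether it is actually faster than calling pattern.matcher(input) on every call depends on the workload, so it is worth measuring.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreadSafeMatchers {
    private static final Pattern P1 = Pattern.compile("[a-zA-Z]{3}");
    private static final Pattern P2 = Pattern.compile("^[^\\d].*");
    private static final Pattern P3 = Pattern.compile("([\\w&&[^b]])*");

    // Each thread lazily creates its own Matcher per pattern, so reset() is never shared.
    private static final ThreadLocal<Matcher> M1 = ThreadLocal.withInitial(() -> P1.matcher(""));
    private static final ThreadLocal<Matcher> M2 = ThreadLocal.withInitial(() -> P2.matcher(""));
    private static final ThreadLocal<Matcher> M3 = ThreadLocal.withInitial(() -> P3.matcher(""));

    static boolean matchesAtLeastOneRegex(String input) {
        return M1.get().reset(input).matches()
            || M2.get().reset(input).matches()
            || M3.get().reset(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesAtLeastOneRegex("ABCD")); // true
    }
}
```

One trade-off of this design: each thread's Matcher keeps a reference to the last input it was reset with until the next call.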

Answer by vandale

You can build one large regex out of the individual ones:

[a-zA-Z]{3}|^[^\d].*|([\w&&[^b]])*
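A minimal sketch of using that combined regex from Java (the class name CombinedRegex is hypothetical). It relies on the fact that matches() must consume the entire input, so when an earlier alternative cannot match the whole string the engine backtracks into the next one.

```java
import java.util.regex.Pattern;

class CombinedRegex {
    // The three alternatives joined with |; backslashes are doubled for the Java string literal.
    private static final Pattern COMBINED =
        Pattern.compile("[a-zA-Z]{3}|^[^\\d].*|([\\w&&[^b]])*");

    // A single matcher run per input string.
    static boolean matchesAny(String s) {
        return COMBINED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesAny("ABCD")); // true: the first alternative fails, the second matches
        System.out.println(matchesAny("1b"));   // false
    }
}
```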

Answer by NobodyReally

Here's an alternative. Note that it does not report the matches in any specific order (HashMap iteration order is unspecified), but you could sort them, for example by m.start().

private static final Map<String, String> regs = new HashMap<>();

...

...

    regs.put("COMMA", ",");
    regs.put("ID", "[a-z][a-zA-Z0-9]*");
    regs.put("SEMI", ";");
    regs.put("GETS", ":=");
    regs.put("DOT", "\\.");   // "\." is not a valid Java string escape

    String input = "program var a, b, c; begin a := 0; end.";
    for (Map.Entry<String, String> entry : regs.entrySet()) {
        String key = entry.getKey();
        Matcher m = Pattern.compile(entry.getValue()).matcher(input);
        // print every occurrence of this token type: its name, then text, start and end offsets
        while (m.find()) {
            System.out.println(key);
            System.out.println(m.group() + " " + m.start() + " " + m.end());
        }
    }
}
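The sorting mentioned above could look roughly like this: collect every match with its start offset, then sort by m.start() so the tokens come out in input order no matter how the HashMap iterates. The class SortedTokens, its Token holder and the tokenize method are hypothetical names for this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SortedTokens {
    // One match: which regex found it, the matched text and its start offset.
    static final class Token {
        final String type;
        final String text;
        final int start;

        Token(String type, String text, int start) {
            this.type = type;
            this.text = text;
            this.start = start;
        }
    }

    static List<Token> tokenize(Map<String, String> regs, String input) {
        List<Token> tokens = new ArrayList<>();
        for (Map.Entry<String, String> entry : regs.entrySet()) {
            Matcher m = Pattern.compile(entry.getValue()).matcher(input);
            while (m.find()) {
                tokens.add(new Token(entry.getKey(), m.group(), m.start()));
            }
        }
        // Restore input order regardless of the map's iteration order.
        tokens.sort(Comparator.comparingInt((Token t) -> t.start));
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, String> regs = new HashMap<>();
        regs.put("COMMA", ",");
        regs.put("ID", "[a-z][a-zA-Z0-9]*");
        regs.put("SEMI", ";");
        regs.put("GETS", ":=");
        regs.put("DOT", "\\.");
        for (Token t : tokenize(regs, "program var a, b, c; begin a := 0; end.")) {
            System.out.println(t.type + " " + t.text + " " + t.start);
        }
    }
}
```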

Answer by SkateScout

As explained in the question "Running multiple regex patterns on String", it is better to concatenate the individual regexes into one large regex and then run the matcher only once. This is a large improvement if you reuse the regex often.
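When the regexes are only known at runtime, the concatenation can be done in code. This sketch (class name JoinedRegex is hypothetical) wraps each regex in a non-capturing group (?:...) as a defensive choice, so that inline flags such as (?i) stay scoped to their own part, joins the parts with |, and compiles the result once. Whether it is actually faster than several separate matches depends on the patterns, so it is worth measuring.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class JoinedRegex {
    private final Pattern combined;

    // Wrap each regex in (?:...) and join them with | into one alternation.
    JoinedRegex(List<String> regexes) {
        String joined = regexes.stream()
            .map(r -> "(?:" + r + ")")
            .collect(Collectors.joining("|"));
        this.combined = Pattern.compile(joined);
    }

    // One compiled pattern, one matcher run per input.
    boolean matchesAny(String input) {
        return combined.matcher(input).matches();
    }

    public static void main(String[] args) {
        JoinedRegex rx = new JoinedRegex(
            List.of("[a-zA-Z]{3}", "^[^\\d].*", "([\\w&&[^b]])*"));
        System.out.println(rx.matchesAny("ABCD")); // true
        System.out.println(rx.matchesAny("1-"));   // false
    }
}
```

If the individual regexes contain capturing groups, their group numbers (and any numbered backreferences such as \1) shift in the joined pattern, so those need adjusting.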
