Note: this page is based on a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original address: http://stackoverflow.com/questions/42988414/

Java how to check multiple regex patterns against an input?

Tags: java, regex

Asked by SuperCow

(If I'm heading in completely the wrong direction, please let me know if there is a better way to approach this.)

I have a Java program with multiple patterns that I want to compare against an input. If one of the patterns matches, I want to save the matched value in a String. I can get it to work with a single pattern, but I'd like to be able to check against many.

Right now I have this to check if an input matches one pattern:

// In a Java string literal the regex \w has to be written as \\w
Pattern pattern = Pattern.compile("TST\\w{1,}");
Matcher match = pattern.matcher(input);
String ID = match.find() ? match.group() : null;

So, if the input was TST1234 or abcTST1234 then ID = "TST1234"

I want to have multiple patterns like:

Pattern pattern = Pattern.compile("TST\\w{1,}");
Pattern pattern2 = Pattern.compile("TWT\\w{1,}");
...

and then add them to a collection and check each one against the input:

List<Pattern> rxs = new ArrayList<Pattern>();
rxs.add(pattern);
rxs.add(pattern2);

String ID = null;

for (Pattern rx : rxs) {
    if (rx.matcher(requestEnt).matches()){
        ID = //???
    }
}

I'm not sure how to set ID to what I want. I've tried

ID = rx.matcher(requestEnt).group();

and

ID = rx.matcher(requestEnt).find()?rx.matcher(requestEnt).group():null;

Not really sure how to make this work or where to go from here though. Any help or suggestions are appreciated. Thanks.

EDIT: Yes, the patterns will change over time, so the pattern list will grow.

I just need to get the string of the match, i.e. if the input is abcTWT123 it will first check against "TST\w{1,}", then move on to "TWT\w{1,}", and since that matches, the ID String will be set to "TWT123".

Answered by sprinter

To collect the matched string in the result you may need to create a group in your regexp if you are matching less than the entire string:

List<Pattern> patterns = new ArrayList<>();
patterns.add(Pattern.compile("(TST\\w+)"));
...

Optional<String> result = Optional.empty();
for (Pattern pattern : patterns) {
    Matcher matcher = pattern.matcher(input);
    // matches() succeeds only if the entire input matches the pattern
    if (matcher.matches()) {
        result = Optional.of(matcher.group(1));
        break;
    }
}

Or, if you are familiar with streams:

Optional<String> result = patterns.stream()
    .map(p -> p.matcher(input)).filter(Matcher::matches)
    .map(m -> m.group(1)).findFirst();

The alternative is to use find() (as in @Raffaele's answer), which makes the matched substring available as group 0 without any explicit parentheses in the pattern.
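
For illustration, here is a minimal self-contained sketch of the stream approach using find() rather than matches(), so that an input such as abcTWT123 is accepted. The class name FirstMatchDemo, the method firstMatch and the two-entry pattern list are assumptions made for this example, not part of the original answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstMatchDemo {
    // Hypothetical pattern list; in the question this list grows over time.
    static final List<Pattern> PATTERNS = Arrays.asList(
            Pattern.compile("TST\\w+"),
            Pattern.compile("TWT\\w+"));

    // Returns the substring matched by the first pattern (in list order) that matches.
    static Optional<String> firstMatch(String input) {
        return PATTERNS.stream()
                .map(p -> p.matcher(input))
                .filter(Matcher::find)   // find() accepts a match anywhere in the input
                .map(m -> m.group())     // group() is group 0, the whole match
                .findFirst();
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("abcTWT123").orElse(null)); // TWT123
        System.out.println(firstMatch("nothing here").isPresent()); // false
    }
}
```

Because streams are lazy, findFirst() stops at the first pattern that matches, which mirrors the break in the loop version.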

Another alternative you may want to consider is to put all your matches into a single pattern.

Pattern pattern = Pattern.compile("(TST\\w+|TWT\\w+|...)");

Then you can match and group in a single operation. However, this might make it harder to change the patterns over time.

Group 1 is the first matched group (i.e. the match inside the first set of parentheses). Group 0 is the entire match. So if you want the entire match (I wasn't sure from your question) then you could perhaps use group 0.
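
To make the difference between group 0 and group 1 concrete, the following sketch uses a hypothetical pattern TST(\w+), where the parentheses capture only the part after the prefix. The class and method names are assumptions chosen for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns {group 0, group 1} for the first match, or null if nothing matches.
    static String[] groups(String input) {
        // The parentheses capture only the word characters after "TST".
        Matcher m = Pattern.compile("TST(\\w+)").matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String[] g = groups("abcTST1234");
        System.out.println(g[0]); // TST1234 (the entire match)
        System.out.println(g[1]); // 1234 (the first parenthesized group)
    }
}
```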

Answered by Raffaele

Maybe you just need to end the loop when the first pattern matches:

// TST\w{1,}
// TWT\w{1,}
private List<Pattern> patterns;

public String findIdOrNull(String input) {
  for (Pattern p : patterns) {
    Matcher m = p.matcher(input);
    // First match. If the whole string must match use .matches()
    if (m.find()) {
      return m.group(0);
    }
  }
  return null; // Or throw an Exception if this should never happen
}

Answered by Bohemian

Use an alternation | (a regex OR):

Pattern pattern = Pattern.compile("TST\\w+|TWT\\w+|etc");

Then just check the pattern once.

Note also that {1,} can be replaced with +.
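
As a rough check of both points, this sketch compiles the alternation once with {1,} and once with + and runs the same inputs through each. The class AlternationDemo and its helper firstMatch are hypothetical names, and only the two prefixes from the question are included.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // The same alternation written with {1,} and with the equivalent +.
    static final Pattern WITH_BRACES = Pattern.compile("TST\\w{1,}|TWT\\w{1,}");
    static final Pattern WITH_PLUS = Pattern.compile("TST\\w+|TWT\\w+");

    // Returns the first matched substring, or null if the pattern does not occur.
    static String firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(WITH_BRACES, "abcTWT123")); // TWT123
        System.out.println(firstMatch(WITH_PLUS, "abcTWT123"));   // TWT123
        System.out.println(firstMatch(WITH_PLUS, "TST"));         // null, nothing follows the prefix
    }
}
```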

Answered by Stephen P

If your patterns are all going to be simple prefixes like your examples TST and TWT, you can define all of those at once and use regex alternation | so you won't need to loop over the patterns.

An example:

    String prefixes = "TWT|TST|WHW";
    String regex = "(" + prefixes + ")\\w+";
    Pattern pattern = Pattern.compile(regex);

    String input = "abcTST123";
    Matcher match = pattern.matcher(input);
    String ID = match.find() ? match.group() : null;

    // given this, ID will come out as "TST123"

Now prefixes could be read in from a Java .properties file or a simple text file, or passed as a parameter to the method that does this.
You could also define the prefixes as a comma-separated list, or one per line in a file, and then process that into the form one|two|three|etc before passing it on.
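
One possible way to turn a comma-separated list into the alternation is sketched below. The names PrefixListDemo, fromCommaSeparated and findId are assumptions for this example. Quoting each prefix with Pattern.quote is an addition that treats the prefixes as literal text, and the sketch does not handle an empty list.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class PrefixListDemo {
    // Turns "TWT, TST, WHW" into a pattern equivalent to (TWT|TST|WHW)\w+
    static Pattern fromCommaSeparated(String csv) {
        String alternation = Arrays.stream(csv.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(Pattern::quote)            // treat each prefix as literal text
                .collect(Collectors.joining("|"));
        return Pattern.compile("(?:" + alternation + ")\\w+");
    }

    // Returns the first matched substring, or null if no prefix occurs.
    static String findId(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        Pattern pattern = fromCommaSeparated("TWT, TST, WHW");
        System.out.println(findId(pattern, "abcTST123")); // TST123
    }
}
```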

You may be looping over several inputs, in which case you would want to create the regex and pattern variables only once, creating only the Matcher for each separate input.
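
A small sketch of that reuse, assuming the three prefixes from the example above: the Pattern is compiled once as a constant (Pattern instances are immutable and safe to share), and only a new Matcher is created per input. The class ReusePatternDemo and the method findIds are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReusePatternDemo {
    // Compiled once and shared across all inputs.
    static final Pattern ID_PATTERN = Pattern.compile("(?:TWT|TST|WHW)\\w+");

    // Returns one entry per input: the matched ID, or null when nothing matches.
    static List<String> findIds(List<String> inputs) {
        List<String> ids = new ArrayList<>();
        for (String input : inputs) {
            Matcher m = ID_PATTERN.matcher(input); // a new Matcher for each input
            ids.add(m.find() ? m.group() : null);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(findIds(Arrays.asList("abcTST123", "xyz", "WHW9")));
        // [TST123, null, WHW9]
    }
}
```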