Regex optional group capturing in Java

Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/21267929/

Time: 2020-08-13 07:47:08  Source: igfitidea

java regex

Asked by rad07

I have a pattern where a user specifies:

1998-2010:Make:model:trim:engine

trim and engine are optional. If present, I should capture them; if not, the matcher should at least validate YMM (year, make, model).

([0-9]+-*[0-9]+):(.*):(.*):(.*):(.*)

This matches if all five fields are there, but how do I make the last two, and only those two, fields optional?

Accepted answer by Joshua Taylor

Using a regular expression and ?, the “zero or one quantifier”

You can use ? to match zero or one of something, which is what you want to do with the last two fields. However, your pattern needs a bit of modification to use [^:]* rather than .*, because .* can also match colons and so lets a single group swallow several fields. Some sample code and its output follow. The regular expression I ended up with was:

([^:]*):([^:]*):([^:]*)(?::([^:]*))?(?::([^:]*))?
|-----| |-----| |-----|    |-----|      |-----|
   a       a       a          a            a

                       |-----------||-----------|
                             b            b

Each a matches a sequence of non-colon characters (although you'd want to modify the first one to match years), and each b is a non-capturing group (so it starts with ?:) that matches zero or one time (because it has the final ? quantifier). This means that the fourth and fifth fields are optional. The sample code shows that this pattern matches when three, four, or five fields are present, and does not match if there are more than five fields or fewer than three.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuestionMarkQuantifier {
    public static void main(String[] args) {
        final String input = "a:b:c:d:e:f:g:h";
        final Pattern p = Pattern.compile( "([^:]*):([^:]*):([^:]*)(?::([^:]*))?(?::([^:]*))?" );
        for ( int i = 1; i <= input.length(); i += 2 ) {
            final String string = input.substring( 0, i );
            final Matcher m = p.matcher( string );
            if ( m.matches() ) {
                System.out.println( "\n=== Matches for: "+string+" ===" );
                final int count = m.groupCount();
                for ( int j = 0; j <= count; j++ ) {
                    System.out.println( j + ": "+ m.group( j ));
                }
            }
            else {
                System.out.println( "\n=== No matches for: "+string+" ===" );
            }
        }
    }
}

Output:

=== No matches for: a ===

=== No matches for: a:b ===

=== Matches for: a:b:c ===
0: a:b:c
1: a
2: b
3: c
4: null
5: null

=== Matches for: a:b:c:d ===
0: a:b:c:d
1: a
2: b
3: c
4: d
5: null

=== Matches for: a:b:c:d:e ===
0: a:b:c:d:e
1: a
2: b
3: c
4: d
5: e

=== No matches for: a:b:c:d:e:f ===

=== No matches for: a:b:c:d:e:f:g ===

=== No matches for: a:b:c:d:e:f:g:h ===
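As a sketch of how the pattern above might be adapted to the question's actual input, the version below tightens the first field to a year or year range and requires make and model to be non-empty. The field rules (four-digit years, non-empty make and model), the class name VehicleSpecRegex, and the named groups are assumptions made for illustration, not something the question specifies.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VehicleSpecRegex {
    // Assumed field rules (not from the question): a 4-digit year or year range,
    // non-empty make and model, then optional trim and engine.
    static final Pattern SPEC = Pattern.compile(
            "(?<years>\\d{4}(?:-\\d{4})?)"   // required: year or year range
            + ":(?<make>[^:]+)"              // required
            + ":(?<model>[^:]+)"             // required
            + "(?::(?<trim>[^:]*))?"         // optional; the colon lives inside the group
            + "(?::(?<engine>[^:]*))?");     // optional; filled only when a fifth field exists

    // Returns {years, make, model, trim, engine}; absent optional fields are null.
    // Returns null when the input does not match at all.
    static String[] parse(String input) {
        Matcher m = SPEC.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            m.group("years"), m.group("make"), m.group("model"),
            m.group("trim"), m.group("engine")
        };
    }

    public static void main(String[] args) {
        String[] samples = {
            "1998-2010:Make:model:trim:engine",
            "1998-2010:Make:model:trim",
            "1998-2010:Make:model",
            "1998-2010:Make"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + Arrays.toString(parse(s)));
        }
    }
}
```

Because matches() must consume the whole input, a sixth field makes the match fail rather than being silently ignored.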

While it's certainly possible to match this kind of string by using a regular expression, it does seem like it might be easier to just split the string on : and check how many values you get back. Splitting doesn't do other kinds of checking (e.g., of the characters in each field), so it may be less useful in whatever non-minimal situation is motivating this.

Using String.split and a limit parameter

I noticed your comment on another post that recommended using String.split(String) (emphasis added):

Yes I know this function, but it work for me cause I have a string which is a:b:c:d:e:f:g:h.. but I just want to group the data as a:b:c:d:e if any as one and the rest of the string as another group

It's worth noting that there's a version of split that takes one more parameter, String.split(String,int). The second parameter is a limit, described as:

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

This means that you could use split with the limit 6 to get up to five fields from your input, and you'd have the remaining input as the last string. You'd still have to check whether you had at least 3 elements, to make sure that there was enough input, but all in all, this seems like it might be a bit simpler.

import java.util.Arrays;

public class QuestionMarkQuantifier {
    public static void main(String[] args) {
        final String input = "a:b:c:d:e:f:g:h";
        for ( int i = 1; i <= input.length(); i += 2 ) {
            final String string = input.substring( 0, i );
            System.out.println( "\n== Splits for "+string+" ===" );
            System.out.println( Arrays.toString( string.split( ":", 6 )));
        }
    }
}

Output:

== Splits for a ===
[a]

== Splits for a:b ===
[a, b]

== Splits for a:b:c ===
[a, b, c]

== Splits for a:b:c:d ===
[a, b, c, d]

== Splits for a:b:c:d:e ===
[a, b, c, d, e]

== Splits for a:b:c:d:e:f ===
[a, b, c, d, e, f]

== Splits for a:b:c:d:e:f:g ===
[a, b, c, d, e, f:g]

== Splits for a:b:c:d:e:f:g:h ===
[a, b, c, d, e, f:g:h]
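The limit-based split can be wrapped with the "at least 3 elements" check described above. The following is a minimal sketch; the class name VehicleSpecSplit and the helper names fields and remainder are made up for illustration, and returning null for too-short input is just one possible convention.

```java
import java.util.Arrays;

class VehicleSpecSplit {
    static final int MIN_FIELDS = 3;  // year, make, model
    static final int MAX_FIELDS = 5;  // plus optional trim and engine

    // With limit 6 the delimiter is applied at most five times, so a sixth
    // element (if any) holds the unsplit remainder of the input.
    private static String[] splitWithRemainder(String input) {
        return input.split(":", MAX_FIELDS + 1);
    }

    // Returns the first three to five fields, or null if fewer than three are present.
    static String[] fields(String input) {
        String[] parts = splitWithRemainder(input);
        if (parts.length < MIN_FIELDS) {
            return null;
        }
        return Arrays.copyOf(parts, Math.min(parts.length, MAX_FIELDS));
    }

    // Returns everything after the fifth field, or null if nothing is left over.
    static String remainder(String input) {
        String[] parts = splitWithRemainder(input);
        return parts.length > MAX_FIELDS ? parts[MAX_FIELDS] : null;
    }

    public static void main(String[] args) {
        String input = "a:b:c:d:e:f:g:h";
        System.out.println(Arrays.toString(fields(input)));
        System.out.println(remainder(input));
    }
}
```

Note that a positive limit keeps trailing empty strings, so an input such as a:b:c: yields a fourth, empty field here.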

Answer by kmera

Why not skip the regex and use split(":")? It seems straightforward. From the length of the resulting array you will then know whether trim and engine were provided.

String str = "1998-2010:Make:model:trim:engine";
String[] parts  = str.split(":");
//parts[0] == Y
//parts[1] == M
//parts[2] == M
//etc
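To illustrate the length check, here is a small sketch that reads the optional fields only when the array is long enough. The class name SplitLengthCheck, the describe method, and the "invalid" return value are hypothetical choices for this example.

```java
class SplitLengthCheck {
    // Returns a summary of the fields, or "invalid" when the field count is not 3 to 5.
    static String describe(String spec) {
        // split(":") without a limit drops trailing empty strings,
        // so "Y:M:M::" yields three parts, not five.
        String[] parts = spec.split(":");
        if (parts.length < 3 || parts.length > 5) {
            return "invalid";
        }
        // Optional fields exist only if the array is long enough.
        String trim = parts.length > 3 ? parts[3] : null;
        String engine = parts.length > 4 ? parts[4] : null;
        return "years=" + parts[0] + " make=" + parts[1] + " model=" + parts[2]
                + " trim=" + trim + " engine=" + engine;
    }

    public static void main(String[] args) {
        System.out.println(describe("1998-2010:Make:model:trim:engine"));
        System.out.println(describe("1998-2010:Make:model"));
        System.out.println(describe("1998-2010:Make"));
    }
}
```

This checks only how many fields there are; validating the contents of each field (for example, the year range) would need an extra step.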

Edit: As others have mentioned, String.split uses a regex pattern too. In my opinion that doesn't really matter though. To have a truly regex-less solution, use StringUtils.split from Apache Commons (which does not use a regex at all) :)
