Attribution: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. If you reuse it, you must follow the same license, cite the original URL and authors, and attribute the content to the original authors. Original: http://stackoverflow.com/questions/45175606/

Parse string using Java Regex Pattern?

Tags: java, regex, string, pattern-matching

Asked by Bharath Reddy

I have a Java string in the following format:

String s = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] [name:KTY][distance:3540] Price:";

Using the Matcher and Pattern classes from the java.util.regex package, I have to produce an output string in the following format:

Output: [NYK:1100][CLT:2300][KTY:3540]

Can you suggest a regex pattern that would help me get the above output format?

Answered by YCF_L

You can use the regex \[name:([A-Z]+)\]\[distance:(\d+)\] with Pattern like this:

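import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Backslashes must be doubled inside a Java string literal.
String regex = "\\[name:([A-Z]+)\\]\\[distance:(\\d+)\\]";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);

StringBuilder result = new StringBuilder();
// Each call to find() moves to the next [name:...][distance:...] pair.
while (matcher.find()) {
    result.append("[");
    result.append(matcher.group(1)); // the name, e.g. NYK
    result.append(":");
    result.append(matcher.group(2)); // the distance, e.g. 1100
    result.append("]");
}

System.out.println(result.toString());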

Output


[NYK:1100][CLT:2300][KTY:3540]
  • regex demo
  • \[name:([A-Z]+)\]\[distance:(\d+)\] captures two groups: the first holds the uppercase letters after name: and the second holds the digits after distance:.
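
To see what the two capturing groups hold on each match, here is a minimal self-contained sketch. The class name GroupInspection and the helper describe are illustrative only and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupInspection {
    // Same regex as in the answer; backslashes are doubled for the Java string literal.
    private static final Pattern PAIR =
            Pattern.compile("\\[name:([A-Z]+)\\]\\[distance:(\\d+)\\]");

    // Hypothetical helper: one entry per match, showing the whole match and both groups.
    static List<String> describe(String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            // group(0) is the whole match, group(1) the name, group(2) the distance.
            lines.add(m.group(0) + " -> " + m.group(1) + ", " + m.group(2));
        }
        return lines;
    }

    public static void main(String[] args) {
        String s = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] [name:KTY][distance:3540] Price:";
        describe(s).forEach(System.out::println);
    }
}
```

Text outside the bracket pairs, such as "City:" and "Price:", is skipped because find() only stops where the whole pattern matches.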


Another solution, from @tradeJmark: you can use named groups in the regex:

String regex = "\\[name:(?<name>[A-Z]+)\\]\\[distance:(?<distance>\\d+)\\]";

This lets you read each group by its name instead of its index, like this:

while (matcher.find()) {                                                
    result.append("[");
    result.append(matcher.group("name"));
    //----------------------------^^
    result.append(":");
    result.append(matcher.group("distance"));
    //------------------------------^^
    result.append("]");
}
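
Since the named-group regex and the loop are shown separately, the following sketch puts them together in one runnable class. The class name NamedGroupDemo and the method format are assumptions made for illustration; named groups require Java 7 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // Named-group variant of the regex; (?<name>...) and (?<distance>...) label the captures.
    private static final Pattern PAIR = Pattern.compile(
            "\\[name:(?<name>[A-Z]+)\\]\\[distance:(?<distance>\\d+)\\]");

    // Hypothetical helper: rewrites every pair as [NAME:DISTANCE] and joins the results.
    static String format(String input) {
        Matcher matcher = PAIR.matcher(input);
        StringBuilder result = new StringBuilder();
        while (matcher.find()) {
            result.append('[')
                  .append(matcher.group("name"))
                  .append(':')
                  .append(matcher.group("distance"))
                  .append(']');
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String s = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] [name:KTY][distance:3540] Price:";
        System.out.println(format(s));
    }
}
```

Reading groups by name keeps the loop correct even if another capturing group is later added to the pattern, which would shift the numeric indexes.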

Answered by Wiktor Stribiżew

If the format of the string is fixed, and you always have just 3 [...] groups inside to deal with, you may define a block that matches one [name:...][distance:...] pair and captures its 2 parts into separate groups, then use fairly simple code with .replaceAll:

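String s = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] [name:KTY][distance:3540] Price:";
// One block matches a single [name:...][distance:...] pair; backslashes are doubled for Java.
String matchingBlock = "\\s*\\[name:([A-Z]+)]\\[distance:(\\d+)]";
// The block is repeated three times, so the full pattern has capturing groups 1 to 6.
String res = s.replaceAll(String.format(".*%1$s%1$s%1$s.*", matchingBlock),
    "[$1:$2][$3:$4][$5:$6]");
System.out.println(res); // [NYK:1100][CLT:2300][KTY:3540]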

See the Java demo and a regex demo.

The block pattern matches:


  • \\s* - 0+ whitespaces
  • \\[name: - a literal [name: substring
  • ([A-Z]+) - Group n, capturing 1 or more uppercase ASCII letters (\\w+ can also be used)
  • ]\\[distance: - a literal ][distance: substring
  • (\\d+) - Group m, capturing 1 or more digits
  • ] - a literal ] symbol.

In the .*%1$s%1$s%1$s.* pattern, the groups get IDs 1 to 6 (referred to with the $1 - $6 backreferences in the replacement pattern), and the leading and trailing .* remove the start and end of the string (add (?s) at the start of the pattern if the string can contain line breaks).
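
The effect of (?s) can be sketched as follows. The class DotAllDemo, the method extract, and the sample input with Header and Footer lines are all hypothetical and serve only to illustrate the remark about line breaks.

```java
class DotAllDemo {
    // The block from the answer: one [name:...][distance:...] pair with optional leading whitespace.
    private static final String BLOCK = "\\s*\\[name:([A-Z]+)]\\[distance:(\\d+)]";

    // Hypothetical helper: builds the three-block pattern, optionally prefixed with (?s),
    // and rewrites the matched text as [$1:$2][$3:$4][$5:$6].
    static String extract(String input, boolean dotAll) {
        String pattern = (dotAll ? "(?s)" : "") + String.format(".*%1$s%1$s%1$s.*", BLOCK);
        return input.replaceAll(pattern, "[$1:$2][$3:$4][$5:$6]");
    }

    public static void main(String[] args) {
        String multiLine = "Header\n"
                + "City: [name:NYK][distance:1100] [name:CLT][distance:2300] [name:KTY][distance:3540] Price:\n"
                + "Footer";
        // Without (?s), the dot stops at line breaks, so only the middle line is rewritten.
        System.out.println(extract(multiLine, false));
        // With (?s), the dot also matches line breaks, so the surrounding lines are consumed too.
        System.out.println(extract(multiLine, true));
    }
}
```

With this sample, the call without (?s) leaves the Header and Footer lines in place around the rewritten middle line, while the call with (?s) returns only the bracketed pairs.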