Java - Extract strings with Regex

Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/1224934/

Date: 2020-08-12 01:07:04  Source: igfitidea

java regex

Asked by mickthompson

I have this string:

String myString = "A~BC~FGH~~zuzy|XX~ 1234~ ~~ABC~01/01/2010 06:30~BCD~01/01/2011 07:45";

and I need to extract these 3 substrings:
1234
06:30
07:45

If I use the regex \\d{2}:\\d{2}, I am only able to extract the first time, 06:30.

Pattern depArrHours = Pattern.compile("\\d{2}:\\d{2}");
Matcher matcher = depArrHours.matcher(myString);
matcher.find(); // a match attempt is required before group() can be called
String firstHour = matcher.group(0);
String secondHour = matcher.group(1); // throws IndexOutOfBoundsException: No group 1

matcher.group(1) throws an exception.
Also, I don't know how to extract 1234. This value can change, but it always comes after 'XX~ '.
Do you have any idea how to match these strings with regular expressions?

UPDATE

Thanks to Adam's suggestion, I now have this regex that matches my string:

Pattern p = Pattern.compile(".*XX~ (\\d{3,4}).*(\\d{1,2}:\\d{2}).*(\\d{1,2}:\\d{2})");

I get the number and the two times with matcher.group(1), matcher.group(2) and matcher.group(3).

Accepted answer by Adam Batkin

The matcher.group() method takes a single integer argument: the capturing group index, starting from 1. Index 0 is special and means "the entire match". A capturing group is created with a pair of parentheses "(...)". Anything matched inside the parentheses is captured. Groups are numbered from left to right (again, starting from 1) by their opening parenthesis, which means that groups can be nested. Since there are no parentheses in your regular expression, there can be no group 1.
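
To make the numbering concrete, here is a minimal sketch. The class name GroupNumberingDemo and the helper groupsOf are made up for illustration and are not part of any API. It collects every group of the first match, so you can see that group 0 is the whole match and the others are numbered by their opening parenthesis.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Hypothetical helper: returns groups 0..groupCount() of the first match,
    // or an empty array when the regex does not match anywhere in the input.
    static String[] groupsOf(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Group 1 is the outer pair; groups 2 and 3 are nested inside it.
        String[] g = groupsOf("((\\d{2}):(\\d{2}))", "at 06:30 sharp");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group " + i + " = " + g[i]);
        }
    }
}
```

With that input, groups 0 and 1 should both be 06:30, group 2 should be 06 and group 3 should be 30. A pattern without parentheses has a groupCount() of 0, so only group 0 is available.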

The javadoc for the Pattern class covers the regular expression syntax.

If you are looking for a pattern that might recur some number of times, you can call Matcher.find() repeatedly until it returns false. Calling Matcher.group(0) once on each iteration then returns what matched that time.
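
As a rough sketch of that loop, applied to the string from the question (the class name FindAllTimesDemo and the helper findAll are illustrative names, and the pattern assumes times are always zero-padded HH:mm):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllTimesDemo {
    // Hypothetical helper: collects group 0 of every successive match.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {            // each call moves on to the next match
            matches.add(m.group(0));  // group 0 is the text of the current match
        }
        return matches;
    }

    public static void main(String[] args) {
        String myString = "A~BC~FGH~~zuzy|XX~ 1234~ ~~ABC~01/01/2010 06:30~BCD~01/01/2011 07:45";
        System.out.println(findAll("\\d{2}:\\d{2}", myString));
    }
}
```

For the question's input this should print [06:30, 07:45]. No capturing parentheses are needed here, because each iteration only reads group 0.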

If you want to build one big regular expression that matches everything at once (which I believe is what you want), put a pair of capturing parentheses around each of the three things you want to capture, call Matcher.matches(), and then call Matcher.group(n) where n is 1, 2 and 3 respectively. Of course, Matcher.matches() might return false, in which case the pattern did not match and you can't retrieve any of the groups.
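
Here is one way that could look for the string in the question. This is only a sketch under assumptions: the number is taken to be 3 or 4 digits right after "XX~ ", and both times are taken to be zero-padded HH:mm. The class name SinglePatternDemo and the helper extract are made up for illustration. The sketch uses reluctant .*? and a fixed \d{2} for the hour, because a greedy .* placed directly before \d{1,2} can consume the first digit of a two-digit hour.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePatternDemo {
    // Assumed layout: "XX~ " + 3-4 digits, later two zero-padded HH:mm times,
    // with the second time at the very end of the line.
    private static final Pattern LINE = Pattern.compile(
            ".*XX~ (\\d{3,4}).*?(\\d{2}:\\d{2}).*?(\\d{2}:\\d{2})");

    // Hypothetical helper: returns {number, firstTime, secondTime},
    // or null when the whole line does not match the pattern.
    static String[] extract(String input) {
        Matcher m = LINE.matcher(input);
        if (!m.matches()) {
            return null; // groups cannot be read after a failed match
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String myString = "A~BC~FGH~~zuzy|XX~ 1234~ ~~ABC~01/01/2010 06:30~BCD~01/01/2011 07:45";
        String[] parts = extract(myString);
        if (parts != null) {
            System.out.println(parts[0] + " " + parts[1] + " " + parts[2]);
        }
    }
}
```

For the question's input this should print 1234 06:30 07:45. Because matches() requires the whole string to match, a line with trailing text after the second time would be rejected by this particular pattern.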

In your example, what you probably want is to match some preceding text, then start a capturing group, match the digits, end the capturing group, and so on. I don't know enough about your exact input format, but here is an example.

Let's say I had strings of the form:

Eat 12 carrots at 12:30
Take 3 pills at 01:15

And I wanted to extract the quantity and times. My regular expression would look something like:

"\\w+ (\\d+) [\\w ]+ (\\d{2}:\\d{2})"

The code would look something like:

Pattern p = Pattern.compile("\\w+ (\\d+) [\\w ]+ (\\d{2}:\\d{2})");
Matcher m = p.matcher(oneline); // oneline holds one input line, e.g. "Eat 12 carrots at 12:30"
if (m.matches()) {
    System.out.println("The quantity is " + m.group(1));
    System.out.println("The time is " + m.group(2));
}

The regular expression means "a string containing a word, a space, one or more digits (which are captured in group 1), a space, a set of words and spaces ending with a space, followed by a time (captured in group 2; the time assumes that the hour is always 0-padded to 2 digits)". I would give an example closer to what you are looking for, but the description of the possible input is a little vague.
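
For completeness, a small self-contained version of the example above that can be run against both sample lines. The class name QuantityTimeDemo and the helper describe are illustrative only. It also shows what happens when the hour is not 0-padded, in which case matches() returns false.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantityTimeDemo {
    // Same pattern as above: word, quantity (group 1), filler words, HH:mm (group 2).
    private static final Pattern LINE =
            Pattern.compile("\\w+ (\\d+) [\\w ]+ (\\d{2}:\\d{2})");

    // Hypothetical helper: returns "<quantity> at <time>", or "no match".
    static String describe(String oneline) {
        Matcher m = LINE.matcher(oneline);
        if (!m.matches()) {
            return "no match";
        }
        return m.group(1) + " at " + m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(describe("Eat 12 carrots at 12:30"));
        System.out.println(describe("Take 3 pills at 01:15"));
        System.out.println(describe("Take 3 pills at 1:15")); // hour not 0-padded
    }
}
```

The first two lines should give 12 at 12:30 and 3 at 01:15, and the third should give no match. Changing \\d{2} to \\d{1,2} for the hour would be one way to accept unpadded hours as well.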
