Using Regular Expressions to Extract a Value in Java
Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
Original: http://stackoverflow.com/questions/237061/
Asked by Craig Walker
I have several strings in the rough form:
[some text] [some number] [some more text]
I want to extract the text in [some number] using the Java Regex classes.
I know roughly what regular expression I want to use (though all suggestions are welcome). What I'm really interested in are the Java calls to take the regex string and use it on the source data to produce the value of [some number].
EDIT: I should add that I'm only interested in a single [some number] (basically, the first instance). The source strings are short and I'm not going to be looking for multiple occurrences of [some number].
Accepted answer by Allain Lalonde
Full example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExample {
    private static final Pattern p = Pattern.compile("^([a-zA-Z]+)([0-9]+)(.*)");

    public static void main(String[] args) {
        // create a matcher for pattern p and the given string
        Matcher m = p.matcher("Testing123Testing");
        // if an occurrence of the pattern was found in the given string...
        if (m.find()) {
            // ...then you can use the group() methods
            System.out.println(m.group(0)); // whole matched expression
            System.out.println(m.group(1)); // first expression in round brackets (Testing)
            System.out.println(m.group(2)); // second one (123)
            System.out.println(m.group(3)); // third one (Testing)
        }
    }
}
Since you're looking for the first number, you can use a regexp such as:
^\D+(\d+).*
and m.group(1) will return the first number. Note that signed numbers can contain a minus sign. Because \D also matches the minus sign, a greedy \D+ would consume it before the group could, so make the prefix reluctant:
^\D+?(-?\d+).*
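A minimal sketch of the signed-number pattern wrapped in a helper method. The class name FirstNumber and the method name firstNumber are hypothetical, not part of the original answer. It assumes the input starts with at least one non-digit character, and it uses the reluctant \D+? because \D also matches '-', so a greedy \D+ would swallow the sign before the capturing group sees it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstNumber {
    // ^\D+? lazily skips the leading non-digits; (-?\d+) captures an optionally signed number
    private static final Pattern FIRST_NUMBER = Pattern.compile("^\\D+?(-?\\d+).*");

    // Returns the first number in the input as a string, or null if the pattern does not match
    static String firstNumber(String input) {
        Matcher m = FIRST_NUMBER.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("Testing123Testing")); // 123
        System.out.println(firstNumber("offset -42 units"));  // -42
    }
}
```

Returning null for a non-match is only one possible design; an Optional or an exception would work equally well depending on the caller.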
Answer by Hyman Leow
In Java 1.4 and up:
String input = "...";
Matcher matcher = Pattern.compile("[^0-9]+([0-9]+)[^0-9]+").matcher(input);
if (matcher.find()) {
    String someNumberStr = matcher.group(1);
    // if you need this to be an int:
    int someNumberInt = Integer.parseInt(someNumberStr);
}
Answer by Axeman
Allain basically has the Java code, so you can use that. However, his expression only matches if your numbers are preceded only by a stream of word characters.
"(\\d+)"
should be able to find the first string of digits. You don't need to specify what's before it if you're sure that it's going to be the first string of digits. Likewise, there is no point in specifying what's after it, unless you want that. If you just want the number, and are sure that it will be the first string of one or more digits, then that's all you need.
If you expect the number to be set off by spaces, specifying them makes the match even more distinct, so
"\\s+(\\d+)\\s+"
might be better.
If you need all three parts, this will do:
"(\\D+)(\\d+)(.*)"
EDIT: The expressions given by Allain and Hyman suggest that you need to specify some subset of non-digits in order to capture digits. If you tell the regex engine you're looking for \d then it's going to ignore everything before the digits. If Hyman's or Allain's expression fits your pattern, then the whole match equals the input string, and there's no reason to specify it. It probably slows a clean match down, if it isn't totally ignored.
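To illustrate the point, here is a small sketch (the class name FindVersusMatches and method name firstDigits are hypothetical). It shows that Matcher.find() scans for the first run of digits without any description of the surrounding text, whereas Matcher.matches() requires the pattern to cover the entire input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindVersusMatches {
    private static final Pattern DIGITS = Pattern.compile("(\\d+)");

    // find() searches anywhere in the input and stops at the first match
    static String firstDigits(String input) {
        Matcher m = DIGITS.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstDigits("some text 123 some more text 456")); // 123
        // matches() must consume the whole input, so the bare digit pattern fails here
        System.out.println(DIGITS.matcher("some text 123").matches()); // false
        System.out.println(DIGITS.matcher("123").matches());           // true
    }
}
```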
Answer by arturo
How about
[^\\d]*([0-9]+[\\s]*[.,]{0,1}[\\s]*[0-9]*).*
I think it would take care of numbers with a fractional part. I included white space and included , as a possible separator. I'm trying to get the numbers out of a string, including floats, taking into account that the user might make a mistake and include white space while typing the number.
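One way this pattern might be used is sketched below. The class name LooseDecimal and method name parseFirst are hypothetical, and treating ',' as a decimal point (rather than a thousands separator) is an assumption made only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LooseDecimal {
    private static final Pattern NUMBER =
            Pattern.compile("[^\\d]*([0-9]+[\\s]*[.,]{0,1}[\\s]*[0-9]*).*");

    // Returns the first number as a Double, or null when the input has no digits
    static Double parseFirst(String input) {
        Matcher m = NUMBER.matcher(input);
        if (!m.find()) {
            return null;
        }
        // Drop stray whitespace and treat ',' as a decimal point (an assumption)
        String cleaned = m.group(1).replaceAll("\\s+", "").replace(',', '.');
        return Double.parseDouble(cleaned);
    }

    public static void main(String[] args) {
        System.out.println(parseFirst("Price: 12 , 5 EUR")); // 12.5
        System.out.println(parseFirst("weight 3.75kg"));     // 3.75
    }
}
```

Under this assumption an input such as "1,234" is read as 1.234, so the sketch is unsuitable where commas group thousands.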
Answer by Tint Naing Win
Try doing something like this:
// .+? is reluctant so the group captures all of "123"; a greedy .+ would leave only the last digit
Pattern p = Pattern.compile("^.+?(\\d+).+");
Matcher m = p.matcher("Testing123Testing");
if (m.find()) {
    System.out.println(m.group(1));
}
Answer by javaMan
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex1 {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\d+");
        Matcher m = p.matcher("hello1234goodboy789very2345");
        // each call to find() advances to the next run of digits
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}
Output:
1234
789
2345
Answer by Vitalii Fedorenko
Answer by shounak
You can also do it using StringTokenizer:
import java.util.StringTokenizer;

// Note: StringTokenizer treats each character of "as:" as a separate delimiter
String str = "as:" + 123 + "as:" + 234 + "as:" + 345;
StringTokenizer st = new StringTokenizer(str, "as:");
while (st.hasMoreTokens()) {
    String k = st.nextToken(); // first numeric token, i.e. 123
    int kk = Integer.parseInt(k);
    System.out.println("k string token in integer " + kk);
    String k1 = st.nextToken(); // second numeric token, i.e. 234
    int kk1 = Integer.parseInt(k1);
    System.out.println("new string k1 token in integer :" + kk1);
    String k2 = st.nextToken(); // third numeric token, i.e. 345
    int kk2 = Integer.parseInt(k2);
    System.out.println("k2 string token is in integer : " + kk2);
}
Since we store these numeric values in three different variables, we can use them anywhere later in the code.
Answer by LukaszTaraszka
This function collects all matching sequences from a string. In this example it takes all email addresses from a string.
static final String EMAIL_PATTERN = "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
        + "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";

// Returns every match in the message, or null when there is no match at all
public List<String> getAllEmails(String message) {
    List<String> result = null;
    Matcher matcher = Pattern.compile(EMAIL_PATTERN).matcher(message);
    if (matcher.find()) {
        result = new ArrayList<String>();
        result.add(matcher.group());
        while (matcher.find()) {
            result.add(matcher.group());
        }
    }
    return result;
}
For message = "[email protected], <[email protected]>>>> [email protected]" it will create a List of 3 elements.
Answer by user1722707
Sometimes you can use the simple .split("REGEXP") method available in java.lang.String. For example:
String input = "first,second,third";
String[] parts = input.split(",");
// To retrieve 'first'
String first = parts[0];
// second
String second = parts[1];
// third
String third = parts[2];
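Applied to the original question, a split-based approach could look like the sketch below (the class name SplitFirstNumber and method name firstNumber are hypothetical). It splits on runs of non-digits; when the input begins with non-digits, String.split returns an empty leading element, so the sketch skips empty pieces.

```java
public class SplitFirstNumber {
    // Splits on runs of non-digits and returns the first non-empty piece, or null if there is none
    static String firstNumber(String input) {
        for (String piece : input.split("\\D+")) {
            if (!piece.isEmpty()) {
                return piece;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("some text 123 some more text")); // 123
        System.out.println(firstNumber("42 is the answer"));             // 42
    }
}
```

Unlike the Matcher-based answers, this approach discards the sign and any decimal point, so it only suits plain runs of digits.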