Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/988655/
Can I replace groups in Java regex?
Asked by wokena
I have this code, and I want to know whether I can replace only the groups (not the whole pattern) in a Java regex. Code:
//...
Pattern p = Pattern.compile("(\\d).*(\\d)"); // backslashes must be doubled inside a Java string literal
String input = "6 example input 4";
Matcher m = p.matcher(input);
if (m.find()) {
    // Now I want to replace group one (the first (\d)) with number
    // and group two (the second (\d)) with 1, but I don't know how.
}
Accepted answer by Chadwick
Use $n (where n is a digit) to refer to captured subsequences in replaceFirst(...). I'm assuming you wanted to replace the first group with the literal string "number" and the second group with the value of the first group.
Pattern p = Pattern.compile("(\\d)(.*)(\\d)");
String input = "6 example input 4";
Matcher m = p.matcher(input);
if (m.find()) {
    // replace first number with "number" and second number with the first
    String output = m.replaceFirst("number$2$1"); // "number example input 6"
}
Consider (\D+) for the second group instead of (.*). * is a greedy quantifier and will at first consume everything up to the end of the input, including the last digit. The matcher then has to backtrack when it realizes the final (\d) has nothing to match, before it can match the final digit.
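To see how the two choices for the middle group behave, here is a small sketch. The class name GreedyVsNonDigit and the helper middle are made up for illustration, and the second input string is an assumed example with an extra digit between the two numbers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsNonDigit {
    // Hypothetical helper: text captured by group 2 of the first match, or null if nothing matches.
    static String middle(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String greedy = "(\\d)(.*)(\\d)";
        String nonDigit = "(\\d)(\\D+)(\\d)";
        // Same capture when only non-digits sit between the two numbers.
        System.out.println("[" + middle(greedy, "6 example input 4") + "]");
        System.out.println("[" + middle(nonDigit, "6 example input 4") + "]");
        // With a digit in between, (.*) spans to the last digit while (\D+) stops at the next one.
        System.out.println("[" + middle(greedy, "6 example 5 input 4") + "]");
        System.out.println("[" + middle(nonDigit, "6 example 5 input 4") + "]");
    }
}
```

So the two patterns are interchangeable only when no digits occur between the two numbers.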
Answer by mkb
Add a third group by adding parens around .*, then replace the subsequence with "number" + m.group(2) + "1". E.g.:
String output = m.replaceFirst("number" + m.group(2) + "1");
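One caveat when the replacement is built by concatenation: replaceFirst treats $ and \ in the replacement string specially, so captured text that contains them can throw or be misread. A hedged sketch using Matcher.quoteReplacement follows; the class and method names are illustrative and not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteReplacementSketch {
    // Replaces the first digit with "number" and the last digit with "1",
    // keeping the text in between exactly as it appeared in the input.
    static String replaceEnds(String input) {
        Matcher m = Pattern.compile("(\\d)(.*)(\\d)").matcher(input);
        if (!m.find()) {
            return input; // nothing to replace
        }
        // quoteReplacement escapes '$' and '\' so the captured text is inserted literally.
        String middle = Matcher.quoteReplacement(m.group(2));
        return m.replaceFirst("number" + middle + "1");
    }

    public static void main(String[] args) {
        System.out.println(replaceEnds("6 example input 4"));
        System.out.println(replaceEnds("6 costs $5 or 4"));
    }
}
```

Without the quoting step, the second input would make replaceFirst look for a group 5 that does not exist.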
Answer by ydanneg
You can use the matcher.start() and matcher.end() methods (both have overloads that take a group number) to get the group positions. Using these positions, you can easily replace any text.
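A minimal sketch of that idea applied to the question's input, assuming the goal is "number" for the first group and "1" for the second. The class and method names are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupPositionsSketch {
    static String replaceBothGroups(String input) {
        Matcher m = Pattern.compile("(\\d).*(\\d)").matcher(input);
        if (!m.find()) {
            return input; // pattern not found, leave the input unchanged
        }
        StringBuilder sb = new StringBuilder(input);
        // Replace the later group first so the earlier group's offsets stay valid.
        sb.replace(m.start(2), m.end(2), "1");
        sb.replace(m.start(1), m.end(1), "number");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceBothGroups("6 example input 4"));
    }
}
```

Working from right to left matters here because replacing "6" with the longer "number" would shift every offset after it.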
Answer by acdcjunior
You could use Matcher#start(group) and Matcher#end(group) to build a generic replacement method:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String replaceGroup(String regex, String source, int groupToReplace, String replacement) {
    return replaceGroup(regex, source, groupToReplace, 1, replacement);
}

public static String replaceGroup(String regex, String source, int groupToReplace, int groupOccurrence, String replacement) {
    Matcher m = Pattern.compile(regex).matcher(source);
    // advance to the requested occurrence of the pattern
    for (int i = 0; i < groupOccurrence; i++) {
        if (!m.find()) return source; // pattern not met, may also throw an exception here
    }
    return new StringBuilder(source).replace(m.start(groupToReplace), m.end(groupToReplace), replacement).toString();
}

public static void main(String[] args) {
    // replace with "%" what was matched by group 1
    // input: aaa123ccc
    // output: %123ccc
    System.out.println(replaceGroup("([a-z]+)([0-9]+)([a-z]+)", "aaa123ccc", 1, "%"));

    // replace with "!!!" what was matched the 4th time by the group 2
    // input: a1b2c3d4e5
    // output: a1b2c3d!!!e5
    System.out.println(replaceGroup("([a-z])(\\d)", "a1b2c3d4e5", 2, 4, "!!!"));
}
Answer by Jonas_Hess
Here is a different solution, which also allows replacing a single group in multiple matches. It uses stacks to reverse the execution order, so each replacement is applied from the end of the string backwards and never shifts the positions recorded for earlier matches.
import java.util.Stack;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static void demo() {
    final String sourceString = "hello world!";
    final String regex = "(hello) (world)(!)";
    final Pattern pattern = Pattern.compile(regex);
    String result = replaceTextOfMatchGroup(sourceString, pattern, 2, world -> world.toUpperCase());
    System.out.println(result); // output: hello WORLD!
}

public static String replaceTextOfMatchGroup(String sourceString, Pattern pattern, int groupToReplace, Function<String,String> replaceStrategy) {
    Stack<Integer> startPositions = new Stack<>();
    Stack<Integer> endPositions = new Stack<>();
    Matcher matcher = pattern.matcher(sourceString);
    // record the group's position in every match
    while (matcher.find()) {
        startPositions.push(matcher.start(groupToReplace));
        endPositions.push(matcher.end(groupToReplace));
    }
    StringBuilder sb = new StringBuilder(sourceString);
    // pop in reverse order so earlier positions are still valid
    while (!startPositions.isEmpty()) {
        int start = startPositions.pop();
        int end = endPositions.pop();
        // start and end are -1 when the group did not take part in the match
        if (start >= 0 && end >= 0) {
            sb.replace(start, end, replaceStrategy.apply(sourceString.substring(start, end)));
        }
    }
    return sb.toString();
}
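As an alternative to the stacks, the same result can probably be had by building the output from left to right and copying the untouched text between groups. This is a different technique from the answer above, sketched under the assumption that matches do not overlap (which find() guarantees); the class and method names are hypothetical.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeftToRightGroupReplace {
    static String replaceGroupInAllMatches(String source, Pattern pattern, int group,
                                           Function<String, String> replaceStrategy) {
        Matcher m = pattern.matcher(source);
        StringBuilder out = new StringBuilder();
        int last = 0; // end of the text already copied to out
        while (m.find()) {
            if (m.start(group) < 0) {
                continue; // group did not take part in this match
            }
            out.append(source, last, m.start(group));        // text before the group, unchanged
            out.append(replaceStrategy.apply(m.group(group))); // transformed group text
            last = m.end(group);
        }
        out.append(source, last, source.length()); // remainder after the last group
        return out.toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("([a-z])(\\d)");
        System.out.println(replaceGroupInAllMatches("a1b2c3", p, 2, s -> "<" + s + ">"));
    }
}
```

Because nothing is replaced in place, no recorded position can go stale, so the reversal is not needed.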
Answer by Yaro
Sorry to beat a dead horse, but it is kind of weird that no one pointed this out: "Yes you can, but this is the opposite of how you use capturing groups in real life".
If you use regex the way it is meant to be used, the solution is as simple as this:
"6 example input 4".replaceAll("(?:\\d)(.*)(?:\\d)", "number$11"); // $1 is the captured middle; the trailing 1 is literal because there is no group 11
Or, as rightfully pointed out by shmosel,
"6 example input 4".replaceAll("\\d(.*)\\d", "number$11");
...since in your regex there is no good reason to group the digits at all.
You don't usually use capturing groups on the parts of the string you want to discard; you use them on the part of the string you want to keep.
If you really want groups that you replace, what you probably want instead is a templating engine (e.g. mustache, ejs, StringTemplate, ...).
As an aside for the curious, even non-capturing groups in regexes are only there for the case where the regex engine needs them to recognize and skip variable text. For example, in
(?:abc)*(capture me)(?:bcd)*
you need them if your input can look like "abcabccapture mebcdbcd", "abccapture mebcd", or even just "capture me".
Or to put it the other way around: if the text is always the same, and you don't capture it, there is no reason to use groups at all.
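A short sketch of "capture what you keep", assuming the goal from the question ("number" in place of the first digit, "1" in place of the last). It uses a named group, which Java supports since version 7, so that the literal 1 after the reference cannot be mistaken for part of a group number. The class, method, and group names are illustrative.

```java
class KeepWhatYouCapture {
    // Keeps the middle text and rewrites the digits around it.
    static String rewrite(String input) {
        return input.replaceAll("\\d(?<mid>.*)\\d", "number${mid}1");
    }

    // Drops the optional, variable prefix and suffix and keeps only the captured part.
    static String strip(String input) {
        return input.replaceAll("(?:abc)*(capture me)(?:bcd)*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("6 example input 4"));
        System.out.println(strip("abcabccapture mebcdbcd"));
    }
}
```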
Answer by whimmy
Replace the password fields in the input:
{"_csrf":["9d90c85f-ac73-4b15-ad08-ebaa3fa4a005"],"originPassword":["uaas"],"newPassword":["uaas"],"confirmPassword":["uaas"]}
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.junit.Assert;
import org.junit.Test;

private static final Pattern PATTERN = Pattern.compile(".*?password.*?\":\\[\"(.*?)\"\\](,\"|}$)", Pattern.CASE_INSENSITIVE);

private static String replacePassword(String input, String replacement) {
    Matcher m = PATTERN.matcher(input);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
        // re-match the current match on its own so the group offsets are relative to it
        Matcher m2 = PATTERN.matcher(m.group(0));
        if (m2.find()) {
            StringBuilder stringBuilder = new StringBuilder(m2.group(0));
            String result = stringBuilder.replace(m2.start(1), m2.end(1), replacement).toString();
            // quoteReplacement keeps any '$' or '\' in the rebuilt text literal
            m.appendReplacement(sb, Matcher.quoteReplacement(result));
        }
    }
    m.appendTail(sb);
    return sb.toString();
}

@Test
public void test1() {
    String input = "{\"_csrf\":[\"9d90c85f-ac73-4b15-ad08-ebaa3fa4a005\"],\"originPassword\":[\"123\"],\"newPassword\":[\"456\"],\"confirmPassword\":[\"456\"]}";
    String expected = "{\"_csrf\":[\"9d90c85f-ac73-4b15-ad08-ebaa3fa4a005\"],\"originPassword\":[\"**\"],\"newPassword\":[\"**\"],\"confirmPassword\":[\"**\"]}";
    Assert.assertEquals(expected, replacePassword(input, "**"));
}
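For comparison, here is a hedged sketch of the same masking done by capturing the parts to keep and referring to them as $1 and $2. It is not whimmy's method, and it assumes the password values contain no double quotes; the class, field, and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MaskPasswordsSketch {
    // Group 1 keeps the end of the key up to the opening quote of the value,
    // group 2 keeps the closing quote and bracket; only the value in between is dropped.
    private static final Pattern FIELD =
            Pattern.compile("(password\":\\[\")[^\"]*(\"\\])", Pattern.CASE_INSENSITIVE);

    static String mask(String input, String replacement) {
        // quoteReplacement keeps '$' and '\' in the mask literal.
        return FIELD.matcher(input)
                .replaceAll("$1" + Matcher.quoteReplacement(replacement) + "$2");
    }

    public static void main(String[] args) {
        String input = "{\"originPassword\":[\"123\"],\"newPassword\":[\"456\"]}";
        System.out.println(mask(input, "**"));
    }
}
```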