Original URL: http://stackoverflow.com/questions/15588903/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): StackOverflow
Get group names in java regex
Question by Roy Reznik
I'm trying to receive both a pattern & a string and return a map of group name -> matched result.
Example:
(?<user>.*)
I would like to return a map containing "user" as a key and whatever it matches as its value.
The problem is that I can't seem to get the group names from the Java regex API. I can only get the matched values by name or by index. I don't have the list of group names, and neither Pattern nor Matcher seems to expose this information. I have checked their source, and it seems the information is there; it's just not exposed to the user.
I tried both Java's java.util.regex and jregex. (I wouldn't mind if someone suggested any other library that supports this feature, as long as it is good, well supported and performs well.)
Answer by nhahtdh
There is no API in Java to obtain the names of the named capturing groups. I think this is a missing feature.
The easy way out is to pick out candidate named capturing groups from the pattern, then try to access the named group from the match. In other words, you don't know the exact names of the named capturing groups, until you plug in a string that matches the whole pattern.
The pattern that captures the names of the named capturing groups is \(\?<([a-zA-Z][a-zA-Z0-9]*)> (derived from the Pattern class documentation).
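Inside Java source code every backslash in that pattern has to be doubled. A minimal sketch of the candidate-extraction step on its own (the class name GroupNameCandidates and the method candidates are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupNameCandidates {
    // The regex \(\?<([a-zA-Z][a-zA-Z0-9]*)> as a Java string literal (backslashes doubled).
    private static final Pattern NAME = Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    // Returns every text fragment that looks like "(?<name>", in order of appearance.
    // These are only candidates: they still have to be checked against a real match.
    static List<String> candidates(String regex) {
        List<String> names = new ArrayList<>();
        Matcher m = NAME.matcher(regex);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(candidates("(?<user>.*)@(?<host>\\w+)")); // [user, host]
        // Lookbehinds are not picked up, because '=' and '!' are not letters.
        System.out.println(candidates("(?<=a)b(?<!c)")); // []
    }
}
```

Text inside \Q...\E or inside a character class is still reported, which is why the candidates have to be verified against a match.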
(The hard way is to implement a parser for regex and get the names of the capturing groups).
A sample implementation:
import java.util.Iterator;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTester {
    public static void main(String[] args) {
        // First line of stdin is the regex; the remaining lines are the input text
        Scanner scanner = new Scanner(System.in);
        String regex = scanner.nextLine();
        StringBuilder input = new StringBuilder();
        while (scanner.hasNextLine()) {
            input.append(scanner.nextLine()).append('\n');
        }
        Set<String> namedGroups = getNamedGroupCandidates(regex);
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(input);
        int matchCount = 0;
        if (m.find()) {
            // Remove invalid groups: group(name) throws IllegalArgumentException
            // when the pattern has no capturing group with that name
            Iterator<String> i = namedGroups.iterator();
            while (i.hasNext()) {
                try {
                    m.group(i.next());
                } catch (IllegalArgumentException e) {
                    i.remove();
                }
            }
            // Print the first match and every following one
            do {
                matchCount += 1;
                System.out.println("Match " + matchCount + ":");
                System.out.println("=" + m.group() + "=");
                System.out.println();
                printMatches(m, namedGroups);
            } while (m.find());
        }
    }

    private static void printMatches(Matcher matcher, Set<String> namedGroups) {
        // Named groups: "name=value=" or "name_" if the group did not participate
        for (String name : namedGroups) {
            String matchedString = matcher.group(name);
            if (matchedString != null) {
                System.out.println(name + "=" + matchedString + "=");
            } else {
                System.out.println(name + "_");
            }
        }
        System.out.println();
        // Numbered groups run from 1 to groupCount() inclusive
        for (int i = 1; i <= matcher.groupCount(); i++) {
            String matchedString = matcher.group(i);
            if (matchedString != null) {
                System.out.println(i + "=" + matchedString + "=");
            } else {
                System.out.println(i + "_");
            }
        }
        System.out.println();
    }

    private static Set<String> getNamedGroupCandidates(String regex) {
        Set<String> namedGroups = new TreeSet<String>();
        // \(\?<([a-zA-Z][a-zA-Z0-9]*)> with the backslashes escaped for a Java string
        Matcher m = Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>").matcher(regex);
        while (m.find()) {
            namedGroups.add(m.group(1));
        }
        return namedGroups;
    }
}
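The same candidate-then-verify idea can be packaged as the function the question asks for (pattern + string → map of group name to matched text). A minimal sketch, assuming only the first match is of interest; NamedGroupMap and namedMatches are hypothetical names:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamedGroupMap {
    // Candidate-extraction regex from the answer, as a Java string literal.
    private static final Pattern CANDIDATE = Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    // Returns group name -> matched text for the FIRST match of regex in input,
    // or an empty map if nothing matches. Groups that did not participate map to null.
    static Map<String, String> namedMatches(String regex, CharSequence input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return result; // group(name) would throw IllegalStateException without a match
        }
        Matcher c = CANDIDATE.matcher(regex);
        while (c.find()) {
            String name = c.group(1);
            try {
                result.put(name, m.group(name));
            } catch (IllegalArgumentException e) {
                // Not a real group (e.g. text inside \Q...\E or a character class): skip it.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(namedMatches("(?<user>.*)", "JohnDoe")); // {user=JohnDoe}
    }
}
```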
There is a caveat to this implementation, though: it currently doesn't work with regexes in Pattern.COMMENTS mode.
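The "hard way" mentioned earlier, parsing the regex instead of pattern-matching its text, can be approximated with a single pass that skips escaped characters, \Q...\E quoted sections and character classes, so names in those places are never reported. This is a minimal sketch under those assumptions, not a full parser (NamedGroupScanner is a hypothetical name); like the implementation above, it does not understand Pattern.COMMENTS mode or the inline (?x) flag:

```java
import java.util.ArrayList;
import java.util.List;

final class NamedGroupScanner {
    // Characters allowed in a Java group name: ASCII letters and digits.
    private static boolean isNameChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    // Returns the names of (?<name>...) groups in order of appearance, skipping
    // escapes, \Q...\E quoted text and (possibly nested) [...] character classes.
    static List<String> namedGroups(String regex) {
        List<String> names = new ArrayList<>();
        int n = regex.length();
        int classDepth = 0; // > 0 while inside a character class
        int i = 0;
        while (i < n) {
            char c = regex.charAt(i);
            if (c == '\\') {
                if (i + 1 < n && regex.charAt(i + 1) == 'Q') {
                    // Quoted section: everything up to \E (or the end) is literal.
                    int end = regex.indexOf("\\E", i + 2);
                    i = (end < 0) ? n : end + 2;
                } else {
                    i += 2; // skip the backslash and the escaped character
                }
            } else if (classDepth > 0) {
                if (c == '[') {
                    classDepth++;
                } else if (c == ']') {
                    classDepth--;
                }
                i++;
            } else if (c == '[') {
                classDepth = 1;
                i++;
                // Treat a ']' immediately after '[' or '[^' as a literal member of the class.
                if (i < n && regex.charAt(i) == '^') {
                    i++;
                }
                if (i < n && regex.charAt(i) == ']') {
                    i++;
                }
            } else if (regex.startsWith("(?<", i)) {
                int start = i + 3;
                int j = start;
                while (j < n && isNameChar(regex.charAt(j))) {
                    j++;
                }
                // A name must start with a letter and end at '>'; this rejects (?<= and (?<!
                if (j > start && Character.isLetter(regex.charAt(start)) && j < n && regex.charAt(j) == '>') {
                    names.add(regex.substring(start, j));
                }
                i = start;
            } else {
                i++;
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String regex = "(?<group>[a-z]*)[trick(?<nothing>ha)]\\Q(?<quoted>Q+E+)\\E(.*)(?<Another6group>\\w+)";
        System.out.println(namedGroups(regex)); // [group, Another6group]
    }
}
```

Unlike the candidate approach, this needs no matching input string, but any regex syntax it does not model can still confuse it.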
Answer by nhahtdh
This is a second easy approach to the problem: we call the non-public method namedGroups() of the Pattern class via the Java Reflection API to obtain a Map<String, Integer> that maps group names to group numbers. The advantage of this approach is that we don't need a string that matches the regex to find the exact named groups.
Personally, I don't think this is much of an advantage, since knowing the named groups of a regex is useless when none of the input strings match it.
However, please take note of the drawbacks:
- This approach may not work if the code runs in a system with security restrictions that deny any attempt to access non-public methods (package-private, protected and private methods). On newer JDKs, the module system's strong encapsulation (enforced by default since Java 16) is one such restriction; the JVM may need to be started with --add-opens java.base/java.util.regex=ALL-UNNAMED.
- The code only applies to the Oracle or OpenJDK JRE.
- The code may also break in future releases, since we are calling a non-public method.
- There may also be a performance hit from calling the method via reflection. (In this case, the hit mainly comes from the reflection overhead, since there is not much going on in the namedGroups() method.) I do not know how this affects overall performance, so please measure it on your system.
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Collections;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Pattern;

class RegexTester {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        String regex = scanner.nextLine();
        // String regex = "(?<group>[a-z]*)[trick(?<nothing>ha)]\\Q(?<quoted>Q+E+)\\E(.*)(?<Another6group>\\w+)";
        Pattern p = Pattern.compile(regex);
        Map<String, Integer> namedGroups = null;
        try {
            namedGroups = getNamedGroups(p);
        } catch (Exception e) {
            // Just an example here. You need to handle the Exception properly
            e.printStackTrace();
        }
        System.out.println(namedGroups);
    }

    // Calls the non-public Pattern.namedGroups() via reflection and returns
    // a read-only view of the group name -> group number map
    @SuppressWarnings("unchecked")
    private static Map<String, Integer> getNamedGroups(Pattern regex)
            throws NoSuchMethodException, SecurityException,
                   IllegalAccessException, IllegalArgumentException,
                   InvocationTargetException {
        Method namedGroupsMethod = Pattern.class.getDeclaredMethod("namedGroups");
        namedGroupsMethod.setAccessible(true);
        Map<String, Integer> namedGroups =
                (Map<String, Integer>) namedGroupsMethod.invoke(regex);
        if (namedGroups == null) {
            throw new InternalError();
        }
        return Collections.unmodifiableMap(namedGroups);
    }
}
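Given the drawbacks listed above, one option is to try the reflective call and fall back to the textual candidate scan from the first answer when it fails. A sketch under that assumption (SafeGroupNames and groupNames are hypothetical names); the fallback returns unverified candidates, so it can include false positives such as names inside \Q...\E, and which path actually runs depends on the JDK version and JVM options:

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SafeGroupNames {
    private static final Pattern CANDIDATE = Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    // Group names via reflection if the JVM allows it, otherwise candidates scanned from the pattern text.
    static Set<String> groupNames(Pattern p) {
        try {
            Method method = Pattern.class.getDeclaredMethod("namedGroups");
            // May throw InaccessibleObjectException (a RuntimeException) under strong encapsulation.
            method.setAccessible(true);
            @SuppressWarnings("unchecked")
            Map<String, Integer> map = (Map<String, Integer>) method.invoke(p);
            return new TreeSet<>(map.keySet());
        } catch (ReflectiveOperationException | RuntimeException e) {
            // Fallback: textual scan, not verified against a match.
            Set<String> names = new TreeSet<>();
            Matcher m = CANDIDATE.matcher(p.pattern());
            while (m.find()) {
                names.add(m.group(1));
            }
            return names;
        }
    }

    public static void main(String[] args) {
        System.out.println(groupNames(Pattern.compile("(?<user>\\w+)@(?<host>\\w+)"))); // [host, user]
    }
}
```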
Answer by Stephan
You want to use the small named-regexp library. It is a thin wrapper around java.util.regex with named capture group support for Java 5 or 6 users.
Sample usage:
// Pattern and Matcher here are named-regexp's classes, not java.util.regex's
Pattern p = Pattern.compile("(?<user>.*)");
Matcher m = p.matcher("JohnDoe");
System.out.println(m.namedGroups()); // {user=JohnDoe}
Maven:
<dependency>
<groupId>com.github.tony19</groupId>
<artifactId>named-regexp</artifactId>
<version>0.2.3</version>
</dependency>
Answer by Michael
There is no way to do this with the standard API. You can use reflection to access these:
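// Needs: import java.lang.reflect.Field; import java.util.Map;
// Throws NoSuchFieldException/IllegalAccessException; the map may be null if the pattern has no named groups
final Field namedGroups = pattern.getClass().getDeclaredField("namedGroups");
namedGroups.setAccessible(true);
@SuppressWarnings("unchecked")
final Map<String, Integer> nameToGroupIndex = (Map<String, Integer>) namedGroups.get(pattern);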
Use the key set of the map if you don't care about indexes.
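To get from a name-to-index map to what the question asks for (group name → matched text), look each index up on a matcher that has found a match. A minimal sketch; GroupValues and valuesByName are hypothetical names, and the map in main is written by hand as a stand-in for the one read via reflection:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupValues {
    // Turns name -> group index into name -> matched text for the matcher's current match.
    static Map<String, String> valuesByName(Map<String, Integer> nameToGroupIndex, Matcher m) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, Integer> e : nameToGroupIndex.entrySet()) {
            result.put(e.getKey(), m.group(e.getValue())); // null if the group did not participate
        }
        return result;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("(?<user>\\w+)@(?<host>\\w+)").matcher("alice@example");
        if (m.matches()) {
            // Hand-written stand-in for the map obtained via reflection.
            Map<String, Integer> index = Map.of("user", 1, "host", 2);
            System.out.println(valuesByName(index, m)); // {host=example, user=alice}
        }
    }
}
```

Looking values up by index avoids the IllegalArgumentException that group(String) throws for unknown names, but it relies on the map being accurate for this exact Pattern.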