Wildcard matching in Java
Original URL: http://stackoverflow.com/questions/24337657/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): Stack Overflow
Question by Johannes Schaub - litb
I'm writing a simple debugging program that takes simple strings as input. These strings can contain stars to indicate a wildcard that matches anything:
*.wav // matches <anything>.wav
(*, a) // matches (<anything>, a)
I thought I would simply take that pattern, escape any regular expression special characters in it, replace any \\* back with .*, and then use a regular expression matcher.
But I can't find any Java function to escape a regular expression. The best match I could find is Pattern.quote, which, however, just puts \Q and \E at the beginning and end of the string.
Is there anything in Java that allows you to simply do that wildcard matching without you having to implement the algorithm from scratch?
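To illustrate the point about Pattern.quote, here is a small sketch (the class name is hypothetical). Quoting the whole wildcard makes the star literal, so it no longer acts as a wildcard at all:

```java
import java.util.regex.Pattern;

public class QuoteDemo {
    public static void main(String[] args) {
        String quoted = Pattern.quote("*.wav");
        System.out.println(quoted);                     // \Q*.wav\E
        // Everything between \Q and \E is literal, including the star
        System.out.println("*.wav".matches(quoted));    // true
        System.out.println("abcd.wav".matches(quoted)); // false
    }
}
```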
Accepted answer by zx81
Using A Simple Regex
One of this method's benefits is that we can easily add tokens besides * (see Adding Tokens at the bottom).
Search: [^*]+|(\*)
- The left side of the | matches any chars that are not a star
- The right side captures each star to Group 1
- If Group 1 is empty: replace with \Q + Match + \E
- If Group 1 is set: replace with .*
Here is some working code (see the output of the online demo).
Input: audio*2012*.wav
Output: \Qaudio\E.*\Q2012\E.*\Q.wav\E
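import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WildcardDemo {
    public static void main(String[] args) {
        String subject = "audio*2012*.wav";
        // Runs of non-star characters, or a single star captured in group 1
        Pattern regex = Pattern.compile("[^*]+|(\\*)");
        Matcher m = regex.matcher(subject);
        StringBuffer b = new StringBuffer();
        while (m.find()) {
            if (m.group(1) != null) m.appendReplacement(b, ".*");
            // quoteReplacement stops appendReplacement from treating \ and $ specially
            else m.appendReplacement(b, Matcher.quoteReplacement("\\Q" + m.group(0) + "\\E"));
        }
        m.appendTail(b);
        String replaced = b.toString();
        System.out.println(replaced);
    }
}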
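The generated regex carries no ^ or $ anchors, so the choice of matching method matters. This small sketch (the class name is hypothetical) reuses the output printed for audio*2012*.wav. It shows that find() accepts the pattern anywhere inside a longer name, while matches() requires the whole name to fit, which is usually what a filename wildcard means:

```java
import java.util.regex.Pattern;

public class AnchoringDemo {
    public static void main(String[] args) {
        // The regex produced above for the wildcard "audio*2012*.wav"
        Pattern p = Pattern.compile("\\Qaudio\\E.*\\Q2012\\E.*\\Q.wav\\E");
        String name = "old_audio_2012.wav.bak";
        // find() looks for the pattern anywhere inside the string
        System.out.println(p.matcher(name).find());                  // true
        // matches() requires the entire string to match
        System.out.println(p.matcher(name).matches());               // false
        System.out.println(p.matcher("audio_2012_x.wav").matches()); // true
    }
}
```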
Adding Tokens
Suppose we also want to convert the wildcard ?, which stands for a single character, to a dot. We just add a capture group to the regex and exclude ? from the match-all on the left:
Search: [^*?]+|(\*)|(\?)
In the replace function, we then add something like:
else if(m.group(2) != null) m.appendReplacement(b, ".");
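Putting both tokens together, a minimal sketch of the complete conversion might look like this. The class and method names are hypothetical. The search regex and the replacements are the ones described above, with Matcher.quoteReplacement added so that a literal run containing $ or \ is appended verbatim:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WildcardWithQuestionMark {
    // Converts * to .* and ? to ., wrapping every other run of characters in \Q...\E
    static String wildcardToRegex(String wildcard) {
        Matcher m = Pattern.compile("[^*?]+|(\\*)|(\\?)").matcher(wildcard);
        StringBuffer b = new StringBuffer();
        while (m.find()) {
            if (m.group(1) != null) m.appendReplacement(b, ".*");
            else if (m.group(2) != null) m.appendReplacement(b, ".");
            else m.appendReplacement(b, Matcher.quoteReplacement("\\Q" + m.group(0) + "\\E"));
        }
        m.appendTail(b);
        return b.toString();
    }

    public static void main(String[] args) {
        String regex = wildcardToRegex("audio??2012*.wav");
        System.out.println(regex);                                  // \Qaudio\E..\Q2012\E.*\Q.wav\E
        System.out.println("audio_x2012_final.wav".matches(regex)); // true
        System.out.println("audio_2012.wav".matches(regex));        // false
    }
}
```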
Answer by Bohemian
Just escape everything - no harm will come of it.
String input = "*.wav";
// Quote the whole input, then close and reopen the quote around each star
String regex = ("\\Q" + input + "\\E").replace("*", "\\E.*\\Q");
System.out.println(regex); // \Q\E.*\Q.wav\E
System.out.println("abcd.wav".matches(regex)); // true
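One caveat to "no harm will come of it": if the input itself contains the two characters \E, the quoted section ends early. This is easy to hit with a Windows path such as C:\Extra\. The sketch below uses hypothetical class and method names. It compares that naive version with a variant that splits on * and quotes each literal segment with Pattern.quote, which is documented to return a literal pattern for any string:

```java
import java.util.regex.Pattern;

public class QuoteSegments {
    // Naive version from the answer above: wrap everything in \Q...\E
    static String naive(String wildcard) {
        return ("\\Q" + wildcard + "\\E").replace("*", "\\E.*\\Q");
    }

    // Variant: quote each literal segment between stars with Pattern.quote,
    // which also copes with a literal "\E" inside a segment
    static String quoted(String wildcard) {
        String[] parts = wildcard.split("\\*", -1); // -1 keeps empty leading/trailing parts
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) sb.append(".*");
            sb.append(Pattern.quote(parts[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String wildcard = "C:\\Extra\\*.wav"; // the wildcard C:\Extra\*.wav
        String name = "C:\\Extra\\a.wav";
        System.out.println(name.matches(naive(wildcard)));  // false: "\E" in "\Extra" ends the quote early
        System.out.println(name.matches(quoted(wildcard))); // true
    }
}
```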
Or you can use character classes:
String input = "*.wav";
// Wrap every character in its own class ($0 is the matched character), then turn [*] into .*
String regex = input.replaceAll(".", "[$0]").replace("[*]", ".*");
System.out.println(regex); // .*[.][w][a][v]
System.out.println("abcd.wav".matches(regex)); // true
It's easier to "escape" the characters by putting them in a character class, as almost all characters lose any special meaning when in a character class. Unless you're expecting weird file names, this will work.
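The "weird file names" caveat presumably concerns characters that keep a special meaning even inside a character class, such as \, ^ and ]. The sketch below applies the same per-character idea, but instead escapes every code point that is not a letter or digit with a backslash. Java's Pattern documentation permits a backslash before any non-alphabetic character. The class name is hypothetical, and only * is treated as a wildcard:

```java
public class EscapeEachChar {
    // Turns * into .* and backslash-escapes every code point that is not a letter or digit,
    // so characters such as \ ^ [ ] and . are always matched literally
    static String wildcardToRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        wildcard.codePoints().forEach(cp -> {
            if (cp == '*') {
                sb.append(".*");
            } else if (Character.isLetterOrDigit(cp)) {
                sb.appendCodePoint(cp); // letters and digits must not be escaped
            } else {
                sb.append('\\').appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String regex = wildcardToRegex("a^b[1]*.wav");
        System.out.println(regex);                         // a\^b\[1\].*\.wav
        System.out.println("a^b[1]-x.wav".matches(regex)); // true
        System.out.println("a^b1x.wav".matches(regex));    // false
    }
}
```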
Answer by Matt Coubrough
You can also use the quotation escape characters \Q and \E: everything between them is treated as literal and is not considered part of the regex to be evaluated. Thus this code should work:
String input = "*.wav";
String regex = "\\Q" + input.replace("*", "\\E.*?\\Q") + "\\E";
// regex is now \Q\E.*?\Q.wav\E
Note that, depending on how you want your wildcard to behave, it might be better to match * only against word characters using \w.
Answer by Paul Jackson
Lucene has classes that provide this capability, with additional support for backslash as an escape character: ? matches a single character, * matches zero or more characters, and \ escapes the following character. Unicode code points are supported. It's supposed to be fast, but I haven't tested it.
import org.apache.lucene.index.Term;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.util.automaton.CharacterRunAutomaton;

// Written against the Lucene API of the time; check these signatures against your Lucene version.
CharacterRunAutomaton characterRunAutomaton;
boolean matches;
// No wildcard: only the exact text matches
characterRunAutomaton = new CharacterRunAutomaton(WildcardQuery.toAutomaton(new Term("", "Walmart")));
matches = characterRunAutomaton.run("Walmart"); // true
matches = characterRunAutomaton.run("Wal*mart"); // false
matches = characterRunAutomaton.run("Wal\\*mart"); // false
matches = characterRunAutomaton.run("Waldomart"); // false
// Unescaped * matches any sequence of characters
characterRunAutomaton = new CharacterRunAutomaton(WildcardQuery.toAutomaton(new Term("", "Wal*mart")));
matches = characterRunAutomaton.run("Walmart"); // true
matches = characterRunAutomaton.run("Wal*mart"); // true
matches = characterRunAutomaton.run("Wal\\*mart"); // true
matches = characterRunAutomaton.run("Waldomart"); // true
// Escaped \* matches only a literal star
characterRunAutomaton = new CharacterRunAutomaton(WildcardQuery.toAutomaton(new Term("", "Wal\\*mart")));
matches = characterRunAutomaton.run("Walmart"); // false
matches = characterRunAutomaton.run("Wal*mart"); // true
matches = characterRunAutomaton.run("Wal\\*mart"); // false
matches = characterRunAutomaton.run("Waldomart"); // false
Answer by J. Hanney
Regex While Accommodating A DOS/Windows Path
Implementing the quotation escape characters \Q and \E is probably the best approach. However, since a backslash is typically used as a DOS/Windows file separator, a "\E" sequence within the path could affect the pairing of \Q and \E. While also accounting for the * and ? wildcard tokens, this backslash situation could be addressed in this manner:
Search: [^*?\\]+|(\*)|(\?)|(\\)
Two new lines would be added to the replace function of the "Using A Simple Regex" example to accommodate the new search pattern. The code would still be "Linux-friendly". As a method, it could be written like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String wildcardToRegex(String wildcardStr) {
    // Literal runs, or a single *, ? or backslash captured in groups 1 to 3
    Pattern regex = Pattern.compile("[^*?\\\\]+|(\\*)|(\\?)|(\\\\)");
    Matcher m = regex.matcher(wildcardStr);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
        if (m.group(1) != null) m.appendReplacement(sb, ".*");
        else if (m.group(2) != null) m.appendReplacement(sb, ".");
        // A backslash becomes the regex \\, which matches one literal backslash
        else if (m.group(3) != null) m.appendReplacement(sb, Matcher.quoteReplacement("\\\\"));
        // quoteReplacement keeps any \ or $ in the \Q...\E text from being interpreted
        else m.appendReplacement(sb, Matcher.quoteReplacement("\\Q" + m.group(0) + "\\E"));
    }
    m.appendTail(sb);
    return sb.toString();
}
Code to demonstrate the implementation of this method could be written like this:
String s = "C:\\Temp\\Extra\\audio??2012*.wav";
System.out.println("Input: " + s);
System.out.println("Output: " + wildcardToRegex(s));
This would be the generated results:
Input: C:\Temp\Extra\audio??2012*.wav
Output: \QC:\E\\\QTemp\E\\\QExtra\E\\\Qaudio\E..\Q2012\E.*\Q.wav\E
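As a usage sketch (the class name is hypothetical), the regex produced by J. Hanney's wildcardToRegex method can be passed straight to String.matches. One design consequence is worth noting: because * becomes .*, it also matches across \ separators, unlike a typical shell glob:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WindowsPathWildcard {
    // Same conversion as the wildcardToRegex method above
    static String wildcardToRegex(String wildcardStr) {
        Matcher m = Pattern.compile("[^*?\\\\]+|(\\*)|(\\?)|(\\\\)").matcher(wildcardStr);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            if (m.group(1) != null) m.appendReplacement(sb, ".*");
            else if (m.group(2) != null) m.appendReplacement(sb, ".");
            else if (m.group(3) != null) m.appendReplacement(sb, Matcher.quoteReplacement("\\\\"));
            else m.appendReplacement(sb, Matcher.quoteReplacement("\\Q" + m.group(0) + "\\E"));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String regex = wildcardToRegex("C:\\Temp\\*.wav"); // the wildcard C:\Temp\*.wav
        System.out.println("C:\\Temp\\song.wav".matches(regex));       // true
        System.out.println("C:\\Temp\\sub\\song.wav".matches(regex));  // true: .* also crosses \
        System.out.println("C:\\Temp\\song.mp3".matches(regex));       // false
    }
}
```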
Answer by Marek Gregor
There is a small utility method in the Apache Commons-IO library, org.apache.commons.io.FilenameUtils#wildcardMatch(), which you can use without dealing with the intricacies of regular expressions.
The API documentation can be found at: https://commons.apache.org/proper/commons-io/javadocs/api-2.5/org/apache/commons/io/FilenameUtils.html#wildcardMatch(java.lang.String,%20java.lang.String)
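A short usage sketch, assuming Commons IO is on the classpath. The class name is hypothetical; FilenameUtils.wildcardMatch and IOCase are part of the library. In the two-argument form, * matches any sequence of characters, ? matches exactly one, and matching is case-sensitive:

```java
import org.apache.commons.io.FilenameUtils;
import org.apache.commons.io.IOCase;

public class CommonsIoWildcard {
    public static void main(String[] args) {
        System.out.println(FilenameUtils.wildcardMatch("audio_2012_x.wav", "audio*2012*.wav")); // true
        System.out.println(FilenameUtils.wildcardMatch("audio12.wav", "audio??.wav"));          // true
        System.out.println(FilenameUtils.wildcardMatch("AUDIO.WAV", "*.wav"));                  // false
        // The three-argument overload takes an IOCase to control case sensitivity
        System.out.println(FilenameUtils.wildcardMatch("AUDIO.WAV", "*.wav", IOCase.INSENSITIVE)); // true
    }
}
```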

