Wildcard matching in Java
Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/24337657/
Asked by Johannes Schaub - litb
I'm writing a simple debugging program that takes as input simple strings that can contain stars to indicate a wildcard match-any
*.wav // matches <anything>.wav
(*, a) // matches (<anything>, a)
I thought I would simply take that pattern, escape any regular expression special characters in it, then replace any \\* back to .* and then use a regular expression matcher.
But I can't find any Java function to escape a regular expression. The best match I could find is Pattern.quote, which however just puts \Q and \E at the beginning and end of the string.
Is there anything in Java that allows you to simply do that wildcard matching without you having to implement the algorithm from scratch?
Accepted answer by zx81
Using A Simple Regex
One of this method's benefits is that we can easily add tokens besides * (see Adding Tokens at the bottom).
Search: [^*]+|(\*)
- The left side of the | matches any chars that are not a star
- The right side captures all stars to Group 1
- If Group 1 is empty: replace with \Q + Match + \E
- If Group 1 is set: replace with .*
Here is some working code (see the output of the online demo).
Input: audio*2012*.wav
Output: \Qaudio\E.*\Q2012\E.*\Q.wav\E
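import java.util.regex.Matcher;
import java.util.regex.Pattern;

String subject = "audio*2012*.wav";
Pattern regex = Pattern.compile("[^*]+|(\\*)");
Matcher m = regex.matcher(subject);
StringBuffer b = new StringBuffer();
while (m.find()) {
    // Group 1 is set only when the current token is a star.
    if (m.group(1) != null) m.appendReplacement(b, ".*");
    // A backslash in a replacement string must itself be escaped, hence "\\\\Q".
    // quoteReplacement keeps any $ or \ in the matched text literal.
    else m.appendReplacement(b, "\\\\Q" + Matcher.quoteReplacement(m.group(0)) + "\\\\E");
}
m.appendTail(b);
String replaced = b.toString();
System.out.println(replaced);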
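The conversion described in this answer can be wrapped into a reusable helper. The following is only a sketch: the class name StarWildcard and its method names are hypothetical, and it assumes that Pattern.quote is an acceptable stand-in for writing the \Q and \E markers by hand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative and not taken from the answer.
final class StarWildcard {
    // Either a run of non-star characters, or a single star captured in group 1.
    private static final Pattern TOKEN = Pattern.compile("[^*]+|(\\*)");

    /** Converts a wildcard such as "audio*2012*.wav" into a regex string. */
    static String toRegex(String wildcard) {
        Matcher m = TOKEN.matcher(wildcard);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            if (m.group(1) != null) {
                m.appendReplacement(sb, ".*");
            } else {
                // Pattern.quote wraps the literal run in \Q...\E; quoteReplacement
                // stops appendReplacement from interpreting the backslashes or any $.
                m.appendReplacement(sb, Matcher.quoteReplacement(Pattern.quote(m.group())));
            }
        }
        m.appendTail(sb);
        return sb.toString();
    }

    /** Returns true when the whole input matches the wildcard. */
    static boolean matches(String wildcard, String input) {
        return Pattern.matches(toRegex(wildcard), input);
    }

    public static void main(String[] args) {
        System.out.println(toRegex("audio*2012*.wav")); // \Qaudio\E.*\Q2012\E.*\Q.wav\E
        System.out.println(matches("*.wav", "abcd.wav")); // true
        System.out.println(matches("(*, a)", "(x, a)")); // true
    }
}
```

Compiling the token pattern once and going through Pattern.quote keeps the helper safe for inputs that contain backslashes or dollar signs, which a hand-built replacement string would otherwise misread.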
Adding Tokens
Suppose we also want to convert the wildcard ?, which stands for a single character, to a dot. We just add a capture group to the regex, and exclude it from the matchall on the left:
Search: [^*?]+|(\*)|(\?)
In the replace function we then add something like:
else if(m.group(2) != null) m.appendReplacement(b, ".");
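Put together, the two-token version might look like the sketch below. The class name StarQuestionWildcard is hypothetical, and the sketch appends to a StringBuilder directly instead of using appendReplacement; that is a substitution on my part, valid only because the tokens cover the whole input with no gaps between matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of the original answer.
final class StarQuestionWildcard {
    // A literal run, a star (group 1), or a question mark (group 2).
    private static final Pattern TOKEN = Pattern.compile("[^*?]+|(\\*)|(\\?)");

    /** Compiles a wildcard where * means "any run" and ? means "exactly one character". */
    static Pattern compile(String wildcard) {
        StringBuilder sb = new StringBuilder();
        Matcher m = TOKEN.matcher(wildcard);
        while (m.find()) {
            if (m.group(1) != null) {
                sb.append(".*");
            } else if (m.group(2) != null) {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(m.group())); // literal text, quoted
            }
        }
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        Pattern p = compile("audio??2012*.wav");
        System.out.println(p.matcher("audio012012_x.wav").matches()); // true
        System.out.println(p.matcher("audio2012.wav").matches());     // false
    }
}
```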
Answer by Bohemian
Just escape everything - no harm will come of it.
String input = "*.wav";
String regex = ("\\Q" + input + "\\E").replace("*", "\\E.*\\Q");
System.out.println(regex); // \Q\E.*\Q.wav\E
System.out.println("abcd.wav".matches(regex)); // true
Or you can use character classes:
String input = "*.wav";
// Wrap every character in its own character class, then turn the wrapped star back into .*
String regex = input.replaceAll(".", "[$0]").replace("[*]", ".*");
System.out.println(regex); // .*[.][w][a][v]
System.out.println("abcd.wav".matches(regex)); // true
It's easier to "escape" the characters by putting them in a character class, as almost all characters lose any special meaning when in a character class. Unless you're expecting weird file names, this will work.
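For inputs that may contain unusual characters, one alternative is to let Pattern.quote do the escaping chunk by chunk. This is a sketch under my own assumptions rather than part of the answer: the class name QuotedWildcard is hypothetical, and only * is treated as a wildcard.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper; not part of the original answer.
final class QuotedWildcard {
    /** Quotes every literal chunk with Pattern.quote and joins the chunks with ".*". */
    static Pattern compile(String wildcard) {
        // The limit of -1 keeps trailing empty chunks, so "a*" still ends with ".*".
        String regex = Arrays.stream(wildcard.split("\\*", -1))
                .map(Pattern::quote)
                .collect(Collectors.joining(".*"));
        return Pattern.compile(regex);
    }

    public static void main(String[] args) {
        System.out.println(compile("*.wav").matcher("abcd.wav").matches());   // true
        // The wildcard below contains the two characters \E; Pattern.quote handles them.
        System.out.println(compile("a\\Eb*").matcher("a\\Eb.txt").matches()); // true
    }
}
```

Because Pattern.quote is documented to cope with a \E inside its argument, this variant does not depend on the input being free of that sequence.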
Answer by Matt Coubrough
You can also use the quotation escape sequences \Q and \E: everything between them is treated as literal and not considered to be part of the regex to be evaluated. Thus this code should work:
String input = "*.wav";
String regex = "\\Q" + input.replace("*", "\\E.*?\\Q") + "\\E";
// regex = "\\Q\\E.*?\\Q.wav\\E"
Note that, depending on how you want your wildcard to behave, your * wildcard might be better matched only against word characters using \w.
Answer by Paul Hymanson
Lucene has classes that provide this capability, with additional support for backslash as an escape character. ? matches a single character, * matches 0 or more characters, \ escapes the following character. Supports Unicode code points. Supposed to be fast, but I haven't tested.
import org.apache.lucene.index.Term;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.util.automaton.CharacterRunAutomaton;

CharacterRunAutomaton characterRunAutomaton;
boolean matches;
// No wildcard: only the exact string matches.
characterRunAutomaton = new CharacterRunAutomaton(WildcardQuery.toAutomaton(new Term("", "Walmart")));
matches = characterRunAutomaton.run("Walmart"); // true
matches = characterRunAutomaton.run("Wal*mart"); // false
matches = characterRunAutomaton.run("Wal\\*mart"); // false
matches = characterRunAutomaton.run("Waldomart"); // false
// Unescaped star: matches 0 or more characters.
characterRunAutomaton = new CharacterRunAutomaton(WildcardQuery.toAutomaton(new Term("", "Wal*mart")));
matches = characterRunAutomaton.run("Walmart"); // true
matches = characterRunAutomaton.run("Wal*mart"); // true
matches = characterRunAutomaton.run("Wal\\*mart"); // true
matches = characterRunAutomaton.run("Waldomart"); // true
// Escaped star: matches only a literal star.
characterRunAutomaton = new CharacterRunAutomaton(WildcardQuery.toAutomaton(new Term("", "Wal\\*mart")));
matches = characterRunAutomaton.run("Walmart"); // false
matches = characterRunAutomaton.run("Wal*mart"); // true
matches = characterRunAutomaton.run("Wal\\*mart"); // false
matches = characterRunAutomaton.run("Waldomart"); // false
Answer by J. Hanney
Regex While Accommodating A DOS/Windows Path
Implementing the quotation escape sequences \Q and \E is probably the best approach. However, since a backslash is typically used as a DOS/Windows file separator, a "\E" sequence within the path could affect the pairing of \Q and \E. While accounting for the * and ? wildcard tokens, this situation of the backslash could be addressed in this manner:
Search: [^*?\\]+|(\*)|(\?)|(\\)
Two new lines would be added in the replace function of the "Using A Simple Regex" example to accommodate the new search pattern. The code would still be "Linux-friendly". As a method, it could be written like this:
public String wildcardToRegex(String wildcardStr) {
    Pattern regex = Pattern.compile("[^*?\\\\]+|(\\*)|(\\?)|(\\\\)");
    Matcher m = regex.matcher(wildcardStr);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
        if (m.group(1) != null) m.appendReplacement(sb, ".*");
        else if (m.group(2) != null) m.appendReplacement(sb, ".");
        // Emits \\ in the regex, which matches one literal backslash.
        else if (m.group(3) != null) m.appendReplacement(sb, "\\\\\\\\");
        // quoteReplacement keeps any $ in the matched text literal.
        else m.appendReplacement(sb, "\\\\Q" + Matcher.quoteReplacement(m.group(0)) + "\\\\E");
    }
    m.appendTail(sb);
    return sb.toString();
}
Code to demonstrate the implementation of this method could be written like this:
String s = "C:\\Temp\\Extra\\audio??2012*.wav";
System.out.println("Input: "+s);
System.out.println("Output: "+wildcardToRegex(s));
This would be the generated results:
Input: C:\Temp\Extra\audio??2012*.wav
Output: \QC:\E\\\QTemp\E\\\QExtra\E\\\Qaudio\E..\Q2012\E.*\Q.wav\E
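To check that such a pattern really matches Windows-style paths, the conversion can be exercised end to end. The sketch below is an assumption-laden variant, not the answer's code: the class name PathWildcard is hypothetical, and it builds the regex with a StringBuilder, which works because the four alternatives together consume every character of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of the original answer.
final class PathWildcard {
    // A literal run without * ? or backslash, a star, a question mark, or a backslash.
    private static final Pattern TOKEN =
            Pattern.compile("[^*?\\\\]+|(\\*)|(\\?)|(\\\\)");

    static String wildcardToRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        Matcher m = TOKEN.matcher(wildcard);
        while (m.find()) {
            if (m.group(1) != null) {
                sb.append(".*");
            } else if (m.group(2) != null) {
                sb.append('.');
            } else if (m.group(3) != null) {
                sb.append("\\\\"); // two characters: regex for one literal backslash
            } else {
                // Literal runs never contain a backslash, so \E cannot appear inside.
                sb.append("\\Q").append(m.group()).append("\\E");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(wildcardToRegex("C:\\Temp\\Extra\\audio??2012*.wav"));
        System.out.println(p.matcher("C:\\Temp\\Extra\\audio012012_x.wav").matches()); // true
        System.out.println(p.matcher("C:\\Temp\\Other\\audio012012_x.wav").matches()); // false
    }
}
```

Note that .* also matches backslashes, so in this sketch a * can span directory separators.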
Answer by Marek Gregor
There is a small utility method in the Apache Commons-IO library, org.apache.commons.io.FilenameUtils#wildcardMatch(), which you can use without the intricacies of regular expressions.
API documentation can be found at: https://commons.apache.org/proper/commons-io/javadocs/api-2.5/org/apache/commons/io/FilenameUtils.html#wildcardMatch(java.lang.String,%20java.lang.String)