Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/24337657/

Time: 2020-08-14 11:37:25  Source: igfitidea

Wildcard matching in Java

Tags: java, regex, wildcard

Asked by Johannes Schaub - litb

I'm writing a simple debugging program that takes as input simple strings that can contain stars (*), each acting as a match-anything wildcard:

*.wav  // matches <anything>.wav
(*, a) // matches (<anything>, a)

I thought I would simply take that pattern, escape any regular expression special characters in it, then replace every escaped star (\*) with .* and use a regular expression matcher.

But I can't find any Java function to escape a regular expression. The best match I could find is Pattern.quote, which however just puts \Q and \E at the beginning and end of the string.

Is there anything in Java that allows you to simply do that wildcard matching without you having to implement the algorithm from scratch?

Accepted answer by zx81

Using A Simple Regex

One of this method's benefits is that we can easily add tokens besides * (see Adding Tokens at the bottom).

Search: [^*]+|(\*)

  • The left side of the | matches any run of characters that are not a star
  • The right side captures each star to Group 1
  • If Group 1 is empty: replace with \Q + Match + \E
  • If Group 1 is set: replace with .*

Here is some working code (see the output of the online demo).

Input: audio*2012*.wav

Output: \Qaudio\E.*\Q2012\E.*\Q.wav\E

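import java.util.regex.Matcher;
import java.util.regex.Pattern;

String subject = "audio*2012*.wav";
// Group 1 captures a star; everything else is a run of literal characters.
Pattern regex = Pattern.compile("[^*]+|(\\*)");
Matcher m = regex.matcher(subject);
StringBuffer b = new StringBuffer();
while (m.find()) {
    if (m.group(1) != null) m.appendReplacement(b, ".*");
    // quoteReplacement keeps any \ or $ in the replacement text literal.
    else m.appendReplacement(b, Matcher.quoteReplacement("\\Q" + m.group(0) + "\\E"));
}
m.appendTail(b);
String replaced = b.toString();
System.out.println(replaced); // \Qaudio\E.*\Q2012\E.*\Q.wav\E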

Adding Tokens

Suppose we also want to convert the wildcard ?, which stands for a single character, into a dot. We just add a capture group to the regex and exclude it from the match-all on the left:

Search: [^*?]+|(\*)|(\?)

In the replace function we then add something like:

else if(m.group(2) != null) m.appendReplacement(b, "."); 
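
Putting the two tokens together, the conversion can be sketched as one small helper. This is only an illustration under some assumptions: the class name WildcardSketch and the method toRegex are made up for this example, and instead of appendReplacement it appends to a StringBuilder and uses Pattern.quote for the literal runs, which sidesteps the special meaning of \ and $ in replacement strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WildcardSketch {
    // Group 1 = "*", group 2 = "?"; anything else is a run of literal characters.
    private static final Pattern TOKEN = Pattern.compile("[^*?]+|(\\*)|(\\?)");

    // Hypothetical helper: converts a wildcard string into a regex string.
    static String toRegex(String wildcard) {
        Matcher m = TOKEN.matcher(wildcard);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            if (m.group(1) != null) sb.append(".*");          // star: any run of characters
            else if (m.group(2) != null) sb.append('.');      // question mark: one character
            else sb.append(Pattern.quote(m.group(0)));        // literal run, quoted
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String regex = toRegex("audio?2012*.wav");
        System.out.println(regex);                                  // \Qaudio\E.\Q2012\E.*\Q.wav\E
        System.out.println("audio_2012_take1.wav".matches(regex));  // true
        System.out.println("audio2012.wav".matches(regex));         // false, ? needs one character
    }
}
```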

Answer by Bohemian

Just escape everything - no harm will come of it.

    String input = "*.wav";
    String regex = ("\\Q" + input + "\\E").replace("*", "\\E.*\\Q");
    System.out.println(regex); // \Q\E.*\Q.wav\E
    System.out.println("abcd.wav".matches(regex)); // true
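
One caveat with concatenating \Q and \E by hand is an input that itself contains \E. A possible variant, sketched here with a hypothetical class QuoteSegments and helper compileWildcard, splits the input on the star and passes each literal piece through Pattern.quote, which copes with an embedded \E:

```java
import java.util.regex.Pattern;

public class QuoteSegments {
    // Hypothetical helper: quote each literal piece and join the pieces with ".*".
    static Pattern compileWildcard(String wildcard) {
        StringBuilder sb = new StringBuilder();
        // The -1 limit keeps trailing empty pieces, so a trailing "*" is not lost.
        String[] parts = wildcard.split("\\*", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) sb.append(".*");
            sb.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        System.out.println(compileWildcard("*.wav").matcher("abcd.wav").matches());   // true
        System.out.println(compileWildcard("(*, a)").matcher("(x, a)").matches());    // true
        // A literal \E inside the input does not end the quoting early.
        System.out.println(compileWildcard("a\\Eb*").matcher("a\\Eb.c").matches());   // true
    }
}
```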

Or you can use character classes:

    String input = "*.wav";
    String regex = input.replaceAll(".", "[$0]").replace("[*]", ".*");
    System.out.println(regex); // .*[.][w][a][v]
    System.out.println("abcd.wav".matches(regex)); // true

It's easier to "escape" the characters by putting them in a character class, as almost all characters lose any special meaning when in a character class. Unless you're expecting weird file names, this will work.

Answer by Matt Coubrough

You can also use the quotation escape sequences \Q and \E (written "\\Q" and "\\E" in a Java string literal): everything between them is treated as literal text and not as part of the regex to be evaluated. Thus this code should work:

    String input = "*.wav";
    String regex = "\\Q" + input.replace("*", "\\E.*?\\Q") + "\\E";

    // regex = "\\Q\\E.*?\\Q.wav\\E"

Note that, depending on how you want your wildcard to behave, it might be better to match * only against word characters using \w.

Answer by Paul Hymanson

Lucene has classes that provide this capability, with additional support for backslash as an escape character. ? matches a single character, * matches 0 or more characters, and \ escapes the following character. Unicode code points are supported. It is supposed to be fast, but I haven't tested it.

import org.apache.lucene.index.Term;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.util.automaton.CharacterRunAutomaton;

CharacterRunAutomaton characterRunAutomaton;
boolean matches;
characterRunAutomaton = new CharacterRunAutomaton(WildcardQuery.toAutomaton(new Term("", "Walmart")));
matches = characterRunAutomaton.run("Walmart"); // true
matches = characterRunAutomaton.run("Wal*mart"); // false
matches = characterRunAutomaton.run("Wal\\*mart"); // false
matches = characterRunAutomaton.run("Waldomart"); // false
characterRunAutomaton = new CharacterRunAutomaton(WildcardQuery.toAutomaton(new Term("", "Wal*mart")));
matches = characterRunAutomaton.run("Walmart"); // true
matches = characterRunAutomaton.run("Wal*mart"); // true
matches = characterRunAutomaton.run("Wal\\*mart"); // true
matches = characterRunAutomaton.run("Waldomart"); // true
characterRunAutomaton = new CharacterRunAutomaton(WildcardQuery.toAutomaton(new Term("", "Wal\\*mart")));
matches = characterRunAutomaton.run("Walmart"); // false
matches = characterRunAutomaton.run("Wal*mart"); // true
matches = characterRunAutomaton.run("Wal\\*mart"); // false
matches = characterRunAutomaton.run("Waldomart"); // false

Answer by J. Hanney

Regex While Accommodating A DOS/Windows Path

Implementing the quotation escape sequences \Q and \E is probably the best approach. However, since a backslash is typically used as a DOS/Windows file separator, a "\E" sequence within the path could break the pairing of \Q and \E. While accounting for the * and ? wildcard tokens, the backslash can be handled in this manner:

Search: [^*?\\]+|(\*)|(\?)|(\\)

Two new lines would be added in the replace function of the "Using A Simple Regex" example to accommodate the new search pattern. The code would still be "Linux-friendly". As a method, it could be written like this:

public String wildcardToRegex(String wildcardStr) {
    // Group 1 = *, group 2 = ?, group 3 = backslash; anything else is literal text.
    Pattern regex = Pattern.compile("[^*?\\\\]+|(\\*)|(\\?)|(\\\\)");
    Matcher m = regex.matcher(wildcardStr);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
        if (m.group(1) != null) m.appendReplacement(sb, ".*");
        else if (m.group(2) != null) m.appendReplacement(sb, ".");
        // The replacement holds four backslashes, which appendReplacement emits as \\ (an escaped backslash).
        else if (m.group(3) != null) m.appendReplacement(sb, "\\\\\\\\");
        else m.appendReplacement(sb, Matcher.quoteReplacement("\\Q" + m.group(0) + "\\E"));
    }
    m.appendTail(sb);
    return sb.toString();
}

Code to demonstrate the implementation of this method could be written like this:

String s = "C:\\Temp\\Extra\\audio??2012*.wav";
System.out.println("Input: "+s);
System.out.println("Output: "+wildcardToRegex(s));

This would be the generated results:

Input: C:\Temp\Extra\audio??2012*.wav
Output: \QC:\E\\\QTemp\E\\\QExtra\E\\\Qaudio\E..\Q2012\E.*\Q.wav\E
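
To see the generated pattern actually matching Windows-style paths, here is a small self-contained sketch. The class name PathWildcardDemo and the sample file names are assumptions made for illustration, and this variant appends to a StringBuilder directly rather than going through appendReplacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PathWildcardDemo {
    // Same tokenizing regex as above: group 1 = *, group 2 = ?, group 3 = backslash.
    private static final Pattern TOKEN = Pattern.compile("[^*?\\\\]+|(\\*)|(\\?)|(\\\\)");

    static String wildcardToRegex(String wildcardStr) {
        Matcher m = TOKEN.matcher(wildcardStr);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            if (m.group(1) != null) sb.append(".*");
            else if (m.group(2) != null) sb.append('.');
            else if (m.group(3) != null) sb.append("\\\\");   // escaped backslash in the regex
            else sb.append("\\Q").append(m.group(0)).append("\\E");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String regex = wildcardToRegex("C:\\Temp\\Extra\\audio??2012*.wav");
        System.out.println(regex);
        // The \E in "\Extra" never ends up inside a quoted section, so the pairing stays intact.
        System.out.println("C:\\Temp\\Extra\\audio_A2012_final.wav".matches(regex)); // true
        System.out.println("C:\\Temp\\Extra\\audio2012.wav".matches(regex));         // false
    }
}
```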

Answer by Marek Gregor

There is a small utility method in the Apache Commons-IO library, org.apache.commons.io.FilenameUtils#wildcardMatch(), which you can use without the intricacies of regular expressions.

API documentation can be found at: https://commons.apache.org/proper/commons-io/javadocs/api-2.5/org/apache/commons/io/FilenameUtils.html#wildcardMatch(java.lang.String,%20java.lang.String)
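
If adding a dependency is not an option, regex-free matching can also be sketched in plain Java. The following is a hypothetical helper (the class SimpleWildcard and its matches method are made up for this example) and is not the Commons-IO implementation; it assumes * matches any run of characters, including none, and ? matches exactly one character.

```java
public class SimpleWildcard {
    // Hypothetical helper: greedy matching that backtracks to the most recent star.
    static boolean matches(String text, String pattern) {
        int t = 0, p = 0;
        int starP = -1;   // index of the most recent '*' in the pattern
        int starT = -1;   // text index from which that '*' is currently matching
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                starP = p++;          // remember the star, first try matching nothing
                starT = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                t++;
                p++;
            } else if (starP >= 0) {
                p = starP + 1;        // mismatch: let the last star absorb one more character
                t = ++starT;
            } else {
                return false;
            }
        }
        // Only trailing stars may remain in the pattern.
        while (p < pattern.length() && pattern.charAt(p) == '*') p++;
        return p == pattern.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("abcd.wav", "*.wav"));   // true
        System.out.println(matches("(x, a)", "(*, a)"));    // true
        System.out.println(matches("audio.mp3", "*.wav"));  // false
    }
}
```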