“glob”类型模式是否有相当于 java.util.regex 的东西?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1247772/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-12 07:26:16  来源:igfitidea点击:

Is there an equivalent of java.util.regex for "glob" type patterns?

javaglob

提问by Paul Tomblin

Is there a standard (preferably Apache Commons or similarly non-viral) library for doing "glob" type matches in Java? When I had to do similar in Perl once, I just changed all the "." to "\.", the "*" to ".*" and the "?" to "." and that sort of thing, but I'm wondering if somebody has done the work for me.

是否有标准(最好是 Apache Commons 或类似的非病毒)库用于在 Java 中进行“glob”类型匹配?当我不得不在 Perl 中做类似的事情时,我只是将所有的“ .”改为“ \.”,“ *”改为“ .*”,“ ?”改为“ .”之类的,但我想知道是否有人做过为我工作。

Similar question: Create regex from glob expression

类似问题:从 glob 表达式创建正则表达式

采纳答案by Dave Ray

There's nothing built-in, but it's pretty simple to convert something glob-like to a regex:

没有内置任何东西,但是将类似 glob 的东西转换为正则表达式非常简单:

public static String createRegexFromGlob(String glob)
{
    String out = "^";
    for(int i = 0; i < glob.length(); ++i)
    {
        final char c = glob.charAt(i);
        switch(c)
        {
        case '*': out += ".*"; break;
        case '?': out += '.'; break;
        case '.': out += "\."; break;
        case '\': out += "\\"; break;
        default: out += c;
        }
    }
    out += '$';
    return out;
}

this works for me, but I'm not sure if it covers the glob "standard", if there is one :)

这对我有用,但我不确定它是否涵盖了全局“标准”,如果有的话:)

Update by Paul Tomblin: I found a perl program that does glob conversion, and adapting it to Java I end up with:

Paul Tomblin 的更新:我找到了一个执行 glob 转换的 perl 程序,并将其调整为 Java 我最终得到:

    private String convertGlobToRegEx(String line)
    {
    LOG.info("got line [" + line + "]");
    line = line.trim();
    int strLen = line.length();
    StringBuilder sb = new StringBuilder(strLen);
    // Remove beginning and ending * globs because they're useless
    if (line.startsWith("*"))
    {
        line = line.substring(1);
        strLen--;
    }
    if (line.endsWith("*"))
    {
        line = line.substring(0, strLen-1);
        strLen--;
    }
    boolean escaping = false;
    int inCurlies = 0;
    for (char currentChar : line.toCharArray())
    {
        switch (currentChar)
        {
        case '*':
            if (escaping)
                sb.append("\*");
            else
                sb.append(".*");
            escaping = false;
            break;
        case '?':
            if (escaping)
                sb.append("\?");
            else
                sb.append('.');
            escaping = false;
            break;
        case '.':
        case '(':
        case ')':
        case '+':
        case '|':
        case '^':
        case '$':
        case '@':
        case '%':
            sb.append('\');
            sb.append(currentChar);
            escaping = false;
            break;
        case '\':
            if (escaping)
            {
                sb.append("\\");
                escaping = false;
            }
            else
                escaping = true;
            break;
        case '{':
            if (escaping)
            {
                sb.append("\{");
            }
            else
            {
                sb.append('(');
                inCurlies++;
            }
            escaping = false;
            break;
        case '}':
            if (inCurlies > 0 && !escaping)
            {
                sb.append(')');
                inCurlies--;
            }
            else if (escaping)
                sb.append("\}");
            else
                sb.append("}");
            escaping = false;
            break;
        case ',':
            if (inCurlies > 0 && !escaping)
            {
                sb.append('|');
            }
            else if (escaping)
                sb.append("\,");
            else
                sb.append(",");
            break;
        default:
            escaping = false;
            sb.append(currentChar);
        }
    }
    return sb.toString();
}

I'm editing into this answer rather than making my own because this answer put me on the right track.

我正在编辑这个答案,而不是自己制作,因为这个答案让我走上了正确的轨道。

回答by Greg Mattes

I don't know about a "standard" implementation, but I know of a sourceforge project released under the BSD license that implemented glob matching for files. It's implemented in one file, maybe you can adapt it for your requirements.

我不知道“标准”实现,但我知道在 BSD 许可证下发布的 sourceforge 项目实现了文件的全局匹配。它在一个文件中实现,也许您可​​以根据您的要求进行调整。

回答by finnw

Globbing is also planned forimplemented in Java 7.

Globbing也计划在 Java 7 中实现。

See FileSystem.getPathMatcher(String)and the "Finding Files" tutorial.

FileSystem.getPathMatcher(String)了“查找文件”教程

回答by finnw

By the way, it seems as if you did it the hard way in Perl

顺便说一句,好像你在 Perl 中很难做到

This does the trick in Perl:

这在 Perl 中很有效:

my @files = glob("*.html")
# Or, if you prefer:
my @files = <*.html> 

回答by Tony Edgecombe

This is a simple Glob implementation which handles * and ? in the pattern

这是一个简单的 Glob 实现,它处理 * 和 ? 在模式中

public class GlobMatch {
    private String text;
    private String pattern;

    public boolean match(String text, String pattern) {
        this.text = text;
        this.pattern = pattern;

        return matchCharacter(0, 0);
    }

    private boolean matchCharacter(int patternIndex, int textIndex) {
        if (patternIndex >= pattern.length()) {
            return false;
        }

        switch(pattern.charAt(patternIndex)) {
            case '?':
                // Match any character
                if (textIndex >= text.length()) {
                    return false;
                }
                break;

            case '*':
                // * at the end of the pattern will match anything
                if (patternIndex + 1 >= pattern.length() || textIndex >= text.length()) {
                    return true;
                }

                // Probe forward to see if we can get a match
                while (textIndex < text.length()) {
                    if (matchCharacter(patternIndex + 1, textIndex)) {
                        return true;
                    }
                    textIndex++;
                }

                return false;

            default:
                if (textIndex >= text.length()) {
                    return false;
                }

                String textChar = text.substring(textIndex, textIndex + 1);
                String patternChar = pattern.substring(patternIndex, patternIndex + 1);

                // Note the match is case insensitive
                if (textChar.compareToIgnoreCase(patternChar) != 0) {
                    return false;
                }
        }

        // End of pattern and text?
        if (patternIndex + 1 >= pattern.length() && textIndex + 1 >= text.length()) {
            return true;
        }

        // Go on to match the next character in the pattern
        return matchCharacter(patternIndex + 1, textIndex + 1);
    }
}

回答by bobah

Long ago I was doing a massive glob-driven text filtering so I've written a small piece of code (15 lines of code, no dependencies beyond JDK). It handles only '*' (was sufficient for me), but can be easily extended for '?'. It is several times faster than pre-compiled regexp, does not require any pre-compilation (essentially it is a string-vs-string comparison every time the pattern is matched).

很久以前,我正在做一个大规模的 glob 驱动的文本过滤,所以我写了一小段代码(15 行代码,没有超出 JDK 的依赖项)。它只处理 '*'(对我来说已经足够了),但可以很容易地扩展为 '?'。它比预编译的正则表达式快几倍,不需要任何预编译(本质上它是每次模式匹配时的字符串与字符串比较)。

Code:

代码:

  public static boolean miniglob(String[] pattern, String line) {
    if (pattern.length == 0) return line.isEmpty();
    else if (pattern.length == 1) return line.equals(pattern[0]);
    else {
      if (!line.startsWith(pattern[0])) return false;
      int idx = pattern[0].length();
      for (int i = 1; i < pattern.length - 1; ++i) {
        String patternTok = pattern[i];
        int nextIdx = line.indexOf(patternTok, idx);
        if (nextIdx < 0) return false;
        else idx = nextIdx + patternTok.length();
      }
      if (!line.endsWith(pattern[pattern.length - 1])) return false;
      return true;
    }
  }

Usage:

用法:

  public static void main(String[] args) {
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    try {
      // read from stdin space separated text and pattern
      for (String input = in.readLine(); input != null; input = in.readLine()) {
        String[] tokens = input.split(" ");
        String line = tokens[0];
        String[] pattern = tokens[1].split("\*+", -1 /* want empty trailing token if any */);

        // check matcher performance
        long tm0 = System.currentTimeMillis();
        for (int i = 0; i < 1000000; ++i) {
          miniglob(pattern, line);
        }
        long tm1 = System.currentTimeMillis();
        System.out.println("miniglob took " + (tm1-tm0) + " ms");

        // check regexp performance
        Pattern reptn = Pattern.compile(tokens[1].replace("*", ".*"));
        Matcher mtchr = reptn.matcher(line);
        tm0 = System.currentTimeMillis();
        for (int i = 0; i < 1000000; ++i) {
          mtchr.matches();
        }
        tm1 = System.currentTimeMillis();
        System.out.println("regexp took " + (tm1-tm0) + " ms");

        // check if miniglob worked correctly
        if (miniglob(pattern, line)) {
          System.out.println("+ >" + line);
        }
        else {
          System.out.println("- >" + line);
        }
      }
    } catch (IOException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    }
  }

Copy/paste from here

这里复制/粘贴

回答by Vincent Robert

I recently had to do it and used \Qand \Eto escape the glob pattern:

我最近不得不这样做并使用\Q\E逃避 glob 模式:

private static Pattern getPatternFromGlob(String glob) {
  return Pattern.compile(
    "^" + Pattern.quote(glob)
            .replace("*", "\E.*\Q")
            .replace("?", "\E.\Q") 
    + "$");
}

回答by mihi

Similar to Tony Edgecombe's answer, here is a short and simple globber that supports *and ?without using regex, if anybody needs one.

Tony Edgecombe回答类似,如果有人需要,这里有一个简短而简单的 globber 支持*并且?不使用正则表达式。

public static boolean matches(String text, String glob) {
    String rest = null;
    int pos = glob.indexOf('*');
    if (pos != -1) {
        rest = glob.substring(pos + 1);
        glob = glob.substring(0, pos);
    }

    if (glob.length() > text.length())
        return false;

    // handle the part up to the first *
    for (int i = 0; i < glob.length(); i++)
        if (glob.charAt(i) != '?' 
                && !glob.substring(i, i + 1).equalsIgnoreCase(text.substring(i, i + 1)))
            return false;

    // recurse for the part after the first *, if any
    if (rest == null) {
        return glob.length() == text.length();
    } else {
        for (int i = glob.length(); i <= text.length(); i++) {
            if (matches(text.substring(i), rest))
                return true;
        }
        return false;
    }
}

回答by Adam Gent

There are couple of libraries that do Glob-like pattern matching that are more modern than the ones listed:

有几个库可以进行类似 Glob 的模式匹配,它们比列出的更现代:

Theres Ants Directory ScannerAnd Springs AntPathMatcher

有 Ants Directory Scanner和 Springs AntPathMatcher

I recommend both over the other solutions since Ant Style Globbing has pretty much become the standard glob syntax in the Java world(Hudson, Spring, Ant and I think Maven).

我推荐这两种解决方案,因为Ant Style Globbing 几乎已成为 Java 世界中的标准 glob 语法(Hudson、Spring、Ant 和我认为是 Maven)。

回答by Neil Traft

Thanks to everyone here for their contributions. I wrote a more comprehensive conversion than any of the previous answers:

感谢这里的每一个人的贡献。我写了一个比之前任何一个答案都更全面的转换:

/**
 * Converts a standard POSIX Shell globbing pattern into a regular expression
 * pattern. The result can be used with the standard {@link java.util.regex} API to
 * recognize strings which match the glob pattern.
 * <p/>
 * See also, the POSIX Shell language:
 * http://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_13_01
 * 
 * @param pattern A glob pattern.
 * @return A regex pattern to recognize the given glob pattern.
 */
public static final String convertGlobToRegex(String pattern) {
    StringBuilder sb = new StringBuilder(pattern.length());
    int inGroup = 0;
    int inClass = 0;
    int firstIndexInClass = -1;
    char[] arr = pattern.toCharArray();
    for (int i = 0; i < arr.length; i++) {
        char ch = arr[i];
        switch (ch) {
            case '\':
                if (++i >= arr.length) {
                    sb.append('\');
                } else {
                    char next = arr[i];
                    switch (next) {
                        case ',':
                            // escape not needed
                            break;
                        case 'Q':
                        case 'E':
                            // extra escape needed
                            sb.append('\');
                        default:
                            sb.append('\');
                    }
                    sb.append(next);
                }
                break;
            case '*':
                if (inClass == 0)
                    sb.append(".*");
                else
                    sb.append('*');
                break;
            case '?':
                if (inClass == 0)
                    sb.append('.');
                else
                    sb.append('?');
                break;
            case '[':
                inClass++;
                firstIndexInClass = i+1;
                sb.append('[');
                break;
            case ']':
                inClass--;
                sb.append(']');
                break;
            case '.':
            case '(':
            case ')':
            case '+':
            case '|':
            case '^':
            case '$':
            case '@':
            case '%':
                if (inClass == 0 || (firstIndexInClass == i && ch == '^'))
                    sb.append('\');
                sb.append(ch);
                break;
            case '!':
                if (firstIndexInClass == i)
                    sb.append('^');
                else
                    sb.append('!');
                break;
            case '{':
                inGroup++;
                sb.append('(');
                break;
            case '}':
                inGroup--;
                sb.append(')');
                break;
            case ',':
                if (inGroup > 0)
                    sb.append('|');
                else
                    sb.append(',');
                break;
            default:
                sb.append(ch);
        }
    }
    return sb.toString();
}

And the unit tests to prove it works:

并通过单元测试来证明它有效:

/**
 * @author Neil Traft
 */
public class StringUtils_ConvertGlobToRegex_Test {

    @Test
    public void star_becomes_dot_star() throws Exception {
        assertEquals("gl.*b", StringUtils.convertGlobToRegex("gl*b"));
    }

    @Test
    public void escaped_star_is_unchanged() throws Exception {
        assertEquals("gl\*b", StringUtils.convertGlobToRegex("gl\*b"));
    }

    @Test
    public void question_mark_becomes_dot() throws Exception {
        assertEquals("gl.b", StringUtils.convertGlobToRegex("gl?b"));
    }

    @Test
    public void escaped_question_mark_is_unchanged() throws Exception {
        assertEquals("gl\?b", StringUtils.convertGlobToRegex("gl\?b"));
    }

    @Test
    public void character_classes_dont_need_conversion() throws Exception {
        assertEquals("gl[-o]b", StringUtils.convertGlobToRegex("gl[-o]b"));
    }

    @Test
    public void escaped_classes_are_unchanged() throws Exception {
        assertEquals("gl\[-o\]b", StringUtils.convertGlobToRegex("gl\[-o\]b"));
    }

    @Test
    public void negation_in_character_classes() throws Exception {
        assertEquals("gl[^a-n!p-z]b", StringUtils.convertGlobToRegex("gl[!a-n!p-z]b"));
    }

    @Test
    public void nested_negation_in_character_classes() throws Exception {
        assertEquals("gl[[^a-n]!p-z]b", StringUtils.convertGlobToRegex("gl[[!a-n]!p-z]b"));
    }

    @Test
    public void escape_carat_if_it_is_the_first_char_in_a_character_class() throws Exception {
        assertEquals("gl[\^o]b", StringUtils.convertGlobToRegex("gl[^o]b"));
    }

    @Test
    public void metachars_are_escaped() throws Exception {
        assertEquals("gl..*\.\(\)\+\|\^\$\@\%b", StringUtils.convertGlobToRegex("gl?*.()+|^$@%b"));
    }

    @Test
    public void metachars_in_character_classes_dont_need_escaping() throws Exception {
        assertEquals("gl[?*.()+|^$@%]b", StringUtils.convertGlobToRegex("gl[?*.()+|^$@%]b"));
    }

    @Test
    public void escaped_backslash_is_unchanged() throws Exception {
        assertEquals("gl\\b", StringUtils.convertGlobToRegex("gl\\b"));
    }

    @Test
    public void slashQ_and_slashE_are_escaped() throws Exception {
        assertEquals("\\Qglob\\E", StringUtils.convertGlobToRegex("\Qglob\E"));
    }

    @Test
    public void braces_are_turned_into_groups() throws Exception {
        assertEquals("(glob|regex)", StringUtils.convertGlobToRegex("{glob,regex}"));
    }

    @Test
    public void escaped_braces_are_unchanged() throws Exception {
        assertEquals("\{glob\}", StringUtils.convertGlobToRegex("\{glob\}"));
    }

    @Test
    public void commas_dont_need_escaping() throws Exception {
        assertEquals("(glob,regex),", StringUtils.convertGlobToRegex("{glob\,regex},"));
    }

}