How do I create a Stream of regex matches? (Java)

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/28148483/

Date: 2020-11-02 13:05:04 · Source: igfitidea

Tags: java, regex, java-8, java-stream

Asked by Alfredo Diaz

I am trying to parse standard input and extract every string that matches with a specific pattern, count the number of occurrences of each match, and print the results alphabetically. This problem seems like a good match for the Streams API, but I can't find a concise way to create a stream of matches from a Matcher.

I worked around this problem by implementing an iterator over the matches and wrapping it into a Stream, but the result is not very readable. How can I create a stream of regex matches without introducing additional classes?

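import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class PatternCounter
{
    // Iterator over the matches of a Matcher. Note that hasNext() advances the
    // matcher, so each hasNext() call must be followed by exactly one next().
    static private class MatcherIterator implements Iterator<String> {
        private final Matcher matcher;
        public MatcherIterator(Matcher matcher) {
            this.matcher = matcher;
        }
        public boolean hasNext() {
            return matcher.find();
        }
        public String next() {
            return matcher.group(0);
        }
    }

    static public void main(String[] args) throws Throwable {
        // The dot must be escaped twice: once for the string literal, once for the regex.
        Pattern pattern = Pattern.compile("[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)");

        // Build one stream of matches per line, concatenate them, count, and print sorted.
        new TreeMap<String, Long>(new BufferedReader(new InputStreamReader(System.in))
            .lines().map(line -> {
                Matcher matcher = pattern.matcher(line);
                return StreamSupport.stream(
                        Spliterators.spliteratorUnknownSize(new MatcherIterator(matcher), Spliterator.ORDERED), false);
            }).reduce(Stream.empty(), Stream::concat).collect(groupingBy(o -> o, counting()))
        ).forEach((k, v) -> {
            System.out.printf("%s\t%s\n",k,v);
        });
    }
}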

Answered by Holger

Well, in Java 8, there is Pattern.splitAsStream, which will provide a stream of items split by a delimiter pattern, but unfortunately no support method for getting a stream of matches.

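
To make the distinction concrete, here is a minimal sketch of what Pattern.splitAsStream does: it streams the pieces between delimiter matches, not the matches themselves. The class name SplitDemo, the method name splitOnCommas, and the comma pattern are illustrative assumptions, not part of the original answer.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class, named here only for illustration.
class SplitDemo {
    // The pattern describes the delimiter (a comma with optional whitespace),
    // so the stream contains the text between the delimiters.
    static List<String> splitOnCommas(CharSequence input) {
        return Pattern.compile("\\s*,\\s*")
                .splitAsStream(input)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(splitOnCommas("a, b ,c")); // [a, b, c]
    }
}
```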

If you are going to implement such a Stream, I recommend implementing Spliterator directly rather than implementing and wrapping an Iterator. You may be more familiar with Iterator, but implementing a simple Spliterator is straightforward:

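import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.Matcher;

// Spliterator that emits each match of the given Matcher as a String.
final class MatchItr extends Spliterators.AbstractSpliterator<String> {
    private final Matcher matcher;
    MatchItr(Matcher m) {
        // The region length serves as a rough size estimate.
        super(m.regionEnd()-m.regionStart(), ORDERED|NONNULL);
        matcher=m;
    }
    public boolean tryAdvance(Consumer<? super String> action) {
        if(!matcher.find()) return false;
        action.accept(matcher.group());
        return true;
    }
}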

You may consider overriding forEachRemaining with a straightforward loop, though.

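
A minimal sketch of what such an override could look like, assuming the same structure as MatchItr. The class name LoopingMatchItr and the helper method matches are hypothetical and exist only so the example can be run on its own.

```java
import java.util.List;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Hypothetical variant of MatchItr that also overrides forEachRemaining.
final class LoopingMatchItr extends Spliterators.AbstractSpliterator<String> {
    private final Matcher matcher;

    LoopingMatchItr(Matcher m) {
        super(m.regionEnd() - m.regionStart(), ORDERED | NONNULL);
        matcher = m;
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        if (!matcher.find()) return false;
        action.accept(matcher.group());
        return true;
    }

    // Bulk traversal: a plain loop instead of repeated tryAdvance calls.
    @Override
    public void forEachRemaining(Consumer<? super String> action) {
        while (matcher.find()) {
            action.accept(matcher.group());
        }
    }

    // Illustrative helper: collect all matches of pattern in input.
    static List<String> matches(Pattern pattern, CharSequence input) {
        return StreamSupport.stream(new LoopingMatchItr(pattern.matcher(input)), false)
                .collect(Collectors.toList());
    }
}
```

Terminal operations that consume the whole stream typically go through forEachRemaining, so the loop avoids one tryAdvance call per element.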



If I understand your attempt correctly, the solution should look more like:

Pattern pattern = Pattern.compile(
                 "[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)");

try(BufferedReader br=new BufferedReader(System.console().reader())) {

    br.lines()
      .flatMap(line -> StreamSupport.stream(new MatchItr(pattern.matcher(line)), false))
      .collect(Collectors.groupingBy(o->o, TreeMap::new, Collectors.counting()))
      .forEach((k, v) -> System.out.printf("%s\t%s\n",k,v));
}


Java 9 provides a method Stream<MatchResult> results() directly on the Matcher. But for finding matches within a stream, there's an even more convenient method on Scanner. With that, the implementation simplifies to:

try(Scanner s = new Scanner(System.console().reader())) {
    s.findAll(pattern)
     .collect(Collectors.groupingBy(MatchResult::group,TreeMap::new,Collectors.counting()))
     .forEach((k, v) -> System.out.printf("%s\t%s\n",k,v));
}
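
For input that is already in memory, the Matcher.results() method mentioned above can be used directly. The following is a minimal sketch that requires Java 9 or later; the class name ResultsDemo, the method name countMatches, and the sample pattern are assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class, named here only for illustration.
class ResultsDemo {
    // Count how often each match occurs, sorted by the matched text.
    static Map<String, Long> countMatches(Pattern pattern, CharSequence input) {
        return pattern.matcher(input)
                .results() // Stream<MatchResult>, available since Java 9
                .collect(Collectors.groupingBy(
                        MatchResult::group, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countMatches(Pattern.compile("\\w+"), "b a b c a b"));
        // {a=2, b=3, c=1}
    }
}
```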

This answer contains a back-port of Scanner.findAll that can be used with Java 8.

Answered by dimo414

Going off of Holger's solution, we can support arbitrary Matcher operations (such as getting the nth group) by having the user provide a Function<Matcher, String> operation. We can also hide the Spliterator as an implementation detail, so that callers can just work with the Stream directly. As a rule of thumb, StreamSupport should be used by library code, rather than users.

public class MatcherStream {
  private MatcherStream() {}

  public static Stream<String> find(Pattern pattern, CharSequence input) {
    return findMatches(pattern, input).map(MatchResult::group);
  }

  public static Stream<MatchResult> findMatches(
      Pattern pattern, CharSequence input) {
    Matcher matcher = pattern.matcher(input);

    Spliterator<MatchResult> spliterator = new Spliterators.AbstractSpliterator<MatchResult>(
        Long.MAX_VALUE, Spliterator.ORDERED|Spliterator.NONNULL) {
      @Override
      public boolean tryAdvance(Consumer<? super MatchResult> action) {
        if(!matcher.find()) return false;
        action.accept(matcher.toMatchResult());
        return true;
      }};

    return StreamSupport.stream(spliterator, false);
  }
}

You can then use it like so:

MatcherStream.find(Pattern.compile("\\w+"), "foo bar baz").forEach(System.out::println);

Or for your specific task (borrowing again from Holger):

try(BufferedReader br = new BufferedReader(System.console().reader())) {
  br.lines()
    .flatMap(line -> MatcherStream.find(pattern, line))
    .collect(Collectors.groupingBy(o->o, TreeMap::new, Collectors.counting()))
    .forEach((k, v) -> System.out.printf("%s\t%s\n", k, v));
}
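
The Function<Matcher, String> variant mentioned at the start of this answer is not shown in the code above. A minimal sketch of how it might look follows; the class name MatcherFunctionStream and the parameter name op are hypothetical. The function is applied inside tryAdvance, because the Matcher is mutable and only reflects the current match at that moment. NONNULL is left out because group(n) can return null for a group that did not participate in the match.

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper class, named here only for illustration.
final class MatcherFunctionStream {
    private MatcherFunctionStream() {}

    // Streams op.apply(matcher) for every match of pattern in input.
    static Stream<String> find(Pattern pattern, CharSequence input,
                               Function<Matcher, String> op) {
        Matcher matcher = pattern.matcher(input);
        Spliterator<String> spliterator = new Spliterators.AbstractSpliterator<String>(
                Long.MAX_VALUE, Spliterator.ORDERED) {
            @Override
            public boolean tryAdvance(Consumer<? super String> action) {
                if (!matcher.find()) return false;
                // Apply the operation now, while the matcher points at this match.
                action.accept(op.apply(matcher));
                return true;
            }
        };
        return StreamSupport.stream(spliterator, false);
    }

    public static void main(String[] args) {
        // Extract the second capture group of every key=value pair.
        find(Pattern.compile("(\\w+)=(\\d+)"), "a=1, b=22", m -> m.group(2))
                .forEach(System.out::println);
    }
}
```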

Answered by gil.fernandes

If you want to use a Scanner together with regular expressions via the findWithinHorizon method, you can also turn the matches of a regular expression into a stream of strings. Here we use a stream builder, which is convenient to fill from a conventional while loop.

Here is an example:

private Stream<String> extractRulesFrom(String text, Pattern pattern, int group) {
    Stream.Builder<String> builder = Stream.builder();
    try(Scanner scanner = new Scanner(text)) {
        while (scanner.findWithinHorizon(pattern, 0) != null) {
            builder.accept(scanner.match().group(group));
        }
    }
    return builder.build();
}
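
For completeness, here is a self-contained sketch of how the method above might be called. The class name HorizonDemo, the method name extractGroup, and the key=value pattern are assumptions for illustration. Note that this approach is eager: all matches are collected into the builder before the stream is returned.

```java
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, named here only for illustration.
class HorizonDemo {
    // Same logic as extractRulesFrom above, made static so it can be called directly.
    static Stream<String> extractGroup(String text, Pattern pattern, int group) {
        Stream.Builder<String> builder = Stream.builder();
        try (Scanner scanner = new Scanner(text)) {
            // A horizon of 0 means the search is not bounded.
            while (scanner.findWithinHorizon(pattern, 0) != null) {
                builder.accept(scanner.match().group(group));
            }
        }
        return builder.build();
    }

    public static void main(String[] args) {
        Pattern keyValue = Pattern.compile("(\\w+)=(\\w+)");
        List<String> values = extractGroup("k1=v1;k2=v2", keyValue, 2)
                .collect(Collectors.toList());
        System.out.println(values); // [v1, v2]
    }
}
```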