How do I create a Stream of regex matches in Java?
Source: http://stackoverflow.com/questions/28148483/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Alfredo Diaz
I am trying to parse standard input and extract every string that matches with a specific pattern, count the number of occurrences of each match, and print the results alphabetically. This problem seems like a good match for the Streams API, but I can't find a concise way to create a stream of matches from a Matcher.
I worked around this problem by implementing an iterator over the matches and wrapping it into a Stream, but the result is not very readable. How can I create a stream of regex matches without introducing additional classes?
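import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

public class PatternCounter
{
    static private class MatcherIterator implements Iterator<String> {
        private final Matcher matcher;
        public MatcherIterator(Matcher matcher) {
            this.matcher = matcher;
        }
        // Advances the matcher, so hasNext() must be called exactly once per next().
        public boolean hasNext() {
            return matcher.find();
        }
        public String next() {
            return matcher.group(0);
        }
    }
    static public void main(String[] args) throws Throwable {
        Pattern pattern = Pattern.compile("[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)");
        new TreeMap<String, Long>(new BufferedReader(new InputStreamReader(System.in))
            .lines().map(line -> {
                Matcher matcher = pattern.matcher(line);
                return StreamSupport.stream(
                    Spliterators.spliteratorUnknownSize(new MatcherIterator(matcher), Spliterator.ORDERED), false);
            }).reduce(Stream.empty(), Stream::concat).collect(groupingBy(o -> o, counting()))
        ).forEach((k, v) -> {
            System.out.printf("%s\t%s\n",k,v);
        });
    }
}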
Answered by Holger
Well, in Java 8, there is Pattern.splitAsStream, which will provide a stream of items split by a delimiter pattern, but unfortunately there is no supporting method for getting a stream of matches.
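To illustrate why Pattern.splitAsStream does not solve the problem, here is a minimal sketch (the class name SplitAsStreamDemo and the sample input are made up for illustration). It shows that the stream contains the text between the matches, while the matches themselves are discarded:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original answer.
final class SplitAsStreamDemo {
    // Splits the input around every match of the given delimiter regex.
    static List<String> split(String regex, String input) {
        return Pattern.compile(regex)
                .splitAsStream(input)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The digit runs are the matches; only the text between them survives.
        System.out.println(split("\\d+", "a1b22c")); // prints [a, b, c]
    }
}
```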
If you are going to implement such a Stream, I recommend implementing Spliterator directly rather than implementing and wrapping an Iterator. You may be more familiar with Iterator, but implementing a simple Spliterator is straightforward:
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.Matcher;

final class MatchItr extends Spliterators.AbstractSpliterator<String> {
    private final Matcher matcher;
    MatchItr(Matcher m) {
        // The region length serves as the size estimate.
        super(m.regionEnd()-m.regionStart(), ORDERED|NONNULL);
        matcher=m;
    }
    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        if(!matcher.find()) return false;
        action.accept(matcher.group());
        return true;
    }
}
You may consider overriding forEachRemaining with a straightforward loop, though.
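The forEachRemaining override mentioned above might look roughly like the following sketch. The class name LoopingMatchItr and the matches helper are assumptions added for illustration; the rest mirrors MatchItr:

```java
import java.util.List;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Hypothetical variant of MatchItr that also overrides forEachRemaining.
final class LoopingMatchItr extends Spliterators.AbstractSpliterator<String> {
    private final Matcher matcher;

    LoopingMatchItr(Matcher m) {
        super(m.regionEnd() - m.regionStart(), ORDERED | NONNULL);
        matcher = m;
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        if (!matcher.find()) return false;
        action.accept(matcher.group());
        return true;
    }

    @Override
    public void forEachRemaining(Consumer<? super String> action) {
        // Drain all remaining matches in one plain loop.
        while (matcher.find()) action.accept(matcher.group());
    }

    // Illustrative helper: collects every match of regex in input.
    static List<String> matches(String regex, CharSequence input) {
        return StreamSupport
                .stream(new LoopingMatchItr(Pattern.compile(regex).matcher(input)), false)
                .collect(Collectors.toList());
    }
}
```

Terminal operations that consume the whole stream can then use the loop directly instead of calling tryAdvance once per element.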
If I understand your attempt correctly, the solution should look more like:
Pattern pattern = Pattern.compile(
    "[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)");
// Note: System.console() may return null when no console is attached (e.g. redirected input).
try(BufferedReader br=new BufferedReader(System.console().reader())) {
    br.lines()
      .flatMap(line -> StreamSupport.stream(new MatchItr(pattern.matcher(line)), false))
      .collect(Collectors.groupingBy(o->o, TreeMap::new, Collectors.counting()))
      .forEach((k, v) -> System.out.printf("%s\t%s\n",k,v));
}
Java 9 provides a method Stream<MatchResult> results() directly on the Matcher. But for finding matches within a stream, there's an even more convenient method on Scanner. With that, the implementation simplifies to
try(Scanner s = new Scanner(System.console().reader())) {
    s.findAll(pattern)
     .collect(Collectors.groupingBy(MatchResult::group,TreeMap::new,Collectors.counting()))
     .forEach((k, v) -> System.out.printf("%s\t%s\n",k,v));
}
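For text that is already in memory, the Matcher.results() method mentioned above can be used in the same way. The following is a minimal sketch that requires Java 9 or later; the class name ResultsDemo, the countMatches helper, and the sample input are assumptions for illustration:

```java
import java.util.TreeMap;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original answer.
final class ResultsDemo {
    // Counts how often each matched string occurs, sorted by key.
    static TreeMap<String, Long> countMatches(Pattern pattern, CharSequence input) {
        return pattern.matcher(input)
                .results() // Stream<MatchResult>, available since Java 9
                .collect(Collectors.groupingBy(
                        MatchResult::group, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // prints {a=2, b=3, c=1}
        System.out.println(countMatches(Pattern.compile("\\w+"), "b a b c a b"));
    }
}
```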
This answer contains a back-port of Scanner.findAll that can be used with Java 8.
Answered by dimo414
Building on Holger's solution, we can support arbitrary Matcher operations (such as getting the nth group) by having the user provide a Function<Matcher, String> operation. We can also hide the Spliterator as an implementation detail, so that callers can just work with the Stream directly. As a rule of thumb, StreamSupport should be used by library code rather than by users.
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class MatcherStream {
    private MatcherStream() {}

    public static Stream<String> find(Pattern pattern, CharSequence input) {
        return findMatches(pattern, input).map(MatchResult::group);
    }

    public static Stream<MatchResult> findMatches(
            Pattern pattern, CharSequence input) {
        Matcher matcher = pattern.matcher(input);
        Spliterator<MatchResult> spliterator = new Spliterators.AbstractSpliterator<MatchResult>(
                Long.MAX_VALUE, Spliterator.ORDERED|Spliterator.NONNULL) {
            @Override
            public boolean tryAdvance(Consumer<? super MatchResult> action) {
                if(!matcher.find()) return false;
                // toMatchResult() takes a snapshot that later find() calls do not affect.
                action.accept(matcher.toMatchResult());
                return true;
            }};
        return StreamSupport.stream(spliterator, false);
    }
}
You can then use it like so:
MatcherStream.find(Pattern.compile("\\w+"), "foo bar baz").forEach(System.out::println);
Or for your specific task (borrowing again from Holger):
try(BufferedReader br = new BufferedReader(System.console().reader())) {
    br.lines()
      .flatMap(line -> MatcherStream.find(pattern, line))
      .collect(Collectors.groupingBy(o->o, TreeMap::new, Collectors.counting()))
      .forEach((k, v) -> System.out.printf("%s\t%s\n", k, v));
}
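The point about extracting the nth group can be sketched with a stream of MatchResult objects. To keep the example compilable on its own, it carries a condensed copy of findMatches; the class name GroupDemo, the keys helper, and the key=value pattern are assumptions for illustration only:

```java
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical demo class, not part of the original answer.
final class GroupDemo {
    // Condensed copy of MatcherStream.findMatches so the sketch stands alone.
    static Stream<MatchResult> findMatches(Pattern pattern, CharSequence input) {
        Matcher matcher = pattern.matcher(input);
        return StreamSupport.stream(
                new Spliterators.AbstractSpliterator<MatchResult>(
                        Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
                    @Override
                    public boolean tryAdvance(Consumer<? super MatchResult> action) {
                        if (!matcher.find()) return false;
                        action.accept(matcher.toMatchResult());
                        return true;
                    }
                }, false);
    }

    // Returns capture group 1 (the key) of every key=value pair in the input.
    static List<String> keys(String input) {
        Pattern keyValue = Pattern.compile("(\\w+)=(\\w+)");
        return findMatches(keyValue, input)
                .map(m -> m.group(1))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(keys("a=1 b=2")); // prints [a, b]
    }
}
```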
Answered by gil.fernandes
If you want to use a Scanner together with regular expressions, you can also use the findWithinHorizon method to turn the matches of a regular expression into a stream of strings. Here we use a stream builder, which is convenient to fill from a conventional while loop.
Here is an example:
private Stream<String> extractRulesFrom(String text, Pattern pattern, int group) {
    Stream.Builder<String> builder = Stream.builder();
    try(Scanner scanner = new Scanner(text)) {
        // A horizon of 0 means the search is not bounded.
        while (scanner.findWithinHorizon(pattern, 0) != null) {
            builder.accept(scanner.match().group(group));
        }
    }
    return builder.build();
}
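A possible way to call this method is sketched below. The wrapper class RuleExtractorDemo, the static modifier, and the name=number sample input are assumptions added so the example runs on its own. Note that, unlike the Spliterator approaches, the builder collects all matches eagerly before the stream is returned:

```java
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical wrapper class, not part of the original answer.
final class RuleExtractorDemo {
    // Same logic as extractRulesFrom, made static for the demo.
    static Stream<String> extractRulesFrom(String text, Pattern pattern, int group) {
        Stream.Builder<String> builder = Stream.builder();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.findWithinHorizon(pattern, 0) != null) {
                builder.accept(scanner.match().group(group));
            }
        }
        return builder.build();
    }

    // Extracts the numeric part (group 2) of every name=number pair.
    static List<String> numbers(String text) {
        return extractRulesFrom(text, Pattern.compile("(\\w+)=(\\d+)"), 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(numbers("a=1, b=22")); // prints [1, 22]
    }
}
```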