Java: How to split a String into a Stream of Strings?

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/40932813/

Time: 2020-08-11 23:01:04  Source: igfitidea

How to split a String into a Stream of Strings?

Tags: java, regex, split, java-stream

Asked by slartidan

What is the best method of splitting a String into a Stream?

I saw these variations:

  1. Arrays.stream("b,l,a".split(","))
  2. Stream.of("b,l,a".split(","))
  3. Pattern.compile(",").splitAsStream("b,l,a")

My priorities are:

  • Robustness
  • Readability
  • Performance

A complete, compilable example:

import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class HelloWorld {

    public static void main(String[] args) {
        stream1().forEach(System.out::println);
        stream2().forEach(System.out::println);
        stream3().forEach(System.out::println);
    }

    private static Stream<String> stream1() {
        return Arrays.stream("b,l,a".split(","));
    }

    private static Stream<String> stream2() {
        return Stream.of("b,l,a".split(","));
    }

    private static Stream<String> stream3() {
        return Pattern.compile(",").splitAsStream("b,l,a");
    }

}

Accepted answer by Holger

Arrays.stream/String.split

Since String.split returns an array String[], I always recommend Arrays.stream as the canonical idiom for streaming over an array.

String input = "dog,cat,bird";
Stream<String> stream = Arrays.stream(input.split( "," ));
stream.forEach(System.out::println);

Stream.of/String.split

Stream.of is a varargs method which just happens to accept an array, because varargs methods are implemented via arrays and there were compatibility concerns when varargs were introduced to Java and existing methods were retrofitted to accept variable arguments.

Stream<String> fromArray = Stream.of(input.split(","));       // works, but is non-idiomatic
Stream<String> fromVarargs = Stream.of("dog", "cat", "bird"); // intended use case
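
One practical difference that is often cited in favour of Arrays.stream (it is not stated in the answer itself) shows up with primitive arrays: Arrays.stream has primitive overloads, while Stream.of treats a whole int[] as a single element. The following is a minimal sketch; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.stream.Stream;

class ArrayStreamDemo {

    // Arrays.stream(int[]) returns an IntStream over the individual elements.
    static long countViaArraysStream(int[] values) {
        return Arrays.stream(values).count();
    }

    // Stream.of(T...) cannot bind T to the primitive int, so the call resolves
    // to Stream.of(T) with T = int[]: the array itself is the only element.
    static long countViaStreamOf(int[] values) {
        Stream<int[]> wrapped = Stream.of(values);
        return wrapped.count();
    }

    // For reference arrays such as String[], Stream.of streams the elements.
    static long countStrings(String input) {
        return Stream.of(input.split(",")).count();
    }

    public static void main(String[] args) {
        int[] numbers = {1, 2, 3};
        System.out.println(countViaArraysStream(numbers));   // 3
        System.out.println(countViaStreamOf(numbers));       // 1
        System.out.println(countStrings("dog,cat,bird"));    // 3
    }
}
```

For a String[] returned by String.split both calls behave the same, so this only matters as a reason to use one idiom consistently.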

Pattern.splitAsStream

Pattern.compile(",").splitAsStream(string) has the advantage of streaming directly rather than creating an intermediate array. So for a large number of sub-strings, this can have a performance benefit. On the other hand, if the delimiter is trivial, i.e. a single literal character, the String.split implementation will go through a fast path instead of using the regex engine. So in this case, the answer is not trivial.

Stream<String> stream = Pattern.compile(",").splitAsStream(input);
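
As a small illustration of the point above, the sketch below checks that both approaches yield the same tokens for a simple comma-separated input, and shows a case where only the first tokens are needed, so the stream does not have to be backed by the complete token array. The class name, method names, and the cached COMMA constant are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SplitAsStreamDemo {

    // Compile once and reuse; Pattern instances are immutable and thread-safe.
    private static final Pattern COMMA = Pattern.compile(",");

    // Variant with an intermediate String[] created by String.split.
    static List<String> viaArray(String input) {
        return Arrays.stream(input.split(",")).collect(Collectors.toList());
    }

    // Variant that streams the tokens directly from the pattern.
    static List<String> viaPattern(String input) {
        return COMMA.splitAsStream(input).collect(Collectors.toList());
    }

    // Only the first two tokens are requested from the stream.
    static List<String> firstTwo(String input) {
        return COMMA.splitAsStream(input).limit(2).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "dog,cat,bird";
        System.out.println(viaArray(input));   // [dog, cat, bird]
        System.out.println(viaPattern(input)); // [dog, cat, bird]
        System.out.println(firstTwo(input));   // [dog, cat]
    }
}
```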

If the streaming happens inside another stream, e.g. .flatMap(Pattern.compile(pattern)::splitAsStream), there is the advantage that the pattern has to be analyzed only once, rather than for every string of the outer stream.

Stream<String> stream = Stream.of("a,b", "c,d,e", "f", "g,h,i,j")
    .flatMap(Pattern.compile(",")::splitAsStream);

This is a property of method references of the form expression::name, which will evaluate the expression and capture the result when creating the instance of the functional interface, as explained in "What is the equivalent lambda expression for System.out::println" and "java.lang.NullPointerException is thrown using a method-reference but not a lambda expression".
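
The difference between a bound method reference and the seemingly equivalent lambda can be made visible by counting how often the pattern is compiled. This is a sketch only: countingCompile is a hypothetical helper that wraps Pattern.compile, and the class and method names are invented for the demonstration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MethodRefCaptureDemo {

    static final AtomicInteger COMPILE_COUNT = new AtomicInteger();

    // Hypothetical helper: compiles the regex and records each invocation.
    static Pattern countingCompile(String regex) {
        COMPILE_COUNT.incrementAndGet();
        return Pattern.compile(regex);
    }

    // expression::name evaluates the expression once, when the function is created.
    static int compilesWithMethodRef() {
        COMPILE_COUNT.set(0);
        Stream.of("a,b", "c,d,e", "f")
              .flatMap(countingCompile(",")::splitAsStream)
              .collect(Collectors.toList());
        return COMPILE_COUNT.get();
    }

    // The lambda body runs for every element, so the pattern is compiled each time.
    static int compilesWithLambda() {
        COMPILE_COUNT.set(0);
        Stream.of("a,b", "c,d,e", "f")
              .flatMap(s -> countingCompile(",").splitAsStream(s))
              .collect(Collectors.toList());
        return COMPILE_COUNT.get();
    }

    public static void main(String[] args) {
        System.out.println(compilesWithMethodRef()); // 1
        System.out.println(compilesWithLambda());    // 3
    }
}
```

A lambda can get the same effect by compiling the pattern into a local variable before the stream pipeline and referring to that variable in the lambda body.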

Answer by Alexey Soshin

Regarding (1) and (2), there shouldn't be much difference, as your code is almost the same.
Regarding (3), that would be much more efficient in terms of memory (not necessarily CPU), but in my opinion, a bit harder to read.

Answer by Stephen C

Robustness

I can see no difference in the robustness of the three approaches.

Readability

I am not aware of any credible scientific studies on code readability involving experienced Java programmers, so readability is a matter of opinion. Even then, you never know if someone giving their opinion is making an objective distinction between actual readability, what they have been taught about readability, and their own personal taste.

So I will leave it to you to make your own judgements on readability ... noting that you do consider this to be a high priority.

FWIW, the only people whose opinions matter on this are you and your team.

Performance

I think that the answer to that is to carefully benchmark the three alternatives. Holger provides an analysis based on his study of some versions of Java. But:

  1. He was not able to come to a definite conclusion on which was fastest.
  2. Strictly speaking, his analysis only applies to the versions of Java he looked at. (Some aspects of his analysis could be different on (say) Android Java, or some future Oracle / OpenJDK version.)
  3. The relative performance is likely to depend on the length of the string being split, the number of fields, and the complexity of the separator regex.
  4. In a real application, the relative performance may also depend on what you do with the Stream object, what garbage collector you have selected (since the different versions apparently generate different amounts of garbage), and other issues.

So if you (or anyone else) are really concerned with the performance, you should write a micro-benchmark and run it on your production platform(s). Then do some application-specific benchmarking. And you should consider looking at solutions that don't involve streams.
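
To give a rough idea of what such a comparison could look like, here is a deliberately crude timing sketch using System.nanoTime. It is not a substitute for a proper harness such as JMH, and its numbers should not be trusted beyond a first impression. The input generator, the field count, the number of rounds, and all names are assumptions chosen for illustration; a plain loop is included as a non-stream baseline.

```java
import java.util.Arrays;
import java.util.function.ToIntFunction;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class SplitTimingSketch {

    private static final Pattern COMMA = Pattern.compile(",");

    // Hypothetical test data of the form "0,1,2,...,fields-1".
    static String makeInput(int fields) {
        return IntStream.range(0, fields)
                .mapToObj(Integer::toString)
                .collect(Collectors.joining(","));
    }

    // Each variant sums the token lengths so that every token is consumed.
    static int viaArraysStream(String s) {
        return Arrays.stream(s.split(",")).mapToInt(String::length).sum();
    }

    static int viaStreamOf(String s) {
        return Stream.of(s.split(",")).mapToInt(String::length).sum();
    }

    static int viaPattern(String s) {
        return COMMA.splitAsStream(s).mapToInt(String::length).sum();
    }

    // Non-stream baseline: plain loop over the array from String.split.
    static int viaLoop(String s) {
        int total = 0;
        for (String token : s.split(",")) {
            total += token.length();
        }
        return total;
    }

    // Crude wall-clock timing in nanoseconds for the given number of rounds.
    static long time(ToIntFunction<String> variant, String input, int rounds) {
        long checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            checksum += variant.applyAsInt(input);
        }
        long elapsed = System.nanoTime() - start;
        // Use the checksum so the work cannot be trivially discarded.
        if (checksum < 0) {
            throw new IllegalStateException("unexpected checksum");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String input = makeInput(1_000);
        int rounds = 2_000;
        // Pass 0 is a warm-up for the JIT; only pass 1 is reported.
        for (int pass = 0; pass < 2; pass++) {
            long t1 = time(SplitTimingSketch::viaArraysStream, input, rounds);
            long t2 = time(SplitTimingSketch::viaStreamOf, input, rounds);
            long t3 = time(SplitTimingSketch::viaPattern, input, rounds);
            long t4 = time(SplitTimingSketch::viaLoop, input, rounds);
            if (pass == 1) {
                System.out.println("Arrays.stream: " + t1 + " ns");
                System.out.println("Stream.of:     " + t2 + " ns");
                System.out.println("splitAsStream: " + t3 + " ns");
                System.out.println("plain loop:    " + t4 + " ns");
            }
        }
    }
}
```

Varying the number of fields, the token length, and the separator regex in such a sketch is one way to see how sensitive the ranking is to the factors listed above.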