Java: Should I return a Collection or a Stream?

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/24676877/

Time: 2020-08-14 14:03:25  Source: igfitidea

Should I return a Collection or a Stream?

Tags: java, collections, java-8, encapsulation, java-stream

Asked by fredoverflow

Suppose I have a method that returns a read-only view into a member list:

class Team {
    private List<Player> players = new ArrayList<>();

    // ...

    public List<Player> getPlayers() {
        return Collections.unmodifiableList(players);
    }
}

Further suppose that all the client does is iterate over the list once, immediately. Maybe to put the players into a JList or something. The client does not store a reference to the list for later inspection!

Given this common scenario, should I return a stream instead?

public Stream<Player> getPlayers() {
    return players.stream();
}

Or is returning a stream non-idiomatic in Java? Were streams designed to always be "terminated" inside the same expression they were created in?

Accepted answer by Brian Goetz

The answer is, as always, "it depends". It depends on how big the returned collection will be. It depends on whether the result changes over time, and how important consistency of the returned result is. And it depends very much on how the user is likely to use the answer.

First, note that you can always get a Collection from a Stream, and vice versa:

// If API returns Collection, convert with stream()
getFoo().stream()...

// If API returns Stream, use collect()
Collection<T> c = getFooStream().collect(toList());

So the question is, which is more useful to your callers.

If your result might be infinite, there's only one choice: Stream.
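
As a minimal sketch of the infinite case (the class and method names here are made up for illustration, not part of the original answer), a stream-returning method can describe an unbounded sequence and leave it to the caller to decide how much to consume:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class InfiniteSquares {
    // Hypothetical API: the unbounded sequence 1, 4, 9, 16, ...
    // No Collection could hold this; a Stream produces values on demand.
    static Stream<Long> squares() {
        return Stream.iterate(1L, n -> n + 1).map(n -> n * n);
    }

    public static void main(String[] args) {
        // The caller bounds the sequence with limit() before collecting.
        List<Long> firstFive = squares().limit(5).collect(Collectors.toList());
        System.out.println(firstFive); // [1, 4, 9, 16, 25]
    }
}
```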

If your result might be very large, you probably prefer Stream, since there may not be any value in materializing it all at once, and doing so could create significant heap pressure.

If all the caller is going to do is iterate through it (search, filter, aggregate), you should prefer Stream, since Stream has these built-in already and there's no need to materialize a collection (especially if the user might not process the whole result.) This is a very common case.
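
To illustrate the "might not process the whole result" point, here is a small sketch with invented names (`LazySearch`, `names()`, and the counter are assumptions for the demo). It counts how many elements a short-circuiting search actually inspects:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class LazySearch {
    // Counts how many elements the filter actually looked at.
    static final AtomicInteger inspected = new AtomicInteger();

    // Hypothetical stream-returning getter over a small in-memory list.
    static Stream<String> names() {
        List<String> names = Arrays.asList("Ann", "Bob", "Cid", "Dee", "Eve");
        return names.stream();
    }

    static Optional<String> firstStartingWith(String prefix) {
        return names()
                .filter(n -> {
                    inspected.incrementAndGet();
                    return n.startsWith(prefix);
                })
                .findFirst(); // short-circuits once a match is found
    }

    public static void main(String[] args) {
        System.out.println(firstStartingWith("C")); // Optional[Cid]
        System.out.println(inspected.get());        // 3
    }
}
```

Only the first three elements are inspected; "Dee" and "Eve" are never touched.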

Even if you know that the user will iterate it multiple times or otherwise keep it around, you still may want to return a Stream instead, for the simple fact that whatever Collection you choose to put it in (e.g., ArrayList) may not be the form they want, and then the caller has to copy it anyway. If you return a stream, they can do collect(toCollection(factory)) and get it in exactly the form they want.
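
A short sketch of that idea, using a hypothetical `playerNames()` method (strings stand in for Player objects here): two callers take the same stream and each picks a different target collection with `Collectors.toCollection`.

```java
import java.util.ArrayDeque;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CallerChoosesCollection {
    // Hypothetical stream-returning API; names are illustrative only.
    static Stream<String> playerNames() {
        return Stream.of("Zoe", "Al", "Mia", "Al");
    }

    // One caller wants sorted, de-duplicated names.
    static TreeSet<String> sortedUnique() {
        return playerNames().collect(Collectors.toCollection(TreeSet::new));
    }

    // Another caller wants a deque that keeps encounter order and duplicates.
    static ArrayDeque<String> asDeque() {
        return playerNames().collect(Collectors.toCollection(ArrayDeque::new));
    }

    public static void main(String[] args) {
        System.out.println(sortedUnique()); // [Al, Mia, Zoe]
        System.out.println(asDeque());      // [Zoe, Al, Mia, Al]
    }
}
```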

The above "prefer Stream" cases mostly derive from the fact that Stream is more flexible; you can late-bind to how you use it without incurring the costs and constraints of materializing it to a Collection.

The one case where you must return a Collection is when there are strong consistency requirements, and you have to produce a consistent snapshot of a moving target. Then, you will want to put the elements into a collection that will not change.
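
One way to picture the snapshot case is the sketch below. The class name, the use of `String` for players, and the `synchronized` locking are all assumptions made for the demo; the point is only the difference between a defensive copy and a live view.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SnapshotTeam {
    private final List<String> players = new ArrayList<>();

    public synchronized void add(String player) {
        players.add(player);
    }

    // Defensive copy: later changes to the team do not show through.
    public synchronized List<String> snapshot() {
        return Collections.unmodifiableList(new ArrayList<>(players));
    }

    // Live read-only view, for contrast: later changes do show through.
    public List<String> view() {
        return Collections.unmodifiableList(players);
    }

    public static void main(String[] args) {
        SnapshotTeam team = new SnapshotTeam();
        team.add("Ann");
        List<String> snap = team.snapshot();
        List<String> view = team.view();
        team.add("Bob");
        System.out.println(snap.size()); // 1
        System.out.println(view.size()); // 2
    }
}
```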

So I would say that most of the time, Stream is the right answer -- it is more flexible, it doesn't impose usually-unnecessary materialization costs, and can be easily turned into the Collection of your choice if needed. But sometimes, you may have to return a Collection (say, due to strong consistency requirements), or you may want to return Collection because you know how the user will be using it and know this is the most convenient thing for them.

Answer by Peter Lawrey

Were streams designed to always be "terminated" inside the same expression they were created in?

That is how they are used in most examples.

Note: returning a Stream is not that different from returning an Iterator (admittedly with much more expressive power).

IMHO the best solution is to encapsulate why you are doing this, and not return the collection.

e.g.

public int playerCount();
public Player player(int n);

or if you intend to count them

public int countPlayersWho(Predicate<? super Player> test);
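
A possible implementation of these signatures might look like the sketch below. The class name and the use of `String` in place of `Player` are assumptions made to keep the example self-contained; the list itself never leaves the class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class EncapsulatedTeam {
    // The list is never handed out; callers only see the operations below.
    private final List<String> players = new ArrayList<>();

    public void add(String player) {
        players.add(player);
    }

    public int playerCount() {
        return players.size();
    }

    public String player(int n) {
        return players.get(n);
    }

    // The stream is created and terminated inside the class.
    public int countPlayersWho(Predicate<? super String> test) {
        return (int) players.stream().filter(test).count();
    }

    public static void main(String[] args) {
        EncapsulatedTeam team = new EncapsulatedTeam();
        team.add("Ann");
        team.add("Bob");
        team.add("Abe");
        System.out.println(team.countPlayersWho(p -> p.startsWith("A"))); // 2
    }
}
```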

Answer by gontard

I think it depends on your scenario. Maybe it is sufficient to make your Team implement Iterable<Player>.

for (Player player : team) {
    System.out.println(player);
}

or in a functional style:

team.forEach(System.out::println);

But if you want a more complete and fluent API, a stream could be a good solution.
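
For reference, a Team that supports both loops above could be sketched as follows. The class name, the `String` element type, and the extra `stream()` bridge are assumptions for illustration, not something the answer prescribes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class IterableTeam implements Iterable<String> {
    private final List<String> players = new ArrayList<>();

    public void add(String player) {
        players.add(player);
    }

    @Override
    public Iterator<String> iterator() {
        // Read-only iterator: remove() throws UnsupportedOperationException.
        return Collections.unmodifiableList(players).iterator();
    }

    // Optional bridge for callers who want the fluent stream API.
    public Stream<String> stream() {
        return StreamSupport.stream(spliterator(), false);
    }

    public static void main(String[] args) {
        IterableTeam team = new IterableTeam();
        team.add("Ann");
        team.add("Bob");
        for (String player : team) {
            System.out.println(player);
        }
        team.forEach(System.out::println);
        System.out.println(team.stream().count()); // 2
    }
}
```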

Answer by dkatzel

I would probably have 2 methods, one to return a Collection and one to return the collection as a Stream.

class Team
{
    private List<Player> players = new ArrayList<>();

// ...

    public List<Player> getPlayers()
    {
        return Collections.unmodifiableList(players);
    }

    public Stream<Player> getPlayerStream()
    {
        return players.stream();
    }

}

This is the best of both worlds. The client can choose if they want the List or the Stream and they don't have to do the extra object creation of making an immutable copy of the list just to get a Stream.

This also only adds one more method to your API, so you don't have too many methods.

Answer by Stuart Marks

I have a few points to add to Brian Goetz' excellent answer.

It's quite common to return a Stream from a "getter" style method call. See the Stream usage page in the Java 8 javadoc and look for "methods... that return Stream" for the packages other than java.util.stream. These methods are usually on classes that represent or can contain multiple values or aggregations of something. In such cases, APIs typically have returned collections or arrays of them. For all the reasons that Brian noted in his answer, it's very flexible to add Stream-returning methods here. Many of these classes have collections- or array-returning methods already, because the classes predate the Streams API. If you're designing a new API, and it makes sense to provide Stream-returning methods, it might not be necessary to add collection-returning methods as well.

Brian mentioned the cost of "materializing" the values into a collection. To amplify this point, there are actually two costs here: the cost of storing values in the collection (memory allocation and copying) and also the cost of creating the values in the first place. The latter cost can often be reduced or avoided by taking advantage of a Stream's laziness-seeking behavior. A good example of this are the APIs in java.nio.file.Files:

static Stream<String>  lines(Path path) throws IOException
static List<String>    readAllLines(Path path) throws IOException

Not only does readAllLines have to hold the entire file contents in memory in order to store it into the result list, it also has to read the file to the very end before it returns the list. The lines method can return almost immediately after it has performed some setup, leaving file reading and line breaking until later when it's necessary -- or not at all. This is a huge benefit if, for example, the caller is interested only in the first ten lines:

try (Stream<String> lines = Files.lines(path)) {
    List<String> firstTen = lines.limit(10).collect(toList());
}

Of course considerable memory space can be saved if the caller filters the stream to return only lines matching a pattern, etc.
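
The filtering variant could be sketched like this. `GrepLines`, `matching`, and `demo` are hypothetical names, and the temporary file exists only so the example can run on its own:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GrepLines {
    // Only the matching lines end up in the result list.
    static List<String> matching(Path path, String needle) throws IOException {
        try (Stream<String> lines = Files.lines(path)) {
            return lines.filter(l -> l.contains(needle)).collect(Collectors.toList());
        }
    }

    // Self-contained demo: writes a small temp file, searches it, cleans up.
    static List<String> demo() {
        try {
            Path tmp = Files.createTempFile("lines", ".txt");
            try {
                Files.write(tmp, Arrays.asList("alpha", "beta", "alphabet"));
                return matching(tmp, "alpha");
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [alpha, alphabet]
    }
}
```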

An idiom that seems to be emerging is to name stream-returning methods after the plural of the name of the things that it represents or contains, without a get prefix. Also, while stream() is a reasonable name for a stream-returning method when there is only one possible set of values to be returned, sometimes there are classes that have aggregations of multiple types of values. For example, suppose you have some object that contains both attributes and elements. You might provide two stream-returning APIs:

Stream<Attribute>  attributes();
Stream<Element>    elements();

Answer by Vazgen Torosyan

Perhaps a Stream factory would be a better choice. The big win of only exposing collections via Stream is that it better encapsulates your domain model's data structure. It's impossible for any use of your domain classes to affect the inner workings of your List or Set simply by exposing a Stream.

It also encourages users of your domain class to write code in a more modern Java 8 style. It's possible to incrementally refactor to this style by keeping your existing getters and adding new Stream-returning getters. Over time, you can rewrite your legacy code until you've finally deleted all getters that return a List or Set. This kind of refactoring feels really good once you've cleared out all the legacy code!

Answer by designbygravity

If the stream is finite, and there is an expected/normal operation on the returned objects which will throw a checked exception, I always return a Collection. Because if you are going to be doing something on each of the objects that can throw a checked exception, you will hate the stream. One real shortcoming of streams is their inability to deal with checked exceptions elegantly.

Now, perhaps that is a sign that you don't need the checked exceptions, which is fair, but sometimes they are unavoidable.
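
To make the friction concrete, here is a sketch with a made-up `load` operation that declares `IOException`. The loop version lets the checked exception propagate; the stream version has to wrap it, here in `UncheckedIOException`, which is one common workaround rather than the only one.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CheckedInStreams {
    // Hypothetical operation that declares a checked exception.
    static String load(String name) throws IOException {
        if (name.isEmpty()) {
            throw new IOException("empty name");
        }
        return name.toUpperCase();
    }

    // With a collection, a plain loop simply lets the exception propagate.
    static List<String> loadAllWithLoop(List<String> names) throws IOException {
        List<String> out = new ArrayList<>();
        for (String n : names) {
            out.add(load(n));
        }
        return out;
    }

    // In a stream, the lambda may not throw a checked exception,
    // so it has to be caught and rethrown as an unchecked one.
    static List<String> loadAllWithStream(List<String> names) {
        return names.stream()
                .map(n -> {
                    try {
                        return load(n);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        System.out.println(loadAllWithLoop(Arrays.asList("a", "b")));   // [A, B]
        System.out.println(loadAllWithStream(Arrays.asList("a", "b"))); // [A, B]
    }
}
```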

Answer by tkruse

In contrast to collections, streams have additional characteristics. A stream returned by any method might be:

  • finite or infinite
  • parallel or sequential (with a default globally shared thread pool that can impact any other part of an application)
  • ordered or non-ordered

These differences also exist in collections, but there they are part of the obvious contract:

  • All Collections have a size; an Iterator/Iterable can be infinite.
  • Collections are explicitly ordered or non-ordered.
  • Parallelism is thankfully not something collections care about beyond thread-safety.

For a consumer of a stream (either from a method return or as a method parameter), this is a dangerous and confusing situation. To make sure their algorithm behaves correctly, consumers of streams need to make sure the algorithm makes no wrong assumption about the stream characteristics. And that is a very hard thing to do. In unit testing, that would mean multiplying all your tests so that each is repeated with the same stream contents, but with streams that are:

  • (finite, ordered, sequential)
  • (finite, ordered, parallel)
  • (finite, non-ordered, sequential)...

Writing method guards for streams that throw an IllegalArgumentException if the input stream has characteristics that break your algorithm is difficult, because the properties are hidden.
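
As a rough sketch of what such a guard involves (the class and method names are invented here), `isParallel()` can be queried directly, but orderedness is only visible on the stream's spliterator. Obtaining that is a terminal operation, so the guard has to hand back a new stream built from it. Nothing in this sketch can detect an infinite stream.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class StreamGuards {
    // Rejects parallel or unordered input and returns an equivalent stream.
    static <T> Stream<T> requireSequentialOrdered(Stream<T> in) {
        if (in.isParallel()) {
            throw new IllegalArgumentException("parallel streams are not supported");
        }
        // spliterator() is a terminal operation: 'in' must not be used afterwards.
        Spliterator<T> sp = in.spliterator();
        if (!sp.hasCharacteristics(Spliterator.ORDERED)) {
            throw new IllegalArgumentException("unordered streams are not supported");
        }
        // Rebuild a sequential stream from the same spliterator.
        return StreamSupport.stream(sp, false);
    }

    public static void main(String[] args) {
        List<Integer> ok = requireSequentialOrdered(Arrays.asList(1, 2, 3).stream())
                .collect(Collectors.toList());
        System.out.println(ok); // [1, 2, 3]
        try {
            requireSequentialOrdered(new HashSet<>(Arrays.asList(1, 2, 3)).stream());
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```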

That leaves Stream as a valid choice in a method signature only when none of the problems above matter, which is rarely the case.

It is much safer to use other datatypes in method signatures with an explicit contract (and without implicit thread-pool processing involved) that makes it impossible to accidentally process data with wrong assumptions about orderedness, sizedness or parallelity (and threadpool usage).

Answer by Daniel Avery

While some of the more high-profile respondents gave great general advice, I'm surprised no one has quite stated:

If you already have a "materialized" Collection in-hand (i.e. it was already created before the call - as is the case in the given example, where it is a member field), there is no point converting it to a Stream. The caller can easily do that themselves. Whereas, if the caller wants to consume the data in its original form, you converting it to a Stream forces them to do redundant work to re-materialize a copy of the original structure.