Java 8 Streams - collect vs reduce
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/22577197/
Java 8 Streams - collect vs reduce
Asked by jimhooker2002
When would you use collect() vs reduce()? Does anyone have good, concrete examples of when it's definitely better to go one way or the other?
Javadoc mentions that collect() is a mutable reduction.
Given that it's a mutable reduction, I assume it requires synchronization (internally), which, in turn, can be detrimental to performance. Presumably reduce() is more readily parallelizable at the cost of having to create a new data structure for return after every step of the reduce.
The above statements are guesswork, however, and I'd love an expert to chime in here.
Accepted answer by Boris the Spider
reduce is a "fold" operation: it applies a binary operator to each element in the stream, where the first argument to the operator is the return value of the previous application and the second argument is the current stream element.
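As a minimal sketch of that fold (the values here are purely illustrative):
import java.util.stream.Stream;

// Evaluates as ((0 + 1) + 2) + 3: each step feeds the previous result
// back in as the first argument of the operator.
int sum = Stream.of(1, 2, 3).reduce(0, (acc, element) -> acc + element);
System.out.println(sum); // 6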
collect is an aggregation operation where a "collection" is created and each element is "added" to that collection. Collections in different parts of the stream are then added together.
The document you linked gives the reason for having two different approaches:
If we wanted to take a stream of strings and concatenate them into a single long string, we could achieve this with ordinary reduction:
String concatenated = strings.reduce("", String::concat)
We would get the desired result, and it would even work in parallel. However, we might not be happy about the performance! Such an implementation would do a great deal of string copying, and the run time would be O(n^2) in the number of characters. A more performant approach would be to accumulate the results into a StringBuilder, which is a mutable container for accumulating strings. We can use the same technique to parallelize mutable reduction as we do with ordinary reduction.
So the point is that the parallelisation is the same in both cases, but in the reduce case we apply the function to the stream elements themselves. In the collect case we apply the function to a mutable container.
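A minimal sketch of that mutable approach, assuming strings is a Stream&lt;String&gt; as in the quote (this is essentially the StringBuilder pattern the quote describes):
// The three-argument collect takes a supplier, an accumulator and a combiner;
// the framework creates and merges StringBuilders for parallel segments.
String concatenated = strings.collect(
        StringBuilder::new,     // a new mutable container per segment
        StringBuilder::append,  // add one element to a container
        StringBuilder::append)  // merge two containers
        .toString();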
Answer by george
The normal reduction is meant to combine two immutable values such as int, double, etc. and produce a new one; it's an immutable reduction. In contrast, the collect method is designed to mutate a container to accumulate the result it's supposed to produce.
To illustrate the problem, let's suppose you want to achieve Collectors.toList() using a simple reduction like
List<Integer> numbers = stream.reduce(
        new ArrayList<Integer>(),
        (List<Integer> l, Integer e) -> {
            l.add(e);
            return l;
        },
        (List<Integer> l1, List<Integer> l2) -> {
            l1.addAll(l2);
            return l1;
        });
This is the equivalent of Collectors.toList(). However, in this case you mutate the List<Integer>. As we know, ArrayList is not thread-safe, nor is it safe to add/remove values from it while iterating, so you will get a concurrent exception or an ArrayIndexOutOfBoundsException or some other kind of exception (especially when run in parallel) when you update the list or when the combiner tries to merge the lists, because you are mutating the list by accumulating (adding) the integers to it. If you want to make this thread-safe you would need to pass a new list each time, which would impair performance.
In contrast, Collectors.toList() works in a similar fashion. However, it guarantees thread safety when you accumulate the values into the list. From the documentation for the collect method:
Performs a mutable reduction operation on the elements of this stream using a Collector. If the stream is parallel, and the Collector is concurrent, and either the stream is unordered or the collector is unordered, then a concurrent reduction will be performed. When executed in parallel, multiple intermediate results may be instantiated, populated, and merged so as to maintain isolation of mutable data structures. Therefore, even when executed in parallel with non-thread-safe data structures (such as ArrayList), no additional synchronization is needed for a parallel reduction.
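For comparison, a hedged sketch of the same accumulation done through collect (assuming stream is the same Stream of Integer as above); the framework keeps each intermediate list isolated and merges them, so no extra synchronization is needed even when run in parallel:
List<Integer> numbers = stream.collect(
        ArrayList::new,      // supplier: a fresh container per segment
        ArrayList::add,      // accumulator: add one element to a container
        ArrayList::addAll);  // combiner: merge two containers
// or simply: stream.collect(Collectors.toList());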
So to answer your question:
When would you use collect() vs reduce()?
If you have immutable values such as ints, doubles, or Strings, then normal reduction works just fine. However, if you have to reduce your values into, say, a List (a mutable data structure), then you need to use mutable reduction with the collect method.
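A hedged illustration of that rule of thumb (the values here are made up for the example):
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Immutable values: an ordinary (immutable) reduction is fine.
int total = Stream.of(1, 2, 3).reduce(0, Integer::sum);

// Mutable result container such as a List: use a mutable reduction via collect.
List<Integer> values = Stream.of(1, 2, 3).collect(Collectors.toList());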
Answer by averasko
They are very different in the potential memory footprint during the runtime. While collect() collects and puts all data into the collection, reduce() explicitly asks you to specify how to reduce the data that made it through the stream.
For example, if you want to read some data from a file, process it, and put it into some database, you might end up with Java stream code similar to this:
streamDataFromFile(file)
        .map(data -> processData(data))
        .map(result -> database.save(result))
        .collect(Collectors.toList());
In this case, we use collect() to force Java to stream the data through and save the results into the database. Without collect() the data is never read and never stored.
This code happily produces a java.lang.OutOfMemoryError: Java heap space runtime error if the file is large enough or the heap size is low enough. The obvious reason is that it tries to stack all the data that made it through the stream (and, in fact, has already been stored in the database) into the resulting collection, and this blows up the heap.
However, if you replace collect() with reduce(), it won't be a problem anymore, as the latter will reduce and discard all the data that made it through.
In the presented example, just replace collect() with something like this reduce:
.reduce(0L, (aLong, result) -> aLong, (aLong1, aLong2) -> aLong1);
You do not even need to make the calculation depend on the result, as Java is not a pure FP (functional programming) language and cannot optimize away data that is not being used at the bottom of the stream, because of possible side effects.
Answer by Sandro
The reason is simply that:
collect() can only work with mutable result objects.
reduce() is designed to work with immutable result objects.
"reduce() with immutable" example
public class Employee {
    private Integer salary;

    public Employee(String aSalary) {
        this.salary = new Integer(aSalary);
    }

    public Integer getSalary() {
        return this.salary;
    }
}

@Test
public void testReduceWithImmutable() {
    List<Employee> list = new LinkedList<>();
    list.add(new Employee("1"));
    list.add(new Employee("2"));
    list.add(new Employee("3"));

    Integer sum = list
            .stream()
            .map(Employee::getSalary)
            .reduce(0, (Integer a, Integer b) -> Integer.sum(a, b));

    assertEquals(Integer.valueOf(6), sum);
}
"collect() with mutable" example
E.g. if you would like to manually calculate a sum using collect(), it cannot work with BigDecimal but only with, for example, MutableInt from org.apache.commons.lang.mutable. See:
public class Employee {
    private MutableInt salary;

    public Employee(String aSalary) {
        this.salary = new MutableInt(aSalary);
    }

    public MutableInt getSalary() {
        return this.salary;
    }
}

@Test
public void testCollectWithMutable() {
    List<Employee> list = new LinkedList<>();
    list.add(new Employee("1"));
    list.add(new Employee("2"));

    MutableInt sum = list.stream().collect(
            MutableInt::new,
            (MutableInt container, Employee employee) ->
                    container.add(employee.getSalary().intValue()),
            MutableInt::add);

    assertEquals(new MutableInt(3), sum);
}
This works because the accumulator container.add(employee.getSalary().intValue()); is not supposed to return a new object with the result, but to change the state of the mutable container of type MutableInt.
If you would like to use BigDecimal instead for the container, you could not use the collect() method, as container.add(employee.getSalary()); would not change the container, because BigDecimal is immutable. (Apart from this, BigDecimal::new would not work, as BigDecimal has no empty constructor.)
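A hedged sketch of the alternative: since BigDecimal is immutable, the same sum fits reduce() naturally (converting the MutableInt salaries from the example above purely for illustration):
import java.math.BigDecimal;

// Each BigDecimal.add returns a new instance, which is exactly the
// immutable shape that reduce() expects.
BigDecimal sum = list.stream()
        .map(employee -> new BigDecimal(employee.getSalary().intValue()))
        .reduce(BigDecimal.ZERO, BigDecimal::add);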
Answer by Yan Ng
Let the stream be a <- b <- c <- d
In reduction,
you will have ((a # b) # c) # d
where # is that interesting operation that you would like to do.
In collection,
your collector will have some kind of collecting structure K.
K consumes a. K then consumes b. K then consumes c. K then consumes d.
At the end, you ask K what the final result is.
K then gives it to you.
Answer by rogerdpack
According to the docs:
The reducing() collectors are most useful when used in a multi-level reduction, downstream of groupingBy or partitioningBy. To perform a simple reduction on a stream, use Stream.reduce(BinaryOperator) instead.
So basically you'd use reducing() only when forced to within a collect.
Here's another example:
For example, given a stream of Person, to calculate the longest last name of residents in each city:
Comparator<String> byLength = Comparator.comparing(String::length);
Map<String, String> longestLastNameByCity
        = personList.stream().collect(groupingBy(Person::getCity,
                reducing("", Person::getLastName, BinaryOperator.maxBy(byLength))));
According to this tutorial, reduce is sometimes less efficient:
The reduce operation always returns a new value. However, the accumulator function also returns a new value every time it processes an element of a stream. Suppose that you want to reduce the elements of a stream to a more complex object, such as a collection. This might hinder the performance of your application. If your reduce operation involves adding elements to a collection, then every time your accumulator function processes an element, it creates a new collection that includes the element, which is inefficient. It would be more efficient for you to update an existing collection instead. You can do this with the Stream.collect method, which the next section describes...
So the identity is "re-used" in a reduce scenario, making it slightly more efficient to go with .reduce if possible.
Answer by JetQin
Here is the code example:
List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
int sum = list.stream().reduce((x, y) -> {
    System.out.println(String.format("x=%d,y=%d", x, y));
    return (x + y);
}).get();
System.out.println(sum);
Here is the execution result:
x=1,y=2
x=3,y=3
x=6,y=4
x=10,y=5
x=15,y=6
x=21,y=7
28
The reduce function handles two parameters: the first parameter is the previous return value in the stream, and the second parameter is the current value in the stream. It sums the first value and the current value and uses the result as the first value in the next calculation.
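For comparison, a hedged sketch of the same sum computed through collect with a built-in collector (Collectors.summingInt), which accumulates into a mutable internal container instead of threading an int result through every step:
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
int sum = list.stream().collect(Collectors.summingInt(Integer::intValue));
System.out.println(sum); // 28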