scala fold 和 foldLeft 方法的区别

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/11319111/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-10-22 04:19:31  来源:igfitidea点击:

Fold and foldLeft method difference

scala

提问by Karel Bílek

I am not sure what is the difference between foldand foldLeftin Scala.

我不确定ScalafoldfoldLeftScala之间有什么区别。

The question Difference between fold and foldLeft or foldRight?has an answer that talks about ordering. That is understandable. But I still don't understand why this works (from REPL):

这个问题倍和foldLeft或foldRight之间的区别?有一个关于订购的答案。这是可以理解的。但我仍然不明白为什么会这样(来自 REPL):

scala> Array("1","2","3").foldLeft(0)(_ + _.toInt)
res6: Int = 6

but this does not:

但这不会:

scala> Array("1","2","3").fold(0)(_ + _.toInt)
<console>:8: error: value toInt is not a member of Any
              Array("1","2","3").fold(0)(_ + _.toInt)
                                               ^

What does this error message mean?

这个错误信息是什么意思?

This line from the documentation is also confusing to me.

文档中的这一行也让我感到困惑。

z- a neutral element for the fold operation; may be added to the result an arbitrary number of times, and must not change the result (e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.)

z- 折叠操作的中性元素;可以向结果添加任意次数,并且不得更改结果(例如,Nil 表示列表连接,0 表示加法,或 1 表示乘法。)

Why will it be added an arbitrary number of times? I thought folding works differently.

为什么会添加任意次数?我认为折叠的工作方式不同。

回答by Rex Kerr

As defined by Scala, foldLeftis a linear operation while foldis allowed to be a tree operation. For example:

正如Scala定义的那样,foldLeft是线性操作,而fold允许是树操作。例如:

List(1,2,3,4,5).foldLeft(0)(_ + _)
// This is the only valid order of operations
0+1 = 1
      1+2 = 3
            3+3 = 6
                  6+4 = 10
                        10 + 5 = 15
                                 15  // done

List(1,2,3,4,5).fold(0)(_ + _)
// This is valid
0+1 = 1             0+3 = 3           0+5 = 5
      1+2 = 3             3+4 = 7           5
            3         +         7=10        5
                                  10    +   5 = 15
                                                15  // done

In order to allow arbitrary tree decompositions of a sequential list, you must have a zero that doesn't do anything (so you can add it wherever you need it in the tree) and you must create the same sort of thing that you take as your binary arguments so the types don't change on you depending on how you decompose the tree.

为了允许顺序列表的任意树分解,您必须有一个不做任何事情的零(因此您可以将它添加到树中任何需要它的地方)并且您必须创建与您相同的东西您的二进制参数,因此类型不会因您分解树的方式而改变。

(Being able to evaluate as a tree is nice for parallelization. If you want to be able to transform your output time as you go, you need both a combination operator anda standard start-value-transforms-sequence-element-to-desired-type function just like foldLefthas. Scala has this and calls it aggregate, but in some ways this is more like foldLeftthan foldis.)

(能够作为树进行评估对于并行化非常有用。如果您希望能够随时转换输出时间,您需要一个组合运算符一个标准的 start-value-transforms-sequence-element-to-desired -type 函数就像foldLefthas一样。Scala 有 this 并调用了它aggregate,但在某些方面,这foldLeftfold实际更像。)

回答by Heatsink

I am not familiar with Scala, but Scala's collection library has a similar design to Haskell's. This answer is based on Haskell and is probably accurate for Scala as well.

我对 Scala 不熟悉,但 Scala 的集合库与 Haskell 的设计类似。这个答案基于 Haskell,对于 Scala 也可能是准确的。

Because foldLeftprocesses its inputs from left to right, it can have different input and output types. On the other hand, foldcan process its inputs in various orders and so the inputs and output must all have the same type. This is easiest to see by expanding out the fold expressions. foldLeftoperates in a specific order:

因为foldLeft从左到右处理它的输入,所以它可以有不同的输入和输出类型。另一方面,fold可以按各种顺序处理其输入,因此输入和输出必须具有相同的类型。通过展开折叠表达式最容易看出这一点。 foldLeft以特定顺序运行:

Array("1","2","3").foldLeft(0)(_ + _.toInt)
= ((0 + "1".toInt) + "2".toInt) + "3".toInt

Note that array elements are never used as the first parameter to the combining function. They always appear on the right of the +.

请注意,数组元素永远不会用作组合函数的第一个参数。它们总是出现在+.

folddoes not guarantee a specific order. It could do various things, such as:

fold不保证特定的顺序。它可以做各种事情,例如:

Array("1","2","3").fold(0)(_ + _.toInt)
=  ((0 + "1".toInt) + "2".toInt) + "3".toInt
or (0 + "1".toInt) + ("2" + "3".toInt).toInt
or "1" + ("2" + ("3" + 0.toInt).toInt).toInt

Array elements can appear in either parameter of the combining function. But your combining function expects its first argument to be an int. If you don't respect that constraint, you end up adding strings to ints! This error is caught by the type system.

数组元素可以出现在组合函数的任一参数中。但是您的组合函数期望它的第一个参数是一个 int。如果您不遵守该约束,您最终会向整数添加字符串!此错误由类型系统捕获。

The neutral element may be introduced multiple times because, generally, a parallel fold is implemented by splitting up the input and executing multiple sequential folds. A sequential fold introduces the neutral element once. Imagine one particular execution of Array(1,2,3,4).fold(0)(_ + _)where the array is split into two separate arrays, and these are folded sequentially in two threads. (Of course, the real foldfunction does not spit the array into multiple arrays.) One thread executes Array(1,2).fold(0)(_ + _), computing 0 + 1 + 2. The other thread executes Array(3,4).fold(0)(_ + _), computing 0 + 3 + 4. Finally, the partial sums from the two threads are added together. Note that the neutral element, 0, appears twice.

中性元素可能会被多次引入,因为通常并行折叠是通过拆分输入并执行多个顺序折叠来实现的。顺序折叠一次引入中性元素。想象一个特定的执行,Array(1,2,3,4).fold(0)(_ + _)其中将数组拆分为两个单独的数组,并在两个线程中顺序折叠这些数组。(当然,真正的fold函数不会把数组吐成多个数组。)一个线程执行Array(1,2).fold(0)(_ + _),计算0 + 1 + 2。另一个线程执行Array(3,4).fold(0)(_ + _),计算0 + 3 + 4。最后,将来自两个线程的部分和相加在一起。请注意,中性元素0出现两次。

回答by Chris Shain

NOTE: I could be completely wrong here. My scala is less than perfect.

注意:我在这里可能完全错了。我的 Scala 并不完美。

I think that the difference is in the signature of the methods:

我认为区别在于方法的签名:

def fold[A1 >: A](z: A1)(op: (A1, A1) ? A1): A1

vs

对比

def foldLeft[B](z: B)(op: (B, T) ? B): B

In short, fold is defined as operating on some type A1 which is a supertype of the array's type, which for your string array the compiler defines as "Any" (probably because it needs a type that can store your String oran int- notice that the combiner method passed to fold Fold takes two parameters of the same type?) That's also what the documentation means when it talks about z- the implementation of Fold could be such that it combines your inputs in parallel, for instance:

简而言之,fold 被定义为对某种类型 A1 进行操作,该类型是数组类型的超类型,对于您的字符串数组,编译器将其定义为“Any”(可能是因为它需要一种可以存储您的字符串int-notice 的类型)传递给 fold Fold 的组合器方法是否需要两个相同类型的参数?)这也是文档在谈论 z 时的意思 - Fold 的实现可能是这样的,它可以并行组合您的输入,例如:

"1" + "2" --\
             --> 3 + 3 -> 6
"3" + *z* --/

On the other hand, foldLeft operates on type B (unconstrained) and only asks that you provide a combiner method that takes a parameter of type B and another of your array's type (String, in your case), and produces a B.

另一方面,foldLeft 对类型 B(无约束)进行操作,并且只要求您提供一个组合器方法,该方法接受一个类型为 B 的参数和另一个数组类型(在您的情况下为字符串),并生成一个 B。

回答by axel22

The error.You are getting a compile-time error because the signature of foldonly allows folding values of type which is the supertype of the type of the values in the collection, and the only supertype of String(your collection type) and Int(the type of your provided zero element) is Any. So, the type of the fold result is inferred to be Any- and Anydoes not have a method toInt.

错误。您收到编译时错误,因为 的签名fold仅允许折叠类型的值,该类型是集合中值类型的超类型,以及String(您的集合类型)和Int(您提供的零的类型)的唯一超类型元素)是Any。因此,折叠结果的类型被推断为Any- 并且Any没有 method toInt

Note that the two versions of foldhave different signatures:

请注意, 的两个版本fold具有不同的签名:

fold[A1 >: A](z: A1)(op: (A1, A1) => A1): A1

foldLeft[B](z: B)(f: (B, A) => B): B

Why do they have different signatures? This is because foldcould be implemented in parallel, as is the case with parallel collections. When multiple processors fold over the values in the collections, each one of the processors takes a subset of elements of type Aand produces the folded value of type A1, by consecutively applying op. The results produced by those processors must be combined together into a final folding value - this is done using the opfunction, which does exactly that.

为什么他们有不同的签名?这是因为fold可以并行实现,就像并行集合的情况一样。当多个处理器折叠集合中的值时,每个处理器都采用 type 元素的子集,AA1通过连续应用op. 这些处理器产生的结果必须组合在一起形成最终的折叠值——这是使用op函数完成的,它正是这样做的。

Now, note that this cannot be done using the fin foldLeft, because each of the processors produces a folded value of type B. Several values of type Bcannot be combined using f, because fonly combines value Bwith another value of type A- there is no correspondance between types Aand B.

现在,请注意这不能使用fin来完成foldLeft,因为每个处理器都会产生一个折叠的 type 值B。type 的几个值B不能用 using 组合f,因为f只将 valueB与另一个 type 值组合A- 类型A和之间没有对应关系B

Example.In your example, assume the 1st processor takes elements "1", "2"and the 2nd takes element "3". The first one will produce the folded value 3, and the second will produce another folded value 3. Now they have to combine their results to get the final folded value - this is impossible, because the closure _ + _.toIntonly knows how to combine an Intand String, and not 2 Intvalues.

例子。在您的示例中,假设第一个处理器采用元素"1", "2",第二个处理器采用 element "3"。第一个将产生折叠值3,第二个将产生另一个折叠值3。现在他们必须结合他们的结果来得到最终的折叠值——这是不可能的,因为闭包_ + _.toInt只知道如何结合一个IntString,而不是两个Int值。

For situations where these types differ, use aggregate, in which you have to define how to combine two values of type B:

对于这些类型不同的情况,请使用aggregate,您必须定义如何组合两个类型的值B

def aggregate[B](z: B)(seqop: (B, A) => B, combop: (B, B) => B): B

The combopabove defines how to do the final step when the fold result and the elements in the collection have different types.

combop上面定义了如何做最后的步骤中,当折叠结果和所述集合中的元件具有不同的类型。

The neutral element.As described above, multiple processors may fold over subsets of elements in the collection. Each one of them will start its folded value by adding the neutral element.

中性元素。如上所述,多个处理器可以折叠集合中元素的子集。它们中的每一个都将通过添加中性元素来开始其折叠值。

In the following example:

在以下示例中:

List(1, 2, 3).foldLeft(4)(_ + _)

always returns 10 = 4 + 1 + 2 + 3.

总是返回10 = 4 + 1 + 2 + 3

However, 4should not be used with fold, as it is not a neutral element:

但是,4不应与 一起使用fold,因为它不是中性元素:

List(1, 2, 3).fold(4)(_ + _)

The above may return (4 + 1 + 2) + (4 + 3) = 14or (4 + 1) + (4 + 2) + (4 + 3) = 18. If you don't use the neutral element for the fold, the results are nondeterministic. In the same way, you can use the Nilas a neutral element, but not a non-empty list.

以上可能返回(4 + 1 + 2) + (4 + 3) = 14(4 + 1) + (4 + 2) + (4 + 3) = 18。如果您不为 使用中性元素fold,则结果是不确定的。同样,您可以将Nil用作中性元素,但不能用作非空列表。

回答by Christopher Chiche

General difference

一般区别

Here are the prototypes of the methods

以下是这些方法的原型

fold[A1 >: A](z: A1)(op: (A1, A1) ? A1): A1
foldLeft[B](z: B)(f: (B, A) ? B): B

So, for fold the result is of type A1 >: Ainstead of any B. Moreover, as specified in the doc, for foldthe order is not

因此,对于 fold 结果是 typeA1 >: A而不是 any B。此外,如文档中所述,对于fold订单不是

About your error

关于你的错误

When typing scala> Array("1","2","3").fold(0)(_ + _.toInt)you assume that 0, an intis a subtype of String. This is why the compiler throws an error.

键入时,scala> Array("1","2","3").fold(0)(_ + _.toInt)您假设0anint是 的子类型String。这就是编译器抛出错误的原因。

About the weird z in fold

关于奇怪的 z 折叠

Here we have to see the implementationof foldto understand what happens. Here is what we get:

下面我们就来看看执行fold理解发生了什么。这是我们得到的:

def fold[A1 >: A](z: A1)(op: (A1, A1) => A1): A1 = foldLeft(z)(op)

So basically, foldis an implementation of foldleftwith a constraint on the output type. We can now see that zwill in practice be used the same way as in foldleft. So we can just conclude that this comment was made because nothing assures that behavior in future implementations. We can already see it now, with parallels:

所以基本上,fold是一个foldleft对输出类型有约束的实现。我们现在可以看到,它z在实践中的使用方式与foldleft. 因此,我们可以得出结论,之所以做出此评论,是因为在未来的实现中没有任何东西可以保证这种行为。我们现在已经可以看到它了,有相似之处

def fold[U >: T](z: U)(op: (U, U) => U): U = {
  executeAndWaitResult(new Fold(z, op, splitter))
}

回答by Travis Brown

As another answer points out, the foldmethod is primarily there to support a parallel fold. You can see this as follows. First we can define a kind of wrapper for integers that allows us to keep track of the operations that have been performed on its instances.

正如另一个答案指出的那样,该fold方法主要用于支持平行折叠。你可以看到如下。首先,我们可以为整数定义一种包装器,它允许我们跟踪已在其实例上执行的操作。

case class TrackInt(v: Int) {
  val log = collection.mutable.Buffer.empty[Int]
  def plus(that: TrackInt) = {
    this.log += that.v
    that.log += this.v
    new TrackInt(this.v + that.v)
  }
}

Next we can create a parallel collection of these things and an identity element:

接下来我们可以创建这些东西的并行集合和一个标识元素:

val xs = (1 to 10).map(TrackInt(_)).par
val zero = TrackInt(0)

First we'll try foldLeft:

首先我们会尝试foldLeft

scala> xs.foldLeft(zero)(_ plus _)
res0: TrackInt = TrackInt(55)

scala> zero.log
res1: scala.collection.mutable.Buffer[Int] = ArrayBuffer(1)

So our zero value is only used once, as we'd expect, since foldLeftperforms a sequential fold. Next we can clear the log and try fold:

所以我们的零值只使用一次,正如我们所期望的,因为foldLeft执行顺序折叠。接下来我们可以清除日志并尝试fold

scala> zero.log.clear()

scala> xs.fold(zero)(_ plus _)
res2: TrackInt = TrackInt(55)

scala> zero.log
res3: scala.collection.mutable.Buffer[Int] = ArrayBuffer(1, 6, 2, 7, 8)

So we can see that the fold has been parallelized in such a way that the zero value is used multiple times. If we ran this again we'd be likely to see different values in the log.

所以我们可以看到折叠已经被并行化了,零值被多次使用。如果我们再次运行它,我们可能会在日志中看到不同的值。

回答by serv-inc

It has been mentioned, but without example: If you want to allow parallelism with different data types for output and input, you could use aggregate:

已经提到过,但没有例子:如果你想允许输出和输入的不同数据类型的并行性,你可以使用aggregate

Array("1","2","3").aggregate(0)(_ + _.toInt, _ + _)

The first function gets called first. Its results are then reduced with the second function. See Explanation of the aggregate scala function.

第一个函数首先被调用。然后用第二个函数减少它的结果。请参阅聚合 scala 函数的说明