What is the fastest way to sum a collection in Scala?

Question

Asked by Tala

I've tried summing the elements of different collections in Scala, and they are much slower than Java summing its arrays (with a for loop). Is there a way for Scala to be as fast as Java arrays?

I've heard that in Scala 2.8 arrays will be the same as in Java, but in practice they are much slower.

Answer 1

Answered by Rex Kerr

Indexing into arrays in a while loop is as fast in Scala as in Java. (Scala's "for" loop is not the low-level construct that Java's is, so that won't work the way you want.)

Thus if in Java you see

for (int i = 0; i < array.length; i++) sum += array[i];

in Scala you should write

var i=0
while (i < array.length) {
  sum += array(i)
  i += 1
}

and if you do your benchmarks appropriately, you'll find no difference in speed.
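
As a self-contained illustration of the loop above, here is a minimal sketch. The object name WhileSum and the method name sumArray are hypothetical, chosen only for this example; the loop body is the one shown in the answer.

```scala
object WhileSum {
  // Sums a primitive array with an index-based while loop.
  // No closure is created and no element is boxed.
  def sumArray(array: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < array.length) {
      sum += array(i)
      i += 1
    }
    sum
  }

  def main(args: Array[String]): Unit = {
    val data = Array(1.0, 2.0, 3.5)
    println(sumArray(data))
  }
}
```

Wrapping the loop in a method keeps the mutable variables local, so callers still see an ordinary function from an array to a number.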

If you have iterators anyway, then Scala is as fast as Java in most things. For example, if you have an ArrayList of doubles and in Java you add them using

for (double d : arraylist) { sum += d; }

then in Scala you'll be approximately as fast--if using an equivalent data structure like ArrayBuffer--with

arraybuffer.foreach( sum += _ )

and not too far off the mark with either of

sum = (0.0 /: arraybuffer)(_ + _)  // start from 0.0 so the fold stays in Double
sum = arraybuffer.sum  // Scala 2.8 and later

Keep in mind, though, that there's a penalty to mixing high-level and low-level constructs. For example, if you decide to start with an array but then use "foreach" on it instead of indexing into it, Scala has to wrap it in a collection (ArrayOps in 2.8) to get it to work, and often will have to box the primitives as well.
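
To make the contrast concrete, here is a small sketch of the same sum written both ways. The names MixedVsLowLevel, sumForeach and sumWhile are hypothetical. The two methods return the same value; how far apart they are in speed depends on the Scala version and the JIT, so treat the comments as a description of the 2.8 behavior discussed here, not as a measurement.

```scala
object MixedVsLowLevel {
  // High-level call on an array: foreach is not a method of the JVM array,
  // so the call goes through the implicit ArrayOps wrapper and a closure.
  def sumForeach(a: Array[Double]): Double = {
    var sum = 0.0
    a.foreach(sum += _)
    sum
  }

  // Low-level version: direct indexing in a while loop, primitives only.
  def sumWhile(a: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < a.length) { sum += a(i); i += 1 }
    sum
  }

  def main(args: Array[String]): Unit = {
    val a = Array.tabulate(1000)(_.toDouble)
    println(sumForeach(a) == sumWhile(a))
  }
}
```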

Anyway, for benchmark testing, these two functions are your friends:

def time[F](f: => F) = {
  val t0 = System.nanoTime
  val ans = f
  printf("Elapsed: %.3f\n",1e-9*(System.nanoTime-t0))
  ans
}

def lots[F](n: Int, f: => F): F = if (n <= 1) f else { f; lots(n-1,f) }

For example:

val a = Array.tabulate(1000000)(_.toDouble)
val ab = new collection.mutable.ArrayBuffer[Double] ++ a
def adSum(ad: Array[Double]) = {
  var sum = 0.0
  var i = 0
  while (i<ad.length) { sum += ad(i); i += 1 }
  sum
}

// Mixed array + high-level; convenient, not so fast
scala> lots(3, time( lots(100,(0.0 /: a)(_ + _)) ) )
Elapsed: 2.434
Elapsed: 2.085
Elapsed: 2.081
res4: Double = 4.999995E11

// High-level container and operations, somewhat better
scala> lots(3, time( lots(100,(0.0 /: ab)(_ + _)) ) )    
Elapsed: 1.694
Elapsed: 1.679
Elapsed: 1.635
res5: Double = 4.999995E11

// High-level collection with simpler operation
scala> lots(3, time( lots(100,{var s=0.0;ab.foreach(s += _);s}) ) )
Elapsed: 1.171
Elapsed: 1.166
Elapsed: 1.162
res7: Double = 4.999995E11

// All low level operations with primitives, no boxing, fast!
scala> lots(3, time( lots(100,adSum(a)) ) )              
Elapsed: 0.185
Elapsed: 0.183
Elapsed: 0.186
res6: Double = 4.999995E11

Answer 2

Answered by BAR

You can now simply use sum.

val values = Array.fill[Double](numValues)(0)

val sumOfValues = values.sum

Answer 3

Answered by Daniel C. Sobral

It is very difficult to explain why some code you haven't shown performs worse than some other code you haven't shown in some benchmark you haven't shown.

You may be interested in this question and its accepted answer, for one thing. But benchmarking JVM code is hard, because the JIT will optimize code in ways that are difficult to predict (which is why JIT beats traditional optimization at compile time).
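
One common way to reduce (not eliminate) JIT noise is to run the code a few times untimed before measuring. The following is only a rough sketch under that assumption; WarmBench and bench are hypothetical names, and the warm-up and run counts are arbitrary parameters that the caller must choose.

```scala
object WarmBench {
  // Evaluates `body` `warmups` times without timing, then `runs` times with
  // timing. Returns the best elapsed time in nanoseconds and the last result.
  def bench[A](warmups: Int, runs: Int)(body: => A): (Long, A) = {
    require(runs >= 1, "need at least one timed run")
    var i = 0
    while (i < warmups) { body; i += 1 }
    var best = Long.MaxValue
    var last: Option[A] = None
    var j = 0
    while (j < runs) {
      val t0 = System.nanoTime
      val r = body
      val dt = System.nanoTime - t0
      if (dt < best) best = dt
      last = Some(r)
      j += 1
    }
    (best, last.get)
  }

  def main(args: Array[String]): Unit = {
    val a = Array.tabulate(1000000)(_.toDouble)
    val (nanos, total) = bench(warmups = 20, runs = 10)(a.sum)
    println("best: " + nanos + " ns, sum = " + total)
  }
}
```

Reporting the best of several timed runs is one choice among many; a median or a full distribution may be more informative for noisy workloads.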

Answer 4

Answered by ayushn21

The proper Scala, or functional, way to do this would be:

val numbers = Array(1, 2, 3, 4, 5)
val sum = numbers.reduceLeft[Int](_+_)

Check out this link for the full explanation of the syntax: http://www.codecommit.com/blog/scala/quick-explanation-of-scalas-syntax

I doubt this would be faster than doing it in the ways described in the other answers but I haven't tested it so I'm not sure. In my opinion this is the proper way to do it though since Scala is a functional language.

Answer 5

Answered by Randall Schulz

Scala 2.8 Arrays are JVM / Java arrays and as such have identical performance characteristics. But that means they cannot directly have extra methods that unify them with the rest of the Scala collections. To provide the illusion that arrays have these methods, there are implicit conversions to wrapper classes that add those capabilities. If you are not careful, you'll incur inordinate overhead using those features.

In those cases where iteration overhead is critical, you can explicitly get an iterator (or maintain an integer index, for indexed sequential structures like Array or other IndexedSeq) and use a while loop, which is a language-level construct that need not operate on functions (literals or otherwise) but can compile in-line code blocks.

val l1 = List(...) // or any Iterable
val i1 = l1.iterator
while (i1.hasNext) {
  val e = i1.next
  // Do stuff with e
}

Such code will execute essentially as fast as a Java counterpart.
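
Applied to the summing problem from the question, the iterator pattern might look like the sketch below. IteratorSum and sumWithIterator are hypothetical names; the method accepts any Iterable[Double], such as a List or an ArrayBuffer.

```scala
object IteratorSum {
  // Sums any Iterable by pulling elements from an explicit iterator
  // inside a while loop, with no function literal involved.
  def sumWithIterator(xs: Iterable[Double]): Double = {
    val it = xs.iterator
    var sum = 0.0
    while (it.hasNext) {
      sum += it.next()
    }
    sum
  }

  def main(args: Array[String]): Unit = {
    println(sumWithIterator(List(1.0, 2.0, 3.0)))
  }
}
```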

Answer 6

Answered by rvazquezglez

Timing is not the only concern. With sum you might run into an overflow issue:

scala> Array(2147483647,2147483647).sum
res0: Int = -2

In this case, seeding foldLeft with a Long is preferable:

scala> Array(2147483647,2147483647).foldLeft(0L)(_+_)
res1: Long = 4294967294

EDIT: Long can be used from the beginning:

scala> Array(2147483647L,2147483647L).sum
res1: Long = 4294967294
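
If the data must stay in an Array[Int], two possible approaches are sketched below: accumulate into a Long, or keep an Int result but fail loudly on overflow. SafeSum, sumAsLong and sumExact are hypothetical names; Math.addExact is the standard java.lang.Math method (Java 8 and later), which throws ArithmeticException when an Int addition overflows.

```scala
object SafeSum {
  // Widens each Int to Long while adding, so the total does not wrap
  // around at the Int range.
  def sumAsLong(xs: Array[Int]): Long = {
    var sum = 0L
    var i = 0
    while (i < xs.length) { sum += xs(i); i += 1 }
    sum
  }

  // Keeps an Int result, but throws ArithmeticException on overflow
  // instead of silently wrapping.
  def sumExact(xs: Array[Int]): Int =
    xs.foldLeft(0)((acc, x) => Math.addExact(acc, x))

  def main(args: Array[String]): Unit = {
    println(sumAsLong(Array(2147483647, 2147483647)))
    println(sumExact(Array(1, 2, 3)))
  }
}
```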
