Original question: http://stackoverflow.com/questions/3102872/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
What is the fastest way to sum a collection in Scala?
Asked by Tala
I've tried different collections in Scala to sum their elements, and they are much slower than Java summing its arrays (with a for loop). Is there a way for Scala to be as fast as Java arrays?
I've heard that in Scala 2.8 arrays will be the same as in Java, but in practice they are much slower.
Answered by Rex Kerr
Indexing into arrays in a while loop is as fast in Scala as in Java. (Scala's "for" loop is not the low-level construct that Java's is, so that won't work the way you want.)
Thus if in Java you see
for (int i = 0; i < array.length; i++) sum += array[i];
in Scala you should write
var i = 0
while (i < array.length) {
  sum += array(i)
  i += 1
}
and if you do your benchmarks appropriately, you'll find no difference in speed.
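To make the point about Scala's for loop concrete, here is a minimal sketch. The names sumWithFor and sumWithWhile are illustrative and not from the original answer. A for over a range is rewritten by the compiler into a foreach call that takes a function, whereas the while version is a plain loop over primitives. Both return the same result; how large the speed gap is depends on the Scala version and on what the JIT manages to inline.

```scala
// for (i <- 0 until n) body  is rewritten by the compiler to
// (0 until n).foreach(i => body), i.e. a method call taking a closure.
def sumWithFor(array: Array[Double]): Double = {
  var sum = 0.0
  for (i <- 0 until array.length) sum += array(i)
  sum
}

// The while loop is a language-level construct: no closure, no Range object.
def sumWithWhile(array: Array[Double]): Double = {
  var sum = 0.0
  var i = 0
  while (i < array.length) {
    sum += array(i)
    i += 1
  }
  sum
}
```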
If you have iterators anyway, then Scala is as fast as Java in most things. For example, if you have an ArrayList of doubles and in Java you add them using
for (double d : arraylist) { sum += d; }
then in Scala you'll be approximately as fast--if using an equivalent data structure like ArrayBuffer--with
arraybuffer.foreach( sum += _ )
and not too far off the mark with either of
sum = (0.0 /: arraybuffer)(_ + _)   // same as arraybuffer.foldLeft(0.0)(_ + _)
sum = arraybuffer.sum               // Scala 2.8 and later
Keep in mind, though, that there's a penalty to mixing high-level and low-level constructs. For example, if you decide to start with an array but then use "foreach" on it instead of indexing into it, Scala has to wrap it in a collection (ArrayOps in 2.8) to get it to work, and often will have to box the primitives as well.
Anyway, for benchmark testing, these two functions are your friends:
def time[F](f: => F) = {
  val t0 = System.nanoTime
  val ans = f
  printf("Elapsed: %.3f\n", 1e-9 * (System.nanoTime - t0))
  ans
}
def lots[F](n: Int, f: => F): F = if (n <= 1) f else { f; lots(n - 1, f) }
For example:
val a = Array.tabulate(1000000)(_.toDouble)
val ab = new collection.mutable.ArrayBuffer[Double] ++ a
def adSum(ad: Array[Double]) = {
  var sum = 0.0
  var i = 0
  while (i < ad.length) { sum += ad(i); i += 1 }
  sum
}
// Mixed array + high-level; convenient, not so fast
scala> lots(3, time( lots(100,(0.0 /: a)(_ + _)) ) )
Elapsed: 2.434
Elapsed: 2.085
Elapsed: 2.081
res4: Double = 4.999995E11
// High-level container and operations, somewhat better
scala> lots(3, time( lots(100,(0.0 /: ab)(_ + _)) ) )
Elapsed: 1.694
Elapsed: 1.679
Elapsed: 1.635
res5: Double = 4.999995E11
// High-level collection with simpler operation
scala> lots(3, time( lots(100,{var s=0.0;ab.foreach(s += _);s}) ) )
Elapsed: 1.171
Elapsed: 1.166
Elapsed: 1.162
res7: Double = 4.999995E11
// All low level operations with primitives, no boxing, fast!
scala> lots(3, time( lots(100,adSum(a)) ) )
Elapsed: 0.185
Elapsed: 0.183
Elapsed: 0.186
res6: Double = 4.999995E11
Answered by BAR
You can now simply use sum.
val values = Array.fill[Double](numValues)(0)
val sumOfValues = values.sum
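The snippet above leaves numValues undefined. A self-contained variant might look like the following, where sumOfFilled and its parameters are names made up for this illustration:

```scala
// Builds an array of `numValues` copies of `value` and sums it with the
// built-in sum method, which is available for any numeric element type.
def sumOfFilled(numValues: Int, value: Double): Double =
  Array.fill[Double](numValues)(value).sum
```

For instance, sumOfFilled(1000, 0.5) evaluates to 500.0.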
Answered by Daniel C. Sobral
It is very difficult to explain why some code you haven't shown performs worse than some other code you haven't shown in some benchmark you haven't shown.
You may be interested in this question and its accepted answer, for one thing. But benchmarking JVM code is hard, because the JIT will optimize code in ways that are difficult to predict (which is why JIT beats traditional optimization at compile time).
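Because the JIT compiles hot code only after it has run for a while, a common precaution is to run the code several times before timing it. Here is a minimal sketch of that idea, assuming a hypothetical helper timeAfterWarmup (it is not part of any library or of the original answers). A dedicated harness such as JMH deals with many more pitfalls than this does.

```scala
// Hypothetical helper: run `body` untimed so the JIT has a chance to compile
// it, then time a fixed number of further runs and report seconds per run.
def timeAfterWarmup[A](warmupRuns: Int, timedRuns: Int)(body: => A): (A, Double) = {
  require(timedRuns > 0, "timedRuns must be positive")
  var result = body                       // first evaluation, also warm-up
  var i = 0
  while (i < warmupRuns) { result = body; i += 1 }
  val t0 = System.nanoTime
  i = 0
  while (i < timedRuns) { result = body; i += 1 }
  val secondsPerRun = 1e-9 * (System.nanoTime - t0) / timedRuns
  (result, secondsPerRun)
}
```

Usage might look like val (total, secs) = timeAfterWarmup(20, 100)(someArray.sum), where someArray is whatever collection you are measuring.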
Answered by ayushn21
The proper Scala, or functional, way to do this would be:
val numbers = Array(1, 2, 3, 4, 5)
val sum = numbers.reduceLeft[Int](_+_)
Check out this link for the full explanation of the syntax: http://www.codecommit.com/blog/scala/quick-explanation-of-scalas-syntax
I doubt this would be faster than the approaches described in the other answers, but I haven't tested it, so I'm not sure. In my opinion, though, this is the proper way to do it, since Scala is a functional language.
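One caveat with reduceLeft: it has no starting value, so it throws on an empty collection. A small sketch of two alternatives that handle the empty case (safeSum and sumOrZero are illustrative names, not from the answer):

```scala
// reduceLeftOption returns None instead of throwing when the array is empty.
def safeSum(numbers: Array[Int]): Option[Int] =
  numbers.reduceLeftOption(_ + _)

// foldLeft takes an explicit starting value, so an empty array sums to 0.
def sumOrZero(numbers: Array[Int]): Int =
  numbers.foldLeft(0)(_ + _)
```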
Answered by Randall Schulz
Scala 2.8 Arrays are JVM / Java arrays and as such have identical performance characteristics. But that means they cannot directly have extra methods that unify them with the rest of the Scala collections. To provide the illusion that arrays have these methods, there are implicit conversions to wrapper classes that add those capabilities. If you are not careful you'll incur inordinate overhead using those features.
In those cases where iteration overhead is critical, you can explicitly get an iterator (or maintain an integer index, for indexed sequential structures like Array or other IndexedSeq) and use a while loop, which is a language-level construct that need not operate on functions (literals or otherwise) but can compile in-line code blocks.
val l1 = List(...) // or any Iterable
val i1 = l1.iterator
while (i1.hasNext) {
  val e = i1.next()
  // Do stuff with e
}
Such code will execute essentially as fast as a Java counterpart.
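Applied to the summing problem from the question, the iterator pattern could be sketched like this (sumWithIterator is an illustrative name). Note that for generic collections such as List[Double] the elements themselves are still boxed, so this removes the per-element function call but not necessarily the unboxing.

```scala
// Sums any Iterable[Double] with an explicit iterator and a while loop,
// avoiding a closure call per element.
def sumWithIterator(values: Iterable[Double]): Double = {
  val it = values.iterator
  var sum = 0.0
  while (it.hasNext) {
    sum += it.next()
  }
  sum
}
```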
Answered by rvazquezglez
Timing is not the only concern.
With sum you might find an overflow issue:
scala> Array(2147483647,2147483647).sum
res0: Int = -2
In this case, seeding foldLeft with a Long is preferable:
scala> Array(2147483647,2147483647).foldLeft(0L)(_+_)
res1: Long = 4294967294
EDIT: Long can be used from the beginning:
scala> Array(2147483647L,2147483647L).sum
res1: Long = 4294967294
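If the data is already an Array[Int] and speed matters too, the overflow fix can be combined with the low-level loop from the earlier answer. A minimal sketch, with sumAsLong and sumExact as made-up names; sumExact relies on java.lang.Math.addExact, which throws ArithmeticException on overflow instead of wrapping around:

```scala
// Accumulates Int elements into a Long, so the total cannot wrap around
// at Int.MaxValue.
def sumAsLong(values: Array[Int]): Long = {
  var sum = 0L
  var i = 0
  while (i < values.length) {
    sum += values(i)   // the Int element is widened to Long before adding
    i += 1
  }
  sum
}

// Alternative that keeps an Int result but fails loudly on overflow.
def sumExact(values: Array[Int]): Int =
  values.foldLeft(0)((acc, v) => Math.addExact(acc, v))
```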

