如何在 Scala 中优化 for-comprehensions 和循环？

Question

提问by Luigi Plinge

So Scala is supposed to be as fast as Java. I'm revisiting some Project Eulerproblems in Scala that I originally tackled in Java. Specifically Problem 5: "What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?"

所以 Scala 应该和 Java 一样快。我正在重温最初在 Java 中解决的 Scala 中的一些Project Euler问题。具体问题 5：“能被 1 到 20 的所有数字整除的最小正数是多少？”

Here's my Java solution, which takes 0.7 seconds to complete on my machine:

这是我的 Java 解决方案，在我的机器上需要 0.7 秒才能完成：

public class P005_evenly_divisible implements Runnable{
    final int t = 20;

    public void run() {
        int i = 10;
        while(!isEvenlyDivisible(i, t)){
            i += 2;
        }
        System.out.println(i);
    }

    boolean isEvenlyDivisible(int a, int b){
        for (int i = 2; i <= b; i++) {
            if (a % i != 0) 
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        new P005_evenly_divisible().run();
    }
}

Here's my "direct translation" into Scala, which takes 103 seconds (147 times longer!)

这是我对 Scala 的“直接翻译”，耗时 103 秒（长 147 倍！）

object P005_JavaStyle {
    val t:Int = 20;
    def run {
        var i = 10
        while(!isEvenlyDivisible(i,t))
            i += 2
        println(i)
    }
    def isEvenlyDivisible(a:Int, b:Int):Boolean = {
        for (i <- 2 to b)
            if (a % i != 0)
                return false
        return true
    }
    def main(args : Array[String]) {
        run
    }
}

Finally here's my attempt at functional programming, which takes 39 seconds (55 times longer)

最后，这是我对函数式编程的尝试，耗时 39 秒（长 55 倍）

object P005 extends App{
    def isDivis(x:Int) = (1 to 20) forall {x % _ == 0}
    def find(n:Int):Int = if (isDivis(n)) n else find (n+2)
    println (find (2))
}

Using Scala 2.9.0.1 on Windows 7 64-bit. How do I improve performance? Am I doing something wrong? Or is Java just a lot faster?

在 Windows 7 64 位上使用 Scala 2.9.0.1。如何提高性能？难道我做错了什么？或者Java只是快得多？

Answer 1

采纳答案by Martin Odersky

The problem in this particular case is that you return from within the for-expression. That in turn gets translated into a throw of a NonLocalReturnException, which is caught at the enclosing method. The optimizer can eliminate the foreach but cannot yet eliminate the throw/catch. And throw/catch is expensive. But since such nested returns are rare in Scala programs, the optimizer did not yet address this case. There is work going on to improve the optimizer which hopefully will solve this issue soon.

这种特殊情况下的问题是您从 for 表达式中返回。这反过来被转换为一个 NonLocalReturnException 的抛出，它在封闭方法中被捕获。优化器可以消除 foreach 但还不能消除 throw/catch。而且扔/接很贵。但是由于这种嵌套返回在 Scala 程序中很少见，优化器还没有解决这种情况。正在进行改进优化器的工作，希望很快就能解决这个问题。

Answer 2

回答by Kipton Barros

The problem is most likely the use of a forcomprehension in the method isEvenlyDivisible. Replacing forby an equivalent whileloop should eliminate the performance difference with Java.

问题很可能是for在方法中使用了理解isEvenlyDivisible。for用等效while循环替换应该可以消除与 Java 的性能差异。

As opposed to Java's forloops, Scala's forcomprehensions are actually syntactic sugar for higher-order methods; in this case, you're calling the foreachmethod on a Rangeobject. Scala's foris very general, but sometimes leads to painful performance.

与 Java 的for循环相反，Scala 的for推导式实际上是高阶方法的语法糖；在这种情况下，您正在调用对象foreach上的方法Range。Scala 的for非常通用，但有时会导致性能不佳。

You might want to try the -optimizeflag in Scala version 2.9. Observed performance may depend on the particular JVM in use, and the JIT optimizer having sufficient "warm up" time to identify and optimize hot-spots.

您可能想尝试-optimizeScala 2.9 版中的标志。观察到的性能可能取决于正在使用的特定 JVM，以及 JIT 优化器有足够的“预热”时间来识别和优化热点。

Recent discussions on the mailing list indicate that the Scala team is working on improving forperformance in simple cases:

最近关于邮件列表的讨论表明 Scala 团队正在努力提高for简单情况下的性能：

Here is the issue in the bug tracker: https://issues.scala-lang.org/browse/SI-4633

这是错误跟踪器中的问题：https: //issues.scala-lang.org/browse/SI-4633

Update 5/28:

5/28 更新：

As a short term solution, the ScalaCLplugin (alpha) will transform simple Scala loops into the equivalent of whileloops.
As a potential longer term solution, teams from the EPFL and Stanford are collaborating on a projectenabling run-time compilation of "virtual" Scalafor very high performance. For example, multiple idiomatic functional loops can be fused at run-timeinto optimal JVM bytecode, or to another target such as a GPU. The system is extensible, allowing user defined DSLs and transformations. Check out the publicationsand Stanford course notes. Preliminary code is available on Github, with a release intended in the coming months.

作为短期解决方案，ScalaCL插件 (alpha) 会将简单的 Scala 循环转换为等效的while循环。
作为一个潜在的长期解决方案，EPFL 和斯坦福大学的团队正在合作开展一个项目，该项目支持“虚拟”Scala 的运行时编译，以获得非常高的性能。例如，多个惯用的功能循环可以在运行时融合到最佳 JVM 字节码中，或融合到另一个目标（如 GPU）中。该系统是可扩展的，允许用户定义 DSL 和转换。查看出版物和斯坦福课程笔记。初步代码可在 Github 上获得，并计划在未来几个月内发布。

Answer 3

回答by Luigi Plinge

As a follow-up, I tried the -optimize flag and it reduced running time from 103 to 76 seconds, but that's still 107x slower than Java or a while loop.

作为后续，我尝试了 -optimize 标志，它将运行时间从 103 秒减少到 76 秒，但这仍然比 Java 或 while 循环慢 107 倍。

Then I was looking at the "functional" version:

然后我在看“功能”版本：

object P005 extends App{
  def isDivis(x:Int) = (1 to 20) forall {x % _ == 0}
  def find(n:Int):Int = if (isDivis(n)) n else find (n+2)
  println (find (2))
}

and trying to figure out how to get rid of the "forall" in a concise manner. I failed miserably and came up with

并试图弄清楚如何以简洁的方式摆脱“forall”。我惨败并想出了

object P005_V2 extends App {
  def isDivis(x:Int):Boolean = {
    var i = 1
    while(i <= 20) {
      if (x % i != 0) return false
      i += 1
    }
    return true
  }
  def find(n:Int):Int = if (isDivis(n)) n else find (n+2)
  println (find (2))
}

whereby my cunning 5-line solution has balooned to 12 lines. However, this version runs in 0.71 seconds, the same speed as the original Java version, and 56 times faster than the version above using "forall" (40.2 s)! (see EDIT below for why this is faster than Java)

我狡猾的 5 行解决方案已经膨胀到 12 行。但是，这个版本的运行时间为0.71 秒，与原始 Java 版本的速度相同，比使用“forall”（40.2 秒）的上述版本快 56 倍！（有关为什么这比 Java 快，请参阅下面的编辑）

Obviously my next step was to translate the above back into Java, but Java can't handle it and throws a StackOverflowError with n around the 22000 mark.

显然，我的下一步是将上述内容转换回 Java，但 Java 无法处理它并在 22000 标记附近抛出一个带有 n 的 StackOverflowError。

I then scratched my head for a bit and replaced the "while" with a bit more tail recursion, which saves a couple of lines, runs just as fast, but let's face it, is more confusing to read:

然后我挠了挠头，用更多的尾递归替换了“while”，这样可以节省几行，运行速度也一样快，但让我们面对现实，读起来更令人困惑：

object P005_V3 extends App {
  def isDivis(x:Int, i:Int):Boolean = 
    if(i > 20) true
    else if(x % i != 0) false
    else isDivis(x, i+1)

  def find(n:Int):Int = if (isDivis(n, 2)) n else find (n+2)
  println (find (2))
}

So Scala's tail recursion wins the day, but I'm surprised that something as simple as a "for" loop (and the "forall" method) is essentially broken and has to be replaced by inelegant and verbose "whiles", or tail recursion. A lot of the reason I'm trying Scala is because of the concise syntax, but it's no good if my code is going to run 100 times slower!

所以 Scala 的尾递归赢得了胜利，但令我惊讶的是，像“for”循环（和“forall”方法）这样简单的东西基本上被破坏了，必须用不优雅和冗长的“whiles”或尾递归代替. 我尝试 Scala 的很多原因是因为它的语法简洁，但如果我的代码运行速度慢 100 倍，那就不好了！

EDIT: (deleted)

编辑：（已删除）

EDIT OF EDIT: Former discrepancies between run times of 2.5s and 0.7s were entirely due to whether the 32-bit or 64-bit JVMs were being used. Scala from the command line uses whatever is set by JAVA_HOME, while Java uses 64-bit if available regardless. IDEs have their own settings. Some measurements here: Scala execution times in Eclipse

EDIT OF EDIT：之前 2.5s 和 0.7s 运行时间之间的差异完全是由于使用的是 32 位还是 64 位 JVM。命令行中的 Scala 使用 JAVA_HOME 设置的任何内容，而 Java 使用 64 位（如果可用）。IDE 有自己的设置。此处的一些测量：Eclipse 中的 Scala 执行时间

Answer 4

回答by juancn

The answer about for comprehension is right, but it's not the whole story. You should note note that the use of returnin isEvenlyDivisibleis not free. The use of return inside the for, forces the scala compiler to generate a non-local return (i.e. to return outside it's function).

关于理解的答案是正确的，但这不是全部。您应该注意，使用returninisEvenlyDivisible不是免费的。在 , 内部使用 returnfor强制 scala 编译器生成非本地返回（即返回到它的函数之外）。

This is done through the use of an exception to exit the loop. The same happens if you build your own control abstractions, for example:

这是通过使用异常退出循环来完成的。如果您构建自己的控件抽象，也会发生同样的情况，例如：

def loop[T](times: Int, default: T)(body: ()=>T) : T = {
    var count = 0
    var result: T = default
    while(count < times) {
        result = body()
        count += 1
    }
    result
}

def foo() : Int= {
    loop(5, 0) {
        println("Hi")
        return 5
    }
}

foo()

This prints "Hi" only once.

这只会打印“Hi”一次。

Note that the returnin fooexits foo(which is what you would expect). Since the bracketed expression is a function literal, which you can see in the signature of loopthis forces the compiler to generate a non local return, that is, the returnforces you to exit foo, not just the body.

请注意returninfoo退出foo（这是您所期望的）。由于括号中的表达式是一个函数字面量，您可以在loopthis的签名中看到它强制编译器生成非本地返回，即return强制您退出foo，而不仅仅是body.

In Java (i.e. the JVM) the only way to implement such behavior is to throw an exception.

在 Java（即 JVM）中，实现这种行为的唯一方法是抛出异常。

Going back to isEvenlyDivisible:

回到isEvenlyDivisible：

def isEvenlyDivisible(a:Int, b:Int):Boolean = {
  for (i <- 2 to b) 
    if (a % i != 0) return false
  return true
}

The if (a % i != 0) return falseis a function literal that has a return, so each time the return is hit, the runtime has to throw and catch an exception, which causes quite a bit of GC overhead.

这if (a % i != 0) return false是一个有返回值的函数字面量，所以每次命中返回值时，运行时都必须抛出并捕获异常，这会导致相当多的 GC 开销。

Answer 5

回答by Luigi Plinge

Some ways to speed up the forallmethod I discovered:

一些加快forall我发现的方法的方法：

The original: 41.3 s

原版：41.3 秒

def isDivis(x:Int) = (1 to 20) forall {x % _ == 0}

Pre-instantiating the range, so we don't create a new range every time: 9.0 s

预先实例化范围，所以我们不会每次都创建一个新范围：9.0 s

val r = (1 to 20)
def isDivis(x:Int) = r forall {x % _ == 0}

Converting to a List instead of a Range: 4.8 s

转换为列表而不是范围：4.8 秒

val rl = (1 to 20).toList
def isDivis(x:Int) = rl forall {x % _ == 0}

I tried a few other collections but List was fastest (although still 7x slower than if we avoid the Range and higher-order function altogether).

我尝试了其他一些集合，但 List 是最快的（尽管仍然比我们完全避免 Range 和高阶函数时慢 7 倍）。

While I am new to Scala, I'd guess the compiler could easily implement a quick and significant performance gain by simply automatically replacing Range literals in methods (as above) with Range constants in the outermost scope. Or better, intern them like Strings literals in Java.

虽然我是 Scala 的新手，但我猜想编译器可以通过简单地将方法中的 Range 文字（如上）自动替换为最外层作用域中的 Range 常量，从而轻松实现快速且显着的性能提升。或者更好的是，将它们像 Java 中的字符串文字一样实习。

footnote: Arrays were about the same as Range, but interestingly, pimping a new forallmethod (shown below) resulted in 24% faster execution on 64-bit, and 8% faster on 32-bit. When I reduced the calculation size by reducing the number of factors from 20 to 15 the difference disappeared, so maybe it's a garbage collection effect. Whatever the cause, it's significant when operating under full load for extended periods.

脚注：数组与 Range 大致相同，但有趣的是，使用一种新forall方法（如下所示）在 64 位上的执行速度提高了 24%，在 32 位上的执行速度提高了 8%。当我通过将因子数从 20 减少到 15 来减少计算大小时，差异消失了，所以可能是垃圾收集效应。无论是什么原因，在长时间满负荷运行时都很重要。

A similar pimp for List also resulted in about 10% better performance.

List 的类似皮条客也使性能提高了约 10%。

  val ra = (1 to 20).toArray
  def isDivis(x:Int) = ra forall2 {x % _ == 0}

  case class PimpedSeq[A](s: IndexedSeq[A]) {
    def forall2 (p: A => Boolean): Boolean = {      
      var i = 0
      while (i < s.length) {
        if (!p(s(i))) return false
        i += 1
      }
      true
    }    
  }  
  implicit def arrayToPimpedSeq[A](in: Array[A]): PimpedSeq[A] = PimpedSeq(in)

Answer 6

回答by Ara Vartanian

I just wanted to comment for people who might lose faith in Scala over issues like this that these kinds of issues come up in the performance of just about all functional languages. If you are optimizing a fold in Haskell, you will often have to re-write it as a recursive tail-call-optimized loop, or else you'll have performance and memory issues to contend with.

我只是想为那些可能会因为此类问题而对 Scala 失去信心的人发表评论，即几乎所有函数式语言的性能都会出现此类问题。如果您在 Haskell 中优化折叠，您通常必须将其重写为递归尾调用优化循环，否则您将面临性能和内存问题。

I know it's unfortunate that FPs aren't yet optimized to the point where we don't have to think about things like this, but this is not at all a problem particular to Scala.

我知道很遗憾 FP 还没有优化到我们不必考虑这样的事情的地步，但这根本不是 Scala 特有的问题。

Answer 7

回答by Display Name

Problems specific to Scala have already been discussed, but the main problem is that using a brute-force algorithm is not very cool. Consider this (much faster than the original Java code):

已经讨论了特定于 Scala 的问题，但主要问题是使用蛮力算法不是很酷。考虑一下（比原始 Java 代码快得多）：

def gcd(a: Int, b: Int): Int = {
    if (a == 0)
        b
    else
        gcd(b % a, a)
}
print (1 to 20 reduce ((a, b) => {
  a / gcd(a, b) * b
}))

Answer 8

回答by eivindw

Try the one-liner given in the solution Scala for Project Euler

尝试解决方案Scala for Project Euler 中给出的单行

The time given is at least faster than yours, though far from the while loop.. :)

给定的时间至少比你的快，虽然远离 while 循环.. :)

如何在 Scala 中优化 for-comprehensions 和循环？

提问by Luigi Plinge

采纳答案by Martin Odersky

回答by Kipton Barros

回答by Luigi Plinge

回答by juancn

回答by Luigi Plinge

回答by Ara Vartanian

回答by Display Name

回答by eivindw

相关推荐

最近更新

标签

如何在 Scala 中优化 for-comprehensions 和循环？

提问by Luigi Plinge

采纳答案by Martin Odersky

回答by Kipton Barros

回答by Luigi Plinge

回答by juancn

回答by Luigi Plinge

回答by Ara Vartanian

回答by Display Name

回答by eivindw

相关推荐

我可以在 Scala 中对类型参数执行匹配以查看它是否实现了特征吗？

scala：如何将扩展列表作为可变参数传递给方法？

Scala http 操作

如何在 Scala 中的过滤器、映射、平面映射期间轻松地从一种集合类型转换为另一种集合类型？

相关推荐

最近更新

标签