Scala 惯用的编码风格只是编写低效代码的一个很酷的陷阱吗?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/7084212/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Is Scala idiomatic coding style just a cool trap for writing inefficient code?
提问by Adrian
I sense that the Scala community has a little big obsession with writing "concise", "cool", "scala idiomatic", "one-liner" -if possible- code. This is immediately followed by a comparison to Java/imperative/ugly code.
我觉得 Scala 社区对编写“简洁”、“酷”、“scala 惯用语”、“单行”(如果可能)的代码有点痴迷。紧接着是与 Java/命令式/丑陋代码的比较。
While this (sometimes) leads to easy to understand code, it also leads to inefficient code for 99% of developers. And this is where Java/C++ is not easy to beat.
虽然这(有时)会导致代码易于理解,但它也会导致 99% 的开发人员代码效率低下。这就是 Java/C++ 不容易被击败的地方。
Consider this simple problem: Given a list of integers, remove the greatest element. Ordering does not need to be preserved.
考虑这个简单的问题:给定一个整数列表,删除最大的元素。排序不需要保留。
Here is my version of the solution (It may not be the greatest, but it's what the average non-rockstar developer would do).
这是我的解决方案版本(它可能不是最好的,但这是普通的非摇滚明星开发人员会做的)。
def removeMaxCool(xs: List[Int]) = {
val maxIndex = xs.indexOf(xs.max);
xs.take(maxIndex) ::: xs.drop(maxIndex+1)
}
It's Scala idiomatic, concise, and uses a few nice list functions. It's also very inefficient. It traverses the list at least 3 or 4 times.
它是 Scala 惯用的、简洁的,并使用了一些不错的列表函数。它也非常低效。它至少遍历列表 3 或 4 次。
Here is my totally uncool, Java-like solution. It's also what a reasonable Java developer (or Scala novice) would write.
这是我完全不酷的类似 Java 的解决方案。这也是一个合理的 Java 开发人员(或 Scala 新手)会写的东西。
import scala.collection.mutable.ArrayBuffer

def removeMaxFast(xs: List[Int]) = {
  var res = ArrayBuffer[Int]()
  var max = xs.head
  var first = true;
  for (x <- xs) {
    if (first) {
      first = false;
    } else {
      if (x > max) {
        res.append(max)
        max = x
      } else {
        res.append(x)
      }
    }
  }
  res.toList
}
Totally non-Scala idiomatic, non-functional, non-concise, but it's very efficient. It traverses the list only once!
完全不是 Scala 惯用的写法、非函数式、不简洁,但它非常高效。它只遍历列表一次!
So, if 99% of Java developers write more efficient code than 99% of Scala developers, this is a huge obstacle to cross for greater Scala adoption. Is there a way out of this trap?
因此,如果 99% 的 Java 开发人员编写的代码效率高于 99% 的 Scala 开发人员,那么这对于 Scala 的更多采用来说是一个巨大的障碍。有没有办法摆脱这个陷阱?
I am looking for practical advice to avoid such "inefficiency traps" while keeping the implementation clear and concise.
我正在寻找实用的建议,既能避免这类“低效率陷阱”,又能保持实现的清晰和简洁。
Clarification:This question comes from a real-life scenario: I had to write a complex algorithm. First I wrote it in Scala, then I "had to" rewrite it in Java. The Java implementation was twice as long, and not that clear, but at the same time it was twice as fast. Rewriting the Scala code to be efficient would probably take some time and a somewhat deeper understanding of scala internal efficiencies (for vs. map vs. fold, etc)
澄清:这个问题来自一个真实场景:我必须编写一个复杂的算法。我先用 Scala 写了它,然后“不得不”用 Java 重写。Java 实现的代码长度是 Scala 版本的两倍,而且没那么清晰,但同时速度快了一倍。把 Scala 代码改写得更高效大概需要一些时间,以及对 Scala 内部效率(for、map、fold 等之间的差别)更深入的理解。
回答by Daniel C. Sobral
Let's discuss a fallacy in the question:
让我们讨论一下问题中的一个谬误:
So, if 99% of Java developers write more efficient code than 99% of Scala developers, this is a huge obstacle to cross for greater Scala adoption. Is there a way out of this trap?
因此,如果 99% 的 Java 开发人员编写的代码效率高于 99% 的 Scala 开发人员,那么这对于 Scala 的更多采用来说是一个巨大的障碍。有没有办法摆脱这个陷阱?
This is presumed, with absolutely no evidence backing it up. If false, the question is moot.
这是假设,绝对没有证据支持。如果为假,则问题没有实际意义。
Is there evidence to the contrary? Well, let's consider the question itself -- it doesn't prove anything, but shows things are not that clear.
有相反的证据吗?好吧,让我们考虑一下问题本身——它不能证明任何事情,但表明事情并不那么清楚。
Totally non-Scala idiomatic, non-functional, non-concise, but it's very efficient. It traverses the list only once!
完全不是 Scala 惯用的写法、非函数式、不简洁,但它非常高效。它只遍历列表一次!
Of the four claims in the first sentence, the first three are true, and the fourth, as shown by user unknown, is false! And why is it false? Because, contrary to what the second sentence states, it traverses the list more than once.
第一句中的四个断言里,前三个是对的,而第四个,正如 user unknown 所展示的,是错的!为什么它是错的?因为与第二句所说的相反,它不止一次地遍历了列表。
The code calls the following methods on it:
代码对其调用以下方法:
res.append(max)
res.append(x)
and
和
res.toList
Let's consider first append.
让我们先考虑一下append。
append takes a vararg parameter. That means max and x are first encapsulated into a sequence of some type (a WrappedArray, in fact), and then passed as a parameter. A better method would have been +=.
Ok, append calls ++=, which delegates to +=. But, first, it calls ensureSize, which is the second mistake (+= calls that too -- ++= just optimizes that for multiple elements). Because an Array is a fixed-size collection, which means that, at each resize, the whole Array must be copied!
append 接受可变参数。这意味着 max 和 x 会先被封装成某种类型的序列(实际上是一个 WrappedArray),然后再作为参数传入。更好的方法是 +=。
好的,append 调用 ++=,后者又委托给 +=。但它首先会调用 ensureSize,这是第二个错误(+= 也会调用它,++= 只是针对多个元素做了优化)。因为 Array 是固定大小的集合,这意味着每次扩容时,整个 Array 都必须被复制!
So let's consider this. When you resize, Java first clears the memory by storing 0 in each element, then Scala copies each element of the previous array over to the new array. Since size doubles each time, this happens log(n) times, with the number of elements being copied increasing each time it happens.
所以让我们考虑一下。当您调整大小时,Java 首先通过在每个元素中存储 0 来清除内存,然后 Scala 将前一个数组的每个元素复制到新数组中。由于大小每次都翻倍,这会发生 log(n) 次,每次发生时复制的元素数量都会增加。
Take for example n = 16. It does this four times, copying 1, 2, 4 and 8 elements respectively. Since Java has to clear each of these arrays, and each element must be read and written, each element copied represents 4 traversals of an element. Adding all we have (n - 1) * 4, or, roughly, 4 traversals of the complete list. If you count read and write as a single pass, as people often erroneously do, then it's still three traversals.
以 n = 16 为例。它这样做了四次,分别复制了 1、2、4 和 8 个元素。由于Java必须清除这些数组中的每一个,并且必须读取和写入每个元素,因此复制的每个元素代表对元素的4次遍历。将我们所有的 (n - 1) * 4 相加,或者粗略地说,完整列表的 4 次遍历。如果你把读写算作一次遍历,就像人们经常错误地做的那样,那么它仍然是三遍遍历。
One can improve on this by initializing the ArrayBuffer with an initial size equal to the list that will be read, minus one, since we'll be discarding one element. To get this size, we need to traverse the list once, though.
可以通过用一个初始大小来初始化 ArrayBuffer 来改进这一点,这个初始大小等于将要读取的列表长度减一,因为我们会丢弃一个元素。不过,为了得到这个大小,我们需要先遍历一次列表。
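A minimal sketch of both suggestions so far (the helper name is ours, and it assumes a non-empty list, as in the question): use += rather than append, and give the ArrayBuffer its final size up front so it never has to resize.
把到目前为止的两个建议合在一起的一个最小示例(方法名是我们自己取的,并且和题目一样假设列表非空):用 += 而不是 append,并预先给 ArrayBuffer 分配最终大小,让它永远不需要扩容。
import scala.collection.mutable.ArrayBuffer

def removeMaxPresized(xs: List[Int]): List[Int] = {
  val res = new ArrayBuffer[Int](xs.size - 1)  // xs.size is the extra traversal mentioned above
  var max = xs.head
  for (x <- xs.tail) {
    if (x > max) { res += max; max = x }
    else res += x
  }
  res.toList
}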
Now let's consider toList. To put it simply, it traverses the whole list to create a new list.
现在让我们考虑一下toList。简单来说,就是遍历整个链表来创建一个新链表。
So, we have 1 traversal for the algorithm, 3 or 4 traversals for resize, and 1 additional traversal for toList. That's 4 or 5 traversals.
因此,我们有算法本身的 1 次遍历、扩容带来的 3 或 4 次遍历,以及 toList 的 1 次额外遍历。总共是 4 或 5 次遍历。
The original algorithm is a bit difficult to analyse, because take, drop and ::: traverse a variable number of elements. Adding all together, however, it does the equivalent of 3 traversals. If splitAt was used, it would be reduced to 2 traversals. With 2 more traversals to get the maximum, we get 5 traversals -- the same number as the non-functional, non-concise algorithm!
原来的算法有点难以分析,因为 take、drop 和 ::: 遍历的元素数量是可变的。不过把它们加在一起,相当于 3 次遍历。如果使用 splitAt,可以减少到 2 次遍历。再加上求最大值的 2 次遍历,一共是 5 次遍历——与那个非函数式、不简洁的算法次数相同!
So, let's consider improvements.
所以,让我们考虑改进。
On the imperative algorithm, if one uses ListBuffer and +=, then all methods are constant-time, which reduces it to a single traversal.
对于命令式算法,如果使用 ListBuffer 和 +=,那么所有方法都是常数时间的,这就把它减少为一次遍历。
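A sketch of that imperative variant (the helper name is ours): with ListBuffer, += and toList are effectively constant time, so the list is traversed only once.
该命令式变体的一个示意写法(方法名是我们自己取的):使用 ListBuffer 时,+= 和 toList 基本是常数时间,因此列表只被遍历一次。
import scala.collection.mutable.ListBuffer

def removeMaxListBuffer(xs: List[Int]): List[Int] = {
  val res = ListBuffer[Int]()
  var max = xs.head
  for (x <- xs.tail) {
    if (x > max) { res += max; max = x }
    else res += x
  }
  res.toList  // constant time on ListBuffer
}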
On the functional algorithm, it could be rewritten as:
在函数算法上,它可以改写为:
val max = xs.max
val (before, _ :: after) = xs span (max !=)
before ::: after
That reduces it to a worst case of three traversals. Of course, there are other alternatives presented, based on recursion or fold, that solve it in one traversal.
这样最坏情况就减少到 3 次遍历。当然,还有其他基于递归或折叠的替代方案,可以在一次遍历内解决。
And, most interesting of all, all of these algorithms are O(n), and the only one which almost incurred (accidentally) in worst complexity was the imperative one (because of array copying). On the other hand, the cache characteristics of the imperative one might well make it faster, because the data is contiguous in memory. That, however, is unrelated to either big-Oh or functional vs imperative, and it is just a matter of the data structures that were chosen.
而且,最有趣的是,所有这些算法都是O(n),唯一一个几乎(偶然)导致最差复杂性的算法是命令式算法(因为数组复制)。另一方面,命令式的缓存特性可能会使其更快,因为数据在内存中是连续的。然而,这与 big-Oh 或函数式 vs 命令式无关,这只是选择的数据结构的问题。
So, if we actually go to the trouble of benchmarking, analyzing the results, considering performance of methods, and looking into ways of optimizing it, then we can find faster ways to do this in an imperative manner than in a functional manner.
因此,如果我们真的不嫌麻烦地去做基准测试、分析结果、考虑各个方法的性能并研究优化手段,那么我们确实能找到比函数式写法更快的命令式写法。
But all this effort is very different from saying the average Java programmer code will be faster than the average Scala programmer code -- if the question is an example, that is simply false. And even discounting the question, we have seen no evidence that the fundamental premise of the question is true.
但是所有这些努力与说普通 Java 程序员代码将比普通 Scala 程序员代码更快的说法大不相同——如果问题是一个例子,那完全是错误的。即使不考虑这个问题,我们也没有看到任何证据表明这个问题的基本前提是正确的。
EDIT
编辑
First, let me restate my point, because it seems I wasn't clear. My point is that the code the average Java programmer writes may seem to be more efficient, but actually isn't. Or, put another way, traditional Java style doesn't gain you performance -- only hard work does, be it Java or Scala.
首先,让我重申我的观点,因为我似乎没有说清楚。我的观点是,普通 Java 程序员编写的代码似乎更有效率,但实际上并非如此。或者,换句话说,传统的 Java 风格不会让您获得性能——只有努力工作才能做到,无论是 Java 还是 Scala。
Next, I have a benchmark and results too, including almost all solutions suggested. Two interesting points about it:
接下来,我也有一个基准和结果,包括几乎所有建议的解决方案。关于它的两个有趣的点:
Depending on list size, the creation of objects can have a bigger impact than multiple traversals of the list. The original functional code by Adrian takes advantage of the fact that lists are persistent data structures by not copying the elements right of the maximum element at all. If a Vector was used instead, both left and right sides would be mostly unchanged, which might lead to even better performance.
Even though user unknown and paradigmatic have similar recursive solutions, paradigmatic's is way faster. The reason for that is that he avoids pattern matching. Pattern matching can be really slow.
根据列表的大小,对象的创建可能比多次遍历列表的影响更大。Adrian 最初的函数式代码利用了列表是持久化数据结构这一事实,完全没有复制最大元素右侧的那些元素。如果改用 Vector,左右两侧基本都能保持不变,这可能带来更好的性能。
尽管 user unknown 和 paradigmatic 的递归解法很相似,但 paradigmatic 的要快得多。原因是他避免了模式匹配。模式匹配可能真的很慢。
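A hedged sketch of that Vector idea (the helper name removeMaxVector is ours): patch drops the single element at maxIndex, and Vector's structural sharing leaves most of both sides untouched.
Vector 思路的一个示意性写法(方法名 removeMaxVector 是我们自己取的):patch 只去掉 maxIndex 位置上的那个元素,而 Vector 的结构共享让两侧的大部分内容保持不变。
def removeMaxVector(xs: Vector[Int]): Vector[Int] = {
  val maxIndex = xs.indexOf(xs.max)  // two traversals to locate the maximum
  xs.patch(maxIndex, Nil, 1)         // remove one element; both sides are mostly shared
}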
回答by user unknown
def removeOneMax (xs: List [Int]) : List [Int] = xs match {
case x :: Nil => Nil
case a :: b :: xs => if (a < b) a :: removeOneMax (b :: xs) else b :: removeOneMax (a :: xs)
case Nil => Nil
}
Here is a recursive method, which only iterates once. If you need performance, you have to think about it, if not, not.
这是一个递归方法,它只迭代一次。如果您需要性能,则必须考虑它,如果不需要,则不需要。
You can make it tail-recursive in the standard way: giving an extra parameter carry, which is per default the empty List, and collects the result while iterating. That is, of course, a bit longer, but if you need performance, you have to pay for it:
您可以以标准方式使其尾递归:提供一个额外的参数carry,默认情况下为空列表,并在迭代时收集结果。那当然要长一点,但是如果您需要性能,则必须为此付出代价:
import annotation.tailrec
@tailrec
def removeOneMax (xs: List [Int], carry: List [Int] = List.empty) : List [Int] = xs match {
case a :: b :: xs => if (a < b) removeOneMax (b :: xs, a :: carry) else removeOneMax (a :: xs, b :: carry)
case x :: Nil => carry
case Nil => Nil
}
I don't know what the chances are, that later compilers will improve slower map-calls to be as fast as while-loops. However: You rarely need high speed solutions, but if you need them often, you will learn them fast.
我不知道可能性有多大,以后的编译器会将较慢的映射调用改进为与 while 循环一样快。但是:您很少需要高速解决方案,但如果您经常需要它们,您将很快学会它们。
Do you know how big your collection has to be, to use a whole second for your solution on your machine?
你知道你的集合要有多大,你的解法才会在你的机器上花掉整整一秒吗?
As a one-liner, similar to Daniel C. Sobral's solution:
作为单行代码,类似于 Daniel C. Sobral 的解决方案:
((Nil : List[Int], xs(0)) /: xs.tail) ((p, x)=> if (p._2 > x) (x :: p._1, p._2) else ((p._2 :: p._1), x))._1
but that is hard to read, and I didn't measure the effective performance. The normal pattern is (x /: xs) ((a, b) => /* something */). Here, x and a are pairs of List-so-far and max-so-far, which solves the problem to bring everything into one line of code, but isn't very readable. However, you can earn reputation on CodeGolf this way, and maybe someone likes to make a performance measurement.
但这很难阅读,而且我没有衡量有效性能。正常模式是 (x /: xs) ((a, b) => /* something */)。这里,x 和 a 是一对 List-so-far 和 max-so-far,解决了将所有内容都放在一行代码中的问题,但不是很可读。但是,您可以通过这种方式在 CodeGolf 上赢得声誉,也许有人喜欢进行性能测量。
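For readability, here is the same fold written out with foldLeft and named parts instead of the /: operator (a sketch only; the value name is ours, and it assumes a non-empty List[Int] named xs as above).
为了可读性,下面把同一个折叠用 foldLeft 和命名的方式展开,而不用 /: 运算符(仅作示意;变量名是我们自己取的,并且和上文一样假设 xs 是非空的 List[Int])。
val withoutMax = xs.tail.foldLeft((List.empty[Int], xs.head)) {
  case ((kept, maxSoFar), x) =>
    if (maxSoFar > x) (x :: kept, maxSoFar)  // keep x, current max stays
    else (maxSoFar :: kept, x)               // x becomes the new max, keep the old one
}._1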
And now to our big surprise, some measurements:
现在让我们大吃一惊的是,一些测量结果:
An updated timing-method, to get the garbage collection out of the way, and have the hotspot-compiler warm up, a main, and many methods from this thread, together in an Object named
一个更新过的计时方法(用来排除垃圾回收的干扰,并让 HotSpot 编译器预热),加上一个 main 方法和本帖中的许多方法,一起放在下面这个名为 PerfRemMax 的对象里:
object PerfRemMax {

  def timed (name: String, xs: List [Int]) (f: List [Int] => List [Int]) = {
    val a = System.currentTimeMillis
    val res = f (xs)
    val z = System.currentTimeMillis
    val delta = z-a
    println (name + ": " + (delta / 1000.0))
    res
  }

  def main (args: Array [String]) : Unit = {
    val n = args(0).toInt
    val funs : List [(String, List[Int] => List[Int])] = List (
      "indexOf/take-drop" -> adrian1 _,
      "arraybuf" -> adrian2 _, /* out of memory */
      "paradigmatic1" -> pm1 _, /**/
      "paradigmatic2" -> pm2 _,
      // "match" -> uu1 _, /*oom*/
      "tailrec match" -> uu2 _,
      "foldLeft" -> uu3 _,
      "buf-=buf.max" -> soc1 _,
      "for/yield" -> soc2 _,
      "splitAt" -> daniel1,
      "ListBuffer" -> daniel2
    )

    val r = util.Random
    val xs = (for (x <- 1 to n) yield r.nextInt (n)).toList

    // With 1 Mio. as param, it starts with 100 000, 200k, 300k, ... 1Mio. cases.
    // a) warmup
    // b) look, where the process gets linear to size
    funs.foreach (f => {
      (1 to 10) foreach (i => {
        timed (f._1, xs.take (n/10 * i)) (f._2)
        compat.Platform.collectGarbage
      });
      println ()
    })
  }
}
I renamed all the methods, and had to modify uu2 a bit, to fit to the common method declaration (List [Int] => List [Int]).
我重命名了所有方法,并且不得不稍微修改 uu2,以适应通用方法声明(List [Int] => List [Int])。
From the long result, I only provide the output for 1M invocations:
从长结果来看,我只提供 1M 次调用的输出:
scala -Dserver PerfRemMax 2000000
indexOf/take-drop: 0.882
arraybuf: 1.681
paradigmatic1: 0.55
paradigmatic2: 1.13
tailrec match: 0.812
foldLeft: 1.054
buf-=buf.max: 1.185
for/yield: 0.725
splitAt: 1.127
ListBuffer: 0.61
The numbers aren't completely stable, depending on the sample size, and a bit varying from run to run. For example, for 100k to 1M runs, in steps of 100k, the timing for splitAt was as follows:
这些数字并不完全稳定,具体取决于样本大小,并且每次运行都会有所不同。例如,对于 100k 到 1M 的运行,以 100k 为步长,splitAt 的时序如下:
splitAt: 0.109
splitAt: 0.118
splitAt: 0.129
splitAt: 0.139
splitAt: 0.157
splitAt: 0.166
splitAt: 0.749
splitAt: 0.752
splitAt: 1.444
splitAt: 1.127
The initial solution is already pretty fast. splitAt is a modification from Daniel, often faster, but not always.
最初的解决方案已经相当快了。splitAt是 Daniel 的修改,通常更快,但并非总是如此。
The measurement was done on a single core 2Ghz Centrino, running xUbuntu Linux, Scala-2.8 with Sun-Java-1.6 (desktop).
测量是在单核 2Ghz Centrino 上完成的,运行 xUbuntu Linux、Scala-2.8 和 Sun-Java-1.6(桌面)。
The two lessons for me are:
给我的两个教训是:
- always measure your performance improvements; it is very hard to estimate it if you don't do it on a daily basis
- it is not only fun to write functional code - sometimes the result is even faster
- 始终要实际测量你的性能改进;如果不是每天都在做这件事,就很难凭感觉估计出来
- 写函数式代码不只是有趣,有时结果甚至更快
Here is a link to my benchmark code, if somebody is interested.
如果有人感兴趣,这里有一个指向我的基准测试代码的链接。
回答by paradigmatic
First of all, the behavior of the methods you presented is not the same. The first one keeps the element ordering, while the second one doesn't.
首先,您提供的方法的行为是不一样的。第一个保持元素顺序,而第二个没有。
Second, among all the possible solution which could be qualified as "idiomatic", some are more efficient than others. Staying very close to your example, you can for instance use tail-recursion to eliminate variables and manual state management:
其次,在所有可以被称为“惯用”的可能解决方案中,有些比其他的更有效。非常接近您的示例,例如,您可以使用尾递归来消除变量和手动状态管理:
def removeMax1( xs: List[Int] ) = {
  def rec( max: Int, rest: List[Int], result: List[Int]): List[Int] = {
    if( rest.isEmpty ) result
    else if( rest.head > max ) rec( rest.head, rest.tail, max :: result)
    else rec( max, rest.tail, rest.head :: result )
  }
  rec( xs.head, xs.tail, List() )
}
or fold the list:
或折叠列表:
def removeMax2( xs: List[Int] ) = {
  val result = xs.tail.foldLeft( xs.head -> List[Int]() ) {
    (acc,x) =>
      val (max,res) = acc
      if( x > max ) x -> ( max :: res )
      else max -> ( x :: res )
  }
  result._2
}
If you want to keep the original insertion order, you can (at the expense of having two passes, rather than one) without any effort write something like:
如果您想保留原始插入顺序,您可以(以两次传递为代价,而不是一次)毫不费力地编写如下内容:
def removeMax3( xs: List[Int] ) = {
val max = xs.max
xs.filterNot( _ == max )
}
which is more clear than your first example.
这比你的第一个例子更清楚。
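A small usage illustration of removeMax3 (the example values are ours). One caveat: filterNot drops every occurrence of the maximum, so a duplicated maximum behaves differently from the index-based version in the question.
removeMax3 的一个小用例(示例数据是我们自己编的)。一个需要注意的地方:filterNot 会去掉最大值的所有出现,因此当最大值重复时,其行为与题目中基于索引的版本不同。
removeMax3(List(3, 1, 7, 2))  // List(3, 1, 2): original order kept, at the cost of two passes
removeMax3(List(3, 7, 7, 1))  // List(3, 1): both 7s are removed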
回答by Chuck
The biggest inefficiency when you're writing a program is worrying about the wrong things. This is usually the wrong thing to worry about. Why?
编写程序时最大的低效率是担心错误的事情。这通常是错误的担心。为什么?
Developer time is generally much more expensive than CPU time — in fact, there is usually a dearth of the former and a surplus of the latter.
Most code does not need to be very efficient because it will never be running on million-item datasets multiple times every second.
Most code does need to be bug free, and less code is less room for bugs to hide.
开发人员时间通常比 CPU 时间昂贵得多——事实上,通常前者缺乏而后者过剩。
大多数代码不需要非常高效,因为它永远不会每秒在百万项数据集上运行多次。
大多数代码确实需要无错误,而代码越少,隐藏错误的空间就越小。
回答by Daniel C. Sobral
The example you gave is not very functional, actually. Here's what you are doing:
实际上,你给出的示例并不是很函数式。你做的是这样的:
// Given a list of Int
def removeMaxCool(xs: List[Int]): List[Int] = {
  // Find the index of the biggest Int
  val maxIndex = xs.indexOf(xs.max);
  // Then take the ints before and after it, and then concatenate them
  xs.take(maxIndex) ::: xs.drop(maxIndex+1)
}
Mind you, it is not bad, but you know functional code is at its best when it describes what you want, instead of how you want it. As a minor criticism, if you used splitAt instead of take and drop you could improve it slightly.
请注意,这写得并不差,但你知道,函数式代码在描述你想要什么、而不是你想怎么做的时候,才是它最好的状态。作为一个小小的批评,如果用 splitAt 代替 take 和 drop,还可以稍微改进一点。
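A sketch of that minor improvement (the helper name is ours): splitAt produces both halves in a single pass instead of the separate take and drop.
这个小改进的一个示意写法(方法名是我们自己取的):splitAt 一次遍历就同时得到前后两段,而不用分别调用 take 和 drop。
def removeMaxSplitAt(xs: List[Int]): List[Int] = {
  val (before, maxAndAfter) = xs.splitAt(xs.indexOf(xs.max))
  before ::: maxAndAfter.tail  // drop the maximum itself
}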
Another way of doing it is this:
另一种方法是这样的:
def removeMaxCool(xs: List[Int]): List[Int] = {
  // the result is the folding of the tail over the head
  // and an empty list
  xs.tail.foldLeft(xs.head -> List[Int]()) {
    // Where the accumulated list is increased by the
    // lesser of the current element and the accumulated
    // element, and the accumulated element is the maximum between them
    case ((max, ys), x) =>
      if (x > max) (x, max :: ys)
      else (max, x :: ys)
    // and of which we return only the accumulated list
  }._2
}
Now, let's discuss the main issue. Is this code slower than the Java one? Most certainly! Is the Java code slower than a C equivalent? You can bet it is, JIT or no JIT. And if you write it directly in assembler, you can make it even faster!
现在,让我们讨论主要问题。这段代码比 Java 版慢吗?几乎可以肯定!Java 代码比等价的 C 代码慢吗?你可以打赌它更慢,不管有没有 JIT。而如果直接用汇编来写,还能更快!
But the cost of that speed is that you get more bugs, you spend more time trying to understand the code to debug it, and you have less visibility of what the overall program is doing as opposed to what a little piece of code is doing -- which might result in performance problems of its own.
但这种速度的代价是:你会遇到更多的 bug,要花更多时间去理解代码以便调试它,而且相对于一小段代码在做什么,你对整个程序在做什么的把握会更少,而这本身也可能带来性能问题。
So my answer is simple: if you think the speed penalty of programming in Scala is not worth the gains it brings, you should program in assembler. If you think I'm being radical, then I counter that you just chose the familiar as being the "ideal" trade off.
所以我的回答很简单:如果你认为用 Scala 编程带来的速度损失不值得它带来的收益,你应该用汇编程序编程。如果你认为我是激进的,那么我反驳说你只是选择了熟悉的作为“理想”的权衡。
Do I think performance doesn't matter? Not at all! I think one of the main advantages of Scala is leveraging gains often found in dynamically typed languages with the performance of a statically typed language! Performance matters, algorithm complexity matters a lot, and constant costs matters too.
我认为性能不重要吗?一点也不!我认为 Scala 的主要优势之一是利用动态类型语言中常见的收益和静态类型语言的性能!性能很重要,算法复杂性很重要,而恒定成本也很重要。
But, whenever there is a choice between performance and readability and maintainability, the latter is preferable. Sure, if performance must be improved, then there isn't a choice: you have to sacrifice something to it. And if there's no loss in readability/maintainability -- such as Scala vs dynamically typed languages -- sure, go for performance.
但是,无论何时在性能、可读性和可维护性之间进行选择时,后者都是可取的。当然,如果必须提高性能,那么别无选择:您必须为此牺牲一些东西。如果在可读性/可维护性方面没有损失——比如 Scala 与动态类型语言——当然,去追求性能。
Lastly, to gain performance out of functional programming you have to know functional algorithms and data structures. Sure, 99% of Java programmers with 5-10 years experience will beat the performance of 99% of Scala programmers with 6 months experience. The same was true for imperative programming vs object oriented programming a couple of decades ago, and history shows it didn't matter.
最后,要从函数式编程中获得性能,您必须了解函数式算法和数据结构。当然,99% 拥有 5-10 年经验的 Java 程序员的表现会超过 99% 拥有 6 个月经验的 Scala 程序员的表现。几十年前,命令式编程与面向对象编程也是如此,历史表明这并不重要。
EDIT
编辑
As a side note, your "fast" algorithm suffers from a serious problem: you use ArrayBuffer. That collection does not have constant time append, and has linear time toList. If you use ListBuffer instead, you get constant time append and toList.
附带说明一下,你的“快速”算法有一个严重的问题:你使用了 ArrayBuffer。这个集合的追加操作不是常数时间,而且 toList 是线性时间。如果改用 ListBuffer,追加和 toList 都是常数时间。
回答by Kipton Barros
For reference, here's how splitAt is defined in TraversableLike in the Scala standard library,
作为参考,下面是 Scala 标准库中 TraversableLike 里 splitAt 的定义:
def splitAt(n: Int): (Repr, Repr) = {
  val l, r = newBuilder
  l.sizeHintBounded(n, this)
  if (n >= 0) r.sizeHint(this, -n)
  var i = 0
  for (x <- this) {
    (if (i < n) l else r) += x
    i += 1
  }
  (l.result, r.result)
}
It's not unlike your example code of what a Java programmer might come up with.
它与 Java 程序员可能想出的示例代码没有什么不同。
I like Scala because, where performance matters, mutability is a reasonable way to go. The collections library is a great example; especially how it hides this mutability behind a functional interface.
我喜欢 Scala,因为在性能重要的地方,可变性是一条合理的路。集合库就是一个很好的例子;尤其是它把这种可变性隐藏在函数式接口后面的方式。
Where performance isn't as important, such as some application code, the higher order functions in Scala's library allow great expressivity and programmer efficiency.
在性能不那么重要的地方,比如一些应用程序代码,Scala 库中的高阶函数允许很好的表达能力和程序员的效率。
Out of curiosity, I picked an arbitrary large file in the Scala compiler (scala.tools.nsc.typechecker.Typers.scala) and counted something like 37 for loops, 11 while loops, 6 concatenations (++), and 1 fold (it happens to be a foldRight).
出于好奇,我在 Scala 编译器里随便挑了一个比较大的文件(scala.tools.nsc.typechecker.Typers.scala),数了一下,大约有 37 个 for 循环、11 个 while 循环、6 个拼接(++)和 1 个折叠(恰好是一个 foldRight)。
回答by soc
What about this?
那这个呢?
def removeMax(xs: List[Int]) = {
val buf = xs.toBuffer
buf -= (buf.max)
}
A bit more ugly, but faster:
有点丑,但速度更快:
def removeMax(xs: List[Int]) = {
  var max = xs.head
  for ( x <- xs.tail )
  yield {
    if (x > max) { val result = max; max = x; result}
    else x
  }
}
回答by Bruno Reis
Try this:
试试这个:
(myList.foldLeft((List[Int](), None: Option[Int])) {
  case ((_, None), x)      => (List(), Some(x))
  case ((Nil, Some(m)), x) => (List(Math.min(x, m)), Some(Math.max(x, m)))
  case ((l, Some(m)), x)   => (Math.min(x, m) :: l, Some(Math.max(x, m)))
})._1
Idiomatic, functional, traverses only once. Maybe somewhat cryptic if you are not used to functional-programming idioms.
惯用的、函数式的,只遍历一次。如果您不习惯函数式编程习语,可能有点神秘。
Let's try to explain what is happening here. I will try to make it as simple as possible, lacking some rigor.
让我们试着解释一下这里发生了什么。我会尽量使它简单,缺乏一些严谨。
A fold is an operation on a List[A] (that is, a list that contains elements of type A) that will take an initial state s0: S (that is, an instance of a type S) and a function f: (S, A) => S (that is, a function that takes the current state and an element from the list, and gives the next state, i.e., it updates the state according to the next element).
折叠(fold)是作用在 List[A](即包含类型 A 的元素的列表)上的一种操作,它接受一个初始状态 s0: S(即类型 S 的一个实例)和一个函数 f: (S, A) => S(即一个接受当前状态和列表中一个元素、并给出下一个状态的函数,也就是说,它根据下一个元素来更新状态)。
The operation will then iterate over the elements of the list, using each one to update the state according to the given function. In Java, it would be something like:
然后该操作将迭代列表的元素,使用每个元素根据给定的函数更新状态。在 Java 中,它将类似于:
interface Function<T, R> { R apply(T t); }
class Pair<A, B> { ... }
<A, State> State fold(List<A> list, State s0, Function<Pair<A, State>, State> f) {
  State s = s0;
  for (A a: list) {
    s = f.apply(new Pair<A, State>(a, s));
  }
  return s;
}
For example, if you want to add all the elements of a List[Int], the state would be the partial sum, that would have to be initialized to 0, and the new state produced by a function would simply add the current state to the current element being processed:
例如,如果要把一个 List[Int] 的所有元素加起来,状态就是部分和,需要初始化为 0,而函数产生的新状态只是把当前状态与正在处理的当前元素相加:
myList.fold(0)((partialSum, element) => partialSum + element)
Try to write a fold to multiply the elements of a list, then another one to find extreme values (max, min).
尝试编写一个折叠以将列表的元素相乘,然后再编写一个以查找极值(最大值,最小值)。
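A hedged sketch of those exercises (the value names are ours, assuming a non-empty List[Int] called myList as in this answer): one fold for the product, and one that finds both extremes in a single pass.
这两个练习的一个示意性解答(变量名是我们自己取的,并假设本回答中的 myList 是非空的 List[Int]):一个折叠求乘积,另一个折叠在一次遍历中同时求出最大值和最小值。
val product = myList.foldLeft(1)(_ * _)

val (min, max) = myList.tail.foldLeft((myList.head, myList.head)) {
  case ((lo, hi), x) => (math.min(lo, x), math.max(hi, x))
}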
Now, the fold presented above is a bit more complex, since the state is composed of the new list being created along with the maximum element found so far. The function that updates the state is more or less straightforward once you grasp these concepts. It simply puts into the new list the minimum between the current maximum and the current element, while the other value goes to the current maximum of the updated state.
现在,上面呈现的折叠有点复杂,因为状态由创建的新列表以及迄今为止找到的最大元素组成。一旦你掌握了这些概念,更新状态的函数或多或少是直接的。它只是将当前最大值和当前元素之间的最小值放入新列表中,而另一个值则变为更新状态的当前最大值。
What is a bit more complex than to understand this (if you have no FP background) is to come up with this solution. However, this is only to show you that it exists, can be done. It's just a completely different mindset.
比理解这一点(如果您没有 FP 背景)更复杂的是提出这个解决方案。然而,这只是为了向你表明它存在,可以做到。这只是一种完全不同的心态。
EDIT: As you see, the first and second case in the solution I proposed are used to set up the fold. It is equivalent to what you see in other answers when they do xs.tail.fold((xs.head, ...)) {...}. Note that the solutions proposed until now using xs.tail/xs.head don't cover the case in which xs is List(), and will throw an exception. The solution above will return List() instead. Since you didn't specify the behavior of the function on empty lists, both are valid.
编辑:如你所见,我提出的方案里第一个和第二个 case 是用来为折叠做初始设置的。它等价于你在其他答案中看到的 xs.tail.fold((xs.head, ...)) {...} 的写法。请注意,到目前为止那些使用 xs.tail/xs.head 的方案没有覆盖 xs 是 List() 的情况,会抛出异常。而上面的方案会返回 List()。由于你没有规定函数在空列表上的行为,这两种做法都是有效的。
回答by Ed Staub
Another contender. This uses a ListBuffer, like Daniel's second offering, but shares the post-max tail of the original list, avoiding copying it.
另一个竞争者。这使用了一个 ListBuffer,就像 Daniel 的第二个产品一样,但共享原始列表的 post-max 尾部,避免复制它。
import scala.collection.mutable.ListBuffer

def shareTail(xs: List[Int]): List[Int] = {
  var res = ListBuffer[Int]()
  var maxTail = xs
  var first = true;
  var x = xs
  while ( x != Nil ) {
    if (x.head > maxTail.head) {
      while (!(maxTail.head == x.head)) {
        res += maxTail.head
        maxTail = maxTail.tail
      }
    }
    x = x.tail
  }
  res.prependToList(maxTail.tail)
}
回答by guilhebl
Another option would be:
另一种选择是:
package code.array
object SliceArrays {
  def main(args: Array[String]): Unit = {
    println(removeMaxCool(Vector(1,2,3,100,12,23,44)))
  }
  def removeMaxCool(xs: Vector[Int]) = xs.filter(_ < xs.max)
}
Using Vector instead of List, the reason is that Vector is more versatile and has a better general performance and time complexity if compared to List.
使用 Vector 而不是 List,原因是 Vector 更通用,并且与 List 相比具有更好的通用性能和时间复杂度。
Consider the following collections operations: head, tail, apply, update, prepend, append
考虑以下集合操作: head、tail、apply、update、prepend、append
Vector takes an amortized constant time for all operations, as per Scala docs: "The operation takes effectively constant time, but this might depend on some assumptions such as maximum length of a vector or distribution of hash keys"
根据 Scala 文档,向量对所有操作都需要一个摊销常数时间:“该操作需要有效的常数时间,但这可能取决于一些假设,例如向量的最大长度或散列键的分布”
While List takes constant time only for head, tail and prepend operations.
而 List 仅在 head、tail 和 prepend 操作中花费恒定时间。
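A micro-example of the operations listed above on a Vector (the example values are ours); each of these is effectively constant time according to the documentation quoted above.
上面列出的那些操作在 Vector 上的一个小示例(示例数据是我们自己编的);按照上面引用的文档,这些操作都近似常数时间。
val v = Vector(1, 2, 3)
v.head          // 1
v.tail          // Vector(2, 3)
v(1)            // apply: 2
v.updated(1, 9) // Vector(1, 9, 3)
0 +: v          // prepend: Vector(0, 1, 2, 3)
v :+ 4          // append: Vector(1, 2, 3, 4)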
Using
使用
scalac -print
generates:
产生:
package code.array {
  object SliceArrays extends Object {
    def main(args: Array[String]): Unit = scala.Predef.println(SliceArrays.this.removeMaxCool(scala.`package`.Vector().apply(scala.Predef.wrapIntArray(Array[Int]{1, 2, 3, 100, 12, 23, 44})).$asInstanceOf[scala.collection.immutable.Vector]()));
    def removeMaxCool(xs: scala.collection.immutable.Vector): scala.collection.immutable.Vector = xs.filter({
      ((x: Int) => SliceArrays.this.$anonfun$removeMaxCool(xs, x))
    }).$asInstanceOf[scala.collection.immutable.Vector]();
    final <artifact> private[this] def $anonfun$removeMaxCool(xs: scala.collection.immutable.Vector, x: Int): Boolean = x.<(scala.Int.unbox(xs.max(scala.math.Ordering$Int)));
    def <init>(): code.array.SliceArrays.type = {
      SliceArrays.super.<init>();
      ()
    }
  }
}

