用 C、Clojure、Python、Ruby、Scala 等解释基准
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/11641098/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Interpreting a benchmark in C, Clojure, Python, Ruby, Scala and others
提问by defhlt
Disclaimer
免责声明
I know that artificial benchmarks are evil. They can show results only for very specific narrow situation. I don't assume that one language is better than the other because of the some stupid bench. However I wonder why results is so different. Please see my questions at the bottom.
我知道人工基准是邪恶的。它们只能显示非常具体的狭窄情况的结果。我不认为一种语言比另一种更好,因为一些愚蠢的板凳。但是我想知道为什么结果如此不同。请在底部查看我的问题。
Math benchmark description
数学基准描述
Benchmark is simple math calculations to find pairs of prime numbers which differs by 6 (so called sexy primes)
E.g. sexy primes below 100 would be: (5 11) (7 13) (11 17) (13 19) (17 23) (23 29) (31 37) (37 43) (41 47) (47 53) (53 59) (61 67) (67 73) (73 79) (83 89) (97 103)
基准是简单的数学计算,以找到相差 6 的素数对(所谓的性感素数),例如,低于 100 的性感素数将是:(5 11) (7 13) (11 17) (13 19) (17 23) (23 29) (31 37) (37 43) (41 47) (47 53) (53 59) (61 67) (67 73) (73 79) (83 89) (97 103)
Results table
结果表
In table: calculation time in secondsRunning: all except Factor was running in VirtualBox (Debian unstable amd64 guest, Windows 7 x64 host) CPU: AMD A4-3305M
在表中:以秒为单位的计算时间运行:除 Factor 之外的所有运行在 VirtualBox(Debian 不稳定 amd64 来宾,Windows 7 x64 主机) CPU:AMD A4-3305M
Sexy primes up to: 10k 20k 30k 100k
Bash 58.00 200.00 [*1] [*1]
C 0.20 0.65 1.42 15.00
Clojure1.4 4.12 8.32 16.00 137.93
Clojure1.4 (optimized) 0.95 1.82 2.30 16.00
Factor n/a n/a 15.00 180.00
Python2.7 1.49 5.20 11.00 119
Ruby1.8 5.10 18.32 40.48 377.00
Ruby1.9.3 1.36 5.73 10.48 106.00
Scala2.9.2 0.93 1.41 2.73 20.84
Scala2.9.2 (optimized) 0.32 0.79 1.46 12.01
[*1] - I'm afraid to imagine how much time will it take
[*1] - 我不敢想象需要多少时间
Code listings
代码清单
C:
C:
int isprime(int x) {
int i;
for (i = 2; i < x; ++i)
if (x%i == 0) return 0;
return 1;
}
void findprimes(int m) {
int i;
for ( i = 11; i < m; ++i)
if (isprime(i) && isprime(i-6))
printf("%d %d\n", i-6, i);
}
main() {
findprimes(10*1000);
}
Ruby:
红宝石:
def is_prime?(n)
(2...n).all?{|m| n%m != 0 }
end
def sexy_primes(x)
(9..x).map do |i|
[i-6, i]
end.select do |j|
j.all?{|j| is_prime? j}
end
end
a = Time.now
p sexy_primes(10*1000)
b = Time.now
puts "#{(b-a)*1000} mils"
Scala:
斯卡拉:
def isPrime(n: Int) =
(2 until n) forall { n % _ != 0 }
def sexyPrimes(n: Int) =
(11 to n) map { i => List(i-6, i) } filter { _ forall(isPrime(_)) }
val a = System.currentTimeMillis()
println(sexyPrimes(100*1000))
val b = System.currentTimeMillis()
println((b-a).toString + " mils")
Scala opimized isPrime(the same idea like in Clojure optimization):
Scala 优化isPrime(与 Clojure 优化中的想法相同):
import scala.annotation.tailrec
@tailrec // Not required, but will warn if optimization doesn't work
def isPrime(n: Int, i: Int = 2): Boolean =
if (i == n) true
else if (n % i != 0) isPrime(n, i + 1)
else false
Clojure:
Clojure:
(defn is-prime? [n]
(every? #(> (mod n %) 0)
(range 2 n)))
(defn sexy-primes [m]
(for [x (range 11 (inc m))
:let [z (list (- x 6) x)]
:when (every? #(is-prime? %) z)]
z))
(let [a (System/currentTimeMillis)]
(println (sexy-primes (* 10 1000)))
(let [b (System/currentTimeMillis)]
(println (- b a) "mils")))
Clojure optimized is-prime?:
Clojure 优化is-prime?:
(defn ^:static is-prime? [^long n]
(loop [i (long 2)]
(if (= (rem n i) 0)
false
(if (>= (inc i) n) true (recur (inc i))))))
Python
Python
import time as time_
def is_prime(n):
return all((n%j > 0) for j in xrange(2, n))
def primes_below(x):
return [[j-6, j] for j in xrange(9, x+1) if is_prime(j) and is_prime(j-6)]
a = int(round(time_.time() * 1000))
print(primes_below(10*1000))
b = int(round(time_.time() * 1000))
print(str((b-a)) + " mils")
Factor
因素
MEMO:: prime? ( n -- ? )
n 1 - 2 [a,b] [ n swap mod 0 > ] all? ;
MEMO: sexyprimes ( n n -- r r )
[a,b] [ prime? ] filter [ 6 + ] map [ prime? ] filter dup [ 6 - ] map ;
5 10 1000 * sexyprimes . .
Bash(zsh):
重击(zsh):
#!/usr/bin/zsh
function prime {
for (( i = 2; i < ; i++ )); do
if [[ $[%i] == 0 ]]; then
echo 1
exit
fi
done
echo 0
}
function sexy-primes {
for (( i = 9; i <= ; i++ )); do
j=$[i-6]
if [[ $(prime $i) == 0 && $(prime $j) == 0 ]]; then
echo $j $i
fi
done
}
sexy-primes 10000
Questions
问题
- Why Scala is so fast? Is it because of static typing? Or it is just using JVM very efficiently?
Why such a huge difference between Ruby and Python? I thought these two are not somewhat totally different. Maybe my code is wrong. Please enlighten me! Thanks.UPDYes, that was error in my code. Python and Ruby 1.9 are pretty equal.- Really impressive jump in productivity between Ruby versions.
- Can I optimize Clojure code by adding type declarations? Will it help?
- 为什么Scala这么快?是因为静态类型吗?或者它只是非常有效地使用JVM?
为什么 Ruby 和 Python 之间有如此巨大的差异?我认为这两者并没有完全不同。也许我的代码是错误的。请赐教!谢谢。UPD是的,这是我的代码中的错误。Python 和 Ruby 1.9 相当。- Ruby 版本之间的生产力提升确实令人印象深刻。
- 我可以通过添加类型声明来优化 Clojure 代码吗?会有帮助吗?
采纳答案by mikera
Rough answers:
粗略回答:
- Scala's static typing is helping it quite a bit here - this means that it uses the JVM pretty efficiently without too much extra effort.
- I'm not exactly sure on the Ruby/Python difference, but I suspect that
(2...n).all?in the functionis-prime?is likely to be quite well optimised in Ruby (EDIT: sounds like this is indeed the case, see Julian's answer for more detail...) - Ruby 1.9.3 is just much better optimised
- Clojure code can certainly be accelerated a lot! While Clojure is dynamic by default, you can use type hints, primitive maths etc. to get close to Scala / pure Java speed in many cases when you need to.
- Scala 的静态类型在这里有很大帮助 - 这意味着它可以非常有效地使用 JVM,而无需付出太多额外的努力。
- 我不太确定 Ruby/Python 的区别,但我怀疑
(2...n).all?函数is-prime?在 Ruby 中可能会得到很好的优化(编辑:听起来确实如此,有关更多详细信息,请参阅 Julian 的回答......) - Ruby 1.9.3 优化得更好
- Clojure 代码当然可以加速很多!虽然 Clojure 默认情况下是动态的,但在许多情况下,您可以使用类型提示、原始数学等来接近 Scala/纯 Java 的速度。
Most important optimisation in the Clojure code would be to use typed primitive maths within is-prime?, something like:
Clojure 代码中最重要的优化是在 中使用类型化的原始数学is-prime?,例如:
(set! *unchecked-math* true) ;; at top of file to avoid using BigIntegers
(defn ^:static is-prime? [^long n]
(loop [i (long 2)]
(if (zero? (mod n i))
false
(if (>= (inc i) n) true (recur (inc i))))))
With this improvement, I get Clojure completing 10k in 0.635 secs (i.e. the second fastest on your list, beating Scala)
通过这项改进,我让 Clojure 在 0.635 秒内完成了 10k(即你列表中第二快的,击败了 Scala)
P.S.note that you have printing code inside your benchmark in some cases - not a good idea as it will distort the results, especially if using a function like printfor the first time causes initialisation of IO subsystems or something like that!
PS请注意,在某些情况下,您在基准测试中打印代码 - 不是一个好主意,因为它会扭曲结果,特别是如果print第一次使用类似的函数会导致 IO 子系统的初始化或类似的东西!
回答by Justin Kramer
Here's a fast Clojure version, using the same basic algorithms:
这是一个快速的 Clojure 版本,使用相同的基本算法:
(set! *unchecked-math* true)
(defn is-prime? [^long n]
(loop [i 2]
(if (zero? (unchecked-remainder-int n i))
false
(if (>= (inc i) n)
true
(recur (inc i))))))
(defn sexy-primes [m]
(for [x (range 11 (inc m))
:when (and (is-prime? x) (is-prime? (- x 6)))]
[(- x 6) x]))
It runs about 20x faster than your original on my machine. And here's a version that leverages the new reducers library in 1.5 (requires Java 7 or JSR 166):
在我的机器上,它的运行速度比原来的快 20 倍。这是一个利用 1.5 中新的 reducers 库的版本(需要 Java 7 或 JSR 166):
(require '[clojure.core.reducers :as r]) ;'
(defn sexy-primes [m]
(->> (vec (range 11 (inc m)))
(r/filter #(and (is-prime? %) (is-prime? (- % 6))))
(r/map #(list (- % 6) %))
(r/fold (fn ([] []) ([a b] (into a b))) conj)))
This runs about 40x faster than your original. On my machine, that's 100k in 1.5 seconds.
这比原来的运行速度快约 40 倍。在我的机器上,这是 1.5 秒内 100k。
回答by Julian
I'll answer just #2, since it's the only one I've got anything remotely intelligent to say, but for your Python code, you're creating an intermediate list in is_prime, whereas you're using .mapin your allin Ruby which is just iterating.
我只回答 #2,因为它是我唯一可以说的远程智能,但是对于您的 Python 代码,您正在 中创建一个中间列表is_prime,而.map您all在 Ruby 中使用它只是迭代。
If you change your is_primeto:
如果您更改is_prime为:
def is_prime(n):
return all((n%j > 0) for j in range(2, n))
they're on par.
他们是相提并论的。
I could optimize the Python further, but my Ruby isn't good enough to know when I've given more of an advantage (e.g., using xrangemakes Python win on my machine, but I don't remember if the Ruby range you used creates an entire range in memory or not).
我可以进一步优化 Python,但我的 Ruby 不够好,无法知道何时我提供了更多优势(例如,使用xrange使 Python 在我的机器上获胜,但我不记得您使用的 Ruby 范围是否创建了内存中的整个范围与否)。
EDIT:Without being too silly, making the Python code look like:
编辑:不要太傻,让 Python 代码看起来像:
import time
def is_prime(n):
return all(n % j for j in xrange(2, n))
def primes_below(x):
return [(j-6, j) for j in xrange(9, x + 1) if is_prime(j) and is_prime(j-6)]
a = int(round(time.time() * 1000))
print(primes_below(10*1000))
b = int(round(time.time() * 1000))
print(str((b-a)) + " mils")
which doesn't change much more, puts it at 1.5s for me, and, with being extra silly, running it with PyPy puts it at .3s for 10K, and 21s for 100K.
这并没有太大变化,对我来说是 1.5 秒,而且,由于更加愚蠢,使用 PyPy 运行它的 10K 是 0.3 秒,100K 是 21 秒。
回答by Luigi Plinge
You can make the Scala a lot faster by modifying your isPrimemethod to
您可以通过修改您的isPrime方法来使 Scala 更快
def isPrime(n: Int, i: Int = 2): Boolean =
if (i == n) true
else if (n % i != 0) isPrime(n, i + 1)
else false
Not quite as concise but the program runs in 40% of the time!
不那么简洁,但程序运行的时间为 40%!
We cut out the superfluous Rangeand anonymous Functionobjects, the Scala compiler recognizes the tail-recursion and turns it into a while-loop, which the JVM can turn into more or less optimal machine code, so it shouldn't be too far off the C version.
我们去掉多余的Range匿名Function对象,Scala编译器识别尾递归并将其转化为while循环,JVM可以将其转化为或多或少最优的机器码,所以它应该不会离C太远版本。
See also: How to optimize for-comprehensions and loops in Scala?
回答by Eastsun
Here is my scala version in both parallel and no-parallel, just for fun: (In my dual core compute, the parallel version takes 335ms while the no-parallel version takes 655ms)
这是我的并行和非并行 Scala 版本,只是为了好玩:(在我的双核计算中,并行版本需要 335 毫秒,而非并行版本需要 655 毫秒)
object SexyPrimes {
def isPrime(n: Int): Boolean =
(2 to math.sqrt(n).toInt).forall{ n%_ != 0 }
def isSexyPrime(n: Int): Boolean = isPrime(n) && isPrime(n-6)
def findPrimesPar(n: Int) {
for(k <- (11 to n).par)
if(isSexyPrime(k)) printf("%d %d\n",k-6,k)
}
def findPrimes(n: Int) {
for(k <- 11 to n)
if(isSexyPrime(k)) printf("%d %d\n",k-6,k)
}
def timeOf(call : =>Unit) {
val start = System.currentTimeMillis
call
val end = System.currentTimeMillis
println((end-start)+" mils")
}
def main(args: Array[String]) {
timeOf(findPrimes(100*1000))
println("------------------------")
timeOf(findPrimesPar(100*1000))
}
}
EDIT: According to Emil H's suggestion, I have changed my code to avoid the effects of IO and jvm warmup:
编辑:根据Emil H的建议,我更改了代码以避免 IO 和 jvm 预热的影响:
The result shows in my compute:
结果显示在我的计算中:
List(3432, 1934, 3261, 1716, 3229, 1654, 3214, 1700)
列表(3432, 1934, 3261, 1716, 3229, 1654, 3214, 1700)
object SexyPrimes {
def isPrime(n: Int): Boolean =
(2 to math.sqrt(n).toInt).forall{ n%_ != 0 }
def isSexyPrime(n: Int): Boolean = isPrime(n) && isPrime(n-6)
def findPrimesPar(n: Int) {
for(k <- (11 to n).par)
if(isSexyPrime(k)) ()//printf("%d %d\n",k-6,k)
}
def findPrimes(n: Int) {
for(k <- 11 to n)
if(isSexyPrime(k)) ()//printf("%d %d\n",k-6,k)
}
def timeOf(call : =>Unit): Long = {
val start = System.currentTimeMillis
call
val end = System.currentTimeMillis
end - start
}
def main(args: Array[String]) {
val xs = timeOf(findPrimes(1000*1000))::timeOf(findPrimesPar(1000*1000))::
timeOf(findPrimes(1000*1000))::timeOf(findPrimesPar(1000*1000))::
timeOf(findPrimes(1000*1000))::timeOf(findPrimesPar(1000*1000))::
timeOf(findPrimes(1000*1000))::timeOf(findPrimesPar(1000*1000))::Nil
println(xs)
}
}
回答by mgilson
Don't forget Fortran! (Mostly joking, but I would expect similar performance to C). The statements with exclamation points are optional, but good style. (!is a comment character in fortran 90)
不要忘记Fortran!(主要是开玩笑,但我希望性能与 C 相似)。带有感叹号的语句是可选的,但风格很好。(!是fortran 90中的注释字符)
logical function isprime(n)
IMPLICIT NONE !
integer :: n,i
do i=2,n
if(mod(n,i).eq.0)) return .false.
enddo
return .true.
end
subroutine findprimes(m)
IMPLICIT NONE !
integer :: m,i
logical, external :: isprime
do i=11,m
if(isprime(i) .and. isprime(i-6))then
write(*,*) i-6,i
endif
enddo
end
program main
findprimes(10*1000)
end
回答by steveha
Never mind the benchmarks; the problem got me interested and I made some fast tweaks. This uses the lru_cachedecorator, which memoizes a function; so when we call is_prime(i-6)we basically get that prime check for free. This change cuts the work roughly in half. Also, we can make the range()calls step through just the odd numbers, cutting the work roughly in half again.
别介意基准;这个问题引起了我的兴趣,我做了一些快速的调整。这使用了lru_cache装饰器,它记住了一个函数;所以当我们打电话时,is_prime(i-6)我们基本上可以免费得到那张主要支票。此更改将工作大致减半。此外,我们可以让range()调用仅通过奇数,再次将工作大致减半。
http://en.wikipedia.org/wiki/Memoization
http://en.wikipedia.org/wiki/Memoization
http://docs.python.org/dev/library/functools.html
http://docs.python.org/dev/library/functools.html
This requires Python 3.2 or newer to get lru_cache, but could work with an older Python if you install a Python recipe that provides lru_cache. If you are using Python 2.x you should really use xrange()instead of range().
这需要 Python 3.2 或更高版本才能获取lru_cache,但如果您安装提供lru_cache. 如果您使用的是 Python 2.x,您应该真正使用xrange()而不是range().
http://code.activestate.com/recipes/577479-simple-caching-decorator/
http://code.activestate.com/recipes/577479-simple-caching-decorator/
from functools import lru_cache
import time as time_
@lru_cache()
def is_prime(n):
return n%2 and all(n%i for i in range(3, n, 2))
def primes_below(x):
return [(i-6, i) for i in range(9, x+1, 2) if is_prime(i) and is_prime(i-6)]
correct100 = [(5, 11), (7, 13), (11, 17), (13, 19), (17, 23), (23, 29),
(31, 37), (37, 43), (41, 47), (47, 53), (53, 59), (61, 67), (67, 73),
(73, 79), (83, 89)]
assert(primes_below(100) == correct100)
a = time_.time()
print(primes_below(30*1000))
b = time_.time()
elapsed = b - a
print("{} msec".format(round(elapsed * 1000)))
The above took only a very short time to edit. I decided to take it one step further, and make the primes test only try prime divisors, and only up to the square root of the number being tested. The way I did it only works if you check numbers in order, so it can accumulate all the primes as it goes; but this problem was already checking the numbers in order so that was fine.
上面的编辑只用了很短的时间。我决定更进一步,让素数测试只尝试素数除数,并且只测试被测试数的平方根。我这样做的方法只有在您按顺序检查数字时才有效,因此它可以累积所有的质数;但是这个问题已经在按顺序检查数字了,所以没问题。
On my laptop (nothing special; processor is a 1.5 GHz AMD Turion II "K625") this version produced an answer for 100K in under 8 seconds.
在我的笔记本电脑上(没什么特别的;处理器是 1.5 GHz AMD Turion II“K625”)这个版本在 8 秒内产生了 100K 的答案。
from functools import lru_cache
import math
import time as time_
known_primes = set([2, 3, 5, 7])
@lru_cache(maxsize=128)
def is_prime(n):
last = math.ceil(math.sqrt(n))
flag = n%2 and all(n%x for x in known_primes if x <= last)
if flag:
known_primes.add(n)
return flag
def primes_below(x):
return [(i-6, i) for i in range(9, x+1, 2) if is_prime(i) and is_prime(i-6)]
correct100 = [(5, 11), (7, 13), (11, 17), (13, 19), (17, 23), (23, 29),
(31, 37), (37, 43), (41, 47), (47, 53), (53, 59), (61, 67), (67, 73),
(73, 79), (83, 89)]
assert(primes_below(100) == correct100)
a = time_.time()
print(primes_below(100*1000))
b = time_.time()
elapsed = b - a
print("{} msec".format(round(elapsed * 1000)))
The above code is pretty easy to write in Python, Ruby, etc. but would be more of a pain in C.
上面的代码很容易用 Python、Ruby 等编写,但在 C 中会更痛苦。
You can't compare the numbers on this version against the numbers from the other versions without rewriting the others to use similar tricks. I'm not trying to prove anything here; I just thought the problem was fun and I wanted to see what sort of easy performance improvements I could glean.
如果不重写其他版本以使用类似的技巧,您就无法将这个版本的数字与其他版本的数字进行比较。我不想在这里证明任何事情;我只是觉得这个问题很有趣,我想看看我可以收集哪些简单的性能改进。
回答by x4u
I couldn't resist to do a few of the most obvious optimizations for the C version which made the 100k test now take 0.3s on my machine (5 times faster than the C version in the question, both compiled with MSVC 2010 /Ox).
我忍不住对 C 版本进行了一些最明显的优化,这使得 100k 测试现在在我的机器上花费 0.3 秒(比问题中的 C 版本快 5 倍,均使用 MSVC 2010 / Ox 编译) .
int isprime( int x )
{
int i, n;
for( i = 3, n = x >> 1; i <= n; i += 2 )
if( x % i == 0 )
return 0;
return 1;
}
void findprimes( int m )
{
int i, s = 3; // s is bitmask of primes in last 3 odd numbers
for( i = 11; i < m; i += 2, s >>= 1 ) {
if( isprime( i ) ) {
if( s & 1 )
printf( "%d %d\n", i - 6, i );
s |= 1 << 3;
}
}
}
main() {
findprimes( 10 * 1000 );
}
Here is the identical implemention in Java:
这是 Java 中的相同实现:
public class prime
{
private static boolean isprime( final int x )
{
for( int i = 3, n = x >> 1; i <= n; i += 2 )
if( x % i == 0 )
return false;
return true;
}
private static void findprimes( final int m )
{
int s = 3; // s is bitmask of primes in last 3 odd numbers
for( int i = 11; i < m; i += 2, s >>= 1 ) {
if( isprime( i ) ) {
if( ( s & 1 ) != 0 )
print( i );
s |= 1 << 3;
}
}
}
private static void print( int i )
{
System.out.println( ( i - 6 ) + " " + i );
}
public static void main( String[] args )
{
// findprimes( 300 * 1000 ); // for some JIT training
long time = System.nanoTime();
findprimes( 10 * 1000 );
time = System.nanoTime() - time;
System.err.println( "time: " + ( time / 10000 ) / 100.0 + "ms" );
}
}
With Java 1.7.0_04 this runs almost exactly as fast as the C version. Client or server VM doesn't show much difference, except that JIT training seems to help the server VM a bit (~3%) while it has almost no effect with the client VM. The output in Java seems to be slower than in C. If the output is replaced with a static counter in both versions, the Java version runs a little faster than the C version.
在 Java 1.7.0_04 中,它的运行速度几乎与 C 版本一样快。客户端或服务器 VM 没有显示出太大区别,只是 JIT 培训似乎对服务器 VM 有一点帮助(~3%),而它对客户端 VM 几乎没有影响。Java 中的输出似乎比 C 中的要慢。如果在两个版本中都将输出替换为静态计数器,则 Java 版本的运行速度会比 C 版本快一些。
These are my times for the 100k run:
这些是我的 100k 跑步时间:
- 319ms C compiled with /Ox and output to >NIL:
- 312ms C compiled with /Ox and static counter
- 324ms Java client VM with output to >NIL:
- 299ms Java client VM with static counter
- 319ms C 用 /Ox 编译并输出到 >NIL:
- 使用 /Ox 和静态计数器编译的 312ms C
- 324 毫秒 Java 客户端 VM,输出到 >NIL:
- 具有静态计数器的 299 毫秒 Java 客户端 VM
and the 1M run (16386 results):
和 1M 运行(16386 个结果):
- 24.95s C compiled with /Ox and static counter
- 25.08s Java client VM with static counter
- 24.86s Java server VM with static counter
- 使用 /Ox 和静态计数器编译的 24.95 秒 C
- 具有静态计数器的 25.08 秒 Java 客户端 VM
- 具有静态计数器的 24.86 秒 Java 服务器 VM
While this does not really answer your questions, it shows that small tweaks can have a noteworthy impact on performance. So to be able to really compare languages you should try to avoid all algorithmic differences as much as possible.
虽然这并不能真正回答您的问题,但它表明小调整可以对性能产生显着影响。因此,为了能够真正比较语言,您应该尽量避免所有算法差异。
It also gives a hint why Scala seems rather fast. It runs on the Java VM and thus benefits from its impressive performance.
它还暗示了为什么 Scala 看起来相当快。它在 Java VM 上运行,因此受益于其令人印象深刻的性能。
回答by Tomas Lazaro
In Scala try using Tuple2 instead of List, it should go faster. Just remove the word 'List' since (x, y) is a Tuple2.
在 Scala 中尝试使用 Tuple2 而不是 List,它应该会更快。因为 (x, y) 是一个 Tuple2,所以删除“List”这个词。
Tuple2 is specialized for Int, Long and Double meaning it won't have to box/unbox those raw datatypes. Tuple2 source. List isn't specialized. List source.
Tuple2 专门用于 Int、Long 和 Double,这意味着它不必装箱/拆箱这些原始数据类型。Tuple2 源。列表不是专门的。列出来源。
回答by Sebastián Grignoli
Here's the code for the Go (golang.org) version:
这是 Go (golang.org) 版本的代码:
package main
import (
"fmt"
)
func main(){
findprimes(10*1000)
}
func isprime(x int) bool {
for i := 2; i < x; i++ {
if x%i == 0 {
return false
}
}
return true
}
func findprimes(m int){
for i := 11; i < m; i++ {
if isprime(i) && isprime(i-6) {
fmt.Printf("%d %d\n", i-6, i)
}
}
}
It ran just as fast as the C version.
它的运行速度与 C 版本一样快。
Using an Asus u81aIntel Core 2 Duo T6500 2.1GHz, 2MB L2 cache, 800MHz FSB. 4GB RAM
使用 Asus u81aIntel Core 2 Duo T6500 2.1GHz,2MB L2 缓存,800MHz FSB。4GB 内存
The 100k version: C: 2.723sGo: 2.743s
100k版本: C: 2.723sGo: 2.743s
With 1000000 (1M instead of 100K): C: 3m35.458sGo: 3m36.259s
使用 1000000(1M 而不是 100K): C: 3m35.458sGo: 3m36.259s
But I think that it would be fair to use Go's built in multithreading capabilities and compare that version with the regular C version (without multithreading), just because it's almost too easy to do multithreading with Go.
但我认为使用 Go 的内置多线程功能并将该版本与常规 C 版本(没有多线程)进行比较是公平的,因为用 Go 进行多线程几乎太容易了。
Update: I did a parallel version using Goroutines in Go:
更新:我在 Go 中使用 Goroutines 做了一个并行版本:
package main
import (
"fmt"
"runtime"
)
func main(){
runtime.GOMAXPROCS(4)
printer := make(chan string)
printer2 := make(chan string)
printer3 := make(chan string)
printer4 := make(chan string)
finished := make(chan int)
var buffer, buffer2, buffer3 string
running := 4
go findprimes(11, 30000, printer, finished)
go findprimes(30001, 60000, printer2, finished)
go findprimes(60001, 85000, printer3, finished)
go findprimes(85001, 100000, printer4, finished)
for {
select {
case i := <-printer:
// batch of sexy primes received from printer channel 1, print them
fmt.Printf(i)
case i := <-printer2:
// sexy prime list received from channel, store it
buffer = i
case i := <-printer3:
// sexy prime list received from channel, store it
buffer2 = i
case i := <-printer4:
// sexy prime list received from channel, store it
buffer3 = i
case <-finished:
running--
if running == 0 {
// all goroutines ended
// dump buffer to stdout
fmt.Printf(buffer)
fmt.Printf(buffer2)
fmt.Printf(buffer3)
return
}
}
}
}
func isprime(x int) bool {
for i := 2; i < x; i++ {
if x%i == 0 {
return false
}
}
return true
}
func findprimes(from int, to int, printer chan string, finished chan int){
str := ""
for i := from; i <= to; i++ {
if isprime(i) && isprime(i-6) {
str = str + fmt.Sprintf("%d %d\n", i-6, i)
}
}
printer <- str
//fmt.Printf("Finished %d to %d\n", from, to)
finished <- 1
}
The parallelized version used in average 2.743 seconds, the exact same time that the regular version used.
并行版本平均使用 2.743 秒,与常规版本使用的时间完全相同。
The parallelized version completed in 1.706 seconds. It used less than 1.5 Mb RAM.
并行化版本在 1.706 秒内完成。它使用不到 1.5 Mb 的 RAM。
One odd thing: My dual core kubuntu 64bit never peaked in both cores. It looked like Go was using just one core.Fixed with a call to runtime.GOMAXPROCS(4)
一件奇怪的事情:我的双核 kubuntu 64 位从未在两个内核中达到峰值。看起来 Go 只使用了一个内核。通过调用修复runtime.GOMAXPROCS(4)
Update: I ran the paralellized version up to 1M numbers. One of My CPU cores was at 100% all the time, while the other wasn't used at all (odd). It took a whole minute more than the C and the regular Go versions. :(
更新:我运行了多达 100 万个数字的并行版本。 我的一个 CPU 内核一直处于 100% 的状态,而另一个根本没有使用(奇怪)。它比 C 和常规 Go 版本多花了整整一分钟。:(
With 1000000 (1M instead of 100K):
使用 1000000(1M 而不是 100K):
C: 3m35.458sGo: 3m36.259sGo using goroutines:3m27.137s2m16.125s
C: 3m35.458sGo: 3m36.259sGo using goroutines:3m27.137s2m16.125s
The 100k version:
100k版本:
C: 2.723sGo: 2.743sGo using goroutines: 1.706s
C: 2.723sGo: 2.743sGo using goroutines: 1.706s

