C#: faster implementation of sum (for Codility test)

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use/share it, but you must also follow CC BY-SA, cite the original address and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2338402/

Time: 2020-08-07 01:33:14  Source: igfitidea

Tags: c#, java, c++, algorithm, optimization

Question by OscarRyz

How can the following simple implementation of sum be made faster?

private long sum(int[] a, int begin, int end) {
    if (a == null) {
        return 0;
    }
    long r = 0;
    for (int i = begin; i < end; i++) {
        r += a[i];
    }
    return r;
}

EDIT

Background is in order.

Reading the latest entry on Coding Horror, I came across this site: http://codility.com, which has this interesting programming test.

Anyway, I got 60 out of 100 on my submission, and I think that's basically because of this implementation of sum, since the parts where I failed are the performance parts. I'm getting TIME_OUT_ERRORs.

So, I was wondering if an optimization in the algorithm is possible.

So, no built-in functions or assembly would be allowed. This may be done in C, C++, C#, Java, or pretty much any other language.

EDIT

As usual, mmyers was right. I did profile the code and I saw most of the time was spent on that function, but I didn't understand why. So what I did was to throw away my implementation and start with a new one.

This time I've got an optimal solution (O(n), according to San Jacinto; see the comments on MSN's answer below).

This time I got 81% on Codility, which I think is good enough. The problem is that it took me around 2 hours instead of 30 minutes, but I guess that still makes me a good programmer, since I was able to keep working on the problem until I found an optimal solution:

Here's my result.

[Image: my result on Codility]

I never understood what those "combinations of..." tests are, nor how to test "extreme_first".

Accepted answer by Guildencrantz

I don't think your problem is with the function that's summing the array; it's probably that you're summing the array WAY too frequently. If you simply sum the WHOLE array once, and then step through the array until you find the first equilibrium point, you should decrease the execution time sufficiently.

int equi(int[] A) {
    int result = -1;

    long lower = 0;   // sum of A[0..i-1]
    long upper = 0;   // starts as the sum of the whole array
    foreach (int i in A)
        upper += i;

    for (int i = 0; i < A.Length; i++)
    {
        upper -= A[i];   // now the sum of A[i+1..n-1]
        if (upper == lower)
        {
            result = i;
            break;
        }
        else
            lower += A[i];
    }

    return result;
}
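The same single-pass idea as a C++ sketch, for readers following the C++ answers below. The name equilibrium_index and the std::vector<int> signature are assumptions of this port; both running sums are kept in long long so that large ints cannot overflow.

```cpp
#include <cstddef>
#include <vector>

// Returns the first equilibrium index of a, or -1 if there is none.
// Index i is an equilibrium when a[0] + ... + a[i-1] == a[i+1] + ... + a[n-1].
int equilibrium_index(const std::vector<int>& a) {
    long long upper = 0;                 // will hold the sum to the right of i
    for (int x : a) upper += x;          // first pass: total of the whole array

    long long lower = 0;                 // sum to the left of i
    for (std::size_t i = 0; i < a.size(); ++i) {
        upper -= a[i];                   // now upper == a[i+1] + ... + a[n-1]
        if (upper == lower) return static_cast<int>(i);
        lower += a[i];                   // extend the left sum before moving on
    }
    return -1;
}
```

For example, equilibrium_index({-7, 1, 5, 2, -4, 3, 0}) returns 3, since -7 + 1 + 5 and -4 + 3 + 0 both equal -1.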

Answer by Vlad

In C++, the following:

int* a1 = a + begin;
for( int i = end - begin - 1; i >= 0 ; i-- )
{
    r+= a1[i];
}

might be faster. The advantage is that we compare against zero in the loop.

Of course, with a really good optimizer there should be no difference at all.

Another possibility would be

int* a2 = a + end - 1;
for( int i = -(end - begin - 1); i <= 0 ; i++ )
{
    r+= a2[i];
}

Here we traverse the items in the same order as the original loop, just without comparing against end.
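Both loops above are fragments that assume a, begin, end, and r already exist. Wrapped into complete C++ functions (the names sum_countdown and sum_negative_index are hypothetical), they might look like this; the early return for an empty range is an addition, because with begin == end the expression a + end - 1 would point before the range.

```cpp
// Variant 1: count i down to zero, so the loop test is a comparison against 0.
long long sum_countdown(const int* a, int begin, int end) {
    long long r = 0;
    const int* a1 = a + begin;                 // a1[0] is a[begin]
    for (int i = end - begin - 1; i >= 0; i--) {
        r += a1[i];                            // visits a[end-1] down to a[begin]
    }
    return r;
}

// Variant 2: negative index counting up to zero, visiting items in forward order.
long long sum_negative_index(const int* a, int begin, int end) {
    if (end <= begin) return 0;                // a + end - 1 would precede the range
    long long r = 0;
    const int* a2 = a + end - 1;               // a2[0] is a[end-1]
    for (int i = -(end - begin - 1); i <= 0; i++) {
        r += a2[i];                            // visits a[begin] up to a[end-1]
    }
    return r;
}
```

Any speed difference between the two, or versus the plain forward loop, has to be measured; as noted above, a good optimizer may generate the same code for all of them.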

Answer by Antti Huima

I don't believe the problem is in the code you provided; rather, the overall solution must somehow be suboptimal. This code looks good for calculating the sum of one slice of the array, but maybe that's not what you need to solve the whole problem.

Answer by Otto Allmendinger

If you are using C or C++ and develop for modern desktop systems and are willing to learn some assembler or learn about GCC intrinsics, you could use SIMD instructions.

This library is an example of what is possible for float and double arrays; similar results should be possible for integer arithmetic, since SSE has integer instructions as well.

Answer by Jerry Coffin

This code is simple enough that unless a is quite small, it's probably going to be limited primarily by memory bandwidth. As such, you probably can't hope for any significant gain by working on the summing part itself (e.g., unrolling the loop, counting down instead of up, executing sums in parallel -- unless they're on separate CPUs, each with its own access to memory). The biggest gain will probably come from issuing some preload instructions so most of the data will already be in the cache by the time you need it. The rest will just (at best) get the CPU to hurry up more, so it waits longer.
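A rough sketch of the preload idea, assuming GCC or Clang, which provide __builtin_prefetch(addr, rw, locality) (MSVC offers the _mm_prefetch intrinsic instead; other compilers simply skip the hint here). The look-ahead distance and the one-hint-per-64-byte-line spacing are guesses that would need tuning, and modern CPUs often detect a simple sequential scan on their own, so the hint may gain nothing.

```cpp
#include <cstddef>

// Illustrative assumptions: look 256 ints (1 KiB) ahead, and issue one hint
// per 16 ints, i.e. per 64-byte cache line with a 4-byte int.
constexpr std::size_t kPrefetchDistance = 256;
constexpr std::size_t kIntsPerLine = 16;

long long sum_with_prefetch(const int* a, std::size_t n) {
    long long r = 0;
    for (std::size_t i = 0; i < n; ++i) {
#if defined(__GNUC__) || defined(__clang__)
        // Ask the CPU to start loading a later cache line (0 = read, 3 = keep cached).
        // The bounds check keeps the hinted address inside the array.
        if (i % kIntsPerLine == 0 && i + kPrefetchDistance < n) {
            __builtin_prefetch(a + i + kPrefetchDistance, 0, 3);
        }
#endif
        r += a[i];
    }
    return r;
}
```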

Edit: It appears that most of what's above has little to do with the real question. The screenshot is kind of small, so it may be difficult to read, but I tried just using std::accumulate() for the initial addition, and Codility seemed to think that was all right:

[Image: Codility results]
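The screenshot does not show the code. A minimal C++ sketch of computing the initial total with std::accumulate (total_of is a hypothetical name) might look like the following. The detail worth noting is that the type of the initial value sets the accumulator type: 0LL sums in long long, while a plain 0 would sum in int and could overflow.

```cpp
#include <numeric>
#include <vector>

// Total of the whole array. The 0LL initial value makes std::accumulate
// keep its running sum in long long rather than int.
long long total_of(const std::vector<int>& a) {
    return std::accumulate(a.begin(), a.end(), 0LL);
}
```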

Answer by Ron Warholic

Probably the fastest you could get would be to have your int array 16-byte aligned, stream 32 bytes into two __m128i variables (VC++), and call _mm_add_epi32 (again, a VC++ intrinsic) on the chunks. Reuse one of the chunks to keep adding into it, and on the final chunk extract your four ints and add them the old-fashioned way.
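A C++ sketch of the SSE2 approach, with two deliberate deviations from the description above: it uses unaligned loads (_mm_loadu_si128), so the array need not be 16-byte aligned, and it sign-extends each int into a 64-bit lane and accumulates with _mm_add_epi64 instead of _mm_add_epi32, because 32-bit lanes can overflow on large inputs. The intrinsics come from <emmintrin.h>, which GCC, Clang, and MSVC provide on x86/x64; on other targets the block falls back to the scalar loop. The names sum_sse2 and SUM_HAVE_SSE2 are illustrative.

```cpp
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define SUM_HAVE_SSE2 1
#endif

// Sums n ints, four per step when SSE2 is available.
long long sum_sse2(const int* a, std::size_t n) {
    std::size_t i = 0;
    long long total = 0;
#ifdef SUM_HAVE_SSE2
    __m128i acc = _mm_setzero_si128();  // two 64-bit partial sums
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i sign = _mm_srai_epi32(v, 31);                   // 0 or -1 per lane
        acc = _mm_add_epi64(acc, _mm_unpacklo_epi32(v, sign));  // a[i], a[i+1]
        acc = _mm_add_epi64(acc, _mm_unpackhi_epi32(v, sign));  // a[i+2], a[i+3]
    }
    long long lanes[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), acc);
    total = lanes[0] + lanes[1];
#endif
    for (; i < n; ++i) total += a[i];  // leftovers (or everything, without SSE2)
    return total;
}
```

Whether this actually beats the scalar loop depends on the memory-bandwidth limits discussed in the other answers; a compiler's auto-vectorizer may also produce similar code from the plain loop.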

The bigger question is why simple addition is a worthy candidate for optimization.

Edit: I see it's mostly an academic exercise. Perhaps I'll give it a go tomorrow and post some results...

Answer by MSN

If this is based on the actual sample problem, your issue isn't the sum. Your issue is how you calculate the equilibrium index. A naive implementation is O(n^2); an optimal solution is much, much better.
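To make the complexity difference concrete, here is a hypothetical naive version in C++ (the question does not show its equilibrium code, so this is only a guess at the pattern). It calls a slice sum like the question's sum() on both sides of every index, so each index costs about n additions and the whole search is O(n^2), no matter how fast the slice sum itself is.

```cpp
#include <vector>

// Slice sum in the style of the question's sum(): a[begin] + ... + a[end-1].
long long slice_sum(const std::vector<int>& a, int begin, int end) {
    long long r = 0;
    for (int i = begin; i < end; i++) r += a[i];
    return r;
}

// Naive search: two slice sums per index, so O(n) work per index, O(n^2) total.
int equilibrium_naive(const std::vector<int>& a) {
    int n = static_cast<int>(a.size());
    for (int i = 0; i < n; i++) {
        if (slice_sum(a, 0, i) == slice_sum(a, i + 1, n)) return i;
    }
    return -1;
}
```

It returns the same indices as the single-pass approach in the accepted answer, only with quadratically more work.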

Answer by Fadrian Sudaman

Just a thought; I'm not sure whether accessing the elements through a pointer directly would be faster:

    int* pStart = a + begin;
    int* pEnd = a + end;
    while (pStart != pEnd)
    {
        r += *pStart++;
    }
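The fragment assumes a, begin, end, and r are already declared. As a complete C++ function (sum_pointer is a hypothetical name), with the precondition begin <= end, since otherwise the two pointers never meet:

```cpp
// Sums a[begin] .. a[end-1] by walking a pointer until it reaches the stop pointer.
// Requires begin <= end.
long long sum_pointer(const int* a, int begin, int end) {
    long long r = 0;
    const int* p = a + begin;
    const int* const stop = a + end;  // one past the last element to add
    while (p != stop) {
        r += *p++;                    // add the element, then advance
    }
    return r;
}
```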

Answer by ggf31416

With C# 3.0, on my computer and my OS, this is faster as long as you can guarantee that 4 consecutive numbers won't overflow the range of an int, probably because most additions are done using 32-bit math. However, using a better algorithm usually provides a bigger speedup than any micro-optimization.

Times for a 100-million-element array:

4999912596452418 -> 233ms (sum)

4999912596452418 -> 126ms (sum2)

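    private static long sum2(int[] a, int begin, int end)
    {
        if (a == null) { return 0; }
        long r = 0;
        int i = begin;
        // Four elements per iteration. The inner additions are done in int,
        // so a[i] + a[i + 1] + a[i + 2] + a[i + 3] must not overflow int.
        for (; i < end - 3; i += 4)
        {
            r += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        }
        // Remaining 0 to 3 elements.
        for (; i < end; i++) { r += a[i]; }
        return r;
    }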
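A C++ sketch of the same unrolling idea with one deliberate change: each element is added into its own long long accumulator instead of first summing four ints in 32-bit arithmetic, so the no-overflow guarantee above is not needed. Four independent accumulators may let the CPU overlap the additions; whether this beats the plain loop depends on the compiler and hardware. The name sum_unrolled4 is hypothetical.

```cpp
// Unrolled by four with four independent 64-bit accumulators.
// Each int is widened to long long before it is added, so nothing overflows.
long long sum_unrolled4(const int* a, int begin, int end) {
    long long r0 = 0, r1 = 0, r2 = 0, r3 = 0;
    int i = begin;
    for (; i + 3 < end; i += 4) {
        r0 += a[i];
        r1 += a[i + 1];
        r2 += a[i + 2];
        r3 += a[i + 3];
    }
    for (; i < end; i++) r0 += a[i];  // up to three leftover elements
    return r0 + r1 + r2 + r3;
}
```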

Answer by Eric Lippert

Some tips:

  • Use a profiler to identify where you're spending a lot of time.

  • Write good performance tests so that you can tell the exact effect of every single change you make. Keep careful notes.

  • If it turns out that the bottleneck is the checks to ensure that you're dereferencing a legal address inside the array, and you can guarantee that begin and end are in fact both inside the array, then consider fixing the array, making a pointer to the array, and doing the algorithm in pointers rather than arrays. Pointers are unsafe; they do not spend any time checking to make sure you're still inside the array, so they can be somewhat faster. But you then take responsibility for ensuring that you do not corrupt every byte of memory in the address space.
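For the performance-test tip, a minimal C++ timing helper built on std::chrono::steady_clock (time_sum is a hypothetical name). It returns the result along with the elapsed time, so each variant can be checked for correctness before its timing is compared.

```cpp
#include <chrono>
#include <utility>
#include <vector>

// Runs f(data) once and returns {f's result, elapsed milliseconds}.
// A single run is noisy; a real test would warm up, repeat, and keep the minimum.
template <typename F>
std::pair<long long, double> time_sum(F f, const std::vector<int>& data) {
    auto start = std::chrono::steady_clock::now();
    long long result = f(data);
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> elapsed = stop - start;
    return {result, elapsed.count()};
}
```

Comparing two implementations then means checking that the .first members agree and only then looking at the .second members, noting each change and its effect as suggested above.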
