
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/468832/


Why are multi-dimensional arrays in .NET slower than normal arrays?

Tags: .net, performance, arrays

Asked by Hosam Aly

Edit: I apologize, everybody. I used the term "jagged array" when I actually meant to say "multi-dimensional array" (as can be seen in my example below). I apologize for using the incorrect name. I actually found jagged arrays to be faster than multi-dimensional ones! I have added my measurements for jagged arrays.

I was trying to use a multi-dimensional array today, when I noticed that its performance was not what I would have expected. Using a single-dimensional array and manually calculating indices was much faster (almost two times) than using a 2D array. I wrote a test using 1024*1024 arrays (initialized to random values), for 1000 iterations, and I got the following results on my machine:

sum(double[], int): 2738 ms (100%)
sum(double[,]):     5019 ms (183%)
sum(double[][]):    2540 ms ( 93%)

This is my test code:


public static double sum(double[] d, int l1) {
    // assuming the array is rectangular
    double sum = 0;
    int l2 = d.Length / l1;
    for (int i = 0; i < l1; ++i)
        for (int j = 0; j < l2; ++j)
            sum += d[i * l2 + j];
    return sum;
}

public static double sum(double[,] d) {
    double sum = 0;
    int l1 = d.GetLength(0);
    int l2 = d.GetLength(1);
    for (int i = 0; i < l1; ++i)
        for (int j = 0; j < l2; ++j)
            sum += d[i, j];
    return sum;
}

public static double sum(double[][] d) {
    double sum = 0;
    for (int i = 0; i < d.Length; ++i)
        for (int j = 0; j < d[i].Length; ++j)
            sum += d[i][j];
    return sum;
}

public static void Main() {
    Random random = new Random();
    const int l1  = 1024, l2 = 1024;
    double[ ] d1  = new double[l1 * l2];
    double[,] d2  = new double[l1 , l2];
    double[][] d3 = new double[l1][];

    for (int i = 0; i < l1; ++i) {
        d3[i] = new double[l2];
        for (int j = 0; j < l2; ++j)
            d3[i][j] = d2[i, j] = d1[i * l2 + j] = random.NextDouble();
    }
    //
    const int iterations = 1000;
    TestTime(sum, d1, l1, iterations);
    TestTime(sum, d2, iterations);
    TestTime(sum, d3, iterations);
}

Further investigation showed that the IL for the second method is about 31% larger than that of the first method (code size 68 vs. 52). This is mostly due to the calls to System.Array::GetLength(int). The compiler also emits calls to Array::Get for the multi-dimensional array, whereas it simply uses ldelem for the simple array.

So I am wondering, why is access through multi-dimensional arrays slower than normal arrays? I would have assumed the compiler (or JIT) would do something similar to what I did in my first method, but this was not actually the case.


Could you please help me understand why this is happening the way it is?



Update: Following Henk Holterman's suggestion, here is the implementation of TestTime:

public static void TestTime<T, TR>(Func<T, TR> action, T obj,
                                   int iterations)
{
    Stopwatch stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < iterations; ++i)
        action(obj);
    Console.WriteLine(action.Method.Name + " took " + stopwatch.Elapsed);
}

public static void TestTime<T1, T2, TR>(Func<T1, T2, TR> action, T1 obj1,
                                        T2 obj2, int iterations)
{
    Stopwatch stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < iterations; ++i)
        action(obj1, obj2);
    Console.WriteLine(action.Method.Name + " took " + stopwatch.Elapsed);
}

Accepted answer by Jon Skeet

Single-dimensional arrays with a lower bound of 0 are a different type from multi-dimensional or non-zero-lower-bound arrays within IL (vector vs. array, IIRC). A vector is simpler to work with: to get to element x, you just do pointer + size * x. For an array, you have to do pointer + size * (x - lower bound) for a single-dimensional array, and yet more arithmetic for each dimension you add.

Basically the CLR is optimised for the vastly more common case.

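
To make the arithmetic above concrete, here is a minimal sketch. It only models the offset calculation in plain C#; `ArrayOffsetSketch` and its methods are hypothetical names for illustration, not CLR APIs, and the real JIT-generated code may differ.

```csharp
using System;

// Hypothetical helper (not a CLR API): models the element-offset arithmetic
// for a zero-based vector and for a general 2D array with lower bounds.
public static class ArrayOffsetSketch
{
    // Vector: one range check, and the offset is simply the index.
    public static int VectorOffset(int x, int length)
    {
        if (x < 0 || x >= length) throw new IndexOutOfRangeException();
        return x;
    }

    // General 2D array: subtract each lower bound, range-check each
    // dimension separately, then combine the results in row-major order.
    public static int ArrayOffset(int i, int j,
                                  int lower0, int length0,
                                  int lower1, int length1)
    {
        int a = i - lower0;
        int b = j - lower1;
        if (a < 0 || a >= length0) throw new IndexOutOfRangeException();
        if (b < 0 || b >= length1) throw new IndexOutOfRangeException();
        return a * length1 + b;
    }
}
```

For a 3x4 array with lower bounds 5 and 10, `ArrayOffsetSketch.ArrayOffset(6, 12, 5, 3, 10, 4)` does two subtractions, two range checks, a multiply and an add, where the vector case needs a single range check.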

Answer by JeeBee

Array bounds checking?


The single-dimension array has a length member that you access directly - when compiled this is just a memory read.


The multidimensional array requires a GetLength(int dimension) method call that processes the argument to get the relevant length for that dimension. That doesn't compile down to a memory read, so you get a method call, etc.


In addition, GetLength(int dimension) will do a bounds check on the parameter.
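
A small sketch of the difference between `Length` and `GetLength`, using only members of `System.Array`. The class and method names (`LengthSketch`, `DimensionLengths`, `GetLengthThrows`) are made up for illustration.

```csharp
using System;

public static class LengthSketch
{
    // Returns the length of every dimension, read through GetLength.
    public static int[] DimensionLengths(Array a)
    {
        int[] lengths = new int[a.Rank];
        for (int dim = 0; dim < a.Rank; ++dim)
            lengths[dim] = a.GetLength(dim);
        return lengths;
    }

    // Returns true if GetLength rejects the requested dimension.
    public static bool GetLengthThrows(Array a, int dimension)
    {
        try
        {
            a.GetLength(dimension);
            return false;
        }
        catch (IndexOutOfRangeException)
        {
            return true;
        }
    }
}
```

For a `double[3, 4]`, `Length` is the total element count (12), while `GetLength(0)` and `GetLength(1)` give 3 and 4; asking for dimension 2 throws, which shows the argument check mentioned above.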

Answer by Cameron

Interestingly, I ran the code from above using VS2008, .NET 3.5 SP1, Win32 on a Vista box. In release/optimized builds the difference was barely measurable, while in debug/unoptimized builds the multi-dimensional arrays were much slower. (I ran the three tests twice to reduce JIT effects on the second set.)

  Here are my numbers: 
    sum took 00:00:04.3356535
    sum took 00:00:04.1957663
    sum took 00:00:04.5523050
    sum took 00:00:04.0183060
    sum took 00:00:04.1785843 
    sum took 00:00:04.4933085

Look at the second set of three numbers. The difference is not enough for me to code everything in single dimension arrays.


Although I haven't posted them, in Debug/unoptimized the multidimension vs. single/jagged does make a huge difference.


Full program:


using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;

namespace single_dimension_vs_multidimension
{
    class Program
    {


        public static double sum(double[] d, int l1) {    // assuming the array is rectangular 
            double sum = 0; 
            int l2 = d.Length / l1; 
            for (int i = 0; i < l1; ++i)   
                for (int j = 0; j < l2; ++j)   
                    sum += d[i * l2 + j];   
            return sum;
        }

        public static double sum(double[,] d)
        {
            double sum = 0;  
            int l1 = d.GetLength(0);
            int l2 = d.GetLength(1);   
            for (int i = 0; i < l1; ++i)    
                for (int j = 0; j < l2; ++j)   
                    sum += d[i, j]; 
            return sum;
        }
        public static double sum(double[][] d)
        {
            double sum = 0;   
            for (int i = 0; i < d.Length; ++i) 
                for (int j = 0; j < d[i].Length; ++j) 
                    sum += d[i][j];
            return sum;
        }
        public static void TestTime<T, TR>(Func<T, TR> action, T obj, int iterations) 
        { 
            Stopwatch stopwatch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; ++i)      
                action(obj);
            Console.WriteLine(action.Method.Name + " took " + stopwatch.Elapsed);
        }
        public static void TestTime<T1, T2, TR>(Func<T1, T2, TR> action, T1 obj1, T2 obj2, int iterations)
        {
            Stopwatch stopwatch = Stopwatch.StartNew(); 
            for (int i = 0; i < iterations; ++i)    
                action(obj1, obj2); 
            Console.WriteLine(action.Method.Name + " took " + stopwatch.Elapsed);
        }
        public static void Main() {   
            Random random = new Random(); 
            const int l1  = 1024, l2 = 1024; 
            double[ ] d1  = new double[l1 * l2]; 
            double[,] d2  = new double[l1 , l2];  
            double[][] d3 = new double[l1][];   
            for (int i = 0; i < l1; ++i)
            {
                d3[i] = new double[l2];   
                for (int j = 0; j < l2; ++j)  
                    d3[i][j] = d2[i, j] = d1[i * l2 + j] = random.NextDouble();
            }    
            const int iterations = 1000;
            TestTime<double[], int, double>(sum, d1, l1, iterations);
            TestTime<double[,], double>(sum, d2, iterations);

            TestTime<double[][], double>(sum, d3, iterations);
            TestTime<double[], int, double>(sum, d1, l1, iterations);
            TestTime<double[,], double>(sum, d2, iterations);
            TestTime<double[][], double>(sum, d3, iterations); 
        }

    }
}
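
Running each test twice is one way to keep JIT compilation out of the numbers. Another option, sketched below, is to make one untimed warm-up call before starting the stopwatch. `TimeWarm` is a hypothetical variant of `TestTime` written for this illustration; it is not part of the program above.

```csharp
using System;
using System.Diagnostics;

public static class WarmTimingSketch
{
    // Hypothetical variant of TestTime: invokes the action once before
    // timing so the first-call JIT cost is excluded, then returns the
    // elapsed time of the timed iterations instead of printing it.
    public static TimeSpan TimeWarm<T, TR>(Func<T, TR> action, T obj, int iterations)
    {
        action(obj); // warm-up call, not timed

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; ++i)
            action(obj);
        stopwatch.Stop();

        return stopwatch.Elapsed;
    }
}
```

A call such as `WarmTimingSketch.TimeWarm<double[,], double>(sum, d2, 1000)` would then time only the 1000 measured iterations. This is still a rough measurement; it does not control for GC pauses or other processes on the machine.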

Answer by Tamas Czinege

Because a multidimensional array is just syntactic sugar: it is really just a flat array with some index calculation magic. On the other hand, a jagged array is an array of arrays. With a two-dimensional array, accessing an element requires reading memory just once, while with a two-level jagged array, you need to read memory twice.

EDIT: Apparently the original poster mixed up "jagged arrays" with "multi-dimensional arrays", so my reasoning doesn't exactly stand. For the real reason, check Jon Skeet's heavy artillery answer above.

Answer by AnthonyWJones

Jagged arrays are arrays of class references (other arrays) up until the leaf array which may be an array of a primitive type. Hence memory allocated for each of the other arrays can be all over the place.


Whereas a multi-dimensional array has its memory allocated in one contiguous lump.
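
The layout difference can be seen with a short sketch. `LayoutSketch`, `Flatten` and `ToJagged` are illustrative names; `Buffer.BlockCopy` is the standard library call, which copies raw bytes and accepts arrays of primitive types.

```csharp
using System;

public static class LayoutSketch
{
    // Copies a double[,] into a new double[] with a single block copy.
    // This works because the elements of a multi-dimensional array are
    // stored contiguously, in row-major order.
    public static double[] Flatten(double[,] source)
    {
        double[] flat = new double[source.Length];
        Buffer.BlockCopy(source, 0, flat, 0, source.Length * sizeof(double));
        return flat;
    }

    // Builds a jagged array with the same contents. Each row is a
    // separately allocated array object, so rows need not be adjacent
    // in memory.
    public static double[][] ToJagged(double[,] source)
    {
        int rows = source.GetLength(0);
        int cols = source.GetLength(1);
        double[][] jagged = new double[rows][];
        for (int i = 0; i < rows; ++i)
        {
            jagged[i] = new double[cols];
            for (int j = 0; j < cols; ++j)
                jagged[i][j] = source[i, j];
        }
        return jagged;
    }
}
```

Flattening `{ { 1, 2, 3 }, { 4, 5, 6 } }` gives the six values in row order, whereas the jagged copy holds two distinct row objects that are reached through an extra reference.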

Answer by Autodidact

I think it has something to do with the fact that jagged arrays are actually arrays of arrays, hence there are two levels of indirection to get to the actual data.

Answer by Fredou

I'm with everyone else here.

I had a program with a three-dimensional array. When I moved the array to two dimensions, I saw a huge boost, and then I moved to a one-dimensional array.

In the end, I think I saw a performance boost of over 500% in execution time.

The only drawback was the added complexity of working out where everything was in the one-dimensional array, compared with the three-dimensional one.
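
One way to contain that complexity is to hide the index calculation behind an indexer. The following is a minimal sketch, not the code from the program described above; `Grid3D` and its members are hypothetical names.

```csharp
using System;

// Hypothetical wrapper: stores a 3D grid in one flat array and keeps the
// index arithmetic in a single place.
public sealed class Grid3D
{
    private readonly double[] data;
    private readonly int sizeY;
    private readonly int sizeZ;

    public Grid3D(int sizeX, int sizeY, int sizeZ)
    {
        this.sizeY = sizeY;
        this.sizeZ = sizeZ;
        data = new double[sizeX * sizeY * sizeZ];
    }

    // Row-major mapping from (x, y, z) to a position in the flat array.
    public int IndexOf(int x, int y, int z)
    {
        return (x * sizeY + y) * sizeZ + z;
    }

    public double this[int x, int y, int z]
    {
        get { return data[IndexOf(x, y, z)]; }
        set { data[IndexOf(x, y, z)] = value; }
    }
}
```

Callers can then write `grid[1, 2, 3]` as they would with a `double[,,]`. Note that only the combined index is range-checked by the runtime here, so an out-of-range `y` or `z` can silently address a different element.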

Answer by Michael Buen

I think multi-dimensional arrays are slower because the runtime has to perform two bounds checks, or more for arrays of three dimensions and up.

Answer by Damien_The_Unbeliever

Bounds checking. In your first example, the "j" variable could exceed l2 provided "i" was less than l1, and the access would still succeed. This would not be legal in the second example.
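
The difference can be demonstrated with a short sketch. `BoundsSketch`, `ReadFlat` and `TryRead2D` are illustrative names, and the data is a small 2x3 example rather than the arrays from the question.

```csharp
using System;

public static class BoundsSketch
{
    // Reads element (i, j) from a flat row-major array. The runtime only
    // range-checks the combined index i * columns + j.
    public static double ReadFlat(double[] d, int columns, int i, int j)
    {
        return d[i * columns + j];
    }

    // Reads d[i, j]; returns false if the runtime rejects either index.
    public static bool TryRead2D(double[,] d, int i, int j, out double value)
    {
        try
        {
            value = d[i, j];
            return true;
        }
        catch (IndexOutOfRangeException)
        {
            value = 0;
            return false;
        }
    }
}
```

With a 2x3 grid holding the values 0 to 5, `ReadFlat(flat, 3, 0, 4)` succeeds and quietly returns the element at row 1, column 1, while `TryRead2D(grid, 0, 4, out value)` fails because the column index is checked on its own.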