.net 数组与列表的性能

Question

提问by Boaz

Say you need to have a list/array of integers which you need iterate frequently, and I mean extremely often. The reasons may vary, but say it's in the heart of the inner most loop of a high volume processing.

假设您需要有一个需要经常迭代的整数列表/数组，我的意思是非常频繁。原因可能会有所不同，但可以说它位于高容量处理的最内层循环的核心。

In general, one would opt for using Lists (List) due to their flexibility in size. On top of that, msdn documentation claims Lists use an array internally and should perform just as fast (a quick look with Reflector confirms this). Neverless, there is some overhead involved.

一般来说，由于其大小的灵活性，人们会选择使用列表（List）。最重要的是，msdn 文档声称列表在内部使用数组并且应该执行同样快（使用 Reflector 快速查看确认了这一点）。无论如何，这涉及一些开销。

Did anyone actually measure this? would iterating 6M times through a list take the same time as an array would?

真的有人测量过吗？遍历一个列表 6M 次会花费与数组相同的时间吗？

Answer 1

回答by Marc Gravell

Very easy to measure...

非常容易测量...

In a small number of tight-loop processing code where I know the length is fixedI use arrays for that extra tiny bit of micro-optimisation; arrays can be marginallyfaster ifyou use the indexer / for form - but IIRC believe it depends on the type of data in the array. But unless you needto micro-optimise, keep it simple and use List<T>etc.

在我知道长度固定的少数紧循环处理代码中，我使用数组来进行额外的微优化；如果您使用索引器 / for form，数组可能会稍微快一点- 但 IIRC 认为这取决于数组中的数据类型。但是除非您需要进行微优化，否则请保持简单和使用等。List<T>

Of course, this only applies if you are reading all of the data; a dictionary would be quicker for key-based lookups.

当然，这仅适用于您正在阅读所有数据的情况；对于基于键的查找，字典会更快。

Here's my results using "int" (the second number is a checksum to verify they all did the same work):

这是我使用“int”的结果（第二个数字是校验和，以验证它们都做了相同的工作）：

(edited to fix bug)

（已编辑以修复错误）

List/for: 1971ms (589725196)
Array/for: 1864ms (589725196)
List/foreach: 3054ms (589725196)
Array/foreach: 1860ms (589725196)

based on the test rig:

基于试验台：

using System;
using System.Collections.Generic;
using System.Diagnostics;
static class Program
{
    static void Main()
    {
        List<int> list = new List<int>(6000000);
        Random rand = new Random(12345);
        for (int i = 0; i < 6000000; i++)
        {
            list.Add(rand.Next(5000));
        }
        int[] arr = list.ToArray();

        int chk = 0;
        Stopwatch watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            int len = list.Count;
            for (int i = 0; i < len; i++)
            {
                chk += list[i];
            }
        }
        watch.Stop();
        Console.WriteLine("List/for: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        chk = 0;
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            for (int i = 0; i < arr.Length; i++)
            {
                chk += arr[i];
            }
        }
        watch.Stop();
        Console.WriteLine("Array/for: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        chk = 0;
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            foreach (int i in list)
            {
                chk += i;
            }
        }
        watch.Stop();
        Console.WriteLine("List/foreach: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        chk = 0;
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            foreach (int i in arr)
            {
                chk += i;
            }
        }
        watch.Stop();
        Console.WriteLine("Array/foreach: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        Console.ReadLine();
    }
}

Answer 2

回答by Andrew

Summary:

概括：

Array need to use:
- So often as possible. It's fast and takes smallest RAM range for same amount information.
- If you know exact count of cells needed
- If data saved in array < 85000 b (85000/32 = 2656 elements for integer data)
- If needed high Random Access speed
List need to use:
- If needed to add cells to the end of list (often)
- If needed to add cells in the beginning/middle of the list (NOT OFTEN)
- If data saved in array < 85000 b (85000/32 = 2656 elements for integer data)
- If needed high Random Access speed
LinkedList need to use:
- If needed to add cells in the beginning/middle/end of the list (often)
- If needed only sequential access (forward/backward)
- If you need to save LARGE items, but items count is low.
- Better do not use for large amount of items, as it's use additional memory for links.

数组需要使用：
- 尽可能经常。对于相同数量的信息，它速度快并且占用最小的 RAM 范围。
- 如果您知道所需细胞的确切数量
- 如果数组中保存的数据 < 85000 b（对于整数数据，85000/32 = 2656 个元素）
- 如果需要高随机访问速度
列表需要使用：
- 如果需要将单元格添加到列表的末尾（通常）
- 如果需要在列表的开头/中间添加单元格（不经常）
- 如果数组中保存的数据 < 85000 b（对于整数数据，85000/32 = 2656 个元素）
- 如果需要高随机访问速度
LinkedList 需要使用：
- 如果需要在列表的开头/中间/结尾添加单元格（通常）
- 如果只需要顺序访问（向前/向后）
- 如果您需要保存大件物品，但物品数量很少。
- 最好不要用于大量项目，因为它会为链接使用额外的内存。

More details:

更多细节：

Much more details:

更多细节：

https://stackoverflow.com/a/29263914/4423545

Answer 3

回答by Frederik Gheysels

I think the performance will be quite similar. The overhead that is involved when using a List vs an Array is, IMHO when you add items to the list, and when the list has to increase the size of the array that it's using internally, when the capacity of the array is reached.

我认为性能将非常相似。使用列表与数组时涉及的开销是，恕我直言，当您向列表添加项目时，以及当列表必须增加它在内部使用的数组的大小时，当达到数组的容量时。

Suppose you have a List with a Capacity of 10, then the List will increase it's capacity once you want to add the 11th element. You can decrease the performance impact by initializing the Capacity of the list to the number of items it will hold.

假设您有一个容量为 10 的列表，那么一旦您想添加第 11 个元素，该列表就会增加它的容量。您可以通过将列表的容量初始化为它将容纳的项目数来降低性能影响。

But, in order to figure out if iterating over a List is as fast as iterating over an array, why don't you benchmark it ?

但是，为了确定迭代 List 是否与迭代数组一样快，为什么不对其进行基准测试？

int numberOfElements = 6000000;

List<int> theList = new List<int> (numberOfElements);
int[] theArray = new int[numberOfElements];

for( int i = 0; i < numberOfElements; i++ )
{
    theList.Add (i);
    theArray[i] = i;
}

Stopwatch chrono = new Stopwatch ();

chrono.Start ();

int j;

 for( int i = 0; i < numberOfElements; i++ )
 {
     j = theList[i];
 }

 chrono.Stop ();
 Console.WriteLine (String.Format("iterating the List took {0} msec", chrono.ElapsedMilliseconds));

 chrono.Reset();

 chrono.Start();

 for( int i = 0; i < numberOfElements; i++ )
 {
     j = theArray[i];
 }

 chrono.Stop ();
 Console.WriteLine (String.Format("iterating the array took {0} msec", chrono.ElapsedMilliseconds));

 Console.ReadLine();

On my system; iterating over the array took 33msec; iterating over the list took 66msec.

在我的系统上；遍历数组需要 33 毫秒；遍历列表需要 66 毫秒。

To be honest, I didn't expect that the variation would be that much. So, I've put my iteration in a loop: now, I execute both iteration 1000 times. The results are:

说实话，我没想到变化会这么大。所以，我把我的迭代放在一个循环中：现在，我执行这两个迭代 1000 次。结果是：

iterating the List took 67146 msec iterating the array took 40821 msec

迭代列表需要 67146 毫秒迭代数组需要 40821 毫秒

Now, the variation is not that large anymore, but still ...

现在，变化不再那么大了，但仍然......

Therefore, I've started up .NET Reflector, and the getter of the indexer of the List class, looks like this:

因此，我启动了 .NET Reflector，List 类索引器的 getter 如下所示：

public T get_Item(int index)
{
    if (index >= this._size)
    {
        ThrowHelper.ThrowArgumentOutOfRangeException();
    }
    return this._items[index];
}

As you can see, when you use the indexer of the List, the List performs a check whether you're not going out of the bounds of the internal array. This additional check comes with a cost.

如您所见，当您使用 List 的索引器时，List 会检查您是否没有超出内部数组的边界。这项额外的检查是有成本的。

Answer 4

回答by ShuggyCoUk

if you are just getting a single value out of either (not in a loop) then both do bounds checking (you're in managed code remember) it's just the list does it twice. See the notes later for why this is likely not a big deal.

如果您只是从其中一个（不在循环中）中获取单个值，那么两者都进行边界检查（记住您在托管代码中）它只是列表执行了两次。请参阅后面的注释，了解为什么这可能不是什么大问题。

If you are using your own for(int int i = 0; i < x.[Length/Count];i++) then the key difference is as follows:

如果您使用自己的 for(int int i = 0; i < x.[Length/Count];i++) 那么主要区别如下：

Array:
- bounds checking is removed
Lists
- bounds checking is performed

大批：
- 边界检查被删除
列表
- 执行边界检查

If you are using foreach then the key difference is as follows:

如果您使用 foreach，那么主要区别如下：

Array:
- no object is allocated to manage the iteration
- bounds checking is removed
List via a variable known to be List.
- the iteration management variable is stack allocated
- bounds checking is performed
List via a variable known to be IList.
- the iteration management variable is heap allocated
- bounds checking is performed also Lists values may not be altered during the foreach whereas the array's can be.

大批：
- 没有分配对象来管理迭代
- 边界检查被删除
通过已知为 List 的变量列出。
- 迭代管理变量是堆栈分配的
- 执行边界检查
通过已知为 IList 的变量列出。
- 迭代管理变量是堆分配的
- 还执行边界检查，在 foreach 期间不能更改列表值，而数组可以。

The bounds checking is often no big deal (especially if you are on a cpu with a deep pipeline and branch prediction - the norm for most these days) but only your own profiling can tell you if that is an issue. If you are in parts of your code where you are avoiding heap allocations (good examples are libraries or in hashcode implementations) then ensuring the variable is typed as List not IList will avoid that pitfall. As always profile if it matters.

边界检查通常没什么大不了的（特别是如果您使用的是具有深度管道和分支预测的 CPU——这是当今大多数情况下的规范），但只有您自己的分析才能告诉您这是否是一个问题。如果您在避免堆分配的部分代码中（很好的例子是库或哈希码实现），那么确保将变量键入为 List 而不是 IList 将避免该陷阱。如果重要的话，一如既往地配置文件。

Answer 5

回答by David Schmitt

[See also this question]

[另见这个问题]

I've modified Marc's answer to use actual random numbers and actually do the same work in all cases.

我已经修改了 Marc 的答案，以使用实际的随机数，并在所有情况下实际执行相同的工作。

Results:

结果：

         for      foreach
Array : 1575ms     1575ms (+0%)
List  : 1630ms     2627ms (+61%)
         (+3%)     (+67%)

(Checksum: -1000038876)

Compiled as Release under VS 2008 SP1. Running without debugging on a [email protected], .NET 3.5 SP1.

在 VS 2008 SP1 下编译为 Release。无需调试即可在 [email protected]、.NET 3.5 SP1 上运行。

Code:

代码：

class Program
{
    static void Main(string[] args)
    {
        List<int> list = new List<int>(6000000);
        Random rand = new Random(1);
        for (int i = 0; i < 6000000; i++)
        {
            list.Add(rand.Next());
        }
        int[] arr = list.ToArray();

        int chk = 0;
        Stopwatch watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            int len = list.Count;
            for (int i = 0; i < len; i++)
            {
                chk += list[i];
            }
        }
        watch.Stop();
        Console.WriteLine("List/for: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        chk = 0;
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            int len = arr.Length;
            for (int i = 0; i < len; i++)
            {
                chk += arr[i];
            }
        }
        watch.Stop();
        Console.WriteLine("Array/for: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        chk = 0;
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            foreach (int i in list)
            {
                chk += i;
            }
        }
        watch.Stop();
        Console.WriteLine("List/foreach: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        chk = 0;
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            foreach (int i in arr)
            {
                chk += i;
            }
        }
        watch.Stop();
        Console.WriteLine("Array/foreach: {0}ms ({1})", watch.ElapsedMilliseconds, chk);
        Console.WriteLine();

        Console.ReadLine();
    }
}

Answer 6

回答by Fatih GüRDAL

Do not attempt to add capacity by increasing the number of elements.

不要试图通过增加元素数量来增加容量。

Performance

表现

List For Add: 1ms
Array For Add: 2397ms

    Stopwatch watch;
        #region --> List For Add <--

        List<int> intList = new List<int>();
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 60000; rpt++)
        {
            intList.Add(rand.Next());
        }
        watch.Stop();
        Console.WriteLine("List For Add: {0}ms", watch.ElapsedMilliseconds);
        #endregion

        #region --> Array For Add <--

        int[] intArray = new int[0];
        watch = Stopwatch.StartNew();
        int sira = 0;
        for (int rpt = 0; rpt < 60000; rpt++)
        {
            sira += 1;
            Array.Resize(ref intArray, intArray.Length + 1);
            intArray[rpt] = rand.Next();

        }
        watch.Stop();
        Console.WriteLine("Array For Add: {0}ms", watch.ElapsedMilliseconds);

        #endregion

Answer 7

回答by Travis

Here's one that uses Dictionaries, IEnumerable:

这是使用字典 IEnumerable 的一个：

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static class Program
{
    static void Main()
    {
        List<int> list = new List<int>(6000000);

        for (int i = 0; i < 6000000; i++)
        {
                list.Add(i);
        }
        Console.WriteLine("Count: {0}", list.Count);

        int[] arr = list.ToArray();
        IEnumerable<int> Ienumerable = list.ToArray();
        Dictionary<int, bool> dict = list.ToDictionary(x => x, y => true);

        int chk = 0;
        Stopwatch watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            int len = list.Count;
            for (int i = 0; i < len; i++)
            {
                chk += list[i];
            }
        }
        watch.Stop();
        Console.WriteLine("List/for: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        chk = 0;
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            for (int i = 0; i < arr.Length; i++)
            {
                chk += arr[i];
            }
        }
        watch.Stop();
        Console.WriteLine("Array/for: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        chk = 0;
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            foreach (int i in Ienumerable)
            {
                chk += i;
            }
        }

        Console.WriteLine("Ienumerable/for: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        chk = 0;
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            foreach (int i in dict.Keys)
            {
                chk += i;
            }
        }

        Console.WriteLine("Dict/for: {0}ms ({1})", watch.ElapsedMilliseconds, chk);


        chk = 0;
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            foreach (int i in list)
            {
                chk += i;
            }
        }

        watch.Stop();
        Console.WriteLine("List/foreach: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        chk = 0;
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            foreach (int i in arr)
            {
                chk += i;
            }
        }
        watch.Stop();
        Console.WriteLine("Array/foreach: {0}ms ({1})", watch.ElapsedMilliseconds, chk);



        chk = 0;
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            foreach (int i in Ienumerable)
            {
                chk += i;
            }
        }
        watch.Stop();
        Console.WriteLine("Ienumerable/foreach: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        chk = 0;
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            foreach (int i in dict.Keys)
            {
                chk += i;
            }
        }
        watch.Stop();
        Console.WriteLine("Dict/foreach: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        Console.ReadLine();
    }
}

Answer 8

回答by Cygon

I was worried that the Benchmarks posted in other answers would still leave room for the compiler to optimize, eliminate or merge loops so I wrote one that:

我担心其他答案中发布的基准测试仍然会为编译器优化、消除或合并循环留下空间，所以我写了一个：

Used unpredictable inputs (random)
Runs a calculated with the result printed to the console
Modifies the input data each repetition

使用不可预测的输入（随机）
运行一个计算结果打印到控制台
每次重复修改输入数据

The result as that a direct array has about 250% better performance than an access to an array wrapped in an IList:

结果是，直接数组的性能比访问包含在 IList 中的数组高约 250%：

1 billion array accesses: 4000 ms
1 billion list accesses: 10000 ms
100 million array accesses: 350 ms
100 million list accesses: 1000 ms

10 亿次阵列访问：4000 毫秒
10 亿次列表访问：10000 毫秒
1 亿次数组访问：350 毫秒
1 亿次列表访问：1000 毫秒

Here's the code:

这是代码：

static void Main(string[] args) {
  const int TestPointCount = 1000000;
  const int RepetitionCount = 1000;

  Stopwatch arrayTimer = new Stopwatch();
  Stopwatch listTimer = new Stopwatch();

  Point2[] points = new Point2[TestPointCount];
  var random = new Random();
  for (int index = 0; index < TestPointCount; ++index) {
    points[index].X = random.NextDouble();
    points[index].Y = random.NextDouble();
  }

  for (int repetition = 0; repetition <= RepetitionCount; ++repetition) {
    if (repetition > 0) { // first repetition is for cache warmup
      arrayTimer.Start();
    }
    doWorkOnArray(points);
    if (repetition > 0) { // first repetition is for cache warmup
      arrayTimer.Stop();
    }

    if (repetition > 0) { // first repetition is for cache warmup
      listTimer.Start();
    }
    doWorkOnList(points);
    if (repetition > 0) { // first repetition is for cache warmup
      listTimer.Stop();
    }
  }

  Console.WriteLine("Ignore this: " + points[0].X + points[0].Y);
  Console.WriteLine(
    string.Format(
      "{0} accesses on array took {1} ms",
      RepetitionCount * TestPointCount, arrayTimer.ElapsedMilliseconds
    )
  );
  Console.WriteLine(
    string.Format(
      "{0} accesses on list took {1} ms",
      RepetitionCount * TestPointCount, listTimer.ElapsedMilliseconds
    )
  );

}

private static void doWorkOnArray(Point2[] points) {
  var random = new Random();

  int pointCount = points.Length;

  Point2 accumulated = Point2.Zero;
  for (int index = 0; index < pointCount; ++index) {
    accumulated.X += points[index].X;
    accumulated.Y += points[index].Y;
  }

  accumulated /= pointCount;

  // make use of the result somewhere so the optimizer can't eliminate the loop
  // also modify the input collection so the optimizer can merge the repetition loop
  points[random.Next(0, pointCount)] = accumulated;
}

private static void doWorkOnList(IList<Point2> points) {
  var random = new Random();

  int pointCount = points.Count;

  Point2 accumulated = Point2.Zero;
  for (int index = 0; index < pointCount; ++index) {
    accumulated.X += points[index].X;
    accumulated.Y += points[index].Y;
  }

  accumulated /= pointCount;

  // make use of the result somewhere so the optimizer can't eliminate the loop
  // also modify the input collection so the optimizer can merge the repetition loop
  points[random.Next(0, pointCount)] = accumulated;
}

Answer 9

回答by Stephan Eggermont

The measurements are nice, but you are going to get significantly different results depending on what you're doing exactly in your inner loop. Measure your own situation. If you're using multi-threading, that alone is a non-trivial activity.

测量结果很好，但是根据您在内部循环中所做的工作，您将获得显着不同的结果。衡量自己的情况。如果您正在使用多线程，那么这本身就是一项重要的活动。

Answer 10

回答by Frederik Gheysels

Indeed, if you perform some complex calculations inside the loop, then the performance of the array indexer versus the list indexer may be so marginally small, that eventually, it doesn't matter.

事实上，如果您在循环内执行一些复杂的计算，那么数组索引器与列表索引器的性能可能会非常小，以至于最终都无关紧要。

.net 数组与列表的性能

提问by Boaz

回答by Marc Gravell

回答by Andrew

回答by Frederik Gheysels

回答by ShuggyCoUk

回答by David Schmitt

回答by Fatih GüRDAL

回答by Travis

回答by Cygon

回答by Stephan Eggermont

回答by Frederik Gheysels

相关推荐

最近更新

标签

.net 数组与列表的性能

提问by Boaz

回答by Marc Gravell

回答by Andrew

回答by Frederik Gheysels

回答by ShuggyCoUk

回答by David Schmitt

回答by Fatih GüRDAL

回答by Travis

回答by Cygon

回答by Stephan Eggermont

回答by Frederik Gheysels

相关推荐

.net 我可以在 XPath 表达式中使用正则表达式吗？

.net 为什么 NotImplementedException 存在？

.net 向数据集添加行

.NET：当缺少所需的配置设置时要抛出哪个异常？

相关推荐

最近更新

标签