.net List<T>.Contains() 很慢?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/823860/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-03 12:37:51  来源:igfitidea点击:

List<T>.Contains() is very slow?

.netarraysgenericslist

提问by DSent

Could anyone explain me why the generics List.Contains()function is so slow?

谁能解释一下为什么泛型List.Contains()函数这么慢?

I have a List<long>with about a million numbers, and the code that is constantly checking if there's a specific number within these numbers.

我有一个List<long>大约一百万个数字,以及不断检查这些数字中是否有特定数字的代码。

I tried doing the same thing using Dictionary<long, byte>and the Dictionary.ContainsKey()function, and it was about 10-20 times faster than with the List.

我尝试使用Dictionary<long, byte>Dictionary.ContainsKey()函数做同样的事情,它比使用 List 快 10-20 倍。

Of course, I don't really want to use Dictionary for that purpose, because it wasn't meant to be used that way.

当然,我并不是真的想为此目的使用 Dictionary,因为它不应该那样使用。

So, the real question here is, is there any alternative to the List<T>.Contains(), but not as whacky as Dictionary<K,V>.ContainsKey()?

所以,这里真正的问题是,是否有任何替代List<T>.Contains(),但不像Dictionary<K,V>.ContainsKey()

回答by Marc Gravell

If you are just checking for existence, HashSet<T>in .NET 3.5 is your best option - dictionary-like performance, but no key/value pair - just the values:

如果您只是检查是否存在,HashSet<T>在 .NET 3.5 中是您的最佳选择 - 类似字典的性能,但没有键/值对 - 只是值:

    HashSet<int> data = new HashSet<int>();
    for (int i = 0; i < 1000000; i++)
    {
        data.Add(rand.Next(50000000));
    }
    bool contains = data.Contains(1234567); // etc

回答by Frederik Gheysels

List.Contains is a O(n) operation.

List.Contains 是一个 O(n) 操作。

Dictionary.ContainsKey is a O(1) operation, since it uses the hashcode of the objects as a key, which gives you a quicker search ability.

Dictionary.ContainsKey 是一个 O(1) 操作,因为它使用对象的哈希码作为键,这为您提供了更快的搜索能力。

I don't think that it 's a good idea to have a List which contains a million entries. I don't think that the List class was designed for that purpose. :)

我认为拥有一个包含一百万个条目的列表不是一个好主意。我认为 List 类不是为此目的而设计的。:)

Isn't it possible to save those millon entities into a RDBMS for instance, and perform queries on that database ?

例如,是否可以将这些数以百万计的实体保存到 RDBMS 中,然后对该数据库执行查询?

If it is not possible, then I would use a Dictionary anyway.

如果不可能,那么无论如何我都会使用字典。

回答by Kevin North

I think I have the answer! Yes, it's true that Contains() on a list (array) is O(n), but if the array is short and you are using value types, it still should be quite fast. But using the CLR Profiler [free download from Microsoft], I discovered that Contains() is boxing values in order to compare them, which requires heap allocation, which is VERY expensive (slow). [Note: This is .Net 2.0; other .Net versions not tested.]

我想我有答案!是的,列表(数组)上的 Contains() 确实是 O(n),但是如果数组很短并且您使用的是值类型,它仍然应该非常快。但是使用 CLR Profiler [从微软免费下载],我发现 Contains() 是装箱值以比较它们,这需要堆分配,这是非常昂贵的(慢)。[注:这是.Net 2.0;其他 .Net 版本未测试。]

Here's the full story and solution. We have an enumeration called "VI" and made a class called "ValueIdList" which is an abstract type for a list (array) of VI objects. The original implementation was in the ancient .Net 1.1 days, and it used an encapsulated ArrayList. We discovered recently in http://blogs.msdn.com/b/joshwil/archive/2004/04/13/112598.aspxthat a generic list (List<VI>) performs much better than ArrayList on value types (like our enum VI) because the values don't have to be boxed. It's true and it worked... almost.

这是完整的故事和解决方案。我们有一个名为“VI”的枚举,并创建了一个名为“ValueIdList”的类,它是 VI 对象列表(数组)的抽象类型。最初的实现是在古老的 .Net 1.1 时代,它使用了一个封装的 ArrayList。我们最近在http://blogs.msdn.com/b/joshwil/archive/2004/04/13/112598.aspx中发现,泛型列表 (List<VI>) 在值类型上的性能比 ArrayList 好得多(就像我们的enum VI) 因为这些值不必装箱。这是真的,它奏效了……几乎。

The CLR Profiler revealed a surprise. Here's a portion of the Allocation Graph:

CLR Profiler 透露了一个惊喜。这是分配图的一部分:

  • ValueIdList::Contains bool(VI) 5.5MB (34.81%)
  • Generic.List::Contains bool(<UNKNOWN>) 5.5MB (34.81%)
  • Generic.ObjectEqualityComparer<T>::Equals bool (<UNKNOWN> <UNKNOWN>) 5.5MB (34.88%)
  • Values.VI 7.7MB (49.03%)
  • ValueIdList::Contains bool(VI) 5.5MB (34.81%)
  • Generic.List::Contains bool(<UNKNOWN>) 5.5MB (34.81%)
  • Generic.ObjectEqualityComparer<T>::Equals bool (<UNKNOWN> <UNKNOWN>) 5.5MB (34.88%)
  • Values.VI 7.7MB (49.03%)

As you can see, Contains() surprisingly calls Generic.ObjectEqualityComparer.Equals(), which apparently requires the boxing of a VI value, which requires expensive heap allocation. It's odd that Microsoft would eliminate boxing on the list, only to require it again for a simple operation such as this.

如您所见,Contains() 出人意料地调用了 Generic.ObjectEqualityComparer.Equals(),这显然需要对 VI 值进行装箱,这需要昂贵的堆分配。奇怪的是,微软会在列表中消除拳击,只是为了像这样的简单操作再次要求它。

Our solution was to re-write the Contains() implementation, which in our case was easy to do since we were already encapsulating the generic list object (_items). Here's the simple code:

我们的解决方案是重写 Contains() 实现,在我们的例子中很容易做到,因为我们已经封装了通用列表对象 (_items)。这是简单的代码:

public bool Contains(VI id) 
{
  return IndexOf(id) >= 0;
}

public int IndexOf(VI id) 
{ 
  int i, count;

  count = _items.Count;
  for (i = 0; i < count; i++)
    if (_items[i] == id)
      return i;
  return -1;
}

public bool Remove(VI id) 
{
  int i;

  i = IndexOf(id);
  if (i < 0)
    return false;
  _items.RemoveAt(i);

  return true;
}

The comparison of VI values is now being done in our own version of IndexOf() which requires no boxing, and it's very fast. Our particular program sped up by 20% after this simple re-write. O(n)... no problem! Just avoid the wasted memory usage!

VI 值的比较现在在我们自己的 IndexOf() 版本中完成,它不需要装箱,而且速度非常快。在这个简单的重写之后,我们的特定程序加速了 20%。O(n)...没问题!只是避免浪费的内存使用!

回答by Stefan Steinegger

Dictionary isn't that bad, because the keys in a dictionary are designed to be found fast. To find a number in a list it needs to iterate through the whole list.

字典并没有那么糟糕,因为字典中的键旨在快速找到。要在列表中查找数字,它需要遍历整个列表。

Of course the dictionary only works if your numbers are unique and not ordered.

当然,字典仅在您的数字是唯一且未排序的情况下才有效。

I think there is also a HashSet<T>class in .NET 3.5, it also allows only unique elements.

我认为HashSet<T>.NET 3.5 中也有一个类,它也只允许唯一的元素。

回答by Mitch Wheat

A SortedListwill be faster to search (but slower to insert items)

一个排序列表将以更快的速度搜索(但速度较慢插入项目)

回答by user2023861

This is not exactly an answer to your question, but I have a class that increases the performance of Contains() on a collection. I subclassed a Queue and added a Dictionary that maps hashcodes to lists of objects. The Dictionary.Contains()function is O(1) whereas List.Contains(), Queue.Contains(), and Stack.Contains()are O(n).

这不完全是您问题的答案,但我有一个类可以提高 Contains() 对集合的性能。我子类化了一个队列并添加了一个将哈希码映射到对象列表的字典。该Dictionary.Contains()功能是O(1),而List.Contains()Queue.Contains()Stack.Contains()是为O(n)。

The value-type of the dictionary is a queue holding objects with the same hashcode. The caller can supply a custom class object that implements IEqualityComparer. You could use this pattern for Stacks or Lists. The code would need just a few changes.

字典的值类型是一个队列,其中包含具有相同哈希码的对象。调用者可以提供实现 IEqualityComparer 的自定义类对象。您可以将此模式用于堆栈或列表。代码只需要进行一些更改。

/// <summary>
/// This is a class that mimics a queue, except the Contains() operation is O(1) rather     than O(n) thanks to an internal dictionary.
/// The dictionary remembers the hashcodes of the items that have been enqueued and dequeued.
/// Hashcode collisions are stored in a queue to maintain FIFO order.
/// </summary>
/// <typeparam name="T"></typeparam>
private class HashQueue<T> : Queue<T>
{
    private readonly IEqualityComparer<T> _comp;
    public readonly Dictionary<int, Queue<T>> _hashes; //_hashes.Count doesn't always equal base.Count (due to collisions)

    public HashQueue(IEqualityComparer<T> comp = null) : base()
    {
        this._comp = comp;
        this._hashes = new Dictionary<int, Queue<T>>();
    }

    public HashQueue(int capacity, IEqualityComparer<T> comp = null) : base(capacity)
    {
        this._comp = comp;
        this._hashes = new Dictionary<int, Queue<T>>(capacity);
    }

    public HashQueue(IEnumerable<T> collection, IEqualityComparer<T> comp = null) :     base(collection)
    {
        this._comp = comp;

        this._hashes = new Dictionary<int, Queue<T>>(base.Count);
        foreach (var item in collection)
        {
            this.EnqueueDictionary(item);
        }
    }

    public new void Enqueue(T item)
    {
        base.Enqueue(item); //add to queue
        this.EnqueueDictionary(item);
    }

    private void EnqueueDictionary(T item)
    {
        int hash = this._comp == null ? item.GetHashCode() :     this._comp.GetHashCode(item);
        Queue<T> temp;
        if (!this._hashes.TryGetValue(hash, out temp))
        {
            temp = new Queue<T>();
            this._hashes.Add(hash, temp);
        }
        temp.Enqueue(item);
    }

    public new T Dequeue()
    {
        T result = base.Dequeue(); //remove from queue

        int hash = this._comp == null ? result.GetHashCode() : this._comp.GetHashCode(result);
        Queue<T> temp;
        if (this._hashes.TryGetValue(hash, out temp))
        {
            temp.Dequeue();
            if (temp.Count == 0)
                this._hashes.Remove(hash);
        }

        return result;
    }

    public new bool Contains(T item)
    { //This is O(1), whereas Queue.Contains is (n)
        int hash = this._comp == null ? item.GetHashCode() : this._comp.GetHashCode(item);
        return this._hashes.ContainsKey(hash);
    }

    public new void Clear()
    {
        foreach (var item in this._hashes.Values)
            item.Clear(); //clear collision lists

        this._hashes.Clear(); //clear dictionary

        base.Clear(); //clear queue
    }
}

My simple testing shows that my HashQueue.Contains()runs much faster than Queue.Contains(). Running the test code with count set to 10,000 takes 0.00045 seconds for the HashQueue version and 0.37 seconds for the Queue version. With a count of 100,000, the HashQueue version takes 0.0031 seconds whereas the Queue takes 36.38 seconds!

我的简单测试表明我的HashQueue.Contains()运行速度比Queue.Contains(). 在将 count 设置为 10,000 的情况下运行测试代码,HashQueue 版本需要 0.00045 秒,Queue 版本需要 0.37 秒。计数为 100,000 时,HashQueue 版本需要 0.0031 秒,而 Queue 版本需要 36.38 秒!

Here's my testing code:

这是我的测试代码:

static void Main(string[] args)
{
    int count = 10000;

    { //HashQueue
        var q = new HashQueue<int>(count);

        for (int i = 0; i < count; i++) //load queue (not timed)
            q.Enqueue(i);

        System.Diagnostics.Stopwatch sw = System.Diagnostics.Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            bool contains = q.Contains(i);
        }
        sw.Stop();
        Console.WriteLine(string.Format("HashQueue, {0}", sw.Elapsed));
    }

    { //Queue
        var q = new Queue<int>(count);

        for (int i = 0; i < count; i++) //load queue (not timed)
            q.Enqueue(i);

        System.Diagnostics.Stopwatch sw = System.Diagnostics.Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            bool contains = q.Contains(i);
        }
        sw.Stop();
        Console.WriteLine(string.Format("Queue,     {0}", sw.Elapsed));
    }

    Console.ReadLine();
}

回答by Andrew

Why is a dictionary inappropriate?

为什么字典不合适?

To see if a particular value is in the list you need to walk the entire list. With a dictionary (or other hash based container) it's much quicker to narrow down the number of objects you need to compare against. The key (in your case, the number) is hashed and that gives the dictionary the fractional subset of objects to compare against.

要查看特定值是否在列表中,您需要遍历整个列表。使用字典(或其他基于散列的容器)可以更快地缩小需要比较的对象数量。键(在你的情况下,数字)被散列,这为字典提供了要比较的对象的小数子集。

回答by Mark McGookin

I'm using this in the Compact Framework where there is no support for HashSet, I have opted for a Dictionary where both strings are the value I am looking for.

我在不支持 HashSet 的 Compact Framework 中使用它,我选择了一个字典,其中两个字符串都是我要查找的值。

It means I get list<> functionality with dictionary performance. It's a bit hacky, but it works.

这意味着我获得了具有字典性能的 list<> 功能。这有点hacky,但它有效。