C#: Which .NET collection provides the fastest search?

Question


I have 60k items that need to be checked against a 20k-item lookup list. Is there a collection object (like List or Hashtable) that provides an exceptionally fast Contains() method, or will I have to write my own? In other words, does the default Contains() method just scan each item, or does it use a better search algorithm?

foreach (Record item in LargeCollection)
{
    if (LookupCollection.Contains(item.Key))
    {
       // Do something
    }
}

Note: the lookup list is already sorted.

Answer 1

Accepted answer by Jimmy

In the most general case, consider System.Collections.Generic.HashSet<T> as your default "Contains" workhorse data structure, because Contains runs in constant time on average, no matter how many items the set holds.

The actual answer to "What is the fastest searchable collection?" depends on your specific data size, ordered-ness, cost of hashing, and search frequency.
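A minimal sketch of the HashSet approach applied to the loop in the question. The Record class, the string key type, and the FindMatches helper are assumptions for illustration; the question does not show the real types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record type; the question only shows that it has a Key.
public sealed class Record
{
    public string Key { get; }
    public Record(string key) { Key = key; }
}

public static class HashSetLookup
{
    // Returns the records whose key appears in lookupKeys.
    public static List<Record> FindMatches(IEnumerable<Record> largeCollection,
                                           IEnumerable<string> lookupKeys)
    {
        // Building the set is a one-time O(n) cost; each Contains call
        // afterwards is O(1) on average.
        var lookup = new HashSet<string>(lookupKeys, StringComparer.Ordinal);

        var matches = new List<Record>();
        foreach (Record item in largeCollection)
        {
            if (lookup.Contains(item.Key))
            {
                matches.Add(item);
            }
        }
        return matches;
    }
}
```

With 20k keys the set is built once, so the 60k checks cost roughly 60k hash lookups rather than 60k scans of the list.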

Answer 2

Answered by SLaks

If you don't need ordering, try HashSet<Record> (new in .NET 3.5).

If you do, use a List<Record> and call BinarySearch.
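A small sketch of the BinarySearch route, assuming string keys held in a List<string> that is already sorted in ordinal order. The ContainsSorted helper name is made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class SortedListLookup
{
    // Returns true if key is present. sortedKeys must already be sorted
    // with the same comparer that is passed to BinarySearch, otherwise
    // the result is meaningless.
    public static bool ContainsSorted(List<string> sortedKeys, string key)
    {
        // BinarySearch returns the index of a match, or a negative number
        // (the bitwise complement of the insertion point) when there is none.
        return sortedKeys.BinarySearch(key, StringComparer.Ordinal) >= 0;
    }
}
```

Each call is O(log n), so it needs no extra memory beyond the sorted list, at the price of being slower per lookup than a hash set on large inputs.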

Answer 3

Answered by Mark

Have you considered List<T>.BinarySearch(item)?

You said that your lookup list is already sorted, so this seems like the perfect opportunity. A hash would definitely be the fastest, but it brings its own problems and requires a lot more storage overhead.

Answer 4

Answered by Robert Horvick

If you aren't worried about squeezing out every last bit of performance, the suggestion to use a HashSet or binary search is solid. Your datasets just aren't large enough for this to be a problem 99% of the time.

But if this is just one of thousands of times you are going to do this, and performance is critical (and proven to be unacceptable using HashSet or binary search), you could certainly write your own algorithm that walks the two sorted lists, doing comparisons as it goes. Each list would be walked at most once, and even the pathological cases wouldn't be bad. Once you went this route, you'd probably find that the comparison itself, assuming the key is a string or other non-integral value, is the real expense, and optimizing that would be the next step.

Answer 5

Answered by Rich Schuler

If it's possible to sort your items, then there is a much faster way to do this than doing key lookups in a hashtable or B-tree. Though if your items aren't sortable, you can't really put them into a B-tree anyway.

Anyway, if the items are sortable, sort both lists; then it's just a matter of walking the lookup list in order.

For each lookup list item (in sorted order)
   While check list has items left and current check list item <= lookup list item
     If current check list item = lookup list item, do something
     Move to next check list item
   Move to next lookup list item

Answer 6

Answered by Brian

If you're using .NET 3.5, you can write cleaner code using:

foreach (Record item in LookupCollection.Intersect(LargeCollection))
{
  //dostuff
}

I don't have .NET 3.5 here, so this is untested. It relies on an extension method. Note that LookupCollection.Intersect(LargeCollection) is probably not the same as LargeCollection.Intersect(LookupCollection); the latter is probably much slower.

This assumes LookupCollection is a HashSet.
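As a sketch of what the Intersect call does when both sides are plain keys. The string key type and the CommonKeys helper are assumptions, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IntersectLookup
{
    // Returns the distinct keys that occur in both sequences.
    public static List<string> CommonKeys(IEnumerable<string> lookupKeys,
                                          IEnumerable<string> largeKeys)
    {
        // Enumerable.Intersect uses hashing internally and yields each
        // common element only once, in the order of the first sequence.
        return lookupKeys.Intersect(largeKeys, StringComparer.Ordinal).ToList();
    }
}
```

Because the result contains each key only once, this is not a drop-in replacement for the loop in the question when several records share a key and each one needs processing. If the lookup really is a HashSet<T>, its IntersectWith method is an in-place alternative, but it modifies the set.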

Answer 7

Answered by clemahieu

Keep both lists, x and y, in sorted order.

Compare the current items of the two lists: if x = y, do your action and advance x; if x < y, advance x; if y < x, advance y. Repeat until either list is exhausted.

The run time of this intersection is linear: it takes at most size(x) + size(y) comparisons.

Don't run a .Contains() loop over a plain list; its run time is proportional to size(x) * size(y), which is much worse.
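The walk over two sorted lists can be sketched as follows. This assumes string keys sorted in ordinal order; the ForEachMatch helper and its onMatch callback are hypothetical names introduced for the example.

```csharp
using System;
using System.Collections.Generic;

public static class SortedWalk
{
    // Calls onMatch for every element of x that also occurs in y.
    // Both lists must already be sorted in ordinal order.
    public static void ForEachMatch(IReadOnlyList<string> x,
                                    IReadOnlyList<string> y,
                                    Action<string> onMatch)
    {
        int i = 0, j = 0;
        while (i < x.Count && j < y.Count)
        {
            int cmp = string.CompareOrdinal(x[i], y[j]);
            if (cmp == 0)
            {
                onMatch(x[i]);
                i++;   // keep j in place so duplicate keys in x still match
            }
            else if (cmp < 0)
            {
                i++;   // x[i] is smaller than anything left in y
            }
            else
            {
                j++;   // y[j] is smaller than anything left in x
            }
        }
    }
}
```

Every iteration advances one of the two indexes, so the loop body runs at most x.Count + y.Count times.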

Answer 8

Answered by clemahieu

You should read this blog, which speed-tested several different types of collections and the lookup methods for each, using both single-threaded and multi-threaded techniques.

According to the results, BinarySearch on a List and on a SortedList were the top performers, consistently running neck and neck when looking something up as a "value".

When using a collection that allows for "keys", Dictionary, ConcurrentDictionary, HashSet, and Hashtable performed the best overall.
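For the keyed case, a lookup might look like the sketch below. It is only an illustration of a Dictionary lookup, not a reproduction of the benchmark; the string-to-int mapping and the helper names are assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class KeyedLookup
{
    // Builds a key -> value index; a later duplicate key overwrites the earlier entry.
    public static Dictionary<string, int> BuildIndex(
        IEnumerable<KeyValuePair<string, int>> entries)
    {
        var index = new Dictionary<string, int>(StringComparer.Ordinal);
        foreach (var entry in entries)
        {
            index[entry.Key] = entry.Value;
        }
        return index;
    }

    // Returns the value stored for key, or fallback when the key is absent.
    public static int GetOrDefault(Dictionary<string, int> index, string key, int fallback)
    {
        int value;
        // TryGetValue does the existence check and the read in one hash lookup.
        return index.TryGetValue(key, out value) ? value : fallback;
    }
}
```

Unlike a HashSet, the Dictionary hands back the associated value, which helps when a match needs data from the lookup side as well.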
