C# 优化查找:字典键查找与数组索引查找
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/908050/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Optimizing Lookups: Dictionary key lookups vs. Array index lookups
提问by snazzer
I'm writing a 7 card poker hand evaluator as one of my pet projects. While trying to optimize its speed (I like the challenge), I was shocked to find that the performance of Dictionary key lookups was quite slow compared to array index lookups.
我正在编写一个 7 张牌手牌评估器作为我的宠物项目之一。在尝试优化其速度时(我喜欢挑战),我震惊地发现字典键查找的性能与数组索引查找相比相当慢。
For example, I ran this sample code that enumerates over all 52 choose 7 = 133,784,560 possible 7 card hands:
例如,我运行了这个示例代码,它枚举了所有 52 个选择 7 = 133,784,560 种可能的 7 张牌:
var intDict = new Dictionary<int, int>();
var intList = new List<int>();
for (int i = 0; i < 100000; i ++)
{
intDict.Add(i, i);
intList.Add(i);
}
int result;
var sw = new Stopwatch();
sw.Start();
for (int card1 = 0; card1 < 46; card1++)
for (int card2 = card1 + 1; card2 < 47; card2++)
for (int card3 = card2 + 1; card3 < 48; card3++)
for (int card4 = card3 + 1; card4 < 49; card4++)
for (int card5 = card4 + 1; card5 < 50; card5++)
for (int card6 = card5 + 1; card6 < 51; card6++)
for (int card7 = card6 + 1; card7 < 52; card7++)
result = intDict[32131]; // perform C(52,7) dictionary key lookups
sw.Stop();
Console.WriteLine("time for dictionary lookups: {0} ms", sw.ElapsedMilliseconds);
sw.Reset();
sw.Start();
for (int card1 = 0; card1 < 46; card1++)
for (int card2 = card1 + 1; card2 < 47; card2++)
for (int card3 = card2 + 1; card3 < 48; card3++)
for (int card4 = card3 + 1; card4 < 49; card4++)
for (int card5 = card4 + 1; card5 < 50; card5++)
for (int card6 = card5 + 1; card6 < 51; card6++)
for (int card7 = card6 + 1; card7 < 52; card7++)
result = intList[32131]; // perform C(52,7) array index lookups
sw.Stop();
Console.WriteLine("time for array index lookups: {0} ms", sw.ElapsedMilliseconds);
which outputs:
输出:
time for dictionary lookups: 2532 ms
time for array index lookups: 313 ms
Is this type of behavior expected (performance decrease by a factor of 8)? IIRC, a Dictionary has, on average, O(1) lookups, while an array has worst-case O(1) lookups, so I do expect the array lookups to be faster, but not by this much!
这种行为是否符合预期(性能下降 8 倍)?IIRC,字典平均有 O(1) 次查找,而数组有最坏情况 O(1) 次查找,所以我确实希望数组查找更快,但不会快这么多!
I am currently storing poker hand rankings in a Dictionary. I suppose if this is as fast as the dictionary lookups can be, I have to rethink my approach and use arrays instead, although indexing the rankings will get a little tricky and I'll probably have to ask another question about it.
我目前正在字典中存储扑克手牌排名。我想如果这与字典查找一样快,我必须重新考虑我的方法并使用数组,尽管索引排名会变得有点棘手,我可能不得不问另一个问题。
采纳答案by Jon Skeet
Don't forget that Big-O notations only says how the complexity grows with respect to the size (etc) - it doesn't give any indication of the constant factors involved. That's why sometimes even a linear searchfor keys is faster than a dictionary lookup, when there are sufficiently few keys. In this case you're not even doing a search with the array though - just a straight indexing operation.
不要忘记,Big-O 符号只说明了复杂性如何随大小(等)增长 - 它没有给出任何涉及的常数因素的指示。这就是为什么当键足够少时,有时甚至键的线性搜索也比字典查找快。在这种情况下,您甚至没有对数组进行搜索 - 只是一个直接的索引操作。
For straight index lookups, arrays are basically ideal - it's just a case of
对于直接索引查找,数组基本上是理想的 - 这只是一个例子
pointer_into_array = base_pointer + offset * size
(And then a pointer dereference.)
(然后是指针取消引用。)
Performing a dictionary lookup is relatively complicated - very fast compared with (say) a linear lookup by key when there are lots of keys, but much more complicated than a straight array lookup. It has to calculate the hash of the key, then work out which bucket that should be in, possibly deal with duplicate hashes (or duplicate buckets) and then check for equality.
执行字典查找相对复杂 - 与(例如)在有很多键时按键进行线性查找相比非常快,但比直接数组查找要复杂得多。它必须计算密钥的散列,然后确定应该在哪个桶中,可能会处理重复的散列(或重复的桶),然后检查是否相等。
As always, choose the right data structure for the job - and if you really can get away with just indexing into an array (or List<T>
) then yes, that will be blindingly fast.
与往常一样,为工作选择正确的数据结构 - 如果您真的可以通过索引到数组(或List<T>
)中逃脱,那么是的,这将非常快。
回答by 1800 INFORMATION
An array lookup is about the fastest thing you can do - essentially all it is is a single bit of pointer arithmetic to go from the start of the array to the element you wanted to find. On the other hand, the dictionary lookup is likely to be somewhat slower since it needs to do hashing and concern itself with finding the correct bucket. Although the expected runtime is also O(1) - the algorithmic constants are greater so it will be slower.
数组查找是你能做的最快的事情——基本上它只是一个指针算法,从数组的开头到你想要查找的元素。另一方面,字典查找可能会稍微慢一些,因为它需要进行散列并关注自己找到正确的存储桶。虽然预期的运行时间也是 O(1) - 算法常数更大,所以它会更慢。
回答by ebo
Welcome to Big-O notation. You always have to consider that there is a constant factor involved.
欢迎使用 Big-O 符号。您必须始终考虑其中涉及一个恒定因素。
Doing one Dict-Lookup is of course much more expensive than an array lookup.
做一个 Dict-Lookup 当然比数组查找要昂贵得多。
Big-O only tells you how algorithms scale. Double the amount of lookups and see how the numbers change: Both should take around the twice time.
Big-O 只告诉你算法是如何扩展的。将查找次数加倍,看看数字如何变化:两者都应该花费大约两倍的时间。
回答by ChrisW
Is this type of behavior expected (performance decrease by a factor of 8)?
这种行为是否符合预期(性能下降 8 倍)?
Why not? Each array lookup is almost intantaneous/negligeable, whereas a dictionary lookup may need at least an extra subroutine call.
为什么不?每个数组查找几乎是瞬时的/可以忽略不计的,而字典查找可能至少需要一个额外的子程序调用。
The point of their both being O(1) means that even if you have 50 times more items in each collection, the performance decrease is still only a factor of whatever it is (8).
它们都是 O(1) 的点意味着即使每个集合中有 50 倍的项目,性能下降仍然只是它的一个因素(8)。
回答by gbjbaanb
The cost of retrieving an element from a Dictionary is O(1), but that's because a dictionary is implemented as a hashtable - so you have to first calculate the hash value to know which element to return. Hashtables are often not that efficient - but they are good for large datasets, or datasets that have a lot of unique-hash values.
从字典中检索元素的成本是 O(1),但这是因为字典是作为哈希表实现的 - 因此您必须首先计算哈希值才能知道要返回哪个元素。哈希表通常效率不高——但它们适用于大型数据集或具有大量唯一哈希值的数据集。
The List (apart from being a rubbish word used to dercribe an array rather than a linked list!) will be faster as it will return the value by directly calculating the element you want returned.
List(除了是一个用来描述数组而不是链表的垃圾词!)会更快,因为它会通过直接计算你想要返回的元素来返回值。
回答by Mike Dunlavey
Something could take a millenium, and still be O(1).
有些事情可能需要一千年,而且仍然是 O(1)。
If you single-step through this code in the disassembly window, you will quickly come to understand what the difference is.
如果您在反汇编窗口中单步执行此代码,您将很快了解其中的区别。
回答by LBushkin
Dictionary structures are most useful when the key space is very large and cannot be mapped into a stable, sequenced order. If you can convert your keys into a simple integer in a relatively small range, you will be hard-pressed to find a data structure that will perform better than an array.
当键空间非常大且无法映射为稳定的有序顺序时,字典结构最有用。如果您可以将键转换为相对较小范围内的简单整数,您将很难找到比数组性能更好的数据结构。
On an implementation note; in .NET, dictionaries are essentially hashables. You can somewhat improve their key-lookup performance by ensuring that your keys hash into a large space of unique values. It looks like in your case, you are using a simple integer as a key (which I believe hashes to its own value) - so that may be the best you can do.
在实施说明上;在 .NET 中,字典本质上是可哈希的。您可以通过确保您的键散列到一个大的唯一值空间来稍微提高它们的键查找性能。看起来在您的情况下,您使用的是一个简单的整数作为键(我相信它会散列到它自己的值) - 所以这可能是你能做的最好的。