java: Choosing a suitable table size for a hash

Notice: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/286058/

Time: 2020-10-29 11:46:25  Source: igfitidea

Choosing a suitable table size for a Hash

Tags: java, hash, hashtable

Asked by kylex

If I have a key set of 1000, what is a suitable size for my Hash table, and how is that determined?

Accepted answer by Bill the Lizard

It depends on the load factor (the "percent full" point where the table will increase its size and redistribute its elements). If you know you have exactly 1000 entries, and that number will never change, you can just set the load factor to 1.0 and the initial size to 1000 for maximum efficiency. If you weren't sure of the exact size, you could leave the load factor at its default of 0.75 and set your initial size to 1334 (expected size / load factor, rounded up) for really good performance, at a cost of extra memory.

You can use the following constructor to set the load factor:

Hashtable(int initialCapacity, float loadFactor) 
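
As a rough illustration of the arithmetic above (not code from the answer), the sketch below derives an initial capacity from an expected entry count and a load factor, then passes both to that constructor. The class name `HashtableSizing` and the helper `initialCapacityFor` are hypothetical names made up for this example.

```java
import java.util.Hashtable;

public class HashtableSizing {
    // Hypothetical helper: expected size divided by load factor, rounded up,
    // so that the expected number of entries fits without a rehash.
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    public static void main(String[] args) {
        int expected = 1000;

        // Exactly 1000 entries that never change: load factor 1.0, capacity 1000.
        Hashtable<String, Integer> fixed =
                new Hashtable<>(initialCapacityFor(expected, 1.0f), 1.0f);

        // Size not known exactly: default load factor 0.75, capacity 1334.
        Hashtable<String, Integer> roomy =
                new Hashtable<>(initialCapacityFor(expected, 0.75f), 0.75f);

        for (int i = 0; i < expected; i++) {
            fixed.put("key" + i, i);
            roomy.put("key" + i, i);
        }
        System.out.println(fixed.size() + " entries in each table");
    }
}
```

Rounding up matters here: 1000 / 0.75 is about 1333.3, and truncating to 1333 would leave the resize threshold just short of 1000 entries.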

Answer by EvilTeach

You need to factor in the hash function as well.

One rule of thumb suggests making the table about double the number of keys, so that there is room to expand, and hopefully the number of collisions stays small.

Another rule of thumb is to assume that you are doing some sort of modulo-related hashing, then round your table size up to the next prime number, and use that prime number as the modulo value.
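
A minimal sketch of that prime-rounding rule, assuming a hand-rolled table where the bucket index is the key's hash code modulo the table size. The names `PrimeTableSize`, `nextPrime`, and `bucketFor` are hypothetical, and plain trial division is used only because the numbers involved are small.

```java
public class PrimeTableSize {
    // Trial division is plenty for table sizes in the low thousands.
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // Smallest prime that is greater than or equal to n.
    static int nextPrime(int n) {
        int candidate = Math.max(n, 2);
        while (!isPrime(candidate)) {
            candidate++;
        }
        return candidate;
    }

    // Bucket index for a key; floorMod keeps the result non-negative
    // even when hashCode() is negative.
    static int bucketFor(Object key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        int keys = 1000;
        int tableSize = nextPrime(2 * keys); // double the key count, then round up
        System.out.println("table size: " + tableSize);
        System.out.println("bucket for \"example\": " + bucketFor("example", tableSize));
    }
}
```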

What kind of things are you hashing? More detail should generate better advice.

Answer by ReneS

Let it grow. With this size, the automatic handling is fine. Other than that, 2 x size + 1 is a simple formula. Prime numbers are also kind of good, but as soon as your data set reaches a certain size, the hash implementation might decide to rehash and grow the table.
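
To get a feel for what "let it grow" costs, here is a simplified model rather than the real implementation. It assumes a table that starts at `java.util.Hashtable`'s documented defaults (capacity 11, load factor 0.75) and, as an assumption about the implementation, grows to 2 x capacity + 1 whenever the resize threshold is too small. The names `GrowthSketch`, `growOnce`, `capacityAfterGrowth`, and `rehashCount` are hypothetical.

```java
public class GrowthSketch {
    // The 2 x size + 1 formula from the answer.
    static int growOnce(int capacity) {
        return 2 * capacity + 1;
    }

    // Capacity reached once the threshold (capacity * loadFactor, truncated)
    // is large enough to hold the given number of entries.
    static int capacityAfterGrowth(int initialCapacity, float loadFactor, int entries) {
        int capacity = initialCapacity;
        while ((int) (capacity * loadFactor) < entries) {
            capacity = growOnce(capacity);
        }
        return capacity;
    }

    // Number of growth steps (rehashes) taken along the way in this model.
    static int rehashCount(int initialCapacity, float loadFactor, int entries) {
        int capacity = initialCapacity;
        int rehashes = 0;
        while ((int) (capacity * loadFactor) < entries) {
            capacity = growOnce(capacity);
            rehashes++;
        }
        return rehashes;
    }

    public static void main(String[] args) {
        int entries = 1000;
        System.out.println("final capacity from defaults: "
                + capacityAfterGrowth(11, 0.75f, entries));
        System.out.println("rehashes from defaults: "
                + rehashCount(11, 0.75f, entries));
        System.out.println("2 x size + 1 for 1000 keys: " + growOnce(entries));
    }
}
```

In this model a table started at the defaults needs only a handful of growth steps to hold 1000 entries, which fits the point that automatic handling is fine at this size.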

Your keys are driving the effectiveness and are hopefully distinct enough.

Bottom line: Ask the size question when you have problems such as size or slow performance, other than that: Do not worry!

Answer by sblundy

There's some discussion of these factors in the documentation for Hashtable.

Answer by fulmicoton

Twice is good.

You don't have a big keyset. Don't bother about difficult discussions about your HashTable implementation, and go for 2000.

Answer by Terry Lacy

I'd like to reiterate what https://stackoverflow.com/users/33229/wwwflickrcomphotosrene-germany said above. 1000 doesn't seem like a very big hash to me. I've been using a lot of hashtables about that size in Java without seeing much in the way of performance problems. And I hardly ever muck about with the size or load factor.

If you've run a profiler on your code and determined that the hashtable is your problem, then by all means start tweaking. Otherwise, I wouldn't assume you've got a problem until you're sure.

After all, in most code, the performance problem isn't where you think it is. I try not to anticipate.
