Java: What is a fast alternative to HashMap for mapping to primitive types?
Original source: http://stackoverflow.com/questions/9967495/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Martinos
First of all, let me tell you that I have read the question that was asked before, Java HashMap performance optimization / alternative, and I have a similar question.
What I want to do is take a LOT of dependencies from New York Times text, which will be processed by the Stanford parser to give dependencies, and store the dependencies in a HashMap along with their scores, i.e. if I see a dependency twice I will increment its score in the HashMap by 1.
The task starts off really quickly, about 10 sentences a second, but slows down fast. At 30,000 sentences (assuming 10 words in each sentence and about 3-4 dependencies for each word that I'm storing), that is about 300,000 entries in my HashMap.
How will I be able to increase the performance of my HashMap? What kind of hash key can I use?
Thanks a lot, Martinos
EDIT 1:
OK guys, maybe I phrased my question wrongly. The byte arrays are not used in MY project, but in the similar question of another person above. I don't know what they are using them for, hence why I asked.
Secondly: I will not post code, as I consider it would make things very hard to understand, but here is a sample:
With the sentence "i am going to bed" I have the dependencies: (i, am, -1) (i, going, -2) (i, to, -3) (am, going, -1) . . . (to, bed, -1). These dependencies of all sentences (1,000,000 sentences) will be stored in a HashMap. If I see a dependency twice, I will get the score of the existing dependency and add 1.
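For concreteness, one way such a dependency triple could be modeled as a HashMap key is a small immutable class with a proper equals()/hashCode() pair. This is only a sketch; the Dependency class and its field names are illustrative, not taken from the question's actual code:

    import java.util.Objects;

    // Hypothetical key type for one (word, word, offset) dependency triple.
    public final class Dependency {
        private final String governor;
        private final String dependent;
        private final int offset;

        public Dependency(String governor, String dependent, int offset) {
            this.governor = governor;
            this.dependent = dependent;
            this.offset = offset;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Dependency)) return false;
            Dependency d = (Dependency) o;
            return offset == d.offset
                    && governor.equals(d.governor)
                    && dependent.equals(d.dependent);
        }

        @Override
        public int hashCode() {
            // If profiling shows hashing is hot, cache this in a field.
            return Objects.hash(governor, dependent, offset);
        }
    }

A stable, well-distributed hashCode like this is what keeps bucket collisions, and therefore lookup cost, low as the map grows.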
And that is pretty much it. All is well, but the rate of adding sentences to the HashMap (or retrieving them) scales down on this line: dependancyBank.put(newDependancy, dependancyBank.get(newDependancy) + 1); Can anyone tell me why? Regards, Martinos
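Two things about that line are worth noting, assuming dependancyBank is a Map<Dependency, Integer>: the get()/put() pair hashes and locates the key twice per increment, and each increment boxes a new Integer. On Java 8+, a single merge() call does the same work in one lookup. A sketch using the hypothetical Dependency type from above:

    import java.util.HashMap;
    import java.util.Map;

    public class DependencyCount {
        public static void main(String[] args) {
            Map<Dependency, Integer> dependancyBank = new HashMap<>();
            Dependency newDependancy = new Dependency("i", "am", -1);

            // One hash lookup instead of a get() followed by a put(), and
            // no null check needed for first-time keys: insert 1, else add 1.
            dependancyBank.merge(newDependancy, 1, Integer::sum);
            dependancyBank.merge(newDependancy, 1, Integer::sum);

            System.out.println(dependancyBank.get(newDependancy)); // prints 2
        }
    }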
Answered by Michał Kosmulski
Trove has optimized hashmaps for the case where the key or value is of a primitive type.
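For example, Trove's TObjectIntHashMap keeps its int values unboxed, and adjustOrPutValue() performs the increment-or-insert in a single call. A sketch, assuming Trove 3 on the classpath and the hypothetical Dependency key type from the question's sample:

    import gnu.trove.map.hash.TObjectIntHashMap;

    public class TroveBank {
        private final TObjectIntHashMap<Dependency> bank = new TObjectIntHashMap<>();

        public void count(Dependency d) {
            // Adds 1 if the key is present, otherwise inserts it with value 1.
            // Values are plain ints, so no Integer boxing on either path.
            bank.adjustOrPutValue(d, 1, 1);
        }

        public int score(Dependency d) {
            return bank.get(d); // unboxed int; 0 for absent keys by default
        }
    }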
However, much will still depend on smart choice of structure and hash code for your keys.
This part of your question is unclear: "The task starts off really quickly, about 10 sentences a second but scales off quickly. At 30 000 sentences (which is assuming 10 words in each sentence and about 3-4 dependences for each word which im storing) is about 300 000 entries in my hashmap." But you don't say what the performance is for the larger data. Your map grows, which is kind of obvious. Hashmaps are O(1) only in theory; in practice you will see some performance changes with size, due to less cache locality and due to occasional jumps caused by rehashing. So put() and get() times will not be constant, but they should still be close to that. Perhaps you are using the hashmap in a way which doesn't guarantee fast access, e.g. by iterating over it? In that case your time will grow linearly with size and you can't change that unless you change your algorithm.
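As an illustration of that last point (a hypothetical misuse, not something the question shows), the fragment below contrasts scanning the entry set, which costs O(n) per lookup, with get(), which stays near O(1):

    // O(n) per lookup – the pattern to avoid:
    Integer score = null;
    for (Map.Entry<Dependency, Integer> e : dependancyBank.entrySet()) {
        if (e.getKey().equals(newDependancy)) {
            score = e.getValue();
            break;
        }
    }

    // O(1) expected – get() jumps straight to the right bucket:
    Integer fast = dependancyBank.get(newDependancy);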
Answered by bmargulies
Google 'fastutil' and you will find a superior solution for mapping object keys to scores.
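fastutil's equivalent is Object2IntOpenHashMap, whose addTo() bumps a primitive int counter in place. A sketch, assuming fastutil on the classpath and the hypothetical Dependency key type from the question's sample:

    import it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap;

    public class FastutilBank {
        public static void main(String[] args) {
            Object2IntOpenHashMap<Dependency> bank = new Object2IntOpenHashMap<>();
            bank.defaultReturnValue(0); // score reported for unseen keys

            Dependency d = new Dependency("to", "bed", -1);
            bank.addTo(d, 1); // increment-or-insert, no Integer boxing
            bank.addTo(d, 1);

            System.out.println(bank.getInt(d)); // prints 2
        }
    }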
Answered by Rick Mangi
Take a look at the Guava multimaps: http://www.coffee-bytes.com/2011/12/22/guava-multimaps They are designed to keep a list of things that all map to the same key. That might solve your need.
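Note that a multimap keeps every value stored under a key rather than a count, so it fits "all dependencies seen for this word" rather than the scoring itself. A sketch, assuming Guava on the classpath and the hypothetical Dependency type from the question's sample:

    import com.google.common.collect.HashMultimap;
    import com.google.common.collect.Multimap;

    public class WordDeps {
        public static void main(String[] args) {
            Multimap<String, Dependency> byWord = HashMultimap.create();
            byWord.put("i", new Dependency("i", "am", -1));
            byWord.put("i", new Dependency("i", "going", -2));

            // All dependencies whose first word is "i":
            System.out.println(byWord.get("i"));
        }
    }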
Answered by Peter Lawrey
How will I be able to increase the performance of my HashMap?
If it's taking more than 1 microsecond per get() or put(), you have a bug IMHO. You need to determine why it's taking as long as it is. Even in the worst case, where every object has the same hashCode, you won't have performance this bad.
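A quick way to sanity-check that number against your own map is to time a batch of increments with System.nanoTime(). This is a rough sketch, not a rigorous benchmark (no warmup, no JMH), and it reuses the hypothetical Dependency type from the question's sample:

    import java.util.HashMap;
    import java.util.Map;

    public class MapTiming {
        public static void main(String[] args) {
            Map<Dependency, Integer> bank = new HashMap<>();
            int ops = 1_000_000;

            long start = System.nanoTime();
            for (int i = 0; i < ops; i++) {
                // ~100k distinct keys, so most operations hit an existing entry
                Dependency d = new Dependency("w" + (i % 100_000), "v", -1);
                bank.merge(d, 1, Integer::sum);
            }
            long perOp = (System.nanoTime() - start) / ops;
            System.out.println(perOp + " ns per increment");
        }
    }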
What kind of hash key can I use?
That depends on the data type of the key. What is it?
And finally, what are byte[] a = new byte[2]; byte[] b = new byte[3]; in the question that was posted above?
They are arrays of bytes. They can be used as values to look up, but it's likely that you need a different value type.
Answered by Sanket Sarang
A HashMap has an overloaded constructor which takes an initial capacity as input. The slowdown you see is because of rehashing, during which the HashMap is virtually unusable. To prevent frequent rehashing you need to start with a HashMap of greater initial capacity. You can also set a load factor, which indicates what fraction of the hash table may fill up before rehashing.
    public HashMap(int initialCapacity)
Pass the initial capacity to the HashMap during object construction. It is preferable to set the capacity to almost twice the number of elements you expect to add to the map during the course of your program's execution.
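A sketch of pre-sizing for the numbers in the question (roughly 1,000,000 sentences at a few dependencies per word; the 3,500,000 figure is only an illustrative estimate, and the Dependency key type is the hypothetical one from the question's sample):

    import java.util.HashMap;
    import java.util.Map;

    public class PresizedBank {
        public static void main(String[] args) {
            int expectedEntries = 3_500_000;
            // With the default load factor of 0.75, the map rehashes once
            // size exceeds capacity * 0.75, so request a capacity above
            // expectedEntries / 0.75 to avoid any rehash at all.
            Map<Dependency, Integer> bank =
                    new HashMap<>((int) (expectedEntries / 0.75f) + 1, 0.75f);
            System.out.println("map ready for ~" + expectedEntries + " entries");
        }
    }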