Which data structure would you use: TreeMap or HashMap? (Java)
Source: http://stackoverflow.com/questions/302371/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by JohnZaj
Description | A Java program to read a text file and print each of the unique words in alphabetical order, together with the number of times the word occurs in the text.
The program should declare a variable of type Map<String, Integer> to store the words and corresponding frequency of occurrence. Which concrete type, though? TreeMap<String, Number> or HashMap<String, Number>?
The input should be converted to lower case.
A word does not contain any of these characters: \t\t\n]f.,!?:;\"()'
Example output |
Word Frequency
a 1
and 5
appearances 1
as 1
.
.
.
Remark | I know, I've seen elegant solutions to this in Perl with roughly two lines of code. However, I want to see it in Java.
Edit: Oh yeah, it would be helpful to show an implementation using one of these structures (in Java).
Accepted answer by Jon Skeet
TreeMap seems a no-brainer to me, simply because of the "in alphabetical order" requirement. HashMap has no ordering when you iterate through it; TreeMap iterates in the natural key order.
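To make the ordering difference concrete, here is a minimal sketch that counts the same words into both map types. The class name IterationOrderDemo, the helper count, and the sample words are made up for illustration; they are not from the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class IterationOrderDemo {
    // Counts words into whichever Map implementation the caller supplies.
    static Map<String, Integer> count(String[] words, Map<String, Integer> counts) {
        for (String w : words) {
            counts.merge(w, 1, Integer::sum); // store 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"and", "as", "a", "and", "appearances", "and"};
        // TreeMap iterates in natural (alphabetical) key order.
        System.out.println(count(words, new TreeMap<>()));
        // HashMap iteration order is unspecified, so it must not be relied on.
        System.out.println(count(words, new HashMap<>()));
    }
}
```

Only the TreeMap line is guaranteed to print the keys alphabetically; whatever order the HashMap line happens to print is an implementation detail.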
EDIT: I think Konrad's comment may have been suggesting "use HashMap, then sort." This is good because although we'll have N iterations initially, we'll have K <= N keys by the end due to duplicates. We might as well save the expensive bit (sorting) until the end, when we've got fewer keys, rather than take the small-but-non-constant hit of keeping it sorted as we go.
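The "use HashMap, then sort" idea could look roughly like the sketch below. The names CountThenSort and countThenSort are hypothetical, and copying into a TreeMap is just one of several ways to order the K keys once at the end.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class CountThenSort {
    static Map<String, Integer> countThenSort(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {               // N iterations, one per input word
            counts.merge(w, 1, Integer::sum);
        }
        // Only the K <= N distinct keys are put into sorted order, once, at the end.
        return new TreeMap<>(counts);
    }

    public static void main(String[] args) {
        String[] words = {"b", "a", "b", "c", "a", "b"};
        System.out.println(countThenSort(words));
    }
}
```

Whether this actually beats counting straight into a TreeMap depends on the input, so it is worth measuring before committing to it.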
Having said that, I'm sticking to my answer for the moment, because it's the simplest way of achieving the goal. We don't really know that the OP is particularly worried about performance, but the question implies that he's concerned about elegance and brevity. Using a TreeMap makes this incredibly brief, which appeals to me. I suspect that if performance is really an issue, there may be a better way of attacking it than either TreeMap or HashMap :)
Answered by JodaStephen
TreeMap beats HashMap because TreeMap is already sorted for you.
However, you might want to consider using a more appropriate data structure, a bag. See Commons Collections and the TreeBag class:
This has a nice optimised internal structure and API:
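// Assumes Apache Commons Collections 4 (org.apache.commons.collections4) is on the classpath.
Bag<String> bag = new TreeBag<>();
bag.add("big");
bag.add("small");
bag.add("big");
int count = bag.getCount("big"); // 2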
EDIT: The question of HashMap vs TreeMap performance was answered by Jon: HashMap and sort may be quicker (try it!), but TreeBag is easier. The same is true for bags: there is a HashBag as well as a TreeBag. Based on the implementation (which uses a mutable integer), a bag should outperform the equivalent plain map of Integer. The only way to know for sure is to test, as with any performance question.
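The "mutable integer" point can be illustrated with plain JDK classes. The sketch below is not the Commons Collections implementation; MutableCounterDemo and its nested Counter class are hypothetical stand-ins that only show the idea of updating a count in place rather than boxing a new Integer and calling put on every occurrence.

```java
import java.util.Map;
import java.util.TreeMap;

class MutableCounterDemo {
    // Hypothetical mutable holder; a stand-in for the idea, not the library's own class.
    static final class Counter {
        int value;
    }

    static Map<String, Counter> count(String[] words) {
        Map<String, Counter> counts = new TreeMap<>();
        for (String w : words) {
            // One lookup per word; the counter is incremented in place, with no second put.
            counts.computeIfAbsent(w, k -> new Counter()).value++;
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Counter> counts = count(new String[] {"big", "small", "big"});
        System.out.println(counts.get("big").value);
    }
}
```

As the answer says, whether this is measurably faster than a plain Map<String, Integer> is something to test rather than assume.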
Answered by CAdaker
Answered by matt b
Why not use TreeSet?
Same ordering concept as a TreeMap, except it's a Set - which, by definition, is "A collection that contains no duplicate elements".
From your problem description, it sounds as if you need a Set; I don't see what keys and values you are mapping together.
This class implements the Set interface, backed by a TreeMap instance. This class guarantees that the sorted set will be in ascending element order, sorted according to the natural order of the elements (see Comparable), or by the comparator provided at set creation time, depending on which constructor is used.
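For comparison, a small sketch of what a TreeSet gives you (the class and method names are made up for illustration). It yields the unique words in sorted order, but there is nowhere to keep the per-word frequency that the question also asks for.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class UniqueWordsDemo {
    // Duplicates are dropped and the remaining words come back in natural order.
    static Set<String> uniqueSorted(String[] words) {
        return new TreeSet<>(Arrays.asList(words));
    }

    public static void main(String[] args) {
        System.out.println(uniqueSorted(new String[] {"and", "a", "and", "as"}));
    }
}
```

The duplicate "and" simply disappears, so the count of 2 cannot be recovered from the set alone.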
Answered by erickson
You can't assign a TreeMap<String,Number> to a variable with the type Map<String,Integer>. Double, Long, etc. can be "put" into a TreeMap<String,Number>. When I "get" a value from a Map<String,Integer>, it must be an Integer.
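A short sketch of that type mismatch, with the rejected assignment left as a comment because it would stop the file from compiling. GenericsDemo and readCount are illustrative names only.

```java
import java.util.Map;
import java.util.TreeMap;

class GenericsDemo {
    // Values read from a Map<String, Integer> are always Integer, so no cast is needed.
    static int readCount(Map<String, Integer> counts, String word) {
        Integer value = counts.get(word);
        return value == null ? 0 : value;
    }

    public static void main(String[] args) {
        Map<String, Number> numbers = new TreeMap<String, Number>();
        numbers.put("pi", 3.14);  // a Double is a Number
        numbers.put("big", 10L);  // so is a Long

        // Map<String, Integer> bad = new TreeMap<String, Number>(); // rejected by the compiler: incompatible types

        Map<String, Integer> counts = new TreeMap<String, Integer>();
        counts.put("and", 5);
        System.out.println(readCount(counts, "and"));
    }
}
```

If the assignment were allowed, the Double stored under "pi" could later be read back through the Map<String, Integer> reference as if it were an Integer, which is exactly what the compiler prevents.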
Completely ignoring any i18n issues, memory constraints, and error handling, here goes:
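import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Counter {
    public static void main(String... argv)
        throws Exception
    {
        // Map the whole file into memory and decode it with the platform default charset.
        FileChannel fc = new FileInputStream(argv[0]).getChannel();
        ByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        CharBuffer cb = Charset.defaultCharset().decode(bb);
        // A word is a maximal run of characters that are not separators.
        Pattern p = Pattern.compile("[^ \t\r\n\f.,!?:;\"()']+");
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        Matcher m = p.matcher(cb);
        while (m.find()) {
            // The question asks for the input to be converted to lower case.
            String word = m.group().toLowerCase();
            Integer count = counts.get(word);
            count = (count == null) ? 1 : count + 1;
            counts.put(word, count);
        }
        fc.close();
        // TreeMap iterates in natural key order, so the output is alphabetical.
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.printf("%s: %d%n", e.getKey(), e.getValue());
        }
    }
}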
Answered by erickson
Hash map should be much faster. You should not choose a container based on how you want the items to be arranged eventually; just sort the list of (word, frequency) pairs at the end. There will usually be fewer such pairs to sort than words in the file, so asymptotic (and real) performance with a hash map will be better.
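Sorting the (word, frequency) pairs at the end might be sketched as follows, assuming the pairs are simply the entries of the hash map. SortPairsDemo and sortedPairs are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SortPairsDemo {
    static List<Map.Entry<String, Integer>> sortedPairs(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        // Copy the distinct (word, frequency) entries into a list and sort it by word.
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>(counts.entrySet());
        pairs.sort(Map.Entry.comparingByKey());
        return pairs;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : sortedPairs(new String[] {"and", "a", "and"})) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

The list holds views of the map's entries, so it should be consumed before the map is modified again.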
Answered by erickson
I would definitely choose a TreeMap:
- TreeMap automatically sorts new keys on insertion, no sorting afterwards is needed.
- When a key already exists it has the same performance as a HashMap.
A TreeSet internally uses a TreeMap, so why not use TreeMap directly?
Answered by coderz
"When a key already exists it has the same performance as a HashMap." That is just plain wrong. HashMap has O(1) expected insertion, while TreeMap takes O(log n) per insertion, or O(n log n) for n insertions. It takes on the order of log n comparisons just to find out whether a key is in the tree!
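One way to see the cost of a single TreeMap lookup is to count comparator calls. The sketch below uses made-up names (LookupCostDemo, comparisonsForLookup) and integer keys purely for illustration. Since TreeMap is a red-black tree, the number of comparisons per lookup should stay within roughly 2 * log2(n + 1) however large n gets, instead of staying constant as it typically does for a hash lookup.

```java
import java.util.Comparator;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

class LookupCostDemo {
    // Returns how many key comparisons one lookup performs in a TreeMap holding keys 0..n-1.
    static int comparisonsForLookup(int n, int key) {
        AtomicInteger comparisons = new AtomicInteger();
        Comparator<Integer> counting = (a, b) -> {
            comparisons.incrementAndGet();
            return Integer.compare(a, b);
        };
        TreeMap<Integer, Integer> map = new TreeMap<>(counting);
        for (int i = 0; i < n; i++) {
            map.put(i, i);
        }
        comparisons.set(0);   // ignore the comparisons made while building the tree
        map.containsKey(key); // walks from the root towards the key, one comparison per level
        return comparisons.get();
    }

    public static void main(String[] args) {
        System.out.println(comparisonsForLookup(1_000, 500));
        System.out.println(comparisonsForLookup(100_000, 500));
    }
}
```

Growing n by a factor of 100 should add only a handful of comparisons, which is the logarithmic behaviour the answer is pointing at.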
Answered by G Kumar
Consider the frequency of additions and deletions to the data structure. TreeMap would not be ideal if it is high: apart from the O(log n) search for an existing entry, it also undergoes frequent rebalancing.
On the other hand, hash structures are a bit extravagant with memory (they over-allocate). If you can bite that bullet, go for a hash structure and sort when required.
Answered by Anit Singh
Basically, it depends on the requirement. Sometimes a hash map is the better choice, sometimes a tree map. A hash map is better only if there is some constraint on the overhead of sorting it.