在 Java 中增加 Map 值的最有效方法

Question

提问by gregory

I hope this question is not considered too basic for this forum, but we'll see. I'm wondering how to refactor some code for better performance that is getting run a bunch of times.

我希望这个问题对于这个论坛来说不是太基本，但我们会看到。我想知道如何重构一些代码以获得更好的性能，这些代码运行了很多次。

Say I'm creating a word frequency list, using a Map (probably a HashMap), where each key is a String with the word that's being counted and the value is an Integer that's incremented each time a token of the word is found.

假设我正在创建一个词频列表，使用 Map（可能是 HashMap），其中每个键是一个字符串，其中包含正在计算的单词，而值是一个 Integer，每次找到单词的标记时都会递增。

In Perl, incrementing such a value would be trivially easy:

在 Perl 中，增加这样的值非常容易：

$map{$word}++;

But in Java, it's much more complicated. Here the way I'm currently doing it:

但在 Java 中，情况要复杂得多。这是我目前的做法：

int count = map.containsKey(word) ? map.get(word) : 0;
map.put(word, count + 1);

Which of course relies on the autoboxing feature in the newer Java versions. I wonder if you can suggest a more efficient way of incrementing such a value. Are there even good performance reasons for eschewing the Collections framework and using a something else instead?

这当然依赖于较新的 Java 版本中的自动装箱功能。我想知道您是否可以建议一种更有效的方法来增加这样的值。避免使用 Collections 框架并使用其他东西来代替甚至有很好的性能原因吗？

Update: I've done a test of several of the answers. See below.

更新：我已经对几个答案进行了测试。见下文。

Answer 1

采纳答案by gregory

Some test results

部分测试结果

I've gotten a lot of good answers to this question--thanks folks--so I decided to run some tests and figure out which method is actually fastest. The five methods I tested are these:

对于这个问题，我得到了很多很好的答案——谢谢大家——所以我决定运行一些测试并找出哪种方法实际上最快。我测试的五种方法是：

the "ContainsKey" method that I presented in the question
the "TestForNull" method suggested by Aleksandar Dimitrov
the "AtomicLong" method suggested by Hank Gay
the "Trove" method suggested by jrudolph
the "MutableInt" method suggested by phax.myopenid.com

我在问题中提出的“ContainsKey”方法
Aleksandar Dimitrov 建议的“TestForNull”方法
Hank Gay 建议的“AtomicLong”方法
jrudolph 建议的“Trove”方法
phax.myopenid.com 建议的“MutableInt”方法

Method

方法

Here's what I did...

这就是我所做的...

created five classes that were identical except for the differences shown below. Each class had to perform an operation typical of the scenario I presented: opening a 10MB file and reading it in, then performing a frequency count of all the word tokens in the file. Since this took an average of only 3 seconds, I had it perform the frequency count (not the I/O) 10 times.
timed the loop of 10 iterations but not the I/O operationand recorded the total time taken (in clock seconds) essentially using Ian Darwin's method in the Java Cookbook.
performed all five tests in series, and then did this another three times.
averaged the four results for each method.

创建了五个相同的类，除了下面显示的差异。每个班级都必须执行我介绍的场景中的典型操作：打开一个 10MB 的文件并将其读入，然后对文件中的所有单词进行频率计数。由于这平均只需要 3 秒，我让它执行频率计数（不是 I/O）10 次。
对 10 次迭代的循环而不是 I/O 操作进行计时，并基本上使用Java Cookbook 中的 Ian Darwin 方法记录所花费的总时间（以时钟秒为单位）。
连续执行所有五次测试，然后再执行三次。
平均每种方法的四个结果。

Results

结果

I'll present the results first and the code below for those who are interested.

我将首先向有兴趣的人展示结果和下面的代码。

The ContainsKeymethod was, as expected, the slowest, so I'll give the speed of each method in comparison to the speed of that method.

该的containsKey是，如预期的方法，最慢的，所以我给每个方法的速度相比，该方法的速度。

ContainsKey:30.654 seconds (baseline)
AtomicLong:29.780 seconds (1.03 times as fast)
TestForNull:28.804 seconds (1.06 times as fast)
Trove:26.313 seconds (1.16 times as fast)
MutableInt:25.747 seconds (1.19 times as fast)

包含密钥：30.654 秒（基线）
AtomicLong：29.780 秒（快 1.03 倍）
TestForNull：28.804 秒（快 1.06 倍）
Trove：26.313 秒（快 1.16 倍）
MutableInt：25.747 秒（快 1.19 倍）

Conclusions

结论

It would appear that only the MutableInt method and the Trove method are significantly faster, in that only they give a performance boost of more than 10%. However, if threading is an issue, AtomicLong might be more attractive than the others (I'm not really sure). I also ran TestForNull with finalvariables, but the difference was negligible.

看起来只有 MutableInt 方法和 Trove 方法明显更快，因为只有它们能提供超过 10% 的性能提升。但是，如果线程是一个问题，AtomicLong 可能比其他的更有吸引力（我不太确定）。我还使用final变量运行了 TestForNull ，但差异可以忽略不计。

Note that I haven't profiled memory usage in the different scenarios. I'd be happy to hear from anybody who has good insights into how the MutableInt and Trove methods would be likely to affect memory usage.

请注意，我没有分析不同场景中的内存使用情况。我很高兴听到任何对 MutableInt 和 Trove 方法可能如何影响内存使用有深刻见解的人的来信。

Personally, I find the MutableInt method the most attractive, since it doesn't require loading any third-party classes. So unless I discover problems with it, that's the way I'm most likely to go.

就个人而言，我发现 MutableInt 方法最有吸引力，因为它不需要加载任何第三方类。因此，除非我发现它的问题，否则我最有可能采用这种方式。

The code

编码

Here is the crucial code from each method.

这是每个方法的关键代码。

ContainsKey

包含密钥

import java.util.HashMap;
import java.util.Map;
...
Map<String, Integer> freq = new HashMap<String, Integer>();
...
int count = freq.containsKey(word) ? freq.get(word) : 0;
freq.put(word, count + 1);

TestForNull

测试为空

import java.util.HashMap;
import java.util.Map;
...
Map<String, Integer> freq = new HashMap<String, Integer>();
...
Integer count = freq.get(word);
if (count == null) {
    freq.put(word, 1);
}
else {
    freq.put(word, count + 1);
}

AtomicLong

原子长

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;
...
final ConcurrentMap<String, AtomicLong> map = 
    new ConcurrentHashMap<String, AtomicLong>();
...
map.putIfAbsent(word, new AtomicLong(0));
map.get(word).incrementAndGet();

Trove

宝藏

import gnu.trove.TObjectIntHashMap;
...
TObjectIntHashMap<String> freq = new TObjectIntHashMap<String>();
...
freq.adjustOrPutValue(word, 1, 1);

MutableInt

可变整数

import java.util.HashMap;
import java.util.Map;
...
class MutableInt {
  int value = 1; // note that we start at 1 since we're counting
  public void increment () { ++value;      }
  public int  get ()       { return value; }
}
...
Map<String, MutableInt> freq = new HashMap<String, MutableInt>();
...
MutableInt count = freq.get(word);
if (count == null) {
    freq.put(word, new MutableInt());
}
else {
    count.increment();
}

Answer 2

回答by jrudolph

I think your solution would be the standard way, but - as you noted yourself - it is probably not the fastest way possible.

我认为您的解决方案将是标准方式，但是 - 正如您自己指出的那样 - 这可能不是最快的方式。

You may look at GNU Trove. That is a library which contains all sorts of fast primitive Collections. Your example would use a TObjectIntHashMapwhich has a method adjustOrPutValue which does exactly what you want.

你可以看看GNU Trove。这是一个包含各种快速原始集合的库。你的榜样将使用TObjectIntHashMap其中有一个方法adjustOrPutValue这不正是你想要的东西。

Answer 3

回答by Hank Gay

The various primitive wrappers, e.g., Integerare immutable so there's really not a more concise way to do what you're asking unlessyou can do it with something like AtomicLong. I can give that a go in a minute and update. BTW, Hashtableisa part of the Collections Framework.

例如，各种原始包装器Integer是不可变的，因此实际上没有更简洁的方法来完成您的要求，除非您可以使用AtomicLong 之类的东西来完成。我可以在一分钟内试一试并更新。顺便说一句，Hashtable是Collections Framework的一部分。

Answer 4

回答by tovare

There are a couple of approaches:

有几种方法：

Use a Bag alorithm like the sets contained in Google Collections.
Create mutable container which you can use in the Map:

使用 Bag 算法，如 Google Collections 中包含的集合。
创建可以在 Map 中使用的可变容器：


    class My{
        String word;
        int count;
    }

And use put("word", new My("Word") ); Then you can check if it exists and increment when adding.

并使用 put("word", new My("Word") ); 然后你可以检查它是否存在并在添加时增加。

Avoid rolling your own solution using lists, because if you get innerloop searching and sorting, your performance will stink. The first HashMap solution is actually quite fast, but a proper like that found in Google Collections is probably better.

避免使用列表滚动你自己的解决方案，因为如果你进行内循环搜索和排序，你的性能会很糟糕。第一个 HashMap 解决方案实际上非常快，但在 Google Collections 中找到的合适的解决方案可能更好。

Counting words using Google Collections, looks something like this:

使用 Google Collections 计算单词，看起来像这样：



    HashMultiset s = new HashMultiset();
    s.add("word");
    s.add("word");
    System.out.println(""+s.count("word") );

Using the HashMultiset is quite elegent, because a bag-algorithm is just what you need when counting words.

使用 HashMultiset 非常优雅，因为袋算法正是您计算单词时所需要的。

Answer 5

回答by Hank Gay

@Hank Gay

@汉克盖伊

As a follow-up to my own (rather useless) comment: Trove looks like the way to go. If, for whatever reason, you wanted to stick with the standard JDK, ConcurrentMapand AtomicLongcan make the code a tinybit nicer, though YMMV.

作为我自己的（相当无用的）评论的后续：Trove 看起来是要走的路。如果出于某种原因，你想坚持使用标准的JDK，ConcurrentMap和AtomicLong的可以使代码一个微小的一点更好，但情况因人而异。

    final ConcurrentMap<String, AtomicLong> map = new ConcurrentHashMap<String, AtomicLong>();
    map.putIfAbsent("foo", new AtomicLong(0));
    map.get("foo").incrementAndGet();

will leave 1as the value in the map for foo. Realistically, increased friendliness to threading is all that this approach has to recommend it.

将1作为地图中的值保留foo。实际上，增加对线程的友好性是这种方法必须推荐的全部内容。

Answer 6

回答by Philip Helger

Another way would be creating a mutable integer:

另一种方法是创建一个可变整数：

class MutableInt {
  int value = 0;
  public void inc () { ++value; }
  public int get () { return value; }
}
...
Map<String,MutableInt> map = new HashMap<String,MutableInt> ();
MutableInt value = map.get (key);
if (value == null) {
  value = new MutableInt ();
  map.put (key, value);
} else {
  value.inc ();
}

of course this implies creating an additional object but the overhead in comparison to creating an Integer (even with Integer.valueOf) should not be so much.

当然，这意味着创建一个额外的对象，但与创建一个 Integer（即使使用 Integer.valueOf）相比，开销不应该那么多。

Answer 7

回答by Glever

Instead of calling containsKey() it is faster just to call map.get and check if the returned value is null or not.

与调用 containsKey() 相比，调用 map.get 并检查返回值是否为 null 会更快。

    Integer count = map.get(word);
    if(count == null){
        count = 0;
    }
    map.put(word, count + 1);

Answer 8

回答by Aleksandar Dimitrov

You should be aware of the fact that your original attempt

你应该意识到你最初的尝试

int count = map.containsKey(word) ? map.get(word) : 0;

contains two potentially expensive operations on a map, namely containsKeyand get. The former performs an operation potentially pretty similar to the latter, so you're doing the same work twice!

在地图上包含两个潜在的开销较大的操作，即containsKey和get。前者执行的操作可能与后者非常相似，因此您要做两次相同的工作！

If you look at the API for Map, getoperations usually return nullwhen the map does not contain the requested element.

如果您查看 Map 的 API，当地图不包含请求的元素时，get操作通常会返回null。

Note that this will make a solution like

请注意，这将产生一个类似的解决方案

map.put( key, map.get(key) + 1 );

dangerous, since it might yield NullPointerExceptions. You should check for a nullfirst.

危险，因为它可能会产生NullPointerExceptions。你应该先检查一下null。

Also note, and this is very important, that HashMaps cancontain nullsby definition. So not every returned nullsays "there is no such element". In this respect, containsKeybehaves differentlyfrom getin actually telling you whetherthere is such an element. Refer to the API for details.

另请注意，这非常重要，根据定义HashMaps可以包含nulls。所以并不是每个返回的人null都说“没有这样的元素”。在这方面，containsKey表现不同从get在实际告诉你是否有这样的元素。有关详细信息，请参阅 API。

For your case, however, you might not want to distinguish between a stored nulland "noSuchElement". If you don't want to permit nulls you might prefer a Hashtable. Using a wrapper library as was already proposed in other answers might be a better solution to manual treatment, depending on the complexity of your application.

但是，对于您的情况，您可能不想区分存储的null和“noSuchElement”。如果您不想允许nulls，您可能更喜欢Hashtable. 使用其他答案中已经提出的包装库可能是手动处理的更好解决方案，具体取决于您的应用程序的复杂性。

To complete the answer (and I forgot to put that in at first, thanks to the edit function!), the best way of doing it natively, is to getinto a finalvariable, check for nulland putit back in with a 1. The variable should be finalbecause it's immutable anyway. The compiler might not need this hint, but its clearer that way.

要完成答案（由于编辑功能，我一开始忘了输入它！），最好的本地方法是get输入一个final变量，检查null并put使用1. 变量应该是final因为它无论如何都是不可变的。编译器可能不需要这个提示，但这样更清楚。

final HashMap map = generateRandomHashMap();
final Object key = fetchSomeKey();
final Integer i = map.get(key);
if (i != null) {
    map.put(i + 1);
} else {
    // do something
}

If you do not want to rely on autoboxing, you should say something like map.put(new Integer(1 + i.getValue()));instead.

如果你不想依赖自动装箱，你应该说类似的话map.put(new Integer(1 + i.getValue()));。

Answer 9

回答by jb.

I'd use Apache Collections Lazy Map (to initialize values to 0) and use MutableIntegers from Apache Lang as values in that map.

我会使用 Apache Collections Lazy Map（将值初始化为 0）并使用来自 Apache Lang 的 MutableIntegers 作为该映射中的值。

Biggest cost is having to serach the map twice in your method. In mine you have to do it just once. Just get the value (it will get initialized if absent) and increment it.

最大的成本是必须在您的方法中两次搜索地图。在我的，你只需要做一次。只需获取值（如果不存在，它将被初始化）并增加它。

Answer 10

回答by jb.

Are you sure that this is a bottleneck? Have you done any performance analysis?

你确定这是瓶颈？你做过性能分析吗？

Try using the NetBeans profiler (its free and built into NB 6.1) to look at hotspots.

尝试使用 NetBeans 分析器（它是免费的并内置于 NB 6.1）来查看热点。

Finally, a JVM upgrade (say from 1.5->1.6) is often a cheap performance booster. Even an upgrade in build number can provide good performance boosts. If you are running on Windows and this is a server class application, use -server on the command line to use the Server Hotspot JVM. On Linux and Solaris machines this is autodetected.

最后，JVM 升级（比如从 1.5->1.6）通常是一种廉价的性能助推器。即使是内部版本号的升级也可以提供良好的性能提升。如果您在 Windows 上运行并且这是一个服务器类应用程序，请在命令行上使用 -server 以使用服务器热点 JVM。在 Linux 和 Solaris 机器上，这是自动检测到的。

Answer 11

回答by Alex Miller

@Vilmantas Baranauskas: Regarding this answer, I would comment if I had the rep points, but I don't. I wanted to note that the Counter class defined there is NOT thread-safe as it is not sufficient to just synchronize inc() without synchronizing value(). Other threads calling value() are not guaranteed to see the the value unless a happens-before relationship has been established with the update.

@Vilmantas Baranauskas：关于这个答案，如果我有代表点，我会发表评论，但我没有。我想指出，那里定义的 Counter 类不是线程安全的，因为仅同步 inc() 而不同步 value() 是不够的。其他调用 value() 的线程不能保证看到该值，除非在更新中建立了happens-before 关系。

在 Java 中增加 Map 值的最有效方法

提问by gregory

采纳答案by gregory

Some test results

部分测试结果

Method

方法

Results

结果

Conclusions

结论

The code

编码

ContainsKey

包含密钥

TestForNull

测试为空

AtomicLong

原子长

Trove

宝藏

MutableInt

可变整数

回答by jrudolph

回答by Hank Gay

回答by tovare

回答by Hank Gay

回答by Philip Helger

回答by Glever

回答by Aleksandar Dimitrov

回答by jb.

回答by jb.

回答by Alex Miller

相关推荐

最近更新

标签

在 Java 中增加 Map 值的最有效方法

提问by gregory

采纳答案by gregory

Some test results

部分测试结果

Method

方法

Results

结果

Conclusions

结论

The code

编码

ContainsKey

包含密钥

TestForNull

测试为空

AtomicLong

原子长

Trove

宝藏

MutableInt

可变整数

回答by jrudolph

回答by Hank Gay

回答by tovare

回答by Hank Gay

回答by Philip Helger

回答by Glever

回答by Aleksandar Dimitrov

回答by jb.

回答by jb.

回答by Alex Miller

相关推荐

Java Elasticsearch 内存问题

在 Java 中插入 SQL 后访问自动递增标识字段

Java 错误：包 org.apache.commons.codec.binary 不存在

如何以人类可读的方式打开 Java .class 文件？

相关推荐

最近更新

标签