Memory overhead of Java HashMap compared to ArrayList

Question

Question by elhoim

I am wondering what the memory overhead of a Java HashMap is, compared to an ArrayList.

Update:

I would like to speed up searching for specific values in a large collection (6 million+) of objects of the same type.

So I am thinking about using one or several HashMaps instead of an ArrayList, but I am wondering what the overhead of a HashMap is.

As far as I understand, the key is not stored, only its hash, so the overhead should be something like the size of the key's hash plus one pointer.

But which hash function is used? Is it the one provided by Object, or another one?

Answer 1

Accepted answer by Tim Cooper

If you're comparing HashMap with ArrayList, I presume you're doing some sort of searching/indexing of the ArrayList, such as binary search or a custom hash table? Otherwise a .get(key) over 6 million entries would be infeasible, because it would have to be a linear search.
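One way to make an ArrayList searchable without a HashMap is to sort it by key once and then use binary search, which costs no memory beyond the list itself. A minimal sketch, assuming Java 8+ and an element with an int key; the Item class is a hypothetical stand-in for the asker's objects, not code from this answer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch: sort an ArrayList once by key, then find elements by binary search.
// Item is a hypothetical stand-in for the asker's objects.
final class SortedListLookup {

    static final class Item {
        final int key;
        Item(int key) { this.key = key; }
    }

    // O(n log n), done once after loading the data.
    static void sortByKey(List<Item> items) {
        items.sort(Comparator.comparingInt(item -> item.key));
    }

    // O(log n) per lookup; requires the list to be sorted by key.
    // Returns null when no element has the given key.
    static Item find(List<Item> items, int key) {
        int lo = 0;
        int hi = items.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;          // unsigned shift avoids int overflow
            int midKey = items.get(mid).key;
            if (midKey < key) {
                lo = mid + 1;
            } else if (midKey > key) {
                hi = mid - 1;
            } else {
                return items.get(mid);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>();
        for (int k : new int[] {42, 7, 19, 3}) {
            items.add(new Item(k));
        }
        sortByKey(items);
        System.out.println(find(items, 19) != null);  // prints "true"
        System.out.println(find(items, 8) != null);   // prints "false"
    }
}
```

The trade-off is that inserts into a sorted list are O(n), so this suits data that is loaded once and then searched many times.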

Using that assumption, I've done some empirical tests and come to the conclusion that "you can store 2.5 times as many small objects in the same amount of RAM if you use an ArrayList with binary search or a custom hash map implementation, versus HashMap". My test was based on small objects containing only 3 fields, one of which is the key, and the key is an integer. I used a 32-bit JDK 1.6. See below for caveats on this figure of "2.5".

The key things to note are:

(a) It's not the space required for references or the "load factor" that kills you, but rather the per-object overhead. If the key is a primitive type, or a combination of two or more primitive or reference values, then each key needs its own object, which carries an overhead of 8 bytes (the object header on a 32-bit JVM).

(b) In my experience you usually need the key as part of the value (e.g. when storing customer records indexed by customer id, you still want the id inside the Customer object). So IMO it is somewhat wasteful that a HashMap stores separate references to keys and values.
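To illustrate point (a): when the key is a combination of values, a HashMap needs a separate key object per entry, and that class must implement equals() and hashCode(). A minimal sketch with a hypothetical two-int key class; each put allocates one of these in addition to the map's own internal entry object. The same applies to a Map&lt;Integer, ...&gt;, where each int key is autoboxed into an Integer (only small values are guaranteed to come from the Integer cache).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical composite key: every entry in the map holds one of these,
// each paying its own object header on top of the two int fields.
final class PairKey {
    private final int a;
    private final int b;

    PairKey(int a, int b) {
        this.a = a;
        this.b = b;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PairKey)) return false;
        PairKey other = (PairKey) o;
        return a == other.a && b == other.b;
    }

    @Override
    public int hashCode() {
        return 31 * a + b;
    }
}

final class CompositeKeyDemo {
    public static void main(String[] args) {
        Map<PairKey, String> map = new HashMap<>();
        map.put(new PairKey(1, 2), "first");   // allocates a PairKey
        map.put(new PairKey(3, 4), "second");  // and another one
        // A lookup needs a (temporary) key object as well.
        System.out.println(map.get(new PairKey(1, 2)));
    }
}
```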

Caveats:

The most common type used for HashMap keys is String. The object creation overhead doesn't apply here, since a String key is already an object, so the difference would be smaller.
I got a figure of 2.8: 8,880,502 entries inserted into the ArrayList versus 3,148,004 into the HashMap on a JVM started with -Xmx256M. However, my ArrayList load factor was 80% and my objects were quite small: 12 bytes plus 8 bytes of object overhead.
My figure and my implementation require that the key is contained within the value; otherwise I'd have the same object creation overhead problem, and it would be just another implementation of HashMap.
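A rough back-of-envelope model of the per-entry cost, to show where a factor of about 2.5 can come from. Every constant is an assumption, not a measurement: a 32-bit HotSpot JVM with an 8-byte object header, 4-byte references and sizes rounded up to a multiple of 8, plus the JDK 6 HashMap.Entry layout (hash, key, value, next). Real layouts vary by JVM and flags.

```java
// Back-of-envelope model of bytes per stored entry.
// All constants are assumptions about a 32-bit HotSpot JVM, not measurements.
final class EntryCostEstimate {
    static final int HEADER = 8;   // assumed object header size
    static final int REF = 4;      // assumed reference size
    static final int ALIGN = 8;    // assumed object alignment

    // Round an object's raw size up to the alignment boundary.
    static int aligned(int rawBytes) {
        return (rawBytes + ALIGN - 1) / ALIGN * ALIGN;
    }

    // Payload from the answer: three int fields.
    static int payloadBytes() {
        return aligned(HEADER + 3 * 4);
    }

    // HashMap<Integer, Payload>: Entry(hash, key, value, next) + boxed Integer key
    // + payload + table slots at exactly the 0.75 load factor (best case, just before a resize).
    static double hashMapBytesPerEntry() {
        int entry = aligned(HEADER + 4 + 3 * REF);
        int boxedKey = aligned(HEADER + 4);
        return entry + boxedKey + payloadBytes() + REF / 0.75;
    }

    // Open-addressed array of Payload references at a given fill ratio.
    static double arrayTableBytesPerEntry(double fill) {
        return payloadBytes() + REF / fill;
    }

    public static void main(String[] args) {
        double map = hashMapBytesPerEntry();
        double list = arrayTableBytesPerEntry(0.8);
        System.out.printf("HashMap ~%.1f B/entry, array table ~%.1f B/entry, ratio ~%.2f%n",
                map, list, map / list);
    }
}
```

Under these assumptions the model gives about 69 bytes versus 29 bytes per entry, a ratio of roughly 2.4. It ignores the temporary extra table allocated while a HashMap resizes, so treat it as a lower bound on the HashMap side.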

My code:

// Small value object: three int fields, one of which is the key.
public class Payload {
    int key, b, c;
    Payload(int _key) { key = _key; }
}


import org.junit.Test;

import java.util.HashMap;
import java.util.Map;


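// JUnit tests: each fills one structure with random int keys until the heap runs out
// (run with a fixed heap such as -Xmx256M) and prints how many inserts completed.
public class Overhead {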
    @Test
    public void useHashMap()
    {
        int i=0;
        try {
            Map<Integer, Payload> map = new HashMap<Integer, Payload>();
            for (i=0; i < 4000000; i++) {
                int key = (int)(Math.random() * Integer.MAX_VALUE);
                map.put(key, new Payload(key));
            }
        }
        catch (OutOfMemoryError e) {
            System.out.println("Got up to: " + i);
        }
    }

    @Test
    public void useArrayList()
    {
        int i=0;
        try {
            ArrayListMap map = new ArrayListMap();
            for (i=0; i < 9000000; i++) {
                int key = (int)(Math.random() * Integer.MAX_VALUE);
                map.put(key, new Payload(key));
            }
        }
        catch (OutOfMemoryError e) {
            System.out.println("Got up to: " + i);
        }
    }
}


import java.util.ArrayList;


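// Open-addressing hash table that stores only Payload references;
// the key is read back from Payload.key rather than stored separately.
public class ArrayListMap {
    private ArrayList<Payload> map = new ArrayList<Payload>();
    // Probe step sizes for double hashing. They are primes >= 31, so none divides
    // the 11,000,000 capacity and each probe sequence can reach every slot.
    private int[] primes = new int[128];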

    static boolean isPrime(int n)
    {
        for (int i=(int)Math.sqrt(n); i >= 2; i--) {
            if (n % i == 0)
                return false;
        }
        return true;
    }

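    ArrayListMap()
    {
        // Pre-fill with nulls so every slot index is valid (fixed capacity, no resizing).
        for (int i=0; i < 11000000; i++)    // this is clumsy, I admit
            map.add(null);
        // Collect 128 odd primes, starting at 31, to use as probe step sizes.
        int n=31;
        for (int i=0; i < 128; i++) {
            while (! isPrime(n))
                n+=2;
            primes[i] = n;
            n += 2;
        }
        System.out.println("Capacity = " + map.size());
    }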

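    public void put(int key, Payload value)
    {
        int hash = key % map.size();
        // Math.abs: key % 128 is negative for negative keys (would throw otherwise)
        int hash2 = primes[Math.abs(key % primes.length)];
        if (hash < 0)
            hash += map.size();
        // Double hashing: probe hash, hash + hash2, hash + 2*hash2, ... (mod capacity).
        // Loops forever if the table is completely full.
        do {
            Payload existing = map.get(hash);
            // Empty slot, or the same key already stored: (re)place the value here.
            if (existing == null || existing.key == key) {
                map.set(hash, value);
                return;
            }
            hash += hash2;
            if (hash >= map.size())
                hash -= map.size();
        } while (true);
    }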

    public Payload get(int key)
    {
        int hash = key % map.size();
        int hash2 = primes[Math.abs(key % primes.length)];   // abs: key % 128 is negative for negative keys
        if (hash < 0)
            hash += map.size();
        do {
            Payload payload = map.get(hash);
            if (payload == null)
                return null;
            if (payload.key == key)
                return payload;
            hash += hash2;
            if (hash >= map.size())
                hash -= map.size();
        } while (true);
    }
}

Answer 2

Answer by Jon Skeet

The simplest thing would be to look at the source and work it out that way. However, you're really comparing apples and oranges: lists and maps are conceptually quite distinct. It's rare that you would choose between them on the basis of memory usage.

What's the background behind this question?

Answer 3

Answer by Malaxeur

I don't know the exact number, but HashMaps are much heavier. Comparing the two, ArrayList's internal representation is self-evident (a single backing array of references), but a HashMap keeps an Entry object for every mapping, which can balloon your memory consumption.
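To make the Entry objects concrete: a chained hash map allocates one small node per mapping, holding the cached hash, the key reference, the value reference and a link to the next node in the same bucket. A heavily simplified sketch in the spirit of the JDK's HashMap; SimpleChainedMap is hypothetical and omits resizing, removal and null keys.

```java
// Heavily simplified separate-chaining map, to show where Entry objects come from.
// Hypothetical class; no resizing, removal, or null-key support.
final class SimpleChainedMap<K, V> {

    // One of these is allocated per mapping, on top of the key and value objects.
    private static final class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Entry<K, V> next;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry<K, V>[] table;

    // capacityPowerOfTwo must be a power of two so that (length - 1) works as a mask.
    @SuppressWarnings("unchecked")
    SimpleChainedMap(int capacityPowerOfTwo) {
        table = (Entry<K, V>[]) new Entry[capacityPowerOfTwo];
    }

    private int indexFor(int hash) {
        return hash & (table.length - 1);
    }

    V put(K key, V value) {
        int h = key.hashCode();
        int i = indexFor(h);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            // The stored key is compared with equals(), which is why it must be kept.
            if (e.hash == h && e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        table[i] = new Entry<>(h, key, value, table[i]);  // new node at head of bucket
        return null;
    }

    V get(Object key) {
        int h = key.hashCode();
        for (Entry<K, V> e = table[indexFor(h)]; e != null; e = e.next) {
            if (e.hash == h && e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }
}
```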

It's not that much larger, but it's larger. A great way to visualize this would be with a dynamic profiler such as YourKit, which allows you to see all heap allocations. It's pretty nice.

Answer 4

Answer by aperkins

As Jon Skeet noted, these are completely different structures. A map (such as HashMap) is a mapping from one value to another, i.e. you have a key that maps to a value, in a key -> value relationship. The key is hashed, and the entry is placed in an array for quick lookup.
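To make "the key is hashed" concrete, and to answer the question's "which hash function": HashMap starts from the key's own hashCode() (Object.hashCode() is used only if the key class does not override it) and then mixes the bits before choosing a bucket. The sketch below mirrors what OpenJDK 8 and later do; HashSpread is a hypothetical name, since the real HashMap.hash(Object) is not public API. Older JDKs such as 1.6 applied a different supplemental mixing step, but also started from key.hashCode().

```java
// Sketch of how OpenJDK 8+ HashMap turns a key into a bucket index.
// HashSpread is a hypothetical name; the real logic lives in the
// non-public static method HashMap.hash(Object).
final class HashSpread {

    // Uses the key's own hashCode(), then XORs the high 16 bits into the
    // low 16 bits so that small tables still see the high-order bits.
    static int spread(Object key) {
        if (key == null) {
            return 0;
        }
        int h = key.hashCode();
        return h ^ (h >>> 16);
    }

    // The table length is always a power of two, so masking with
    // (length - 1) yields an index in [0, length).
    static int bucketIndex(Object key, int tableLength) {
        return (tableLength - 1) & spread(key);
    }

    public static void main(String[] args) {
        Integer key = 6_000_000;               // Integer.hashCode() is the int value itself
        System.out.println("hashCode = " + key.hashCode());
        System.out.println("spread   = " + spread(key));
        System.out.println("bucket in a 16-slot table = " + bucketIndex(key, 16));
    }
}
```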

A List, on the other hand, is an ordered collection of elements. ArrayList happens to use an array as its backing storage, but that is irrelevant here. Each indexed element is a single element in the list.

edit: based on your comment, I have added the following information:

The key is stored in a HashMap. This is because two different elements are not guaranteed to have different hashes, so the key has to be stored to resolve hash collisions. If you simply want to see whether an element exists in a set of elements, use a Set (the standard implementation being HashSet). If the order matters but you need a quick lookup, use a LinkedHashSet, as it keeps the order in which the elements were inserted. The lookup time is O(1) on average for both, but insertion is slightly slower on a LinkedHashSet. Use a Map only if you are actually mapping from one value to another: if you simply have a set of unique objects, use a Set; if you have ordered objects, use a List.
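Two points from this answer can be checked directly: distinct keys can share a hash code (the strings "Aa" and "BB" both hash to 2112), so the map must keep each key to tell them apart; and a LinkedHashSet iterates in insertion order. A small sketch (the class name is illustrative):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class KeyStorageDemo {
    public static void main(String[] args) {
        // Different strings, same hashCode(): 'A'*31 + 'a' == 'B'*31 + 'B' == 2112.
        System.out.println("Aa".hashCode() + " " + "BB".hashCode());  // prints "2112 2112"

        // Both entries survive because HashMap keeps each key and compares with equals().
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.get("Aa") + " " + map.get("BB"));  // prints "1 2"

        // Membership test plus insertion order, without a separate value per element.
        Set<String> ordered = new LinkedHashSet<>(Arrays.asList("zeta", "alpha", "mid"));
        System.out.println(ordered);  // prints "[zeta, alpha, mid]"
    }
}
```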

Answer 5

Answer by reccles

HashMaps try to stay below a load factor (0.75 by default, i.e. 75% full), so you can think of a HashMap as a sparsely filled array list. The problem with a straight comparison of size is that the map's table keeps growing so that the data stays below that load factor. An ArrayList, on the other hand, grows only when it runs out of room, by enlarging its internal array (by roughly 50% in OpenJDK). For relatively small sizes they are comparable; however, as you pack more and more data into the map, it requires a lot of empty references in order to maintain hash performance.

In either case I recommend specifying the expected size of the data (the initial capacity) before you start adding. This gives the implementations a better initial setting and will likely use less memory overall in both cases.
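A sketch of presizing both collections for the 6 million objects in the question. The helper names are hypothetical; the calculation relies on HashMap resizing once its size exceeds capacity × load factor (0.75 by default) and rounding the requested capacity up to a power of two. On JDK 19 and later, HashMap.newHashMap(int) performs this calculation for you.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Presize {

    // Smallest initial capacity that holds `expected` mappings at the default
    // 0.75 load factor without triggering a resize.
    static int hashMapCapacityFor(int expected) {
        return (int) Math.ceil(expected / 0.75);
    }

    // HashMap rounds the capacity up to a power of two; this mirrors that
    // rounding to show the table length that will actually be used.
    static int tableLengthFor(int capacity) {
        return capacity <= 1 ? 1 : Integer.highestOneBit(capacity - 1) << 1;
    }

    public static void main(String[] args) {
        int expected = 6_000_000;
        List<Object> list = new ArrayList<>(expected);   // backing array of exactly `expected` slots
        Map<Integer, Object> map = new HashMap<>(hashMapCapacityFor(expected));
        System.out.println("HashMap capacity requested: " + hashMapCapacityFor(expected));
        System.out.println("table length used:          " + tableLengthFor(hashMapCapacityFor(expected)));
    }
}
```

For 6,000,000 entries this requests 8,000,000, which rounds up to a table of 8,388,608 slots; its resize threshold (6,291,456) stays above the expected size, so the map never has to rehash while loading.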

Update:

Based on your updated problem, check out Glazed Lists. This is a neat little tool written by some of the Google people for doing operations similar to the one you describe. It's also very quick, and it allows clustering, filtering, searching, etc.

Answer 6

Answer by sanscore

I don't have an answer for you either, but a quick Google search turned up a method in Java that might help.

Runtime.getRuntime().freeMemory();

So I propose that you populate a HashMap and an ArrayList with the same data. Record the free memory, delete the first collection, record the memory, delete the second collection, record the memory again, compute the differences... profit!!!

You should probably do this at several orders of magnitude, i.e. start with 1,000, then 10,000, 100,000 and 1,000,000.

EDIT: Corrected, thanks to amischiefr.

EDIT: Sorry for editing your post, but this is pretty important if you are going to use this (and it's a little much for a comment). freeMemory() does not work the way you might think. First, its value is changed by garbage collection. Second, its value changes when Java allocates more memory to the heap. Calling freeMemory() on its own doesn't provide useful data.

Try this:

public static void displayMemory() {
    Runtime r=Runtime.getRuntime();
    r.gc();
    r.gc(); // YES, you NEED 2!
    System.out.println("Memory Used="+(r.totalMemory()-r.freeMemory()));
}

Or you can return the memory used and store it, then compare it to a later value. Either way, remember the two gc() calls and to subtract freeMemory() from totalMemory().
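A sketch of the "return the memory used and compare it later" variant, keeping the caveats above in mind: Runtime.gc() is only a request to the JVM, so the figures are approximate, and both collections must stay reachable until the last measurement. The class name and the choice of one million Integer elements are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class UsedMemoryDelta {

    // Approximate bytes in use; gc() is a hint, so treat the result as an estimate.
    static long usedMemory() {
        Runtime r = Runtime.getRuntime();
        r.gc();
        r.gc();
        return r.totalMemory() - r.freeMemory();
    }

    public static void main(String[] args) {
        int n = 1_000_000;

        long before = usedMemory();
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        long afterList = usedMemory();

        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            Integer boxed = i;        // share one object as key and value
            map.put(boxed, boxed);
        }
        long afterMap = usedMemory();

        System.out.println("ArrayList: ~" + (afterList - before) + " bytes");
        System.out.println("HashMap:   ~" + (afterMap - afterList) + " bytes");
        // Touch both collections so neither is collected before the last measurement.
        System.out.println(list.size() + map.size());
    }
}
```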

Again, sorry to edit your post!

Answer 7

Answer by OscarRyz

HashMap holds a reference to the value and a reference to the key.

ArrayList just holds a reference to the value.

So, assuming that the key uses the same amount of memory as the value, HashMap uses 50% more memory (although, strictly speaking, it is not the HashMap that uses that memory, because it just keeps a reference to it).

On the other hand, HashMap provides constant-time performance for the basic operations (get and put). So, although it may use more memory, getting an element may be much faster with a HashMap than with an ArrayList.

So the next thing you should do is not worry about which one uses more memory, but about what each one is good for.

Choosing the correct data structure for your program saves more CPU/memory than the details of how the library is implemented underneath.

EDIT

After Grant Welch's answer, I decided to measure with 2,000,000 integers.

Here's the source code

This is the output

$
$javac MemoryUsage.java  
Note: MemoryUsage.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
$java -Xms128m -Xmx128m MemoryUsage 
Using ArrayListMemoryUsage@8558d2 size: 0
Total memory: 133.234.688
Initial free: 132.718.608
  Final free: 77.965.488

Used: 54.753.120
Memory Used 41.364.824
ArrayListMemoryUsage@8558d2 size: 2000000
$
$java -Xms128m -Xmx128m MemoryUsage H
Using HashMapMemoryUsage@8558d2 size: 0
Total memory: 133.234.688
Initial free: 124.329.984
  Final free: 4.109.600

Used: 120.220.384
Memory Used 129.108.608
HashMapMemoryUsage@8558d2 size: 2000000

Answer 8

Answer by Avrom

Basically, you should be using the "right tool for the job". There are situations where you need key/value pairs (where you might use a HashMap) and situations where you just need a list of values (where you might use an ArrayList), so in my opinion the question of "which one uses more memory" is moot: it is not what decides between one and the other.

But to answer the question: since HashMap stores key/value pairs while ArrayList stores just values, I would assume that the addition of keys alone means the HashMap takes up more memory, assuming, of course, that we compare them with the same value type (e.g. where the values in both are Strings).

Answer 9

Answer by Dean J

If you're considering two ArrayLists vs one HashMap, it's indeterminate; both are partially full data structures. If you were comparing Vector vs Hashtable, Vector is probably more memory-efficient, because it only allocates the space it uses, whereas a Hashtable allocates more space.

If you need key-value pairs and aren't doing incredibly memory-hungry work, just use a HashMap.

Answer 10

Answer by Bill K

All that is stored in either is pointers. Depending on your architecture and JVM settings, a pointer is usually 32 or 64 bits.

An ArrayList of 10 elements tends to allocate at least 10 pointers (plus some one-time overhead).

A map has to allocate twice that (20 pointers), because it stores two values per entry. On top of that, it has to store the hashes, and the hash table must be bigger than the number of entries: at a load factor of 75% that SHOULD be around 13 32-bit values (hashes).

So if you want an offhand answer, the ratio should be about 1:3.25, but you are only talking about pointer storage, which is very small unless you are storing a massive number of objects. And if you are, the ability to look up an element instantly (HashMap) rather than iterate (array) should be MUCH more significant than the memory size.

Oh, also: arrays can be fitted to the exact size of your collection. HashMaps can as well if you specify the size, but if one grows beyond that size, it reallocates a larger array and leaves part of it unused, so there can be a little waste there as well.

Answer 11

Answer by matt b

I think the wrong question is being asked here.

If you would like to improve the speed at which you can search for an object in a List containing six million entries, then you should look into how fast these data types' retrieval operations are.

As usual, the Javadocs for these classes state pretty plainly what type of performance they offer:

HashMap:

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets.

This means that HashMap.get(key) is O(1).

ArrayList:

The size, isEmpty, get, set, iterator, and listIterator operations run in constant time. The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking).

This means that most of ArrayList's operations are O(1), but likely not the ones that you would be using to find objects that match a certain value.

If you are iterating over every element in the ArrayList and testing for equality, or using contains(), then your operation runs in O(n) time (or worse).

If you are unfamiliar with O(1) or O(n) notation, it describes how the time an operation takes grows with the number of elements. In this case, if you can get constant-time performance, you want to take it. If HashMap.get() is O(1), retrieval operations take roughly the same amount of time regardless of how many entries are in the Map.

The fact that something like ArrayList.contains() is O(n) means that the time it takes grows as the list grows, so iterating through an ArrayList with six million entries will not be very effective at all.
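A sketch of the difference: finding an object by a field value in an ArrayList means scanning (O(n)), while indexing the same objects once in a HashMap keyed by that field makes each later lookup O(1) on average. The Customer class and its id field are hypothetical; the index costs extra memory for its entries, which is the trade-off the other answers discuss.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LookupComparison {

    // Hypothetical element type with a unique id.
    static final class Customer {
        final int id;
        final String name;
        Customer(int id, String name) { this.id = id; this.name = name; }
    }

    // O(n): may have to look at every element.
    static Customer linearFind(List<Customer> customers, int id) {
        for (Customer c : customers) {
            if (c.id == id) {
                return c;
            }
        }
        return null;
    }

    // Built once in O(n); afterwards byId.get(id) is O(1) on average.
    static Map<Integer, Customer> indexById(List<Customer> customers) {
        Map<Integer, Customer> byId = new HashMap<>((int) Math.ceil(customers.size() / 0.75));
        for (Customer c : customers) {
            byId.put(c.id, c);
        }
        return byId;
    }

    public static void main(String[] args) {
        List<Customer> customers = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            customers.add(new Customer(i, "customer-" + i));
        }
        Map<Integer, Customer> byId = indexById(customers);
        System.out.println(linearFind(customers, 99_999).name);  // scans all 100,000 elements
        System.out.println(byId.get(99_999).name);               // one hash lookup
    }
}
```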
