Java HashMap 性能优化/替代

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1757363/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

时间:2020-08-12 22:22:14  来源:igfitidea

Java HashMap performance optimization / alternative

java, performance, optimization, map, hashmap

提问by nash

I want to create a large HashMap but the put() performance is not good enough. Any ideas?

我想创建一个大的 HashMap 但put()性能不够好。有任何想法吗?

Other data structure suggestions are welcome but I need the lookup feature of a Java Map:

欢迎其他数据结构建议,但我需要 Java Map 的查找功能:

map.get(key)

map.get(key)

In my case I want to create a map with 26 million entries. Using the standard Java HashMap the put rate becomes unbearably slow after 2-3 million insertions.

就我而言,我想创建一个包含 2600 万个条目的映射。使用标准的 Java HashMap,在插入 200 到 300 万条之后,put 的速度变得慢得难以忍受。

Also, does anyone know if using different hash code distributions for the keys could help?

另外,有谁知道为键使用不同的哈希码分布是否会有帮助?

My hashcode method:

我的哈希码方法:

byte[] a = new byte[2];
byte[] b = new byte[3];
...

public int hashCode() {
    int hash = 503;
    hash = hash * 5381 + (a[0] + a[1]);
    hash = hash * 5381 + (b[0] + b[1] + b[2]);
    return hash;
}

I am using the associative property of addition to ensure that equal objects have the same hashcode. The arrays are bytes with values in the range 0 - 51. Values are only used once in either array. The objects are equal if the a arrays contain the same values (in either order) and the same goes for the b array. So a = {0,1} b = {45,12,33} and a = {1,0} b = {33,45,12} are equal.

我使用加法的关联属性来确保相等的对象具有相同的哈希码。数组是值在 0 - 51 范围内的字节。值在任一数组中仅使用一次。如果 a 数组包含相同的值(按任一顺序),则对象相等,而 b 数组也是如此。所以 a = {0,1} b = {45,12,33} 和 a = {1,0} b = {33,45,12} 是相等的。
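
For reference, here is a minimal sketch (not the actual class from the question; the field names and constructor are assumptions) of an equals()/hashCode() pair consistent with this rule, normalizing both arrays with a sort when the object is created:

作为参考,下面是一个与该规则一致的 equals()/hashCode() 的最小示意(并非问题中的真实类,字段名和构造函数均为假设),它在对象创建时通过排序对两个数组进行规范化:

import java.util.Arrays;

public class PairKey {
    private final byte[] a; // 2 values in the range 0-51
    private final byte[] b; // 3 values in the range 0-51

    public PairKey(byte[] a, byte[] b) {
        this.a = a.clone();
        this.b = b.clone();
        Arrays.sort(this.a); // normalize once, so element order no longer matters
        Arrays.sort(this.b);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PairKey)) return false;
        PairKey k = (PairKey) o;
        return Arrays.equals(a, k.a) && Arrays.equals(b, k.b);
    }

    @Override
    public int hashCode() {
        // equal (sorted) arrays always give equal hash codes
        return 31 * Arrays.hashCode(a) + Arrays.hashCode(b);
    }
}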

EDIT, some notes:

编辑,一些注意事项:

  • A few people have criticized using a hash map or other data structure to store 26 million entries. I cannot see why this would seem strange. It looks like a classic data structures and algorithms problem to me. I have 26 million items and I want to be able to quickly insert them into and look them up from a data structure: give me the data structure and algorithms.

  • Setting the initial capacity of the default Java HashMap to 26 million decreases the performance.

  • Some people have suggested using databases, in some other situations that is definitely the smart option. But I am really asking a data structures and algorithms question, a full database would be overkill and much slower than a good datastructure solution (after all the database is just software but would have communication and possibly disk overhead).

  • 一些人批评使用哈希映射或其他数据结构来存储 2600 万个条目。我不明白为什么这看起来很奇怪。对我来说,这看起来像是一个经典的数据结构和算法问题。我有 2600 万个项目,我希望能够快速将它们插入到数据结构中并从数据结构中查找:给我数据结构和算法。

  • 将默认 Java HashMap 的初始容量设置为 2600 万会降低性能。

  • 有些人建议使用数据库,在其他一些情况下这绝对是明智的选择。但我真的是在问一个数据结构和算法问题,一个完整的数据库会比一个好的数据结构解决方案大材小用,而且慢得多(毕竟数据库只是软件,但会有通信和可能的磁盘开销)。

采纳答案by nash

As many people pointed out the hashCode()method was to blame. It was only generating around 20,000 codes for 26 million distinct objects. That is an average of 1,300 objects per hash bucket = very very bad. However if I turn the two arrays into a number in base 52 I am guaranteed to get a unique hash code for every object:

正如许多人指出的那样,hashCode()方法是罪魁祸首。它只为 2600 万个不同的对象生成了大约 20,000 个代码。也就是说,每个哈希桶平均有 1,300 个对象 = 非常非常糟糕。但是,如果我将两个数组转换为以 52 为基数的数字,我保证会为每个对象获得一个唯一的哈希码:

public int hashCode() {       
    // assume that both a and b are sorted       
    return a[0] + powerOf52(a[1], 1) + powerOf52(b[0], 2) + powerOf52(b[1], 3) + powerOf52(b[2], 4);
}

public static int powerOf52(byte b, int power) {
    int result = b;
    for (int i = 0; i < power; i++) {
        result *= 52;
    }
    return result;
}

The arrays are sorted to ensure this method fulfills the hashCode() contract that equal objects have the same hash code. Using the old method, the average number of puts per second over blocks of 100,000 puts (from 100,000 to 2,000,000) was:

对数组进行排序,以确保此方法满足 hashCode() 的约定:相等的对象具有相同的哈希码。使用旧方法,以每 100,000 次 put 为一块(从 100,000 到 2,000,000),每秒平均 put 次数为:

168350.17
109409.195
81344.91
64319.023
53780.79
45931.258
39680.29
34972.676
31354.514
28343.062
25562.371
23850.695
22299.22
20998.006
19797.799
18702.951
17702.434
16832.182
16084.52
15353.083

Using the new method gives:

使用新方法给出:

337837.84
337268.12
337078.66
336983.97
313873.2
317460.3
317748.5
320000.0
309704.06
310752.03
312944.5
265780.75
275540.5
264350.44
273522.97
270910.94
279008.7
276285.5
283455.16
289603.25

Much much better. The old method tailed off very quickly while the new one keeps up a good throughput.

好多了。旧方法的速度很快就衰减下去,而新方法保持了良好的吞吐量。

回答by Mykola Golubyev

HashMap has an initial capacity, and HashMap's performance depends very heavily on the hashCode() that the underlying objects produce.

HashMap 具有初始容量,而且 HashMap 的性能在很大程度上取决于底层对象所产生的 hashCode。

Try to tweak both.

尝试调整两者。

回答by delfuego

My first idea is to make sure you're initializing your HashMap appropriately. From the JavaDocs for HashMap:

我的第一个想法是确保正确初始化 HashMap。来自 HashMap 的 JavaDocs:

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

HashMap 的实例有两个影响其性能的参数:初始容量和负载因子。容量是哈希表中的桶数,初始容量就是哈希表创建时的容量。负载因子是衡量哈希表在其容量自动增加之前允许达到多满的指标。当哈希表中的条目数超过负载因子和当前容量的乘积时,重新哈希表(即重建内部数据结构),使哈希表具有大约两倍的桶数。

So if you're starting off with a too-small HashMap, then every time it needs to resize, all the hashes are recomputed... which might be what you're feeling when you get to the 2-3 million insertion point.

因此,如果您从一个太小的 HashMap 开始,那么每次需要调整大小时,都会重新计算所有散列......当您到达 2-3 百万个插入点时,这可能是您的感受。
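
A hedged sketch of that advice: size the map up front so that the expected number of entries never exceeds capacity × load factor, which avoids any resize/rehash during the 26 million puts (String stands in here for whatever key and value types you actually use):

下面是对这一建议的一个示意:预先设置容量,使预期条目数永远不会超过 容量 × 负载因子,从而在 2600 万次 put 期间完全避免扩容和重新散列(这里的 String 只是键/值类型的占位):

import java.util.HashMap;
import java.util.Map;

public class Presized {
    public static Map<String, String> createMap() {
        int expectedEntries = 26000000;
        float loadFactor = 0.75f; // the HashMap default
        // capacity chosen so expectedEntries <= capacity * loadFactor
        int initialCapacity = (int) (expectedEntries / loadFactor) + 1;
        return new HashMap<String, String>(initialCapacity, loadFactor);
    }
}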

回答by ReneS

Allocate a large map in the beginning. If you know it will have 26 million entries and you have the memory for it, do a new HashMap(30000000).

一开始就分配一个大地图。如果您知道它将有 2600 万个条目并且您有足够的内存,请执行new HashMap(30000000).

Are you sure you have enough memory for 26 million entries with 26 million keys and values? This sounds like a lot of memory to me. Are you sure that the garbage collection is still doing fine at your 2 to 3 million mark? I could imagine that as a bottleneck.

你确定你有足够的内存来容纳带有 2600 万个键和值的 2600 万个条目吗?这在我看来需要非常多的内存。你确定在插入 200 万到 300 万条时垃圾回收仍然运行良好吗?我可以想象那会成为一个瓶颈。
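
A very rough back-of-envelope sketch of the memory question (every per-object size below is an assumption; real numbers depend on the JVM, 32- vs 64-bit pointers and the actual key/value classes):

下面是对内存问题的一个非常粗略的估算示意(下面每个对象大小都只是假设;真实数字取决于 JVM、32/64 位指针以及实际的键/值类型):

public class MemoryEstimate {
    public static void main(String[] args) {
        long entries = 26000000L;
        long perNode  = 32;            // assumed HashMap entry: header + hash + 3 references
        long perKey   = 16 + 2 * 24;   // assumed key object plus its two small byte[] arrays
        long perValue = 16;            // depends entirely on what the values are
        long table    = 64000000L * 4; // assumed bucket array of roughly 64M references
        long bytes = entries * (perNode + perKey + perValue) + table;
        System.out.println("~" + bytes / (1024 * 1024) + " MB, order of magnitude only");
    }
}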

回答by OscarRyz

You could try two things:

你可以尝试两件事:

  • Make your hashCode method return something simpler and more effective, such as a consecutive int

  • Initialize your map as:

    Map map = new HashMap( 30000000, .95f );
    

Those two actions will tremendously reduce the amount of rehashing the structure is doing, and are pretty easy to test, I think.

If that doesn't work, consider using different storage, such as an RDBMS.

EDIT

It is strange that setting the initial capacity reduces the performance in your case.

From the javadocs:

If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

I made a microbenchmark (which is not by any means definitive, but at least proves this point):

$cat Huge*java
import java.util.*;
public class Huge {
    public static void main( String [] args ) {
        Map map = new HashMap( 30000000 , 0.95f );
        for( int i = 0 ; i < 26000000 ; i ++ ) { 
            map.put( i, i );
        }
    }
}
import java.util.*;
public class Huge2 {
    public static void main( String [] args ) {
        Map map = new HashMap();
        for( int i = 0 ; i < 26000000 ; i ++ ) { 
            map.put( i, i );
        }
    }
}
$time java -Xms2g -Xmx2g Huge

real    0m16.207s
user    0m14.761s
sys 0m1.377s
$time java -Xms2g -Xmx2g Huge2

real    0m21.781s
user    0m20.045s
sys 0m1.656s
$

So, using the initial capacity drops the time from 21s to 16s because of the rehashing. That leaves us with your hashCode method as an "area of opportunity" ;)

  • 让你的 hashCode方法返回一些更简单、更有效的东西,比如一个连续的 int

  • 将您的地图初始化为:

    Map map = new HashMap( 30000000, .95f );
    

这两个操作将大大减少结构正在执行的重新散列的数量,并且我认为很容易测试。

如果这不起作用,请考虑使用不同的存储,例如 RDBMS。

编辑

奇怪的是,在您的情况下,设置初始容量会降低性能。

从 javadocs 中可以看到:

如果初始容量大于最大条目数除以负载因子,则不会发生重新哈希操作。

我做了一个微基准测试(这绝不是决定性的,但至少能证明这一点):

$cat Huge*java
import java.util.*;
public class Huge {
    public static void main( String [] args ) {
        Map map = new HashMap( 30000000 , 0.95f );
        for( int i = 0 ; i < 26000000 ; i ++ ) { 
            map.put( i, i );
        }
    }
}
import java.util.*;
public class Huge2 {
    public static void main( String [] args ) {
        Map map = new HashMap();
        for( int i = 0 ; i < 26000000 ; i ++ ) { 
            map.put( i, i );
        }
    }
}
$time java -Xms2g -Xmx2g Huge

real    0m16.207s
user    0m14.761s
sys 0m1.377s
$time java -Xms2g -Xmx2g Huge2

real    0m21.781s
user    0m20.045s
sys 0m1.656s
$

因此,由于减少了重新散列,使用初始容量后时间从 21 秒下降到 16 秒。这让我们把您的 hashCode 方法视为"机会领域";)

EDIT

编辑

It is not the HashMap

问题不在 HashMap

As per your last edit.

根据你最新的编辑。

I think you should really profile your application and see where the memory/CPU is being consumed.

我认为您应该真正分析您的应用程序并查看内存/cpu 被消耗的位置。

I have created a class implementing the same hashCode as yours.

我创建了一个实现与你相同的 hashCode 的类。

That hash code gives millions of collisions, so the number of entries actually stored in the HashMap is reduced dramatically.

该哈希码会产生数百万次冲突,因此 HashMap 中实际存储的条目数会急剧减少。

I went from 21s and 16s in my previous test to 10s and 8s. The reason is that the hashCode provokes a high number of collisions, so you are not storing the 26M objects you think, but a significantly lower number (about 20k, I would say). So:

我从之前测试中的 21 秒、16 秒降到了 10 秒和 8 秒。原因是这个 hashCode 引发了大量冲突,因此您存储的并不是您以为的 2600 万个对象,而是一个低得多的数字(我估计大约 2 万个)。所以:

The problem IS NOT THE HASHMAP; it is somewhere else in your code.

问题不在于 HASHMAP,而在于您代码中的其他地方。

It is about time to get a profiler and find out where. I would think it is in the creation of the items, or perhaps you're writing to disk or receiving data from the network.

是时候获取分析器并找出位置了。我认为这是在创建项目时,或者您可能正在写入磁盘或从网络接收数据。

Here's my implementation of your class.

这是我对您的类的实现。

Note: I didn't use a 0-51 range as you did, but -126 to 127 for my values, and repeats are allowed; that's because I did this test before you updated your question.

请注意,我没有像您那样使用 0-51 的范围,而是用 -126 到 127 作为我的值,并且允许重复;那是因为我在您更新问题之前就做了这个测试。

The only difference is that your class will have more collisions, thus fewer items stored in the map.

唯一的区别是您的类会产生更多的冲突,因此存储在映射中的条目更少。

import java.util.*;
public class Item {

    private static byte w = Byte.MIN_VALUE;
    private static byte x = Byte.MIN_VALUE;
    private static byte y = Byte.MIN_VALUE;
    private static byte z = Byte.MIN_VALUE;

    // Just to avoid typing :) 
    private static final byte M = Byte.MAX_VALUE;
    private static final byte m = Byte.MIN_VALUE;


    private byte [] a = new byte[2];
    private byte [] b = new byte[3];

    public Item () {
        // make a different value for the bytes
        increment();
        a[0] = z;        a[1] = y;    
        b[0] = x;        b[1] = w;   b[2] = z;
    }

    private static void increment() {
        z++;
        if( z == M ) {
            z = m;
            y++;
        }
        if( y == M ) {
            y = m;
            x++;
        }
        if( x == M ) {
            x = m;
            w++;
        }
    }
    public String toString() {
        return "" + this.hashCode();
    }



    public int hashCode() {
        int hash = 503;
        hash = hash * 5381 + (a[0] + a[1]);
        hash = hash * 5381 + (b[0] + b[1] + b[2]);
        return hash;
    }
    // I don't really care about this right now.
    public boolean equals( Object other ) {
        return this.hashCode() == other.hashCode();
    }

    // print how many collisions do we have in 26M items.
    public static void main( String [] args ) {
        Set set = new HashSet();
        int collisions = 0;
        for ( int i = 0 ; i < 26000000 ; i++ ) {
            if( ! set.add( new Item() ) ) {
                collisions++;
            }
        }
        System.out.println( collisions );
    }
}

Using this class as Key for the previous program

将这个类用作前面程序的 Key

 map.put( new Item() , i );

gives me:

给我:

real     0m11.188s
user     0m10.784s
sys 0m0.261s


real     0m9.348s
user     0m9.071s
sys  0m0.161s

回答by Jay

To elaborate on Pascal: Do you understand how a HashMap works? You have some number of slots in your hash table. The hash value for each key is found, and then mapped to an entry in the table. If two hash values map to the same entry -- a "hash collision" -- HashMap builds a linked list.

详细说明 Pascal:您了解 HashMap 的工作原理吗?您的哈希表中有一定数量的插槽。找到每个键的哈希值,然后映射到表中的一个条目。如果两个散列值映射到同一个条目——“散列冲突”——HashMap 构建一个链表。

Hash collisions can kill the performance of a hash map. In the extreme case, if all your keys have the same hash code, or if they have different hash codes but they all map to the same slot, then your hash map turns into a linked list.

哈希冲突会降低哈希映射的性能。在极端情况下,如果您的所有键都具有相同的哈希码,或者如果它们具有不同的哈希码但它们都映射到同一个槽,那么您的哈希映射将变成一个链表。

So if you're seeing performance problems, the first thing I'd check is: Am I getting a random-looking distribution of hash codes? If not, you need a better hash function. Well, "better" in this case may mean "better for my particular set of data". Like, suppose you were working with strings, and you took the length of the string for the hash value. (Not how Java's String.hashCode works, but I'm just making up a simple example.) If your strings have widely varying lengths, from 1 to 10,000, and are fairly evenly distributed across that range, this could be a very good hash function. But if your strings are all 1 or 2 characters, this would be a very bad hash function.

因此,如果您遇到性能问题,我首先要检查的是:我是否获得了随机分布的哈希码?如果没有,您需要一个更好的哈希函数。好吧,在这种情况下,“更好”可能意味着“对我的特定数据集更好”。就像,假设您正在处理字符串,并且您将字符串的长度作为哈希值。(不是 Java 的 String.hashCode 是如何工作的,但我只是举了一个简单的例子。)如果您的字符串长度变化很大,从 1 到 10,000,并且在该范围内分布相当均匀,那么这可能是一个非常好的哈希函数。但是如果你的字符串都是 1 或 2 个字符,这将是一个非常糟糕的哈希函数。
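
A small diagnostic sketch along these lines (not from the original answer; `keys` stands for however you obtain a representative sample of your key objects): count how many distinct hash codes your keys actually produce and how often the worst one repeats.

顺着这个思路,下面是一个小的诊断示意(并非原回答的内容;keys 代表你以任意方式取得的一批有代表性的键对象):统计你的键实际产生多少个不同的哈希码,以及最"拥挤"的哈希码重复了多少次。

import java.util.HashMap;
import java.util.Map;

public class HashDistribution {
    public static void report(Iterable<?> keys) {
        Map<Integer, Integer> counts = new HashMap<Integer, Integer>();
        int total = 0, max = 0;
        for (Object k : keys) {
            int h = k.hashCode();
            Integer c = counts.get(h);
            int n = (c == null) ? 1 : c + 1;   // how many keys share this hash code
            counts.put(h, n);
            if (n > max) max = n;
            total++;
        }
        System.out.println(total + " keys, " + counts.size()
                + " distinct hash codes, worst code repeated " + max + " times");
    }
}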

Edit: I should add: Every time you add a new entry, HashMap checks if this is a duplicate. When there's a hash collision, it has to compare the incoming key against every key that mapped to that slot. So in the worst case where everything hashes to a single slot, the second key is compared to the first key, the third key is compared to #1 and #2, the fourth key is compared to #1, #2, and #3, etc. By the time you get to key #1 million, you've done over a trillion compares.

编辑:我应该添加:每次添加新条目时,HashMap 都会检查这是否重复。当发生哈希冲突时,它必须将传入的键与映射到该槽的每个键进行比较。因此,在所有散列到单个槽的最坏情况下,将第二个键与第一个键进行比较,将第三个键与 #1 和 #2 进行比较,将第四个键与 #1、#2 和 #3 进行比较等。当你达到#100 万的时候,你已经完成了超过一万亿次的比较。

@Oscar: Umm, I don't see how that's a "not really". It's more like a "let me clarify". But yes, it's true that if you make a new entry with the same key as an existing entry, that this overwrites the first entry. That's what I meant when I talked about looking for duplicates in the last paragraph: Whenever a key hashes to the same slot, HashMap must check if it's a duplicate of an existing key, or if they are just in the same slot by coincidence of the hash function. I don't know that that's the "whole point" of a HashMap: I would say that the "whole point" is that you can retrieve elements by key quickly.

@Oscar:嗯,我不明白这怎么就"不是真的"了。这更像是"让我澄清一下"。但是,是的,如果您用与现有条目相同的键创建一个新条目,这确实会覆盖第一个条目。这就是我在上一段中谈到查找重复项时的意思:每当一个键散列到同一个槽时,HashMap 必须检查它是某个现有键的重复,还是只是因为哈希函数的巧合而落在了同一个槽中。我并不认为那就是 HashMap 的"重点":我会说"重点"是您可以通过键快速检索元素。

But anyway, that doesn't affect the "whole point" that I was trying to make: When you have two keys -- yes, different keys, not the same key showing up again -- that map to the same slot in the table, HashMap builds a linked list. Then, because it has to check each new key to see if it is in fact a duplicate of an existing key, each attempt to add a new entry that maps to this same slot must chase the linked list examining each existing entry to see if this is a duplicate of a previously-seen key, or if it is a new key.

但无论如何,这并不影响我想表达的"重点":当你有两个键——是的,不同的键,而不是同一个键再次出现——映射到表中的同一个槽时,HashMap 会建立一个链表。然后,由于它必须检查每个新键是否实际上是某个现有键的重复,每次尝试添加映射到同一个槽的新条目时,都必须沿着链表检查每个现有条目,看它是之前见过的某个键的重复,还是一个新键。
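
To illustrate the chaining behaviour described above, here is a deliberately simplified sketch (it is not the real java.util.HashMap source): keys whose hashes land in the same bucket sit in a linked list that put() has to walk to check for duplicates before adding a new node.

为了说明上面描述的链表行为,这里给出一个刻意简化的示意(并非真实的 java.util.HashMap 源码):哈希落入同一个桶的键保存在一个链表里,put() 在添加新节点之前必须遍历这个链表来检查重复。

public class TinyChainedMap<K, V> {
    private static class Node<K, V> {
        final K key; V value; Node<K, V> next;
        Node(K key, V value, Node<K, V> next) { this.key = key; this.value = value; this.next = next; }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = new Node[1 << 16]; // fixed bucket count for simplicity

    private int bucket(Object key) {
        return (key.hashCode() & 0x7fffffff) % table.length;
    }

    public void put(K key, V value) {
        int i = bucket(key);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.key.equals(key)) { n.value = value; return; } // duplicate key: overwrite
        }
        table[i] = new Node<K, V>(key, value, table[i]);        // new key: prepend to the chain
    }

    public V get(Object key) {
        for (Node<K, V> n = table[bucket(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;              // walk the chain until found
        }
        return null;
    }
}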

Update long after the original post

在原帖很久之后更新

I just got an up-vote on this answer 6 years after posting which led me to re-read the question.

发布 6 年后,我刚刚对这个答案投了赞成票,这让我重新阅读了这个问题。

The hash function given in the question is not a good hash for 26 million entries.

问题中给出的散列函数对于 2600 万个条目来说并不是一个好的散列。

It adds together a[0]+a[1] and b[0]+b[1]+b[2]. He says values of each byte range from 0 to 51, so that gives only (51*2+1)*(51*3+1)=15,862 possible hash values. With 26 million entries, this means an average of about 1639 entries per hash value. That is lots and lots of collisions, requiring lots and lots of sequential searches through linked lists.

它将 a[0]+a[1] 和 b[0]+b[1]+b[2] 相加。他说每个字节的值范围从 0 到 51,因此只有 (51*2+1)*(51*3+1)=15,862 个可能的哈希值。有 2600 万个条目,这意味着每个哈希值平均大约有 1639 个条目。这是大量的冲突,需要通过链表进行大量的顺序搜索。

The OP says that different orders within array a and array b should be considered equal, i.e. [[1,2],[3,4,5]].equals([[2,1],[5,3,4]]), and so to fulfill the contract they must have equal hash codes. Okay. Still, there are a lot more than 15,000 possible values. His second proposed hash function is much better, giving a broader range.

OP 说数组 a 和数组 b 中的不同顺序应该被认为是相等的,即 [[1,2],[3,4,5]].equals([[2,1],[5,3,4]]),因此为了履行约定,它们必须具有相同的哈希码。好的。尽管如此,可能的取值仍远多于 15,000 个。他提出的第二个哈希函数要好得多,范围更广。

Though as someone else commented, it seems inappropriate for a hash function to change other data. It would make more sense to "normalize" the object when it is created, or to have the hash function work from copies of the arrays. Also, using a loop to calculate constants every time through the function is inefficient. As there are only four values here, I would have either written

尽管正如其他人评论的那样,散列函数更改其他数据似乎不合适。在创建对象时“规范化”对象,或者让散列函数从数组的副本中工作会更有意义。此外,每次通过函数使用循环来计算常量是低效的。由于这里只有四个值,我要么写

return a[0]+a[1]*52+b[0]*52*52+b[1]*52*52*52+b[2]*52*52*52*52;

which would cause the compiler to perform the calculation once at compile time; or have 4 static constants defined in the class.

这将导致编译器在编译时执行一次计算;或者在类中定义了 4 个静态常量。
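
A sketch of the "four static constants" variant (an illustration, not code from the thread); it assumes the arrays were already sorted/normalized when the object was created, so hashCode() itself no longer touches the data. equals() is omitted for brevity but would have to compare the sorted arrays:

下面是"四个静态常量"写法的一个示意(仅为说明,并非帖子中的代码);它假设数组在对象创建时就已经排序/规范化,因此 hashCode() 本身不再修改数据;为简洁起见省略了 equals(),但它必须比较排序后的数组:

public class Base52Key {
    private static final int P1 = 52;
    private static final int P2 = 52 * 52;
    private static final int P3 = 52 * 52 * 52;
    private static final int P4 = 52 * 52 * 52 * 52;

    private final byte[] a; // 2 sorted values in the range 0-51
    private final byte[] b; // 3 sorted values in the range 0-51

    public Base52Key(byte[] sortedA, byte[] sortedB) {
        this.a = sortedA;
        this.b = sortedB;
    }

    @Override
    public int hashCode() {
        // same base-52 encoding as above, with the powers precomputed once
        return a[0] + a[1] * P1 + b[0] * P2 + b[1] * P3 + b[2] * P4;
    }
}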

Also, the first draft at a hash function has several calculations that do nothing to add to the range of outputs. Note he first sets hash = 503, then multiplies by 5381, before even considering values from the class. So... in effect he adds 503*5381 to every value. What does this accomplish? Adding a constant to every hash value just burns CPU cycles without accomplishing anything useful. Lesson here: Adding complexity to a hash function is not the goal. The goal is to get a broad range of different values, not just to add complexity for the sake of complexity.

此外,这个哈希函数的初稿里有几处计算对扩大输出范围没有任何作用。请注意,在考虑类中的值之前,他先把 hash 设为 503,然后乘以 5381。所以......实际上他给每个值都加上了 503*5381。这有什么作用呢?给每个哈希值加一个常量只会白白消耗 CPU 周期,而不会带来任何有用的效果。这里的教训是:增加哈希函数的复杂性不是目标。目标是获得范围广泛的不同值,而不是为了复杂而复杂。

回答by Adrian

You can try to use an in-memory database like HSQLDB.

您可以尝试使用像HSQLDB这样的内存数据库。

回答by coolest_head

Have you considered using an embedded database to do this? Look at Berkeley DB. It is open source, owned by Oracle now.

您是否考虑过使用嵌入式数据库来做这件事?看看 Berkeley DB。它是开源的,现在归 Oracle 所有。

It stores everything as Key->Value pairs; it is NOT an RDBMS, and it aims to be fast.

它将所有内容存储为 Key->Value 对,它不是 RDBMS。它的目标是快速。

回答by JRL

SQLite lets you use it in memory.

SQLite允许您在内存中使用它。

回答by Juha Syrjälä

First you should check that you are using Map correctly, good hashCode() method for keys, initial capacity for Map, right Map implementation etc. like many other answers describe.

首先,您应该检查您是否正确使用 Map、键的良好 hashCode() 方法、Map 的初始容量、正确的 Map 实现等,就像许多其他答案所描述的那样。

Then I would suggest using a profiler to see what is actually happening and where the execution time is spent. Is, for example, hashCode() method executed for billions of times?

然后我建议使用分析器来查看实际发生的情况以及执行时间花费在哪里。例如,hashCode() 方法是否执行了数十亿次?

If that doesn't help, how about using something like EHCache or memcached? Yes, they are products for caching but you could configure them so that they will have enough capacity and will never evict any values from cache storage.

如果这没有帮助,那么使用 EHCache 或 memcached 之类的东西怎么样?是的,它们是用于缓存的产品,但您可以对其进行配置,以便它们具有足够的容量并且永远不会从缓存存储中驱逐任何值。

Another option would be some database engine that is lighter weight than full SQL RDBMS. Something like Berkeley DB, maybe.

另一种选择是一些比完整 SQL RDBMS 更轻的数据库引擎。也许像Berkeley DB这样的东西。

Note, that I have personally no experience of these products' performance, but they could be worth the try.

请注意,我个人对这些产品的性能没有经验,但值得一试。

回答by coolest_head

If the keys have any pattern to them then you can split the map into smaller maps and have a index map.

如果键有任何模式,那么您可以将映射拆分为较小的映射并拥有索引映射。

Example: Keys: 1, 2, 3, ..., n; 28 maps of 1 million each. Index map: 1-1,000,000 -> Map1; 1,000,000-2,000,000 -> Map2.

示例:Keys: 1, 2, 3, ..., n;28 个映射,每个包含 100 万个条目。索引映射:1-1,000,000 -> Map1;1,000,000-2,000,000 -> Map2。

So you'll be doing two lookups but the key set would be 1,000,000 vs 28,000,000. You can easily do this with string patterns also.

因此,您将进行两次查找,但每个键集的大小是 1,000,000,而不是 28,000,000。您也可以用字符串模式轻松做到这一点。

If the keys are completely random then this will not work

如果键是完全随机的,那么这种方法将不起作用
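
For keys with a simple numeric pattern, here is a sketch of this two-level layout (names and the 1,000,000 bucket size are illustrative only; it assumes non-negative integer keys that can be partitioned by range):

对于具有简单数字规律的键,下面是这种两级结构的一个示意(名称和 1,000,000 的分段大小仅为示例;它假设键是可以按范围划分的非负整数):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionedMap<V> {
    private static final int BUCKET_SIZE = 1000000;
    private final List<Map<Integer, V>> partitions = new ArrayList<Map<Integer, V>>();

    private Map<Integer, V> partitionFor(int key) {
        int idx = key / BUCKET_SIZE;                   // the "index map" step
        while (partitions.size() <= idx) {
            partitions.add(new HashMap<Integer, V>()); // lazily create sub-maps
        }
        return partitions.get(idx);
    }

    public void put(int key, V value) {
        partitionFor(key).put(key, value);             // lookup 1: partition, lookup 2: entry
    }

    public V get(int key) {
        int idx = key / BUCKET_SIZE;
        return idx < partitions.size() ? partitions.get(idx).get(key) : null;
    }
}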