Java 什么是散列中的折叠技术以及如何实现它?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/36565101/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-11 18:26:15  来源:igfitidea点击:

What is Folding technique in hashing and how to implement it?

javaalgorithmdata-structureshashhashtable

提问by Yogesh Umesh Vaity

I heard at a data structures seminar that we can break a key into groups of digits and then do the addition of groups. This ensures that all the digits contribute the hash code. The number of digits in a group correspond to the size of the array.

我在一个数据结构研讨会上听说,我们可以将一个键分解成几组数字,然后进行组的加法。这确保所有数字都贡献哈希码。组中的位数对应于数组的大小。

For example, I have a machine number say 424-124-9675, how do I make the hash function using the Folding technique?

例如,我有一个机器号说424-124-9675,我如何使用折叠技术制作哈希函数?

采纳答案by Yogesh Umesh Vaity

Following the answers from Tony and Sumeet, I did some more research on digit folding and decided to implement the technique explained by Robert Lafore in his Data Structures book.

根据 Tony 和 Sumeet 的回答,我对数字折叠进行了更多研究,并决定实施 Robert Lafore 在他的《数据结构》一书中解释的技术。

For example, suppose you want to hash 10-digit machine numbers. If the array size is 1,000, you would divide the 10-digit number into three groups of three digits and one group of one-digit. In out example in question machine number is 424-124-9675, so you would calculate a key value of 424 + 124 + 967 + 5 = 1520. You can use the %operator to trim such sums so the highest index is 999. In this case, 1520 % 1000 = 520.

例如,假设您要散列 10 位机器号。如果数组大小为1,000,则将 10 位数字分成三组三位数和一组一位数。在有问题的机器编号的示例中424-124-9675,因此您将计算 的键值424 + 124 + 967 + 5 = 1520。您可以使用%运算符来修剪此类总和,以便最高索引为999。在这种情况下,1520 % 1000 = 520

If the array size is 100, you would need to break the 10-digit key into five two-digit numbers: 42 + 41 + 24 + 96 + 75 = 278, and 278 % 100 = 78.

如果数组大小为100,则需要将 10 位密钥分解为五个两位数: 42 + 41 + 24 + 96 + 75 = 278、 和278 % 100 = 78

It's easier to imagine how this works when the array size is a multiple of 10. However, for best results it should be a prime number.

当数组大小是 10 的倍数时,更容易想象这是如何工作的。但是,为了获得最佳结果,它应该是一个质数。

Here's the Java code of digit folding technique I implemented:

这是我实现的数字折叠技术的Java代码:

public class DigitFolder {
    public static void main(String[] args) {
        int hashVal = hashFunc(424124967);
        System.out.println(hashVal);
    }
    public static int hashFunc(int key) {
        int arraySize = 1021;
        int keyDigitCount = getDigitCount(key);
        int groupSize = getDigitCount(arraySize);
        int groupSum = 0;
        String keyString = Integer.toString(key);
        int i;
        for (i = 0; i < keyString.length(); i += groupSize) {
            if (i + groupSize <= keyString.length()) {
                String group = keyString.substring(i, i + groupSize);
                groupSum += Integer.parseInt(group);
            }
        }
        // There is no remaining part if count is divisible by groupsize.
        if (keyDigitCount % groupSize != 0) {
            String remainingPart = 
                   keyString.substring(i - groupSize, keyString.length());
            groupSum += Integer.parseInt(remainingPart);
        }
        return groupSum % arraySize;
    }
    public static int getDigitCount(int n) {
        int numDigits = 1;
        while (n > 9) {
            n /= 10;
            numDigits++;
        }
        return numDigits;
    }
}

I found the group making method here. But it makes groups right to left. So, I used String#subString()method to make left to right groups.

我在这里找到了组制作方法。但它使组从右到左。所以,我使用String#subString()方法从左到右分组。

回答by Tony Delroy

Given 424-124-9675, you decide where you want to break it into groups. For example:

给定 424-124-9675,您可以决定要将其分组的位置。例如:

  • every 3 digits from left: hash = 424 + 124 + 967 + 5

  • every 3 digits from right: hash = 675 + 249 + 241 + 4

  • where the dashes are: hash = 424 + 124 + 9675

  • 从左起每 3 位数字:hash = 424 + 124 + 967 + 5

  • 从右起每 3 位数字:hash = 675 + 249 + 241 + 4

  • 破折号是:hash = 424 + 124 + 9675

It's a terribly weak way to hash though - very collision prone.

虽然这是一种非常弱的散列方式 - 非常容易发生冲突。

回答by Sumeet

There are 2 types of folding methods used Fold shiftand Fold boundary.

有 2 种折叠方法使用Fold shiftFold boundary

Fold Shift

折叠移位

You divide the key in parts whose size matches the size of required address. The parts are simply added to get the required address.

您将密钥分成大小与所需地址大小相匹配的部分。只需添加零件即可获得所需的地址。

Key:123456789 and size of required address is 3 digits.

密钥:123456789,所需地址大小为 3 位。

123+456+789 = 1368. To reduce the size to 3, either 1 or 8 is removed and accordingly the key would be 368 or 136 respectively.

123+456+789 = 1368。为了将大小减小到 3,删除 1 或 8,相应的键将分别为 368 或 136。

Fold Boundary

折叠边界

You again divide the key in parts whose size matches the size of required address.But now you also applying folding, except for the middle part,if its there.

您再次将密钥分成大小与所需地址大小匹配的部分。但是现在您还应用折叠,除了中间部分,如果有的话。

Key:123456789 and size of required address is 3 digits

密钥:123456789,所需地址大小为 3 位

321 (folding applied)+456+987 (folding applied) = 1764(discard 1 or 4)

321(折叠应用)+456+987(折叠应用)= 1764(丢弃1或4)