Java 良好的字符串散列函数

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/2624192/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-13 10:02:08  来源:igfitidea点击:

Good Hash Function for Strings

javahashhashtablehashcode

提问by Leif Andersen

I'm trying to think up a good hash function for strings. And I was thinking it might be a good idea to sum up the unicode values for the first five characters in the string (assuming it has five, otherwise stop where it ends). Would that be a good idea, or is it a bad one?

我正在尝试为字符串想出一个好的散列函数。而且我认为将字符串中前五个字符的 unicode 值相加可能是一个好主意(假设它有五个,否则在它结束的地方停止)。这是一个好主意,还是一个坏主意?

I am doing this in Java, but I wouldn't imagine that would make much of a difference.

我正在用 Java 做这件事,但我认为这不会有太大的不同。

采纳答案by jonathanasdf

Usually hashes wouldn't do sums, otherwise stopand potswill have the same hash.

通常哈希不会做算术,否则stoppots将具有相同的哈希值。

and you wouldn't limit it to the first n characters because otherwise house and houses would have the same hash.

并且您不会将其限制为前 n 个字符,否则 house 和 house 将具有相同的哈希值。

Generally hashs take values and multiply it by a prime number (makes it more likely to generate unique hashes) So you could do something like:

通常散列取值并将其乘以质数(使其更有可能生成唯一的散列)所以您可以执行以下操作:

int hash = 7;
for (int i = 0; i < strlen; i++) {
    hash = hash*31 + charAt(i);
}

回答by Pyrolistical

If you are doing this in Java then why are you doing it? Just call .hashCode()on the string

如果您在 Java 中执行此操作,那么您为什么要这样做?只需调用.hashCode()字符串

回答by Pratik Deoghare

// djb2 hash function
unsigned long hash(unsigned char *str)
{
    unsigned long hash = 5381;
    int c;

    while (c = *str++)
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

    return hash;
}

sourceLogic behind djb2 hash function - SO

djb2 哈希函数背后的逻辑 - SO

回答by Frederik

You should probably use String.hashCode().

您可能应该使用String.hashCode()

If you really want to implement hashCode yourself:

如果你真的想自己实现 hashCode:

Do not be tempted to exclude significant parts of an object from the hash code computation to improve performance -- Joshua Bloch, Effective Java

不要试图从哈希码计算中排除对象的重要部分以提高性能——Joshua Bloch,Effective Java

Using only the first five characters is a bad idea. Think about hierarchical names, such as URLs: they will all have the same hash code (because they all start with "http://", which means that they are stored under the same bucket in a hash map, exhibiting terrible performance.

只使用前五个字符是个坏主意。想想分层名称,例如 URL:它们都将具有相同的哈希码(因为它们都以“http://”开头,这意味着它们存储在哈希映射中的同一个桶下,表现出糟糕的性能。

Here's a war story paraphrased on the String hashCode from "Effective Java":

这是在“ Effective Java”中的 String hashCode 上转述的战争故事:

The String hash function implemented in all releases prior to 1.2 examined at most sixteen characters, evenly spaced throughout the string, starting with the first character. For large collections of hierarchical names, such as URLs, this hash function displayed terrible behavior.

在 1.2 之前的所有版本中实现的 String 哈希函数最多检查 16 个字符,从第一个字符开始,在整个字符串中均匀分布。对于大量分层名称(例如 URL),此哈希函数表现出糟糕的行为。

回答by Frederik

If it's a security thing, you could use Java crypto:

如果是安全问题,您可以使用 Java 加密:

import java.security.MessageDigest;

MessageDigest messageDigest = MessageDigest.getInstance("SHA-256");
messageDigest.update(stringToEncrypt.getBytes());
String encryptedString = new String(messageDigest.digest());

回答by Dean J

If you want to see the industry standard implementations, I'd look at java.security.MessageDigest.

如果您想查看行业标准实现,我会查看java.security.MessageDigest

"Message digests are secure one-way hash functions that take arbitrary-sized data and output a fixed-length hash value."

“消息摘要是安全的单向散列函数,它采用任意大小的数据并输出固定长度的散列值。”

回答by Thomas Pornin

FNV-1is rumoured to be a good hash function for strings.

传闻FNV-1是一个很好的字符串散列函数。

For long strings (longer than, say, about 200 characters), you can get good performance out of the MD4hash function. As a cryptographic function, it was broken about 15 years ago, but for non cryptographic purposes, it is still very good, and surprisingly fast. In the context of Java, you would have to convert the 16-bit charvalues into 32-bit words, e.g. by grouping such values into pairs. A fast implementation of MD4 in Java can be found in sphlib. Probably overkill in the context of a classroom assignment, but otherwise worth a try.

对于长字符串(比方说,大约 200 个字符长),您可以从MD4散列函数中获得良好的性能。作为一种密码功能,它在大约 15 年前就被破解了,但对于非密码用途来说,它仍然非常好,而且速度惊人。在 Java 环境中,您必须将 16 位char值转换为 32 位字,例如,将这些值分组。可以在sphlib 中找到 Java 中 MD4 的快速实现。在课堂作业的背景下可能有点矫枉过正,但值得一试。

回答by Festus Tamakloe

This function provided by Nick is good but if you use new String(byte[] bytes) to make the transformation to String, it failed. You can use this function to do that.

Nick 提供的这个函数很好,但是如果你使用 new String(byte[] bytes) 来转换为 String,它就失败了。您可以使用此功能来做到这一点。

private static final char[] hex = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' };

public static String byteArray2Hex(byte[] bytes) {
    StringBuffer sb = new StringBuffer(bytes.length * 2);
    for(final byte b : bytes) {
        sb.append(hex[(b & 0xF0) >> 4]);
        sb.append(hex[b & 0x0F]);
    }
    return sb.toString();
}

public static String getStringFromSHA256(String stringToEncrypt) throws NoSuchAlgorithmException {
    MessageDigest messageDigest = MessageDigest.getInstance("SHA-256");
    messageDigest.update(stringToEncrypt.getBytes());
    return byteArray2Hex(messageDigest.digest());
}

May be this can help somebody

可能这可以帮助某人

回答by Mike Samuel

Guava's HashFunction(javadoc) provides decent non-crypto-strong hashing.

Guava 的HashFunction( javadoc) 提供了不错的非加密强散列。

回答by Charaf JRA

         public String hashString(String s) throws NoSuchAlgorithmException {
    byte[] hash = null;
    try {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        hash = md.digest(s.getBytes());

    } catch (NoSuchAlgorithmException e) { e.printStackTrace(); }
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < hash.length; ++i) {
        String hex = Integer.toHexString(hash[i]);
        if (hex.length() == 1) {
            sb.append(0);
            sb.append(hex.charAt(hex.length() - 1));
        } else {
            sb.append(hex.substring(hex.length() - 2));
        }
    }
    return sb.toString();
}