Java: a two-way String hash function

Note: this page reproduces a popular StackOverflow question, licensed under CC BY-SA 4.0. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/6639725/

Date: 2020-10-30 16:45:31, Source: igfitidea

A two way String hash function

Tags: java, string, hashcode

Asked by Ankur

I want to get a unique numeric representation of a String. I know there are lots of ways of doing this; my question is which do you think is the best? I don't want negative numbers, so the hashCode() function in Java is not so good. I could override it, but I'd rather not since I am not that confident and don't want to accidentally break something.

My Strings are all semantic-web URIs. The reason for the numeric representation is that when I display the data for a URI on a page I need something to pass into the query string or put into various fields in my JavaScript. The URI itself is too unwieldy and looks bad when you have a URI as a value in a URI.

Basically I want to have a class called Resource which will look like this:

class Resource {
  int id;
  String uri;
  String value; // this is the label or human readable name

  // .... other code/getters/setters here

  public int getId() {
    // compute the id from this resource's own URI
    return id = stringToIntFunction(uri);
  }

  private int stringToIntFunction(String uri) {
    // do magic here
    throw new UnsupportedOperationException("not implemented yet");
  }
}

Can you suggest a function that would do this if:

  1. It had to be two way, that is you could also recover the original string from the numeric value
  2. It doesn't have to be two way

Also, are there other important issues that I am not considering?

Answered by Jon Skeet

If you want it to be reversible, you're in trouble. Hashes are designed to be one-way.

In particular, given that an int has 32 bits of information, and a char has 16 bits of information, requiring reversibility means you can only have strings of zero, one or two characters (and even that's assuming that you're happy to encode "" as "\0\0" or something similar). That's assuming you don't have any storage, of course. If you can use storage, then just store numbers sequentially... something like:
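
To make the bit-counting argument concrete, here is a minimal sketch of a storage-free reversible mapping. The class name TwoCharCodec is made up for illustration; it only handles strings of exactly two chars, which is the whole point: an int has no room for more.

```java
// Illustrative only: an int has exactly enough room for two 16-bit chars.
final class TwoCharCodec {
    // Packs a string of exactly two chars into one int (first char in the high bits).
    static int encode(String s) {
        if (s.length() != 2) {
            throw new IllegalArgumentException("exactly two chars required");
        }
        return (s.charAt(0) << 16) | s.charAt(1);
    }

    // Recovers the two chars from the packed int.
    static String decode(int packed) {
        char high = (char) (packed >>> 16);
        char low = (char) (packed & 0xFFFF);
        return new String(new char[] {high, low});
    }

    public static void main(String[] args) {
        int packed = encode("ab");
        System.out.println(packed + " -> " + decode(packed));
    }
}
```

Note that the packed value is negative whenever the first char is 0x8000 or above, so even this tiny scheme does not avoid negative numbers on its own.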

private int stringToIntFunction(String uri) {
    Integer existingId = storage.get(uri);
    if (existingId != null) {
        return existingId.intValue();
    }
    return storage.put(uri);
}

Here storage.put() would increase a counter internally, store the URI as being associated with that counter value, and return it. My guess is that that's not what you're after though.
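
One possible in-memory shape for that storage object is sketched below. UriRegistry, idFor and uriFor are hypothetical names, not part of the answer; a real application would more likely back this with a database table so that ids survive restarts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the "storage" object used in the snippet above.
final class UriRegistry {
    private final Map<String, Integer> idsByUri = new HashMap<>();
    private final List<String> urisById = new ArrayList<>();

    // Returns the existing id for the URI, or assigns the next counter value.
    synchronized int idFor(String uri) {
        Integer existing = idsByUri.get(uri);
        if (existing != null) {
            return existing;
        }
        int id = urisById.size(); // ids are 0, 1, 2, ... and never negative
        urisById.add(uri);
        idsByUri.put(uri, id);
        return id;
    }

    // Reverse lookup: the original URI for a previously issued id.
    synchronized String uriFor(int id) {
        return urisById.get(id);
    }

    public static void main(String[] args) {
        UriRegistry registry = new UriRegistry();
        int id = registry.idFor("http://example.org/resource/1");
        System.out.println(id + " -> " + registry.uriFor(id));
    }
}
```

This gives short, non-negative, genuinely unique ids that can be mapped back to the URI, at the cost of keeping state.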

Basically, to perform a reversible encryption, I'd use a standard encryption library having converted the string to a binary format first (e.g. using UTF-8). I would expect the result to be a byte[].
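
As a rough sketch of that idea using the JDK's own javax.crypto classes: the string is converted to UTF-8 bytes and encrypted to a byte[]. The choice of AES in GCM mode, the 128-bit key and the class name UriCipher are assumptions made for this example, not something the answer prescribes.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Illustrative wrapper; algorithm and parameters are assumptions for this sketch.
final class UriCipher {
    private static final int IV_BYTES = 12;   // nonce length commonly used with GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    UriCipher(SecretKey key) {
        this.key = key;
    }

    static SecretKey newKey() throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    // Returns nonce || ciphertext+tag, so the output is longer than the input.
    byte[] encrypt(String uri) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        random.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(uri.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[IV_BYTES + ciphertext.length];
        System.arraycopy(iv, 0, out, 0, IV_BYTES);
        System.arraycopy(ciphertext, 0, out, IV_BYTES, ciphertext.length);
        return out;
    }

    // Reads the nonce from the front of the data, then decrypts the rest.
    String decrypt(byte[] data) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, data, 0, IV_BYTES));
        byte[] plain = cipher.doFinal(data, IV_BYTES, data.length - IV_BYTES);
        return new String(plain, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        UriCipher cipher = new UriCipher(newKey());
        byte[] token = cipher.encrypt("http://example.org/resource/1");
        System.out.println(token.length + " bytes -> " + cipher.decrypt(token));
    }
}
```

Because a fresh random nonce is used each time, the same URI encrypts to a different byte[] on every call, and the result is always longer than the original string.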

If it doesn't have to be reversible, I'd consider just taking the absolute value of the normal hashCode() result (but mapping Integer.MIN_VALUE to something specific, as its absolute value can't be represented as an int).
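
A minimal sketch of that non-reversible option follows. The class name NonNegativeHash and the choice of 0 as the "something specific" for Integer.MIN_VALUE are assumptions for illustration.

```java
// Illustrative helper: a non-negative (but not unique) int derived from hashCode().
final class NonNegativeHash {
    static int of(String s) {
        int h = s.hashCode();
        // Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE, so map it explicitly.
        return h == Integer.MIN_VALUE ? 0 : Math.abs(h);
    }

    public static void main(String[] args) {
        System.out.println(of("http://example.org/resource/1"));
    }
}
```

Taking the absolute value folds each negative hash onto a positive one, so collisions become somewhat more likely than with the raw hashCode().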

Answered by Michael Stum

Hashes are one way only (that's part of the reason they have a fixed length regardless of the input size). If you need two-way, you're looking at something like Base64 encoding.
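
For the two-way case, a sketch using the JDK's java.util.Base64 with its URL-safe alphabet might look like this. UriToken is a made-up class name; the result is a string rather than a number.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative two-way encoding of a URI into a URL-safe token.
final class UriToken {
    // Encodes the URI's UTF-8 bytes with the URL-safe alphabet ('-' and '_'), no '=' padding.
    static String encode(String uri) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(uri.getBytes(StandardCharsets.UTF_8));
    }

    // Recovers the original URI from the token.
    static String decode(String token) {
        return new String(Base64.getUrlDecoder().decode(token), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String token = encode("http://example.org/resource#thing");
        System.out.println(token + " -> " + decode(token));
    }
}
```

The token is about a third longer than the URI itself, so this makes the value safe to embed in a query string but not any less unwieldy.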

Why can't you have negative numbers? Where do the URIs come from? Are they in a database? Why not use the Database Key ID? If they are not in a database, can you generate them for the user given a set of variables/parameters? (So the query string only contains things like foo=1&bar=two and you generate the URL on the Server or JavaScript side)

Answered by Vincent Mimoun-Prat

Given all the remarks made above (hash functions are one-way), I would go for one of 2 possible solutions:

  • Use a reversible encryption or encoding function to get a long string representing your URL. You'll get something like param=456ab894ce897b98f (this could be longer or shorter depending on the URL). See DES encryption, for instance, or base64url.
  • Keep track of the URLs in a database (could also be a simple file-based database such as SQLite). Then you'll effectively have a uint <=> URL equivalence.

Answered by Will A

"Unique representation" implies that the Java-supplied String.hashCode() would be useless - you'd soon come across two URIs that shared the same hash code.

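A quick way to see such a collision is the well-known pair "Aa" and "BB", which both hash to 2112. In the sketch below the class name HashCollisionDemo and the example.org URIs are made up for illustration.

```java
// Illustrative demo: distinct strings can share a String.hashCode() value.
final class HashCollisionDemo {
    // True when the two strings differ but their hash codes are equal.
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(collide("Aa", "BB")); // prints "true": both hash to 2112
        // A shared prefix preserves the collision, so whole URIs can collide too.
        System.out.println(collide("http://example.org/Aa", "http://example.org/BB")); // prints "true"
    }
}
```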

Any two-way scheme is going to result in an unwieldy string - unless you store the URIs in a database and use the record ID as your unique identifier.

As far as one-way goes - an MD5 hash would be far less likely to collide (but is by no means unique) than the simple hash code - but might be verging on "unwieldy" depending on your definition!

Answered by rossum

Q1: If you want to recover the string from the number then you could use:

1a: an encryption of the string, which is going to be the same size, or longer, unless you zip the string first. This will give an array of random-looking bytes, which could be displayed as Base-64.

1b: a database, or a map, and the number is the index of the string in the map/database.

Q2: The string does not have to be recoverable.

Various ideas are possible here. You can display the hash in hex or in Base-64 to avoid negative signs. The only non-alphanumeric characters in Base-64 are '+', '/' and '='. For an almost unique hash you will need something of cryptographic size: MD5 (128 bits), SHA-1 (160 bits) or SHA-2 (256 or 512 bits).

An MD5 hash looks like "d131dd02c5e6eec4693d9a0698aff95c" in hex; the larger the hash the less likely a collision is.
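
A sketch of producing such a hex digest with the JDK's java.security.MessageDigest is shown below. DigestId and its hex method are hypothetical names; the algorithm names "MD5" and "SHA-256" are standard JDK identifiers.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative one-way identifier: a cryptographic digest rendered as lowercase hex.
final class DigestId {
    static String hex(String algorithm, String uri) {
        try {
            byte[] digest = MessageDigest.getInstance(algorithm)
                    .digest(uri.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // two hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("digest algorithm not available: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hex("MD5", "http://example.org/resource/1"));
        System.out.println(hex("SHA-256", "http://example.org/resource/1"));
    }
}
```

An MD5 digest comes out as 32 hex characters and a SHA-256 digest as 64, so the hex form never contains a minus sign but is not short either.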

rossum
