Java: mapping of strings to integers

Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/383570/

Time: 2020-10-29 12:11:09  Source: igfitidea

Mapping of strings to integers

java

Asked by Kaarel

What is the easiest way in Java to map strings (Java String) to (positive) integers (Java int), so that


  • equal strings map to equal integers, and
  • different strings map to different integers?

So, similar to hashCode(), but different strings are required to produce different integers. In a sense, it would be a hashCode() without the possibility of collisions.


An obvious solution would maintain a mapping table from strings to integers, and a counter to guarantee that new strings are assigned a new integer. I'm just wondering how this problem is usually solved. It would also be interesting to extend it to objects other than strings.


Accepted answer by martinus

This is impossible to achieve without any restrictions, simply because there are more possible Strings than there are integers, so eventually you will run out of numbers.


A solution is only possible when you limit the number of usable Strings. Then you can use a simple counter. Here is a simple implementation that supports up to 2^32 = 4294967296 different strings. Never mind that it uses lots of memory.


import java.util.HashMap;
import java.util.Map;

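// Assigns each distinct string the next unused int, starting at Integer.MIN_VALUE.
public class StringToInt {

    // Remembers the integer already handed out for each string.
    private final Map<String, Integer> map = new HashMap<>();

    // Next unused integer; starting at MIN_VALUE makes all 2^32 int values available.
    private int counter = Integer.MIN_VALUE;

    public int toInt(String s) {
        Integer i = map.get(s);
        if (i == null) {
            // First time we see s: give it a fresh integer.
            i = counter;
            map.put(s, i);
            // Note: wraps around silently after 2^32 distinct strings.
            ++counter;
        }
        return i;
    }
}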
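
The question also asks about objects other than strings and about positive integers. The following is a minimal sketch of how the same counter idea might be generalized; the class name ObjectToInt is hypothetical, and it assumes the key type has consistent equals() and hashCode().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical generalization of the counter approach to any key type.
// Assumes keys implement equals()/hashCode() consistently.
class ObjectToInt<T> {
    private final Map<T, Integer> ids = new HashMap<>();

    // Start at 1 so every assigned id is a positive int.
    private int next = 1;

    public int toInt(T key) {
        Integer id = ids.get(key);
        if (id == null) {
            // next goes negative once it wraps past Integer.MAX_VALUE.
            if (next < 0) {
                throw new IllegalStateException("ran out of positive ints");
            }
            id = next++;
            ids.put(key, id);
        }
        return id;
    }
}
```

Unlike the version above, this sketch fails loudly instead of reusing integers once the positive range is exhausted.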

Answer by Dan Dyer

Have a look at perfect hashing.
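
Perfect hashing only applies when the full set of strings is known in advance. As a rough illustration (not a production algorithm), the sketch below brute-forces a seed for which a seeded FNV-style hash sends every key in a fixed set to a different slot. The class name TinyPerfectHash, the hash mixing, and the seed limit are all assumptions made for this example.

```java
import java.util.List;

// Toy perfect hash: search for a seed that makes a seeded hash
// collision-free on a fixed key set. Only sensible for small sets.
class TinyPerfectHash {
    private final int seed;
    private final int size;

    public TinyPerfectHash(List<String> keys) {
        if (keys.isEmpty()) {
            throw new IllegalArgumentException("need at least one key");
        }
        this.size = keys.size();
        this.seed = findSeed(keys);
    }

    // Seeded FNV-1a-style hash over the chars, reduced to [0, size).
    private static int slot(String s, int seed, int size) {
        int h = 0x811C9DC5 ^ seed;
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 0x01000193;
        }
        h ^= h >>> 16;
        return Math.floorMod(h, size);
    }

    private static int findSeed(List<String> keys) {
        int n = keys.size();
        for (int candidate = 0; candidate < 1_000_000; candidate++) {
            boolean[] used = new boolean[n];
            boolean ok = true;
            for (String k : keys) {
                int idx = slot(k, candidate, n);
                if (used[idx]) { ok = false; break; }
                used[idx] = true;
            }
            if (ok) return candidate;
        }
        throw new IllegalStateException("no seed found (duplicate keys or set too large)");
    }

    // Unique index in [0, size) for every key from the original set.
    public int index(String key) {
        return slot(key, seed, size);
    }
}
```

Strings outside the original set still land in some slot, so this does not answer the question for arbitrary input; it only shows what "no collisions" can mean for a closed vocabulary.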


Answer by frankodwyer

In most hashCode()-style implementations, collisions are accepted as inevitable and are tested for.


If you absolutely must have no collisions, guaranteed, the solution you outline will work.


Aside from this, there are cryptographic hash functions such as MD5 and SHA, where collisions are extremely unlikely (though they can be forced with a lot of effort). The Java Cryptography Architecture has implementations of these. Those methods may perhaps be faster than a good implementation of your solution for very large sets. They also take the same time no matter how many strings have been seen, and give the same code for the same string regardless of the order in which the strings are added. They also don't require storing each string. Crypto hash results can be treated as integers, but they won't fit in a Java int; you could use a BigInteger to hold them, as suggested in another answer.
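
As a minimal sketch of this idea, assuming SHA-256 and UTF-8 (neither is specified in the answer), the digest bytes can be wrapped in a non-negative BigInteger. The class and method names are made up for illustration.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestToInteger {
    // Returns the SHA-256 digest of s as a non-negative integer below 2^256.
    public static BigInteger sha256AsInteger(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            // Signum 1 makes BigInteger treat the bytes as an unsigned magnitude.
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

The result is deterministic across runs and needs no table, but it is a 256-bit value rather than an int, and distinct strings are only overwhelmingly likely, not guaranteed, to differ.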


Incidentally, if you're put off by the idea of a collision being 'extremely unlikely', it is probably about as likely that a bit would randomly flip in your computer's memory or hard disk and cause a program to behave differently than you expect :-)


Note, there are also some theoretical weaknesses in some hash functions (e.g. MD5) but for your purposes that probably doesn't matter and you could just use the most efficient such function - those weaknesses are only relevant if someone is maliciously trying to come up with strings that have the same code as another string.


edit: I just noticed in the title of your question, it seems you want bidirectional mapping, though you don't actually state this in the question. It is (by design) not possible to go from a Crypto hash to the original string. If you really need that, you'd have to store a map keying hashes back to strings.


Answer by Bill the Lizard

There's not going to be an easy or complete solution. We use hashes because there are way more possible Strings than there are ints. Collisions are just a limitation of using a finite number of bits to represent integers.


Answer by Urs Reupke

I'd try to do it by introducing an object holding a Map<String, Integer> and a Map<Integer, String>. Adding Strings to that object (or maybe having them created from said object) will assign them an Integer value. Requesting an Integer value for a String already registered will return the same value.
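
A minimal sketch of such an object, with one map per direction; the class and method names (StringRegistry, register, lookup) are hypothetical and not taken from the answer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry holding one map for each lookup direction.
class StringRegistry {
    private final Map<String, Integer> idsByString = new HashMap<>();
    private final Map<Integer, String> stringsById = new HashMap<>();

    // Returns the existing id for s, or assigns the next one (1, 2, 3, ...).
    public int register(String s) {
        Integer id = idsByString.get(s);
        if (id == null) {
            id = idsByString.size() + 1;
            idsByString.put(s, id);
            stringsById.put(id, s);
        }
        return id;
    }

    // Returns the string registered under id, or null if there is none.
    public String lookup(int id) {
        return stringsById.get(id);
    }
}
```

Because ids follow insertion order, two runs that register strings in a different order will disagree unless the maps are persisted.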


Drawbacks: Different launches will yield different Integers for the same String, depending on order unless you somehow persist the whole thing. Also, it's not very object oriented and requires a special object to create/register a String. Plus side: It's quite similar to internalizing Strings and easily understandable. (Also, you asked for an easy, not elegant way.)


For the more general case, you might create a high-level subclass of Object, introduce an "integerize" method there and extend every single class from that. I think, however, that road leads to tears.


Answer by Avi

Since Strings in java are unbounded in length, and each character has 16 bits, and ints have 32 bits, you could only produce a unique mapping of Strings to ints if the Strings were up to two characters. But you could use BigInteger to produce a unique mapping, with something like:


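import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

String s = "my string";
// Fix the charset explicitly so the mapping does not depend on the platform default.
BigInteger bi = new BigInteger(s.getBytes(StandardCharsets.UTF_8));

Reverse mapping:

String str = new String(bi.toByteArray(), StandardCharsets.UTF_8);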
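
One caveat with the snippet above: the BigInteger(byte[]) constructor reads the bytes as a signed two's-complement number, so the result can be negative, and redundant leading bytes are dropped (for example, "\u0000a" and "a" both become 97). A possible workaround, sketched here with hypothetical names, is to prepend a fixed sentinel byte before converting. It assumes UTF-8 and well-formed strings (no unpaired surrogates).

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

class StringBigIntegerCodec {
    // Encodes s as a positive BigInteger; a leading 0x01 byte protects
    // the sign and any leading zero bytes of the payload.
    public static BigInteger encode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] withSentinel = new byte[utf8.length + 1];
        withSentinel[0] = 1;
        System.arraycopy(utf8, 0, withSentinel, 1, utf8.length);
        return new BigInteger(withSentinel);
    }

    // Reverses encode() by skipping the sentinel byte.
    public static String decode(BigInteger n) {
        byte[] bytes = n.toByteArray();
        return new String(bytes, 1, bytes.length - 1, StandardCharsets.UTF_8);
    }
}
```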

Answer by Norman Ramsey

As you outline, a hash table that resolves collisions is a standard solution. You could also use a Bentley/Sedgewick style search trie, which in many applications is faster than hashing.


If you substitute 'unique pointer' for 'unique integer' you can see Dave Hanson's solution to this problem in C. This is quite a nice abstraction because


  • The pointers can still be used as C strings.

  • Equal strings hash to equal pointers, so strcmp can be dispensed with in favor of pointer equality, and the pointers can be used as keys in other hash tables.


If Java offers a test for object identity on String objects then you can play the same game there.
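
Java does offer this: the == operator compares references, and String.intern() returns one canonical instance per distinct string value. The sketch below (class and method names are hypothetical) shows the "same game"; note that it yields a unique reference, not a unique int.

```java
class InternDemo {
    // Returns the canonical instance for s from the JVM's string pool.
    public static String canonical(String s) {
        return s.intern();
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects with equal contents.
        String a = new String("hello");
        String b = new String("hello");
        System.out.println(a == b);                       // prints false
        System.out.println(canonical(a) == canonical(b)); // prints true
    }
}
```

After interning, reference equality can stand in for equals(), much like pointer equality replaces strcmp in the C version.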


Answer by Paul Tomblin

Can you use a Map to indicate which Strings you have already assigned integers to? That's kind of the "database-y" solution, where you assign each String a "primary key" from a sequence as it comes up. Then you put the String and Integer pair into a Map<String, Integer> so you can look it up again. And if you need the String for a given Integer, you can also put the same pair into a Map<Integer, String>.


Answer by devios1

If by integer you mean the data type, then as other posters have explained this is quite impossible, because the integer data type is of fixed size and strings are unbounded.


However if you simply mean a positive number, then theoretically you should be able to interpret the string as if it were an "integer" simply by regarding it as a byte array (in a consistent encoding). You could also treat it as an array of integers of arbitrary length, but if you can do that why not just use a string? :)


Implementation-wise, this is usually "solved" by using a hash code and simply double-checking any collisions, since there are likely to be none anyway, and on the off chance that there is a collision, it still works out to be constant time. However, if this isn't applicable, I'm not sure what the best solution would be.
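
To make the "double-checking" concrete: "Aa" and "BB" are a well-known pair of distinct strings with the same String.hashCode() (2112), yet a HashMap keeps them apart because it also compares keys with equals(). The class name below is made up for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

class HashCollisionDemo {
    // True when two different strings share the same hash code.
    public static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(collide("Aa", "BB")); // prints true

        // HashMap resolves the collision by checking equals() inside the bucket.
        Map<String, Integer> ids = new HashMap<>();
        ids.put("Aa", 1);
        ids.put("BB", 2);
        System.out.println(ids.get("Aa") + " " + ids.get("BB")); // prints 1 2
    }
}
```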


Interesting question.


Answer by baz

I don't know if this is practical, but if we take only the lowercase alphabet, then every word can be viewed as a number in a base-26 positional system. For example, if a is 0 and z is 25, then boom is 1*26^3 + 14*26^2 + 14*26^1 + 12*26^0 = 27416.
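
The base-26 idea can be sketched as below; the class name and the choice of long are assumptions. Two limits are worth noting: a long overflows after roughly 13 letters, and because a is 0, leading a's behave like leading zeros, so "a" and "aa" both map to 0. Numbering the letters 1 to 26 instead would avoid that collision.

```java
class Base26 {
    // Interprets a lowercase word as a base-26 number with a = 0 ... z = 25.
    public static long toNumber(String word) {
        long n = 0;
        for (char c : word.toCharArray()) {
            if (c < 'a' || c > 'z') {
                throw new IllegalArgumentException("only a-z supported: " + word);
            }
            // The exact-arithmetic helpers throw ArithmeticException on overflow.
            n = Math.addExact(Math.multiplyExact(n, 26L), c - 'a');
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(toNumber("boom")); // prints 27416
    }
}
```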
