Java 在不使用额外数据结构和小写字符假设的情况下确定一个字符串具有所有唯一字符

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/21057827/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-13 06:15:35  来源:igfitidea点击:

Determining a string has all unique characters without using additional data structures and without the lowercase characters assumption

javastringalgorithmbit-manipulationbitvector

提问by user3184017

This is one of the questions in the Cracking the Coding Interviewbookby Gayle Laakmann McDowell:

这是Gayle Laakmann McDowell在Cracking the Coding Interview一书中提出的问题之一:

Implement an algorithm to determine if a string has all unique characters. What if you can not use additional data structures?

实现一个算法来确定一个字符串是否包含所有唯一字符。如果不能使用额外的数据结构怎么办?

The author wrote:

作者写道:

We can reduce our space usage a little bit by using a bit vector. We will assume, in the below code, that the string is only lower case 'a'through 'z'. This will allow us to use just a single int.

我们可以通过使用位向量来减少我们的空间使用量。我们假设,在下面的代码,该字符串只有较低的情况下'a'通过'z'。这将允许我们只使用一个 int。

The author has this implementation:

作者有这个实现:

public static boolean isUniqueChars(String str) {
    int checker = 0;
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i) - 'a';
        if ((checker & (1 << val)) > 0)
            return false;
        checker |= (1 << val);
    }
    return true;
}

Let's say we get rid of the assumption that "the string is only lower case 'a'through 'z'". Instead, the string can contain any kind of character—like ASCII characters or Unicode characters.

假设我们摆脱了“字符串仅'a'通过'z'”小写的假设。相反,字符串可以包含任何类型的字符,如 ASCII 字符或 Unicode 字符。

Is there a solution as efficient as the author's (or a solution that comes close to being as efficient as the author's)?

有没有和作者一样高效的解决方案(或者接近于作者一样高效的解决方案)?

Related questions:

相关问题:

采纳答案by vandale

for the asccii character set you can represent the 256bits in 4 longs: you basically hand code an array.

对于 asccii 字符集,您可以用 4 个 long 表示 256 位:您基本上手动编码一个数组。

public static boolean isUniqueChars(String str) {
    long checker1 = 0;
    long checker2 = 0;
    long checker3 = 0;
    long checker4 = 0;
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i);
        int toCheck = val / 64;
        val %= 64;
        switch (toCheck) {
            case 0:
                if ((checker1 & (1L << val)) > 0) {
                    return false;
                }
                checker1 |= (1L << val);
                break;
            case 1:
                if ((checker2 & (1L << val)) > 0) {
                    return false;
                }
                checker2 |= (1L << val);
                break;
            case 2:
                if ((checker3 & (1L << val)) > 0) {
                    return false;
                }
                checker3 |= (1L << val);
                break;
            case 3:
                if ((checker4 & (1L << val)) > 0) {
                    return false;
                }
                checker4 |= (1L << val);
                break;
        }            
    }
    return true;
}

You can use the following code to generate the body of a similar method for unicode characters:

您可以使用以下代码为 un​​icode 字符生成类似方法的主体:

static void generate() {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 1024; i++) {
        sb.append(String.format("long checker%d = 0;%n", i));
    }
    sb.append("for (int i = 0; i < str.length(); ++i) {\n"
            + "int val = str.charAt(i);\n"
            + "int toCheck = val / 64;\n"
            + "val %= 64;\n"
            + "switch (toCheck) {\n");
    for (int i = 0; i < 1024; i++) {
        sb.append(String.format("case %d:\n"
                + "if ((checker%d & (1L << val)) > 0) {\n"
                + "return false;\n"
                + "}\n"
                + "checker%d |= (1L << val);\n"
                + "break;\n", i, i, i));
    }
    sb.append("}\n"
            + "}\n"
            + "return true;");
    System.out.println(sb);
}

回答by Bohemian

You only need one line... well less than one line actually:

你只需要一行……实际上不到一行:

if (str.matches("((.)(?!.*\1))*"))

this uses a negative look ahead to assert that each character is not repeated later in the string.

这使用否定前瞻来断言每个字符在字符串的后面不会重复。

This approach a time complexity of O(n^2), because for all n characters in the input, all characters that follow (there are n of those) are compared for equality.

这种方法的时间复杂度为 O(n^2),因为对于输入中的所有 n 个字符,后面的所有字符(其中有 n 个)都进行相等性比较。

回答by A. I. Breveleri

I think we need a general and practical definition of "additional data structures". Intuitively, we don't want to call every scalar integer or pointer a "data structure", because that makes nonsense of any prohibition of "additional data structures".

我认为我们需要对“附加数据结构”有一个通用且实用的定义。直观地说,我们不想将每个标量整数或指针都称为“数据结构”,因为这与禁止“附加数据结构”毫无意义。

I propose we borrow a concept from big-O notation: an "additional data structure" is one that grows with the size of the data set.

我建议我们从 big-O 符号借用一个概念:“附加数据结构”是一种随着数据集的大小而增长的结构。

In the present case, the code quoted by the OP appears to have a space requirement of O(1) because the bit vector happens to fit into an integer type. But as the OP implies, the general form of the problem is really O(N).

在本例中,OP 引用的代码似乎有 O(1) 的空间要求,因为位向量恰好适合整数类型。但正如 OP 所暗示的那样,问题的一般形式实际上是 O(N)。

An example of a solution to the general case is to use two pointers and a nested loop to simply compare every character to every other. The space requirement is O(1) but the time requirement is O(N^2).

一般情况的解决方案的一个示例是使用两个指针和嵌套循环来简单地将每个字符相互比较。空间要求是 O(1) 但时间要求是 O(N^2)。

回答by raj

How about the following algorithm?

下面的算法怎么样?

Steps:

脚步:

Convert string to lowercase.

将字符串转换为小写。

Loop through each character in the string

循环遍历字符串中的每个字符

Set variable data = 0

设置变量数据 = 0

Calculate offset = ascii value of first character in string - 97

计算偏移量 = 字符串中第一个字符的 ascii 值 - 97

Set the flag for that position with mask = 1 << offset

使用 mask = 1 << offset 设置该位置的标志

If bitwise AND returns true, then a character is repeating (mask & data), so break here.

如果按位 AND 返回 true,则一个字符正在重复(掩码和数据),因此在此处中断。

else if we have not seen the character repeat yet, set the bit for that character by doing a bitwise OR by doing data = data | mask

否则,如果我们还没有看到字符重复,则通过按位或通过执行 data = data | 为该字符设置位。面具

Continue till end of characters.

继续直到字符结束。

回答by Shane aalam

One-line solution without any extra data structure:

没有任何额外数据结构的单行解决方案:

str.chars().distinct().count() == (int)str.length();