Java 在不使用额外数据结构和小写字符假设的情况下确定一个字符串具有所有唯一字符

Question

提问by user3184017

This is one of the questions in the Cracking the Coding Interviewbookby Gayle Laakmann McDowell:

这是Gayle Laakmann McDowell在Cracking the Coding Interview一书中提出的问题之一：

Implement an algorithm to determine if a string has all unique characters. What if you can not use additional data structures?

实现一个算法来确定一个字符串是否包含所有唯一字符。如果不能使用额外的数据结构怎么办？

The author wrote:

作者写道：

We can reduce our space usage a little bit by using a bit vector. We will assume, in the below code, that the string is only lower case 'a'through 'z'. This will allow us to use just a single int.

我们可以通过使用位向量来减少我们的空间使用量。我们假设，在下面的代码，该字符串只有较低的情况下'a'通过'z'。这将允许我们只使用一个 int。

The author has this implementation:

作者有这个实现：

public static boolean isUniqueChars(String str) {
    int checker = 0;
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i) - 'a';
        if ((checker & (1 << val)) > 0)
            return false;
        checker |= (1 << val);
    }
    return true;
}

Let's say we get rid of the assumption that "the string is only lower case 'a'through 'z'". Instead, the string can contain any kind of character—like ASCII characters or Unicode characters.

假设我们摆脱了“字符串仅'a'通过'z'”小写的假设。相反，字符串可以包含任何类型的字符，如 ASCII 字符或 Unicode 字符。

Is there a solution as efficient as the author's (or a solution that comes close to being as efficient as the author's)?

有没有和作者一样高效的解决方案（或者接近于作者一样高效的解决方案）？

采纳答案by vandale

for the asccii character set you can represent the 256bits in 4 longs: you basically hand code an array.

对于 asccii 字符集，您可以用 4 个 long 表示 256 位：您基本上手动编码一个数组。

public static boolean isUniqueChars(String str) {
    long checker1 = 0;
    long checker2 = 0;
    long checker3 = 0;
    long checker4 = 0;
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i);
        int toCheck = val / 64;
        val %= 64;
        switch (toCheck) {
            case 0:
                if ((checker1 & (1L << val)) > 0) {
                    return false;
                }
                checker1 |= (1L << val);
                break;
            case 1:
                if ((checker2 & (1L << val)) > 0) {
                    return false;
                }
                checker2 |= (1L << val);
                break;
            case 2:
                if ((checker3 & (1L << val)) > 0) {
                    return false;
                }
                checker3 |= (1L << val);
                break;
            case 3:
                if ((checker4 & (1L << val)) > 0) {
                    return false;
                }
                checker4 |= (1L << val);
                break;
        }            
    }
    return true;
}

You can use the following code to generate the body of a similar method for unicode characters:

您可以使用以下代码为 unicode 字符生成类似方法的主体：

static void generate() {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 1024; i++) {
        sb.append(String.format("long checker%d = 0;%n", i));
    }
    sb.append("for (int i = 0; i < str.length(); ++i) {\n"
            + "int val = str.charAt(i);\n"
            + "int toCheck = val / 64;\n"
            + "val %= 64;\n"
            + "switch (toCheck) {\n");
    for (int i = 0; i < 1024; i++) {
        sb.append(String.format("case %d:\n"
                + "if ((checker%d & (1L << val)) > 0) {\n"
                + "return false;\n"
                + "}\n"
                + "checker%d |= (1L << val);\n"
                + "break;\n", i, i, i));
    }
    sb.append("}\n"
            + "}\n"
            + "return true;");
    System.out.println(sb);
}

Answer 2

回答by Bohemian

You only need one line... well less than one line actually:

你只需要一行……实际上不到一行：

if (str.matches("((.)(?!.*\1))*"))

this uses a negative look ahead to assert that each character is not repeated later in the string.

这使用否定前瞻来断言每个字符在字符串的后面不会重复。

This approach a time complexity of O(n^2), because for all n characters in the input, all characters that follow (there are n of those) are compared for equality.

这种方法的时间复杂度为 O(n^2)，因为对于输入中的所有 n 个字符，后面的所有字符（其中有 n 个）都进行相等性比较。

Answer 3

回答by A. I. Breveleri

I think we need a general and practical definition of "additional data structures". Intuitively, we don't want to call every scalar integer or pointer a "data structure", because that makes nonsense of any prohibition of "additional data structures".

我认为我们需要对“附加数据结构”有一个通用且实用的定义。直观地说，我们不想将每个标量整数或指针都称为“数据结构”，因为这与禁止“附加数据结构”毫无意义。

I propose we borrow a concept from big-O notation: an "additional data structure" is one that grows with the size of the data set.

我建议我们从 big-O 符号借用一个概念：“附加数据结构”是一种随着数据集的大小而增长的结构。

In the present case, the code quoted by the OP appears to have a space requirement of O(1) because the bit vector happens to fit into an integer type. But as the OP implies, the general form of the problem is really O(N).

在本例中，OP 引用的代码似乎有 O(1) 的空间要求，因为位向量恰好适合整数类型。但正如 OP 所暗示的那样，问题的一般形式实际上是 O(N)。

An example of a solution to the general case is to use two pointers and a nested loop to simply compare every character to every other. The space requirement is O(1) but the time requirement is O(N^2).

一般情况的解决方案的一个示例是使用两个指针和嵌套循环来简单地将每个字符相互比较。空间要求是 O(1) 但时间要求是 O(N^2)。

Answer 4

回答by raj

How about the following algorithm?

下面的算法怎么样？

Steps:

脚步：

Convert string to lowercase.

将字符串转换为小写。

Loop through each character in the string

循环遍历字符串中的每个字符

Set variable data = 0

设置变量数据 = 0

Calculate offset = ascii value of first character in string - 97

计算偏移量 = 字符串中第一个字符的 ascii 值 - 97

Set the flag for that position with mask = 1 << offset

使用 mask = 1 << offset 设置该位置的标志

If bitwise AND returns true, then a character is repeating (mask & data), so break here.

如果按位 AND 返回 true，则一个字符正在重复（掩码和数据），因此在此处中断。

else if we have not seen the character repeat yet, set the bit for that character by doing a bitwise OR by doing data = data | mask

否则，如果我们还没有看到字符重复，则通过按位或通过执行 data = data | 为该字符设置位。面具

Continue till end of characters.

继续直到字符结束。

Answer 5

回答by Shane aalam

One-line solution without any extra data structure:

没有任何额外数据结构的单行解决方案：

str.chars().distinct().count() == (int)str.length();

Java 在不使用额外数据结构和小写字符假设的情况下确定一个字符串具有所有唯一字符

提问by user3184017

采纳答案by vandale

回答by Bohemian

回答by A. I. Breveleri

回答by raj

回答by Shane aalam

相关推荐

最近更新

标签

Java 在不使用额外数据结构和小写字符假设的情况下确定一个字符串具有所有唯一字符

提问by user3184017

采纳答案by vandale

回答by Bohemian

回答by A. I. Breveleri

回答by raj

回答by Shane aalam

相关推荐

将 Java 列表转换为 Javascript 数组

Java Oracle JDBC 间歇性连接问题

Java 电话号码验证正则表达式在开头和前面的数字中包含一个加号

如何使用程序运行特定版本的 Java？

相关推荐

最近更新

标签