Java - 在字符串中查找第一个重复字符的最佳方法是什么

Question

提问by Niranjan

I have written below code for detecting first duplicate character in a string.

我写了下面的代码来检测字符串中的第一个重复字符。

public static int detectDuplicate(String source) {
    boolean found = false;
    int index = -1;
    final long start = System.currentTimeMillis();
    final int length = source.length();
    for(int outerIndex = 0; outerIndex < length && !found; outerIndex++) {
        boolean shiftPointer = false;
        for(int innerIndex = outerIndex + 1; innerIndex < length && !shiftPointer; innerIndex++ ) {
            if ( source.charAt(outerIndex) == source.charAt(innerIndex)) {
                found = true;
                index = outerIndex;
            } else {
                shiftPointer = true;
            }
        }
    }
    System.out.println("Time taken --> " + (System.currentTimeMillis() - start) + " ms. for string of length --> " + source.length());
    return index;
}

I need help on two things:

我需要两件事的帮助：

What is the worst case complexity of this algorithm? - my understanding is O(n).
Is it the best way to do this? Can somebody provide a better solution (if any)?

该算法的最坏情况复杂度是多少？- 我的理解是 O(n)。
这是最好的方法吗？有人可以提供更好的解决方案（如果有的话）？

Thanks, NN

谢谢，神经网络

Answer 1

回答by assylias

As mentioned by others, your algorithm is O(n^2). Here is an O(N) algorithm, because HashSet#add runs in constant time ( the hash function disperses the elements properly among the buckets) - Note that I originally size the hashset to the maximum size to avoid resizing/rehashing:

正如其他人所提到的，你的算法是 O(n^2)。这是一个 O(N) 算法，因为 HashSet#add 在恒定时间内运行（散列函数将元素正确地分散在存储桶中） - 请注意，我最初将散列集大小设置为最大大小以避免调整大小/重新散列：

public static int findDuplicate(String s) {
    char[] chars = s.toCharArray();
    Set<Character> uniqueChars = new HashSet<Character> (chars.length, 1);
    for (int i = 0; i < chars.length; i++) {
        if (!uniqueChars.add(chars[i])) return i;
    }
    return -1;
}

Note: this returns the index of the first duplicate (i.e. the index of the first character that is a duplicate of a previous character). To return the index of the first appearance of that character, you would need to store the indices in a Map<Character, Integer>(Map#putis also O(1) in this case):

注意：这将返回第一个重复项的索引（即与前一个字符重复的第一个字符的索引）。要返回该字符第一次出现的索引，您需要将索引存储在 a Map<Character, Integer>（Map#put在这种情况下也是 O(1) ）：

public static int findDuplicate(String s) {
    char[] chars = s.toCharArray();
    Map<Character, Integer> uniqueChars = new HashMap<Character, Integer> (chars.length, 1);
    for (int i = 0; i < chars.length; i++) {
        Integer previousIndex = uniqueChars.put(chars[i], i);
        if (previousIndex != null) {
            return previousIndex;
        }
    }
    return -1;
}

Answer 2

回答by Tom Anderson

This is O(n**2), not O(n). Consider the case abcdefghijklmnopqrstuvwxyzz. outerIndexwill range from 0 to 25 before the procedure terminates, and each time it increments, innerIndexwill have ranged from outerIndexto 26.

这是 O(n**2)，而不是 O(n)。考虑一下情况abcdefghijklmnopqrstuvwxyzz。outerIndex在过程终止之前，范围从 0 到 25，每次增加时，innerIndex范围从outerIndex到 26。

To get to O(n), you need to make a single pass over the list, and to do O(1) work at each position. Since the job to do at each position is to check if the character has been seen before (and if so, where), that means you need an O(1) map implementation. A hashtable gives you that; so does an array, indexed by the character code.

要达到 O(n)，您需要对列表进行一次遍历，并在每个位置执行 O(1) 工作。由于在每个位置要做的工作是检查角色之前是否见过（如果见过，在哪里），这意味着您需要一个 O(1) 映射实现。哈希表为您提供了这一点；由字符代码索引的数组也是如此。

assylias shows how to do it with hashing, so here's how to do it with an array (just for laughs, really):

assylias展示了如何使用 hashing 做到这一点，所以这里是如何使用数组做到这一点（只是为了笑，真的）：

public static int detectDuplicate(String source) {
    int[] firstOccurrence = new int[1 << Character.SIZE];
    Arrays.fill(firstOccurrence, -1);
    for (int i = 0; i < source.length(); i++) {
        char ch = source.charAt(i);
        if (firstOccurrence[ch] != -1) return firstOccurrence[ch];
        else firstOccurrence[ch] = i;
    }
    return -1;
}

Answer 3

回答by Qnan

The complexity is roughly O(M^2), where Mis the minimum between the length of the string and the size of the set of possible characters K.

复杂性大致是O(M^2)，其中M是字符串长度和可能字符集大小之间的最小值K。

You can get it down to O(M)with O(K)memory by simply memorizing the position where you first encounter every unique character.

你可以把它降低到O(M)与O(K)通过简单地记住您第一次遇到的每一个独特的性格位置存储单元。

Answer 4

回答by Niranjan

Okay, I found below logic to reduce O(N^2)to O(N).

好的，我发现下面的逻辑可以减少O(N^2)到O(N).

public static int detectDuplicate(String source) {
    int index = -1;
    boolean found = false;
    final long start = System.currentTimeMillis();

    for(int i = 1; i <= source.length() && !found; i++) {
        if(source.charAt(i) == source.charAt(i-1)) {
            index = (i - 1);
            found = true;
        }
    }

    System.out.println("Time taken --> " + (System.currentTimeMillis() - start) + " ms. for string of length --> " + source.length());
    return index;
}

This also shows performance improvement over my previous algorithm which has 2 nested loops. This takes 130ms.to detect first duplicate character from 63millioncharacters where the duplicate character is present at the end.

这也显示了比我以前的具有 2 个嵌套循环的算法的性能改进。这需要130ms.从63million末尾出现重复字符的字符中检测第一个重复字符。

I am not confident if this is the best solution. If anyone finds a better one, please please share it.

我不确定这是否是最佳解决方案。如果有人找到更好的，请分享。

Thanks,

谢谢，

NN

神经网络

Answer 5

回答by Tyler Durden

I can substantially improve your algorithm. It should be done like this:

我可以大大改进你的算法。应该这样做：

StringBuffer source ...
char charLast = source.charAt( source.len()-1 );
int xLastChar = source.len()-1;
source.setCharAt( xLastChar, source.charAt( xLastChar - 1 ) );
int i = 1;
while( true ){
    if( source.charAt(i) == source.charAt(i-1) ) break;
    i += 1;
}
source.setCharAt( xLastChar, charLast );
if( i == xLastChar && source.charAt( xLastChar-1 ) != charLast ) return -1;
return i;

For a large string this algorithm is probably twice as fast as yours.

对于大字符串，此算法的速度可能是您的两倍。

Answer 6

回答by Ankita Walia

You could try with:

你可以试试：

 public static char firstRecurringChar(String s)
    {
    char x=' ';
    System.out.println("STRING : "+s);
    for(int i =0;i<s.length();i++)
    {
        System.out.println("CHAR AT "+i+" = " +s.charAt(i));
        System.out.println("Last index of CHAR AT "+i+" = " +s.lastIndexOf(s.charAt(i)));
        if(s.lastIndexOf(s.charAt(i)) >i){
            x=s.charAt(i);
            break;
        }
    }
    return x;
    }

Answer 7

回答by amoebe

O(1)Algorithm

O(1)算法

Your solution is O(n^2) because of the two nested loops.

由于两个嵌套循环，您的解决方案是 O(n^2)。

The fastest algorithm to do this is O(1)(constant time):

最快的算法是O(1)（恒定时间）：

public static int detectDuplicate(String source) {
    boolean[] foundChars = new boolean[Character.MAX_VALUE+1];
    for(int i = 0; i < source.length(); i++) {
        if(i >= Character.MAX_VALUE) return Character.MAX_VALUE;
        char currentChar = source.charAt(i);
        if(foundChars[currentChar]) return i;
        foundChars[currentChar] = true;
    }
    return -1;
}

However, this is only fast in terms of big oh.

不过，这只是快的大哦。

Java - 在字符串中查找第一个重复字符的最佳方法是什么

提问by Niranjan

回答by assylias

回答by Tom Anderson

回答by Qnan

回答by Niranjan

回答by Tyler Durden

回答by Ankita Walia

回答by amoebe

相关推荐

最近更新

标签

Java - 在字符串中查找第一个重复字符的最佳方法是什么

提问by Niranjan

回答by assylias

回答by Tom Anderson

回答by Qnan

回答by Niranjan

回答by Tyler Durden

回答by Ankita Walia

回答by amoebe

相关推荐

java 将日期/时间添加到 JFreeChart 图

java 如何在运行时使用反射更改注释值？

java 为什么Java中没有局部静态变量？

Java 将 ArrayList 转换为字符串并返回到 ArrayList？

相关推荐

最近更新

标签