Java - 在字符串中查找第一个重复字符的最佳方法是什么
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/12305028/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Java - What is the best way to find first duplicate character in a string
提问by Niranjan
I have written below code for detecting first duplicate character in a string.
我写了下面的代码来检测字符串中的第一个重复字符。
public static int detectDuplicate(String source) {
boolean found = false;
int index = -1;
final long start = System.currentTimeMillis();
final int length = source.length();
for(int outerIndex = 0; outerIndex < length && !found; outerIndex++) {
boolean shiftPointer = false;
for(int innerIndex = outerIndex + 1; innerIndex < length && !shiftPointer; innerIndex++ ) {
if ( source.charAt(outerIndex) == source.charAt(innerIndex)) {
found = true;
index = outerIndex;
} else {
shiftPointer = true;
}
}
}
System.out.println("Time taken --> " + (System.currentTimeMillis() - start) + " ms. for string of length --> " + source.length());
return index;
}
I need help on two things:
我需要两件事的帮助:
- What is the worst case complexity of this algorithm? - my understanding is O(n).
- Is it the best way to do this? Can somebody provide a better solution (if any)?
- 该算法的最坏情况复杂度是多少?- 我的理解是 O(n)。
- 这是最好的方法吗?有人可以提供更好的解决方案(如果有的话)?
Thanks, NN
谢谢,神经网络
回答by assylias
As mentioned by others, your algorithm is O(n^2). Here is an O(N) algorithm, because HashSet#add runs in constant time ( the hash function disperses the elements properly among the buckets) - Note that I originally size the hashset to the maximum size to avoid resizing/rehashing:
正如其他人所提到的,你的算法是 O(n^2)。这是一个 O(N) 算法,因为 HashSet#add 在恒定时间内运行(散列函数将元素正确地分散在存储桶中) - 请注意,我最初将散列集大小设置为最大大小以避免调整大小/重新散列:
public static int findDuplicate(String s) {
char[] chars = s.toCharArray();
Set<Character> uniqueChars = new HashSet<Character> (chars.length, 1);
for (int i = 0; i < chars.length; i++) {
if (!uniqueChars.add(chars[i])) return i;
}
return -1;
}
Note: this returns the index of the first duplicate (i.e. the index of the first character that is a duplicate of a previous character). To return the index of the first appearance of that character, you would need to store the indices in a Map<Character, Integer>
(Map#put
is also O(1) in this case):
注意:这将返回第一个重复项的索引(即与前一个字符重复的第一个字符的索引)。要返回该字符第一次出现的索引,您需要将索引存储在 a Map<Character, Integer>
(Map#put
在这种情况下也是 O(1) ):
public static int findDuplicate(String s) {
char[] chars = s.toCharArray();
Map<Character, Integer> uniqueChars = new HashMap<Character, Integer> (chars.length, 1);
for (int i = 0; i < chars.length; i++) {
Integer previousIndex = uniqueChars.put(chars[i], i);
if (previousIndex != null) {
return previousIndex;
}
}
return -1;
}
回答by Tom Anderson
This is O(n**2), not O(n). Consider the case abcdefghijklmnopqrstuvwxyzz
. outerIndex
will range from 0 to 25 before the procedure terminates, and each time it increments, innerIndex
will have ranged from outerIndex
to 26.
这是 O(n**2),而不是 O(n)。考虑一下情况abcdefghijklmnopqrstuvwxyzz
。outerIndex
在过程终止之前,范围从 0 到 25,每次增加时,innerIndex
范围从outerIndex
到 26。
To get to O(n), you need to make a single pass over the list, and to do O(1) work at each position. Since the job to do at each position is to check if the character has been seen before (and if so, where), that means you need an O(1) map implementation. A hashtable gives you that; so does an array, indexed by the character code.
要达到 O(n),您需要对列表进行一次遍历,并在每个位置执行 O(1) 工作。由于在每个位置要做的工作是检查角色之前是否见过(如果见过,在哪里),这意味着您需要一个 O(1) 映射实现。哈希表为您提供了这一点;由字符代码索引的数组也是如此。
assylias shows how to do it with hashing, so here's how to do it with an array (just for laughs, really):
assylias展示了如何使用 hashing 做到这一点,所以这里是如何使用数组做到这一点(只是为了笑,真的):
public static int detectDuplicate(String source) {
int[] firstOccurrence = new int[1 << Character.SIZE];
Arrays.fill(firstOccurrence, -1);
for (int i = 0; i < source.length(); i++) {
char ch = source.charAt(i);
if (firstOccurrence[ch] != -1) return firstOccurrence[ch];
else firstOccurrence[ch] = i;
}
return -1;
}
回答by Qnan
The complexity is roughly O(M^2)
, where M
is the minimum between the length of the string and the size of the set of possible characters K
.
复杂性大致是O(M^2)
,其中M
是字符串长度和可能字符集大小之间的最小值K
。
You can get it down to O(M)
with O(K)
memory by simply memorizing the position where you first encounter every unique character.
你可以把它降低到O(M)
与O(K)
通过简单地记住您第一次遇到的每一个独特的性格位置存储单元。
回答by Niranjan
Okay, I found below logic to reduce O(N^2)
to O(N)
.
好的,我发现下面的逻辑可以减少O(N^2)
到O(N)
.
public static int detectDuplicate(String source) {
int index = -1;
boolean found = false;
final long start = System.currentTimeMillis();
for(int i = 1; i <= source.length() && !found; i++) {
if(source.charAt(i) == source.charAt(i-1)) {
index = (i - 1);
found = true;
}
}
System.out.println("Time taken --> " + (System.currentTimeMillis() - start) + " ms. for string of length --> " + source.length());
return index;
}
This also shows performance improvement over my previous algorithm which has 2 nested loops. This takes 130ms.
to detect first duplicate character from 63million
characters where the duplicate character is present at the end.
这也显示了比我以前的具有 2 个嵌套循环的算法的性能改进。这需要130ms.
从63million
末尾出现重复字符的字符中检测第一个重复字符。
I am not confident if this is the best solution. If anyone finds a better one, please please share it.
我不确定这是否是最佳解决方案。如果有人找到更好的,请分享。
Thanks,
谢谢,
NN
神经网络
回答by Tyler Durden
I can substantially improve your algorithm. It should be done like this:
我可以大大改进你的算法。应该这样做:
StringBuffer source ...
char charLast = source.charAt( source.len()-1 );
int xLastChar = source.len()-1;
source.setCharAt( xLastChar, source.charAt( xLastChar - 1 ) );
int i = 1;
while( true ){
if( source.charAt(i) == source.charAt(i-1) ) break;
i += 1;
}
source.setCharAt( xLastChar, charLast );
if( i == xLastChar && source.charAt( xLastChar-1 ) != charLast ) return -1;
return i;
For a large string this algorithm is probably twice as fast as yours.
对于大字符串,此算法的速度可能是您的两倍。
回答by Ankita Walia
You could try with:
你可以试试:
public static char firstRecurringChar(String s)
{
char x=' ';
System.out.println("STRING : "+s);
for(int i =0;i<s.length();i++)
{
System.out.println("CHAR AT "+i+" = " +s.charAt(i));
System.out.println("Last index of CHAR AT "+i+" = " +s.lastIndexOf(s.charAt(i)));
if(s.lastIndexOf(s.charAt(i)) >i){
x=s.charAt(i);
break;
}
}
return x;
}
回答by amoebe
O(1)
Algorithm
O(1)
算法
Your solution is O(n^2) because of the two nested loops.
由于两个嵌套循环,您的解决方案是 O(n^2)。
The fastest algorithm to do this is O(1)
(constant time):
最快的算法是O(1)
(恒定时间):
public static int detectDuplicate(String source) {
boolean[] foundChars = new boolean[Character.MAX_VALUE+1];
for(int i = 0; i < source.length(); i++) {
if(i >= Character.MAX_VALUE) return Character.MAX_VALUE;
char currentChar = source.charAt(i);
if(foundChars[currentChar]) return i;
foundChars[currentChar] = true;
}
return -1;
}
However, this is only fast in terms of big oh.
不过,这只是快的大哦。