Original URL: http://stackoverflow.com/questions/6215486/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Efficient way to find the frequency of a character in a String in Java: O(n)
Question by crackerplace
In a recent interview I was asked to write the following program: find the character with the lowest frequency in a given String. I iterated through the string with charAt, storing each character as a key in a HashMap with its number of occurrences as the value. Then I had to iterate over the Map again to find the lowest count.
Is there a more efficient way to do this? I suspect the approach above is too intensive.
Update and Another Solution
After some thought, and after reading the answers, I think the best possible time is O(n). In the first iteration we go through the String character by character and store each frequency in an array at the character's index (a char is an int), i.e. arr[c] = arr[c] + 1. At the same time we keep two temporary variables holding the lowest count seen so far and its character: after each update, if the stored lowest count is greater than the new value, both variables are replaced with the new count and character. This way I suppose we don't need a second iteration to find the smallest, and no sorting is required.
What do you think? Or are there other solutions?
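A minimal sketch of the array-counting idea from the update, assuming the full 16-bit Java `char` range (65,536 counters). One caveat: a running minimum kept during the counting pass can go stale, because the current rarest character's count may still grow. In "aab", 'a' is the rarest character after the first step but not at the end. The sketch therefore picks the minimum in a second pass over the string using the final counts, which keeps the whole thing O(n). The class and method names are illustrative only.

```java
public class RarestChar {

    // Returns the least frequent character in s; ties go to the
    // character that appears first in s. Assumes s is non-empty.
    static char rarest(String s) {
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty string");
        }
        // One counter per possible char value (0..65535).
        int[] freq = new int[Character.MAX_VALUE + 1];
        for (int i = 0; i < s.length(); i++) {
            freq[s.charAt(i)]++;
        }
        // Second O(n) pass: compare final counts, not running ones.
        char best = s.charAt(0);
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (freq[c] < freq[best]) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(rarest("aab"));    // b
        System.out.println(rarest("aaabbc")); // c
    }
}
```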
Answer by Jay
I'd use an array rather than a hash map. If we're limited to ASCII (or any 8-bit character set), that's at most 256 entries; if we're using Unicode (Java's 16-bit char), 64K. Either way, not an impossible size. Beyond that, I don't see how you could improve on your approach. I'm trying to think of some clever trick to make it more efficient, but I can't come up with any.
Seems to me the answer is almost always going to be a whole list of characters: all of those that are used zero times.
Update
This is probably close to the most efficient it could be in Java. For convenience, I'm assuming we're using plain ASCII (more precisely, char values 0-255).
import java.util.ArrayList;
import java.util.List;

public class Rarest {
    public List<Character> rarest(String s)
    {
        int[] freq = new int[256];
        for (int p = s.length() - 1; p >= 0; --p)
        {
            char c = s.charAt(p);
            if (c > 255)
                throw new IllegalArgumentException("Wasn't expecting that");
            ++freq[c];
        }
        int min = Integer.MAX_VALUE;
        for (int x = freq.length - 1; x >= 0; --x)
        {
            // I'm assuming we don't want chars with frequency of zero
            if (freq[x] > 0 && min > freq[x])
                min = freq[x];
        }
        List<Character> rares = new ArrayList<Character>();
        for (int x = freq.length - 1; x >= 0; --x)
        {
            if (freq[x] == min)
                rares.add((char) x);
        }
        return rares;
    }
}
Any effort to keep the list sorted by frequency as you go is going to be much less efficient, because it will have to re-sort every time you examine a character.
Any attempt to sort the list of frequencies at all is going to be less efficient, as sorting the whole list is clearly slower than just picking the smallest value.
Sorting the string and then counting is going to be slower because the sort will be more expensive than the count.
Technically, it would be faster to create a plain array at the end rather than an ArrayList, but the ArrayList makes the code slightly more readable.
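A sketch of the plain-array variant, assuming the same 256-entry table as the ArrayList version above. It counts how many characters share the minimum during the min-finding pass so the result array can be sized exactly, and fills it in the same descending char order. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class RarestArray {

    // Returns every character whose (non-zero) frequency is minimal,
    // in descending char order, as a char[] instead of a List.
    static char[] rarestArray(String s) {
        int[] freq = new int[256];
        for (int p = 0; p < s.length(); p++) {
            char c = s.charAt(p);
            if (c > 255) {
                throw new IllegalArgumentException("Wasn't expecting that");
            }
            ++freq[c];
        }
        int min = Integer.MAX_VALUE;
        int count = 0; // how many chars share the current minimum
        for (int f : freq) {
            if (f > 0 && f < min) {
                min = f;
                count = 1;
            } else if (f == min) {
                count++;
            }
        }
        char[] rares = new char[count];
        int k = 0;
        for (int x = freq.length - 1; x >= 0; --x) {
            if (freq[x] == min) {
                rares[k++] = (char) x;
            }
        }
        return rares;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(rarestArray("aabbcd"))); // [d, c]
    }
}
```

For an empty string the method returns an empty array, since no counter ever equals `Integer.MAX_VALUE`.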
There may be a way to do it faster, but I suspect this is close to the optimum solution. I'd certainly be interested to see if someone has a better idea.
Answer by Thomas Mueller
I think your approach is in theory the most efficient (O(n)). However, in practice it needs quite a lot of memory and is probably very slow.
It may be more efficient in practice (at least it uses less memory) to convert the string to a char array, sort the array, and then calculate the frequencies with a simple loop. In theory, however, it is less efficient (O(n log n)) because of the sorting (unless you use a more efficient sort algorithm).
Test case:
import java.util.Arrays;

public class Test {
    public static void main(String... args) throws Exception {
        // System.out.println(getLowFrequencyChar("x"));
        // System.out.println(getLowFrequencyChar("bab"));
        // System.out.println(getLowFrequencyChar("babaa"));
        for (int i = 0; i < 5; i++) {
            long start = System.currentTimeMillis();
            for (int j = 0; j < 1000000; j++) {
                getLowFrequencyChar("long start = System.currentTimeMillis();");
            }
            System.out.println(System.currentTimeMillis() - start);
        }
    }

    private static char getLowFrequencyChar(String string) {
        int len = string.length();
        if (len == 0) {
            return 0;
        } else if (len == 1) {
            return string.charAt(0);
        }
        // Sort so that equal characters form contiguous runs
        char[] chars = string.toCharArray();
        Arrays.sort(chars);
        // low: shortest run seen so far, f: length of the current run
        int low = Integer.MAX_VALUE, f = 1;
        char last = chars[0], x = 0;
        for (int i = 1; i < len; i++) {
            char c = chars[i];
            if (c != last) {
                if (f < low) {
                    if (f == 1) {
                        // 1 is the lowest possible frequency, so stop early
                        return last;
                    }
                    low = f;
                    x = last;
                }
                last = c;
                f = 1;
            } else {
                f++;
            }
        }
        // The final run is not checked inside the loop
        if (f < low) {
            x = last;
        }
        return x;
    }
}
Answer by Mir Abbas Ali
Finding the frequency of characters in a String is very easy; see my code for the answer.
import java.io.*;

public class frequency_of_char
{
    public static void main(String args[]) throws IOException
    {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        System.out.println("Enter your String");
        String str = in.readLine();
        int len = str.length();
        // Try every char from 'A' to 'z' (letters plus the six symbols between 'Z' and 'a')
        for (char c = 'A'; c <= 'z'; c++)
        {
            int k = 0;
            // Count occurrences of c with a full scan of the string
            for (int j = 0; j < len; j++)
            {
                if (str.charAt(j) == c)
                    k++;
            }
            if (k > 0)
                System.out.println("The character " + c + " has occurred " + k + " times");
        }
    }
}
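The nested loop above rescans the whole string once per candidate character in 'A'..'z'. A single-pass alternative, sketched here as an assumption rather than the answerer's code, counts into a `TreeMap` so the results come out in character order; unlike the original, it counts every character (spaces, digits, and so on), not just 'A'..'z'. `countChars` is a hypothetical helper name.

```java
import java.util.Map;
import java.util.TreeMap;

public class CharFrequencies {

    // One pass over the string; TreeMap keeps keys sorted by char value.
    static Map<Character, Integer> countChars(String s) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (char c : s.toCharArray()) {
            counts.merge(c, 1, Integer::sum); // insert 1 or add 1
        }
        return counts;
    }

    public static void main(String[] args) {
        for (Map.Entry<Character, Integer> e : countChars("hello").entrySet()) {
            System.out.println("The character " + e.getKey()
                    + " has occurred " + e.getValue() + " times");
        }
    }
}
```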
Answer by CarManuel
Having to iterate through the HashMap is not necessarily bad. That will only be O(h), where h is the HashMap's size (the number of unique characters), which will always be less than or equal to n. For the example "aaabbc", h = 3 for the three unique characters. And since h can never exceed the number of possible characters (256 for 8-bit text, 65,536 for Java's 16-bit char), it is bounded by a constant. So your big-O is O(n + h), which is actually O(n) since h is constant.
I don't know of any algorithm that could get a better big-O. You could try a bunch of Java-specific optimizations, but here is a simple algorithm I wrote that finds the char with the lowest frequency. It returns "c" for the input "aaabbc".
import java.util.HashMap;
import java.util.Map;

public class StackOverflowQuestion {
    public static void main(String[] args) {
        System.out.println("" + findLowestFrequency("aaabbc"));
    }

    // Assumes a non-empty input: iterator().next() throws on an empty map
    public static char findLowestFrequency(String input) {
        Map<Character, Integer> map = new HashMap<Character, Integer>();
        // First pass: count every character, O(n)
        for (char c : input.toCharArray()) {
            if (map.containsKey(c))
                map.put(c, map.get(c) + 1);
            else
                map.put(c, 1); // first occurrence counts as 1
        }
        // Second pass over the h distinct keys, O(h)
        char rarest = map.keySet().iterator().next();
        for (char c : map.keySet())
            if (map.get(c) < map.get(rarest))
                rarest = c;
        return rarest;
    }
}
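On Java 8 or later, the same count-then-scan idea can be sketched with streams; this is an alternative, not CarManuel's code. A `LinkedHashMap` keeps the distinct characters in first-occurrence order, and an empty input yields an empty `Optional` instead of the `NoSuchElementException` that the `iterator().next()` call above would throw. The class name is illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

public class LowestFrequencyStreams {

    static Optional<Character> findLowestFrequency(String input) {
        // Count pass: char -> number of occurrences, O(n)
        Map<Character, Long> counts = input.chars()
                .mapToObj(c -> (char) c)
                .collect(Collectors.groupingBy(
                        Function.identity(), LinkedHashMap::new, Collectors.counting()));
        // Scan pass over the h distinct characters, O(h)
        return counts.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        System.out.println(findLowestFrequency("aaabbc")); // Optional[c]
        System.out.println(findLowestFrequency(""));       // Optional.empty
    }
}
```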
Answer by Tucker
I'd do it the following way as it involves the fewest lines of code:
Character whose frequency you want to know: "_"
String: "this_is_a_test"
String testStr = "this_is_a_test";
String[] parts = testStr.split("_"); // note: split() takes a regular expression, so characters such as "." or "|" must be escaped
int freq = parts.length -1;
You may find weird things happen if the string starts or ends with the character in question, but I'll leave it to you to test for that.
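If you want to keep the split approach, here is a sketch under two assumptions: `Pattern.quote` turns the target character into a literal regex, and a negative limit stops `split` from discarding trailing empty strings, which is what breaks the count when the string ends with the character. A plain loop avoids regular expressions entirely. `countBySplit` and `countByLoop` are illustrative names.

```java
import java.util.regex.Pattern;

public class CountOccurrences {

    // split-based count; the -1 limit keeps trailing empty strings
    static int countBySplit(String s, char target) {
        String[] parts = s.split(Pattern.quote(String.valueOf(target)), -1);
        return parts.length - 1;
    }

    // Straight O(n) scan, no regex involved
    static int countByLoop(String s, char target) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == target) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countBySplit("this_is_a_test", '_')); // 3
        System.out.println(countBySplit("_edge_", '_'));         // 2
        System.out.println(countByLoop("_edge_", '_'));          // 2
    }
}
```

Without the `-1` limit, `"_edge_".split("_")` drops the trailing empty string and the count comes out as 1 instead of 2.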