What is the easiest/best/most correct way to iterate through the characters of a string in Java?

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow, original question at http://stackoverflow.com/questions/196830/

Date: 2020-08-11 11:11:20 Source: igfitidea

Tags: java, string, iteration, character, tokenize

Asked by Paul Wicks

StringTokenizer? Convert the String to a char[] and iterate over that? Something else?

Accepted answer by jjnguy

I use a for loop to iterate the string and use charAt() to get each character to examine it. Since the String is implemented with an array, the charAt() method is a constant time operation.

String s = "...stuff...";

for (int i = 0; i < s.length(); i++){
    char c = s.charAt(i);        
    //Process char
}

That's what I would do. It seems the easiest to me.

As far as correctness goes, I don't believe that exists here. It is all based on your personal style.

Answer by Alan

I wouldn't use StringTokenizer, as it is one of the legacy classes in the JDK.

The javadoc says:

StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
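For comparison, here is a minimal sketch of the split-based replacement the Javadoc points to, next to the legacy StringTokenizer loop. It assumes whitespace-separated input, and the class and method names are hypothetical. The two are close but not exact equivalents: on an empty string, split() returns one empty token while StringTokenizer returns none.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerVsSplit {
    // Legacy style: the default delimiters are space, tab, newline,
    // carriage return and form feed.
    static List<String> withTokenizer(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // Style recommended by the Javadoc: String.split with a regex.
    // trim() avoids an empty first token when the text starts with whitespace.
    static List<String> withSplit(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String text = "  the quick\tbrown  fox ";
        System.out.println(withTokenizer(text)); // [the, quick, brown, fox]
        System.out.println(withSplit(text));     // [the, quick, brown, fox]
    }
}
```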

Answer by Eugene Yokota

See The Java Tutorials: Strings.

public class StringDemo {
    public static void main(String[] args) {
        String palindrome = "Dot saw I was Tod";
        int len = palindrome.length();
        char[] tempCharArray = new char[len];
        char[] charArray = new char[len];

        // put original string in an array of chars
        for (int i = 0; i < len; i++) {
            tempCharArray[i] = palindrome.charAt(i);
        } 

        // reverse array of chars
        for (int j = 0; j < len; j++) {
            charArray[j] = tempCharArray[len - 1 - j];
        }

        String reversePalindrome =  new String(charArray);
        System.out.println(reversePalindrome);
    }
}

Put the length into int len and use a for loop.

Answer by Bruno De Fraine

There are some dedicated classes for this:

import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

final CharacterIterator it = new StringCharacterIterator(s);
for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
    // process c
}

Answer by Dave Cheney

Two options

for(int i = 0, n = s.length() ; i < n ; i++) { 
    char c = s.charAt(i); 
}

or

for(char c : s.toCharArray()) {
    // process c
}

The first is probably faster; the second is probably more readable.
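One plausible reason for the speed difference: toCharArray() returns a newly allocated copy of the string's characters, so the enhanced for loop pays for an allocation and a full copy before it starts. A minimal sketch (class and method names are hypothetical) showing that the array is a copy and that both loops compute the same result:

```java
class ToCharArrayCopyDemo {
    // Counts occurrences of target using charAt(): no extra allocation.
    static int countWithCharAt(String s, char target) {
        int count = 0;
        for (int i = 0, n = s.length(); i < n; i++) {
            if (s.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    // Counts occurrences of target with the enhanced for loop over a copied array.
    static int countWithArray(String s, char target) {
        int count = 0;
        for (char c : s.toCharArray()) { // toCharArray() allocates a new char[]
            if (c == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "banana";
        char[] copy = s.toCharArray();
        copy[0] = 'X'; // modifying the array does not touch the String
        System.out.println(s);                       // banana
        System.out.println(new String(copy));        // Xanana
        System.out.println(countWithCharAt(s, 'a')); // 3
        System.out.println(countWithArray(s, 'a'));  // 3
    }
}
```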

Answer by Alan Moore

StringTokenizer is totally unsuited to the task of breaking a string into its individual characters. With String#split() you can do that easily by using a regex that matches nothing, e.g.:

String[] theChars = str.split("|");

But StringTokenizer doesn't use regexes, and there's no delimiter string you can specify that will match the nothing between characters. There is one cute little hack you can use to accomplish the same thing: use the string itself as the delimiter string (making every character in it a delimiter) and have it return the delimiters:

StringTokenizer st = new StringTokenizer(str, str, true);

However, I only mention these options for the purpose of dismissing them. Both techniques break the original string into one-character strings instead of char primitives, and both involve a great deal of overhead in the form of object creation and string manipulation. Compare that to calling charAt() in a for loop, which incurs virtually no overhead.
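To make the point concrete, here is a small sketch (class and method names are hypothetical) that runs both techniques on the same input. Both return one-character String objects rather than char values. The split() output shown assumes Java 8 or later; earlier versions also produce a leading empty string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class OneCharStringsDemo {
    // split("|"): the regex matches the empty string between characters.
    static String[] viaSplit(String str) {
        return str.split("|");
    }

    // StringTokenizer hack: every character of str is a delimiter, and
    // returnDelims = true hands each delimiter back as a one-character token.
    static List<String> viaTokenizer(String str) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(str, str, true);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String str = "abca";
        System.out.println(Arrays.toString(viaSplit(str))); // [a, b, c, a]
        System.out.println(viaTokenizer(str));              // [a, b, c, a]
    }
}
```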

Answer by Alan Moore

I agree that StringTokenizer is overkill here. Actually, I tried out the suggestions above and timed them.

My test was fairly simple: create a StringBuilder with about a million characters, convert it to a String, and traverse each of them with charAt() / after converting to a char array / with a CharacterIterator a thousand times (of course making sure to do something on the string so the compiler can't optimize away the whole loop :-) ).

The result on my 2.6 GHz Powerbook (that's a mac :-) ) and JDK 1.5:

  • Test 1: charAt + String --> 3138msec
  • Test 2: String converted to array --> 9568msec
  • Test 3: StringBuilder charAt --> 3536msec
  • Test 4: CharacterIterator and String --> 12151msec

As the results are significantly different, the most straightforward way also seems to be the fastest one. Interestingly, charAt() of a StringBuilder seems to be slightly slower than the one of String.

BTW, I suggest not using CharacterIterator, as I consider its abuse of the '\uFFFF' character as "end of iteration" a really awful hack. In big projects there are always two guys who use the same kind of hack for two different purposes, and the code crashes really mysteriously.
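A small sketch of that failure mode, assuming a string that legitimately contains U+FFFF (the value of CharacterIterator.DONE). The class and method names are hypothetical; the iterator loop is the one from the CharacterIterator answer.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

class IteratorDoneDemo {
    // Collects characters using the DONE sentinel loop.
    static String collectWithIterator(String s) {
        StringBuilder out = new StringBuilder();
        CharacterIterator it = new StringCharacterIterator(s);
        for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
            out.append(c);
        }
        return out.toString();
    }

    // Collects characters with charAt(), which has no sentinel value.
    static String collectWithCharAt(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            out.append(s.charAt(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "ab\uFFFFcd"; // U+FFFF is a legal char value inside a String
        System.out.println(collectWithIterator(s).length()); // 2: stops at U+FFFF
        System.out.println(collectWithCharAt(s).length());   // 5
    }
}
```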

Here's one of the tests:

int count = 1000;
// ... (setup of str, a String of about a million characters)

System.out.println("Test 1: charAt + String");
long t = System.currentTimeMillis();
int sum = 0;
for (int i = 0; i < count; i++) {
    int len = str.length();
    for (int j = 0; j < len; j++) {
        if (str.charAt(j) == 'b')
            sum = sum + 1;
    }
}
t = System.currentTimeMillis() - t;
System.out.println("result: " + sum + " after " + t + "msec");
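A self-contained sketch of the four variants from the results list, wrapped in a small harness so they can be run side by side. The input string, the class and method names, and the reduced round count are assumptions, not the original test code. Timings from a hand-rolled loop like this depend heavily on the JVM version and JIT warm-up, so a dedicated harness such as JMH gives more trustworthy numbers.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import java.util.function.ToIntFunction;

class CharIterationTiming {
    // Test 1 / Test 3: charAt() on any CharSequence (String or StringBuilder)
    static int countWithCharAt(CharSequence cs) {
        int sum = 0;
        for (int j = 0, len = cs.length(); j < len; j++) {
            if (cs.charAt(j) == 'b') sum++;
        }
        return sum;
    }

    // Test 2: iterate over a char[] copy of the String
    static int countWithArray(String str) {
        int sum = 0;
        for (char c : str.toCharArray()) {
            if (c == 'b') sum++;
        }
        return sum;
    }

    // Test 4: CharacterIterator with the DONE sentinel
    static int countWithIterator(String str) {
        int sum = 0;
        CharacterIterator it = new StringCharacterIterator(str);
        for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
            if (c == 'b') sum++;
        }
        return sum;
    }

    // Runs counter `rounds` times and prints the total and elapsed time.
    static <T> void time(String label, ToIntFunction<T> counter, T input, int rounds) {
        long start = System.nanoTime();
        int sum = 0;
        for (int i = 0; i < rounds; i++) {
            sum += counter.applyAsInt(input);
        }
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + ": result " + sum + " after " + ms + "msec");
    }

    public static void main(String[] args) {
        // Hypothetical input: "abc" repeated to roughly a million characters.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 333_334; i++) {
            sb.append("abc");
        }
        String str = sb.toString();
        int rounds = 100; // the answer used 1000

        time("Test 1: charAt + String", CharIterationTiming::countWithCharAt, str, rounds);
        time("Test 2: String converted to array", CharIterationTiming::countWithArray, str, rounds);
        time("Test 3: StringBuilder charAt", CharIterationTiming::countWithCharAt, sb, rounds);
        time("Test 4: CharacterIterator and String", CharIterationTiming::countWithIterator, str, rounds);
    }
}
```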

Answer by sk.

Note that most of the other techniques described here break down if you're dealing with characters outside the BMP (Unicode Basic Multilingual Plane), i.e. code points outside the U+0000 to U+FFFF range. This will only happen rarely, since the code points outside this range are mostly assigned to dead languages. But there are some useful characters outside it, for example some code points used for mathematical notation, and some used to encode proper names in Chinese.

In that case your code will be:

String str = "....";
int offset = 0, strLen = str.length();
while (offset < strLen) {
  int curChar = str.codePointAt(offset);
  offset += Character.charCount(curChar);
  // do something with curChar
}

The Character.charCount(int) method requires Java 5+.
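To see why charAt() alone falls short here, this minimal sketch wraps the loop above in a method and runs it on a string containing one supplementary code point (U+1D50A, from the Mathematical Alphanumeric Symbols block, written as the surrogate pair \uD835\uDD0A). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointDemo {
    // Walks the string one code point at a time, as in the loop above.
    static List<Integer> codePoints(String str) {
        List<Integer> result = new ArrayList<>();
        int offset = 0, strLen = str.length();
        while (offset < strLen) {
            int curChar = str.codePointAt(offset);
            offset += Character.charCount(curChar); // 2 for a surrogate pair, else 1
            result.add(curChar);
        }
        return result;
    }

    public static void main(String[] args) {
        // 'a', U+1D50A stored as two char values, 'b'
        String str = "a\uD835\uDD0Ab";
        System.out.println(str.length());             // 4 char values
        System.out.println(codePoints(str).size());   // 3 code points
        // charAt() sees the two halves of the pair separately
        System.out.println(Character.isHighSurrogate(str.charAt(1)));    // true
        System.out.println(Integer.toHexString(codePoints(str).get(1))); // 1d50a
    }
}
```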

Source: http://mindprod.com/jgloss/codepoint.html

Answer by Touko

If you have Guava on your classpath, the following is a pretty readable alternative. Guava even has a fairly sensible custom List implementation for this case, so this shouldn't be inefficient.

import com.google.common.collect.Lists;

for (char c : Lists.charactersOf(yourString)) {
    // Do whatever you want
}

UPDATE: As @Alex noted, with Java 8 there's also CharSequence#chars to use. Even though the type is IntStream, it can be mapped to chars like:

yourString.chars()
        .mapToObj(c -> Character.valueOf((char) c))
        .forEach(c -> System.out.println(c)); // Or whatever you want

Answer by Alex

If you need to iterate through the code points of a String (see this answer), a shorter / more readable way is to use the CharSequence#codePoints method added in Java 8:

for (int c : string.codePoints().toArray()) {
    // process code point c
}

or using the stream directly instead of a for loop:

string.codePoints().forEach(c -> { /* process code point c */ });

There is also CharSequence#chars if you want a stream of the characters (although it is an IntStream, since there is no CharStream).
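The difference between the two streams only shows up with characters outside the BMP. A minimal sketch, assuming Java 8+ and a hypothetical class name, that renders both streams as hex for a string containing one supplementary code point:

```java
import java.util.stream.Collectors;

class CharsVsCodePoints {
    // chars(): one int per UTF-16 char value, rendered in hex
    static String charsHex(String s) {
        return s.chars()
                .mapToObj(Integer::toHexString)
                .collect(Collectors.joining(" "));
    }

    // codePoints(): one int per Unicode code point, rendered in hex
    static String codePointsHex(String s) {
        return s.codePoints()
                .mapToObj(Integer::toHexString)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String s = "a\uD835\uDD0Ab"; // 'a', U+1D50A as a surrogate pair, 'b'
        System.out.println(charsHex(s));      // 61 d835 dd0a 62
        System.out.println(codePointsHex(s)); // 61 1d50a 62
    }
}
```

With chars(), the supplementary character arrives as two separate surrogate values; codePoints() combines them into a single int.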
