Java Scanner vs. StringTokenizer vs. String.Split

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/691184/

Date: 2020-08-11 18:24:24  Source: igfitidea

Scanner vs. StringTokenizer vs. String.Split

Tags: java, java.util.scanner, tokenize, split

Asked by Dave

I just learned about Java's Scanner class and now I'm wondering how it compares/competes with StringTokenizer and String.split(). I know that StringTokenizer and String.split() only work on Strings, so why would I want to use a Scanner for a String? Is Scanner just intended to be one-stop shopping for splitting?

Accepted answer by Neil Coffey

They're essentially horses for courses.

  • Scanner is designed for cases where you need to parse a string, pulling out data of different types. It's very flexible, but arguably doesn't give you the simplest API for simply getting an array of strings delimited by a particular expression.
  • String.split() and Pattern.split() give you an easy syntax for doing the latter, but that's essentially all that they do. If you want to parse the resulting strings, or change the delimiter halfway through depending on a particular token, they won't help you with that.
  • StringTokenizer is even more restrictive than String.split(), and also a bit fiddlier to use. It is essentially designed for pulling out tokens delimited by fixed substrings. Because of this restriction, it's about twice as fast as String.split(). (See my comparison of String.split() and StringTokenizer.) It also predates the regular expressions API, of which String.split() is a part.
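
To make the three roles above concrete, here is a minimal sketch. The class and method names (TokenizingDemo, describe, withSplit, withTokenizer) and the sample inputs are made up for illustration and are not from the original answer; only the JDK calls themselves are real.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;
import java.util.StringTokenizer;

// Hypothetical demo class; names and inputs are illustrative assumptions.
final class TokenizingDemo {

    // Scanner: pull values of different types out of one string.
    static String describe(String record) {
        // Locale.ROOT so that "1.68" is parsed with '.' as the decimal separator.
        try (Scanner sc = new Scanner(record).useLocale(Locale.ROOT)) {
            String name = sc.next();
            int age = sc.nextInt();
            double height = sc.nextDouble();
            return name + "/" + age + "/" + height;
        }
    }

    // String.split(): an array of strings delimited by a regular expression.
    static List<String> withSplit(String csv) {
        return Arrays.asList(csv.split(","));
    }

    // StringTokenizer: tokens delimited by a fixed set of delimiter characters.
    static List<String> withTokenizer(String csv) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(csv, ",");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(describe("Alice 30 1.68"));
        System.out.println(withSplit("a,b,c"));
        System.out.println(withTokenizer("a,b,c"));
    }
}
```

Note how only the Scanner version hands back an int and a double directly; with the other two you would still have to parse each token yourself.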

You'll note from my timings that String.split() can still tokenize thousands of strings in a few milliseconds on a typical machine. In addition, it has the advantage over StringTokenizer that it gives you the output as a string array, which is usually what you want. Using an Enumeration, as provided by StringTokenizer, is too "syntactically fussy" most of the time. From this point of view, StringTokenizer is a bit of a waste of space nowadays, and you may as well just use String.split().

Answer by Bill the Lizard

If you have a String object you want to tokenize, favor using String's split method over a StringTokenizer. If you're parsing text data from a source outside your program, like from a file, or from the user, that's where a Scanner comes in handy.

Answer by H Marcelo Morales

StringTokenizer was always there. It is the fastest of all, but the enumeration-like idiom might not look as elegant as the others.

split came into existence in JDK 1.4. It is slower than the tokenizer but easier to use, since it is callable from the String class.

Scanner arrived in JDK 1.5. It is the most flexible and fills a long-standing gap in the Java API by supporting an equivalent of C's famous scanf function family.

Answer by Michael Myers

Let's start by eliminating StringTokenizer. It is getting old and doesn't even support regular expressions. Its documentation states:

StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.

So let's throw it out right away. That leaves split() and Scanner. What's the difference between them?

For one thing, split() simply returns an array, which makes it easy to use a foreach loop:

for (String token : input.split("\\s+")) { ... }

Scanner is built more like a stream:

while (myScanner.hasNext()) {
    String token = myScanner.next();
    ...
}

or

while (myScanner.hasNextDouble()) {
    double token = myScanner.nextDouble();
    ...
}

(It has a rather large API, so don't think that it's always restricted to such simple things.)

This stream-style interface can be useful for parsing simple text files or console input, when you don't have (or can't get) all the input before starting to parse.
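
As a rough sketch of that stream-style use, the following reads doubles from any Readable as they become available, without first loading the whole input into a String. The class name StreamingSumDemo and the method sumDoubles are hypothetical names chosen for this example; the StringReader merely stands in for a file or console reader.

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.Scanner;

// Hypothetical demo class; the names are illustrative assumptions.
final class StreamingSumDemo {

    // Sums the leading run of doubles in the source, pulling input on demand.
    // The Scanner is deliberately not closed here, because closing it would
    // also close the caller's underlying source.
    static double sumDoubles(Readable source) {
        Scanner sc = new Scanner(source).useLocale(Locale.ROOT);
        double sum = 0;
        while (sc.hasNextDouble()) {
            sum += sc.nextDouble();
        }
        return sum;
    }

    public static void main(String[] args) {
        // A StringReader stands in for a file or console reader.
        System.out.println(sumDoubles(new StringReader("1.5 2.5\n4")));
    }
}
```

The same method would accept, for example, an InputStreamReader wrapped around System.in, which is the console-input case mentioned above.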

Personally, the only time I can remember using Scanner is for school projects, when I had to get user input from the command line. It makes that sort of operation easy. But if I have a String that I want to split up, it's almost a no-brainer to go with split().

Answer by Manish

String.split seems to be much slower than StringTokenizer. The only advantages of split are that you get an array of the tokens and that you can use any regular expression as the delimiter. org.apache.commons.lang.StringUtils has a split method which works much faster than either of the other two, viz. StringTokenizer or String.split. But the CPU utilization for all three is nearly the same, so we also need a method which is less CPU-intensive, which I have not yet been able to find.

Answer by pdeva

I recently did some experiments on the poor performance of String.split() in highly performance-sensitive situations. You may find this useful.

http://eblog.chrononsystems.com/hidden-evils-of-javas-stringsplit-and-stringr

The gist is that String.split() compiles a Regular Expression pattern each time and can thus slow down your program, compared to if you use a precompiled Pattern object and use it directly to operate on a String.
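
A minimal sketch of the precompiled-Pattern approach described here: compile the delimiter once and reuse it for every call. The class name PrecompiledSplitDemo, the constant name, and the whitespace delimiter are assumptions made for this example, not taken from the linked article.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative assumptions.
final class PrecompiledSplitDemo {

    // Compiled once, when the class is initialized, and reused for every call.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Same result as line.split("\\s+"), without handing the regex
    // string to split() on every call.
    static String[] splitOnWhitespace(String line) {
        return WHITESPACE.split(line);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnWhitespace("a  b\tc")));
    }
}
```

Whether this matters in practice depends on how hot the call site is; it is worth measuring before restructuring code around it.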

Answer by Hugh Perkins

Split is slow, but not as slow as Scanner. StringTokenizer is faster than split. However, I found that I could roughly double the speed again by trading away some flexibility, which I did in JFastParser: https://github.com/hughperkins/jfastparser

Testing on a string containing one million doubles:

Scanner: 10642 ms
Split: 715 ms
StringTokenizer: 544 ms
JFastParser: 290 ms

Answer by Mujahid shaik

String.split() works very well but has its limits. For example, if you want to split a string like the one below on a single or double pipe (|) symbol, passing "|" directly does not work as expected, because split() treats its argument as a regular expression and | is a regex metacharacter that has to be escaped. In this situation you can use StringTokenizer, which takes its delimiters literally.

ABC|IJK
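
For what it's worth, split() can handle this case once the pipe is escaped, since its argument is a regular expression. The sketch below is one way to do it; the class name PipeSplitDemo and its method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative assumptions.
final class PipeSplitDemo {

    // "\\|+" is the regex \|+ : one or more literal pipe characters.
    static String[] splitOnPipes(String s) {
        return s.split("\\|+");
    }

    // Pattern.quote() turns any delimiter string into a literal regex,
    // so metacharacters such as | need no manual escaping.
    static String[] splitLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnPipes("ABC|IJK")));
        System.out.println(Arrays.toString(splitOnPipes("ABC||IJK")));
        System.out.println(Arrays.toString(splitLiteral("ABC|IJK", "|")));
    }
}
```

An unescaped "|" is an alternation of two empty patterns, so it matches the empty string rather than the pipe itself, which is why the naive call appears not to work.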

Answer by Simon

For the default scenarios I would suggest Pattern.split() as well, but if you need maximum performance (especially on Android, where all the solutions I tested are quite slow) and you only need to split on a single char, I now use my own method:

public static ArrayList<String> splitBySingleChar(final char[] s,
        final char splitChar) {
    final ArrayList<String> result = new ArrayList<String>();
    final int length = s.length;
    int offset = 0;
    int count = 0;
    for (int i = 0; i < length; i++) {
        if (s[i] == splitChar) {
            if (count > 0) {
                result.add(new String(s, offset, count));
            }
            offset = i + 1;
            count = 0;
        } else {
            count++;
        }
    }
    if (count > 0) {
        result.add(new String(s, offset, count));
    }
    return result;
}

Use "abc".toCharArray() to get the char array for a String. For example:

String s = "     a bb   ccc  dddd eeeee  ffffff    ggggggg ";
ArrayList<String> result = splitBySingleChar(s.toCharArray(), ' ');

Answer by John29

One important difference is that both String.split() and Scanner can produce empty strings, but StringTokenizer never does.

For example:

String str = "ab cd  ef";

StringTokenizer st = new StringTokenizer(str, " ");
for (int i = 0; st.hasMoreTokens(); i++) System.out.println("#" + i + ": " + st.nextToken());

String[] split = str.split(" ");
for (int i = 0; i < split.length; i++) System.out.println("#" + i + ": " + split[i]);

Scanner sc = new Scanner(str).useDelimiter(" ");
for (int i = 0; sc.hasNext(); i++) System.out.println("#" + i + ": " + sc.next());

Output:

//StringTokenizer
#0: ab
#1: cd
#2: ef
//String.split()
#0: ab
#1: cd
#2: 
#3: ef
//Scanner
#0: ab
#1: cd
#2: 
#3: ef

This is because the delimiter for String.split() and Scanner.useDelimiter() is not just a string, but a regular expression. We can replace the delimiter " " with " +" in the example above to make them behave like StringTokenizer.
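
A small sketch of that replacement, using the same input string as the example. The class name EmptyTokenDemo and its helper methods are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.StringTokenizer;

// Hypothetical demo class; the names are illustrative assumptions.
final class EmptyTokenDemo {

    static List<String> viaTokenizer(String s) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, " ");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    static List<String> viaSplit(String s, String regex) {
        return Arrays.asList(s.split(regex));
    }

    static List<String> viaScanner(String s, String regex) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(s).useDelimiter(regex)) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "ab cd  ef";
        System.out.println(viaTokenizer(str));     // no empty tokens
        System.out.println(viaSplit(str, " "));    // contains one empty string
        System.out.println(viaSplit(str, " +"));   // runs of spaces collapsed
        System.out.println(viaScanner(str, " +")); // runs of spaces collapsed
    }
}
```

One caveat: the equivalence is not complete for String.split(), because a string that starts with the delimiter (for example " ab") still yields a leading empty string with " +", whereas StringTokenizer skips it.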
