Original question: http://stackoverflow.com/questions/2972199/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Tokenize problem in Java with separator ". "
Asked by poiuytrez
I need to split a text using the separator ". ". For example, I want this string:
Washington is the U.S Capital. Barack is living there.
To be cut into two parts:
Washington is the U.S Capital.
Barack is living there.
Here is my code:
// Initialize the tokenizer
StringTokenizer tokenizer = new StringTokenizer("Washington is the U.S Capital. Barack is living there.", ". ");
while (tokenizer.hasMoreTokens()) {
    System.out.println(tokenizer.nextToken());
}
Unfortunately, the output is:
Washington
is
the
U
S
Capital
Barack
is
living
there
Can someone explain what's going on?
Answered by polygenelubricants
Don't use StringTokenizer; it's a legacy class. Use java.util.Scanner or simply String.split instead.
String text = "Washington is the U.S Capital. Barack is living there.";
// The regex \. matches a literal dot; the backslash is doubled inside a Java string literal
String[] tokens = text.split("\\. ");
for (String token : tokens) {
    System.out.println("[" + token + "]");
}
This prints:
[Washington is the U.S Capital]
[Barack is living there.]
Note that split and Scanner are regex-based (regular expressions). Since . is a special regex meta-character that matches any character, it needs to be escaped with \ to match a literal dot. In turn, since \ is itself an escape character in Java string literals, you need to write "\\. " as the delimiter.
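To see what the escaping changes, here is a minimal sketch that splits the question's sentence three ways. The class and method names are made up for illustration; Pattern.quote is a standard java.util.regex helper that is not mentioned in the answer and is shown here only as an alternative to escaping by hand.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitEscapingDemo {
    static final String TEXT = "Washington is the U.S Capital. Barack is living there.";

    // Unescaped: "." matches any character, so this splits on "<any char><space>"
    // and eats the last letter of most words.
    static String[] splitUnescaped(String text) {
        return text.split(". ");
    }

    // Escaped: matches a literal dot followed by a space.
    static String[] splitEscaped(String text) {
        return text.split("\\. ");
    }

    // Pattern.quote treats the whole delimiter as a literal, so no manual escaping is needed.
    static String[] splitQuoted(String text) {
        return text.split(Pattern.quote(". "));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitUnescaped(TEXT)));
        System.out.println(Arrays.toString(splitEscaped(TEXT)));
        System.out.println(Arrays.toString(splitQuoted(TEXT)));
    }
}
```

The escaped and quoted variants should produce the same two tokens, while the unescaped one splits after almost every word.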
This may sound complicated, but it really isn't. split and Scanner are far superior to StringTokenizer, and regex isn't that hard to pick up.
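As a rough sketch of the two alternatives, the example below uses Scanner with the same escaped delimiter, plus a lookbehind variant of split. The lookbehind pattern is not part of the original answer; it is an assumption about what the asker wanted, since the desired output keeps the period on the first sentence. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class SentenceSplitDemo {
    static final String TEXT = "Washington is the U.S Capital. Barack is living there.";

    // Scanner variant: tokens are separated by a literal dot followed by a space.
    static List<String> withScanner(String text) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("\\. ");
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    // Lookbehind variant: split on a space that is preceded by a dot.
    // Only the space is consumed, so each sentence keeps its period.
    static String[] keepingPeriod(String text) {
        return text.split("(?<=\\.) ");
    }

    public static void main(String[] args) {
        for (String token : withScanner(TEXT)) {
            System.out.println("[" + token + "]");
        }
        for (String token : keepingPeriod(TEXT)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

The dot inside "U.S" is left alone in both variants because it is not followed by a space.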
Regular expressions tutorials
- Java Lessons/Regular expressions
- regular-expressions.info - Very good tutorial, not Java-specific
API Links
- java.util.StringTokenizer - StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
- java.util.Scanner - A simple text scanner which can parse primitive types and strings using regular expressions.
  - Java Tutorials - Basic I/O - Scanning and formatting
- String[] String.split - Splits this string around matches of the given regular expression.
But what went wrong?
The problem is that StringTokenizer treats each character in the delimiter string as an individual delimiter, i.e. NOT the entire String as one delimiter.
From the API:
StringTokenizer(String str, String delim): Constructs a string tokenizer for the specified string. The characters in the delim argument are the delimiters for separating tokens. Delimiter characters themselves will not be treated as tokens.
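A small sketch of this behaviour: because the delimiter argument is a set of characters, ". " and " ." should tokenize the question's sentence identically. The class name and the tokenize helper are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DelimiterSetDemo {
    static final String TEXT = "Washington is the U.S Capital. Barack is living there.";

    // Collects all tokens; every character of delims acts as a separate delimiter.
    static List<String> tokenize(String text, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text, delims);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> dotSpace = tokenize(TEXT, ". ");
        List<String> spaceDot = tokenize(TEXT, " .");
        System.out.println(dotSpace);
        // The order of characters in the delimiter string does not matter.
        System.out.println(dotSpace.equals(spaceDot));
    }
}
```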
Answered by krock
You passed ". " as the delimiter argument to the StringTokenizer constructor, so both the dot and the space are treated as delimiters.
Answered by Jitendra
- StringTokenizer(String str): creates a StringTokenizer for the specified string, using the default (whitespace) delimiters.
- StringTokenizer(String str, String delim): creates a StringTokenizer for the specified string, using each character of delim as a delimiter.
- StringTokenizer(String str, String delim, boolean returnDelims): creates a StringTokenizer for the specified string and delimiters; returnDelims controls whether delimiters are returned as tokens.
If returnDelims is true, the delimiter characters are also returned as tokens. If it is false, the delimiter characters are skipped and only serve to separate tokens.
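The three constructors can be compared with a short sketch like the one below. The class name, the drain helper and the sample inputs are hypothetical; the boolean third argument is the one the JDK documentation names returnDelims.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class ConstructorDemo {
    // Reads every remaining token from the tokenizer into a list.
    static List<String> drain(StringTokenizer tokenizer) {
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // One argument: default delimiters (space, tab, newline, carriage return, form feed).
        System.out.println(drain(new StringTokenizer("a b\tc")));
        // Two arguments: dot and space are delimiters and are dropped.
        System.out.println(drain(new StringTokenizer("a.b. c", ". ")));
        // Three arguments with true: each delimiter character comes back as its own token.
        System.out.println(drain(new StringTokenizer("a.b. c", ". ", true)));
    }
}
```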
Answered by bdhar
Try removing the blank space after the dot in the delimiter. Use this instead:
StringTokenizer tokenizer = new StringTokenizer("Washington is the U.S Capital. Barack is living there.", ".");
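For reference, here is a minimal sketch of what this suggestion yields on the question's sentence (the class and helper names are made up). With "." as the only delimiter, the dot inside "U.S" also separates tokens, and the space after "Capital." stays at the start of the next token, so this does not quite give the two sentences the question asks for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DotOnlyDemo {
    static final String TEXT = "Washington is the U.S Capital. Barack is living there.";

    // Tokenizes using the dot as the only delimiter character.
    static List<String> tokenizeOnDot(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text, ".");
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Brackets make the leading space of the last token visible.
        for (String token : tokenizeOnDot(TEXT)) {
            System.out.println("[" + token + "]");
        }
    }
}
```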

