Original question: http://stackoverflow.com/questions/21054524/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Java Split on Spaces and Special Characters
Asked by Jeremiah Adams
I am trying to split a string on spaces and some specific special characters.
Given the string "john - & + $ ? . @ boy" I want to get the array:
array[0]="john";
array[1]="boy";
I've tried several regular expressions and gotten nowhere. Here is my current stab:
String[] terms = uglString.split("\\s+|[\\-\\+\\$\\?\\.@&].*");
This preserves "john" but not "boy". Can anyone help me with the rest?
Accepted answer by nhahtdh
Just use:
String[] terms = input.split("[\\s@&.?$+-]+");
You can put a shorthand character class inside a character class (note the \s), and most metacharacters lose their special meaning inside a character class, except for [, ], -, &, \, and a leading ^. However, & is meaningful only when it comes in a pair (&&), and - is treated as a literal character if it is placed at the beginning or the end of the character class.
Other languages may have different rules for parsing the pattern, but the rule about - applies to most engines.
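To make the hyphen rule concrete, here is a minimal sketch. The class name CharacterClassDemo and the "a1b" sample input are made up for illustration; only the first pattern comes from the answer.

```java
import java.util.Arrays;

class CharacterClassDemo {
    // Splits on runs of whitespace or any of the listed symbols.
    // The hyphen is the last character in the class, so it is a literal '-'.
    static String[] splitTerms(String input) {
        return input.split("[\\s@&.?$+-]+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitTerms("john - & + $ ? . @ boy")));
        // prints [john, boy]

        // A hyphen in the middle forms a range: [+-@] covers '+' through '@',
        // which includes the ASCII digits, so '1' is treated as a delimiter.
        System.out.println(Arrays.toString("a1b".split("[+-@]"))); // prints [a, b]

        // With the hyphen moved to the end, only '+', '@' and '-' are delimiters.
        System.out.println(Arrays.toString("a1b".split("[+@-]"))); // prints [a1b]
    }
}
```

If the hyphen has to sit in the middle of the class for some reason, escaping it as \\- should keep it literal as well.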
As @Sean Patrick Floyd mentioned in his answer, the important thing boils down to defining what constitutes a word. By default, \w in Java is equivalent to [a-zA-Z0-9_] (upper- and lower-case English letters, digits, and underscore), and therefore \W consists of all other characters. If you want to treat Unicode letters and digits as word characters, you may want to look at Unicode character classes.
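As a rough illustration of the difference, the sketch below splits the same text with the default \W and with a negated class built from the Unicode categories \p{L} (letters) and \p{N} (numbers). The class name, constant names, and sample input are assumptions chosen for the example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class UnicodeWordSplit {
    // Default \W: anything outside [a-zA-Z0-9_], so accented letters act as delimiters.
    static final Pattern ASCII_NON_WORD = Pattern.compile("\\W+");

    // Delimiters are runs of anything that is not a Unicode letter, a Unicode number, or '_'.
    static final Pattern UNICODE_NON_WORD = Pattern.compile("[^\\p{L}\\p{N}_]+");

    public static void main(String[] args) {
        String input = "Jos\u00e9 - & ni\u00f1o"; // "José - & niño"

        // The accented letters are swallowed as delimiters: Jos, ni, o
        System.out.println(Arrays.toString(ASCII_NON_WORD.split(input)));

        // The accented letters stay inside the words: José, niño
        System.out.println(Arrays.toString(UNICODE_NON_WORD.split(input)));
    }
}
```

Compiling "\\W+" with the Pattern.UNICODE_CHARACTER_CLASS flag is another option that should give a similar result for text like this.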
Answer by Sean Patrick Floyd
You could make your code much simpler by replacing your pattern with "\\W+" (one or more occurrences of a non-word character). This way you are whitelisting characters instead of blacklisting them, which is usually a good idea.
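A quick sketch of that pattern applied to the string from the question. The class and method names are made up for the example, and the second input is an assumption added to show one edge case: when the text starts with a delimiter, split returns an empty first element.

```java
import java.util.Arrays;

class NonWordSplit {
    // Splits on one or more consecutive non-word characters.
    static String[] words(String input) {
        return input.split("\\W+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("john - & + $ ? . @ boy")));
        // prints [john, boy]

        // A delimiter at the very start produces a leading empty string.
        System.out.println(Arrays.toString(words("- john boy")));
        // prints [, john, boy]
    }
}
```

If leading delimiters are possible in your input, filtering out empty strings afterwards is one way to deal with it.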
And of course, things could be made more efficient by using Guava's Splitter class.
Answer by Алексей
To add to what has been said about Splitter, you can do something like this:
String str = "john - & + $ ? . @ boy";
Iterable<String> ttt = Splitter.on(Pattern.compile("\\W")).trimResults().omitEmptyStrings().split(str);
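If pulling in Guava is not an option, something along these lines with only the JDK should give a comparable result. This is a sketch, not a drop-in replacement for Splitter; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class JdkSplitter {
    private static final Pattern NON_WORD = Pattern.compile("\\W");

    // Splits on every non-word character, trims each piece, and drops empty pieces,
    // roughly mirroring trimResults().omitEmptyStrings().
    static List<String> words(String str) {
        return NON_WORD.splitAsStream(str)
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(words("john - & + $ ? . @ boy")); // prints [john, boy]
    }
}
```

Pattern.splitAsStream is available from Java 8 onwards.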
Answer by PopoFibo
Breaking it down step by step:
In your case, you can strip out the non-word characters (as pointed out), while preserving the spaces so that the string is easy to split afterwards.
String ugly = "john - & + $ ? . @ boy";
String words = ugly.replaceAll("[^\\w\\s]", "");
The resulting String contains long runs of spaces, which you will generally want to collapse to a single space:
String formatted = words.trim().replaceAll(" +", " ");
Now you can easily split the String into an array of words:
String[] terms = formatted.split("\\s");
System.out.println(terms[0]);
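Put together, the steps above could look roughly like this. The class and method names are made up for the example, and the second call is an assumption added to show what happens when nothing but symbols is passed in.

```java
import java.util.Arrays;

class StepwiseSplit {
    static String[] terms(String ugly) {
        // Step 1: drop everything that is neither a word character nor whitespace.
        String words = ugly.replaceAll("[^\\w\\s]", "");
        // Step 2: trim the ends and collapse runs of spaces into one space.
        String formatted = words.trim().replaceAll(" +", " ");
        // Step 3: split on the remaining single whitespace characters.
        return formatted.split("\\s");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(terms("john - & + $ ? . @ boy")));
        // prints [john, boy]

        // With no words at all, split returns one empty string rather than an empty array.
        System.out.println(terms("- & +").length); // prints 1
    }
}
```

So if the input may contain only symbols, it is probably worth checking for an empty first element before using terms[0].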
Answer by Chamod Pathirana
Use this format.
String s = "john - & + $ ? . @ boy";
String reg = "[!_.',@? ]";
String[] res = s.split(reg);
Here, include every character that you want to split on inside the [ ] brackets.
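One thing to watch with a bracket expression like this: without a quantifier it splits on each delimiter character separately, so adjacent delimiters leave empty strings in the result. A small sketch of the difference follows; the class name, method names, and the "john, boy" input are assumptions for illustration.

```java
import java.util.Arrays;

class BracketSplit {
    // Each single delimiter character is its own split point.
    static String[] splitEach(String s) {
        return s.split("[!_.',@? ]");
    }

    // A run of delimiter characters is treated as one split point.
    static String[] splitRuns(String s) {
        return s.split("[!_.',@? ]+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitEach("john, boy"))); // prints [john, , boy]
        System.out.println(Arrays.toString(splitRuns("john, boy"))); // prints [john, boy]
    }
}
```

Note that the class as written does not list -, &, + or $, so for the string in the question those would need to be added too (with - placed first or last).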
Answer by RAHUL KOHLI
You can use something like the following:
arrayOfStringType = string.split(" |'|,|\\.|\\+|_"); // . and + are escaped so they match literally
'|' works as an OR operator here.
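When building an alternation like this, keep in mind that . and + are regex metacharacters and need escaping to match literally. An unescaped . matches any character, which makes every character a delimiter. A minimal sketch, with a made-up class name and sample inputs:

```java
import java.util.Arrays;

class AlternationSplit {
    // Alternation of literal '.' and literal '+', both escaped.
    static String[] splitOnDotOrPlus(String s) {
        return s.split("\\.|\\+");
    }

    public static void main(String[] args) {
        // Unescaped '.' matches every character, so nothing is left over.
        System.out.println("a.b".split(".").length); // prints 0

        System.out.println(Arrays.toString(splitOnDotOrPlus("a.b+c"))); // prints [a, b, c]

        // For single characters, a character class is an equivalent and shorter form.
        System.out.println(Arrays.toString("a.b+c".split("[.+]"))); // prints [a, b, c]
    }
}
```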
Answer by awinas kannan
Try this out:
Input.replaceAll("[-&+$?.@]", " ").trim().split(" +");
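A note on the two replace methods, since they are easy to mix up: String.replace treats its first argument as one literal sequence, while String.replaceAll takes a regular expression, so a character class is needed to replace each symbol individually. The sketch below uses a hypothetical class name and sample inputs.

```java
import java.util.Arrays;

class ReplaceThenSplit {
    // Replaces each listed symbol with a space, then splits on runs of spaces.
    static String[] terms(String input) {
        return input.replaceAll("[-&+$?.@]", " ").trim().split(" +");
    }

    public static void main(String[] args) {
        // replace() looks for the exact sequence "-&", which does not occur here.
        System.out.println("john - & boy".replace("-&", " ")); // prints john - & boy

        System.out.println(Arrays.toString(terms("john - & + $ ? . @ boy")));
        // prints [john, boy]
    }
}
```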