How exactly does the String.split() method in Java work when a regex is provided?

Disclaimer: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/22259733/

Date: 2020-08-13 14:35:14  Source: igfitidea

Tags: java, regex, split, ocpjp

Asked by peterremec

I'm preparing for the OCPJP exam and I ran into the following example:

class Test {
   public static void main(String args[]) {
      String test = "I am preparing for OCPJP";
      String[] tokens = test.split("\\S"); // the Java literal "\\S" is the regex \S
      System.out.println(tokens.length);
   }
}

This code prints 16. I was expecting something like no_of_characters + 1. Can someone explain to me what the split() method actually does in this case? I just don't get it...

Accepted answer by Pshemo

It splits on every match of "\\S", which the regex engine sees as \S: any single non-whitespace character.
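
A quick way to see which characters count as split points is to test them against the compiled pattern. This is a small sketch; the class name NonWhitespaceCheck and the helper isSplitPoint are made up for illustration.

```java
import java.util.regex.Pattern;

class NonWhitespaceCheck {
    // The Java literal "\\S" is the two-character regex \S (one non-whitespace char).
    private static final Pattern NON_WS = Pattern.compile("\\S");

    // Hypothetical helper: true when split("\\S") would treat c as a delimiter.
    static boolean isSplitPoint(char c) {
        return NON_WS.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        System.out.println("\\S".length());     // 2: a backslash followed by 'S'
        System.out.println(isSplitPoint('x'));  // true
        System.out.println(isSplitPoint(' '));  // false
        System.out.println(isSplitPoint('\t')); // false
    }
}
```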

So let's try to split "x x" on non-whitespace (\S). Since this regex matches exactly one character, let's iterate over the characters and mark the places where we split (we will use a pipe | for that).

  • Is 'x' non-whitespace? YES, so let's mark it: | x
  • Is ' ' non-whitespace? NO, so we leave it as is
  • Is the last 'x' non-whitespace? YES, so let's mark it: | |

So we need to split our string at the start and at the end, which initially gives us the result array

["", " ", ""]
   ^    ^ - here we split
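
The marking step above can be sketched as a loop over the regex matches that cuts the string before each match and keeps every piece. This is a simplified model, not the JDK's actual implementation: rawSplit is a hypothetical helper, and it ignores the special handling String.split applies to zero-width matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RawSplit {
    // Hypothetical helper: cuts the input at every regex match and keeps
    // every piece, including trailing empty strings (no cleanup step).
    static List<String> rawSplit(String input, String regex) {
        List<String> pieces = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int start = 0;
        while (m.find()) {
            pieces.add(input.substring(start, m.start())); // text before this match
            start = m.end();                               // skip the matched delimiter
        }
        pieces.add(input.substring(start)); // text after the last match
        return pieces;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (String p : rawSplit("x x", "\\S")) {
            sb.append('"').append(p).append("\" ");
        }
        System.out.println(sb.toString().trim()); // "" " " ""
    }
}
```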

But since trailing empty strings are removed, the result is

[""," "]     <- result
        ,""] <- removed trailing empty string

so split returns the array ["", " "], which contains only two elements.
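
The cleanup step can be modelled by walking back from the end of the raw array while the last element is empty. dropTrailingEmpty and show are hypothetical helpers; the sketch only illustrates the effect of the default limit of 0 and compares it with the real split call.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class TrailingEmptyDemo {
    // Hypothetical helper modelling the cleanup done by split(regex) (limit 0):
    // empty strings are dropped from the end of the array only.
    static String[] dropTrailingEmpty(String[] parts) {
        int end = parts.length;
        while (end > 0 && parts[end - 1].isEmpty()) {
            end--; // step back over each trailing ""
        }
        return Arrays.copyOf(parts, end);
    }

    // Formats an array with quoted elements, e.g. ["", " "].
    static String show(String[] parts) {
        return Arrays.stream(parts)
                .map(p -> "\"" + p + "\"")
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        String[] raw = {"", " ", ""};                     // pieces before cleanup
        System.out.println(show(dropTrailingEmpty(raw))); // ["", " "]
        System.out.println(show("x x".split("\\S")));     // ["", " "]
    }
}
```

Note that the leading "" survives: only empty strings at the end of the array are removed.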

BTW, to stop split from removing trailing empty strings, use split(regex, limit) with a negative limit, e.g. split("\\S", -1).
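
The effect of the limit argument on the "x x" example can be checked directly; the class name SplitLimitDemo is arbitrary.

```java
class SplitLimitDemo {
    public static void main(String[] args) {
        String s = "x x";
        // limit 0 (what split(regex) uses): trailing "" removed
        System.out.println(s.split("\\S").length);     // 2
        System.out.println(s.split("\\S", 0).length);  // 2
        // negative limit: no trailing cleanup, all pieces kept
        System.out.println(s.split("\\S", -1).length); // 3
    }
}
```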

Now let's get back to your example. With your data, you are splitting on each of the marked characters:

I am preparing for OCPJP
| || ||||||||| ||| |||||

which means

 ""|" "|""|" "|""|""|""|""|""|""|""|""|" "|""|""|" "|""|""|""|""|""
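
Both the marker line and the number of pieces can be double-checked in code. The helpers mark and countMatches are hypothetical; the n + 1 rule in the comments assumes one-character matches such as \S.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PiecesCheck {
    // Every non-whitespace character becomes '|', reproducing the marker line.
    static String mark(String s) {
        return s.replaceAll("\\S", "|");
    }

    // Counts the delimiter matches; with one-character matches like \S,
    // n matches cut the input into n + 1 pieces before any cleanup.
    static int countMatches(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String test = "I am preparing for OCPJP";
        System.out.println(mark(test));                   // | || ||||||||| ||| |||||
        System.out.println(countMatches(test, "\\S"));    // 20
        System.out.println(test.split("\\S", -1).length); // 21, nothing removed
    }
}
```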

So this represents the array

[""," ",""," ","","","","","","","",""," ","",""," ","","","","",""]  

but since trailing empty strings "" are removed (when they were produced by the split itself; more info at: Confusing output from String.split)

[""," ",""," ","","","","","","","",""," ","",""," ","","","","",""]  
                                                     ^^ ^^ ^^ ^^ ^^

you get a result array that contains only this part:

[""," ",""," ","","","","","","","",""," ","",""," "]  

which is exactly 16 elements.
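
Putting it together, comparing split with the default limit against split with a negative limit shows exactly which pieces disappear. FinalCount is an arbitrary class name.

```java
class FinalCount {
    public static void main(String[] args) {
        String test = "I am preparing for OCPJP";
        String[] kept = test.split("\\S");    // limit 0: trailing "" removed
        String[] all = test.split("\\S", -1); // negative limit: all pieces kept
        System.out.println(kept.length);              // 16
        System.out.println(all.length - kept.length); // 5 trailing empty strings dropped
        // the last piece kept is the space between "for" and "OCPJP"
        System.out.println("\"" + kept[kept.length - 1] + "\""); // " "
    }
}
```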

正好是 16 个元素。