How Java string.split("\\S") works

Question

Asked by Frank Brosnan

I was doing a question out of the book oracle_certified_professional_java_se_7_programmer_exams_1z0-804_and_1z0-805 by Ganesh and Sharma.

One question is:

Consider the following program and predict the output:

  class Test {
    public static void main(String args[]) {
      String test = "I am preparing for OCPJP";
      // The backslash is doubled in the Java string literal to form the regex \S
      String[] tokens = test.split("\\S");
      System.out.println(tokens.length);
    }
  }

a) 0

b) 5

c) 12

d) 16

I understand that \S is a regex that treats non-space characters as the delimiters. But I was puzzled as to how the regex does its matching and what the actual tokens produced by split are.

I added code to print out the tokens as follows

for (String str: tokens){
  System.out.println("<" + str + ">");
}

and I got the following output

16
<>
< >
<>
< >
<>
<>
<>
<>
<>
<>
<>
<>
< >
<>
<>
< >

So a lot of empty string tokens. I just do not understand this.

I would have thought that if the delimiters are the non-space characters, then every alphabetic character in the text above serves as a delimiter, so there should be 21 tokens if tokens that are empty strings count too. I just don't understand how Java's regex engine is working this out. Are there any regex gurus out there who can shed light on this code for me?

Answer 1

Accepted answer by PeterK

Let's start with \s (lower case), which is the regular expression character class for whitespace: space ' ', tab '\t', the newline characters '\n' and '\r', vertical tab (\x0B), and form feed '\f'.

\S (upper case) is the opposite of this, so it matches any non-whitespace character.
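
As a rough check of the two character classes, the sketch below tests a few single characters against \s and \S. The class name WhitespaceClassDemo and the helper matches are made up for this illustration, and it assumes the default (non-Unicode) behaviour of java.util.regex.

```java
class WhitespaceClassDemo {
    // Hypothetical helper: true when the single character c matches the regex.
    static boolean matches(char c, String regex) {
        return String.valueOf(c).matches(regex);
    }

    public static void main(String[] args) {
        // Space, tab, line feed, carriage return, vertical tab, form feed
        char[] whitespace = {' ', '\t', '\n', '\r', '\u000B', '\f'};
        for (char c : whitespace) {
            System.out.println("char code " + (int) c + " matches \\s: " + matches(c, "\\s"));
        }
        // A letter is non-whitespace, so it matches \S; a space does not.
        System.out.println("'a' matches \\S: " + matches('a', "\\S"));
        System.out.println("' ' matches \\S: " + matches(' ', "\\S"));
    }
}
```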

So when you split the String "I am preparing for OCPJP" using \S, you are effectively splitting the string at every letter. This is the reason your token array has a length of 16.

Now as for why these are empty.

Consider the following String: "Hello,World". If we were to split that using ",", we would end up with a String array of length 2, with the following contents: "Hello" and "World". Notice that the "," is not in either of the Strings; it has been erased.

The same thing has happened with the "I am preparing for OCPJP" String: it has been split, and the characters matched by your regex are not in any of the returned values. Because most of the letters in that String are followed by another letter, you end up with a load of Strings of length zero; only the whitespace characters are preserved.
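
A small sketch of this, for illustration only: the class DelimiterDemo and its show helper are hypothetical names, and the third input ("to be ", ending in a space) was chosen here so that no trailing empty strings come into play.

```java
class DelimiterDemo {
    // Hypothetical helper: splits input on regex and wraps every token in <>
    // so that empty strings stay visible.
    static String show(String input, String regex) {
        StringBuilder sb = new StringBuilder();
        for (String token : input.split(regex)) {
            sb.append('<').append(token).append('>');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The comma is consumed by the split and appears in neither token.
        System.out.println(show("Hello,World", ","));  // <Hello><World>
        // Two adjacent delimiters leave an empty string between them.
        System.out.println(show("a,,b", ","));         // <a><><b>
        // With \S every letter is a delimiter, so only the spaces survive.
        System.out.println(show("to be ", "\\S"));     // <><>< ><>< >
    }
}
```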

Answer 2

Answered by Pablo Lozano

Copied from the API documentation: (bold are mine)

public String[] split(String regex)
Splits this string around matches of the given regular expression. This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
The string "boo:and:foo", for example, yields the following results with these expressions:
 Regex  Result
   :    { "boo", "and", "foo" }
   o    { "b", "", ":and:f" }

Look at the second example, where the last two "o"s are simply removed. The answer to your question is that the "OCPJP" substring is treated as a run of separators that is not followed by any non-empty string, so that part is trimmed.
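
To see the trimming in the documentation's own example, one option is to compare the default call with the two-argument split and a negative limit, which keeps trailing empty strings. The class name TrailingEmptyDemo and the helper render are assumptions made for this sketch.

```java
import java.util.Arrays;

class TrailingEmptyDemo {
    // Hypothetical helper: splits with an explicit limit and renders the array.
    static String render(String input, String regex, int limit) {
        return Arrays.toString(input.split(regex, limit));
    }

    public static void main(String[] args) {
        String s = "boo:and:foo";
        // Limit 0 is what split(regex) uses: trailing empty strings are dropped.
        System.out.println(render(s, "o", 0));   // [b, , :and:f]
        // A negative limit keeps the two empty strings produced by the final "oo".
        System.out.println(render(s, "o", -1));  // [b, , :and:f, , ]
    }
}
```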

Answer 3

Answered by ajb

The reason the result is 16 and not 21 is this, from the Javadoc for split:

Trailing empty strings are therefore not included in the resulting array.

This means, for example, that if you say

"/abc//def/ghi///".split("/")

the result will have five elements. The first will be "", since it's not a trailing empty string; the others will be "abc", "", "def", and "ghi". But the remaining empty strings are removed from the array.

In the posted case:

"I am preparing for OCPJP".split("\\S")

It's the same thing. Since non-space characters are delimiters, each letter is a delimiter, but the OCPJP letters essentially don't count, because those delimiters result in trailing empty strings that are then discarded. So, since there are 15 letters in "I am preparing for", they are treated as delimiting 16 substrings (the first is "" and the last is " ").
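
The 16 versus 21 count can be checked with a short sketch like the one below. It is only an illustration: SplitLimitDemo and tokenCount are hypothetical names, and the negative limit is used here purely to make the discarded trailing empty strings countable.

```java
class SplitLimitDemo {
    // Hypothetical helper: number of tokens produced for the given limit.
    static int tokenCount(String input, String regex, int limit) {
        return input.split(regex, limit).length;
    }

    public static void main(String[] args) {
        String test = "I am preparing for OCPJP";
        // Limit 0 (the default): the five empty strings after "for " are dropped.
        System.out.println(tokenCount(test, "\\S", 0));   // 16
        // Negative limit: 20 letters delimit 21 substrings, nothing is dropped.
        System.out.println(tokenCount(test, "\\S", -1));  // 21

        String slashes = "/abc//def/ghi///";
        // The leading empty string is kept, the three trailing ones are not.
        System.out.println(tokenCount(slashes, "/", 0));   // 5
        System.out.println(tokenCount(slashes, "/", -1));  // 8
    }
}
```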
