Java: Split string when an uppercase letter is found

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/3752636/

Time: 2020-08-14 04:24:42  Source: igfitidea

java regex string

Asked by Guido

I think this is an easy question, but I am not able to find a simple solution (say, less than 10 lines of code :)

I have a String such as "thisIsMyString" and I need to convert it to a String[] {"this", "Is", "My", "String"}.

Please notice the first letter is not uppercase.

Accepted answer by axtavt

You can use a regex with a zero-width positive lookahead. It finds uppercase letters but does not include them in the delimiter:

String s = "thisIsMyString";
// The backslash must be doubled inside a Java string literal.
String[] r = s.split("(?=\\p{Upper})");

Y(?=X) matches Y followed by X, but doesn't include X in the match. So (?=\\p{Upper}) matches an empty sequence followed by an uppercase letter, and split uses it as a delimiter.
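
To make the difference concrete, here is a minimal sketch contrasting a lookahead split with a plain character-class split. The class and method names are made up for illustration, and [A-Z] is used only to keep the example ASCII-only.

```java
import java.util.Arrays;

class LookaheadSplitDemo {
    // Splits before each ASCII uppercase letter; the lookahead consumes nothing,
    // so the letter stays at the start of the next piece.
    static String[] splitKeeping(String s) {
        return s.split("(?=[A-Z])");
    }

    // Splits on each ASCII uppercase letter; the letter itself is the delimiter
    // and is dropped from the result.
    static String[] splitConsuming(String s) {
        return s.split("[A-Z]");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeeping("thisIsMyString")));   // [this, Is, My, String]
        System.out.println(Arrays.toString(splitConsuming("thisIsMyString"))); // [this, s, y, tring]
    }
}
```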

See the javadoc for more information on Java regex syntax.

EDIT: By the way, it doesn't work with thisIsMyÜberString. For non-ASCII uppercase letters you need a Unicode uppercase character class instead of the POSIX one:

String[] r = s.split("(?=\\p{Lu})");
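
The sketch below compares the two character classes on an input that contains a non-ASCII uppercase letter. The class and method names are hypothetical, and the letter Ü is written as the escape \u00DC so the example does not depend on source file encoding. It assumes the default behavior of java.util.regex, where POSIX classes such as \p{Upper} cover US-ASCII only.

```java
import java.util.Arrays;

class UnicodeUppercaseSplit {
    // POSIX class: by default matches only the ASCII letters A-Z.
    static String[] splitPosix(String s) {
        return s.split("(?=\\p{Upper})");
    }

    // Unicode general category Lu: matches any uppercase letter.
    static String[] splitUnicode(String s) {
        return s.split("(?=\\p{Lu})");
    }

    public static void main(String[] args) {
        String s = "thisIsMy\u00DCberString";
        // The POSIX version does not split before the non-ASCII letter.
        System.out.println(Arrays.toString(splitPosix(s)));
        // The Unicode version does.
        System.out.println(Arrays.toString(splitUnicode(s)));
    }
}
```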

Answer by Bozho

String[] camelCaseWords = s.split("(?=[A-Z])");

Answer by RoToRa

Since String::split takes a regular expression, you can use a look-ahead:

String[] x = "thisIsMyString".split("(?=[A-Z])");

Answer by Spigolo Vivo

Try this:

import java.util.regex.Pattern;
// Compile the pattern once and reuse it for every split.
static Pattern p = Pattern.compile("(?=\\p{Lu})");
String[] s1 = p.split("thisIsMyFirstString");
String[] s2 = p.split("thisIsMySecondString");

...

Answer by Mulder

For anyone wondering what the pattern looks like when the string to split might start with an uppercase character:

import java.util.Arrays;
String s = "ThisIsMyString";
String[] r = s.split("(?<=.)(?=\\p{Lu})");
System.out.println(Arrays.toString(r));

gives: [This, Is, My, String]
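
As a self-contained check, the following sketch wraps the same pattern in a hypothetical helper and runs it on both a proper-case and a camel-case input. The lookbehind (?<=.) requires at least one character before the split point, so no split is attempted at index 0.

```java
import java.util.Arrays;

class LeadingUppercaseSplit {
    // Split before every uppercase letter that is not the first character.
    static String[] split(String s) {
        return s.split("(?<=.)(?=\\p{Lu})");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("ThisIsMyString"))); // [This, Is, My, String]
        System.out.println(Arrays.toString(split("thisIsMyString"))); // [this, Is, My, String]
    }
}
```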

Answer by The Shoe Shiner

This regex splits before each capital letter except one at the very start of the string, so it should work for both camel case and proper case.

(?<=.)(?=(\p{Upper}))

TestText = Test, Text
thisIsATest = this, Is, A, Test

Answer by Boris

A simple Scala/Java suggestion that does not split all-uppercase strings such as NYC:

def splitAtMiddleUppercase(token: String): Iterator[String] = {
   val regex = """[\p{Lu}]*[^\p{Lu}]*""".r
   regex.findAllIn(token).filter(_ != "") // did not find a way not to produce empty strings in the regex. Open to suggestions.
}

Test with:

val examples = List("catch22", "iPhone", "eReplacement", "TotalRecall", "NYC", "JGHSD87", "interÜber")
for( example <- examples) {
   println(example + " -> "  + splitAtMiddleUppercase(example).mkString("[", ", ", "]"))
}

It produces:

    catch22 -> [catch22]
    iPhone -> [i, Phone]
    eReplacement -> [e, Replacement]
    TotalRecall -> [Total, Recall]
    NYC -> [NYC]
    JGHSD87 -> [JGHSD87]
    interÜber -> [inter, Über]

Modify the regex to cut at digits too.
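
A possible Java port of this idea is sketched below. The class name, method name, and the digit-aware pattern are assumptions made for illustration and are not part of the original answer. The first pattern is the same regex as the Scala version. The second is one way to cut at digits: it tries a run of digits first and excludes digits from the trailing run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UppercaseRunSplitter {
    // A run of uppercase letters followed by a run of anything else.
    static final Pattern RUNS = Pattern.compile("\\p{Lu}*[^\\p{Lu}]*");

    // Hypothetical variant: a run of digits becomes its own token.
    static final Pattern RUNS_AND_DIGITS = Pattern.compile("\\d+|\\p{Lu}*[^\\p{Lu}\\d]*");

    // Collect every non-empty match; the pattern can match the empty string
    // at the end of the input, so empty matches are skipped.
    static List<String> split(Pattern pattern, String token) {
        List<String> parts = new ArrayList<>();
        Matcher m = pattern.matcher(token);
        while (m.find()) {
            if (!m.group().isEmpty()) {
                parts.add(m.group());
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split(RUNS, "TotalRecall"));        // [Total, Recall]
        System.out.println(split(RUNS, "NYC"));                // [NYC]
        System.out.println(split(RUNS, "catch22"));            // [catch22]
        System.out.println(split(RUNS_AND_DIGITS, "catch22")); // [catch, 22]
        System.out.println(split(RUNS_AND_DIGITS, "JGHSD87")); // [JGHSD, 87]
    }
}
```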
