Java: Split string when an uppercase letter is found
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors: Stack Overflow
Original URL: http://stackoverflow.com/questions/3752636/
Asked by Guido
I think this is an easy question, but I am not able to find a simple solution (say, fewer than 10 lines of code :)
I have a String such as "thisIsMyString" and I need to convert it to a String[] {"this", "Is", "My", "String"}.
Please notice the first letter is not uppercase.
Accepted answer by axtavt
You can use a regex with a zero-width positive lookahead. It finds uppercase letters but doesn't include them in the delimiter:
String s = "thisIsMyString";
String[] r = s.split("(?=\\p{Upper})");
Y(?=X) matches Y followed by X, but doesn't include X in the match. So (?=\\p{Upper}) matches an empty sequence followed by an uppercase letter, and split uses it as a delimiter.
See the Javadoc for more info on Java regex syntax.
EDIT: By the way, it doesn't work with thisIsMyÜberString. For non-ASCII uppercase letters you need a Unicode uppercase character class instead of the POSIX one:
String[] r = s.split("(?=\\p{Lu})");
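To make the difference between the two character classes concrete, here is a minimal sketch that splits the same input with each of them. The class name `CamelCaseSplit`, the helper `splitBefore`, and the sample input with an uppercase Ü (written as the escape `\u00DC`) are assumptions made for illustration, not part of the original answer.

```java
import java.util.Arrays;

class CamelCaseSplit {
    // Hypothetical helper: splits before every character matched by the
    // given character class, using a zero-width positive lookahead.
    static String[] splitBefore(String input, String upperClass) {
        return input.split("(?=" + upperClass + ")");
    }

    public static void main(String[] args) {
        // U+00DC is the capital U with diaeresis.
        String s = "thisIsMy\u00DCberString";

        // \p{Upper} is ASCII-only by default, so there is no split before U+00DC.
        System.out.println(Arrays.toString(splitBefore(s, "\\p{Upper}")));

        // \p{Lu} covers every Unicode uppercase letter.
        System.out.println(Arrays.toString(splitBefore(s, "\\p{Lu}")));
    }
}
```

With `\\p{Upper}` the word starting with Ü stays glued to "My", while `\\p{Lu}` splits it off as its own element.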
Answer by Bozho
String[] camelCaseWords = s.split("(?=[A-Z])");
Answer by RoToRa
Since String::split takes a regular expression, you can use a look-ahead:
String[] x = "thisIsMyString".split("(?=[A-Z])");
Answer by Spigolo Vivo
Try this:
static Pattern p = Pattern.compile("(?=\\p{Lu})");
String[] s1 = p.split("thisIsMyFirstString");
String[] s2 = p.split("thisIsMySecondString");
...
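The point of this answer appears to be compiling the pattern once and reusing it for many inputs. A self-contained sketch of that idea could look like the following; the class name `CamelCaseSplitter`, the constant `BEFORE_UPPERCASE`, and the method `split` are illustrative names, not taken from the answer.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class CamelCaseSplitter {
    // Compiled once and shared; Pattern instances are immutable and thread-safe.
    private static final Pattern BEFORE_UPPERCASE = Pattern.compile("(?=\\p{Lu})");

    // Hypothetical wrapper around Pattern.split for readability.
    static String[] split(String input) {
        return BEFORE_UPPERCASE.split(input);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("thisIsMyFirstString")));
        System.out.println(Arrays.toString(split("thisIsMySecondString")));
    }
}
```

`String.split` takes a regex string on every call, so a precompiled `Pattern` is mostly worth it when the same split runs many times.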
Answer by Mulder
For anyone who wonders what the pattern looks like when the String to split might start with an uppercase character:
String s = "ThisIsMyString";
String[] r = s.split("(?<=.)(?=\\p{Lu})");
System.out.println(Arrays.toString(r));
Gives: [This, Is, My, String]
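The lookbehind `(?<=.)` requires at least one character before each split point, so no split can happen at index 0. A small sketch of this, with the class name `LeadingUpperSplit` and method `split` chosen only for illustration:

```java
import java.util.Arrays;

class LeadingUpperSplit {
    // Splits before each uppercase letter, but only if some character precedes it.
    static String[] split(String input) {
        return input.split("(?<=.)(?=\\p{Lu})");
    }

    public static void main(String[] args) {
        // Input starting with an uppercase letter: no empty leading element.
        System.out.println(Arrays.toString(split("ThisIsMyString")));
        // Input starting with a lowercase letter behaves as before.
        System.out.println(Arrays.toString(split("thisIsMyString")));
    }
}
```

As far as I know, since Java 8 `String.split` no longer returns an empty leading element for a zero-width match at the start of the input, so the lookbehind mainly matters on older runtimes.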
Answer by The Shoe Shiner
This regex will split on capitals, omitting the first. So it should work for camel case and proper case.
(?<=.)(?=(\p{Upper}))
TestText = Test, Text
thisIsATest = this, Is, A, Test
Answer by Boris
A simple Scala/Java suggestion that does not split all-uppercase strings like NYC:
def splitAtMiddleUppercase(token: String): Iterator[String] = {
val regex = """[\p{Lu}]*[^\p{Lu}]*""".r
regex.findAllIn(token).filter(_ != "") // did not find a way not to produce empty strings in the regex. Open to suggestions.
}
Test with:
val examples = List("catch22", "iPhone", "eReplacement", "TotalRecall", "NYC", "JGHSD87", "interÜber")
for (example <- examples) {
  println(example + " -> " + splitAtMiddleUppercase(example).mkString("[", ", ", "]"))
}
It produces:
catch22 -> [catch22]
iPhone -> [i, Phone]
eReplacement -> [e, Replacement]
TotalRecall -> [Total, Recall]
NYC -> [NYC]
JGHSD87 -> [JGHSD87]
interÜber -> [inter, Über]
Modify the regex to cut at digits too.
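One possible way to do that in Java is sketched below. This is an assumption about what "cut at digits" should mean (digit runs become their own tokens), not the answer author's regex. The class name `TokenSplitter`, the constant `TOKEN`, and the method `splitAtUppercaseAndDigits` are made up for the example. Every alternative in the pattern needs at least one character, so no empty strings have to be filtered out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenSplitter {
    // Three alternatives, tried in order:
    //   1. optional uppercase run followed by non-uppercase, non-digit characters
    //   2. a bare uppercase run (e.g. "NYC")
    //   3. a run of ASCII digits
    private static final Pattern TOKEN =
            Pattern.compile("\\p{Lu}*[^\\p{Lu}\\d]+|\\p{Lu}+|\\d+");

    // Collects every successive match instead of splitting on a delimiter.
    static List<String> splitAtUppercaseAndDigits(String token) {
        List<String> parts = new ArrayList<>();
        Matcher m = TOKEN.matcher(token);
        while (m.find()) {
            parts.add(m.group());
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] examples = {"catch22", "iPhone", "TotalRecall", "NYC", "JGHSD87"};
        for (String example : examples) {
            System.out.println(example + " -> " + splitAtUppercaseAndDigits(example));
        }
    }
}
```

Under this reading, catch22 becomes [catch, 22] and JGHSD87 becomes [JGHSD, 87], while NYC still stays in one piece.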