Original question: http://stackoverflow.com/questions/36851740/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Regex for just alphabetic characters only - Java
Asked by David Moya
Sorry I am new to Regex, but I can't seem to achieve the following with any regex I have tried so far.
We are interested in "words" (i.e. the word is wholly alphabetic containing only letters of the alphabet in upper, lower or mixed case. ALL other content is ignored)
An example String which I have been trying to work with is as follows:
To find the golden ticket you have to buy a bar of chocolate :) Charlie's Granny and Grandad are hoping he gets a ticket but he only has enough money to buy 1 bar. I printed 5 tickets but my Oompa-Loompa workers made more than 1000000 bars :)
So words like Charlie's, Oompa-Loompa and the smiley face should not be included in the output. Just the wholly alphabetic words.
I have tried using some of the examples from other questions, such as this one here, attempting to use regexes such as ^[a-zA-Z]+('[a-zA-Z]+)?$ but unfortunately, as I stated previously, I am new to Regex so I'm not too sure what I am doing. Any help would be appreciated.
Answered by Ro Yo Mi
Description
This regex will do the following:
- Assume words are made up entirely of the alphabetic characters A-Z, in upper case or lower case
- Find all words
- Ignore all strings that contain non-alphabetic characters or symbols
- Assume that trailing punctuation such as a period or comma is to be ignored, but the word preceding it should still be captured
The Regex
(?<=\s|^)[a-zA-Z]*(?=[.,;:]?\s|$)
Explanation
NODE                     EXPLANATION
----------------------------------------------------------------------
  (?<=                     look behind to see if there is:
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    ^                        start of the string
----------------------------------------------------------------------
  )                        end of look-behind
----------------------------------------------------------------------
  [a-zA-Z]*                any character of: 'a' to 'z', 'A' to 'Z'
                           (0 or more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
    [.,;:]?                  any character of: '.', ',', ';', ':'
                             (optional (matching the most amount
                             possible))
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    $                        before an optional \n, and the end of
                             the string
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
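As a rough illustration of how the two lookarounds behave, the sketch below applies the same pattern to a shortened version of the question's sentence. The class name LookaroundDemo and the helper findWords are made up for this example, and the empty-match check is an addition of this sketch (needed because [a-zA-Z]* can match zero characters), not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {
    // Same pattern as in the answer; backslashes are doubled for the Java string literal.
    private static final Pattern WORD =
            Pattern.compile("(?<=\\s|^)[a-zA-Z]*(?=[.,;:]?\\s|$)");

    // Hypothetical helper: collects every non-empty match of WORD in text.
    static List<String> findWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            // [a-zA-Z]* may match the empty string (e.g. between two spaces), so skip those hits.
            if (!m.group().isEmpty()) {
                words.add(m.group());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String sample = "Charlie's Granny has 1 bar. My Oompa-Loompa workers :)";
        // "Charlie's", "1", "Oompa-Loompa" and ":)" are rejected; "bar" is kept despite the period.
        System.out.println(findWords(sample)); // prints [Granny, has, bar, My, workers]
    }
}
```

"Loompa" is not picked up on its own because the look-behind requires whitespace or the start of the string immediately before the word, and here it is preceded by a hyphen.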
Examples
Online Regex demo
Sample Java Code
import java.util.regex.Pattern;
import java.util.regex.Matcher;

class Module1 {
    public static void main(String[] asd) {
        String sourcestring = "source string to match with pattern";
        // Backslashes must be doubled inside a Java string literal
        Pattern re = Pattern.compile("(?<=\\s|^)[a-zA-Z]*(?=[.,;:]?\\s|$)");
        Matcher m = re.matcher(sourcestring);
        int mIdx = 0;
        while (m.find()) {
            // Group 0 is the whole match; this pattern has no capture groups
            for (int groupIdx = 0; groupIdx < m.groupCount() + 1; groupIdx++) {
                System.out.println("[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
            }
            mIdx++;
        }
    }
}
Sample Captures
$matches Array:
(
[0] => Array
(
[0] => To
[1] => find
[2] => the
[3] => golden
[4] => ticket
[5] => you
[6] => have
[7] => to
[8] => buy
[9] => a
[10] => bar
[11] => of
[12] => chocolate
[13] => Granny
[14] => and
[15] => Grandad
[16] => are
[17] => hoping
[18] => he
[19] => gets
[20] => a
[21] => ticket
[22] => but
[23] => he
[24] => only
[25] => has
[26] => enough
[27] => money
[28] => to
[29] => buy
[30] => bar
[31] => I
[32] => printed
[33] => tickets
[34] => but
[35] => my
[36] => workers
[37] => made
[38] => more
[39] => than
[40] => bars
)
)
Answered by Laurel
You can use:
words.split("[ ]+");
Then, for each string in that array, the following will be true if it meets your criteria:
str.matches("[a-zA-Z]+");
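Putting the two calls together might look like the sketch below. The class name SplitAndFilter and the method alphabeticWords are hypothetical names chosen for this example; only split("[ ]+") and matches("[a-zA-Z]+") come from the answer.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitAndFilter {
    // Hypothetical helper: splits on runs of spaces, keeps tokens made only of letters.
    static List<String> alphabeticWords(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.split("[ ]+")) {
            // String.matches tests the whole token, so "Oompa-Loompa" and "5" are rejected.
            if (token.matches("[a-zA-Z]+")) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String sample = "I printed 5 tickets but my Oompa-Loompa workers made more than 1000000 bars :)";
        System.out.println(alphabeticWords(sample));
        // prints [I, printed, tickets, but, my, workers, made, more, than, bars]
    }
}
```

Because matches tests the entire token, a word followed directly by punctuation, such as "bar.", is rejected by this approach, whereas the look-ahead regex in the other answer would still capture "bar".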