Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/36851740/

Regex for just alphabetic characters only - Java

Tags: java, regex

Asked by David Moya

Sorry I am new to Regex, but I can't seem to achieve the following with any regex I have tried so far.

We are interested in "words" (i.e. the word is wholly alphabetic containing only letters of the alphabet in upper, lower or mixed case. ALL other content is ignored)

An example String which I have been trying to work with is as follows:

To find the golden ticket you have to buy a bar of chocolate :) Charlie's Granny and Grandad are hoping he gets a ticket but he only has enough money to buy 1 bar. I printed 5 tickets but my Oompa-Loompa workers made more than 1000000 bars :)

So words like Charlie's, Oompa-Loompa and the smiley face should not be included in the output. Just the wholly alphabetic words.

I have tried some of the examples from other questions, such as this one here, attempting to use regexes such as ^[a-zA-Z]+('[a-zA-Z]+)?$, but unfortunately, as I stated previously, I am new to Regex, so I'm not too sure what I am doing. Any help would be appreciated.

Answered by Ro Yo Mi

Description

This regex will do the following:

  • Assume words are made up entirely of the alphabetical characters A-Z, in upper or lower case
  • Find all words
  • Ignore all strings that contain non-alphabetical characters or symbols
  • Assume trailing punctuation such as a period or comma is to be ignored, but the preceding word should still be captured

The Regex

(?<=\s|^)[a-zA-Z]*(?=[.,;:]?\s|$)

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  (?<=                     look behind to see if there is:
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    ^                        start of the string
----------------------------------------------------------------------
  )                        end of look-behind
----------------------------------------------------------------------
  [a-zA-Z]*                any character of: 'a' to 'z', 'A' to 'Z'
                           (0 or more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
    [.,;:]?                  any character of: '.', ',', ';', ':'
                             (optional (matching the most amount
                             possible))
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    $                        before an optional \n, and the end of
                             the string
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------

Examples

Online Regex demo

http://fiddle.re/65eqna

Sample Java Code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Module1 {
  public static void main(String[] args) {
    String sourcestring = "source string to match with pattern";
    // Backslashes must be doubled inside a Java string literal.
    Pattern re = Pattern.compile("(?<=\\s|^)[a-zA-Z]*(?=[.,;:]?\\s|$)");
    Matcher m = re.matcher(sourcestring);
    int mIdx = 0;
    while (m.find()) {
      // Group 0 is the whole match; this pattern has no capturing groups.
      for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
        System.out.println("[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
      }
      mIdx++;
    }
  }
}

Sample Captures

$matches Array:
(
    [0] => Array
        (
            [0] => To
            [1] => find
            [2] => the
            [3] => golden
            [4] => ticket
            [5] => you
            [6] => have
            [7] => to
            [8] => buy
            [9] => a
            [10] => bar
            [11] => of
            [12] => chocolate
            [13] => Granny
            [14] => and
            [15] => Grandad
            [16] => are
            [17] => hoping
            [18] => he
            [19] => gets
            [20] => a
            [21] => ticket
            [22] => but
            [23] => he
            [24] => only
            [25] => has
            [26] => enough
            [27] => money
            [28] => to
            [29] => buy
            [30] => bar
            [31] => I
            [32] => printed
            [33] => tickets
            [34] => but
            [35] => my
            [36] => workers
            [37] => made
            [38] => more
            [39] => than
            [40] => bars
        )

)
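
For anyone who wants to reproduce the captures listed above in Java, here is a minimal sketch that runs the answer's regex over the example sentence from the question and collects the matches into a list. The class name, constant names and the findWords helper are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SampleWords {
  // Regex from the answer, with backslashes doubled for the Java string literal.
  static final Pattern WORD = Pattern.compile("(?<=\\s|^)[a-zA-Z]*(?=[.,;:]?\\s|$)");

  // Example sentence copied from the question.
  static final String SAMPLE =
      "To find the golden ticket you have to buy a bar of chocolate :) "
      + "Charlie's Granny and Grandad are hoping he gets a ticket but he only has "
      + "enough money to buy 1 bar. I printed 5 tickets but my Oompa-Loompa workers "
      + "made more than 1000000 bars :)";

  // Collects every match of WORD in the given text, in order of appearance.
  static List<String> findWords(String text) {
    List<String> words = new ArrayList<>();
    Matcher m = WORD.matcher(text);
    while (m.find()) {
      words.add(m.group());
    }
    return words;
  }

  public static void main(String[] args) {
    List<String> words = findWords(SAMPLE);
    System.out.println(words.size() + " words: " + words);
  }
}
```

On this input the list should line up with the 41 entries shown in the captures: "Charlie's", "Oompa-Loompa", the numbers and the smileys are skipped, while "bar." contributes "bar".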

Answered by Laurel

You can use:

words.split("[ ]+");

Then, for each string in that array, the following will be true if it meets your criteria:

str.matches("[a-zA-Z]+");
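
Put together, the two calls above might look like the following sketch. The class name, method name and test sentence are assumptions made for illustration; only split("[ ]+") and matches("[a-zA-Z]+") come from the answer.

```java
import java.util.ArrayList;
import java.util.List;

class SplitAndFilter {
  // Splits on runs of spaces and keeps only the tokens made up entirely of letters.
  static List<String> alphabeticWords(String words) {
    List<String> out = new ArrayList<>();
    for (String str : words.split("[ ]+")) {
      // String.matches tests the whole token, so no ^ or $ anchors are needed.
      if (str.matches("[a-zA-Z]+")) {
        out.add(str);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(alphabeticWords("buy 1 bar. I printed 5 tickets"));
    // [buy, I, printed, tickets]
  }
}
```

Note that a token with trailing punctuation such as "bar." is rejected outright here, whereas the look-ahead regex in the other answer would still capture "bar". Which behaviour is wanted depends on how strictly "wholly alphabetic" is read.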