Java: Constructing a regex pattern to match a sentence

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/20320719/

Time: 2020-08-13 01:03:45  Source: igfitidea

Constructing regex pattern to match sentence

java regex

Asked by user1923

I'm trying to write a regex pattern that will match any sentence that begins with one or more tabs and/or whitespace characters. For example, I want my regex pattern to be able to match " hello there I like regex!", but I'm scratching my head over how to match the words after "hello". So far I have this:


    String REGEX = "(?s)(\\p{Blank}+)([a-z][ ])*";
    Pattern PATTERN = Pattern.compile(REGEX);
    Matcher m = PATTERN.matcher("         asdsada  adf adfah.");
    if (m.matches()) {
        System.out.println("hurray!");
    }

Any help would be appreciated. Thanks.


Accepted answer by Steve P.

String regex = "^\\s+[A-Za-z,;'\"\\s]+[.?!]$";

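^ means "begins with"
\\s means a whitespace character
+ means one or more
[A-Za-z,;'"\\s] means any letter, comma, semicolon, single quote, double quote, or whitespace character
[.?!] means one sentence-ending character: ., ?, or !
$ means "ends with"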
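
To make the difference concrete, here is a minimal sketch that runs the pattern from the question and the pattern from this answer against the same input. The class and method names (LeadingWhitespaceSentence, matchesOriginal, matchesAnswer) are made up for illustration. The question's pattern fails because ([a-z][ ])* requires every single letter to be followed by exactly one space.

```java
import java.util.regex.Pattern;

public class LeadingWhitespaceSentence {
    // Pattern from the question: each lowercase letter must be followed by one space.
    static final Pattern ORIGINAL = Pattern.compile("(?s)(\\p{Blank}+)([a-z][ ])*");
    // Pattern from this answer: leading whitespace, a run of allowed characters, one terminator.
    static final Pattern ANSWER = Pattern.compile("^\\s+[A-Za-z,;'\"\\s]+[.?!]$");

    // Hypothetical helpers; matches() requires the whole input to match.
    static boolean matchesOriginal(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    static boolean matchesAnswer(String s) {
        return ANSWER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String sample = "    hello there I like regex!";
        System.out.println(matchesOriginal(sample));       // false
        System.out.println(matchesAnswer(sample));         // true
        System.out.println(matchesAnswer("hello there.")); // false: no leading whitespace
    }
}
```

Because the character class only lists letters and a few punctuation marks, input containing digits or other symbols will not match; the class would need to be widened for that.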


Answer by Taylor Hx

An example regex to match sentences by the definition "A sentence is a series of characters, starting with at least one whitespace character, that ends in one of ., !, or ?" is as follows:


\s+[^.!?]*[.!?]

Regular expression visualization


Note that newline characters will also be included in this match, because both \s and the negated class [^.!?] match line breaks.
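
As a rough illustration, the sketch below applies this pattern with Matcher.find() to pull every such sentence out of a longer text. The class name SentenceFinder and the method findSentences are hypothetical, chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceFinder {
    // Whitespace, then anything that is not a terminator, then one terminator.
    static final Pattern SENTENCE = Pattern.compile("\\s+[^.!?]*[.!?]");

    // Hypothetical helper: collects every match found in the text, in order.
    static List<String> findSentences(String text) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            sentences.add(m.group());
        }
        return sentences;
    }

    public static void main(String[] args) {
        // The second sentence contains a line break, which the pattern still matches.
        List<String> found = findSentences(" First one. Second\nline!");
        System.out.println(found.size()); // 2
        for (String s : found) {
            System.out.println("[" + s + "]");
        }
    }
}
```

Since find() searches anywhere in the input, a match can also begin at whitespace in the middle of a sentence; use matches() or add anchors if the whole string must be one sentence.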


Answer by hwnd

Based upon what you desire and asked for, the following will work.


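import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s  = "    hello there I like regex!";
// Leading whitespace, then letters and whitespace, then one terminator.
Pattern p = Pattern.compile("^\\s+[a-zA-Z\\s]+[.?!]$");
Matcher m = p.matcher(s);
if (m.matches()) {
    System.out.println("hurray!");
}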

See working demo


Answer by Ashish

If you are looking to match all strings that start with whitespace, you can try the regular expression ^\s+.* (written "^\\s+.*" as a Java string literal).
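
A minimal sketch of that check, assuming the pattern meant here is ^\s+.* (one or more whitespace characters followed by anything). The class name StartsWithWhitespace, the helper method, and the use of Pattern.DOTALL are choices made for this example, not part of the answer.

```java
import java.util.regex.Pattern;

public class StartsWithWhitespace {
    // Assumed pattern: leading whitespace, then any characters.
    // DOTALL lets '.' match line breaks so multi-line input is accepted too.
    static final Pattern LEADING_WS = Pattern.compile("^\\s+.*", Pattern.DOTALL);

    // Hypothetical helper: true if the string begins with at least one whitespace character.
    static boolean startsWithWhitespace(String s) {
        return LEADING_WS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(startsWithWhitespace("\t hello")); // true
        System.out.println(startsWithWhitespace("hello "));   // false
        System.out.println(startsWithWhitespace(" a\nb"));    // true
    }
}
```

This only tests the leading whitespace; it says nothing about whether the rest of the string is a well-formed sentence.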


This tool could help you to test your regular expression efficiently.


http://www.rubular.com/


Answer by Eloi Montanaro

String regex = "(?<=^|(\\.|!|\\?) |\n|\t|\r|\r\n) *\\(?[A-Z][^.!?]*((\\.|!|\\?)(?! |\n|\r|\r\n)[^.!?]*)*(\\.|!|\\?)(?= |\n|\r|\r\n)";

This matches any sentence that follows the definition 'a sentence starts with a capital letter and ends with a dot' (the pattern also accepts ! and ? as terminators).
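
A hedged sketch of how this pattern could be used to extract sentences with Matcher.find(). The class name CapitalizedSentences and the method extract are hypothetical. Note that the trailing lookahead requires whitespace after the terminator, so a sentence at the very end of the input is only found if a space or line break follows it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CapitalizedSentences {
    // Same pattern as above, split across two string literals for readability.
    static final Pattern SENTENCE = Pattern.compile(
        "(?<=^|(\\.|!|\\?) |\n|\t|\r|\r\n) *\\(?[A-Z][^.!?]*"
        + "((\\.|!|\\?)(?! |\n|\r|\r\n)[^.!?]*)*(\\.|!|\\?)(?= |\n|\r|\r\n)");

    // Hypothetical helper: returns each matched sentence in order of appearance.
    static List<String> extract(String text) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            sentences.add(m.group());
        }
        return sentences;
    }

    public static void main(String[] args) {
        // The final "\n" satisfies the lookahead after the last terminator.
        for (String s : extract("Hello world. This is a test! Is it?\n")) {
            System.out.println(s);
        }
    }
}
```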
