Java regular expression to match a sentence
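Disclaimer: This page is a side-by-side Chinese-English translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow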
Original URL: http://stackoverflow.com/questions/5553410/
Regular expression to match a sentence
Question by Tapas Bose
How can I match a sentence of the form "Hello world" or "Hello World"? The sentence may also contain "-", "/", and the digits 0-9. Any information would be very helpful. Thank you.
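The answers below focus on pulling sentences out of running text. If the question is read more narrowly, as checking that a whole string looks like "Hello world" where each word uses only letters, digits, "-" and "/", a minimal sketch might look like this. The class name SentenceShape, the single-space word separator, and the ASCII-only letter range are assumptions made for illustration; they are not stated in the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not from the original thread): checks whether an entire
// string is a "sentence" in the narrow sense of the question, i.e. words made of
// ASCII letters, digits, '-' or '/', separated by single spaces.
final class SentenceShape {
    // One word, then zero or more (single space + word) groups.
    // '-' is placed last inside the class so it is taken literally.
    private static final Pattern SENTENCE =
            Pattern.compile("[A-Za-z0-9/-]+(?: [A-Za-z0-9/-]+)*");

    static boolean isSentence(String s) {
        // matches() succeeds only if the pattern covers the whole input.
        return SENTENCE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"Hello world", "Hello World", "Route 66-A/B", "Hello, world"};
        for (String s : samples) {
            System.out.println(s + " -> " + isSentence(s));
        }
    }
}
```

With these assumptions the first three samples are accepted and "Hello, world" is rejected because the comma is not in the allowed character set; widen the class if other punctuation should be allowed.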
Answer by ridgerunner
This one will do a pretty good job. My definition of a sentence: a sentence begins with a non-whitespace character and ends with a period, exclamation point, or question mark (or the end of the string). There may be a closing quote following the ending punctuation.
[^.!?\s][^.!?]*(?:[.!?](?!['"]?\s|$)[^.!?]*)*[.!?]?['"]?(?=\s|$)
import java.util.regex.*;
public class TEST {
    public static void main(String[] args) {
        String subjectString =
            "This is a sentence. " +
            "So is \"this\"! And is \"this?\" " +
            "This is 'stackoverflow.com!' " +
            "Hello World";
        // COMMENTS mode ignores whitespace and #-comments inside the pattern.
        // Backslashes are doubled because the regex lives in a Java string literal.
        Pattern re = Pattern.compile(
            "# Match a sentence ending in punctuation or EOS.\n" +
            "[^.!?\\s]          # First char is non-punct, non-ws\n" +
            "[^.!?]*            # Greedily consume up to punctuation.\n" +
            "(?:                # Group for unrolling the loop.\n" +
            "  [.!?]            # (special) inner punctuation ok if\n" +
            "  (?!['\"]?\\s|$)  # not followed by ws or EOS.\n" +
            "  [^.!?]*          # Greedily consume up to punctuation.\n" +
            ")*                 # Zero or more (special normal*)\n" +
            "[.!?]?             # Optional ending punctuation.\n" +
            "['\"]?             # Optional closing quote.\n" +
            "(?=\\s|$)",
            Pattern.MULTILINE | Pattern.COMMENTS);
        Matcher reMatcher = re.matcher(subjectString);
        // Print each sentence found, one per line.
        while (reMatcher.find()) {
            System.out.println(reMatcher.group());
        }
    }
}
Here is the output:
This is a sentence.
So is "this"!
And is "this?"
This is 'stackoverflow.com!'
Hello World
Matching all of these correctly (the last sentence has no ending punctuation) turns out to be harder than it looks!
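The single-line form of the same pattern can also be wrapped in a reusable method that collects the matches into a list instead of printing them. This is only a sketch: SentenceExtractor and extractSentences are hypothetical names, and the MULTILINE flag is dropped, which makes no difference for a single-line input such as the sample string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the single-line sentence regex from the answer.
final class SentenceExtractor {
    // Same pattern as the one-line version; backslashes doubled for a Java literal.
    private static final Pattern SENTENCE = Pattern.compile(
            "[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?(?=\\s|$)");

    // Returns every sentence found by repeated find() calls, in order.
    static List<String> extractSentences(String text) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            sentences.add(m.group());
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "This is a sentence. So is \"this\"! And is \"this?\" "
                + "This is 'stackoverflow.com!' Hello World";
        extractSentences(text).forEach(System.out::println);
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call.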
Answer by krookedking
If by "sentence" you mean something that ends with a punctuation mark, try this: (.*?)[.?!]
Explanation:
.* matches any run of characters (by default, excluding line terminators). Adding a ? to get .*? makes it non-greedy: it matches the smallest string possible.
[.?!] matches any of the three punctuation marks.
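To see what the non-greedy ? changes, and where this simpler pattern differs from the stricter one in the previous answer, the sketch below runs both the greedy and non-greedy forms through Matcher.find(). The class LazyVsGreedy and the helper findAll are illustrative names, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative comparison of greedy (.*) and non-greedy (.*?) matching.
final class LazyVsGreedy {
    // Collects every substring matched by repeated find() calls.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String simple = "One. Two! Three?";
        // Greedy: .* runs to the end, then backtracks to the LAST punctuation mark,
        // so the whole string comes back as a single match.
        System.out.println(findAll("(.*)[.?!]", simple));
        // Non-greedy: .*? stops at the FIRST punctuation mark each time, giving
        // "One.", " Two!" and " Three?" (note the leading spaces).
        System.out.println(findAll("(.*?)[.?!]", simple));

        String text = "This is a sentence. So is \"this\"! And is \"this?\" "
                + "This is 'stackoverflow.com!' Hello World";
        // Brackets make leading spaces and quote placement visible.
        for (String s : findAll("(.*?)[.?!]", text)) {
            System.out.println("[" + s + "]");
        }
    }
}
```

On the sample text, (.*?)[.?!] keeps leading spaces, splits "stackoverflow.com" in two, moves the closing quote of "this?" to the start of the next match, and never matches "Hello World" because no punctuation follows it. That is fine if every sentence is known to end in punctuation, but it explains why the longer pattern above is needed for the general case.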