JavaScript RegExp for splitting text into sentences and keeping the delimiter

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/11761563/

Date: 2020-08-24 07:00:41  Source: igfitidea

Tags: javascript, regex, sentence

Asked by daktau

I am trying to use JavaScript's split to get the sentences out of a string while keeping the delimiters (e.g. ! ? .).

So far I have:

sentences = text.split(/[\.!?]/);

This works, but the result does not include the ending punctuation of each sentence (. ! ?).
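
For illustration, here is a minimal sketch of what that split call returns. The sample text is hypothetical and not part of the original question.

```javascript
// Hypothetical sample text, not from the original question
const text = "I like turtles. Do you? Awesome!";

// split() discards whatever the separator regex matched
const sentences = text.split(/[\.!?]/);

console.log(sentences);
// ["I like turtles", " Do you", " Awesome", ""]
```

The trailing empty string appears because the text ends with a delimiter.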

Does anyone know of a way to do this?

Answered by Larry Battle

You need to use match, not split.

Try this.

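var str = "I like turtles. Do you? Awesome! hahaha. lol!!! What's going on????";
// One or more non-terminator characters, followed by one or more terminators (. ! ?)
var result = str.match( /[^\.!\?]+[\.!\?]+/g );

// Each sentence keeps its punctuation (and its leading space)
var expect = ["I like turtles.", " Do you?", " Awesome!", " hahaha.", " lol!!!", " What's going on????"];
console.log( result.join(" ") === expect.join(" ") ); // true
console.log( result.length === 6 ); // true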
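
As a rough sketch, the same pattern can be wrapped in a small helper. The name splitSentences, the null fallback, and the trimming step are assumptions added here and are not part of the answer above.

```javascript
// splitSentences is a hypothetical helper name, not part of the answer above
function splitSentences(text) {
  // match() returns null when nothing matches, so fall back to an empty array
  const matches = text.match(/[^.!?]+[.!?]+/g) || [];
  // Drop the leading space that each match after the first carries
  return matches.map(function (s) { return s.trim(); });
}

console.log(splitSentences("I like turtles. Do you? Awesome!"));
// ["I like turtles.", "Do you?", "Awesome!"]
console.log(splitSentences("no terminator here"));
// []
```

Note that text without any closing punctuation yields no matches at all with this pattern.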

Answered by mircealungu

The following is a small addition to Larry's answer that also matches parenthetical sentences:

text.match(/\(?[^\.\?\!]+[\.!\?]\)?/g);

applied to:

text = "If he's restin', I'll wake him up! (Shouts at the cage.) " +
       "'Ello, Mister Polly Parrot! (Owner hits the cage.) There, he moved!!!";

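gives:

["If he's restin', I'll wake him up!", " (Shouts at the cage.)",
" 'Ello, Mister Polly Parrot!", " (Owner hits the cage.)", " There, he moved!"]

Note that the pattern ends each sentence with a single [\.!\?], so only the first of the three trailing exclamation marks is kept.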
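
If runs of terminators such as !!! should stay with their sentence, one option is to add + after the terminator class. The sketch below compares both patterns; the variant with + is an assumption added here, not part of the answer.

```javascript
const text = "If he's restin', I'll wake him up! (Shouts at the cage.) " +
  "'Ello, Mister Polly Parrot! (Owner hits the cage.) There, he moved!!!";

// Single terminator, as in the answer: the trailing "!!" is left unmatched
const single = text.match(/\(?[^\.\?\!]+[\.!\?]\)?/g);

// Assumed variant: "+" lets the terminator class consume a whole run
const run = text.match(/\(?[^\.\?\!]+[\.!\?]+\)?/g);

console.log(single[single.length - 1]); // " There, he moved!"
console.log(run[run.length - 1]);       // " There, he moved!!!"
```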

Answered by rgvcorley

Try this instead:

sentences = text.split(/[\.!\?]/);

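? is a special character in regular expressions, so it normally needs to be escaped. (Inside a character class such as [...] the escape is optional, so this pattern behaves the same as the one in the question.)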
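
As a quick check, here is a small sketch comparing the escaped and unescaped forms inside a character class. The sample text is hypothetical.

```javascript
// Hypothetical sample text
const text = "Do you? Awesome! Yes.";

// Inside [...] the characters . and ? are literal with or without a backslash
const escaped = text.split(/[\.!\?]/);
const unescaped = text.split(/[.!?]/);

console.log(escaped); // ["Do you", " Awesome", " Yes", ""]
console.log(JSON.stringify(escaped) === JSON.stringify(unescaped)); // true
```

Both forms split identically, and neither keeps the delimiters.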

Sorry, I misread your question. If you want to keep the delimiters, then you need to use match, not split; see this question.

Answered by Mia Chen

A slight improvement on mircealungu's answer:

string.match(/[^.?!]+[.!?]+[\])'"`'”]*/g);
  • There's no need for the opening parenthesis at the beginning, since [^.?!]+ already matches it.
  • Runs of punctuation such as '...', '!!!', '!?', etc. are kept inside the sentence they end.
  • Any number of closing square brackets and closing parentheses are included. [Edit: different closing quotation marks added]
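
The points above can be sketched on a short sample. The sample text is hypothetical, and the closing class here is written with the curly quotes ’ and ” as an assumption about which closing quotation marks are intended.

```javascript
// Hypothetical sample text with quotes, an ellipsis, and parentheses
const text = 'She yelled "Stop!" Then she left... (Was it planned?) Nobody knows.';

// Body, then a run of terminators, then any closing brackets or quotes
const sentences = text.match(/[^.?!]+[.!?]+[\])'"`’”]*/g);

console.log(sentences);
// ['She yelled "Stop!"', ' Then she left...', ' (Was it planned?)', ' Nobody knows.']
```

The closing quote after "Stop!" and the closing parenthesis after "planned?" both stay attached to their sentences.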