javascript - Extract text between paragraph tags using RegEx

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/14969810/

Time: 2020-10-26 23:02:45  Source: igfitidea

Tags: javascript, regex, node.js, express

Asked by tonymx227

I'm trying to extract the text between paragraph tags using a RegExp in JavaScript, but it doesn't work...

My pattern:

<p>(.*?)</p>

Subject:

<p> My content. </p> <img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcTJ9ylGJ4SDyl49VGh9Q9an2vruuMip-VIIEG38DgGM3GvxEi_H"> <p> Second sentence. </p>

Result:

My content

What I want:

My content. Second sentence.

Answer by Explosion Pills

There was no "capture all group matches" function in JavaScript (analogous to PHP's preg_match_all) before String.prototype.matchAll was added in ES2020, but you can cheat by using .replace:

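var html = '<p> My content. </p> <img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcTJ9ylGJ4SDyl49VGh9Q9an2vruuMip-VIIEG38DgGM3GvxEi_H"> <p> Second sentence. </p>';
var matches = [];
// With the g flag, .replace calls the function once per match
html.replace(/<p>(.*?)<\/p>/g, function (match, content) {
    // match is the entire match, content is the first capture group
    matches.push(content);
    return match; // the resulting string is discarded anyway
});
// matches is now [" My content. ", " Second sentence. "]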
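
On runtimes that implement ES2020 (Node.js 12 and later, for example), String.prototype.matchAll returns an iterator over every match together with its capture groups, so the replace trick is no longer needed. The sketch below assumes such a runtime; trimming each capture and joining with a single space is just one way to produce the "My content. Second sentence." string the question asks for.

```javascript
// Assumes an ES2020+ runtime where String.prototype.matchAll exists.
const html = '<p> My content. </p> <img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcTJ9ylGJ4SDyl49VGh9Q9an2vruuMip-VIIEG38DgGM3GvxEi_H"> <p> Second sentence. </p>';

// matchAll requires the g flag and yields one match array per hit, capture groups included
const paragraphs = Array.from(html.matchAll(/<p>(.*?)<\/p>/g), (m) => m[1].trim());

// Join the trimmed paragraph texts into a single string
const text = paragraphs.join(' ');
console.log(text); // "My content. Second sentence."
```

matchAll throws a TypeError if the regex lacks the g flag, which guards against accidentally reading only the first match.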

Answer by MikeM

To get more than one match of a pattern, add the global flag g.
When matching globally, the match method returns only the full matches and drops the capture groups (), but the exec method does not. See MDN's documentation for RegExp.prototype.exec.

var m,
    rex = /<p>(.*?)<\/p>/g,
    str = '<p> My content. </p> <img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcTJ9ylGJ4SDyl49VGh9Q9an2vruuMip-VIIEG38DgGM3GvxEi_H"> <p> Second sentence. </p>';

// With the g flag, each exec call resumes at rex.lastIndex; it returns null when no match remains
while ( ( m = rex.exec( str ) ) != null ) {
    console.log( m[1] ); // m[1] is the first capture group
}

//  My content. 
//  Second sentence. 
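
The difference between match and exec under the g flag can be checked directly. This is a small sketch using a shortened version of the question's subject; the <img> tag is dropped because it sits outside both paragraphs and does not affect the result.

```javascript
const str = '<p> My content. </p> <p> Second sentence. </p>';
const rex = /<p>(.*?)<\/p>/g;

// With the g flag, String.prototype.match returns every full match but no capture groups
const fullMatches = str.match(rex);
// fullMatches: ['<p> My content. </p>', '<p> Second sentence. </p>']

// exec returns one match at a time, groups included, and advances rex.lastIndex;
// for a global regex it returns null and resets lastIndex to 0 once no match remains
const groups = [];
let m;
while ((m = rex.exec(str)) !== null) {
  groups.push(m[1]);
}
// groups: [' My content. ', ' Second sentence. ']
console.log(fullMatches, groups);
```

Because a failed global exec resets lastIndex to 0, the same regex object can safely be reused for another pass over a different string.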

If a paragraph's content may contain newlines, use [\s\S], meaning any whitespace or non-whitespace character, instead of ., because . does not match line terminators such as \n.
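
A quick check of that advice, using a hypothetical paragraph whose content contains a newline. It also shows the s (dotAll) flag, which ES2018 added so that . matches line terminators; it is an alternative to [\s\S] only on runtimes that support it.

```javascript
// Hypothetical input: one paragraph whose text spans two lines
const multiline = '<p>First line\nsecond line</p>';

// . does not match \n, so the pattern finds nothing
const withDot = multiline.match(/<p>(.*?)<\/p>/);
console.log(withDot); // null

// [\s\S] matches any character, including line terminators
const withClass = multiline.match(/<p>([\s\S]*?)<\/p>/);
console.log(JSON.stringify(withClass[1])); // "First line\nsecond line"

// ES2018+: the s (dotAll) flag lets . match line terminators as well
const withDotAll = multiline.match(/<p>(.*?)<\/p>/s);
console.log(withDotAll[1] === withClass[1]); // true
```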

Note that this kind of regex will fail on nested paragraphs as it will match up to the first closing tag.
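
The failure is easy to reproduce. The input below is a hypothetical string with one <p> written inside another (a real HTML parser would close the first paragraph rather than nest it, but the regex only sees text). The lazy .*? stops at the first </p>, so the outer capture swallows the inner opening tag and the trailing text is never captured. This sketch uses matchAll, so it assumes an ES2020+ runtime.

```javascript
// Hypothetical input: a <p> written inside another <p>
const nested = '<p>outer <p>inner</p> tail</p>';

// Collect the first capture group of every match (matchAll needs ES2020+)
const nestedMatches = Array.from(nested.matchAll(/<p>(.*?)<\/p>/g), (match) => match[1]);

// Only one match is found and it ends at the first closing tag:
// nestedMatches is ['outer <p>inner'], and " tail" is lost
console.log(nestedMatches);
```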
