JavaScript: extract text between paragraph tags using RegEx
Original source: http://stackoverflow.com/questions/14969810/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Extract text between paragraph tags using RegEx
Asked by tonymx227
I am trying to extract the text between paragraph tags using RegExp in JavaScript, but it doesn't work...
My pattern:
<p>(.*?)</p>
Subject:
<p> My content. </p> <img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcTJ9ylGJ4SDyl49VGh9Q9an2vruuMip-VIIEG38DgGM3GvxEi_H"> <p> Second sentence. </p>
Result:
My content
What I want:
My content. Second sentence.
Answered by Explosion Pills
There is no "capture all group matches" (analogous to PHP's preg_match_all) in JavaScript, but you can cheat by using .replace:
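// Abbreviated copy of the subject string from the question
var html = '<p> My content. </p> <img src="..."> <p> Second sentence. </p>';
var matches = [];
// replace() is used only to visit every match; its return value is discarded
html.replace(/<p>(.*?)<\/p>/g, function () {
    // arguments[0] is the entire match, arguments[1] is the first capture group
    matches.push(arguments[1]);
});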
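Since this answer was written, ES2020 added String.prototype.matchAll, which returns every match together with its capture groups. The following is only a sketch for engines that support it. The subject string is abbreviated from the question, and the src value "x.png" is a placeholder, not the original URL.

```javascript
// Sample subject, abbreviated from the question (the <img> src is a placeholder).
const html = '<p> My content. </p> <img src="x.png"> <p> Second sentence. </p>';

// matchAll requires the g flag and yields one match array per hit;
// index 1 of each array is the first capture group.
const matches = Array.from(html.matchAll(/<p>(.*?)<\/p>/g), (m) => m[1]);

console.log(matches);
```

Each captured string keeps the spaces that sit inside the tags, so call trim() on the entries if the padding is unwanted.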
Answered by MikeM
To get more than one match of a pattern, add the global flag g.
The match method ignores capture groups () when matching globally, but the exec method does not. See MDN exec.
var m,
    rex = /<p>(.*?)<\/p>/g,
    str = '<p> My content. </p> <img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcTJ9ylGJ4SDyl49VGh9Q9an2vruuMip-VIIEG38DgGM3GvxEi_H"> <p> Second sentence. </p>';
while ( ( m = rex.exec( str ) ) != null ) {
    console.log( m[1] );
}
// My content.
// Second sentence.
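To get the single string the question asks for, the exec loop can be wrapped in a small function that trims each capture and joins the pieces. This is a minimal sketch: the name extractParagraphText is made up for illustration, and the src value "x.png" is a placeholder for the original image URL.

```javascript
// Hypothetical helper; the name is illustrative and not from the answers.
function extractParagraphText(str) {
    const rex = /<p>(.*?)<\/p>/g; // fresh regex per call, so lastIndex starts at 0
    const parts = [];
    let m;
    while ((m = rex.exec(str)) !== null) {
        parts.push(m[1].trim()); // drop the padding spaces inside each <p>
    }
    return parts.join(' ');
}

const subject = '<p> My content. </p> <img src="x.png"> <p> Second sentence. </p>';
console.log(extractParagraphText(subject)); // My content. Second sentence.
```

Creating the regex inside the function avoids a common pitfall with the g flag: a shared regex object remembers lastIndex between calls to exec.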
If the text between the tags may contain newlines, use [\s\S] (match any whitespace or non-whitespace character, that is, any character at all) instead of ., because . does not match line terminators.
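The difference shows up as soon as a paragraph spans two lines. The sketch below compares both patterns on a made-up input. The sample string and the helper name collect are assumptions for illustration only.

```javascript
// Made-up sample: the first paragraph contains a newline.
const multiline = '<p>First line\nsecond line</p>\n<p>Another</p>';

const withDot = /<p>(.*?)<\/p>/g;      // . does not match line terminators
const withAny = /<p>([\s\S]*?)<\/p>/g; // [\s\S] matches any character

// Hypothetical helper: gathers capture group 1 of every match via exec.
function collect(re, s) {
    const out = [];
    let m;
    re.lastIndex = 0; // start from the beginning of the string
    while ((m = re.exec(s)) !== null) {
        out.push(m[1]);
    }
    return out;
}

console.log(collect(withDot, multiline)); // [ 'Another' ]
console.log(collect(withAny, multiline)); // [ 'First line\nsecond line', 'Another' ]
```

In engines that support ES2018, adding the s (dotAll) flag to the original pattern is an alternative to [\s\S].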
Note that this kind of regex will fail on nested paragraphs as it will match up to the first closing tag.