How to use JavaScript regex over multiple lines?

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/1979884/

Date: 2020-08-22 22:13:02. Source: igfitidea


javascript, regex

Question by akauppi

var ss= "<pre>aaaa\nbbb\nccc</pre>ddd";
var arr= ss.match( /<pre.*?<\/pre>/gm );
alert(arr);     // null

I want the PRE block to be picked up, even though it spans newline characters. I thought the 'm' flag would do that. It does not.
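
A minimal sketch of what the 'm' flag actually changes, using the string from the question: 'm' makes ^ and $ match at the start and end of every line, but it does not let . cross a newline, which is why the PRE block is still not found.

```javascript
const ss = "<pre>aaaa\nbbb\nccc</pre>ddd";

// Without 'm', ^ only matches at the very start of the string, where "<" is not \w.
const withoutM = ss.match(/^\w+/g);
// With 'm', ^ also matches right after each "\n".
const withM = ss.match(/^\w+/gm);
// 'm' does nothing for '.', which still refuses to cross "\n".
const preWithM = ss.match(/<pre.*?<\/pre>/gm);

console.log(withoutM); // null
console.log(withM);    // [ 'bbb', 'ccc' ]
console.log(preWithM); // null
```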

我希望 PRE 块被拾取,即使它跨越换行符。我认为 'm' 标志可以做到。才不是。

Found the answer here before posting. Since I thought I knew JavaScript (read three books, worked hours) and there wasn't an existing solution on SO, I'll dare to post anyway. Throw stones here.

在发布之前在这里找到了答案。由于我认为我了解 JavaScript(阅读三本书,工作时间)并且在 SO 上没有现有的解决方案,所以无论如何我都敢发帖。在这里扔石头

So the solution is:

所以解决办法是:

var ss= "<pre>aaaa\nbbb\nccc</pre>ddd";
var arr= ss.match( /<pre[\s\S]*?<\/pre>/gm );
alert(arr);     // <pre>...</pre> :)

Does anyone have a less cryptic way?

有没有人有一个不那么神秘的方式?

Edit: this is a duplicate, but since it's harder to find than mine, I won't remove my question.

编辑:是一个重复,但因为它比我的更难找到,我不删除。

It proposes [^] as a "multiline dot". What I still don't understand is why [.\n] does not work. I guess this is one of the sad parts of JavaScript.

它建议[^]作为“多线点”。我仍然不明白的是为什么[.\n]不起作用。猜猜这是 JavaScript 的可悲部分之一。

Accepted answer by Brian Campbell

[.\n] does not work because . has no special meaning inside of []; there it just means a literal dot. (.|\n) would be a way to specify "any character, including a newline". If you want to match all newlines, you would need to add \r as well to include Windows and classic Mac OS style line endings: (.|[\r\n]).
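
The difference between a character class and an alternation can be checked directly. This is a small sketch; the test strings are made up for illustration.

```javascript
const text = "<pre>aaaa\nbbb\nccc</pre>ddd";

// Inside [...], '.' loses its special meaning: [.\n] is "a literal dot or a newline".
const literalDotClass = /<pre[.\n]*?<\/pre>/;
// An alternation keeps '.' special: "any char except a line terminator, or a newline".
const dotOrNewline = /<pre(.|\n)*?<\/pre>/;
// Adding \r also covers Windows (\r\n) and classic Mac OS (\r) line endings.
const dotOrCrLf = /<pre(.|[\r\n])*?<\/pre>/;

console.log(literalDotClass.test(text));             // false: ">" is neither "." nor "\n"
console.log(literalDotClass.test("<pre.\n.</pre>")); // true: only dots and newlines in between
console.log(dotOrNewline.test(text));                // true
console.log(dotOrNewline.test("<pre>a\r\nb</pre>")); // false: "\r" is not matched
console.log(dotOrCrLf.test("<pre>a\r\nb</pre>"));    // true
```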

[.\n]不起作用,因为.里面没有特殊含义[],它只是一个字面意思.(.|\n)将是一种指定“任何字符,包括换行符”的方法。如果要匹配所有换行符,还需要添加\r以包含 Windows 和经典 Mac OS 样式的行结尾:(.|[\r\n]).

That turns out to be somewhat cumbersome as well as slow (see KrisWebDev's answer for details), so a better approach is to match all whitespace characters and all non-whitespace characters with [\s\S], which matches everything and is faster and simpler.
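
A sketch of the [\s\S] approach on input that mixes Windows (\r\n) and Unix (\n) line endings; the sample string is an assumption for illustration.

```javascript
const text = "<pre>aaaa\r\nbbb\nccc</pre>ddd";

// \s matches every whitespace character (including \r, \n, \u2028 and \u2029)
// and \S matches everything else, so [\s\S] matches any character at all.
const anyChar = /<pre[\s\S]*?<\/pre>/g;

const found = text.match(anyChar);
// One match: the whole PRE block, line breaks included.
console.log(found.length); // 1
```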

事实证明这有点麻烦,而且速度很慢(有关详细信息,请参阅KrisWebDev 的回答),因此更好的方法是匹配所有空白字符和所有非空白字符,使用[\s\S],它将匹配所有内容,并且速度更快更简单。

In general, you shouldn't try to use a regexp to match actual HTML tags. See, for instance, these questions for more information on why.

通常,您不应尝试使用正则表达式来匹配实际的 HTML 标签。例如,有关原因的更多信息,请参见这些问题

Instead, try actually searching the DOM for the tag you need (jQuery makes this easier, but you can always use document.getElementsByTagName("pre") with the standard DOM), and then, if you need to match against the contents, search the text content of those results with a regexp.
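
A rough sketch of that DOM-first approach. The helper name matchElementText is hypothetical, and it only assumes that each element exposes a textContent property (true for real DOM elements), so it can be tried outside a browser with plain objects.

```javascript
// Hypothetical helper: returns the text of every element whose textContent matches `pattern`.
// `elements` can be any array-like of objects with a `textContent` property,
// e.g. document.getElementsByTagName("pre") in a browser.
function matchElementText(elements, pattern) {
  // Drop the g/y flags: with them, RegExp.prototype.test is stateful (lastIndex)
  // and would give inconsistent results when reused across elements.
  const re = new RegExp(pattern.source, pattern.flags.replace(/[gy]/g, ""));
  return Array.from(elements)
    .map((el) => el.textContent)
    .filter((text) => re.test(text));
}

// In a browser:
// matchElementText(document.getElementsByTagName("pre"), /bbb/);
```

Because the regexp only ever sees the text content, it no longer has to deal with the tags themselves.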

相反,尝试在 DOM 中实际搜索您需要的标签(使用 jQuery 使这更容易,但您始终可以document.getElementsByTagName("pre")使用标准 DOM),然后如果需要匹配内容,则使用正则表达式搜索这些结果的文本内容.

Answer by KrisWebDev

DON'T use (.|[\r\n]) instead of . for multiline matching.

不要使用(.|[\r\n]),而不是.多行匹配。

DO use [\s\S] instead of . for multiline matching.

使用[\s\S]代替.多行匹配

Also, avoid greediness where it is not needed by using the *? or +? quantifiers instead of * or +. This can have a huge performance impact.
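
Besides performance, greedy and lazy quantifiers can also change what gets matched when the input contains more than one block. A small sketch with a made-up two-block string:

```javascript
const html = "<pre>one\n1</pre> text <pre>two\n2</pre>";

// Greedy: [\s\S]* runs to the end of the input, then backtracks to the LAST </pre>.
const greedy = html.match(/<pre>[\s\S]*<\/pre>/g);
// Lazy: [\s\S]*? stops at the FIRST </pre> it can reach.
const lazy = html.match(/<pre>[\s\S]*?<\/pre>/g);

console.log(greedy.length); // 1 (one match spanning both blocks)
console.log(lazy.length);   // 2 (one match per block)
```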

此外,通过使用*?+?量词代替*or来避免不需要的贪婪+。这会对性能产生巨大的影响。

See the benchmark I have made: http://jsperf.com/javascript-multiline-regexp-workarounds

请参阅我所做的基准测试:http: //jsperf.com/javascript-multiline-regexp-workarounds

Using [^]: fastest
Using [\s\S]: 0.83% slower
Using (.|\r|\n): 96% slower
Using (.|[\r\n]): 96% slower
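
To rerun a comparison like this locally, a rough timing sketch such as the following could be used. The input shape and run count are arbitrary assumptions, it relies on a global performance object (browsers, Node.js 16+), and timings vary by engine and input, so no numbers are claimed here.

```javascript
// Rough micro-benchmark of several "any character" workarounds on the same input.
const input = ("<pre>" + "line of text\r\n".repeat(100) + "</pre>tail").repeat(10);

const patterns = {
  "[^]": /<pre>[^]*?<\/pre>/g,
  "[\\s\\S]": /<pre>[\s\S]*?<\/pre>/g,
  "(.|\\r|\\n)": /<pre>(.|\r|\n)*?<\/pre>/g,
  "(.|[\\r\\n])": /<pre>(.|[\r\n])*?<\/pre>/g,
};

function timePattern(re, runs) {
  const start = performance.now();
  let count = 0;
  for (let i = 0; i < runs; i++) {
    // String.prototype.match with a /g regexp resets lastIndex to 0 itself.
    count = input.match(re).length;
  }
  return { ms: performance.now() - start, count };
}

const results = {};
for (const [name, re] of Object.entries(patterns)) {
  results[name] = timePattern(re, 100);
  console.log(name, results[name].ms.toFixed(1) + " ms", results[name].count + " matches");
}
```

All four patterns should find the same blocks; only the time taken is expected to differ.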

NB: You can also use [^], but according to the comment below it is deprecated.

注意:您也可以使用,[^]但在以下评论中已弃用。

Answer by Neek

You do not specify your environment or version of JavaScript (ECMAScript), and I realise this post is from 2009, but just for completeness: with the release of ECMAScript 2018 we can now use the s flag to make . match '\n', see https://stackoverflow.com/a/36006948/141801

您没有指定您的环境和 Javascript (ECMAscript) 版本,我意识到这篇文章是 2009 年的,但只是为了完整起见,随着 ECMA2018 的发布,我们现在可以使用该s标志.来匹配 '\n',参见https ://stackoverflow.com/a/36006948/141801

Thus:

因此:

let s = 'I am a string\nover several\nlines.';
console.log('String: "' + s + '".');

let r = /string.*several.*lines/s; // Note 's' modifier
console.log('Match? ' + r.test(s)); // 'test' returns true

This is a recent addition and will not work in many current environments; for example, Node v8.7.0 does not seem to recognise it. It does work in Chromium, though, and I'm using it in a TypeScript test I'm writing, so presumably it will become more mainstream as time goes by.
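
Since support varies between environments, one option is to detect the s flag at runtime and fall back to [\s\S] where it is missing. This is a sketch under that assumption; supportsDotAll is a hypothetical helper name, and the pattern is the one from the example above.

```javascript
// Hypothetical helper: true if this engine understands the ES2018 's' (dotAll) flag.
function supportsDotAll() {
  try {
    // Engines without dotAll support throw a SyntaxError for the unknown flag.
    return new RegExp(".", "s").test("\n");
  } catch (e) { // named catch binding, so engines older than ES2019 can still parse this
    return false;
  }
}

const dotAll = supportsDotAll();
// Use '.' with the 's' flag where available, otherwise fall back to [\s\S].
const anyChar = dotAll ? "." : "[\\s\\S]";
const r = new RegExp("string" + anyChar + "*several" + anyChar + "*lines", dotAll ? "s" : "");

const s = "I am a string\nover several\nlines.";
console.log(r.test(s)); // true either way
```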

这是最近添加的,在当前的许多环境中都不起作用,例如 Node v8.7.0 似乎无法识别它,但它可以在 Chromium 中运行,我正在编写的 Typescript 测试中使用它,大概是它随着时间的推移,将变得更加主流。

Answer by Y. Shoham

[.\n] doesn't work because a dot inside [] (by regex definition, not only in JavaScript) means the literal dot character. You can use (.|\n) (or (.|[\n\r])) instead.

[.\n]不起作用,因为 dot in [](通过正则表达式定义;不仅仅是 javascript)表示点字符。您可以使用(.|\n)(或(.|[\n\r])) 代替。

Answer by Hzzkygcs

I have tested it in Chrome and it works for me (both [^] and [^\0]): change the dot (.) to either [^\0] or [^], because the dot doesn't match line breaks (see http://www.regular-expressions.info/dot.html).

我已经测试了它(Chrome)并且它对我([^][^\0])都有效,通过将点(.)更改为[^\0][^],因为点不匹配换行符(请参阅此处:http://www.regular-expressions.info/dot.html)。

var ss= "<pre>aaaa\nbbb\nccc</pre>ddd";
var arr= ss.match( /<pre[^]*?<\/pre>/gm );
alert(arr);     // Working
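
The two variants are not quite identical: [^] is an empty negated class and matches any character, while [^\0] excludes the NUL character (U+0000). Note also that [^] is JavaScript-specific; many other regex flavors reject it or parse it differently. A small sketch with a made-up string containing NUL:

```javascript
const withNul = "<pre>a\0b\nc</pre>";

// [^] is "not nothing", i.e. any character at all, newlines included.
const anything = /<pre>[^]*?<\/pre>/;
// [^\0] is "any character except NUL", so a NUL in the text breaks the match.
const allButNul = /<pre>[^\0]*?<\/pre>/;

console.log(anything.test(withNul));  // true
console.log(allButNul.test(withNul)); // false
```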

回答by azhar22k

In addition to the examples above, here is an alternative:

除了上述例子,它是一个替代。

^[\w\s]*$

Here \w stands for word characters and \s for whitespace characters.
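
A quick check of what ^[\w\s]*$ does and does not match. Because \s includes "\n", the class crosses line breaks without any flag, but characters such as <, / and > are neither \w nor \s. The capture-group variant at the end is a hypothetical adaptation, not part of the original answer.

```javascript
const wordsAndSpaces = /^[\w\s]*$/;

console.log(wordsAndSpaces.test("aaaa\nbbb\nccc"));               // true
// Text containing tags does not match, because "<", "/" and ">" are not in the class.
console.log(wordsAndSpaces.test("<pre>aaaa\nbbb\nccc</pre>ddd")); // false

// Hypothetical adaptation: use the same class between the tags and capture the contents.
const inner = "<pre>aaaa\nbbb\nccc</pre>ddd".match(/<pre>([\w\s]*)<\/pre>/)[1];
console.log(JSON.stringify(inner)); // "aaaa\nbbb\nccc"
```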

哪里\w是单词,\s是空格