Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, link to the original, and attribute it to the original authors (not me): Stack Overflow. Original: http://stackoverflow.com/questions/5642315/

Date: 2020-08-23 18:13:34  Source: igfitidea

Regular expression to get a string between two strings in Javascript

javascript, regex, string

Asked by phil

I have found very similar posts, but I can't quite get my regular expression right here.

I am trying to write a regular expression which returns a string which is between two other strings. For example: I want to get the string which resides between the strings "cow" and "milk".

My cow always gives milk

would return

"always gives"

Here is the expression I have pieced together so far:

(?=cow).*(?=milk)

However, this returns the string "cow always gives".

Answered by R. Martinho Fernandes

A lookahead (that (?= part) does not consume any input. It is a zero-width assertion (as are boundary checks and lookbehinds).

You want a regular match here, to consume the cow portion. To capture the portion in between, you use a capturing group (just put the part of the pattern you want to capture inside parentheses):

cow(.*)milk

No lookaheads are needed at all.
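
The difference can be seen by running both patterns on the sentence from the question. This is only a minimal sketch; the variable names are made up for illustration.

```javascript
const text = "My cow always gives milk";

// The lookahead (?=cow) only asserts that "cow" follows; it consumes nothing,
// so .* starts matching at the "c" of "cow" and the match includes it.
const withLookahead = text.match(/(?=cow).*(?=milk)/);
console.log(JSON.stringify(withLookahead[0])); // "cow always gives "

// A plain match consumes "cow" and "milk"; Group 1 holds what lies between.
const withGroup = text.match(/cow(.*)milk/);
console.log(JSON.stringify(withGroup[1])); // " always gives "
```

Note that the captured text still carries the surrounding spaces, since the pattern itself does not match them outside the group.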

Answered by Wiktor Stribiżew

Regular expression to get a string between two strings in JavaScript

The most complete solution that will work in the vast majority of cases is using a capturing group with a lazy dot matching pattern. However, a dot (.) in a JavaScript regex does not match line break characters, so what will work in 100% of cases is a [^] or [\s\S] / [\d\D] / [\w\W] construct.
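
As a small illustration of the line break issue, the sketch below runs the lazy dot pattern and two "any character" constructs on a made-up three-line input (the input and variable names are assumptions, not part of the original answer).

```javascript
const multiline = "cow\nalways gives\nmilk";

// "." stops at line breaks, so the lazy dot pattern finds nothing here.
const dotResult = multiline.match(/cow(.*?)milk/);
console.log(dotResult); // null

// [\s\S] matches any character, line breaks included.
const anyCharResult = multiline.match(/cow([\s\S]*?)milk/);
console.log(JSON.stringify(anyCharResult[1])); // "\nalways gives\n"

// [^] is the JavaScript-specific shorthand with the same effect.
const caretResult = multiline.match(/cow([^]*?)milk/);
console.log(caretResult[1] === anyCharResult[1]); // true
```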

ECMAScript 2018 and newer compatible solution

In JavaScript environments supporting ECMAScript 2018, the s modifier allows . to match any char including line break chars, and the regex engine supports lookbehinds of variable length. So, you may use a regex like

var result = s.match(/(?<=cow\s+).*?(?=\s+milk)/gs); // Returns multiple matches if any
// Or
var result = s.match(/(?<=cow\s*).*?(?=\s*milk)/gs); // Same but whitespaces are optional

In both cases, the current position is checked for cow followed by 1 or more (first regex) or 0 or more (second regex) whitespaces immediately before it, then any 0+ chars, as few as possible, are matched and consumed (= added to the match value), and then milk is checked for (with 1 or more / 0 or more whitespaces before this substring).
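
A runnable sketch of the first variant, assuming an engine with ES2018 lookbehind and s flag support (for example a recent Node.js). The sample string is invented so that the second occurrence spans a line break.

```javascript
const sample = "My cow always gives milk, their cow\nalso gives milk";

// (?<=cow\s+) asserts "cow" plus whitespace right before the match,
// (?=\s+milk) asserts whitespace plus "milk" right after it.
// Neither delimiter is part of the match value, so no trimming is needed.
const found = sample.match(/(?<=cow\s+).*?(?=\s+milk)/gs);
console.log(found); // [ 'always gives', 'also gives' ]
```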

Scenario 1: Single-line input

This and all other scenarios below are supported by all JavaScript environments. See usage examples at the bottom of the answer.

cow (.*?) milk

cow is found first, then a space, then any 0+ chars other than line break chars, as few as possible since *? is a lazy quantifier, are captured into Group 1, and then a space followed by milk must follow (and those are matched and consumed, too).

Scenario 2: Multiline input

cow ([\s\S]*?) milk

Here, cow and a space are matched first, then any 0+ chars, as few as possible, are matched and captured into Group 1, and then a space followed by milk is matched.

Scenario 3: Overlapping matches

If you have a string like >>>15 text>>>67 text2>>> and you need to get 2 matches in between >>> + number + whitespace and >>>, you can't use />>>\d+\s(.*?)>>>/g, as this will only find 1 match because the >>> before 67 is already consumed upon finding the first match. You may use a positive lookahead to check for the text's presence without actually "gobbling" it (i.e. appending it to the match):

/>>>\d+\s(.*?)(?=>>>)/g

See the online regex demo yielding text1 and text2 as the Group 1 contents found.
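
The effect can be reproduced locally. The sketch below uses a sample input of the same shape as the one described above and assumes String.prototype.matchAll is available (ES2020); the variable names are illustrative only.

```javascript
const input = ">>>15 text1>>>67 text2>>>";

// Consuming the closing >>> leaves nothing for the next match to start from.
const consuming = [...input.matchAll(/>>>\d+\s(.*?)>>>/g)].map(m => m[1]);
console.log(consuming); // [ 'text1' ]

// The lookahead checks for >>> without consuming it, so the same >>>
// can open the next match.
const nonConsuming = [...input.matchAll(/>>>\d+\s(.*?)(?=>>>)/g)].map(m => m[1]);
console.log(nonConsuming); // [ 'text1', 'text2' ]
```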

Also see How to get all possible overlapping matches for a string.

Performance considerations

A lazy dot matching pattern (.*?) inside regex patterns may slow down script execution if very long input is given. In many cases, the unroll-the-loop technique helps to a greater extent. Trying to grab everything between cow and milk from "Their\ncow\ngives\nmore\nmilk", we see that we just need to match all lines that are not equal to milk; thus, instead of cow\n([\s\S]*?)\nmilk we can use:

/cow\n(.*(?:\n(?!milk$).*)*)\nmilk/gm

See the regex demo (if there can be \r\n, use /cow\r?\n(.*(?:\r?\n(?!milk$).*)*)\r?\nmilk/gm). With this small test string, the performance gain is negligible, but with very large text, you will feel the difference (especially if the lines are long and line breaks are not very numerous).
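
To check that the unrolled pattern captures the same text as the lazy one, both can be run on the test string from above. This is a correctness sketch only; it does not measure performance.

```javascript
const text = "Their\ncow\ngives\nmore\nmilk";

const lazy = /cow\n([\s\S]*?)\nmilk/;
// .* takes a whole line at once; (?:\n(?!milk$).*)* adds further lines
// as long as the next line is not exactly "milk" ($ needs the m flag here).
const unrolled = /cow\n(.*(?:\n(?!milk$).*)*)\nmilk/m;

const lazyCapture = text.match(lazy)[1];
const unrolledCapture = text.match(unrolled)[1];
console.log(JSON.stringify(lazyCapture));     // "gives\nmore"
console.log(JSON.stringify(unrolledCapture)); // "gives\nmore"
```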

Sample regex usage in JavaScript:

// Single/first match expected: use no global modifier and access match[1]
console.log("My cow always gives milk".match(/cow (.*?) milk/)[1]);
// Multiple matches: get multiple matches with a global modifier and
// trim the results if the length of the leading/trailing delimiters is known
var s = "My cow always gives milk, their cow also gives milk";
// "cow " is 4 chars long and " milk" is 5 chars long
console.log(s.match(/cow (.*?) milk/g).map(function(x) {return x.slice(4, -5);}));
// Or use RegExp#exec inside a loop to collect all the Group 1 contents
var result = [], m, rx = /cow (.*?) milk/g;
while ((m=rx.exec(s)) !== null) {
  result.push(m[1]);
}
console.log(result);

Answered by entropo

Here's a regex which will grab what's between cow and milk (without leading/trailing space):

var srctext = "My cow always gives milk.";
var re = /(.*cow\s+)(.*)(\s+milk.*)/;
// Replace the whole match with the second group, i.e. the text in between
var newtext = srctext.replace(re, "$2");

An example: http://jsfiddle.net/entropo/tkP74/

Answered by Matt Ball

  • You need to capture the .*
  • You can (but don't have to) make the .* nongreedy
  • There's really no need for the lookahead.

    > /cow(.*?)milk/i.exec('My cow always gives milk');
    ["cow always gives milk", " always gives "]
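
Whether the quantifier is greedy only matters when the closing delimiter can occur more than once. A minimal sketch, using an invented sentence with two occurrences of each delimiter:

```javascript
const s = "My cow always gives milk, their cow also gives milk";

// Greedy: .* runs to the last "milk" in the string.
const greedy = /cow(.*)milk/.exec(s)[1];
console.log(JSON.stringify(greedy)); // " always gives milk, their cow also gives "

// Lazy: .*? stops at the first "milk" after "cow".
const lazy = /cow(.*?)milk/.exec(s)[1];
console.log(JSON.stringify(lazy)); // " always gives "
```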
    
Answered by phil

I was able to get what I needed using Martinho Fernandes' solution below. The code is:

var test = "My cow always gives milk";

var testRE = test.match("cow(.*)milk");
alert(testRE[1]);

You'll notice that I am indexing into the testRE variable as an array. This is because match() returns an array: element 0 is the full match and element 1 is the first captured group. The output from:

My cow always gives milk

Changes into:

always gives
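
For reference, the shape of the value returned by match() can be inspected directly. This is a small sketch based on the code above; the second input is made up to show the no-match case.

```javascript
const test = "My cow always gives milk";
const matched = test.match("cow(.*)milk"); // the string is converted to a RegExp

console.log(Array.isArray(matched));      // true
console.log(JSON.stringify(matched[0]));  // "cow always gives milk" (full match)
console.log(JSON.stringify(matched[1]));  // " always gives " (first captured group)
console.log(matched.index);               // 3

// When nothing matches, match() returns null, so guard before indexing.
const missing = "no match here".match("cow(.*)milk");
console.log(missing); // null
```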

Answered by duduwe

The chosen answer didn't work for me...hmm...

Just add space after cow and/or before milk to trim spaces from " always gives "

/(?<=cow ).*(?= milk)/

Answered by Brandon

Just use the following regular expression:

(?<=My cow\s).*?(?=\smilk)

Answered by Chase Oliphant

I find regex to be tedious and time consuming given the syntax. Since you are already using javascript it is easier to do the following without regex:

const text = 'My cow always gives milk'
const start = `cow`;
const end = `milk`;
// split() leaves the surrounding spaces in place, so trim them
const middleText = text.split(start)[1].split(end)[0].trim()
console.log(middleText) // prints "always gives"
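
One caveat with the split approach is that it throws when the start delimiter is missing, because split(start)[1] is then undefined. A possible alternative is sketched below with indexOf; the helper name between is hypothetical and not part of the original answer.

```javascript
// Hypothetical helper: returns the text between the first `start` and the
// next `end` after it, or null if either delimiter is missing.
function between(text, start, end) {
  const from = text.indexOf(start);
  if (from === -1) return null;
  const contentStart = from + start.length;
  const to = text.indexOf(end, contentStart);
  if (to === -1) return null;
  return text.slice(contentStart, to).trim();
}

console.log(between("My cow always gives milk", "cow", "milk"));  // always gives
console.log(between("My goat always gives milk", "cow", "milk")); // null
```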

Answered by Naresh Kumar

If the data is on multiple lines then you may have to use the following,

/My cow ([\s\S]*)milk/gm

My cow always gives 
milk

Regex 101 example

Answered by Vasily Bodnarchuk

Task

Extract the substring between two strings (excluding those two strings)

Solution

let allText = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum";
let textBefore = "five centuries,";
let textAfter = "electronic typesetting";
// Backslashes must be doubled inside a template literal, otherwise \s becomes a plain "s"
var regExp = new RegExp(`(?<=${textBefore}\\s)(.+?)(?=\\s+${textAfter})`, "g");
var results = regExp.exec(allText);
if (results && results.length > 1) {
    console.log(results[0]);
}
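
When the delimiters come from variables, any regex metacharacters in them (such as parentheses or +) would be interpreted as pattern syntax. A minimal sketch of one way to guard against that follows; escapeRegExp is a hypothetical helper and the sample text is invented. It assumes an engine with lookbehind support.

```javascript
// Hypothetical helper: escapes characters that have a special meaning in a regex.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const sampleText = "Price (net): 10 USD + tax. Total: 12 USD";
const before = "(net):";
const after = "+ tax";

// Both delimiters are escaped before being interpolated into the pattern.
const pattern = new RegExp(
  `(?<=${escapeRegExp(before)}\\s)(.+?)(?=\\s+${escapeRegExp(after)})`
);
const hit = pattern.exec(sampleText);
console.log(hit && hit[0]); // 10 USD
```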