Original source: http://stackoverflow.com/questions/5642315/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Regular expression to get a string between two strings in Javascript
Asked by phil
I have found very similar posts, but I can't quite get my regular expression right here.
I am trying to write a regular expression which returns a string which is between two other strings. For example: I want to get the string which resides between the strings "cow" and "milk".
My cow always gives milk
would return
"always gives"
Here is the expression I have pieced together so far:
(?=cow).*(?=milk)
However, this returns the string "cow always gives".
Answered by R. Martinho Fernandes
A lookahead (that (?= part) does not consume any input. It is a zero-width assertion (as are boundary checks and lookbehinds).
You want a regular match here, to consume the cow portion. To capture the portion in between, you use a capturing group (just put the portion of the pattern you want to capture inside parentheses):
cow(.*)milk
No lookaheads are needed at all.
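As a small sketch of the difference, the snippet below runs the question's lookahead-only pattern next to the capturing-group pattern on the question's sample sentence. The variable names are illustrative only, and the final trim() is an extra step added here to drop the spaces around the captured text.

```javascript
// Sample sentence taken from the question.
const sentence = "My cow always gives milk";

// Lookahead-only attempt: (?=cow) asserts "cow" without consuming it,
// so the match itself still starts at "cow".
const lookaheadMatch = sentence.match(/(?=cow).*(?=milk)/);

// Capturing-group version: "cow" and "milk" are consumed by the match,
// and only the text inside the parentheses ends up in Group 1.
const groupMatch = sentence.match(/cow(.*)milk/);

console.log(JSON.stringify(lookaheadMatch[0]));     // "cow always gives "
console.log(JSON.stringify(groupMatch[1]));         // " always gives "
console.log(JSON.stringify(groupMatch[1].trim()));  // "always gives"
```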
Answered by Wiktor Stribiżew
Regular expression to get a string between two strings in JavaScript
The most complete solution that will work in the vast majority of cases is using a capturing group with a lazy dot matching pattern. However, a dot . in a JavaScript regex does not match line break characters, so what will work in 100% of cases is a [^] or [\s\S] / [\d\D] / [\w\W] construct.
ECMAScript 2018 and newer compatible solution
In JavaScript environments supporting ECMAScript 2018, the s modifier allows . to match any char including line break chars, and the regex engine supports lookbehinds of variable length. So, you may use a regex like
var result = s.match(/(?<=cow\s+).*?(?=\s+milk)/gs); // Returns multiple matches if any
// Or
var result = s.match(/(?<=cow\s*).*?(?=\s*milk)/gs); // Same but whitespaces are optional
In both cases, the current position is checked for cow followed by whitespace (1 or more whitespace chars in the first pattern, 0 or more in the second), then any 0+ chars, as few as possible, are matched and consumed (= added to the match value), and then milk is checked for (preceded by 1 or more, or 0 or more, whitespace chars respectively).
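A minimal sketch of the first pattern in action, assuming an engine with ES2018 lookbehind and s flag support. The sample text and variable names are made up for illustration and do not come from the original answer.

```javascript
// Hypothetical sample text: two cow...milk pairs, the first spanning a line break.
const sampleText = "cow always\ngives milk and another cow rarely gives milk";

// g: collect every match; s: let . match line break chars as well.
const betweenCowAndMilk = sampleText.match(/(?<=cow\s+).*?(?=\s+milk)/gs);

// Because the delimiters are only asserted by the lookarounds,
// each match value is exactly the text in between.
console.log(betweenCowAndMilk); // [ 'always\ngives', 'rarely gives' ]
```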
Scenario 1: Single-line input
This and all other scenarios below are supported by all JavaScript environments. See usage examples at the bottom of the answer.
cow (.*?) milk
cow is found first, then a space, then any 0+ chars other than line break chars, as few as possible because *? is a lazy quantifier, are captured into Group 1, and then a space with milk must follow (and those are matched and consumed, too).
Scenario 2: Multiline input
cow ([\s\S]*?) milk
Here, cow and a space are matched first, then any 0+ chars, as few as possible, are matched and captured into Group 1, and then a space with milk is matched.
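To illustrate why the dot has to be replaced for multiline input, here is a short sketch comparing the Scenario 1 and Scenario 2 patterns on the same string. The input string and variable names are assumptions made for this example.

```javascript
// Hypothetical input with a line break between the delimiters.
const multilineInput = "My cow always\ngives milk";

// . cannot cross the line break, so the Scenario 1 pattern finds nothing.
const dotAttempt = multilineInput.match(/cow (.*?) milk/);

// [\s\S] matches any char, line breaks included.
const anyCharAttempt = multilineInput.match(/cow ([\s\S]*?) milk/);

console.log(dotAttempt);                        // null
console.log(JSON.stringify(anyCharAttempt[1])); // "always\ngives"
```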
Scenario 3: Overlapping matches
If you have a string like >>>15 text>>>67 text2>>> and you need to get 2 matches in between >>> + number + whitespace and >>>, you can't use />>>\d+\s(.*?)>>>/g as this will only find 1 match because the >>> before 67 is already consumed upon finding the first match. You may use a positive lookahead to check for the text presence without actually "gobbling" it (i.e. appending it to the match):
/>>>\d+\s(.*?)(?=>>>)/g
See the online regex demo yielding text and text2 as the Group 1 contents found.
Also see How to get all possible overlapping matches for a string.
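The effect of consuming versus merely asserting the trailing >>> can be sketched as follows. The collectGroup1 helper is a hypothetical name introduced here, and the sketch assumes String.prototype.matchAll (ES2020) is available; a RegExp#exec loop would work equally well in older environments.

```javascript
const delimited = ">>>15 text>>>67 text2>>>";

// Hypothetical helper: collects the Group 1 contents of every match.
function collectGroup1(regex, input) {
  return Array.from(input.matchAll(regex), (m) => m[1]);
}

// The trailing >>> is consumed, so it is no longer available
// as the opening delimiter of the second item.
const consuming = collectGroup1(/>>>\d+\s(.*?)>>>/g, delimited);

// The trailing >>> is only asserted, so the next match can start at it.
const asserting = collectGroup1(/>>>\d+\s(.*?)(?=>>>)/g, delimited);

console.log(consuming); // [ 'text' ]
console.log(asserting); // [ 'text', 'text2' ]
```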
Performance considerations
A lazy dot matching pattern (.*?) inside regex patterns may slow down script execution if very long input is given. In many cases, the unroll-the-loop technique helps to a greater extent. Trying to grab everything between cow and milk from "Their\ncow\ngives\nmore\nmilk", we see that we just need to match all lines that are not exactly milk; thus, instead of cow\n([\s\S]*?)\nmilk we can use:
/cow\n(.*(?:\n(?!milk$).*)*)\nmilk/gm
See the regex demo (if there can be \r\n, use /cow\r?\n(.*(?:\r?\n(?!milk$).*)*)\r?\nmilk/gm). With this small test string, the performance gain is negligible, but with very large text, you will feel the difference (especially if the lines are long and line breaks are not very numerous).
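As a quick check that the unrolled pattern captures the same text as the lazy one, the sketch below runs both on the test string from this section. It only compares results and makes no attempt to measure speed; the variable names are illustrative.

```javascript
const lines = "Their\ncow\ngives\nmore\nmilk";

// Lazy version: expands one char at a time, testing for \nmilk after each step.
const lazyPattern = /cow\n([\s\S]*?)\nmilk/;

// Unrolled version: consumes whole lines, and only checks at each
// line break whether the next line is exactly "milk" (m makes $ match at line ends).
const unrolledPattern = /cow\n(.*(?:\n(?!milk$).*)*)\nmilk/m;

const lazyResult = lines.match(lazyPattern)[1];
const unrolledResult = lines.match(unrolledPattern)[1];

console.log(JSON.stringify(lazyResult));     // "gives\nmore"
console.log(JSON.stringify(unrolledResult)); // "gives\nmore"
```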
Sample regex usage in JavaScript:
// Single/First match expected: use no global modifier and access match[1]
console.log("My cow always gives milk".match(/cow (.*?) milk/)[1]);
// Multiple matches: get multiple matches with a global modifier and
// trim the results if length of leading/trailing delimiters is known
var s = "My cow always gives milk, their cow also gives milk";
console.log(s.match(/cow (.*?) milk/g).map(function(x) {return x.substr(4,x.length-9);}));
// or use RegExp#exec inside a loop to collect all the Group 1 contents
var result = [], m, rx = /cow (.*?) milk/g;
while ((m=rx.exec(s)) !== null) {
  result.push(m[1]);
}
console.log(result);
Answered by entropo
Here's a regex which will grab what's between cow and milk (without leading/trailing space):
var srctext = "My cow always gives milk.";
var re = /(.*cow\s+)(.*)(\s+milk.*)/;
var newtext = srctext.replace(re, "$2"); // keep only Group 2: "always gives"
An example: http://jsfiddle.net/entropo/tkP74/
Answered by Matt Ball
- You need to capture the .*
- You can (but don't have to) make the .* nongreedy
There's really no need for the lookahead.
> /cow(.*?)milk/i.exec('My cow always gives milk');
["cow always gives milk", " always gives "]
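Whether the nongreedy quantifier matters depends on the input. The following sketch uses a made-up sentence containing two cow...milk pairs to show where the greedy and nongreedy versions differ; on the question's single-pair sentence both capture the same text.

```javascript
// Hypothetical input with two cow...milk pairs.
const twoPairs = "My cow always gives milk, their cow also gives milk";

// Greedy: .* runs to the end and backtracks to the LAST "milk".
const greedyGroup = /cow(.*)milk/.exec(twoPairs)[1];

// Nongreedy: .*? stops at the FIRST "milk" after "cow".
const lazyGroup = /cow(.*?)milk/.exec(twoPairs)[1];

console.log(JSON.stringify(greedyGroup)); // " always gives milk, their cow also gives "
console.log(JSON.stringify(lazyGroup));   // " always gives "
```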
Answered by phil
I was able to get what I needed using Martinho Fernandes' solution below. The code is:
var test = "My cow always gives milk";
var testRE = test.match("cow(.*)milk");
alert(testRE[1]);
You'll notice that I am alerting testRE[1], not testRE itself. This is because match returns an array: index 0 holds the whole match and index 1 holds the first capturing group. The output from:
My cow always gives milk
Changes into:
always gives
Answered by duduwe
Answered by Brandon
Just use the following regular expression:
(?<=My cow\s).*?(?=\smilk)
Answered by Chase Oliphant
I find regex to be tedious and time-consuming given the syntax. Since you are already using JavaScript, it is easier to do the following without regex:
const text = 'My cow always gives milk'
const start = `cow`;
const end = `milk`;
const middleText = text.split(start)[1].split(end)[0].trim(); // trim the surrounding spaces
console.log(middleText); // prints "always gives"
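One caveat with the split approach: if the start string is absent, split(start)[1] is undefined and the second split call throws a TypeError. A possible guard, sketched with a hypothetical textBetween helper built on indexOf (not part of the original answer), could look like this:

```javascript
// Hypothetical helper: returns the trimmed text between the first `start`
// and the next `end` after it, or null when either delimiter is missing.
function textBetween(text, start, end) {
  const from = text.indexOf(start);
  if (from === -1) return null;

  const contentStart = from + start.length;
  const to = text.indexOf(end, contentStart);
  if (to === -1) return null;

  return text.slice(contentStart, to).trim();
}

console.log(textBetween("My cow always gives milk", "cow", "milk")); // always gives
console.log(textBetween("No delimiters here", "cow", "milk"));       // null
```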
Answered by Naresh Kumar
If the data spans multiple lines, you may have to use the following:
/My cow ([\s\S]*)milk/gm
My cow always gives
milk
Answered by Vasily Bodnarchuk
Task
Extract the substring between two strings (excluding the two strings themselves)
Solution
let allText = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum";
let textBefore = "five centuries,";
let textAfter = "electronic typesetting";
// Backslashes must be doubled inside a template literal, otherwise \s collapses to a plain "s"
var regExp = new RegExp(`(?<=${textBefore}\\s)(.+?)(?=\\s+${textAfter})`, "g");
var results = regExp.exec(allText);
if (results && results.length > 1) {
console.log(results[0]);
}
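When the delimiters come from variables, any regex metacharacters they contain (parentheses, dots and so on) would be interpreted as pattern syntax. The sketch below shows one way to guard against that. Both escapeRegExp and extractBetween are hypothetical helper names introduced here, and the sample string is made up; the approach still relies on ES2018 lookbehind support.

```javascript
// Hypothetical helper: escapes chars that have special meaning in a regex.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical wrapper around the pattern used above, with escaped delimiters.
function extractBetween(allText, textBefore, textAfter) {
  const pattern = new RegExp(
    `(?<=${escapeRegExp(textBefore)}\\s)(.+?)(?=\\s+${escapeRegExp(textAfter)})`
  );
  const result = pattern.exec(allText);
  return result ? result[0] : null;
}

// "(net):" contains parentheses, which would otherwise form a capturing group.
console.log(extractBetween("Price (net): 10 USD total", "(net):", "total")); // 10 USD
```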