Javascript and regex: split string and keep the separator
Notice: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/12001953/
Asked by Miloš
I have a string:
var string = "aaaaaa<br />&dagger; bbbb<br />&Dagger; cccc"
I would like to split this string on the delimiter <br /> followed by a special character.
To do that, I am using this:
string.split(/<br \/>&#?[a-zA-Z0-9]+;/g);
I am getting what I need, except that I am losing the delimiter. Here is the example: http://jsfiddle.net/JwrZ6/1/
How can I keep the delimiter?
Accepted answer by Jon
Use a (positive) lookahead so that the regular expression asserts that the special character exists, but does not consume it as part of the match:
string.split(/<br \/>(?=&#?[a-zA-Z0-9]+;)/g);
See it in action:
var string = "aaaaaa<br />&dagger; bbbb<br />&Dagger; cccc";
console.log(string.split(/<br \/>(?=&#?[a-zA-Z0-9]+;)/g));
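To make the difference concrete, here is a minimal sketch comparing the plain split with two lookahead variants. It assumes the input contains literal HTML entities such as &dagger; (which is what the pattern &#?[a-zA-Z0-9]+; matches); the variable names are illustrative only.

```javascript
// Assumption: the input holds literal HTML entities (&dagger;, &Dagger;).
const input = "aaaaaa<br />&dagger; bbbb<br />&Dagger; cccc";

// Plain split: the whole match (tag + entity) is consumed and lost.
const plain = input.split(/<br \/>&#?[a-zA-Z0-9]+;/);

// Lookahead after the tag: only "<br />" is consumed, the entity stays.
const keepEntity = input.split(/<br \/>(?=&#?[a-zA-Z0-9]+;)/);

// Whole delimiter inside the lookahead: nothing is consumed at all.
const keepAll = input.split(/(?=<br \/>&#?[a-zA-Z0-9]+;)/);

console.log(plain);      // ["aaaaaa", " bbbb", " cccc"]
console.log(keepEntity); // ["aaaaaa", "&dagger; bbbb", "&Dagger; cccc"]
console.log(keepAll);    // ["aaaaaa", "<br />&dagger; bbbb", "<br />&Dagger; cccc"]
```

Which variant fits depends on whether the <br /> tag itself should survive in the output.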
Answer by jichi
I was having a similar but slightly different problem. Anyway, here are examples of the different scenarios for where to keep the delimiter.
"1、2、3".split("、") == ["1", "2", "3"]
"1、2、3".split(/(、)/g) == ["1", "、", "2", "、", "3"]
"1、2、3".split(/(?=、)/g) == ["1", "、2", "、3"]
"1、2、3".split(/(?!、)/g) == ["1、", "2、", "3"]
"1、2、3".split(/(.*?、)/g) == ["", "1、", "", "2、", "3"]
Warning: The fourth will only work when the pieces between delimiters are single characters. ConnorsFan presents an alternative:
// Split a path, but keep the slashes that follow directories
var str = 'Animation/rawr/javascript.js';
var tokens = str.match(/[^\/]+\/?|\//g);
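The limitation of the negative-lookahead form can be seen with multi-character tokens. The sketch below is an illustration, not part of the original answer: it contrasts that form with a lookbehind split, which assumes an engine with ES2018 lookbehind support, and with the match-based pattern above.

```javascript
// Negative lookahead splits before every character that is not the
// delimiter, so multi-character tokens are broken apart.
const broken = "12、34、5".split(/(?!、)/);

// Lookbehind (ES2018+) splits only right after a delimiter.
const kept = "12、34、5".split(/(?<=、)/);

// The match-based pattern from the answer, applied to the same path.
const pathTokens = "Animation/rawr/javascript.js".match(/[^\/]+\/?|\//g);

console.log(broken);     // ["1", "2、", "3", "4、", "5"]
console.log(kept);       // ["12、", "34、", "5"]
console.log(pathTokens); // ["Animation/", "rawr/", "javascript.js"]
```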
Answer by Torsten Walter
If you wrap the delimiter in parentheses it will be part of the returned array.
string.split(/(<br \/>&#?[a-zA-Z0-9]+;)/g);
// returns ["aaaaaa", "<br />&dagger;", " bbbb", "<br />&Dagger;", " cccc"]
Depending on which part you want to keep, change which subgroup you capture:
string.split(/(<br \/>)&#?[a-zA-Z0-9]+;/g);
// returns ["aaaaaa", "<br />", " bbbb", "<br />", " cccc"]
You could improve the expression by ignoring the case of letters:
string.split(/(<br \/>)&#?[a-z0-9]+;/gi);
And you can match predefined character classes like this: \d equals [0-9] and \w equals [a-zA-Z0-9_]. This means your expression could look like this.
string.split(/<br \/>(&#?[a-z\d]+;)/gi);
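As a quick illustration of how the position of the capture group changes the result, the sketch below runs three placements against the same input. It assumes the input holds literal HTML entities, and the variable names are made up for the example.

```javascript
// Assumption: literal HTML entities in the input, as the pattern expects.
const source = "aaaaaa<br />&dagger; bbbb<br />&Dagger; cccc";

// Capture the whole delimiter: tag and entity both appear in the result.
const whole = source.split(/(<br \/>&#?[a-z\d]+;)/i);

// Capture only the tag: the entity is matched but dropped.
const tagOnly = source.split(/(<br \/>)&#?[a-z\d]+;/i);

// Capture only the entity: the tag is matched but dropped.
const entityOnly = source.split(/<br \/>(&#?[a-z\d]+;)/i);

console.log(whole);      // ["aaaaaa", "<br />&dagger;", " bbbb", "<br />&Dagger;", " cccc"]
console.log(tagOnly);    // ["aaaaaa", "<br />", " bbbb", "<br />", " cccc"]
console.log(entityOnly); // ["aaaaaa", "&dagger;", " bbbb", "&Dagger;", " cccc"]
```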
There is a good Regular Expression Reference on JavaScriptKit.
Answer by Fry
I also answered this here: JavaScript Split Regular Expression keep the delimiter
Use the (?=pattern) lookahead in the regex. Example:
var string = '500x500-11*90~1+1';
string = string.replace(/(?=[$-/:-?{-~!"^_`\[\]])/gi, ",");
string = string.split(",");
This will give you the following result:
[ '500x500', '-11', '*90', '~1', '+1' ]
You can also split directly:
string = string.split(/(?=[$-/:-?{-~!"^_`\[\]])/gi);
This gives the same result:
[ '500x500', '-11', '*90', '~1', '+1' ]
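The two approaches agree on the example, but they can diverge when the input already contains a comma. The sketch below uses a smaller, hypothetical operator class (not the one from the answer) that includes "," to show the difference.

```javascript
// Hypothetical, smaller operator class than the one in the answer; it
// deliberately includes "," to show the pitfall.
const beforeOperator = /(?=[-+*~,])/g;

// Two-step approach: mark split points with ",", then split on ",".
const viaReplace = (text) => text.replace(beforeOperator, ",").split(",");

// One-step approach: split directly at the lookahead positions.
const direct = (text) => text.split(beforeOperator);

console.log(viaReplace("500x500-11*90~1+1")); // ["500x500", "-11", "*90", "~1", "+1"]
console.log(direct("500x500-11*90~1+1"));     // ["500x500", "-11", "*90", "~1", "+1"]

// With a comma in the input, the two-step version loses it.
console.log(viaReplace("1,2-3")); // ["1", "", "2", "-3"]
console.log(direct("1,2-3"));     // ["1", ",2", "-3"]
```

Splitting directly avoids the need for a marker character that must not occur in the data.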
Answer by SwiftNinjaPro
I made a modification to jichi's answer, and put it in a function which also supports multi-character separators.
String.prototype.splitAndKeep = function(separator, method='seperate'){
    var str = this;
    if(method == 'seperate'){
        str = str.split(new RegExp(`(${separator})`, 'g'));
    }else if(method == 'infront'){
        str = str.split(new RegExp(`(?=${separator})`, 'g'));
    }else if(method == 'behind'){
        str = str.split(new RegExp(`(.*?${separator})`, 'g'));
        str = str.filter(function(el){return el !== "";});
    }
    return str;
};
jichi's negative-lookahead method, (?!separator), would not work in this function, so I used the (.*?separator) method instead and filtered out the empty strings to get the same result.
Edit: a second version, which also accepts an array of separators to split on char1 or char2:
String.prototype.splitAndKeep = function(separator, method='seperate'){
    var str = this;
    function splitAndKeep(str, separator, method='seperate'){
        if(method == 'seperate'){
            str = str.split(new RegExp(`(${separator})`, 'g'));
        }else if(method == 'infront'){
            str = str.split(new RegExp(`(?=${separator})`, 'g'));
        }else if(method == 'behind'){
            str = str.split(new RegExp(`(.*?${separator})`, 'g'));
            str = str.filter(function(el){return el !== "";});
        }
        return str;
    }
    if(Array.isArray(separator)){
        var parts = splitAndKeep(str, separator[0], method);
        for(var i = 1; i < separator.length; i++){
            var partsTemp = parts;
            parts = [];
            for(var p = 0; p < partsTemp.length; p++){
                parts = parts.concat(splitAndKeep(partsTemp[p], separator[i], method));
            }
        }
        return parts;
    }else{
        return splitAndKeep(str, separator, method);
    }
};
Usage:
str = "first1-second2-third3-last";
str.splitAndKeep(["1", "2", "3"]) == ["first", "1", "-second", "2", "-third", "3", "-last"];
str.splitAndKeep("-") == ["first1", "-", "second2", "-", "third3", "-", "last"];
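Because the separator is interpolated straight into a RegExp, characters such as "." or "*" are treated as regex syntax. The following is a minimal sketch of a standalone variant that escapes the separator first. The names escapeRegExp and splitAndKeepLiteral are hypothetical, and the "behind" branch uses a lookbehind, which assumes ES2018 support, instead of the filter step above.

```javascript
// Hypothetical helper: escape characters that are special in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical standalone variant of splitAndKeep for literal separators.
function splitAndKeepLiteral(text, separator, method = "separate") {
  const sep = escapeRegExp(separator);
  // Delimiter starts the following piece.
  if (method === "infront") return text.split(new RegExp(`(?=${sep})`));
  // Delimiter ends the preceding piece (lookbehind, ES2018+).
  if (method === "behind") return text.split(new RegExp(`(?<=${sep})`));
  // Default: delimiter becomes its own element via a capture group.
  return text.split(new RegExp(`(${sep})`));
}

console.log(splitAndKeepLiteral("a.b.c", "."));            // ["a", ".", "b", ".", "c"]
console.log(splitAndKeepLiteral("a.b.c", ".", "infront")); // ["a", ".b", ".c"]
console.log(splitAndKeepLiteral("a.b.c", ".", "behind"));  // ["a.", "b.", "c"]
```

Escaping trades away the ability to pass a regex fragment as the separator, so it only suits callers that always mean the separator literally.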
Answer by Berezh
An extension function that splits a string by a substring or RegExp and keeps the delimiter: it is attached to the start of the following part when the second parameter is true, otherwise to the end of the preceding part.
String.prototype.splitKeep = function (splitter, ahead) {
    var self = this;
    var result = [];
    if (splitter != '') {
        var matches = [];
        // Collect each matched value and its index
        var replaceName = splitter instanceof RegExp ? "replace" : "replaceAll";
        var r = self[replaceName](splitter, function (m, i, e) {
            matches.push({ value: m, index: i });
            return getSubst(m);
        });
        // Find the split substrings
        var lastIndex = 0;
        for (var i = 0; i < matches.length; i++) {
            var m = matches[i];
            var nextIndex = ahead == true ? m.index : m.index + m.value.length;
            if (nextIndex != lastIndex) {
                var part = self.substring(lastIndex, nextIndex);
                result.push(part);
                lastIndex = nextIndex;
            }
        };
        if (lastIndex < self.length) {
            var part = self.substring(lastIndex, self.length);
            result.push(part);
        };
        // Substitution of matched string
        function getSubst(value) {
            var substChar = value[0] == '0' ? '1' : '0';
            var subst = '';
            for (var i = 0; i < value.length; i++) {
                subst += substChar;
            }
            return subst;
        };
    }
    else {
        // Arrays have push, not add; convert the String object to a primitive
        result.push(String(self));
    };
    return result;
};
The test:
test('splitKeep', function () {
    // String
    deepEqual("1231451".splitKeep('1'), ["1", "231", "451"]);
    deepEqual("123145".splitKeep('1', true), ["123", "145"]);
    deepEqual("1231451".splitKeep('1', true), ["123", "145", "1"]);
    deepEqual("hello man how are you!".splitKeep(' '), ["hello ", "man ", "how ", "are ", "you!"]);
    deepEqual("hello man how are you!".splitKeep(' ', true), ["hello", " man", " how", " are", " you!"]);
    // Regex
    deepEqual("mhellommhellommmhello".splitKeep(/m+/g), ["m", "hellomm", "hellommm", "hello"]);
    deepEqual("mhellommhellommmhello".splitKeep(/m+/g, true), ["mhello", "mmhello", "mmmhello"]);
});
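The same index-based idea can be sketched without touching String.prototype and without the replace callback, by iterating over matchAll. This is an assumption-laden illustration rather than the author's code: splitKeepWithMatchAll is a hypothetical name, and it only accepts a RegExp that carries the g flag.

```javascript
// Hypothetical standalone variant; `pattern` must be a RegExp with the g
// flag, because matchAll throws a TypeError for non-global expressions.
function splitKeepWithMatchAll(text, pattern, ahead = false) {
  const result = [];
  let lastIndex = 0;
  for (const match of text.matchAll(pattern)) {
    // Cut before the match (ahead) or right after it (behind).
    const cut = ahead ? match.index : match.index + match[0].length;
    if (cut !== lastIndex) {
      result.push(text.substring(lastIndex, cut));
      lastIndex = cut;
    }
  }
  // Keep whatever follows the last cut.
  if (lastIndex < text.length) result.push(text.substring(lastIndex));
  return result;
}

console.log(splitKeepWithMatchAll("mhellommhellommmhello", /m+/g));
// ["m", "hellomm", "hellommm", "hello"]
console.log(splitKeepWithMatchAll("mhellommhellommmhello", /m+/g, true));
// ["mhello", "mmhello", "mmmhello"]
```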
Answer by Berezh
I've been using this:
String.prototype.splitBy = function (delimiter) {
    var
        delimiterPATTERN = '(' + delimiter + ')',
        delimiterRE = new RegExp(delimiterPATTERN, 'g');
    return this.split(delimiterRE).reduce((chunks, item) => {
        if (item.match(delimiterRE)){
            chunks.push(item)
        } else {
            chunks[chunks.length - 1] += item
        };
        return chunks
    }, [])
}
Except that you shouldn't mess with String.prototype, so here's a function version:
var splitBy = function (text, delimiter) {
    var
        delimiterPATTERN = '(' + delimiter + ')',
        delimiterRE = new RegExp(delimiterPATTERN, 'g');
    return text.split(delimiterRE).reduce(function(chunks, item){
        if (item.match(delimiterRE)){
            chunks.push(item)
        } else {
            chunks[chunks.length - 1] += item
        };
        return chunks
    }, [])
}
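So you could do:
var haystack = "aaaaaa<br />&dagger; bbbb<br />&Dagger; cccc"
var needle = '<br \/>&#?[a-zA-Z0-9]+;';
var result = splitBy(haystack , needle)
console.log( JSON.stringify( result, null, 2) )
And you'll end up with the following. Note that the text before the first delimiter, "aaaaaa", is missing from the output, because the reducer has no chunk yet to append it to:
[
"<br />&dagger; bbbb",
"<br />&Dagger; cccc"
]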
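If the text before the first delimiter should be kept as well, one possible variant is to split at a lookahead instead of reducing over captured pieces. This is a sketch under assumptions, not the author's function: splitByKeepLeading is a hypothetical name, the delimiter is taken to be a regex source string without capture groups of its own, and the input is taken to hold literal HTML entities.

```javascript
// Hypothetical variant of splitBy; `delimiter` is a regex source string
// that is assumed to contain no capture groups.
function splitByKeepLeading(text, delimiter) {
  // Splitting at a lookahead consumes nothing, so every character of the
  // input survives, including the text before the first delimiter.
  return text.split(new RegExp("(?=" + delimiter + ")"));
}

const pieces = splitByKeepLeading(
  "aaaaaa<br />&dagger; bbbb<br />&Dagger; cccc",
  "<br />&#?[a-zA-Z0-9]+;"
);
console.log(pieces);
// ["aaaaaa", "<br />&dagger; bbbb", "<br />&Dagger; cccc"]
```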