Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/11456850/

Time: 2020-08-24 05:54:02  Source: igfitidea

Split a string by commas but ignore commas within double-quotes using Javascript

javascript regex

Asked by jpecht

I'm looking for [a, b, c, "d, e, f", g, h] to turn into an array of 6 elements: a, b, c, "d, e, f", g, h. I'm trying to do this in JavaScript. This is what I have so far:

str = str.split(/,+|"[^"]+"/g); 

But right now it's splitting out everything that's in the double-quotes, which is incorrect.

Edit: Okay sorry I worded this question really poorly. I'm being given a string not an array.

var str = 'a, b, c, "d, e, f", g, h';

And I want to turn that into an array using something like the "split" function.
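To see why the attempt above loses the quoted part, it helps to look at what split actually returns: every match of the pattern is treated as a separator and discarded, and the quoted alternative matches the whole "d, e, f" group. A small sketch (the variable names are illustrative, not from the question):

```javascript
// The question's input string and its split pattern.
const input = 'a, b, c, "d, e, f", g, h';
const attempt = input.split(/,+|"[^"]+"/g);

// Both alternatives act as separators, so the quoted group is thrown away
// instead of being kept as one element.
console.log(attempt);
// [ 'a', ' b', ' c', ' ', '', ' g', ' h' ]
```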

Answered by inhan

Here's what I would do.

var str = 'a, b, c, "d, e, f", g, h';
var arr = str.match(/(".*?"|[^",\s]+)(?=\s*,|\s*$)/g);
/* will match:

    (
        ".*?"       a double quote + as few characters as possible + the next double quote
        |           OR
        [^",\s]+    1 or more characters excl. double quotes, comma or spaces of any kind
    )
    (?=             FOLLOWED BY
        \s*,        0 or more empty spaces and a comma
        |           OR
        \s*$        0 or more empty spaces and nothing else (end of string)
    )

*/
arr = arr || [];
// this will prevent JS from throwing an error in
// the below loop when there are no matches
for (var i = 0; i < arr.length; i++) console.log('arr['+i+'] =',arr[i]);
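A short sketch of how this pattern behaves on a few inputs, wrapped in a helper for convenience (the name matchFields is made up for this example). It also shows a limitation worth knowing about: because unquoted values may not contain whitespace, only the last word of a multi-word value is kept.

```javascript
// Hypothetical helper around the regex from the answer above.
function matchFields(str) {
  // match() returns null when nothing matches, so fall back to [].
  return str.match(/(".*?"|[^",\s]+)(?=\s*,|\s*$)/g) || [];
}

console.log(matchFields('a, b, c, "d, e, f", g, h'));
// [ 'a', 'b', 'c', '"d, e, f"', 'g', 'h' ]
console.log(matchFields(''));
// []
console.log(matchFields('Hello World, b, c'));
// [ 'World', 'b', 'c' ]  ("Hello" is not followed by a comma, so it is skipped)
```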

Answered by shifu.zheng

Here is a JavaScript function to do it:

function splitCSVButIgnoreCommasInDoublequotes(str) {
    // split the str first,
    // then merge the elements between two double quotes
    var delimiter = ',';
    var quotes = '"';
    var elements = str.split(delimiter);
    var newElements = [];
    // an odd number of quotes means the element opens or closes a quoted field;
    // an element such as "i" (two quotes) is already complete
    function hasUnbalancedQuotes(s) {
        return (s.split(quotes).length - 1) % 2 === 1;
    }
    for (var i = 0; i < elements.length; ++i) {
        if (hasUnbalancedQuotes(elements[i])) { // the left double quote is found
            var indexOfRightQuotes = -1;
            var tmp = elements[i];
            // find the right double quote
            for (var j = i + 1; j < elements.length; ++j) {
                if (hasUnbalancedQuotes(elements[j])) {
                    indexOfRightQuotes = j;
                    break;
                }
            }
            // found the right double quote:
            // merge all the elements between the double quotes
            if (-1 != indexOfRightQuotes) {
                for (var k = i + 1; k <= indexOfRightQuotes; ++k) {
                    tmp = tmp + delimiter + elements[k];
                }
                newElements.push(tmp);
                i = indexOfRightQuotes;
            }
            else { // right double quote is not found
                newElements.push(elements[i]);
            }
        }
        else { // no unmatched left double quote is found
            newElements.push(elements[i]);
        }
    }

    return newElements;
}

Answered by John Fisher

This works well for me. (I used semicolons so the alert message would show the difference between commas added when turning the array into a string and the actual captured values.)

var str = 'a; b; c; "d; e; f"; g; h; "i"';
var array = str.match(/("[^"]*")|[^; ]+/g); // a quoted group, or a run without semicolons or spaces
alert(array);
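The same idea can be applied to the comma-separated string from the question. A minimal sketch, assuming unquoted values contain no spaces (whitespace is excluded from the second alternative so that the space before a quoted value is not swallowed into a match):

```javascript
const line = 'a, b, c, "d, e, f", g, h';

// Either a complete double-quoted group, or a run of characters
// that are neither commas nor whitespace.
const parts = line.match(/"[^"]*"|[^,\s]+/g) || [];

console.log(parts);
// [ 'a', 'b', 'c', '"d, e, f"', 'g', 'h' ]
```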

Answered by Andrew Ulrich

Here's a non-regex one that assumes doublequotes will come in pairs:

function splitCsv(str) {
  return str.split(',').reduce((accum,curr)=>{
    if(accum.isConcatting) {
      accum.soFar[accum.soFar.length-1] += ','+curr
    } else {
      accum.soFar.push(curr)
    }
    if(curr.split('"').length % 2 == 0) {
      accum.isConcatting= !accum.isConcatting
    }
    return accum;
  },{soFar:[],isConcatting:false}).soFar
}

console.log(splitCsv('asdf,"a,d",fdsa'),' should be ',['asdf','"a,d"','fdsa'])
console.log(splitCsv(',asdf,,fds,'),' should be ',['','asdf','','fds',''])
console.log(splitCsv('asdf,"a,,,d",fdsa'),' should be ',['asdf','"a,,,d"','fdsa'])

Answered by f-society

regex: /,(?=(?:(?:[^"]*"){2})*[^"]*$)/

const input_line = '"2C95699FFC68","201 S BOULEVARDRICHMOND, VA 23220","8299600062754882","2018-09-23"'

let my_split = input_line.split(/,(?=(?:(?:[^"]*"){2})*[^"]*$)/)

Output: 
my_split[0]: "2C95699FFC68", 
my_split[1]: "201 S BOULEVARDRICHMOND, VA 23220", 
my_split[2]: "8299600062754882", 
my_split[3]: "2018-09-23"

See the following link for an explanation: regexr.com/44u6o
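As a rough illustration of how this pattern works: the lookahead only accepts a comma when the rest of the string contains an even number of double quotes, which is the case exactly when the comma sits outside a quoted section. A sketch applied to the string from the question (the constant name and the trim step are additions for this example, not part of the answer):

```javascript
// Split on commas that are followed by an even number of double quotes.
const SPLIT_OUTSIDE_QUOTES = /,(?=(?:(?:[^"]*"){2})*[^"]*$)/;

const fields = 'a, b, c, "d, e, f", g, h'
  .split(SPLIT_OUTSIDE_QUOTES)
  .map(function (field) { return field.trim(); }); // drop padding around each field

console.log(fields);
// [ 'a', 'b', 'c', '"d, e, f"', 'g', 'h' ]

// Empty fields are preserved because this is a split, not a match.
const withEmpty = 'a,,"b,c"'.split(SPLIT_OUTSIDE_QUOTES);
console.log(withEmpty);
// [ 'a', '', '"b,c"' ]
```

Because the lookahead rescans the remainder of the string for every comma, this may become slow on very long lines.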

Answered by thisismydesign

Here's the regex we're using to extract valid arguments from a comma-separated argument list, supporting double-quoted arguments. It works for the outlined edge cases, e.g.:

  • doesn't include quotes in the matches
  • works with white spaces in matches
  • works with empty fields

(?<=")[^"]+?(?="(?:\s*?,|\s*?$))|(?<=(?:^|,)\s*?)(?:[^,"\s][^,"]*[^,"\s])|(?:[^,"\s])(?![^"]*?"(?:\s*?,|\s*?$))(?=\s*?(?:,|$))

Proof: https://regex101.com/r/UL8kyy/3/tests (Note: currently only works in Chrome, because the regex uses lookbehinds, which are only supported as of ECMAScript 2018.)

According to our guidelines, it avoids capturing groups and greedy matching.

I'm sure it can be simplified; I'm open to suggestions and additional test cases.

For anyone interested, the first part matches double-quoted, comma-delimited arguments:

(?<=")[^"]+?(?="(?:\s*?,|\s*?$))

And the second part matches comma-delimited arguments by themselves:

(?<=(?:^|,)\s*?)(?:[^,"\s][^,"]*[^,"\s])|(?:[^,"\s])(?![^"]*?"(?:\s*?,|\s*?$))(?=\s*?(?:,|$))
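A minimal sketch of how the full pattern might be used from JavaScript, assuming an engine with lookbehind support. The global flag and the names ARGUMENT and args are additions for this example; the pattern itself is copied unchanged from the answer.

```javascript
// The regex from the answer, with the global flag added so that
// match() returns every argument instead of only the first one.
const ARGUMENT = /(?<=")[^"]+?(?="(?:\s*?,|\s*?$))|(?<=(?:^|,)\s*?)(?:[^,"\s][^,"]*[^,"\s])|(?:[^,"\s])(?![^"]*?"(?:\s*?,|\s*?$))(?=\s*?(?:,|$))/g;

// match() returns null when nothing matches, so fall back to [].
const args = 'a, b, c, "d, e, f", g, h'.match(ARGUMENT) || [];

console.log(args);
// [ 'a', 'b', 'c', 'd, e, f', 'g', 'h' ]
```

Unlike the other answers, the quoted argument comes back without its surrounding quotes, because the quotes are only checked by lookarounds and never consumed.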

Answered by ling

I almost liked the accepted answer, but it didn't parse the space correctly, and/or it left the double quotes untrimmed, so here is my function:

    /**
     * Splits the given string into components, and returns the components array.
     * Each component must be separated by a comma.
     * If the component contains one or more comma(s), it must be wrapped with double quotes.
     * The double quote must not be used inside components (replace it with a special string like __double__quotes__ for instance, then transform it again into double quotes later...).
     *
     * https://stackoverflow.com/questions/11456850/split-a-string-by-commas-but-ignore-commas-within-double-quotes-using-javascript
     */
    function splitComponentsByComma(str){
        var ret = [];
        var arr = str.match(/(".*?"|[^",]+)(?=\s*,|\s*$)/g);
        for (let i in arr) {
            let element = arr[i];
            if ('"' === element[0]) {
                element = element.substr(1, element.length - 2);
            } else {
                element = arr[i].trim();
            }
            ret.push(element);
        }
        return ret;
    }
    console.log(splitComponentsByComma('Hello World, b, c, "d, e, f", c')); // [ 'Hello World', 'b', 'c', 'd, e, f', 'c' ]

Answered by Ioannis Karadimas

I know it's a bit long, but here's my take:

var sample="[a, b, c, \"d, e, f\", g, h]";

var inQuotes = false, items = [], currentItem = '';

for(var i = 0; i < sample.length; i++) {
  if (sample[i] == '"') { 
    inQuotes = !inQuotes; 

    if (!inQuotes) {
      if (currentItem.length) items.push(currentItem);
      currentItem = '';
    }

    continue; 
  }

  if ((/^[\"\[\]\,\s]$/gi).test(sample[i]) && !inQuotes) {
    if (currentItem.length) items.push(currentItem);
    currentItem = '';
    continue;
  }

  currentItem += sample[i];
}

if (currentItem.length) items.push(currentItem);

console.log(items);

As a side note, it works both with and without the square brackets at the start and end.
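The same character-by-character idea can be packaged as a reusable function. The sketch below is a variation, not the code above: it splits only on commas (so unquoted values may contain spaces), keeps empty fields, and trims each field, which also removes padding just inside the quotes. The name splitOutsideQuotes is hypothetical.

```javascript
// Hypothetical helper: walks the string once, tracking whether the
// current character is inside a double-quoted section.
function splitOutsideQuotes(text) {
  const items = [];
  let current = '';
  let inQuotes = false;

  for (const ch of text) {
    if (ch === '"') {
      inQuotes = !inQuotes;        // entering or leaving a quoted section
    } else if (ch === ',' && !inQuotes) {
      items.push(current.trim());  // a comma outside quotes ends the field
      current = '';
    } else {
      current += ch;               // ordinary character, or a comma inside quotes
    }
  }
  items.push(current.trim());      // the last field has no trailing comma
  return items;
}

console.log(splitOutsideQuotes('a, b, c, "d, e, f", g, h'));
// [ 'a', 'b', 'c', 'd, e, f', 'g', 'h' ]
console.log(splitOutsideQuotes('Hello World,,x'));
// [ 'Hello World', '', 'x' ]
```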

Answered by JamesHennigan

This takes a CSV file one line at a time and returns an array with the commas inside quotation marks intact. If no quotation marks are detected, it just does a plain .split(","). The second loop could probably be replaced with something neater, but it does the job as is.

function parseCSVLine(str){
    if(str.indexOf("\"")>-1){
        var aInputSplit = str.split(",");
        var aOutput = [];
        var iMatch = 0;
        //var adding = 0;
        for(var i=0;i<aInputSplit.length;i++){
            if(aInputSplit[i].indexOf("\"")>-1){
                var sWithCommas = aInputSplit[i];
                for(var z=i;z<aInputSplit.length;z++){
                    if(z !== i && aInputSplit[z].indexOf("\"") === -1){
                        sWithCommas+= ","+aInputSplit[z];
                    }else if(z !== i && aInputSplit[z].indexOf("\"") > -1){
                        sWithCommas+= ","+aInputSplit[z];
                        sWithCommas = sWithCommas.replace(new RegExp("\"", 'g'), ""); // replace() returns a new string
                        aOutput.push(sWithCommas);
                        i=z;
                        z=aInputSplit.length+1;
                        iMatch++;
                    }
                    if(z === aInputSplit.length-1){
                        if(iMatch === 0){
                            aOutput.push(aInputSplit[z]);
                        }                  
                        iMatch = 0;
                    }
                }
            }else{
                aOutput.push(aInputSplit[i]);
            }
        }
        return aOutput
    }else{
        return str.split(",")
    }
}
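For the "one line at a time" part, here is a minimal sketch of feeding a whole CSV text through a per-line splitter. It assumes no quoted field contains a line break, and it uses the lookahead split regex from an earlier answer rather than the function above. The name parseCsvText and the sample data are made up for illustration.

```javascript
// Commas followed by an even number of double quotes are outside quotes.
const COMMA_OUTSIDE_QUOTES = /,(?=(?:(?:[^"]*"){2})*[^"]*$)/;

// Hypothetical helper: splits CSV text into rows, then each row into fields.
function parseCsvText(text) {
  return text
    .split(/\r?\n/)                                   // one entry per line
    .filter(function (line) { return line !== ''; })  // skip blank lines
    .map(function (line) { return line.split(COMMA_OUTSIDE_QUOTES); });
}

console.log(parseCsvText('id,name\n1,"Doe, Jane"\n2,"Roe, Rick"\n'));
// [ [ 'id', 'name' ], [ '1', '"Doe, Jane"' ], [ '2', '"Roe, Rick"' ] ]
```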

Answered by Shashank Saxena

[Images: jsfiddle settings; code output]

The code works if your input string is in the format of stringTocompare. Run the code on https://jsfiddle.net/ to see the output with the jsfiddle settings shown; please refer to the screenshot. Alternatively, you can use the split function (the commented-out line in the code below) and tweak the code according to your needs. If you don't want a comma re-inserted when the pieces are joined back together after the split, remove the +"," part from attach=attach+","+actualString[t+1].

var stringTocompare='"Manufacturer","12345","6001","00",,"Calfe,eto,lin","Calfe,edin","4","20","10","07/01/2018","01/01/2006",,,,,,,,"03/31/2004"';

console.log(stringTocompare);

var actualString=stringTocompare.split(',');
console.log("Before");
for(var i=0;i<actualString.length;i++){
    console.log(actualString[i]);
}
//var actualString=stringTocompare.split(/,(?=(?:(?:[^"]*"){2})*[^"]*$)/);
for(var i=0;i<actualString.length;i++){
    var flag=0;
    var x=actualString[i];
    if(x!==null)
    {
        if(x[0]=='"' && x[x.length-1]!=='"'){
            var p=0;
            var t=i;
            var b=i;
            for(var k=i;k<actualString.length;k++){
                var y=actualString[k];
                if(y[y.length-1]!=='"'){
                    p++;
                }
                if(y[y.length-1]=='"'){
                    flag=1;
                }
                if(flag==1)
                    break;
            }
            var attach=actualString[t];
            for(var s=p;s>0;s--){
                attach=attach+","+actualString[t+1];
                t++;
            }
            actualString[i]=attach;
            actualString.splice(b+1,p);
        }
    }
}
console.log("After");
for(var i=0;i<actualString.length;i++){
    console.log(actualString[i]);
}




  [1]: https://i.stack.imgur.com/3FcxM.png