Split a string, ignoring quoted sections

Date: 2020-03-05 18:38:39  Source: igfitidea

Given a string like this:

a,"string, with",various,"values, and some",quoted

What is a good algorithm to split this on commas while ignoring the commas inside the quotes?

The output should be an array:

[ "a", "string, with", "various", "values, and some", "quoted" ]

Solutions

Answer

Of course a CSV parser would be better, but just for fun you could:

Loop on the string letter by letter.
    If current_letter == quote:
        toggle inside_quote variable.
    Else if (current_letter == comma and not inside_quote):
        push current_word into array and clear current_word.
    Else
        append the current_letter to current_word.
When the loop is done, push the current_word into array.

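Answer

The author here has dropped in a blob of C# code that handles the scenario you are having a problem with:

CSV File Imports in .NET

It shouldn't be too difficult to translate.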

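Answer

If my language of choice didn't offer a way to do this out of the box, I would initially consider two options as the easy way out:

Pre-parse the string and replace the commas inside quotes with another control character, split on the remaining commas, then post-parse the array to turn the control character back into commas.
Alternatively, split on every comma, then post-parse the resulting array into another array, checking each entry for a leading quote and concatenating entries until a terminating quote is reached.

These are hacks, however, and if this is a purely "mental" exercise then I suspect they will not be much help. If this is a real-world problem, it would help to know the language so that we could offer some specific advice.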
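
The second option (split on every comma, then glue the pieces of a quoted field back together) could be sketched in Python roughly like this. The name `split_then_rejoin` is made up for illustration, and the sketch assumes there is no whitespace between a comma and a quote and that there are no escaped quotes inside a field:

```python
def split_then_rejoin(text, delimiter=",", quote='"'):
    """Split on every delimiter, then rejoin pieces that belong to one quoted field."""
    fields = []
    pending = None  # pieces of a quoted field that has not been closed yet
    for piece in text.split(delimiter):
        if pending is not None:
            # We are inside a quoted field: keep collecting pieces.
            pending.append(piece)
            if piece.endswith(quote):
                # Closing quote reached: rejoin and drop the surrounding quotes.
                fields.append(delimiter.join(pending)[1:-1])
                pending = None
        elif piece.startswith(quote):
            if len(piece) > 1 and piece.endswith(quote):
                # Quoted field without any delimiter inside it.
                fields.append(piece[1:-1])
            else:
                pending = [piece]
        else:
            fields.append(piece)
    if pending is not None:
        raise ValueError("unterminated quoted field")
    return fields


print(split_then_rejoin('a,"string, with",various,"values, and some",quoted'))
```

It is still a hack in the sense described above: anything beyond the simplest quoting rules needs a real parser.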

Answer

I use this to parse strings; I'm not sure whether it helps here, but perhaps with some minor modifications?

function getstringbetween($string, $start, $end){
    $string = " ".$string;
    $ini = strpos($string,$start);
    if ($ini == 0) return "";
    $ini += strlen($start);   
    $len = strpos($string,$end,$ini) - $ini;
    return substr($string,$ini,$len);
}

$fullstring = "this is my [tag]dog[/tag]";
$parsed = getstringbetween($fullstring, "[tag]", "[/tag]");

echo $parsed; // (result = dog)


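Answer

Here is a simple algorithm:

Determine whether the string begins with a '"' character.
Split the string into an array delimited by the '"' character.
Mark the quoted commas with a #COMMA# placeholder: if the input starts with a '"', mark the items in the array where index % 2 == 0; otherwise, mark the items where index % 2 == 1.
Concatenate the items in the array to form a modified input string.
Split the string into an array delimited by the ',' character.
Replace all instances of the #COMMA# placeholder in the array with the ',' character.
The array is the output.

Here is a Python implementation:
(Fixed to handle '"a,b",c,"d,e,f,h","i,j,k"')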

def parse_input(text):
    # Quoted pieces sit at even indices if the input starts with a quote, else at odd ones.
    quote_mod = int(not text.startswith('"'))

    # Split on quotes and drop the empty pieces left at the edges.
    pieces = [piece for piece in text.split('"') if piece != '']
    for i in range(len(pieces)):
        if i % 2 == quote_mod:
            pieces[i] = pieces[i].replace(",", "#COMMA#")

    # Split on the remaining (unquoted) commas, then restore the quoted ones.
    fields = [field for field in "".join(pieces).split(",") if field != '']
    return [field.replace("#COMMA#", ",") for field in fields]

# parse_input('a,"string, with",various,"values, and some",quoted')
#  -> ['a', 'string, with', 'various', 'values, and some', 'quoted']
# parse_input('"a,b",c,"d,e,f,h","i,j,k"')
#  -> ['a,b', 'c', 'd,e,f,h', 'i,j,k']

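Answer

This is standard CSV-style parsing. A lot of people try to do this with regular expressions. You can get to about 90% with regexes, but you really need a real CSV parser to do it properly. A few months ago I found a fast, excellent C# CSV parser on CodeProject that I highly recommend!

Answer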

def parsecsv(instr):
    i = 0
    j = 0

    outstrs = []

    # i is fixed until a match occurs, then it advances
    # up to j. j inches forward each time through:

    while i < len(instr):

        if j < len(instr) and instr[j] == '"':
            # skip the opening quote...
            j += 1
            # then iterate until we find a closing quote.
            while j < len(instr) and instr[j] != '"':
                j += 1
            if j == len(instr):
                raise Exception("Unmatched double quote at end of input.")

        if j == len(instr) or instr[j] == ',':
            s = instr[i:j]  # get the substring we've found
            s = s.strip()    # remove extra whitespace

            # remove surrounding quotes if they're there
            if len(s) >= 2 and s[0] == '"' and s[-1] == '"':
                s = s[1:-1]

            # add it to the result
            outstrs.append(s)

            # skip over the comma, move i up (to where
            # j will be at the end of the iteration)
            i = j+1

        j = j+1

    return outstrs

def testcase(instr, expected):
    outstr = parsecsv(instr)
    print(outstr)
    assert expected == outstr

# Doesn't handle things like '1, 2, "a, b, c" d, 2' or
# escaped quotes, but those can be added pretty easily.

testcase('a, b, "1, 2, 3", c', ['a', 'b', '1, 2, 3', 'c'])
testcase('a,b,"1, 2, 3" , c', ['a', 'b', '1, 2, 3', 'c'])

# odd number of quotes gives a "unmatched quote" exception
#testcase('a,b,"1, 2, 3" , "c', ['a', 'b', '1, 2, 3', 'c'])
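
The comment above says escaped quotes could be added fairly easily. One possible way, as a rough sketch only: a character-by-character splitter that treats a doubled quote inside a quoted field as a literal quote, which is the usual CSV convention. The name `split_quoted` is hypothetical and not part of the answer's code, and unlike `parsecsv` it does not strip whitespace around fields.

```python
def split_quoted(text, delimiter=",", quote='"'):
    """Split text on delimiter, ignoring delimiters inside quoted sections."""
    fields = []
    current = []
    inside_quote = False
    i = 0
    while i < len(text):
        ch = text[i]
        if inside_quote:
            if ch == quote:
                if i + 1 < len(text) and text[i + 1] == quote:
                    # Doubled quote inside a quoted section: keep one literal quote.
                    current.append(quote)
                    i += 1
                else:
                    inside_quote = False
            else:
                current.append(ch)
        elif ch == quote:
            inside_quote = True
        elif ch == delimiter:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if inside_quote:
        raise ValueError("unmatched quote at end of input")
    fields.append("".join(current))
    return fields


print(split_quoted('a,"say ""hi"", ok",b'))
```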

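Answer

It looks like you've got some good answers here.

For those of you looking to handle your own CSV file parsing, heed the advice from the experts and don't roll your own CSV parser.

Your first thought is, "I need to handle commas inside of quotes."

Your next thought will be, "Oh, crap, I need to handle quotes inside of quotes. Escaped quotes. Double quotes. Single quotes..."

It's a road to madness. Don't write your own. Find a library with extensive unit test coverage that hits all the hard parts and has gone through hell for you. For .NET, use the free FileHelpers library.

Answer

I just couldn't resist seeing if I could make it work in a Python one-liner: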

import re
arr = [i.replace("|", ",") for i in re.sub(r'"([^"]*),([^"]*)"', r"\g<1>|\g<2>", str_to_test).split(",")]
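
Note that the one-liner above only rewrites one comma per quoted field and breaks if the data already contains a "|". A slightly more general regex variant, offered as a sketch rather than a recommendation, splits only on commas that are followed by an even number of quotes. The names `SEPARATOR` and `split_outside_quotes` are made up for illustration, and escaped quotes are still not handled:

```python
import re

# A comma is a separator only if the rest of the string contains an even
# number of double quotes, i.e. the comma is not inside a quoted section.
SEPARATOR = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')


def split_outside_quotes(text):
    fields = []
    for part in SEPARATOR.split(text):
        part = part.strip()
        # Drop the surrounding quotes of a quoted field.
        if len(part) >= 2 and part[0] == '"' and part[-1] == '"':
            part = part[1:-1]
        fields.append(part)
    return fields


print(split_outside_quotes('"a,b",c,"d,e,f,h","i,j,k"'))
```

The lookahead rescans the rest of the string for every comma, so this is fine for short lines but a poor fit for large inputs.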

Answer

Python:

import csv
with open("some.csv", newline="") as f:
    for row in csv.reader(f):
        print(row)
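
The question is about a string rather than a file, and `csv.reader` accepts any iterable of lines, so a one-element list is enough. A minimal sketch (the helper name `split_csv_line` is made up here; `skipinitialspace=True` is an optional choice that tolerates a space after each comma):

```python
import csv


def split_csv_line(line):
    # Wrap the single line in a list so csv.reader can iterate over it,
    # then take the first (and only) parsed row.
    return next(csv.reader([line], skipinitialspace=True))


print(split_csv_line('a,"string, with",various,"values, and some",quoted'))
print(split_csv_line('a, "b ""quoted"" text", c'))
```

With the default dialect this also handles doubled quotes inside a quoted field, which the hand-written versions in the other answers mostly do not.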

Answer

What if an odd number of quotes appear
  in the original string?

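This looks uncannily like CSV parsing, which has some peculiarities in how it handles quoted fields. A field is only treated as quoted if it is delimited with double quotes, so:

field1, "field2, field3", field4, "field5, field6" field7

becomes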

field1
  
  field2, field3
  
  field4
  
  "field5
  
  field6" field7

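Notice that if a field doesn't both start and end with a quote, then it is not a quoted field, and the double quotes are simply treated as double quotes.

Incidentally, if I recall correctly, my code that someone linked to doesn't actually handle this correctly.

Answer

Here is a simple Python implementation based on Pat's pseudocode: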

def splitIgnoringSingleQuote(string, split_char, remove_quotes=False):
    string_split = []
    current_word = ""
    inside_quote = False
    for letter in string:
        if letter == "'":
            if not remove_quotes:
                current_word += letter
            inside_quote = not inside_quote
        elif letter == split_char and not inside_quote:
            string_split.append(current_word)
            current_word = ""
        else:
            current_word += letter
    string_split.append(current_word)
    return string_split
