Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, give the original address and author information, and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/14655023/

Date: 2020-08-10 12:36:16  Source: igfitidea

Split a string that has white spaces, unless they are enclosed within "quotes"?

Tags: c#, split

Question by Teachme

To make things simple:

string streamR = sr.ReadLine();  // sr.ReadLine() results in:
                                 //                         one "two two"

I want to be able to save them as two different strings and remove all spaces EXCEPT for the spaces found between quotation marks. Therefore, what I need is:

string 1 = one
string 2 = two two

So far what I have found that works is the following code, but it removes the spaces within the quotes.

// streamR (the line returned by sr.ReadLine()) only contains two items
string[] splitter = streamR.Split(' ');
str1 = splitter[0];
// Only set str2 if the length is > 1
str2 = splitter.Length > 1 ? splitter[1] : string.Empty;

The output of this becomes

one
two

I have looked into "Regular Expression to split on spaces unless in quotes", however I can't seem to get the regex to work or to understand the code, especially how to split the input so that the parts become two different strings. All the code there gives me a compile error (I am using System.Text.RegularExpressions).

Accepted answer by I4V

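using System.Linq;
using System.Text.RegularExpressions;

string input = "one \"two two\" three \"four four\" five six";
// Each match is either a quoted run ("...") or a run of non-space characters;
// quoted matches keep their quotation marks, e.g. "two two"
var parts = Regex.Matches(input, @"[\""].+?[\""]|[^ ]+")
                .Cast<Match>()
                .Select(m => m.Value)
                .ToList();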
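The accepted pattern returns quoted parts with their quotation marks (the second match is "two two" including the quotes), while the question asks for two two. Below is a minimal sketch of one way to strip them afterwards. It assumes quoted sections never contain escaped quotes. It swaps in the quoted-run pattern "[^"]*" for the accepted [\"].+?[\"]. SplitOnSpacesOutsideQuotes is a hypothetical name. The snippet uses C# 9 top-level statements so it runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the accepted answer): match either a quoted
// run ("...") or a run of non-space characters, then drop the enclosing quotes.
static List<string> SplitOnSpacesOutsideQuotes(string input) =>
    Regex.Matches(input, "\"[^\"]*\"|[^ ]+")
         .Cast<Match>()
         .Select(m => m.Value.Length >= 2 && m.Value.StartsWith("\"") && m.Value.EndsWith("\"")
             ? m.Value.Substring(1, m.Value.Length - 2) // strip the enclosing quotes
             : m.Value)                                 // plain word, or an unterminated quote
         .ToList();

List<string> parts = SplitOnSpacesOutsideQuotes("one \"two two\"");
Console.WriteLine(parts[0]); // one
Console.WriteLine(parts[1]); // two two
```

The length and EndsWith checks leave an unterminated token such as "abc unchanged instead of cutting off its last character.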

Answer by psubsee2003

A custom parser might be more suitable for this.

This is something I wrote once when I had a specific (and very strange) parsing requirement that involved parentheses and spaces, but it is generic enough that it should work with virtually any delimiter and text qualifier.

// Place inside a class; requires: using System; using System.Collections.Generic; using System.Text;
public static IEnumerable<String> ParseText(String line, Char delimiter, Char textQualifier)
{
    if (line == null)
        yield break;

    Char prevChar = '\0';
    Char nextChar = '\0';
    Char currentChar = '\0';

    Boolean inString = false;

    StringBuilder token = new StringBuilder();

    for (int i = 0; i < line.Length; i++)
    {
        currentChar = line[i];

        // '\0' stands for "no character" before the start or after the end of the line
        prevChar = i > 0 ? line[i - 1] : '\0';
        nextChar = i + 1 < line.Length ? line[i + 1] : '\0';

        // Opening qualifier: at the start of the line or right after a delimiter
        if (currentChar == textQualifier && (prevChar == '\0' || prevChar == delimiter) && !inString)
        {
            inString = true;
            continue;
        }

        // Closing qualifier: at the end of the line or right before a delimiter
        if (currentChar == textQualifier && (nextChar == '\0' || nextChar == delimiter) && inString)
        {
            inString = false;
            continue;
        }

        // A delimiter outside a qualified section ends the current token
        if (currentChar == delimiter && !inString)
        {
            yield return token.ToString();
            token.Clear();
            continue;
        }

        token.Append(currentChar);
    }

    yield return token.ToString();
}

The usage would be:

var parsedText = ParseText(streamR, ' ', '"');

Answer by Cédric Bignon

You can even do that without Regex: a LINQ expression with String.Split can do the job.

You can first split your string by '"', then split only the elements with an even index in the resulting array by ' ' (a space).

var result = myString.Split('"')
                     .Select((element, index) => index % 2 == 0  // If even index
                                           ? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)  // Split the item
                                           : new string[] { element })  // Keep the entire item
                     .SelectMany(element => element).ToList();

For the string:

This is a test for "Splitting a string" that has white spaces, unless they are "enclosed within quotes"

It gives the result:

This
is
a
test
for
Splitting a string
that
has
white
spaces,
unless
they
are
enclosed within quotes

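UPDATE

string myString = "WordOne \"Word Two\"";
var result = myString.Split('"')
                     .Select((element, index) => index % 2 == 0  // If even index
                                           ? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)  // Split the item
                                           : new string[] { element })  // Keep the entire item
                     .SelectMany(element => element).ToList();

Console.WriteLine(result[0]);
Console.WriteLine(result[1]);
Console.ReadKey();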

UPDATE 2

How do you define a quoted portion of the string?

We will assume that the string before the first " is non-quoted.

Then, the string between the first " and the second " is quoted. The string between the second " and the third " is non-quoted. The string between the third and the fourth is quoted, ...

The general rule is: each string between the (2*n-1)th (odd number) " and the (2*n)th (even number) " is quoted. (1)

What is the relation with String.Split?

String.Split with the default StringSplitOptions (defined as StringSplitOptions.None) creates a list of 1 string and then adds a new string to the list for each separator character found.

So, before the first ", the string is at index 0 in the split array; between the first and second ", the string is at index 1; between the second and third, index 2; between the third and fourth, index 3, ...

The general rule is: the string between the nth and (n+1)th " is at index n in the array. (2)

Given (1) and (2), we can conclude that quoted portions are at odd indices in the split array.
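The parity argument can be checked directly. Below is a small sketch that splits a made-up input on '"' with the default options and labels each piece by its index. It uses C# 9 top-level statements, and the input strings are assumptions for illustration. Keeping the default StringSplitOptions.None matters: a leading quote produces an empty string at index 0, so the quoted text still lands on an odd index.

```csharp
using System;

// Split on the quote character with the default StringSplitOptions.None,
// so empty strings are kept and indices line up with the quote positions.
string input = "before \"quoted one\" between \"quoted two\" after";
string[] pieces = input.Split('"');

for (int i = 0; i < pieces.Length; i++)
{
    // Odd index: the piece sat between an odd-numbered and an even-numbered quote
    string kind = i % 2 == 1 ? "quoted" : "not quoted";
    Console.WriteLine($"{i}: [{pieces[i]}] {kind}");
}

// A string that starts with a quote: index 0 is the empty string, "a" is at index 1
string[] leadingQuote = "\"a\" b".Split('"');
Console.WriteLine(leadingQuote.Length);
```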

Answer by John Koerner

You can use the TextFieldParser class that is part of the Microsoft.VisualBasic.FileIO namespace. (You'll need to add a reference to Microsoft.VisualBasic to your project.):

using System;
using System.IO;
using System.Text;
using Microsoft.VisualBasic.FileIO;

string inputString = "This is \"a test\" of the parser.";

using (MemoryStream ms = new MemoryStream(Encoding.ASCII.GetBytes(inputString)))
{
    using (TextFieldParser tfp = new TextFieldParser(ms))
    {
        tfp.Delimiters = new string[] { " " };
        tfp.HasFieldsEnclosedInQuotes = true;
        string[] output = tfp.ReadFields();

        for (int i = 0; i < output.Length; i++)
        {
            Console.WriteLine("{0}:{1}", i, output[i]);
        }
    }
}

Which generates the output:

0:This
1:is
2:a test
3:of
4:the
5:parser.

Answer by Squazz

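OP wanted to

... remove all spaces EXCEPT for the spaces found between quotation marks

The solution from Cédric Bignon almost did this, but didn't take into account that there could be an odd number of quotation marks. Starting out by checking for this and then removing the excess one ensures that we only stop splitting if the element really is enclosed in quotation marks.

string myString = "WordOne \"Word Two";
int placement = myString.LastIndexOf("\"", StringComparison.Ordinal);
if (placement >= 0)
    myString = myString.Remove(placement, 1);

var result = myString.Split('"')
                     .Select((element, index) => index % 2 == 0  // If even index
                                           ? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)  // Split the item
                                           : new string[] { element })  // Keep the entire item
                     .SelectMany(element => element).ToList();

Console.WriteLine(result[0]);
Console.WriteLine(result[1]);
Console.ReadKey();

Credit for the logic goes to Cédric Bignon; I only added a safeguard.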

Answer by user3566056

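There's just a tiny problem with Squazz's answer: it works for his string, but not if you add more items. E.g.

string myString = "WordOne \"Word Two\" Three";

In that case, removing the last quotation mark no longer gives us the expected three results.

That's easily fixed, though: just count the number of escape characters and, if it's odd, strip the last one (adapt as per your requirements):

// Requires: using System; using System.Collections.Generic; using System.Linq;
public static class StringExtensions
{
    // Named SplitOutsideQuotes rather than Split: an extension method is never chosen
    // when an instance method applies, and String.Split already accepts two chars,
    // so myString.Split(' ', '"') would not reach an extension named Split.
    public static List<String> SplitOutsideQuotes(this string myString, char separator, char escapeCharacter)
    {
        int nbEscapeCharacters = myString.Count(c => c == escapeCharacter);
        if (nbEscapeCharacters % 2 != 0) // odd number of escape characters
        {
            int lastIndex = myString.LastIndexOf(escapeCharacter);
            myString = myString.Remove(lastIndex, 1); // remove the last escape character
        }

        var result = myString.Split(escapeCharacter)
                             .Select((element, index) => index % 2 == 0  // If even index
                                                   ? element.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries)  // Split the item
                                                   : new string[] { element })  // Keep the entire item
                             .SelectMany(element => element).ToList();
        return result;
    }
}

I also turned it into an extension method and made the separator and escape character configurable.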
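To check the fix, below is a self-contained sketch of the same odd-quote safeguard applied to both example strings from this thread. It is written as a local function under C# 9 top-level statements. SplitOutsideQuotes here is a hypothetical local name, not a library API. The unterminated quote in the second string is dropped, so everything in it is split on spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same logic as the extension method above, as a local function so it runs on its own.
static List<string> SplitOutsideQuotes(string text, char separator, char escapeCharacter)
{
    // An odd number of quote characters means the last one is unmatched: drop it.
    if (text.Count(c => c == escapeCharacter) % 2 != 0)
        text = text.Remove(text.LastIndexOf(escapeCharacter), 1);

    return text.Split(escapeCharacter)
               .SelectMany((element, index) => index % 2 == 0
                   ? element.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries) // outside quotes: split
                   : new[] { element })                                                    // inside quotes: keep whole
               .ToList();
}

Console.WriteLine(string.Join(" | ", SplitOutsideQuotes("WordOne \"Word Two\" Three", ' ', '"'))); // WordOne | Word Two | Three
Console.WriteLine(string.Join(" | ", SplitOutsideQuotes("WordOne \"Word Two", ' ', '"')));        // WordOne | Word | Two
```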

Answer by kux

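With support for escaped (doubled) quotes inside quoted sections.

String:

a "b b" "c ""c"" c"

Result:

a
"b b"
"c ""c"" c"

Code:

// value is the input string; requires using System.Linq and System.Text.RegularExpressions
var list = Regex.Matches(value, @"\""(\""\""|[^\""])+\""|[^ ]+", 
    RegexOptions.ExplicitCapture)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();

Optionally, remove the double quotes (chain this after .Select(m => m.Value)):

.Select(m => m.StartsWith("\"") ? m.Substring(1, m.Length - 2).Replace("\"\"", "\"") : m)

Result:

a
b b
c "c" c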
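The pattern and the optional quote-removal step can be combined into one helper. Below is a minimal sketch under C# 9 top-level statements. SplitWithEscapedQuotes is a hypothetical name. Like the pattern, it assumes a literal quote inside a quoted section is written as two quotes (""). It also adds a guard that is not in the answer above: quotes are only stripped when a token both starts and ends with one, so an unterminated token such as "abc is returned unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: the doubled-quote-aware pattern plus the optional unescaping step.
static List<string> SplitWithEscapedQuotes(string value) =>
    Regex.Matches(value, @"\""(\""\""|[^\""])+\""|[^ ]+", RegexOptions.ExplicitCapture)
         .Cast<Match>()
         .Select(m => m.Value)
         // Strip the enclosing quotes and turn "" back into a single quote,
         // but only when the token really is enclosed in quotes.
         .Select(m => m.Length >= 2 && m.StartsWith("\"") && m.EndsWith("\"")
             ? m.Substring(1, m.Length - 2).Replace("\"\"", "\"")
             : m)
         .ToList();

foreach (string token in SplitWithEscapedQuotes("a \"b b\" \"c \"\"c\"\" c\""))
    Console.WriteLine(token);
```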