C#: How can I split this string into an array?

Notice: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/483702/

Time: 2020-08-04 05:12:13  Source: igfitidea

How can I split this string into an array?

c#, string

Asked by

My string is as follows:

smtp:[email protected];SMTP:[email protected];X400:C=US;A= ;P=Test;O=Exchange;S=Hyman;G=Black;

I need back:

smtp:[email protected]
SMTP:[email protected]
X400:C=US;A= ;P=Test;O=Exchange;S=Hyman;G=Black;

The problem is that semicolons both separate the addresses and appear inside the X400 address. Can anyone suggest how best to split this?

PS: I should have mentioned that the order can differ, so it could be:

X400:C=US;A= ;P=Test;O=Exchange;S=Hyman;G=Black;;smtp:[email protected];SMTP:[email protected]

There can be more than three addresses (4, 5, ... 10, etc.), including an X500 address; however, they all start with either smtp:, SMTP:, X400 or X500.

Answered by Jon Skeet

EDIT: With the updated information, this answer certainly won't do the trick - but it's still potentially useful, so I'll leave it here.

Will you always have three parts, and you just want to split on the first two semi-colons?

If so, just use the overload of Split which lets you specify the number of substrings to return:

string[] bits = text.Split(new char[]{';'}, 3);
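
To see what the count overload does, here is a minimal sketch. The class and method names and the example.com addresses are made up for illustration (the real addresses are not shown in the question), and the X400 value is shortened. It only helps when the X400 address comes last, which matches the caveat in the EDIT above.

```csharp
using System;

public static class SplitWithCountDemo
{
    // Splits on ';' but returns at most maxParts pieces; the final piece
    // keeps every remaining semicolon untouched.
    public static string[] SplitFirst(string text, int maxParts)
    {
        return text.Split(new char[] { ';' }, maxParts);
    }

    public static void Main()
    {
        // Hypothetical sample: two SMTP addresses followed by an X400 address.
        string text = "smtp:a@example.com;SMTP:b@example.com;X400:C=US;A= ;P=Test;";

        foreach (string part in SplitFirst(text, 3))
        {
            Console.WriteLine(part);
        }
    }
}
```

With three parts requested, the third element is the whole X400 address, including its internal semicolons.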

Answered by The.Anti.9

Check http://msdn.microsoft.com/en-us/library/c1bs0eda.aspx; there you can specify the number of splits you want. So in your case you would do:

string[] parts = text.Split(new char[] { ';' }, 3);

Answered by Amy B

Split on the semicolon (;), then loop over the result, re-combining each element that contains no colon (:) with the previous element.

string input = "X400:C=US;A= ;P=Test;O=Exchange;S=Hyman;G="
  +"Black;;smtp:[email protected];SMTP:[email protected]";

string[] rawSplit = input.Split(';');

List<string> result = new List<string>();
// Now the fun begins: pieces without a colon are glued back onto the current address.
string buffer = string.Empty;
foreach (string s in rawSplit)
{
  if (buffer == string.Empty)
  {
    buffer = s;
  }
  else if (s.Contains(':'))
  {   
    result.Add(buffer);
    buffer = s;
  }
  else
  {
    buffer += ";" + s;
  }
}
result.Add(buffer);

foreach (string s in result)
  Console.WriteLine(s);

Answered by Rob

This caught my curiosity.... So this code actually does the job, but again, wants tidying :)

My final attempt- stop changing what you need ;=)

static void Main(string[] args)
{
    string fneh = "X400:C=US400;A= ;P=Test;O=Exchange;S=Hyman;G=Black;x400:C=US400l;A= l;P=Testl;O=Exchangel;S=Hymanl;G=Blackl;smtp:[email protected];X500:C=US500;A= ;P=Test;O=Exchange;S=Hyman;G=Black;SMTP:[email protected];";

    string[] parts = fneh.Split(new char[] { ';' });

    List<string> addresses = new List<string>();
    StringBuilder address = new StringBuilder();
    foreach (string part in parts)
    {
        if (part.Contains(":"))
        {
            if (address.Length > 0)
            {
                addresses.Add(semiColonCorrection(address.ToString()));
            }
            address = new StringBuilder();
            address.Append(part);
        }
        else
        {
            address.AppendFormat(";{0}", part);
        }
    }
    addresses.Add(semiColonCorrection(address.ToString()));

    foreach (string emailAddress in addresses)
    {
        Console.WriteLine(emailAddress);
    }
    Console.ReadKey();
}
private static string semiColonCorrection(string address)
{
    if ((address.StartsWith("x", StringComparison.InvariantCultureIgnoreCase)) && (!address.EndsWith(";")))
    {
        return string.Format("{0};", address);
    }
    else
    {
        return address;
    }
}

Answered by Rad

Try these regexes. You can extract what you're looking for using named groups.

X400:(?<X400>.*?)(?:smtp|SMTP|$)
smtp:(?<smtp>.*?)(?:;+|$)
SMTP:(?<SMTP>.*?)(?:;+|$)

When constructing them, make sure you specify case-insensitive matching. They seem to work with the samples you gave.
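
As a rough sketch of how the named groups could be read back in C#: the helper name Extract and the example.com addresses below are hypothetical, and only the first match of each pattern is returned. Because matching is case-insensitive, the smtp and SMTP patterns hit the same text, so telling them apart would need extra logic.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Returns the text captured by groupName for the first match of pattern,
    // or null when the pattern does not match at all.
    public static string Extract(string input, string pattern, string groupName)
    {
        Match m = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[groupName].Value : null;
    }

    public static void Main()
    {
        // Hypothetical sample with a shortened X400 address at the end.
        string input = "smtp:a@example.com;SMTP:b@example.com;X400:C=US;A= ;P=Test;";

        Console.WriteLine(Extract(input, @"X400:(?<X400>.*?)(?:smtp|SMTP|$)", "X400"));
        Console.WriteLine(Extract(input, @"smtp:(?<smtp>.*?)(?:;+|$)", "smtp"));
    }
}
```

Note that the captured values exclude the protocol prefix, so it has to be added back if the full address is needed.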

Answered by Greg

Not the fastest if you are doing this a lot but it will work for all cases I believe.

        string input1 = "smtp:[email protected];SMTP:[email protected];X400:C=US;A= ;P=Test;O=Exchange;S=Hyman;G=Black;";
        string input2 = "X400:C=US;A= ;P=Test;O=Exchange;S=Hyman;G=Black;;smtp:[email protected];SMTP:[email protected]";
        Regex splitEmailRegex = new Regex(@"(?<key>\w+?):(?<value>.*?)(\w+:|$)");

        List<string> sets = new List<string>();

        while (input2.Length > 0)
        {
            Match m1 = splitEmailRegex.Matches(input2)[0];
            string s1 = m1.Groups["key"].Value + ":" + m1.Groups["value"].Value;
            sets.Add(s1);
            input2 = input2.Substring(s1.Length);
        }

        foreach (var set in sets)
        {
            Console.WriteLine(set);
        }

        Console.ReadLine();

Of course many will claim Regex: Now you have two problems. There may even be a better regex answer than this.

Answered by Samuel

You could always split on the colon and have a little logic to grab the key and value.

string[] bits = text.Split(':');
List<string> values = new List<string>();
for (int i = 1; i < bits.Length; i++)
{
    string value = bits[i].Contains(';') ? bits[i].Substring(0, bits[i].LastIndexOf(';') + 1) : bits[i];
    string key = bits[i - 1].Contains(';') ? bits[i - 1].Substring(bits[i - 1].LastIndexOf(';') + 1) : bits[i - 1];
    values.Add(String.Concat(key, ":", value));
}

Tested it with both of your samples and it works fine.

Answered by Orion Adrian

May I suggest building a regular expression

(smtp|SMTP|X400|X500):((?!smtp:|SMTP:|X400:|X500:).)*;?

or protocol-less

.*?:((?![^:;]*:).)*;?

In other words, find anything that starts with one of your protocols. Match the colon. Then continue matching characters as long as you're not at the start of another protocol. Finish with a semicolon (optionally).

You can then parse through the list of matches splitting on ':' and you'll have your protocols. Additionally if you want to add protocols, just add them to the list.

However, you will likely want to make the whole pattern case-insensitive and list each protocol only once, in either its uppercase or lowercase form.

The protocol-less version doesn't care what the names of the protocols are. It just finds them all the same, by matching everything up to, but excluding a string followed by a colon or a semi-colon.
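
A minimal sketch of running the first (protocol-listing) pattern with Regex.Matches might look like this. The class and method names and the example.com addresses are invented for illustration, and the X400 value is shortened. Each match keeps whatever semicolons precede the next protocol, so any trimming is left to the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ProtocolRegexDemo
{
    // The protocol-listing pattern from above, used verbatim (case-sensitive).
    private const string Pattern =
        @"(smtp|SMTP|X400|X500):((?!smtp:|SMTP:|X400:|X500:).)*;?";

    // Returns every address found in input, in order of appearance.
    public static List<string> FindAddresses(string input)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            found.Add(m.Value);
        }
        return found;
    }

    public static void Main()
    {
        // Hypothetical sample with the X400 address first.
        string input = "X400:C=US;A= ;P=Test;;smtp:a@example.com;SMTP:b@example.com";

        foreach (string address in FindAddresses(input))
        {
            Console.WriteLine(address);
        }
    }
}
```

Splitting each match on its first ':' then gives the protocol and the value separately.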

Answered by Dennis C

Split by the following regex pattern

string[] items = System.Text.RegularExpressions.Regex.Split(text, @";(?=\w+:)");

EDIT: A better one, which accepts more special characters in the protocol name:

string[] items = System.Text.RegularExpressions.Regex.Split(text, @";(?=[^;:]+:)");

Answered by mangokun

Here is another possible solution.

string[] bits = text.Replace(";smtp", "|smtp").Replace(";SMTP", "|SMTP").Replace(";X400", "|X400").Split(new char[] { '|' });

bits[0], bits[1], and bits[2] will then contain the three parts, in the same order as in your original string.