C# Extension Method - String Split that also accepts an Escape Character
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverFlow
Original source: http://stackoverflow.com/questions/634777/
Asked by BuddyJoe
I'd like to write an extension method for the .NET String class. I'd like it to be a special variation on the Split method: one that takes an escape character and does not split the string when that escape character appears immediately before the separator.
What's the best way to write this? I'm curious about the best non-regex way to approach it.
Something with a signature like...
public static string[] Split(this string input, string separator, char escapeCharacter)
{
    // ...
}
UPDATE: Because it came up in one of the comments, the escaping...
In C# when escaping non-special characters you get the error - CS1009: Unrecognized escape sequence.
In IE JScript the escape characters are thrown out, unless you try \u, in which case you get an "Expected hexadecimal digit" error. I tested Firefox and it has the same behavior.
I'd like this method to be pretty forgiving and follow the JavaScript model. If you escape on a non-separator it should just "kindly" remove the escape character.
Accepted answer by Jon Skeet
How about:
public static IEnumerable<string> Split(this string input,
                                        string separator,
                                        char escapeCharacter)
{
    int startOfSegment = 0;
    int index = 0;
    while (index < input.Length)
    {
        index = input.IndexOf(separator, index);
        if (index > 0 && input[index-1] == escapeCharacter)
        {
            index += separator.Length;
            continue;
        }
        if (index == -1)
        {
            break;
        }
        yield return input.Substring(startOfSegment, index-startOfSegment);
        index += separator.Length;
        startOfSegment = index;
    }
    yield return input.Substring(startOfSegment);
}
That seems to work (with a few quick test strings), but it doesn't remove the escape character - that will depend on your exact situation, I suspect.
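If the escape character should also be removed, as the question asks, one option is to scan the input character by character instead of using IndexOf. The following is only a minimal sketch under stated assumptions, not the answer's own method: the class name EscapedSplitSketch and the method name SplitUnescaped are hypothetical, and it assumes an escape character in front of any character (separator or not) is dropped while the character after it is kept literally. A trailing escape character with nothing after it is kept as-is.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class EscapedSplitSketch
{
    // Hypothetical helper (not from the answer above): splits on `separator`
    // and drops each escape character, keeping the character after it literally.
    public static IEnumerable<string> SplitUnescaped(this string input,
                                                     string separator,
                                                     char escapeCharacter)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        // An empty separator would never advance the scan, so reject it.
        if (string.IsNullOrEmpty(separator))
            throw new ArgumentException("Separator must not be empty.", nameof(separator));

        var segment = new StringBuilder();
        int i = 0;
        while (i < input.Length)
        {
            if (input[i] == escapeCharacter && i + 1 < input.Length)
            {
                // Escaped character: drop the escape, keep the next character.
                segment.Append(input[i + 1]);
                i += 2;
            }
            else if (string.CompareOrdinal(input, i, separator, 0, separator.Length) == 0)
            {
                // Unescaped separator: emit the current segment and start a new one.
                yield return segment.ToString();
                segment.Clear();
                i += separator.Length;
            }
            else
            {
                segment.Append(input[i]);
                i++;
            }
        }
        // Emit whatever follows the last separator (possibly an empty string).
        yield return segment.ToString();
    }
}
```

For example, "a,b\\,c,d".SplitUnescaped(",", '\\') yields "a", "b,c" and "d". The method is deliberately not named Split: built-in string.Split overloads take precedence over extension methods whenever they are applicable to the arguments, so a distinct name avoids surprises.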
Answer by James Curran
This will need to be cleaned up a bit, but this is essentially it....
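// Assumes a single-character separator (char), as the comparison below requires.
List<string> output = new List<string>();
int j = 0; // start of the current segment
for (int i = 0; i < input.Length; ++i)
{
    if (input[i] == separator && (i == 0 || input[i - 1] != escapeChar))
    {
        output.Add(input.Substring(j, i - j));
        j = i + 1; // skip past the separator
    }
}
output.Add(input.Substring(j)); // trailing segment after the last separator
return output.ToArray();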
Answer by RvdK
The signature is incorrect; you need to return a string array.
WARNING: I have never used extensions, so forgive me for any errors ;)
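// The separator is taken as a char so it can be compared with input[i].
public static List<String> Split(this string input, char separator, char escapeCharacter)
{
    String word = "";
    List<String> result = new List<string>();
    for (int i = 0; i < input.Length; i++)
    {
        //can also use switch
        if (input[i] == escapeCharacter && i + 1 < input.Length)
        {
            // Drop the escape character and keep the next character literally.
            word += input[++i];
        }
        else if (input[i] == separator)
        {
            result.Add(word);
            word = "";
        }
        else
        {
            word += input[i];
        }
    }
    result.Add(word); // add the final segment
    return result;
}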
Answer by tvanfosson
My first observation is that the separator ought to be a char not a string since escaping a string using a single character may be hard -- how much of the following string does the escape character cover? Other than that, @James Curran's answer is pretty much how I would handle it - though, as he says it needs some clean up. Initializing j to 0 in the loop initializer, for instance. Figuring out how to handle null inputs, etc.
You probably want to also support StringSplitOptions and specify whether empty string should be returned in the collection.
Answer by si618
Personally I'd cheat and have a peek at string.Split using Reflector... InternalSplitOmitEmptyEntries looks useful ;-)
Answer by BFree
public static string[] Split(this string input, string separator, char escapeCharacter)
{
    Guid g = Guid.NewGuid();
    input = input.Replace(escapeCharacter.ToString() + separator, g.ToString());
    string[] result = input.Split(new string []{separator}, StringSplitOptions.None);
    for (int i = 0; i < result.Length; i++)
    {
        result[i] = result[i].Replace(g.ToString(), escapeCharacter.ToString() + separator);
    }
    return result;
}
Probably not the best way of doing it, but it's another alternative. Basically, everywhere the sequence of escape+separator is found, replace it with a GUID (you can use any other random crap in here, doesn't matter). Then use the built-in Split function. Then replace the GUID in each element of the array with the escape+separator.
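The placeholder idea can also be written so that the escape character is dropped when the placeholder is swapped back. This is a hedged sketch rather than the answer's code: the class name PlaceholderSplitSketch and the method name SplitWithPlaceholder are hypothetical, and dropping the escape character on restore is an assumption taken from the question's request.

```csharp
using System;

public static class PlaceholderSplitSketch
{
    // Hypothetical variant of the placeholder approach described above.
    public static string[] SplitWithPlaceholder(string input, string separator, char escapeCharacter)
    {
        string escapedSeparator = escapeCharacter + separator;
        string placeholder = Guid.NewGuid().ToString();

        // Hide every escaped separator, then let the built-in Split do the work.
        string[] result = input.Replace(escapedSeparator, placeholder)
                               .Split(new[] { separator }, StringSplitOptions.None);

        for (int i = 0; i < result.Length; i++)
        {
            // Assumption: restore only the separator, dropping the escape character.
            result[i] = result[i].Replace(placeholder, separator);
        }
        return result;
    }
}
```

Calling PlaceholderSplitSketch.SplitWithPlaceholder("a,b\\,c,d", ",", '\\') returns "a", "b,c" and "d". Two limitations are worth keeping in mind: the default GUID string consists of hexadecimal digits and hyphens, so a separator such as "-" would split the placeholder itself, and an escaped escape character (for example a doubled backslash before a separator) is not treated specially.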
Answer by Chaowlert Chaisrichalermpol
Here is a solution if you want to remove the escape character.
public static IEnumerable<string> Split(this string input,
                                        string separator,
                                        char escapeCharacter) {
    string[] splitted = input.Split(new[] { separator }, StringSplitOptions.None);
    StringBuilder sb = null;
    for (int i = 0; i < splitted.Length; i++) {
        string subString = splitted[i];
        // The last piece has no separator after it, so a trailing escape is kept as-is.
        if (i < splitted.Length - 1 && subString.EndsWith(escapeCharacter.ToString())) {
            if (sb == null)
                sb = new StringBuilder();
            // Drop the escape character but keep the separator it escaped.
            sb.Append(subString, 0, subString.Length - 1).Append(separator);
        } else {
            if (sb == null)
                yield return subString;
            else {
                sb.Append(subString);
                yield return sb.ToString();
                sb = null;
            }
        }
    }
    if (sb != null)
        yield return sb.ToString();
}
Answer by Chaowlert Chaisrichalermpol
public string RemoveMultipleDelimiters(string sSingleLine)
{
    string sMultipleDelimitersLine = "";
    string sMultipleDelimitersLine1 = "";
    int iDelimeterPosition = -1;
    iDelimeterPosition = sSingleLine.IndexOf('>');
    iDelimeterPosition = sSingleLine.IndexOf('>', iDelimeterPosition + 1);
    if (iDelimeterPosition > -1)
    {
        sMultipleDelimitersLine = sSingleLine.Substring(0, iDelimeterPosition - 1);
        sMultipleDelimitersLine1 = sSingleLine.Substring(sSingleLine.IndexOf('>', iDelimeterPosition) - 1);
        sMultipleDelimitersLine1 = sMultipleDelimitersLine1.Replace('>', '*');
        sSingleLine = sMultipleDelimitersLine + sMultipleDelimitersLine1;
    }
    return sSingleLine;
}
Answer by Biscuits
You can try something like this. Although, I would suggest implementing with unsafe code for performance critical tasks.
public static class StringExtensions
{
    public static string[] Split(this string text, char escapeChar, params char[] seperator)
    {
        return Split(text, escapeChar, seperator, int.MaxValue, StringSplitOptions.None);
    }

    public static string[] Split(this string text, char escapeChar, char[] seperator, int count)
    {
        return Split(text, escapeChar, seperator, count, StringSplitOptions.None);
    }

    public static string[] Split(this string text, char escapeChar, char[] seperator, StringSplitOptions options)
    {
        return Split(text, escapeChar, seperator, int.MaxValue, options);
    }

    public static string[] Split(this string text, char escapeChar, char[] seperator, int count, StringSplitOptions options)
    {
        if (text == null)
        {
            throw new ArgumentNullException("text");
        }
        if (text.Length == 0)
        {
            return new string[0];
        }
        var segments = new List<string>();
        bool previousCharIsEscape = false;
        var segment = new StringBuilder();
        for (int i = 0; i < text.Length; i++)
        {
            if (previousCharIsEscape)
            {
                previousCharIsEscape = false;
                if (seperator.Contains(text[i]))
                {
                    // Drop the escape character when it escapes a seperator character.
                    segment.Append(text[i]);
                    continue;
                }
                // Retain the escape character when it escapes any other character.
                segment.Append(escapeChar);
                segment.Append(text[i]);
                continue;
            }
            if (text[i] == escapeChar)
            {
                previousCharIsEscape = true;
                continue;
            }
            if (seperator.Contains(text[i]))
            {
                if (options != StringSplitOptions.RemoveEmptyEntries || segment.Length != 0)
                {
                    // Only add empty segments when options allow.
                    segments.Add(segment.ToString());
                }
                segment = new StringBuilder();
                continue;
            }
            segment.Append(text[i]);
        }
        if (options != StringSplitOptions.RemoveEmptyEntries || segment.Length != 0)
        {
            // Only add empty segments when options allow.
            segments.Add(segment.ToString());
        }
        return segments.ToArray();
    }
}
Answer by Stefan Steinegger
I had this problem as well and didn't find a solution. So I wrote such a method myself:
public static IEnumerable<string> Split(
    this string text,
    char separator,
    char escapeCharacter)
{
    var builder = new StringBuilder(text.Length);
    bool escaped = false;
    foreach (var ch in text)
    {
        if (separator == ch && !escaped)
        {
            yield return builder.ToString();
            builder.Clear();
        }
        else
        {
            // separator is removed, escape characters are kept
            builder.Append(ch);
        }
        // set escaped for next cycle,
        // or reset unless escape character is escaped.
        escaped = escapeCharacter == ch && !escaped;
    }
    yield return builder.ToString();
}
It goes in combination with Escape and Unescape, which escapes the separator and escape character and removes escape characters again:
public static string Escape(this string text, string controlChars, char escapeCharacter)
{
    var builder = new StringBuilder(text.Length + 3);
    foreach (var ch in text)
    {
        if (controlChars.Contains(ch))
        {
            builder.Append(escapeCharacter);
        }
        builder.Append(ch);
    }
    return builder.ToString();
}

public static string Unescape(string text, char escapeCharacter)
{
    var builder = new StringBuilder(text.Length);
    bool escaped = false;
    foreach (var ch in text)
    {
        escaped = escapeCharacter == ch && !escaped;
        if (!escaped)
        {
            builder.Append(ch);
        }
    }
    return builder.ToString();
}
Examples for escape / unescape
separator = ','
escapeCharacter = '\\'
//controlCharacters is always separator + escapeCharacter
@"AB,CD\EF\," <=> @"AB\,CD\\EF\\\,"
Split:
@"AB,CD\,EF\\,GH\\\,IJ" => [@"AB", @"CD\,EF\\", @"GH\\\,IJ"]
So to use it, Escape before Join, and Unescape after Split.
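The Escape, Join, Split, Unescape round trip can be sketched as follows. This is an illustration under stated assumptions, not part of the answer: the class name EscapeRoundTripSketch and the helpers JoinEscaped and SplitUnescaped are hypothetical, and the three methods from the answer are repeated in condensed form only so that the block compiles on its own. Note that Split is called as a plain static method here. With extension syntax, text.Split(',', '\\') would bind to the built-in string.Split(params char[]) instead, because applicable instance methods win over extension methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class EscapeRoundTripSketch
{
    // Condensed copies of the answer's Split, Escape and Unescape methods.
    public static IEnumerable<string> Split(this string text, char separator, char escapeCharacter)
    {
        var builder = new StringBuilder(text.Length);
        bool escaped = false;
        foreach (var ch in text)
        {
            if (separator == ch && !escaped)
            {
                yield return builder.ToString();
                builder.Clear();
            }
            else
            {
                builder.Append(ch); // escape characters are kept at this stage
            }
            escaped = escapeCharacter == ch && !escaped;
        }
        yield return builder.ToString();
    }

    public static string Escape(this string text, string controlChars, char escapeCharacter)
    {
        var builder = new StringBuilder(text.Length + 3);
        foreach (var ch in text)
        {
            if (controlChars.IndexOf(ch) >= 0) builder.Append(escapeCharacter);
            builder.Append(ch);
        }
        return builder.ToString();
    }

    public static string Unescape(string text, char escapeCharacter)
    {
        var builder = new StringBuilder(text.Length);
        bool escaped = false;
        foreach (var ch in text)
        {
            escaped = escapeCharacter == ch && !escaped;
            if (!escaped) builder.Append(ch);
        }
        return builder.ToString();
    }

    // Hypothetical helper: escape each field, then join with the separator.
    public static string JoinEscaped(IEnumerable<string> fields, char separator, char escapeCharacter)
    {
        string controlChars = new string(new[] { separator, escapeCharacter });
        return string.Join(separator.ToString(),
                           fields.Select(f => f.Escape(controlChars, escapeCharacter)));
    }

    // Hypothetical helper: split on unescaped separators, then unescape each field.
    public static List<string> SplitUnescaped(string line, char separator, char escapeCharacter)
    {
        // Static call on purpose; see the note about string.Split(params char[]).
        return Split(line, separator, escapeCharacter)
            .Select(f => Unescape(f, escapeCharacter))
            .ToList();
    }
}
```

With the fields "AB,CD", @"EF\GH" and @"X\", JoinEscaped(fields, ',', '\\') produces @"AB\,CD,EF\\GH,X\\", and SplitUnescaped on that line returns the original three fields.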

