Normalize newlines in C#
Original source: http://stackoverflow.com/questions/140926/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by ctacke
I have a data stream that may contain \r, \n, \r\n, \n\r or any combination of them. Is there a simple way to normalize the data to make all of them simply become \r\n pairs to make display more consistent?
So something that would yield this kind of translation table:
\r --> \r\n
\n --> \r\n
\n\n --> \r\n\r\n
\n\r --> \r\n
\r\n --> \r\n
\r\n\n --> \r\n\r\n
Accepted answer by Derek Park
I believe this will do what you need:
using System.Text.RegularExpressions;
// ...
string normalized = Regex.Replace(originalString, @"\r\n|\n\r|\n|\r", "\r\n");
I'm not 100% sure of the exact syntax, and I don't have a .NET compiler handy to check. I wrote it in Perl and converted it into (hopefully correct) C#. The only real trick is to match "\r\n" and "\n\r" first.
To apply it to an entire stream, just run it on chunks of input, taking care that a two-character break split across a chunk boundary is not counted as two line breaks. (You could do this with a stream wrapper if you want.)
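A rough sketch of the chunked approach follows. It does not use the regex. Instead it scans characters and remembers a pending \r or \n between chunks, so a pair split across a chunk boundary still becomes one \r\n. The class name NewlineStream, the method Normalize and the chunkSize parameter are made up for this illustration and are not part of the original answer.

```csharp
using System.IO;

public static class NewlineStream
{
    // Hypothetical helper: copies text from reader to writer, turning
    // \r\n, \n\r, \r and \n into \r\n. The "pending" state survives chunk
    // boundaries, so a pair split across two reads is still one line break.
    public static void Normalize(TextReader reader, TextWriter writer, int chunkSize = 4096)
    {
        char[] chunk = new char[chunkSize];
        char pending = '\0'; // first character of a possible two-character break
        int read;
        while ((read = reader.Read(chunk, 0, chunk.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                char c = chunk[i];
                if (pending != '\0')
                {
                    // Emit the break that was waiting for its possible partner.
                    writer.Write("\r\n");
                    bool completesPair = (c == '\r' || c == '\n') && c != pending;
                    pending = '\0';
                    if (completesPair)
                        continue; // c was the second half of \r\n or \n\r
                }
                if (c == '\r' || c == '\n')
                    pending = c;
                else
                    writer.Write(c);
            }
        }
        if (pending != '\0')
            writer.Write("\r\n"); // input ended on a line break
    }
}
```

For example, calling NewlineStream.Normalize(new StringReader("a\rb\n\rc"), output, 2) with a StringWriter named output leaves "a\r\nb\r\nc" in output, even though the \n\r pair straddles two chunks.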
The original Perl:
$str =~ s/\r\n|\n\r|\n|\r/\r\n/g;
The test results:
[bash$] ./test.pl
\r -> \r\n
\n -> \r\n
\n\n -> \r\n\r\n
\n\r -> \r\n
\r\n -> \r\n
\r\n\n -> \r\n\r\n
Update: Now converts \n\r to \r\n, though I wouldn't call that normalization.
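Since the answer was only tested in Perl, here is a small C# sketch that runs the same pattern over the left-hand side of the question's table. The class and method names (NewlineTableCheck, Normalize, Show, PrintTable) are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NewlineTableCheck
{
    // Same pattern as in the answer: two-character breaks are listed first.
    public static string Normalize(string s) =>
        Regex.Replace(s, @"\r\n|\n\r|\n|\r", "\r\n");

    // Makes control characters visible when printing.
    public static string Show(string s) =>
        s.Replace("\r", "\\r").Replace("\n", "\\n");

    // Prints one "input -> output" line per entry of the translation table.
    public static void PrintTable()
    {
        string[] inputs = { "\r", "\n", "\n\n", "\n\r", "\r\n", "\r\n\n" };
        foreach (string input in inputs)
            Console.WriteLine(Show(input) + " -> " + Show(Normalize(input)));
    }
}
```

Calling NewlineTableCheck.PrintTable() should print the same six lines as the Perl test run above.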
Answer by Quintin Robinson
A regex would help. You could do something roughly like this:
(\r\n|\n\n|\n\r|\r|\n) replace with \r\n
Run against the left-hand side of the posted table, this regex produced the following matches, so a replace should normalize the input.
\r => \r
\n => \n
\n\n => \n\n
\n\r => \n\r
\r\n => \r\n
\r\n => \r\n
\n => \n
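One detail is worth checking when this pattern is used in an actual replace. Because it lists \n\n as an alternative, two consecutive line feeds are matched as a single unit and come out as one \r\n, whereas the question's table asks for \r\n\r\n. The sketch below compares the pattern as posted with a variant that drops that alternative. The class and method names are made up for the comparison.

```csharp
using System.Text.RegularExpressions;

public static class AlternationCheck
{
    // Pattern as posted, including the \n\n alternative.
    public static string WithDoubleLf(string s) =>
        Regex.Replace(s, @"\r\n|\n\n|\n\r|\r|\n", "\r\n");

    // Same pattern with the \n\n alternative removed.
    public static string WithoutDoubleLf(string s) =>
        Regex.Replace(s, @"\r\n|\n\r|\r|\n", "\r\n");
}
```

AlternationCheck.WithDoubleLf("\n\n") returns "\r\n", while AlternationCheck.WithoutDoubleLf("\n\n") returns "\r\n\r\n".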
Answer by VVS
You're overcomplicating this. Ignore every \r and turn every \n into an \r\n.
In Pseudo-C#:
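// Assumes: using System.Text; "buffer" is a TextReader over the input
// and X is the chunk size.
char[] chunk = new char[X];
StringBuilder output = new StringBuilder();
int read = buffer.Read(chunk, 0, chunk.Length);
for (int i = 0; i < read; i++)
{
    char c = chunk[i];
    switch (c)
    {
        case '\r': break; // ignore
        case '\n': output.Append("\r\n"); break;
        default: output.Append(c); break;
    }
}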
EDIT: \r alone is not a line terminator, so I doubt you really want to expand \r to \r\n.
Answer by Joe
I'm with Jamie Zawinski on RegEx:
"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."
For those of us who prefer readability:
Step 1
Replace \r\n by \n
Replace \n\r by \n (if you really want this; some posters seem to think not)
Replace \r by \n
Step 2
Replace \n by Environment.NewLine or \r\n or whatever.
Answer by leppie
I agree that regex is the answer; however, everyone else fails to mention Unicode line separators. Those (and their variations with \n) should be included.
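As a sketch of what that could look like, the pattern below extends the regex approach with three Unicode break characters. Which characters to include is an assumption on my part: NEL (U+0085), LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029). The class and member names are invented for this example.

```csharp
using System.Text.RegularExpressions;

public static class UnicodeNewlines
{
    // Two-character pairs first, then any single break character.
    // Assumption: U+0085, U+2028 and U+2029 are the extra breaks wanted.
    private static readonly Regex AnyBreak =
        new Regex(@"\r\n|\n\r|[\r\n\u0085\u2028\u2029]");

    public static string Normalize(string s) => AnyBreak.Replace(s, "\r\n");
}
```

Note that mapping U+2029 to \r\n discards the distinction between a paragraph break and a line break, which may or may not matter for display purposes.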
Answer by Phil
Normalise breaks, so that they are all \r\n
var normalisedString =
    sourceString
        .Replace("\r\n", "\n")
        .Replace("\n\r", "\n")
        .Replace("\r", "\n")
        .Replace("\n", "\r\n");
Answer by Roberto B
This answers the question as asked. The solution below rewrites the string according to the given translation table. It does not use a comparatively expensive regex call, nor does it chain several Replace calls that each loop over the data separately with their own checks.
The search is done in a single for loop. The only other loops are inside Array.Copy, which runs each time the capacity of the result array has to be increased. In some cases, a larger page size might be more efficient.
public static string NormalizeNewLine(this string val)
{
    if (string.IsNullOrEmpty(val))
        return val;
    const int page = 6;
    int a = page;
    int j = 0;
    int len = val.Length;
    char[] res = new char[len];
    for (int i = 0; i < len; i++)
    {
        char ch = val[i];
        if (ch == '\r')
        {
            int ni = i + 1;
            if (ni < len && val[ni] == '\n')
            {
                res[j++] = '\r';
                res[j++] = '\n';
                i++;
            }
            else
            {
                if (a == page) //ensure capacity
                {
                    char[] nres = new char[res.Length + page];
                    Array.Copy(res, 0, nres, 0, res.Length);
                    res = nres;
                    a = 0;
                }
                res[j++] = '\r';
                res[j++] = '\n';
                a++;
            }
        }
        else if (ch == '\n')
        {
            int ni = i + 1;
            if (ni < len && val[ni] == '\r')
            {
                res[j++] = '\r';
                res[j++] = '\n';
                i++;
            }
            else
            {
                if (a == page) //ensure capacity
                {
                    char[] nres = new char[res.Length + page];
                    Array.Copy(res, 0, nres, 0, res.Length);
                    res = nres;
                    a = 0;
                }
                res[j++] = '\r';
                res[j++] = '\n';
                a++;
            }
        }
        else
        {
            res[j++] = ch;
        }
    }
    return new string(res, 0, j);
}
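For comparison, the same single-pass scan can be sketched with a StringBuilder, which grows its own buffer, so the manual paging is not needed. This is a variation for illustration, not the code above: the names NewLineExtensions and NormalizeNewLineSb are hypothetical, and I make no claim about which version is faster.

```csharp
using System.Text;

public static class NewLineExtensions
{
    // Single pass over the input; every \r\n, \n\r, \r or \n becomes \r\n.
    public static string NormalizeNewLineSb(this string val)
    {
        if (string.IsNullOrEmpty(val))
            return val;

        // Small headroom for lone \r or \n characters that grow by one char.
        var sb = new StringBuilder(val.Length + 16);
        for (int i = 0; i < val.Length; i++)
        {
            char ch = val[i];
            if (ch == '\r' || ch == '\n')
            {
                sb.Append("\r\n");
                // Skip the partner character of a \r\n or \n\r pair.
                char partner = ch == '\r' ? '\n' : '\r';
                if (i + 1 < val.Length && val[i + 1] == partner)
                    i++;
            }
            else
            {
                sb.Append(ch);
            }
        }
        return sb.ToString();
    }
}
```

With this in scope, "\r\n\n".NormalizeNewLineSb() returns "\r\n\r\n", matching the last row of the translation table.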
The translation table really appeals to me, even if '\n\r' is not actually used on common platforms. Who would use two types of line breaks to indicate two line breaks? If you want to know that, then you need to check beforehand whether \n and \r are both used separately in the same document.