Notice: this page is a Chinese-English translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/841396/

Time: 2020-08-05 03:45:38  Source: igfitidea

What is a quick way to force CRLF in C# / .NET?

Tags: c#, .net, string, newline

Asked by Neil C. Obremski

How would you normalize all new-line sequences in a string to one type?

I'm looking to make them all CRLF for the purpose of email (MIME documents). Ideally this would be wrapped in a static method, executing very quickly, and not using regular expressions (since the variances of line breaks, carriage returns, etc. are limited). Perhaps there's even a BCL method I've overlooked?

ASSUMPTION: After giving this a bit more thought, I think it's a safe assumption to say that CR's are either stand-alone or part of the CRLF sequence. That is, if you see CRLF then you know all CR's can be removed. Otherwise it's difficult to tell how many lines should come out of something like "\r\n\n\r".
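
The assumption above could be turned into code roughly as follows. This is only a sketch of one reading of it: the class and method names (`AssumptionSketch`, `NormalizeUnderAssumption`) are hypothetical, and it takes the presence of any "\r\n" as proof that every CR in the string belongs to a CRLF pair or is noise.

```csharp
using System;

public static class AssumptionSketch
{
    // Hypothetical helper illustrating the assumption stated above.
    public static string NormalizeUnderAssumption(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        string lfOnly = input.Contains("\r\n")
            ? input.Replace("\r", "")     // CRLF seen: assume all CRs can be removed
            : input.Replace("\r", "\n");  // no CRLF: treat each CR as a stand-alone break

        // Every remaining LF is one line break; emit it as CRLF.
        return lfOnly.Replace("\n", "\r\n");
    }
}
```

Under this reading, "\r\n\n\r" comes out as two line breaks, because the trailing CR is dropped rather than counted.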

Accepted answer by Daniel Brückner

input.Replace("\r\n", "\n").Replace("\r", "\n").Replace("\n", "\r\n")

This will work if the input contains only one type of line break: either CR, or LF, or CR+LF.
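
Since the question asks for a static method, the chained replacement could be wrapped as in the sketch below. The names `NewLines` and `ToCrLf` are hypothetical and not part of the answer. The order of the calls matters: CRLF is collapsed first so that its CR is not converted a second time.

```csharp
using System;

public static class NewLines
{
    // Hypothetical wrapper around the chained Replace calls shown above.
    public static string ToCrLf(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        return input
            .Replace("\r\n", "\n")   // collapse CRLF first so its CR is not counted again
            .Replace("\r", "\n")     // any CR left over is a stand-alone break
            .Replace("\n", "\r\n");  // every LF becomes CRLF
    }
}
```

Each call allocates a new string, so this makes three passes over the data. An odd sequence such as "a\n\rb" is treated as two line breaks, which may or may not be what you want.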

Answer by Nathan

string nonNormalized = "\r\n\n\r";

string normalized = nonNormalized.Replace("\r", "\n").Replace("\n", "\r\n");

Answer by Jon Skeet

It depends on exactly what the requirements are. In particular, how do you want to handle "\r" on its own? Should that count as a line break or not? As an example, how should "a\n\rb" be treated? Is that one very odd line break, one "\n" break and then a rogue "\r", or two separate line breaks? If "\r" and "\n" can both be line breaks on their own, why should "\r\n" not be treated as two line breaks?
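
To make the ambiguity concrete, here is a small sketch that counts line breaks under three possible policies. The class and method names are hypothetical and only serve the illustration; none of them comes from the BCL.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LineBreakPolicies
{
    // Policy A: "\r\n" is one break; a lone "\r" or a lone "\n" is also one break.
    public static int CountCrLfAsOne(string s) =>
        Regex.Matches(s, "\r\n|\r|\n").Count;

    // Policy B: only "\n" ends a line; any "\r" is ignored as a stray character.
    public static int CountLfOnly(string s) =>
        s.Count(c => c == '\n');

    // Policy C: every CR and every LF is a break by itself.
    public static int CountEachChar(string s) =>
        s.Count(c => c == '\r' || c == '\n');
}
```

For "a\n\rb" policies A and C see two breaks while policy B sees one; for "a\r\nb" policies A and B see one break while policy C sees two. Which of these is right depends on where the text comes from.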

Here's some code which I suspect is reasonably efficient.

using System;
using System.Text;

class LineBreaks
{    
    static void Main()
    {
        Test("a\nb");
        Test("a\nb\r\nc");
        Test("a\r\nb\r\nc");
        Test("a\rb\nc");
        Test("a\r");
        Test("a\n");
        Test("a\r\n");
    }

    static void Test(string input)
    {
        string normalized = NormalizeLineBreaks(input);
        // Escape the control characters so the line breaks are visible in the output
        string debug = normalized.Replace("\r", "\\r")
                                 .Replace("\n", "\\n");
        Console.WriteLine(debug);
    }

    static string NormalizeLineBreaks(string input)
    {
        // Allow 10% as a rough guess of how much the string may grow.
        // If we're wrong we'll either waste space or have extra copies -
        // it will still work
        StringBuilder builder = new StringBuilder((int) (input.Length * 1.1));

        bool lastWasCR = false;

        foreach (char c in input)
        {
            if (lastWasCR)
            {
                lastWasCR = false;
                if (c == '\n')
                {
                    continue; // Already written \r\n
                }
            }
            switch (c)
            {
                case '\r':
                    builder.Append("\r\n");
                    lastWasCR = true;
                    break;
                case '\n':
                    builder.Append("\r\n");
                    break;
                default:
                    builder.Append(c);
                    break;
            }
        }
        return builder.ToString();
    }
}

Answer by Zotta

Simple variant:

Regex.Replace(input, @"\r\n|\r|\n", "\r\n")

For better performance:

static Regex newline_pattern = new Regex(@"\r\n|\r|\n", RegexOptions.Compiled);
[...]
    newline_pattern.Replace(input, "\r\n");

Answer by Roberto B

I think this is a quick way to do that.

It does not use an expensive regex function. It also does not chain multiple replacement calls, each of which would loop over the data separately with its own checks, allocations, etc.

So the search is done directly in one for loop. Each time the capacity of the result array has to be increased, a loop also runs inside the Array.Copy function. Those are all the loops. In some cases, a larger page size might be more efficient.

public static string NormalizeNewLine(this string val)
{
    if (string.IsNullOrEmpty(val))
        return val;

    const int page = 6;
    int a = page;
    int j = 0;
    int len = val.Length;
    char[] res = new char[len];

    for (int i = 0; i < len; i++)
    {
        char ch = val[i];

        if (ch == '\r')
        {
            int ni = i + 1;
            if (ni < len && val[ni] == '\n')
            {
                res[j++] = '\r';
                res[j++] = '\n';
                i++;
            }
            else
            {
                if (a == page) // Ensure capacity
                {
                    char[] nres = new char[res.Length + page];
                    Array.Copy(res, 0, nres, 0, res.Length);
                    res = nres;
                    a = 0;
                }

                res[j++] = '\r';
                res[j++] = '\n';
                a++;
            }
        }
        else if (ch == '\n')
        {
            int ni = i + 1;
            if (ni < len && val[ni] == '\r')
            {
                res[j++] = '\r';
                res[j++] = '\n';
                i++;
            }
            else
            {
                if (a == page) // Ensure capacity
                {
                    char[] nres = new char[res.Length + page];
                    Array.Copy(res, 0, nres, 0, res.Length);
                    res = nres;
                    a = 0;
                }

                res[j++] = '\r';
                res[j++] = '\n';
                a++;
            }
        }
        else
        {
            res[j++] = ch;
        }
    }

    return new string(res, 0, j);
}

I know that '\n\r' is not actually used on mainstream platforms. But who would use two different types of line break in succession to indicate two line breaks?

To find out, you would first need to check whether \n and \r are both used on their own in the same document.
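
Such a pre-check could look like the sketch below. The names `NewLineInspector` and `UsesLoneCrAndLoneLf` are hypothetical. It also assumes that both "\r\n" and "\n\r" count as two-character pairs, matched greedily from left to right, so a CR or LF is "on its own" only when it is not part of such a pair.

```csharp
using System;

public static class NewLineInspector
{
    // Returns true when the text contains at least one stand-alone '\r'
    // and at least one stand-alone '\n' (neither part of a two-character pair).
    public static bool UsesLoneCrAndLoneLf(string text)
    {
        if (string.IsNullOrEmpty(text))
            return false;

        bool loneCr = false;
        bool loneLf = false;

        for (int i = 0; i < text.Length; i++)
        {
            char ch = text[i];
            if (ch != '\r' && ch != '\n')
                continue;

            // The character that would complete a pair with the current one.
            char other = ch == '\r' ? '\n' : '\r';
            if (i + 1 < text.Length && text[i + 1] == other)
            {
                i++; // skip the second half of the pair
                continue;
            }

            if (ch == '\r') loneCr = true;
            else loneLf = true;
        }

        return loneCr && loneLf;
    }
}
```

If this returns true, it is safer to treat "\n\r" as two line breaks; if it returns false, treating the pair as a single break, as the method above does, is a more defensible guess.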