String.Join vs. StringBuilder: which is faster?

Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/585860/

Tags: .net, performance, string, stringbuilder

Asked by Hosam Aly

In a previous question about formatting a double[][] to CSV format, it was suggested that using StringBuilder would be faster than String.Join. Is this true?

Answered by Jon Skeet

Short answer: it depends.

Long answer: if you already have an array of strings to concatenate together (with a delimiter), String.Join is the fastest way of doing it.

String.Join can look through all of the strings to work out the exact length it needs, then go through them again and copy all the data. This means there will be no extra copying involved. The only downside is that it has to go through the strings twice, which means potentially blowing the memory cache more times than necessary.
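The two-pass idea can be sketched roughly as follows. This is only an illustration, not the framework's actual implementation: the class name TwoPassJoinSketch is hypothetical, the values are assumed to be non-null, and a managed sketch like this pays for one more copy in the string constructor at the end.

```csharp
using System;

static class TwoPassJoinSketch
{
    // Hypothetical helper illustrating the two-pass approach; assumes no null entries.
    public static string Join(string separator, string[] values)
    {
        if (values.Length == 0) return string.Empty;

        // Pass 1: work out the exact length of the result.
        int length = separator.Length * (values.Length - 1);
        foreach (string value in values)
        {
            length += value.Length;
        }

        // Pass 2: copy every piece into a buffer of exactly that size,
        // so the buffer never has to grow or be reallocated.
        char[] buffer = new char[length];
        int position = 0;
        for (int i = 0; i < values.Length; i++)
        {
            if (i > 0)
            {
                separator.CopyTo(0, buffer, position, separator.Length);
                position += separator.Length;
            }
            values[i].CopyTo(0, buffer, position, values[i].Length);
            position += values[i].Length;
        }

        // This constructor copies the buffer once more; that extra copy
        // is a cost of writing the sketch in plain managed code.
        return new string(buffer);
    }
}
```

Calling TwoPassJoinSketch.Join(",", new[] { "a", "bb", "ccc" }) should give the same result as String.Join with the same arguments.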

If you don't have the strings as an array beforehand, it's probably faster to use StringBuilder - but there will be situations where it isn't. If using a StringBuilder means doing lots and lots of copies, then building an array and then calling String.Join may well be faster.
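As a rough illustration of the two options when the items arrive one at a time, here is a minimal sketch. The class and method names are hypothetical, and it makes no claim about which option is faster for a given input; that has to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class JoinOrBuilderSketch
{
    // Option 1: append each item straight into a StringBuilder as it arrives.
    public static string WithBuilder(IEnumerable<int> items)
    {
        StringBuilder sb = new StringBuilder();
        bool first = true;
        foreach (int item in items)
        {
            if (!first) sb.Append(',');
            sb.Append(item);
            first = false;
        }
        return sb.ToString();
    }

    // Option 2: build an array of strings first, then make a single String.Join call.
    public static string WithJoin(IEnumerable<int> items)
    {
        List<string> parts = new List<string>();
        foreach (int item in items)
        {
            parts.Add(item.ToString());
        }
        return String.Join(",", parts.ToArray());
    }
}
```

Both methods produce the same text for the same input, so either can be swapped in behind the same signature and timed against real data.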

EDIT: This is in terms of a single call to String.Join vs a bunch of calls to StringBuilder.Append. In the original question, we had two different levels of String.Join calls, so each of the nested calls would have created an intermediate string. In other words, it's even more complex and harder to guess about. I would be surprised to see either way "win" significantly (in complexity terms) with typical data.

EDIT: When I'm at home, I'll write up a benchmark which is as painful as possible for StringBuilder. Basically, if you have an array where each element is about twice the size of the previous one, and you get it just right, you should be able to force a copy for every append (of elements, not of the delimiter, although that needs to be taken into account too). At that point it's nearly as bad as simple string concatenation - but String.Join will have no problems.

Answered by Marc Gravell

Here's my test rig, using int[][] for simplicity; results first:

Join: 9420ms (chk: 210710000
OneBuilder: 9021ms (chk: 210710000

(update for double results:)

Join: 11635ms (chk: 210710000
OneBuilder: 11385ms (chk: 210710000

(update re 2048 * 64 * 150)

Join: 11620ms (chk: 206409600
OneBuilder: 11132ms (chk: 206409600

and with OptimizeForTesting enabled:

Join: 11180ms (chk: 206409600
OneBuilder: 10784ms (chk: 206409600

So StringBuilder is faster, but not massively so; here is the rig (run at the console, in release mode, etc.):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

namespace ConsoleApplication2
{
    class Program
    {
        static void Collect()
        {
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            GC.WaitForPendingFinalizers();
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            GC.WaitForPendingFinalizers();
        }
        static void Main(string[] args)
        {
            const int ROWS = 500, COLS = 20, LOOPS = 2000;
            int[][] data = new int[ROWS][];
            Random rand = new Random(123456);
            for (int row = 0; row < ROWS; row++)
            {
                int[] cells = new int[COLS];
                for (int col = 0; col < COLS; col++)
                {
                    cells[col] = rand.Next();
                }
                data[row] = cells;
            }
            Collect();
            int chksum = 0;
            Stopwatch watch = Stopwatch.StartNew();
            for (int i = 0; i < LOOPS; i++)
            {
                chksum += Join(data).Length;
            }
            watch.Stop();
            Console.WriteLine("Join: {0}ms (chk: {1}", watch.ElapsedMilliseconds, chksum);

            Collect();
            chksum = 0;
            watch = Stopwatch.StartNew();
            for (int i = 0; i < LOOPS; i++)
            {
                chksum += OneBuilder(data).Length;
            }
            watch.Stop();
            Console.WriteLine("OneBuilder: {0}ms (chk: {1}", watch.ElapsedMilliseconds, chksum);

            Console.WriteLine("done");
            Console.ReadLine();
        }
        public static string Join(int[][] array)
        {
            return String.Join(Environment.NewLine,
                    Array.ConvertAll(array,
                      row => String.Join(",",
                        Array.ConvertAll(row, x => x.ToString()))));
        }
        public static string OneBuilder(IEnumerable<int[]> source)
        {
            StringBuilder sb = new StringBuilder();
            bool firstRow = true;
            foreach (var row in source)
            {
                if (firstRow)
                {
                    firstRow = false;
                }
                else
                {
                    sb.AppendLine();
                }
                if (row.Length > 0)
                {
                    sb.Append(row[0]);
                    for (int i = 1; i < row.Length; i++)
                    {
                        sb.Append(',').Append(row[i]);
                    }
                }
            }
            return sb.ToString();
        }
    }
}

Answered by Hosam Aly

I don't think so. Looking at it in Reflector, the implementation of String.Join looks very optimized. It also has the added benefit of knowing the total size of the string to be created in advance, so it doesn't need any reallocation.

I have created two test methods to compare them:

public static string TestStringJoin(double[][] array)
{
    return String.Join(Environment.NewLine,
        Array.ConvertAll(array,
            row => String.Join(",",
                       Array.ConvertAll(row, x => x.ToString()))));
}

public static string TestStringBuilder(double[][] source)
{
    // based on Marc Gravell's code
    // note: unlike TestStringJoin, this version appends no newline between rows

    StringBuilder sb = new StringBuilder();
    foreach (var row in source)
    {
        if (row.Length > 0)
        {
            sb.Append(row[0]);
            for (int i = 1; i < row.Length; i++)
            {
                sb.Append(',').Append(row[i]);
            }
        }
    }
    return sb.ToString();
}

I ran each method 50 times, passing in an array of size [2048][64]. I did this for two arrays; one filled with zeros and another filled with random values. I got the following results on my machine (P4 3.0 GHz, single-core, no HT, running Release mode from CMD):

// with zeros:
TestStringJoin    took 00:00:02.2755280
TestStringBuilder took 00:00:02.3536041

// with random values:
TestStringJoin    took 00:00:05.6412147
TestStringBuilder took 00:00:05.8394650

Increasing the size of the array to [2048][512], while decreasing the number of iterations to 10 got me the following results:

// with zeros:
TestStringJoin    took 00:00:03.7146628
TestStringBuilder took 00:00:03.8886978

// with random values:
TestStringJoin    took 00:00:09.4991765
TestStringBuilder took 00:00:09.3033365

The results are repeatable (almost; with small fluctuations caused by different random values). Apparently String.Join is a little faster most of the time (although by a very small margin).

This is the code I used for testing:

// fragment: place inside a class; needs using System, System.Diagnostics and System.Threading
const int Iterations = 50;
const int Rows = 2048;
const int Cols = 64; // 512

static void Main()
{
    OptimizeForTesting(); // set process priority to RealTime

    // test 1: zeros
    double[][] array = new double[Rows][];
    for (int i = 0; i < array.Length; ++i)
        array[i] = new double[Cols];

    CompareMethods(array);

    // test 2: random values
    Random random = new Random();
    double[] template = new double[Cols];
    for (int i = 0; i < template.Length; ++i)
        template[i] = random.NextDouble();

    for (int i = 0; i < array.Length; ++i)
        array[i] = template;

    CompareMethods(array);
}

static void CompareMethods(double[][] array)
{
    Stopwatch stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < Iterations; ++i)
        TestStringJoin(array);
    stopwatch.Stop();
    Console.WriteLine("TestStringJoin    took " + stopwatch.Elapsed);

    stopwatch.Reset(); stopwatch.Start();
    for (int i = 0; i < Iterations; ++i)
        TestStringBuilder(array);
    stopwatch.Stop();
    Console.WriteLine("TestStringBuilder took " + stopwatch.Elapsed);

}

static void OptimizeForTesting()
{
    Thread.CurrentThread.Priority = ThreadPriority.Highest;
    Process currentProcess = Process.GetCurrentProcess();
    currentProcess.PriorityClass = ProcessPriorityClass.RealTime;
    if (Environment.ProcessorCount > 1) {
        // use last core only
        currentProcess.ProcessorAffinity
            = new IntPtr(1 << (Environment.ProcessorCount - 1));
    }
}

Answered by tvanfosson

Unless the 1% difference turns into something significant in terms of the time the entire program takes to run, this looks like micro-optimization. I'd write the code that's the most readable/understandable and not worry about the 1% performance difference.

Answered by Adam Neal

Atwood had a post kind of related to this about a month ago:

http://www.codinghorror.com/blog/archives/001218.html

Answered by jalf

Yes. If you do more than a couple of joins, it will be a lot faster.

When you do a string.join, the runtime has to:

  1. Allocate memory for the resulting string.
  2. Copy the contents of the first string to the beginning of the output string.
  3. Copy the contents of the second string to the end of the output string.

If you do two joins, it has to copy the data twice, and so on.

StringBuilder allocates one buffer with space to spare, so data can be appended without having to copy the original string. As there is space left over in the buffer, the appended string can be written into the buffer directly. Then it just has to copy the entire string once, at the end.
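
A minimal sketch of the behavior described above, assuming the pieces are combined one at a time. The pairwise version joins two strings per step, so the characters accumulated so far are copied again at every step; the buffered version appends into a single StringBuilder. The class and method names are hypothetical, and pre-sizing the builder from the total length is an assumption made for the illustration.

```csharp
using System;
using System.Text;

static class AppendSketch
{
    // Joins two strings at a time: every step allocates a new string and
    // copies everything accumulated so far, plus the new part.
    public static string Pairwise(string[] parts)
    {
        string result = string.Empty;
        foreach (string part in parts)
        {
            result = String.Join(string.Empty, new[] { result, part });
        }
        return result;
    }

    // Appends into one builder sized up front, then creates the final string once.
    public static string Buffered(string[] parts)
    {
        int capacity = 0;
        foreach (string part in parts)
        {
            capacity += part.Length;
        }

        StringBuilder sb = new StringBuilder(capacity);
        foreach (string part in parts)
        {
            sb.Append(part);
        }
        return sb.ToString();
    }
}
```

Both methods return the same string; the difference is only in how many intermediate strings are allocated along the way.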
