C++ 字符串中最优化的连接方式

Question

提问by Bhupesh Pant

We always came across many situation on daily basis wherein we have to do tedious and very many string operations in our code. We all know that string manipulations are expensive operations. I would like to know which is the least expensive among the available versions.

我们每天都会遇到许多情况，其中我们必须在代码中进行繁琐且非常多的字符串操作。我们都知道字符串操作是昂贵的操作。我想知道可用版本中哪个最便宜。

The most common operations is concatenation(This is something that we can control to some extent). What is the best way to concatenate std::strings in C++ and various workarounds to speed up concatenation?

最常见的操作是串联（这是我们可以在一定程度上控制的）。在 C++ 中连接 std::strings 的最佳方法是什么，以及加速连接的各种解决方法？

I mean,

我的意思是，

std::string l_czTempStr;

1).l_czTempStr = "Test data1" + "Test data2" + "Test data3";

2). l_czTempStr =  "Test data1"; 
    l_czTempStr += "Test data2";
    l_czTempStr += "Test data3";

3). using << operator

4). using append()

Also, do we get any advantage of using CString over std::string?

另外，与 std::string 相比，使用 CString 有什么优势吗？

Answer 1

回答by Jesse Good

Here is a small test suite:

这是一个小型测试套件：

#include <iostream>
#include <string>
#include <chrono>
#include <sstream>

int main ()
{
    typedef std::chrono::high_resolution_clock clock;
    typedef std::chrono::duration<float, std::milli> mil;
    std::string l_czTempStr;
    std::string s1="Test data1";
    auto t0 = clock::now();
    #if VER==1
    for (int i = 0; i < 100000; ++i)
    {
        l_czTempStr = s1 + "Test data2" + "Test data3";
    }
    #elif VER==2
    for (int i = 0; i < 100000; ++i)
    {
        l_czTempStr =  "Test data1"; 
        l_czTempStr += "Test data2";
        l_czTempStr += "Test data3";
    }
    #elif VER==3
    for (int i = 0; i < 100000; ++i)
    {
        l_czTempStr =  "Test data1"; 
        l_czTempStr.append("Test data2");
        l_czTempStr.append("Test data3");
    }
    #elif VER==4
    for (int i = 0; i < 100000; ++i)
    {
        std::ostringstream oss;
        oss << "Test data1";
        oss << "Test data2";
        oss << "Test data3";
        l_czTempStr = oss.str();
    }
    #endif
    auto t1 = clock::now();
    std::cout << l_czTempStr << '\n';
    std::cout << mil(t1-t0).count() << "ms\n";
}

On coliru:

关于大肠杆菌：

Compile with the following:

编译如下：

clang++ -std=c++11 -O3 -DVER=1 -Wall -pedantic -pthread main.cpp

21.6463ms

-DVER=2

6.61773ms

-DVER=3

6.7855ms

-DVER=4

102.015ms

It looks like 2), +=is the winner.

看起来2)，+=是赢家。

(Also compiling with and without -pthreadseems to affect the timings)

（同时编译和不编译-pthread似乎会影响时间）

Answer 2

回答by syam

In addition to other answers...

除了其他答案...

I made extensive benchmarks about this problem some time ago, and came to the conclusion that the most efficient solution (GCC 4.7 & 4.8 on Linux x86 / x64 / ARM) in alluse cases is first to reserve()the result string with enough space to hold all the concatenated strings, and then only append()them (or use operator +=(), that makes no difference).

前段时间我对这个问题做了广泛的基准测试，得出的结论是，所有用例中最有效的解决方案（Linux x86 / x64 / ARM 上的 GCC 4.7 & 4.8）首先reserve()是结果字符串有足够的空间容纳所有连接的字符串，然后只有append()它们（或使用operator +=()，没有区别）。

Unfortunately it seems I deleted that benchmark so you only have my word (but you can easily adapt Mats Petersson's benchmark to verify this by yourself, if my word isn't enough).

不幸的是，我似乎删除了该基准测试，因此您只有我的话（但如果我的话还不够，您可以轻松地调整 Mats Petersson 的基准测试来自己验证这一点）。

In a nutshell:

简而言之：

const string space = " ";
string result;
result.reserve(5 + space.size() + 5);
result += "hello";
result += space;
result += "world";

Depending on the exact use case (number, types and sizes of the concatenated strings), sometimes this method is by far the most efficient, and other times it is on par with other methods, but it is never worse.

根据具体的用例（连接字符串的数量、类型和大小），有时这种方法是迄今为止最有效的，有时它与其他方法相当，但永远不会更糟。

Problem is, this is really painful to compute the total required size in advance, especially when mixing string literals and std::string(the example above is clear enough on that matter, I believe). The maintainability of such code is absolutely horrible as soon as you modify one of the literals or add another string to be concatenated.

问题是，提前计算所需的总大小真的很痛苦，特别是在混合字符串文字和std::string（我相信上面的例子在这方面已经足够清楚了）。一旦您修改其中一个文字或添加另一个要连接的字符串，此类代码的可维护性就绝对可怕。

One approach would be to use sizeofto compute the size of the literals, but IMHO it creates as much mess than it solves, the maintainability is still terrible:

一种方法是使用sizeof来计算文字的大小，但恕我直言，它造成的混乱比它解决的要多，可维护性仍然很糟糕：

#define STR_HELLO "hello"
#define STR_WORLD "world"

const string space = " ";
string result;
result.reserve(sizeof(STR_HELLO)-1 + space.size() + sizeof(STR_WORLD)-1);
result += STR_HELLO;
result += space;
result += STR_WORLD;

A usable solution (C++11, variadic templates)

一个可用的解决方案（C++11，可变参数模板）

I finally settled for a set of variadic templates that efficiently take care of calculating the string sizes (eg. the size of string literals is determined at compile time), reserve()as needed, and then concatenate everything.

我最终确定了一组可变参数模板，这些模板可以根据需要有效地计算字符串大小（例如，字符串文字的大小在编译时确定），reserve()然后连接所有内容。

Here it is, hope this is useful:

在这里，希望这是有用的：

namespace detail {

  template<typename>
  struct string_size_impl;

  template<size_t N>
  struct string_size_impl<const char[N]> {
    static constexpr size_t size(const char (&) [N]) { return N - 1; }
  };

  template<size_t N>
  struct string_size_impl<char[N]> {
    static size_t size(char (&s) [N]) { return N ? strlen(s) : 0; }
  };

  template<>
  struct string_size_impl<const char*> {
    static size_t size(const char* s) { return s ? strlen(s) : 0; }
  };

  template<>
  struct string_size_impl<char*> {
    static size_t size(char* s) { return s ? strlen(s) : 0; }
  };

  template<>
  struct string_size_impl<std::string> {
    static size_t size(const std::string& s) { return s.size(); }
  };

  template<typename String> size_t string_size(String&& s) {
    using noref_t = typename std::remove_reference<String>::type;
    using string_t = typename std::conditional<std::is_array<noref_t>::value,
                                              noref_t,
                                              typename std::remove_cv<noref_t>::type
                                              >::type;
    return string_size_impl<string_t>::size(s);
  }

  template<typename...>
  struct concatenate_impl;

  template<typename String>
  struct concatenate_impl<String> {
    static size_t size(String&& s) { return string_size(s); }
    static void concatenate(std::string& result, String&& s) { result += s; }
  };

  template<typename String, typename... Rest>
  struct concatenate_impl<String, Rest...> {
    static size_t size(String&& s, Rest&&... rest) {
      return string_size(s)
           + concatenate_impl<Rest...>::size(std::forward<Rest>(rest)...);
    }
    static void concatenate(std::string& result, String&& s, Rest&&... rest) {
      result += s;
      concatenate_impl<Rest...>::concatenate(result, std::forward<Rest>(rest)...);
    }
  };

} // namespace detail

template<typename... Strings>
std::string concatenate(Strings&&... strings) {
  std::string result;
  result.reserve(detail::concatenate_impl<Strings...>::size(std::forward<Strings>(strings)...));
  detail::concatenate_impl<Strings...>::concatenate(result, std::forward<Strings>(strings)...);
  return result;
}

The only interesting part, as far as the public interface is concerned, is the very last template<typename... Strings> std::string concatenate(Strings&&... strings)template. Usage is straightforward:

就公共接口而言，唯一有趣的部分是最后一个template<typename... Strings> std::string concatenate(Strings&&... strings)模板。用法很简单：

int main() {
  const string space = " ";
  std::string result = concatenate("hello", space, "world");
  std::cout << result << std::endl;
}

With optimizations turned on, any decent compiler should be able to expand the concatenatecall to the same code as my first example where I manually wrote everything. As far as GCC 4.7 & 4.8 are concerned, the generated code is pretty much identical as well as the performance.

启用优化后，任何体面的编译器都应该能够将concatenate调用扩展到与我手动编写所有内容的第一个示例相同的代码。就 GCC 4.7 和 4.8 而言，生成的代码和性能几乎相同。

Answer 3

回答by Mats Petersson

The WORST possible scenario is using plain old strcat(or sprintf), since strcattakes a C string, and that has to be "counted" to find the end. For long strings, that's a real performance sufferer. C++ style strings are much better, and the performance problems are likely to be with the memory allocation, rather than counting lengths. But then again, the string grows geometrically (doubles each time it needs to grow), so it's not that terrible.

最糟糕的可能情况是使用普通的旧strcat（或sprintf），因为strcat需要一个 C 字符串，并且必须“计数”才能找到结尾。对于长字符串，这是一个真正的性能问题。C++ 风格的字符串要好得多，性能问题很可能与内存分配有关，而不是计算长度。但是话又说回来，字符串以几何方式增长（每次需要增长时都会增加一倍），所以它并没有那么可怕。

I'd very much suspect that all of the above methods end up with the same, or at least very similar, performance. If anything, I'd expect that stringstreamis slower, because of the overhead in supporting formatting - but I also suspect it's marginal.

我非常怀疑上述所有方法最终都具有相同或至少非常相似的性能。如果有的话，我希望它stringstream会更慢，因为支持格式化的开销 - 但我也怀疑它是微不足道的。

As this sort of thing is "fun", I will get back with a benchmark...

由于这种事情很“有趣”，我会用一个基准回来......

Edit:

编辑：

Note that these result apply to MY machine, running x86-64 Linux, compiled with g++ 4.6.3. Other OS's, compilers and C++ runtime library implementations may vary. If performance is important to your application, then benchmark on the system(s) that are critical for you, using the compiler(s) that you use.

请注意，这些结果适用于我的机器，运行 x86-64 Linux，使用 g++ 4.6.3 编译。其他操作系统、编译器和 C++ 运行时库实现可能会有所不同。如果性能对您的应用程序很重要，则使用您使用的编译器在对您至关重要的系统上进行基准测试。

Here's the code I wrote to test this. It may not be the perfect representation of a real scenario, but I think it's a representative scenario:

这是我编写的用于测试的代码。它可能不是真实场景的完美代表，但我认为这是一个具有代表性的场景：

#include <iostream>
#include <iomanip>
#include <string>
#include <sstream>
#include <cstring>

using namespace std;

static __inline__ unsigned long long rdtsc(void)
{
    unsigned hi, lo;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
}

string build_string_1(const string &a, const string &b, const string &c)
{
    string out = a + b + c;
    return out;
}

string build_string_1a(const string &a, const string &b, const string &c)
{
    string out;
    out.resize(a.length()*3);
    out = a + b + c;
    return out;
}

string build_string_2(const string &a, const string &b, const string &c)
{
    string out = a;
    out += b;
    out += c;
    return out;
}

string build_string_3(const string &a, const string &b, const string &c)
{
    string out;
    out = a;
    out.append(b);
    out.append(c);
    return out;
}


string build_string_4(const string &a, const string &b, const string &c)
{
    stringstream ss;

    ss << a << b << c;
    return ss.str();
}


char *build_string_5(const char *a, const char *b, const char *c)
{
    char* out = new char[strlen(a) * 3+1];
    strcpy(out, a);
    strcat(out, b);
    strcat(out, c);
    return out;
}



template<typename T>
size_t len(T s)
{
    return s.length();
}

template<>
size_t len(char *s)
{
    return strlen(s);
}

template<>
size_t len(const char *s)
{
    return strlen(s);
}



void result(const char *name, unsigned long long t, const string& out)
{
    cout << left << setw(22) << name << " time:" << right << setw(10) <<  t;
    cout << "   (per character: " 
         << fixed << right << setw(8) << setprecision(2) << (double)t / len(out) << ")" << endl;
}

template<typename T>
void benchmark(const char name[], T (Func)(const T& a, const T& b, const T& c), const char *strings[])
{
    unsigned long long t;

    const T s1 = strings[0];
    const T s2 = strings[1];
    const T s3 = strings[2];
    t = rdtsc();
    T out = Func(s1, s2, s3);
    t = rdtsc() - t; 

    if (len(out) != len(s1) + len(s2) + len(s3))
    {
        cout << "Error: out is different length from inputs" << endl;
        cout << "Got `" << out << "` from `" << s1 << "` + `" << s2 << "` + `" << s3 << "`";
    }
    result(name, t, out);
}


void benchmark(const char name[], char* (Func)(const char* a, const char* b, const char* c), 
               const char *strings[])
{
    unsigned long long t;

    const char* s1 = strings[0];
    const char* s2 = strings[1];
    const char* s3 = strings[2];
    t = rdtsc();
    char *out = Func(s1, s2, s3);
    t = rdtsc() - t; 

    if (len(out) != len(s1) + len(s2) + len(s3))
    {
        cout << "Error: out is different length from inputs" << endl;
        cout << "Got `" << out << "` from `" << s1 << "` + `" << s2 << "` + `" << s3 << "`";
    }
    result(name, t, out);
    delete [] out;
}


#define BM(func, size) benchmark(#func " " #size, func, strings ## _ ## size)


#define BM_LOT(size) BM(build_string_1, size); \
    BM(build_string_1a, size); \
    BM(build_string_2, size); \
    BM(build_string_3, size); \
    BM(build_string_4, size); \
    BM(build_string_5, size);

int main()
{
    const char *strings_small[]  = { "Abc", "Def", "Ghi" };
    const char *strings_medium[] = { "abcdefghijklmnopqrstuvwxyz", 
                                     "defghijklmnopqrstuvwxyzabc", 
                                     "ghijklmnopqrstuvwxyzabcdef" };
    const char *strings_large[]   = 
        { "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
          "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
          "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
          "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
          "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
          "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
          "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
          "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
          "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
          "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz", 

          "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" 
          "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" 
          "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" 
          "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" 
          "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc"

          "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" 
          "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" 
          "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" 
          "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc" 
          "defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc", 

          "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
          "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
          "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
          "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
          "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
          "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
          "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
          "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
          "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
          "ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
        };

    for(int i = 0; i < 5; i++)
    {
        BM_LOT(small);
        BM_LOT(medium);
        BM_LOT(large);
        cout << "---------------------------------------------" << endl;
    }
}

Here are some representative results:

以下是一些有代表性的结果：

build_string_1 small   time:      4075   (per character:   452.78)
build_string_1a small  time:      5384   (per character:   598.22)
build_string_2 small   time:      2669   (per character:   296.56)
build_string_3 small   time:      2427   (per character:   269.67)
build_string_4 small   time:     19380   (per character:  2153.33)
build_string_5 small   time:      6299   (per character:   699.89)
build_string_1 medium  time:      3983   (per character:    51.06)
build_string_1a medium time:      6970   (per character:    89.36)
build_string_2 medium  time:      4072   (per character:    52.21)
build_string_3 medium  time:      4000   (per character:    51.28)
build_string_4 medium  time:     19614   (per character:   251.46)
build_string_5 medium  time:      6304   (per character:    80.82)
build_string_1 large   time:      8491   (per character:     3.63)
build_string_1a large  time:      9563   (per character:     4.09)
build_string_2 large   time:      6154   (per character:     2.63)
build_string_3 large   time:      5992   (per character:     2.56)
build_string_4 large   time:     32450   (per character:    13.87)
build_string_5 large   time:     15768   (per character:     6.74)

Same code, run as 32-bit:

相同的代码，以 32 位运行：

build_string_1 small   time:      4289   (per character:   476.56)
build_string_1a small  time:      5967   (per character:   663.00)
build_string_2 small   time:      3329   (per character:   369.89)
build_string_3 small   time:      3047   (per character:   338.56)
build_string_4 small   time:     22018   (per character:  2446.44)
build_string_5 small   time:      3026   (per character:   336.22)
build_string_1 medium  time:      4089   (per character:    52.42)
build_string_1a medium time:      8075   (per character:   103.53)
build_string_2 medium  time:      4569   (per character:    58.58)
build_string_3 medium  time:      4326   (per character:    55.46)
build_string_4 medium  time:     22751   (per character:   291.68)
build_string_5 medium  time:      2252   (per character:    28.87)
build_string_1 large   time:      8695   (per character:     3.72)
build_string_1a large  time:     12818   (per character:     5.48)
build_string_2 large   time:      8202   (per character:     3.51)
build_string_3 large   time:      8351   (per character:     3.57)
build_string_4 large   time:     38250   (per character:    16.35)
build_string_5 large   time:      8143   (per character:     3.48)

From this, we can conclude:

由此，我们可以得出结论：

The best option is appending a bit at a time (out.append()or out +=), with the "chained" approach reasonably close.
Pre-allocating the string is not helpful.
Using stringstreamis pretty poor idea (between 2-4x slower).
The char *uses new char[]. Using a local variable in the calling function makes it the fastest - but slightly unfairly to compare that.
There is a fair bit of overhead in combining short string - just copying data should be at most one cycle per byte [unless the data doesn't fit in the cache].

最好的选择是一次添加一点（out.append()或out +=），“链式”方法相当接近。
预先分配字符串没有帮助。
使用stringstream是非常糟糕的主意（慢 2-4 倍之间）。
的char *用途new char[]。在调用函数中使用局部变量使其速度最快 - 但比较它有点不公平。
组合短字符串有相当多的开销 - 只是复制数据每个字节最多应该是一个周期[除非数据不适合缓存]。

edit2

编辑2

Added, as per comments:

添加，根据评论：

string build_string_1b(const string &a, const string &b, const string &c)
{
    return a + b + c;
}

and

和

string build_string_2a(const string &a, const string &b, const string &c)
{
    string out;
    out.reserve(a.length() * 3);
    out += a;
    out += b;
    out += c;
    return out;
}

Which gives these results:

这给出了这些结果：

build_string_1 small   time:      3845   (per character:   427.22)
build_string_1b small  time:      3165   (per character:   351.67)
build_string_2 small   time:      3176   (per character:   352.89)
build_string_2a small  time:      1904   (per character:   211.56)

build_string_1 large   time:      9056   (per character:     3.87)
build_string_1b large  time:      6414   (per character:     2.74)
build_string_2 large   time:      6417   (per character:     2.74)
build_string_2a large  time:      4179   (per character:     1.79)

(A 32-bit run, but the 64-bit shows very similar results on these).

（32 位运行，但 64 位在这些上显示非常相似的结果）。

Answer 4

回答by Mike Seymour

As with most micro-optimisations, you will need to measure the effect of each option, having first established through measurement that this is indeed a bottle-neck worth optimising. There is no definitive answer.

与大多数微观优化一样，您需要测量每个选项的效果，首先通过测量确定这确实是一个值得优化的瓶颈。没有确定的答案。

appendand +=should do exactly the same thing.

append并且+=应该做完全相同的事情。

+is conceptually less efficient, since you're creating and destroying temporaries. Your compiler may or may not be able to optimise this to be as fast as appending.

+从概念上讲效率较低，因为您正在创建和销毁临时文件。您的编译器可能会也可能不会将其优化为与追加一样快。

Calling reservewith the total size may reduce the number of memory allocations needed - they will probably be the biggest bottleneck.

reserve使用总大小调用可能会减少所需的内存分配数量 - 它们可能是最大的瓶颈。

<<(presumably on a stringstream) may or may not be faster; you'll need to measure that. It's useful if you need to format non-string types, but probably won't be particularly better or worse at dealing with strings.

<<（大概在 a 上stringstream）可能会或可能不会更快；你需要测量它。如果您需要格式化非字符串类型，它很有用，但在处理字符串方面可能不会特别好或更差。

CStringhas the disadvantage that it's not portable, and that a Unix hacker like me can't tell you what its advantages may or may not be.

CString它的缺点是不可移植，而且像我这样的 Unix 黑客无法告诉您它的优点可能是什么，可能不是。

Answer 5

回答by DrHell

I decided to run a test with the code provided by user Jesse Good, slightly modified to take into account the observation of Rapptz, specifically the fact that the ostringstream was constructed in each single iteration of the loop. Therefore I added some cases, a couple of them being the ostringstream cleared with the sequence "oss.str(""); oss.clear()"

我决定使用用户Jesse Good提供的代码运行一个测试，稍微修改以考虑到Rapptz的观察，特别是在循环的每次迭代中构造 ostringstream 的事实。因此，我添加了一些案例，其中一些是使用序列“ oss.str(""); oss.clear()”清除的 ostringstream

Here is the code

这是代码

#include <iostream>
#include <string>
#include <chrono>
#include <sstream>
#include <functional>


template <typename F> void time_measurement(F f, const std::string& comment)
{
    typedef std::chrono::high_resolution_clock clock;
    typedef std::chrono::duration<float, std::milli> mil;
    std::string r;
    auto t0 = clock::now();
    f(r);
    auto t1 = clock::now();
    std::cout << "\n-------------------------" << comment << "-------------------\n" <<r << '\n';
    std::cout << mil(t1-t0).count() << "ms\n";
    std::cout << "---------------------------------------------------------------------------\n";

}

inline void clear(std::ostringstream& x)
{
    x.str("");
    x.clear();
}

void test()
{
    std:: cout << std::endl << "----------------String Comparison---------------- " << std::endl;
    const int n=100000;
    {
        auto f=[](std::string& l_czTempStr)
        {
            std::string s1="Test data1";
            for (int i = 0; i < n; ++i)
            {
                l_czTempStr = s1 + "Test data2" + "Test data3";
            }
        };
        time_measurement(f, "string, plain addition");
   }

   {
        auto f=[](std::string& l_czTempStr)
        {
            for (int i = 0; i < n; ++i)
            {
                l_czTempStr =  "Test data1";
                l_czTempStr += "Test data2";
                l_czTempStr += "Test data3";
            }
        };
        time_measurement(f, "string, incremental");
    }

    {
         auto f=[](std::string& l_czTempStr)
         {
            for (int i = 0; i < n; ++i)
            {
                l_czTempStr =  "Test data1";
                l_czTempStr.append("Test data2");
                l_czTempStr.append("Test data3");
            }
         };
         time_measurement(f, "string, append");
     }

    {
         auto f=[](std::string& l_czTempStr)
         {
            for (int i = 0; i < n; ++i)
            {
                std::ostringstream oss;
                oss << "Test data1";
                oss << "Test data2";
                oss << "Test data3";
                l_czTempStr = oss.str();
            }
         };
         time_measurement(f, "oss, creation in each loop, incremental");
     }

    {
         auto f=[](std::string& l_czTempStr)
         {
            std::ostringstream oss;
            for (int i = 0; i < n; ++i)
            {
                oss.str("");
                oss.clear();
                oss << "Test data1";
                oss << "Test data2";
                oss << "Test data3";
            }
            l_czTempStr = oss.str();
         };
         time_measurement(f, "oss, 1 creation, incremental");
     }

    {
         auto f=[](std::string& l_czTempStr)
         {
            std::ostringstream oss;
            for (int i = 0; i < n; ++i)
            {
                oss.str("");
                oss.clear();
                oss << "Test data1" << "Test data2" << "Test data3";
            }
            l_czTempStr = oss.str();
         };
         time_measurement(f, "oss, 1 creation, plain addition");
     }

    {
         auto f=[](std::string& l_czTempStr)
         {
            std::ostringstream oss;
            for (int i = 0; i < n; ++i)
            {
                clear(oss);
                oss << "Test data1" << "Test data2" << "Test data3";
            }
            l_czTempStr = oss.str();
         };
         time_measurement(f, "oss, 1 creation, clearing calling inline function, plain addition");
     }


    {
         auto f=[](std::string& l_czTempStr)
         {
            for (int i = 0; i < n; ++i)
            {
                std::string x;
                x =  "Test data1";
                x.append("Test data2");
                x.append("Test data3");
                l_czTempStr=x;
            }
         };
         time_measurement(f, "string, creation in each loop");
     }

}

Here are the results:

结果如下：

/*

g++ "qtcreator debug mode"
----------------String Comparison---------------- 

-------------------------string, plain addition-------------------
Test data1Test data2Test data3
11.8496ms
---------------------------------------------------------------------------

-------------------------string, incremental-------------------
Test data1Test data2Test data3
3.55597ms
---------------------------------------------------------------------------

-------------------------string, append-------------------
Test data1Test data2Test data3
3.53099ms
---------------------------------------------------------------------------

-------------------------oss, creation in each loop, incremental-------------------
Test data1Test data2Test data3
58.1577ms
---------------------------------------------------------------------------

-------------------------oss, 1 creation, incremental-------------------
Test data1Test data2Test data3
11.1069ms
---------------------------------------------------------------------------

-------------------------oss, 1 creation, plain addition-------------------
Test data1Test data2Test data3
10.9946ms
---------------------------------------------------------------------------

-------------------------oss, 1 creation, clearing calling inline function, plain addition-------------------
Test data1Test data2Test data3
10.9502ms
---------------------------------------------------------------------------

-------------------------string, creation in each loop-------------------
Test data1Test data2Test data3
9.97495ms
---------------------------------------------------------------------------


g++ "qtcreator release mode" (optimized)
----------------String Comparison----------------

-------------------------string, plain addition-------------------
Test data1Test data2Test data3
8.41622ms
---------------------------------------------------------------------------

-------------------------string, incremental-------------------
Test data1Test data2Test data3
2.55462ms
---------------------------------------------------------------------------

-------------------------string, append-------------------
Test data1Test data2Test data3
2.5154ms
---------------------------------------------------------------------------

-------------------------oss, creation in each loop, incremental-------------------
Test data1Test data2Test data3
54.3232ms
---------------------------------------------------------------------------

-------------------------oss, 1 creation, incremental-------------------
Test data1Test data2Test data3
8.71854ms
---------------------------------------------------------------------------

-------------------------oss, 1 creation, plain addition-------------------
Test data1Test data2Test data3
8.80526ms
---------------------------------------------------------------------------

-------------------------oss, 1 creation, clearing calling inline function, plain addition-------------------
Test data1Test data2Test data3
8.78186ms
---------------------------------------------------------------------------

-------------------------string, creation in each loop-------------------
Test data1Test data2Test data3
8.4034ms
---------------------------------------------------------------------------
*/

Now using std::string is still faster, and the append is still the fastest way of concatenation, but ostringstream is no more so incredibly terrible like it was before.

现在使用 std::string 仍然更快，而且 append 仍然是最快的连接方式，但是 ostringstream 不再像以前那样可怕了。

Answer 6

回答by yatsek

So as this question's accepted answer is quite old I've decided to update it's benchmarks with modern compiler and compare both solutions by @jesse-good and template version from @syam

因此，由于这个问题已被接受的答案已经很旧了，我决定用现代编译器更新它的基准，并比较@jesse-good 的两个解决方案和@syam 的模板版本

Here is the combined code:

这是组合代码：

#include <iostream>
#include <string>
#include <chrono>
#include <sstream>
#include <vector>
#include <cstring>


#if VER==TEMPLATE
namespace detail {

  template<typename>
  struct string_size_impl;

  template<size_t N>
  struct string_size_impl<const char[N]> {
    static constexpr size_t size(const char (&) [N]) { return N - 1; }
  };

  template<size_t N>
  struct string_size_impl<char[N]> {
    static size_t size(char (&s) [N]) { return N ? strlen(s) : 0; }
  };

  template<>
  struct string_size_impl<const char*> {
    static size_t size(const char* s) { return s ? strlen(s) : 0; }
  };

  template<>
  struct string_size_impl<char*> {
    static size_t size(char* s) { return s ? strlen(s) : 0; }
  };

  template<>
  struct string_size_impl<std::string> {
    static size_t size(const std::string& s) { return s.size(); }
  };

  template<typename String> size_t string_size(String&& s) {
    using noref_t = typename std::remove_reference<String>::type;
    using string_t = typename std::conditional<std::is_array<noref_t>::value,
                                              noref_t,
                                              typename std::remove_cv<noref_t>::type
                                              >::type;
    return string_size_impl<string_t>::size(s);
  }

  template<typename...>
  struct concatenate_impl;

  template<typename String>
  struct concatenate_impl<String> {
    static size_t size(String&& s) { return string_size(s); }
    static void concatenate(std::string& result, String&& s) { result += s; }
  };

  template<typename String, typename... Rest>
  struct concatenate_impl<String, Rest...> {
    static size_t size(String&& s, Rest&&... rest) {
      return string_size(s)
           + concatenate_impl<Rest...>::size(std::forward<Rest>(rest)...);
    }
    static void concatenate(std::string& result, String&& s, Rest&&... rest) {
      result += s;
      concatenate_impl<Rest...>::concatenate(result, std::forward<Rest>(rest)...);
    }
  };

} // namespace detail

template<typename... Strings>
std::string concatenate(Strings&&... strings) {
  std::string result;
  result.reserve(detail::concatenate_impl<Strings...>::size(std::forward<Strings>(strings)...));
  detail::concatenate_impl<Strings...>::concatenate(result, std::forward<Strings>(strings)...);
  return result;
}

#endif

int main ()
{
typedef std::chrono::high_resolution_clock clock;
typedef std::chrono::duration<float, std::milli> ms;
std::string l_czTempStr;


std::string s1="Test data1";


auto t0 = clock::now();
#if VER==PLUS
for (int i = 0; i < 100000; ++i)
{
    l_czTempStr = s1 + "Test data2" + "Test data3";
}
#elif VER==PLUS_EQ
for (int i = 0; i < 100000; ++i)
{
    l_czTempStr =  "Test data1"; 
    l_czTempStr += "Test data2";
    l_czTempStr += "Test data3";
}
#elif VER==APPEND
for (int i = 0; i < 100000; ++i)
{
    l_czTempStr =  "Test data1"; 
    l_czTempStr.append("Test data2");
    l_czTempStr.append("Test data3");
}
#elif VER==STRSTREAM
for (int i = 0; i < 100000; ++i)
{
    std::ostringstream oss;
    oss << "Test data1";
    oss << "Test data2";
    oss << "Test data3";
    l_czTempStr = oss.str();
}
#elif VER=TEMPLATE
for (int i = 0; i < 100000; ++i)
{
    l_czTempStr = concatenate(s1, "Test data2", "Test data3");
}
#endif

#define STR_(x) #x
#define STR(x) STR_(x)

auto t1 = clock::now();
//std::cout << l_czTempStr << '\n';
std::cout << STR(VER) ": " << ms(t1-t0).count() << "ms\n";
}

The test instruction:

测试说明：

for ARGTYPE in PLUS PLUS_EQ APPEND STRSTREAM TEMPLATE; do for i in `seq 4` ; do clang++ -std=c++11 -O3 -DVER=$ARGTYPE -Wall -pthread -pedantic main.cpp && ./a.out ; rm ./a.out ; done; done

And results (processed through spreadsheet to show average time):

和结果（通过电子表格处理以显示平均时间）：

PLUS       23.5792   
PLUS       23.3812   
PLUS       35.1806   
PLUS       15.9394   24.5201
PLUS_EQ    15.737    
PLUS_EQ    15.3353   
PLUS_EQ    10.7764   
PLUS_EQ    25.245    16.773425
APPEND     22.954    
APPEND     16.9031   
APPEND     10.336    
APPEND     19.1348   17.331975
STRSTREAM  10.2063   
STRSTREAM  10.7765   
STRSTREAM  13.262    
STRSTREAM  22.3557   14.150125
TEMPLATE   16.6531   
TEMPLATE   16.629    
TEMPLATE   22.1885   
TEMPLATE   16.9288   18.09985

The surprise is strstreamwhich seems to have a lot of benefit from C++11 and later improvements. Probably removal of necessary allocation due to introduction of move semantics has some influence.

令人惊讶的是strstream，它似乎从 C++11 和更高版本的改进中获益良多。可能由于引入移动语义而删除必要的分配有一些影响。

You can test it on your own on coliru

您可以在coliru上自行测试

Edit: I've updated test on coliru to use g++-4.8: http://coliru.stacked-crooked.com/a/593dcfe54e70e409. Results in graph here:

编辑：我已经更新了对 coliru 的测试以使用 g++-4.8：http://coliru.stacked-crooked.com/a/593dcfe54e70e409 。结果在这里：

(explanation - "stat. average" means average over all values except two extreme ones - one minimal and one maximal value)

（解释 - “stat.average”是指除两个极端值之外的所有值的平均值 - 一个最小值和一个最大值）

Answer 7

回答by parasrish

There are some significant parameters, which has potential impact on deciding the "most optimized way". Some of these are - string/content size, number of operations, compiler optimization, etc.

有一些重要参数对决定“最优化方式”有潜在影响。其中一些是 - 字符串/内容大小、操作数量、编译器优化等。

In most of the cases, string::operator+=seems to be working best. However at times, on some compilers, it is also observed that ostringstream::operator<<works best [like - MingW g++ 3.2.3, 1.8 GHz single processor Dell PC]. When compiler context comes, then it is majorly the optimizations at compiler which would impact. Also to mention, that stringstreamsare complex objects as compared to simple strings, and therefore adds to the overhead.

在大多数情况下，string::operator+=似乎效果最好。然而，有时，在某些编译器上，也观察到ostringstream::operator<<效果最佳 [例如 - MingW g++ 3.2.3, 1.8 GHz 单处理器戴尔 PC]。当编译器上下文出现时，主要是编译器的优化会产生影响。还要提到的stringstreams是，与简单字符串相比，它们是复杂的对象，因此会增加开销。

For more info - discussion, article.

欲了解更多信息 -讨论，文章。

C++ 字符串中最优化的连接方式

提问by Bhupesh Pant

回答by Jesse Good

回答by syam

A usable solution (C++11, variadic templates)

一个可用的解决方案（C++11，可变参数模板）

回答by Mats Petersson

回答by Mike Seymour

回答by DrHell

回答by yatsek

回答by parasrish

相关推荐

最近更新

标签

C++ 字符串中最优化的连接方式

提问by Bhupesh Pant

回答by Jesse Good

回答by syam

A usable solution (C++11, variadic templates)

一个可用的解决方案（C++11，可变参数模板）

回答by Mats Petersson

回答by Mike Seymour

回答by DrHell

回答by yatsek

回答by parasrish

相关推荐

C++ 在 switch 语句中使用 continue

c++ - 将基类指针转换为派生类指针

你如何在 C++ 中生成随机字符串？

C++ x 的对数基数 n

相关推荐

最近更新

标签