Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/611263/

Date: 2020-08-27 16:14:57  Source: igfitidea

Efficient string concatenation in C++

Tags: c++, performance, string, concatenation

Asked by sneg

I heard a few people expressing worries about "+" operator in std::string and various workarounds to speed up concatenation. Are any of these really necessary? If so, what is the best way to concatenate strings in C++?

Accepted answer by Brian R. Bondy

The extra work is probably not worth it unless you really need the efficiency. You will probably get much better efficiency simply by using operator+= instead.

Now after that disclaimer, I will answer your actual question...

The efficiency of the STL string class depends on the implementation of STL you are using.

You could guarantee efficiency and have greater control yourself by doing the concatenation manually with C library functions.

Why operator+ is not efficient:

Take a look at this interface:

template <class charT, class traits, class Alloc>
basic_string<charT, traits, Alloc>
operator+(const basic_string<charT, traits, Alloc>& s1,
          const basic_string<charT, traits, Alloc>& s2);

You can see that a new object is returned after each +. That means that a new buffer is used each time. If you are doing a ton of extra + operations it is not efficient.

Why you can make it more efficient:

  • You are guaranteeing efficiency instead of trusting a delegate to do it efficiently for you.
  • The std::string class knows nothing about the maximum size of your string, nor how often you will be concatenating to it. You may have this knowledge and can act on it. This leads to fewer reallocations.
  • You control the buffers manually, so you can be sure that the whole string is not copied into a new buffer when you don't want that to happen.
  • You can use the stack for your buffers instead of the heap, which is much more efficient.
  • The string + operator creates a new string object and returns it, hence using a new buffer.

Considerations for implementation:

  • Keep track of the string length.
  • Keep a pointer to the end of the string and to the start, or keep just the start and use start + length as an offset to find the end of the string.
  • Make sure the buffer you are storing your string in is big enough that you don't need to reallocate data.
  • Use strcpy instead of strcat so you don't need to iterate over the length of the string to find its end.
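
The considerations above can be sketched roughly as follows. This is only an illustration under stated assumptions: FixedBuffer, its append member, and the 256-byte capacity are hypothetical names and values, not something taken from the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: a fixed-capacity stack buffer that remembers where
// the string ends, so an append never has to rescan from the start.
struct FixedBuffer {
    static constexpr std::size_t kCapacity = 256;  // assumed upper bound
    char data[kCapacity];
    std::size_t length;  // tracked length, excluding the terminator

    FixedBuffer() : length(0) { data[0] = '\0'; }

    // Appends text if it fits; returns false instead of overflowing.
    bool append(const char* text) {
        assert(text != nullptr);
        const std::size_t n = std::strlen(text);
        if (length + n + 1 > kCapacity) return false;
        // strcpy straight to the known end, instead of strcat from the start.
        std::strcpy(data + length, text);
        length += n;
        return true;
    }
};
```

Calling append("foo"), append(" : ") and append("bar") on a fresh FixedBuffer leaves "foo : bar" in data without any heap allocation. Whether this actually beats std::string on a given platform is something to measure rather than assume.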

Rope data structure:

If you need really fast concatenations consider using a rope data structure.

Answer by Carlos A. Ibarra

Reserve your final space up front, then use the append method with a buffer. For example, say you expect your final string length to be 1 million characters:

std::string s;
s.reserve(1000000);

while (whatever)
{
  s.append(buf,len);
}
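
A self-contained variant of the same idea might look like the sketch below. The join_chunks name and the vector of chunks are assumptions made for illustration; here the final size is computed from the input rather than guessed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins chunks into one string with a single
// up-front reservation.
std::string join_chunks(const std::vector<std::string>& chunks) {
    std::size_t total = 0;
    for (const std::string& chunk : chunks) total += chunk.size();

    std::string s;
    s.reserve(total);  // reserve the final size before appending anything
    for (const std::string& chunk : chunks) {
        s.append(chunk.data(), chunk.size());  // the append(buf, len) form
    }
    assert(s.size() == total);
    return s;
}
```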

Answer by Johannes Schaub - litb

I would not worry about it. If you do it in a loop, strings will always preallocate memory to minimize reallocations; just use operator+= in that case. And if you do it manually, with something like this or longer:

a + " : " + c

then it creates temporaries, even if the compiler can eliminate some return value copies. That is because in a chain of operator+ calls, the function does not know whether its reference parameter refers to a named object or to a temporary returned from an inner operator+ invocation. I would rather not worry about it before having profiled. But let's take an example to show this. We first introduce parentheses to make the binding clear. For clarity, I put the arguments directly after the declaration of the function that is called. Below that, I show what the resulting expression is:

((a + " : ") + c) 
calls string operator+(string const&, char const*)(a, " : ")
  => (tmp1 + c)

Now, in that addition, tmp1 is what was returned by the first call to operator+ with the shown arguments. We assume the compiler is really clever and optimizes out the return value copy. So we end up with one new string that contains the concatenation of a and " : ". Now, this happens:

(tmp1 + c)
calls string operator+(string const&, string const&)(tmp1, c)
  => tmp2 == <end result>

Compare that to the following:

std::string f = "hello";
(f + c)
calls string operator+(string const&, string const&)(f, c)
  => tmp1 == <end result>

It's using the same function for a temporary and for a named string! So the compiler has to copy the argument into a new string, append to that, and return it from the body of operator+. It cannot take the memory of a temporary and append to that. The bigger the expression is, the more string copies have to be made.

The next versions of Visual Studio and GCC will support C++1x's move semantics (complementing copy semantics) and rvalue references as an experimental addition. That allows operator+ to figure out whether a parameter references a temporary or not. This will make such additions amazingly fast, as all of the above will end up in one "add-pipeline" without copies.

If it turns out to be a bottleneck, you can still do

 std::string(a).append(" : ").append(c) ...

The append calls append the argument to *this and then return a reference to it. So no copying of temporaries is done there. Alternatively, operator+= can be used, but you would need ugly parentheses to fix the grouping, since += associates right to left.
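
As a rough illustration of both spellings, here is a small sketch. The function names with_append and with_plus_equals are made up for this example.

```cpp
#include <string>

// Builds "a : c" by appending into a single copy of `a`.
std::string with_append(const std::string& a, const std::string& c) {
    return std::string(a).append(" : ").append(c);
}

// Same result with operator+=. Without the parentheses the expression
// would group as result += (" : " += c), which does not compile.
std::string with_plus_equals(const std::string& a, const std::string& c) {
    std::string result(a);
    (result += " : ") += c;
    return result;
}
```

Both functions make one copy of a and then grow it in place, so no intermediate temporaries are created for the partial results.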

Answer by Pesto

For most applications, it just won't matter. Just write your code, blissfully unaware of how exactly the + operator works, and only take matters into your own hands if it becomes an apparent bottleneck.

Answer by James Curran

Unlike .NET System.Strings, C++'s std::strings are mutable, and therefore can be built through simple concatenation just as fast as through other methods.

Answer by Tim

Perhaps std::stringstream instead?

But I agree with the sentiment that you should probably just keep it maintainable and understandable and then profile to see if you are really having problems.

Answer by Luc Hermitte

In Imperfect C++, Matthew Wilson presents a dynamic string concatenator that pre-computes the length of the final string in order to have only one allocation before concatenating all parts. We can also implement a static concatenator by playing with expression templates.
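
The sketch below shows the general idea of pre-computing the final length. It is not Wilson's concatenator and not an expression-template implementation; concat is a hypothetical name, and the code assumes a C++17 compiler for std::string_view and fold expressions.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sketch: sums the lengths of all parts first, reserves
// once, then appends each part in order. Requires C++17.
template <typename... Parts>
std::string concat(const Parts&... parts) {
    const std::size_t total =
        (std::size_t{0} + ... + std::string_view(parts).size());
    std::string result;
    result.reserve(total);  // a single reservation for the whole result
    (result.append(std::string_view(parts)), ...);
    return result;
}
```

A call such as concat(a, " : ", c) accepts std::string objects and string literals alike, since each part is viewed through std::string_view.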

That kind of idea has been implemented in the STLport std::string implementation, which does not conform to the standard because of this precise hack.

Answer by timmerov

std::string's operator+ allocates a new string and copies the two operand strings every time. Repeat this many times and it gets expensive: each call is O(n) in the length of the result.

std::string's append and operator+=, on the other hand, bump the capacity by 50% every time the string needs to grow (the exact growth factor depends on the implementation). This significantly reduces the number of memory allocations and copy operations: O(log n) reallocations for a string of final length n.
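
One way to observe this on a particular implementation is to count how often the capacity changes while appending. The sketch below does that; count_reallocations is a hypothetical helper name, and the exact count it returns will differ between standard libraries.

```cpp
#include <cstddef>
#include <string>

// Counts how many times the capacity changes while appending n characters
// one at a time with +=. The growth factor is implementation-defined.
std::size_t count_reallocations(std::size_t n) {
    std::string s;
    std::size_t reallocations = 0;
    std::size_t last_capacity = s.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        s += 'x';
        if (s.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = s.capacity();
        }
    }
    return reallocations;
}
```

With geometric growth, appending 100000 characters should trigger only a few dozen capacity changes rather than one per append.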

Answer by Mykola Golubyev

For small strings it doesn't matter. If you have big strings, you'd better store them as parts, as they are, in a vector or some other collection, and adapt your algorithm to work with that set of data instead of one big string.

I prefer std::ostringstream for complex concatenation.
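
For instance, a concatenation that mixes strings and numbers could be written roughly like this. The describe function and its parameters are made up for illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical example: formats mixed types into one string by streaming
// them into an std::ostringstream.
std::string describe(const std::string& name, int count, double ratio) {
    std::ostringstream out;
    out << name << ": " << count << " items, ratio " << ratio;
    return out.str();
}
```

The stream takes care of converting the numbers, which is where this style tends to read better than a chain of + operators.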

Answer by Pete Kirkham

As with most things, it's easier not to do something than to do it.

If you want to output large strings to a GUI, it may be that whatever you're outputting to can handle the strings in pieces better than as a large string (for example, concatenating text in a text editor - usually they keep lines as separate structures).

If you want to output to a file, stream the data rather than creating a large string and outputting that.

I've never found it necessary to make concatenation faster once I removed unnecessary concatenation from slow code.