How is std::string implemented in C++?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1466073/

Date: 2020-08-27 20:05:29  Source: igfitidea

Tags: c++, string, std, cstring

Asked by yesraaj

I am curious to know how std::string is implemented and how it differs from a C string. If the standard does not specify an implementation, then any implementation would be great, along with an explanation of how it satisfies the string requirements given by the standard.

Answered by Michael Burr

Virtually every compiler I've used provides source code for the runtime, so whether you're using GCC or MSVC or whatever, you have the capability to look at the implementation. However, a large part or all of std::string will be implemented as template code, which can make for very difficult reading.

Scott Meyers' book, Effective STL, has a chapter on std::string implementations that's a decent overview of the common variations: "Item 15: Be aware of variations in string implementations".

He talks about 4 variations:

  • several variations on a ref-counted implementation (commonly known as copy on write): when a string object is copied unchanged, the refcount is incremented but the actual string data is not copied. Both objects point to the same refcounted data until one of them modifies it, causing a 'copy on write' of the data. The variations are in where things like the refcount, locks, etc. are stored.

  • a "short string optimization" (SSO) implementation. In this variant, the object contains the usual pointer to data, length, size of the dynamically allocated buffer, etc. But if the string is short enough, it will use that area to hold the string instead of dynamically allocating a buffer.
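
The short string optimization described above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the layout of any real standard library: the class name TinyString and the 15-character threshold are hypothetical, and for simplicity the sketch keeps a separate in-object buffer instead of overlaying the characters on the pointer/capacity fields as real implementations tend to do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Toy illustration only: real library layouts are more compact and vary by vendor.
class TinyString {
public:
    static constexpr std::size_t kLocalCapacity = 15;  // hypothetical threshold

    explicit TinyString(const char* text) : size_(0) {
        assert(text != nullptr);
        size_ = std::strlen(text);
        if (size_ <= kLocalCapacity) {
            data_ = local_;                   // short: reuse the in-object buffer
        } else {
            data_ = new char[size_ + 1];      // long: allocate on the heap
        }
        std::memcpy(data_, text, size_ + 1);  // copy including the terminator
    }

    ~TinyString() {
        if (!is_local()) delete[] data_;      // only heap buffers are freed
    }

    // Copying is disabled to keep the sketch short.
    TinyString(const TinyString&) = delete;
    TinyString& operator=(const TinyString&) = delete;

    bool is_local() const { return data_ == local_; }
    std::size_t size() const { return size_; }
    const char* c_str() const { return data_; }

private:
    char* data_;
    std::size_t size_;
    char local_[kLocalCapacity + 1];
};
```

With this sketch, TinyString("hi") stays in the in-object buffer, while a string longer than 15 characters triggers a heap allocation.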

Also, Herb Sutter's "More Exceptional C++" has an appendix (Appendix A: "Optimizations that aren't (in a Multithreaded World)") that discusses why copy on write refcounted implementations often have performance problems in multithreaded applications due to synchronization issues. That article is also available online (but I'm not sure if it's exactly the same as what's in the book).

Both those chapters would be worthwhile reading.

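
To make the copy on write idea from this answer concrete, here is a minimal sketch, assuming a hypothetical wrapper class CowString built on std::shared_ptr. It is not how any standard library implements std::string; it only shows the share-then-detach behaviour. The reference count inside std::shared_ptr is typically updated atomically, which hints at the kind of synchronization cost the Sutter appendix discusses, and the use_count() check here is only meaningful in single-threaded code.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical copy-on-write wrapper, for illustration only.
class CowString {
public:
    explicit CowString(const std::string& text)
        : data_(std::make_shared<std::string>(text)) {}

    // The implicitly generated copy operations copy only the shared_ptr,
    // so a copy shares the buffer and bumps the reference count.

    const std::string& read() const { return *data_; }

    void append(const std::string& more) {
        detach();                 // make sure we own the buffer before writing
        data_->append(more);
    }

    bool shares_buffer_with(const CowString& other) const {
        return data_ == other.data_;
    }

private:
    // If another object still refers to the buffer, take a private copy.
    void detach() {
        assert(data_ != nullptr);
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::string>(*data_);
        }
    }

    std::shared_ptr<std::string> data_;
};
```

Copying a CowString is cheap until one of the copies calls append(), at which point that copy detaches and the other keeps the original data.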

Answered by Glen

std::string is a class that wraps around some kind of internal buffer and provides methods for manipulating that buffer.

A string in C is just an array of characters terminated by a null character.
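
One visible consequence of this difference: a std::string stores its own length, so it can hold an embedded '\0', whereas C functions such as strlen stop at the first one. The helper names below (c_length, make_with_embedded_nul) are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length as a C API sees it: scan until the first '\0'.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}

// Builds a std::string of five characters, the third of which is '\0'.
std::string make_with_embedded_nul() {
    std::string s("ab\0cd", 5);   // the (pointer, count) constructor copies all 5 chars
    assert(s.size() == 5);
    return s;
}
```

For the string built here, size() reports 5 while c_length() reports 2, because the C view ends at the embedded terminator.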

Explaining all the nuances of how std::string works here would take too long. Maybe have a look at the GCC source code at http://gcc.gnu.org to see exactly how they do it.

Answered by DVK

There's an example implementation in an answer on this page.

In addition, you can look at GCC's implementation, assuming you have GCC installed. If not, you can access their source code via their source repository (Git; formerly SVN). std::string is a typedef for std::basic_string<char>, so most of it is implemented by the basic_string template; start there.
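
The relationship between std::string and basic_string can be checked at compile time. A small sketch (the function name string_is_basic_string_char is hypothetical):

```cpp
#include <string>
#include <type_traits>

// std::string and std::wstring are typedefs of the same class template.
static_assert(std::is_same<std::string, std::basic_string<char>>::value,
              "std::string is basic_string<char>");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t>>::value,
              "std::wstring is basic_string<wchar_t>");

// The same check wrapped in a function so it can be used in expressions.
constexpr bool string_is_basic_string_char() {
    return std::is_same<std::string, std::basic_string<char>>::value;
}
```

So when reading a library's sources, the code to look for is the basic_string template rather than a separate string class.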

Another possible source of info is Watcom's compiler.

Answered by progician

The C++ approach to strings is quite different from the C one. The first and most important difference is that while C uses the ASCIIZ (null-terminated) convention, std::string and std::wstring keep track of where the string ends, typically with a pointer plus a length or with a pair of begin/end pointers, depending on the implementation. The string classes manage a dynamically allocated buffer for you, so at the cost of some CPU overhead for dynamic memory handling, string handling becomes more comfortable.

As you probably already know, C doesn't have a built-in string type; it only provides a set of string operations through the standard library. One of the major differences between C and C++ is that C++ wraps this functionality in a class, so std::string can be treated almost like a built-in type.

In C you need to walk through the string to find its length, whereas std::string::size() is a constant-time operation (basically end - begin, or a stored length). You can safely append one string to another as long as you have memory, so appending does not cause the classic buffer overflow bugs (and the exploits built on them), because it allocates a bigger buffer when needed.
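
A small sketch of the appending behaviour described above, using a hypothetical helper repeat_append. The caller never sizes a buffer; the string grows its own storage, and size() stays available without scanning the characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends `piece` to an initially empty string `times` times.
// The string reallocates internally whenever its capacity is exceeded.
std::string repeat_append(const std::string& piece, std::size_t times) {
    std::string result;
    for (std::size_t i = 0; i < times; ++i) {
        result += piece;
        // Invariant maintained by the library: the buffer always fits the content.
        assert(result.capacity() >= result.size());
    }
    return result;
}
```

For example, repeat_append("abc", 1000) yields a 3000-character string without any manual buffer management.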

As somebody mentioned here before, the string is a class template (std::basic_string) that behaves much like a vector of characters, which makes it easier to deal with wide and multibyte character systems. You can define your own string type with an expression such as typedef std::basic_string<CharT> specific_str_t;, where CharT is the character type passed as the template parameter.
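
As a sketch of such a typedef, here is one instantiated with char16_t. The alias name utf16_str_t and the function make_greeting are made up for the example, and the standard library already provides this particular instantiation as std::u16string.

```cpp
#include <cassert>
#include <string>

// Hypothetical alias: basic_string instantiated with a 16-bit character type.
typedef std::basic_string<char16_t> utf16_str_t;

// The familiar string interface works unchanged with the new character type.
utf16_str_t make_greeting() {
    utf16_str_t s = u"hello";
    s += u' ';
    s.append(u"world");
    assert(s.size() == 11);   // size counts char16_t elements, not bytes
    return s;
}
```

The same member functions (append, size, find, and so on) are available because they come from the basic_string template rather than from std::string specifically.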

I think there are enough pros and cons on both sides:

C++ string pros:

  • Faster iteration in certain cases: the size is known up front, so checking for the end of the string means comparing two pointers rather than reading the data from memory, which could make a difference with caching.

  • The buffer operations are packaged with the string functionality, so there are fewer worries about buffer problems.

C++ string cons:

  • Because of the dynamic memory allocation, basic usage can hurt performance. (Fortunately, you can tell the string object how large its initial buffer should be, for example with reserve(), so unless you exceed that size it won't need to allocate again.)

  • Names are often weird and inconsistent compared to other languages. This is the bad thing about any STL stuff, but you can get used to it, and it gives the code a somewhat C++-specific feel.

  • The heavy use of templates forces the standard library to use header-based solutions, which has a big impact on compile time.
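
The point about setting the buffer size up front can be sketched with reserve(). The helper count_reallocations below is hypothetical; it uses a change in capacity() as a stand-in for "the string reallocated", which matches how common implementations behave.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends `count` characters one at a time and returns how many times
// the capacity changed along the way (a proxy for reallocations).
std::size_t count_reallocations(std::size_t count, bool reserve_first) {
    std::string s;
    if (reserve_first) {
        s.reserve(count);   // request room for `count` characters up front
    }
    std::size_t changes = 0;
    std::size_t last_capacity = s.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        s.push_back('x');
        if (s.capacity() != last_capacity) {
            ++changes;
            last_capacity = s.capacity();
        }
    }
    assert(s.size() == count);
    return changes;
}
```

With reserve_first set to true the capacity should not change during the loop, while without it the string has to grow several times for large counts.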

Answered by Georg Schölly

That depends on the standard library you use.

STLPort, for example, is a C++ Standard Library implementation which implements strings, among other things.