What are the mechanics of short string optimization in libc++?

Question

Asked by ValarDohaeris

This answer gives a nice high-level overview of short string optimization (SSO). However, I would like to know in more detail how it works in practice, specifically in the libc++ implementation:

How short does the string have to be in order to qualify for SSO? Does this depend on the target architecture?
How does the implementation distinguish between short and long strings when accessing the string data? Is it as simple as m_size <= 16 or is it a flag that is part of some other member variable? (I imagine that m_size or part of it might also be used to store string data).

I asked this question specifically for libc++ because I know that it uses SSO; this is even mentioned on the libc++ home page.

Here are some observations after looking at the source:

libc++ can be compiled with two slightly different memory layouts for the string class; this is governed by the _LIBCPP_ALTERNATE_STRING_LAYOUT flag. Both layouts also distinguish between little-endian and big-endian machines, which leaves us with a total of 4 different variants. I will assume the "normal" layout and little-endian in what follows.

Assuming further that size_type is 4 bytes and that value_type is 1 byte, this is what the first 4 bytes of a string would look like in memory:

// short string: (s)ize and 3 bytes of char (d)ata
sssssss0;dddddddd;dddddddd;dddddddd
       ^- is_long = 0

// long string: (c)apacity
ccccccc1;cccccccc;cccccccc;cccccccc
       ^- is_long = 1

Since the size of the short string is in the upper 7 bits, it needs to be shifted when accessing it:

size_type __get_short_size() const {
    return __r_.first().__s.__size_ >> 1;
}

Similarly, the getter and setter for the capacity of a long string use __long_mask to work around the is_long bit.
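As a rough illustration of the shifting and masking described above, here is a minimal sketch. The helper names are hypothetical and are not taken from the libc++ source; it assumes the "normal" little-endian layout and that stored capacities are even, so the lowest bit is free to carry the flag.

```cpp
#include <cstddef>

// Hypothetical helpers mirroring the bit layout described above
// (default layout, little-endian). This is not the libc++ source.
constexpr std::size_t long_mask = 0x1;  // lowest bit of the first byte/word

// Short form: the first byte holds (size << 1), so its low bit stays 0.
constexpr unsigned char encode_short_size(std::size_t size) {
    return static_cast<unsigned char>(size << 1);
}
constexpr std::size_t decode_short_size(unsigned char first_byte) {
    return first_byte >> 1;
}

// Long form: the capacity word has its low bit forced to 1.
// Assumption: stored capacities are even, so that bit carries no information.
constexpr std::size_t encode_long_cap(std::size_t cap) {
    return cap | long_mask;
}
constexpr std::size_t decode_long_cap(std::size_t word) {
    return word & ~long_mask;
}

// Either form: the flag is the low bit of the first byte/word.
constexpr bool is_long(std::size_t first_word) {
    return (first_word & long_mask) != 0;
}
```

With these helpers, a short string of size 5 stores the byte value 10 (flag bit 0), and a long string with capacity 32 stores the word 33 (flag bit 1).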

I am still looking for an answer to my first question, i.e. what value would __min_cap, the capacity of short strings, take for different architectures?

Other standard library implementations

This answer gives a nice overview of std::string memory layouts in other standard library implementations.

Answer 1

Answered by Howard Hinnant

The libc++ basic_string is designed to have a sizeof of 3 words on all architectures, where sizeof(word) == sizeof(void*). You have correctly dissected the long/short flag, and the size field in the short form.

what value would __min_cap, the capacity of short strings, take for different architectures?

In the short form, there are 3 words to work with:

1 bit goes to the long/short flag.
7 bits go to the size.
Assuming char, 1 byte goes to the trailing null (libc++ will always store a trailing null behind the data).

This leaves 3 words minus 2 bytes to store a short string (i.e. the largest capacity() without an allocation).

On a 32 bit machine, 10 chars will fit in the short string. sizeof(string) is 12.

On a 64 bit machine, 22 chars will fit in the short string. sizeof(string) is 24.
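The arithmetic behind those two numbers can be written out as a small sketch. The function name short_capacity is hypothetical, and the calculation assumes a char string (1-byte value_type) with the budget listed above: 3 words, minus one byte for the flag and size, minus one byte for the trailing null.

```cpp
#include <cstddef>

// Hypothetical helper: short-string capacity for a char string,
// given the size of one word (sizeof(void*)) in bytes.
constexpr std::size_t short_capacity(std::size_t word_size) {
    return 3 * word_size   // total object size: 3 words
           - 1             // one byte holds the flag bit plus the 7-bit size
           - 1;            // one byte for the trailing null
}

static_assert(short_capacity(4) == 10, "32-bit: 10 chars fit in the short form");
static_assert(short_capacity(8) == 22, "64-bit: 22 chars fit in the short form");
```

On a libc++ build, one would expect std::string().capacity() to agree with short_capacity(sizeof(void*)); other standard library implementations use different layouts and will generally report a different value.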

A major design goal was to minimize sizeof(string), while making the internal buffer as large as possible. The rationale is to speed up move construction and move assignment. The larger the sizeof, the more words you have to move during a move construction or move assignment.

The long form needs a minimum of 3 words to store the data pointer, size and capacity. Therefore I restricted the short form to those same 3 words. It has been suggested that a 4 word sizeof might have better performance. I have not tested that design choice.

_LIBCPP_ABI_ALTERNATE_STRING_LAYOUT

There is a configuration flag called _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT which rearranges the data members such that the "long layout" changes from:

struct __long
{
    size_type __cap_;
    size_type __size_;
    pointer   __data_;
};

to:

struct __long
{
    pointer   __data_;
    size_type __size_;
    size_type __cap_;
};

The motivation for this change is the belief that putting __data_ first will have some performance advantages due to better alignment. An attempt was made to measure the performance advantages, but the difference proved difficult to measure. It won't make the performance worse, and it may make it slightly better.
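To make the difference between the two layouts concrete, the following sketch uses simplified stand-ins for the two __long structs. The struct names are hypothetical, and size_type and pointer are assumed to be std::size_t and char* with no padding between members, which holds on mainstream platforms but is not guaranteed in general.

```cpp
#include <cstddef>

// Simplified stand-in for the default layout: capacity first, data last.
struct LongDefault {
    std::size_t cap;
    std::size_t size;
    char*       data;
};

// Simplified stand-in for the alternate layout: data first, capacity last.
struct LongAlternate {
    char*       data;
    std::size_t size;
    std::size_t cap;
};

// Both occupy the same space; only the member order changes,
// and with it the offset at which the data pointer lives.
static_assert(sizeof(LongDefault) == sizeof(LongAlternate), "same footprint");
static_assert(offsetof(LongAlternate, data) == 0, "alternate: data comes first");
static_assert(offsetof(LongDefault, data) == 2 * sizeof(std::size_t),
              "default: data comes last");
```

Because the same bytes mean different things under the two orderings, code compiled against one layout cannot safely read a string object produced under the other.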

The flag should be used with care. It is a different ABI, and if accidentally mixed with a libc++ std::string compiled with a different setting of _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT, it will create run time errors.

I recommend this flag only be changed by a vendor of libc++.

Answer 2

Answered by Matthieu M.

The libc++ implementation is a bit complicated; I'll ignore its alternate design and suppose a little-endian computer:

template <...>
class basic_string {
/* many many things */

    struct __long
    {
        size_type __cap_;
        size_type __size_;
        pointer   __data_;
    };

    enum {__short_mask = 0x01};
    enum {__long_mask  = 0x1ul};

    enum {__min_cap = (sizeof(__long) - 1)/sizeof(value_type) > 2 ?
                      (sizeof(__long) - 1)/sizeof(value_type) : 2};

    struct __short
    {
        union
        {
            unsigned char __size_;
            value_type __lx;
        };
        value_type __data_[__min_cap];
    };

    union __ulx{__long __lx; __short __lxx;};

    enum {__n_words = sizeof(__ulx) / sizeof(size_type)};

    struct __raw
    {
        size_type __words[__n_words];
    };

    struct __rep
    {
        union
        {
            __long  __l;
            __short __s;
            __raw   __r;
        };
    };

    __compressed_pair<__rep, allocator_type> __r_;
}; // basic_string

Note: __compressed_pair is essentially a pair optimized for the Empty Base Optimization, aka template <typename T1, typename T2> struct __compressed_pair: T1, T2 {}; for all intents and purposes you can consider it a regular pair. Its importance just comes up because std::allocator is stateless and thus empty.
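The effect of the Empty Base Optimization can be seen in a small sketch. All type names below are hypothetical stand-ins (Payload for the 3-word representation, EmptyAlloc for a stateless allocator); this is not how __compressed_pair is actually written, only an illustration of why an empty allocator need not enlarge the string.

```cpp
#include <cstddef>

// Stand-in for the 3-word string representation.
struct Payload {
    void* words[3];
};

// Stand-in for a stateless allocator: no data members at all.
struct EmptyAlloc {};

// Storing the empty type as a member costs at least one byte,
// which padding then rounds up to a full extra word.
struct WithMember {
    Payload    rep;
    EmptyAlloc alloc;
};

// Deriving from the empty type lets the compiler give it zero size.
struct WithBase : EmptyAlloc {
    Payload rep;
};

static_assert(sizeof(WithBase) == sizeof(Payload), "empty base adds nothing");
static_assert(sizeof(WithMember) > sizeof(Payload), "empty member adds space");
```

This is the reason the allocator is bundled with the representation through a compressed pair rather than stored as a plain member.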

Okay, this is rather raw, so let's check the mechanics! Internally, many functions will call __get_pointer() which itself calls __is_long to determine whether the string is using the __long or __short representation:

bool __is_long() const _NOEXCEPT
    { return bool(__r_.first().__s.__size_ & __short_mask); }

// __r_.first() -> __rep const&
//     .__s     -> __short const&
//     .__size_ -> unsigned char

To be honest, I am not too sure this is Standard C++ (I know the initial subsequence provision in union but do not know how it meshes with an anonymous union and aliasing thrown together), but a Standard Library is allowed to take advantage of implementation-defined behavior anyway.
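For readers who want to experiment with the idea outside the library, here is a toy sketch of the same check. Long, Short, Rep, make_short and make_long are hypothetical names, the code assumes a little-endian machine and char strings, and it sidesteps the union-aliasing question by copying the first byte out with memcpy instead of reading through the inactive member. It is not the libc++ implementation.

```cpp
#include <cstddef>
#include <cstring>

// Toy stand-ins for __long and __short (default layout, char strings).
struct Long {
    std::size_t cap;   // low bit doubles as the is_long flag
    std::size_t size;
    char*       data;
};

struct Short {
    unsigned char size;                    // holds (size << 1), low bit 0
    char          data[sizeof(Long) - 1];  // inline buffer incl. trailing null
};

union Rep {
    Long  l;
    Short s;
};

// Inspect the first byte of the object regardless of the active member.
// Assumption: little-endian, so the low byte of cap is stored first.
inline bool is_long(const Rep& r) {
    unsigned char first;
    std::memcpy(&first, &r, 1);
    return (first & 0x01) != 0;
}

// Build a short representation; requires n <= sizeof(Long) - 2.
inline Rep make_short(const char* text, std::size_t n) {
    Rep r{};
    r.s.size = static_cast<unsigned char>(n << 1);
    std::memcpy(r.s.data, text, n);
    r.s.data[n] = '\0';
    return r;
}

// Build a long representation over a caller-owned buffer; cap must be even.
inline Rep make_long(char* buffer, std::size_t n, std::size_t cap) {
    Rep r{};
    r.l.cap  = cap | 0x1;
    r.l.size = n;
    r.l.data = buffer;
    return r;
}
```

A caller would branch on is_long(r) and then read either r.s.data or r.l.data, which is roughly the role __get_pointer() plays in the excerpt above.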
