How exactly is std::string_view faster than const std::string&?

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/40127965/

Date: 2020-08-28 15:20:52  Source: igfitidea

c++ string c++17 string-view

Asked by Patryk

std::string_view has made it into C++17, and it is widely recommended to use it instead of const std::string&.

One of the reasons is performance.

Can someone explain how exactly std::string_view is/will be faster than const std::string& when used as a parameter type? (Let's assume no copies are made in the callee.)

Accepted answer by Yakk - Adam Nevraumont

std::string_view is faster in a few cases.

First, std::string const& requires the data to be in a std::string, and not a raw C array, a char const* returned by a C API, a std::vector<char> produced by some deserialization engine, etc. The avoided format conversion avoids copying bytes, and (if the string is longer than the SBO¹ for the particular std::string implementation) avoids a memory allocation.

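#include <iostream>
#include <string_view>

// Taking a string_view means no std::string has to be built for the call.
void foo( std::string_view bob ) {
  std::cout << bob << "\n";
}
int main(int argc, char const*const* argv) {
  foo( "This is a string long enough to avoid the std::string SBO" );
  if (argc > 1)
    foo( argv[1] );
}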

No allocations are done in the string_view case, but there would be if foo took a std::string const& instead of a string_view.
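
As a minimal sketch of the "data does not have to live in a std::string" point, the block below passes four different kinds of source to one function. The names count_spaces and count_spaces_everywhere are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: counts the spaces in any contiguous character range.
std::size_t count_spaces(std::string_view text) {
    std::size_t n = 0;
    for (char c : text) {
        if (c == ' ') ++n;
    }
    return n;
}

// Hypothetical caller: the same function serves four different sources,
// and none of the calls builds a temporary std::string.
std::size_t count_spaces_everywhere() {
    const char* from_c_api = "a b c";                // e.g. returned by a C API
    char raw_array[] = "one two";                    // raw C array
    std::vector<char> bytes = {'x', ' ', 'y', ' '};  // no null terminator
    std::string owned = "already a string";

    return count_spaces(from_c_api)
         + count_spaces(raw_array)
         + count_spaces(std::string_view(bytes.data(), bytes.size()))
         + count_spaces(owned);
}
```

With a std::string const& parameter, the first three calls would each have to construct a std::string first, and the std::vector<char> case would need an explicit conversion.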

The second really big reason is that it permits working with substrings without a copy. Suppose you are parsing a 2 gigabyte JSON string (!)². If you parse it into std::string, each parse node that stores the name or value of a node copies the original data from the 2 GB string to a local node.

Instead, if you parse it to std::string_views, the nodes refer to the original data. This can save millions of allocations and halve memory requirements during parsing.
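
The idea of nodes that refer to the original data can be sketched with a tiny tokenizer. This is not a JSON parser; split_fields is a hypothetical helper that only shows views pointing back into the input buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical tokenizer: splits `input` on `delim` and returns views
// that point into the original buffer. No field is copied.
std::vector<std::string_view> split_fields(std::string_view input, char delim) {
    std::vector<std::string_view> fields;
    std::size_t start = 0;
    while (true) {
        std::size_t end = input.find(delim, start);
        if (end == std::string_view::npos) {
            fields.push_back(input.substr(start));  // last field
            break;
        }
        fields.push_back(input.substr(start, end - start));
        start = end + 1;  // skip the delimiter
    }
    return fields;
}
```

The vector itself still allocates, but the field contents do not. The views are only valid while the original buffer is alive and unmodified, which is the price of not copying.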

The speedup you can get is simply ridiculous.

This is an extreme case, but other "get a substring and work with it" cases can also generate decent speedups with string_view.

An important part of the decision is what you lose by using std::string_view. It isn't much, but it is something.

You lose implicit null termination, and that is about it. So if the same string will be passed to 3 functions, all of which require a null terminator, converting to std::string once may be wise. Thus if your code is known to need a null terminator, and you don't expect strings fed from C-style sourced buffers or the like, maybe take a std::string const&. Otherwise take a std::string_view.
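
A short sketch of what losing null termination means in practice, using std::strlen as a stand-in for any C function that expects a terminator. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical wrapper around a C function that needs a null terminator.
// The view is copied into a std::string so that c_str() is terminated
// exactly at the end of the view.
std::size_t c_length(std::string_view text) {
    std::string terminated(text);
    return std::strlen(terminated.c_str());
}

// Passing data() straight to the C function ignores the view's length:
// strlen keeps reading until it finds the terminator of the underlying
// buffer. Only call this when that buffer is known to be terminated.
std::size_t c_length_unsafe(std::string_view text) {
    return std::strlen(text.data());
}
```

For a view of the first five characters of "hello world", c_length reports 5 while c_length_unsafe reports 11, because the terminator belongs to the whole literal and not to the view.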

If std::string_view had a flag that stated whether it was null terminated (or something fancier), it would remove even that last reason to use a std::string const&.

There is a case where taking a std::string with no const& is optimal over a std::string_view. If you need to own a copy of the string indefinitely after the call, taking by value is efficient. You'll either be in the SBO case (and no allocations, just a few character copies to duplicate it), or you'll be able to move the heap-allocated buffer into a local std::string. Having two overloads, std::string&& and std::string_view, might be faster, but only marginally, and it would cause modest code bloat (which could cost you all of the speed gains).
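
The by-value case might look like the sketch below. Widget and set_name are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class that keeps its own copy of the string after the call.
class Widget {
public:
    // By-value sink parameter: callers passing an rvalue pay for moves only,
    // callers passing an lvalue pay for exactly one copy.
    void set_name(std::string name) { name_ = std::move(name); }
    const std::string& name() const { return name_; }

private:
    std::string name_;
};
```

A caller that no longer needs its string can write w.set_name(std::move(s)), while a caller passing a literal or an lvalue gets a single copy made at the call site.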



¹ Small Buffer Optimization

² Actual use case.

Answer by Pavel Davydov

One way that string_view improves performance is that it allows removing prefixes and suffixes easily. Under the hood, string_view can just add the prefix size to a pointer to some string buffer, or subtract the suffix size from the byte counter; this is usually fast. std::string, on the other hand, has to copy its bytes when you do something like substr (this way you get a new string that owns its buffer, but in many cases you just want to get part of the original string without copying). Example:

std::string str{"foobar"};
auto bar = str.substr(3);
assert(bar == "bar");

With std::string_view:

std::string str{"foobar"};
std::string_view bar{str.c_str(), str.size()};
bar.remove_prefix(3);
assert(bar == "bar");

Update:

I wrote a very simple benchmark to add some real numbers. I used the awesome Google Benchmark library. The benchmarked functions are:

#include <benchmark/benchmark.h>
#include <stdexcept>
#include <string>
#include <string_view>
using std::string;
using std::string_view;

// Copies everything after the first three characters into a new string.
string remove_prefix(const string &str) {
  return str.substr(3);
}
// Only adjusts the view's pointer and length; no bytes are copied.
string_view remove_prefix(string_view str) {
  str.remove_prefix(3);
  return str;
}
static void BM_remove_prefix_string(benchmark::State& state) {
  std::string example{"asfaghdfgsghasfasg3423rfgasdg"};
  while (state.KeepRunning()) {
    auto res = remove_prefix(example);
    // auto res = remove_prefix(string_view(example)); for string_view
    if (res != "aghdfgsghasfasg3423rfgasdg") {
      throw std::runtime_error("bad op");
    }
  }
}
// BM_remove_prefix_string_view is similar, I skipped it to keep the post short

Results

(x86_64 linux, gcc 6.2, "-O3 -DNDEBUG"):

Benchmark                             Time           CPU Iterations
-------------------------------------------------------------------
BM_remove_prefix_string              90 ns         90 ns    7740626
BM_remove_prefix_string_view          6 ns          6 ns  120468514

Answer by Matthieu M.

There are 2 main reasons:

  • string_view is a slice in an existing buffer; it does not require a memory allocation
  • string_view is passed by value, not by reference


The advantages of having a slice are multiple:

  • you can use it with char const* or char[] without allocating a new buffer
  • you can take multiple slices and subslices into an existing buffer without allocating
  • substring is O(1), not O(N)
  • ...

Better and more consistent performance all over.
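
Taking a subslice without allocating can be sketched with a trimming helper. trim_spaces is a hypothetical name; it relies only on the standard remove_prefix and remove_suffix members.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: strips leading and trailing spaces by shrinking
// the view. The characters themselves are never copied or moved.
std::string_view trim_spaces(std::string_view text) {
    std::size_t first = text.find_first_not_of(' ');
    if (first == std::string_view::npos) return {};  // empty or all spaces
    text.remove_prefix(first);
    std::size_t last = text.find_last_not_of(' ');
    text.remove_suffix(text.size() - last - 1);
    return text;
}
```

The returned view points into the same buffer as the argument, so shrinking the view costs the same regardless of how long the string is; only the search for the first and last non-space characters depends on the input.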



Passing by value also has advantages over passing by reference, because of aliasing.

Specifically, when you have a std::string const& parameter, there is no guarantee that the referenced string will not be modified. As a result, the compiler must re-fetch the content of the string (pointer to data, length, ...) after each call into an opaque method.

On the other hand, when passing a string_view by value, the compiler can statically determine that no other code can modify the length and data pointers now on the stack (or in registers). As a result, it can "cache" them across function calls.
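
The aliasing argument can be made visible with a small sketch. It does not measure any optimization; it only shows that a callback can change what a const reference observes, while a by-value view's length cannot change. All names here (g_text, append_bang, growth_seen_by_ref, growth_seen_by_view) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical global that both the caller and the callback can reach.
std::string g_text;

// Hypothetical callback standing in for "an opaque method".
void append_bang() { g_text += '!'; }

// By reference: `s` may alias g_text, so s.size() has to be read again
// after the call and the result can differ.
std::size_t growth_seen_by_ref(const std::string& s, void (*opaque)()) {
    std::size_t before = s.size();
    opaque();
    return s.size() - before;
}

// By value: the view's pointer and length are local copies that the
// callback cannot reach, so size() is the same before and after.
std::size_t growth_seen_by_view(std::string_view s, void (*opaque)()) {
    std::size_t before = s.size();
    opaque();
    return s.size() - before;
}
```

When g_text itself is passed as the argument, the reference version sees the string grow by one character and the view version sees no change. The flip side is that the view is then stale, and it would dangle if the append had reallocated the buffer.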

Answer by juanchopanza

One thing it can do is avoid constructing a std::string object in the case of an implicit conversion from a null-terminated string:

void foo(const std::string& s);

...

foo("hello, world!"); // std::string object created, possible dynamic allocation.
char msg[] = "good morning!";
foo(msg); // std::string object created, possible dynamic allocation.

Answer by n.caillou

std::string_view is basically just a wrapper around a const char*. And passing const char* means that there will be one less pointer in the system in comparison with passing const string* (or const string&), because string* implies something like:

string* -> char* -> char[]
           |   string    |

Clearly for the purpose of passing const arguments the first pointer is superfluous.

P.S. One substantial difference between std::string_view and const char*, nevertheless, is that string_views are not required to be null-terminated (they have a built-in size), and this allows for random in-place splicing of longer strings.