C++ map<std::string> vs map<char *> 性能(我知道,“再次?”)

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/12136071/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-27 15:55:35  来源:igfitidea点击:

C++ map<std::string> vs map<char *> performance (I know, "again?")

c++performancedictionarystdmap

提问by uroc

I was using a map with a std::stringkey and while everything was working fine I wasn't getting the performance I expected. I searched for places to optimize and improved things only a little and that's when a colleague said, "that string key is going to be slow."

我正在使用带std::string键的地图,虽然一切正常,但我没有获得预期的性能。我搜索了一些地方来优化和改进一些东西,当时一位同事说,“那个字符串键会很慢。”

I read dozens of questions and they consistently say:

我读了几十个问题,他们一直说:

"don't use a char *as a key"
"std::stringkeys are never your bottleneck"
"the performance difference between a char *and a std::stringis a myth."

“不要使用 achar *作为键”
std::string键永远不是你的瓶颈”
“achar *和 a 之间的性能差异std::string是一个神话。”

I reluctantly tried a char *key and there was a difference, a big difference.

我不情愿地试了一把char *钥匙,发现有区别,很大的不同。

I boiled the problem down to a simple example:

我把问题归结为一个简单的例子:

#include <stdio.h>
#include <stdlib.h>
#include <map>

#ifdef USE_STRING

#include <string>
typedef std::map<std::string, int> Map;

#else

#include <string.h>
struct char_cmp { 
    bool operator () (const char *a,const char *b) const 
    {
        return strcmp(a,b)<0;
    } 
};
typedef std::map<const char *, int, char_cmp> Map;

#endif

Map m;

bool test(const char *s)
{
    Map::iterator it = m.find(s);
    return it != m.end();
}

int main(int argc, char *argv[])
{
    m.insert( Map::value_type("hello", 42) );

    const int lcount = atoi(argv[1]);
    for (int i=0 ; i<lcount ; i++) test("hello");
}

First the std::string version:

首先是 std::string 版本:

$ g++ -O3 -o test test.cpp -DUSE_STRING
$ time ./test 20000000
real    0m1.893s

Next the 'char *' version:

接下来是'char *'版本:

g++ -O3 -o test test.cpp             
$ time ./test 20000000
real    0m0.465s

That's a pretty big performance difference and about the same difference I see in my larger program.

这是一个相当大的性能差异,与我在更大的程序中看到的差异大致相同。

Using a char *key is a pain to handle freeing the key and just doesn't feel right. C++ experts what am I missing? Any thoughts or suggestions?

使用char *密钥是处理释放密钥的痛苦并且感觉不对。C++ 专家我错过了什么?有什么想法或建议吗?

回答by sth

You are using a const char *as a lookup key for find(). For the map containing const char*this is the correct type that findexpects and the lookup can be done directly.

您正在使用 aconst char *作为 的查找键find()。对于包含const char*它的地图是find期望的正确类型,可以直接进行查找。

The map containing std::stringexpects the parameter of find()to be a std::string, so in this case the const char*first has to be converted to a std::string. This is probably the difference you are seeing.

包含的映射std::string期望参数为find()a std::string,因此在这种情况下,const char*第一个必须转换为 a std::string。这可能就是您所看到的差异。

回答by Matthieu M.

As sth noted, the issue is one of specifications of the associative containers (sets and maps), in that their member search methods always force a conversion to the key_type, even if an operator<exists that would accept to compare your key against the keys in the map despite their different types.

正如所指出的,问题是关联容器(集合和映射)的规范之一,因为它们的成员搜索方法总是强制转换为key_type,即使operator<存在接受将您的键与映射中的键进行比较的存在尽管它们的类型不同。

On the other hand, the functions in <algorithm>do not suffer from this, for example lower_boundis defined as:

另一方面,中的函数<algorithm>不受此影响,例如lower_bound定义为:

template< class ForwardIt, class T >
ForwardIt lower_bound( ForwardIt first, ForwardIt last, const T& value );

template< class ForwardIt, class T, class Compare >
ForwardIt lower_bound( ForwardIt first, ForwardIt last, const T& value, Compare comp );

So, an alternative could be:

因此,替代方案可能是:

std::vector< std::pair< std::string, int > >

And then you could do:

然后你可以这样做:

std::lower_bound(vec.begin(), vec.end(), std::make_pair("hello", 0), CompareFirst{})

Where CompareFirstis defined as:

其中CompareFirst定义为:

struct CompareFirst {
     template <typename T, typename U>
     bool operator()(T const& t, U const& u) const { return t.first < u.first; }
};

Or even build a completely custom comparator (but it's a bit harder).

或者甚至构建一个完全自定义的比较器(但它有点难)。

A vectorof pair is generally more efficient in read-heavy loads, so it's really to store a configuration for example.

vector对通常在读取大量负载时更有效,因此它实际上是存储例如配置。

I do advise to provide methods to wrap the accesses. lower_boundis pretty low-level.

我确实建议提供包装访问的方法。lower_bound是相当低级的。

回答by sevensevens

If your in C++ 11, the copy constructor is not called unless the string is changed. Because std::string is a C++ construct, at least 1 dereference is needed to get at the string data.

如果您在 C++ 11 中,除非字符串被更改,否则不会调用复制构造函数。由于 std::string 是 C++ 构造,因此至少需要 1 次取消引用才能获取字符串数据。

My guess would be the time is taken up in an extra dereference (which if done 10000 times is costly), and std::string is likely doing appropriate null pointer checks, which again eats up cycles.

我的猜测是额外的取消引用会占用时间(如果执行 10000 次代价高昂),并且 std::string 可能会进行适当的空指针检查,这又会消耗掉周期。

回答by QwerJoe

After compilation the 2 "Hello" string literals will have the same memory address. On the char *case you use this memory addresses as keys.

编译后,2 个“Hello”字符串文字将具有相同的内存地址。在char *您使用此内存地址作为键的情况下。

In the stringcase every "Hello"s will be converted to a different object. This is a small part (really really small) of your performance difference.

在这种string情况下,每个“Hello”都将转换为不同的对象。这是性能差异的一小部分(真的很小)。

A bigger part can be that as all the "Hello"s you are using has the same memory address strcmpwill always get 2 equivalent char pointers and I'm quite sure that it early checks for this case :) So it will never really iterate on the all characters but the std::string comparison will.

更大的部分可能是因为您使用的所有“Hello”都具有相同的内存地址strcmp将始终获得 2 个等效的字符指针,我很确定它会提前检查这种情况:) 所以它永远不会真正迭代除了 std::string 比较之外的所有字符都会。

回答by Adrian Cornish

Store the std::string as a pointer and then you lose the copy constructor overhead.

将 std::string 存储为指针,然后您就失去了复制构造函数的开销。

But after you have to remember to handle the deletes.

但是之后你必须记住处理删除。

The reason std::string is slow is that is constructs itself. Calls the copy constructor, and then at the end calls delete. If you create the string on the heap you lose the copy construction.

std::string 慢的原因是它自己构造。调用复制构造函数,然后在最后调用 delete。如果在堆上创建字符串,则会丢失复制构造。

回答by Drgabble

One solution to this is use a custom key class that acts as a cross between a const char *and a std::string, but has a boolean to tell at run time if it is "owning" or "non-owning". That way you can insert a key into the map which owns it's data (and will free it on destruction), and then compare with a key that does not own it's data. (This is a similar concept to the rust Cow<'a, str>type).

对此的一种解决方案是使用自定义键类作为 aconst char *和 a之间的交叉std::string,但在运行时有一个布尔值来判断它是“拥有”还是“非拥有”。这样你就可以将一个键插入到拥有它的数据的映射中(并在销毁时释放它),然后与一个不拥有它的数据的键进行比较。(这是与锈Cow<'a, str>类型相似的概念)。

The below example also inherits from boost's string_refto avoid having to re-implement hash functions etc.

下面的示例也继承自 booststring_ref以避免重新实现哈希函数等。

NOTE this has the dangerous effect that if you accidentally insert into the map with the non-owning version, and the string you are pointing at goes out of scope, the key will point at already freed memory. The non-owning version can only be used for lookups.

注意这会产生危险的影响,如果您不小心将非拥有版本插入到地图中,并且您指向的字符串超出范围,则键将指向已释放的内存。非拥有版本只能用于查找。

#include <iostream>
#include <map>
#include <cstring>

#include <boost/utility/string_ref.hpp>

class MaybeOwned: public boost::string_ref {
public:
  // owning constructor, takes a std::string and copies the data
  // deletes it's copy on destruction
  MaybeOwned(const std::string& string):
    boost::string_ref(
      (char *)malloc(string.size() * sizeof(char)),
      string.size()
    ),
    owned(true)
  {
    memcpy((void *)data(), (void *)string.data(), string.size());
  }

  // non-owning constructor, takes a string ref and points to the same data
  // does not delete it's data on destruction
  MaybeOwned(boost::string_ref string):
    boost::string_ref(string),
    owned(false)
  {
  }

  // non-owning constructor, takes a c string and points to the same data
  // does not delete it's data on destruction
  MaybeOwned(const char * string):
    boost::string_ref(string),
    owned(false)
  {
  }

  // move constructor, tells source that it no longer owns the data if it did
  // to avoid double free
  MaybeOwned(MaybeOwned&& other):
    boost::string_ref(other),
    owned(other.owned)
  {
    other.owned = false;
  }

  // I was to lazy to write a proper copy constructor
  // (it would need to malloc and memcpy again if it owned the data)
  MaybeOwned(const MaybeOwned& other) = delete;

  // free owned data if it has any
  ~MaybeOwned() {
    if (owned) {
      free((void *)data());
    }
  }

private:
  bool owned;
};

int main()
{
  std::map<MaybeOwned, std::string> map;
  map.emplace(std::string("key"), "value");
  map["key"] += " here";
  std::cout << map["key"] << "\n";
}