Notice: this page is a translated mirror of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/3198112/

Date: 2020-08-28 12:17:14  Source: igfitidea

Fastest C++ map?

c++ performance data-structures map

Asked by Poni

Correct me if I'm wrong, but std::map is an ordered map, so each time I insert a value the map uses an algorithm to sort its items internally, which takes some time.

My application gets information regarding some items on a constant interval.

This app keeps a map which is defined like this:

::std::map<DWORD, myItem*>

At first all items are considered "new" to the app. An "Item" object is being allocated and added to this map, associating its id and a pointer to it.

When it's not a "new" item (just an update of this object) my app should find the object in the map, using the given id, and update it.

Most of the time I get updates.

My question is:
Is there any faster map implementation or should I keep using this one?
Would I be better off using unordered_map?

Answered by Richard

Would I be better off using unordered_map?

Possibly.

std::map provides consistent performance at O(log n) because it needs to be implemented as a balanced tree. std::unordered_map, on the other hand, will be implemented as a hash table, which might give you O(1) performance (good hash function and good distribution of keys across hash buckets), but it could be O(n) (everything lands in one hash bucket and the table devolves to a list). One would normally expect something in between these extremes.

So you can have reasonable performance (O(log n)) all the time, or you need to ensure everything lines up to get good performance with a hash.
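
To make the O(n) extreme concrete, here is a minimal sketch. `ConstantHash`, `largest_bucket` and `fullest_bucket_after_fill` are hypothetical names invented for this illustration; the hash is deliberately terrible so that every key collides.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Deliberately terrible hash (hypothetical, for illustration only):
// every key hashes to the same value, so every key lands in one bucket.
struct ConstantHash {
    std::size_t operator()(std::uint32_t) const noexcept { return 0; }
};

// Returns the number of elements in the fullest bucket of an unordered map.
template <class Map>
std::size_t largest_bucket(const Map& m) {
    std::size_t worst = 0;
    for (std::size_t b = 0; b < m.bucket_count(); ++b) {
        if (m.bucket_size(b) > worst) worst = m.bucket_size(b);
    }
    return worst;
}

// Fills a map with keys 0..n-1 using the given hash and reports its fullest bucket.
template <class Hash>
std::size_t fullest_bucket_after_fill(std::uint32_t n) {
    std::unordered_map<std::uint32_t, int, Hash> m;
    for (std::uint32_t k = 0; k < n; ++k) m[k] = 0;
    return largest_bucket(m);
}
```

With `ConstantHash` the fullest bucket holds all n elements, so every lookup walks a list of length n. With `std::hash<std::uint32_t>` the buckets should stay short for keys like these.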

As with any such question: you need to measure before committing to one approach. Unless your datasets are large you might find there is no significant difference.
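
A rough way to measure this is sketched below. `Item` and `time_updates` are hypothetical names, items are stored by value rather than through `myItem*` to keep the sketch leak-free, and the sequential ids are an assumption; real measurements should use your real classes and data.

```cpp
#include <chrono>
#include <cstdint>
#include <map>
#include <unordered_map>

struct Item { std::uint64_t updates = 0; };  // hypothetical stand-in for myItem

// Inserts ids 0..n-1 as "new" items, then runs `rounds` passes of
// "find by id, then update" and returns the time spent on those passes
// in microseconds (the initial inserts are not timed).
template <class Map>
long long time_updates(Map& items, std::uint32_t n, int rounds) {
    for (std::uint32_t id = 0; id < n; ++id) items[id];  // initial "new" items
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < rounds; ++r) {
        for (std::uint32_t id = 0; id < n; ++id) {
            auto it = items.find(id);
            if (it != items.end()) ++it->second.updates;
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Call it once with a `std::map<std::uint32_t, Item>` and once with a `std::unordered_map<std::uint32_t, Item>`, build with optimizations enabled, and compare the two numbers for your actual element count.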

Answered by Tomek Szpakowicz

Important warning: Unless you have measured (and your question suggests that you haven't) that map performance substantially influences your application's performance (i.e. a large percentage of time is spent searching and updating the map), don't bother making it faster. Stick to std::map (or std::unordered_map, or any available hash_map implementation). Speeding up your application by 1% will probably not be worth the effort. Make it bug-free instead.

Echoing Richard's answer: measure performance with different map implementations using your real classes and real data.

Some additional notes:

  • Understand the difference between expected cost (hash maps usually have it lower), worst-case cost (O(log n) for a balanced binary tree, but much higher for a hash map if an insert triggers reallocation of the hash array) and amortized cost (total cost divided by the number of operations or elements; depends on things like the ratio of new to existing elements). You need to find out which is more constraining in your case. For example, reallocation of a hash map can be too expensive if you need to adhere to a very low latency limit.

  • Find out where the real bottleneck is. It might be that the cost of searching in the map is insignificant compared to e.g. IO cost.

  • Try a more specialized map implementation. For example, a lot can be gained if you know something more about the map's keys. Authors of generic map implementations do not have such knowledge.

In your example (32-bit unsigned integer keys which strongly cluster, e.g. are assigned sequentially) you can use a radix-based approach. A very simple example (treat it as an illustration, not a ready-to-use recipe):

Item *sentinel[65536];  // sentinel page, initialized to NULLs.
Item **pages[65536];    // list of pages: each entry points to a page of
                        // 65536 Item* slots, initialized so every entry
                        // points to sentinel

Then search is as simple as:

Item *value = pages[index >> 16][index & 0xFFFF];

When you need to set new value:

if (pages[index >> 16] == sentinel) {
  pages[index >> 16] = allocate_new_null_filled_page();
}
pages[index >> 16][index & 0xFFFF] = value;
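
The same two-level idea can be wrapped in a small class. This is only a sketch under assumptions: `PagedMap` and `Item` are hypothetical names, and this variant uses a null check instead of the sentinel page above, which costs one extra branch per lookup but saves the sentinel's memory.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

struct Item { int payload = 0; };  // hypothetical stand-in for the real item type

class PagedMap {
    static constexpr std::size_t kPageSize = 1u << 16;
    using Page = std::array<Item*, kPageSize>;

    // Top-level directory: one slot per value of the high 16 bits of the id;
    // a null slot means no page has been allocated for that range yet.
    std::vector<std::unique_ptr<Page>> pages_;

public:
    PagedMap() : pages_(kPageSize) {}

    // Returns the stored pointer, or nullptr if the id was never set.
    Item* find(std::uint32_t id) const {
        const auto& page = pages_[id >> 16];
        return page ? (*page)[id & 0xFFFF] : nullptr;
    }

    // Stores a non-owning pointer, allocating the page on first use.
    void set(std::uint32_t id, Item* value) {
        auto& page = pages_[id >> 16];
        if (!page) page = std::make_unique<Page>();  // value-initialized: all nullptr
        (*page)[id & 0xFFFF] = value;
    }
};
```

Each page holds 65536 pointers, so this only pays off when ids cluster into a small number of pages.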

  • Tweak your map implementation.

    • E.g. every hash_map likes to know the approximate number of elements in advance. It helps avoid unnecessary reallocation of the hash table and (possibly) rehashing of all keys.

    • With my specialized example above you would certainly try different page sizes, or a three-level version.

    • A common optimization is providing a specialized memory allocator to avoid multiple allocations of small objects.
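
For std::unordered_map, telling the table the expected element count is done with `reserve`. A minimal sketch follows; `fill_without_rehash` is a hypothetical helper written only to show that no rehash happens once the space has been reserved.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Builds a table of n entries after reserving room for them up front and
// returns true if the bucket array was never reallocated during the inserts.
inline bool fill_without_rehash(std::size_t n) {
    std::unordered_map<std::uint32_t, int> m;
    m.reserve(n);  // sizes the bucket array for at least n elements
    const std::size_t buckets_before = m.bucket_count();
    for (std::size_t k = 0; k < n; ++k) {
        m.emplace(static_cast<std::uint32_t>(k), 0);
    }
    return m.bucket_count() == buckets_before;
}
```

Without the `reserve` call the table would typically grow and rehash several times on the way to n elements, which is the latency spike mentioned earlier.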

Answered by no one special

Whenever you insert or delete an item, the memory allocation/deallocation costs a lot. Instead you can use an allocator like this one: https://github.com/moya-lang/Allocator, which according to its author makes std::map twice as fast; I found the speedup to be even larger, especially for other STL containers.
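
The sketch below illustrates the general idea of pooling node allocations. It does not use the linked library (its API is not shown here); it swaps in the C++17 standard library's `std::pmr` facilities instead, and `fill_pooled_map` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory_resource>

// Inserts n keys into a std::pmr::map whose nodes come from a pool resource
// instead of individual calls to the global operator new, then erases one
// key and inserts another. Returns the final map size.
inline std::size_t fill_pooled_map(std::uint32_t n) {
    std::pmr::unsynchronized_pool_resource pool;  // single-threaded node pool
    std::pmr::map<std::uint32_t, int> items(&pool);
    for (std::uint32_t id = 0; id < n; ++id) items.emplace(id, 0);
    items.erase(0);       // the freed node goes back to the pool
    items.emplace(n, 0);  // a later insert may reuse that memory
    return items.size();
}
```

Whether this beats the default allocator depends on the platform and workload, so it needs measuring like everything else here.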
