C++: Initializing a std::map when the size is known in advance

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/13049340/

Initializing a std::map when the size is known in advance

c++ dictionary std

Asked by vanna

I would like to initialize a std::map. For now I am using ::insert, but I feel I am wasting some computational time since I already know the size I want to allocate. Is there a way to allocate a fixed-size map and then fill it?

Answer by Bo Persson

No, the members of the map are internally stored in a tree structure. There is no way to build the tree until you know the keys and values that are to be stored.

Answer by Peter Ruderman

The short answer is: yes, this is possible, but it's not trivial. You need to define a custom allocator for your map. The basic idea is that your custom allocator will set aside a single block of memory for the map. As the map requires new nodes, the allocator will simply assign them addresses within the pre-allocated block. Something like this:

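// MyAllocator stands for a user-defined allocator; it is not a standard library type.
std::map<KeyType, ValueType, std::less<KeyType>, MyAllocator> myMap;

// reserve() is a member MyAllocator would have to provide; standard allocators have no such function.
myMap.get_allocator().reserve( nodeSize * numberOfNodes );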

There are a number of issues you'll have to deal with, however.

First, you don't really know the size of each map node or how many allocations the map will perform. These are internal implementation details. You can experiment to find out, but you can't assume that the results will hold across different compilers (or even future versions of the same compiler). Therefore, you shouldn't worry about allocating a "fixed" size map. Rather, your goal should be to reduce the number of allocations required to a handful.
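
One way to run the experiment mentioned above is a small allocator that only records what the map asks for. This is a minimal sketch, not part of the original answer: AllocStats, LoggingAllocator and measure_map_allocations are hypothetical names, and the numbers it reports are specific to the standard library you compile against.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <new>
#include <utility>

// Hypothetical helper: counters filled in by the allocator below.
struct AllocStats {
    std::size_t calls = 0;       // number of allocate() calls
    std::size_t bytes = 0;       // total bytes requested
    std::size_t last_bytes = 0;  // size of the most recent request
};

// Hypothetical allocator that forwards to operator new and records each request.
// It does not handle over-aligned types.
template <class T>
struct LoggingAllocator {
    using value_type = T;
    AllocStats* stats;

    explicit LoggingAllocator(AllocStats* s) noexcept : stats(s) {}

    // The map rebinds the allocator to its internal node type through this constructor.
    template <class U>
    LoggingAllocator(const LoggingAllocator<U>& other) noexcept : stats(other.stats) {}

    T* allocate(std::size_t n) {
        const std::size_t size = n * sizeof(T);
        ++stats->calls;
        stats->bytes += size;
        stats->last_bytes = size;
        return static_cast<T*>(::operator new(size));
    }

    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const LoggingAllocator<T>& a, const LoggingAllocator<U>& b) noexcept {
    return a.stats == b.stats;
}

template <class T, class U>
bool operator!=(const LoggingAllocator<T>& a, const LoggingAllocator<U>& b) noexcept {
    return !(a == b);
}

// Inserts n distinct keys and reports what the map requested from its allocator.
AllocStats measure_map_allocations(int n) {
    using Value = std::pair<const int, int>;
    using Alloc = LoggingAllocator<Value>;
    AllocStats stats;
    {
        Alloc alloc{&stats};
        std::map<int, int, std::less<int>, Alloc> m(alloc);
        for (int i = 0; i < n; ++i) {
            m.emplace(i, i);
        }
    }  // the map is destroyed here, while stats is still alive
    return stats;
}
```

Calling measure_map_allocations(100) and inspecting last_bytes gives the node size for that particular toolchain; as the answer says, treat the result as an observation, not a guarantee.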

Second, this strategy becomes quite a bit more complex if you want to support deletion.

Third, don't forget memory alignment issues. The pointers your allocator returns must be properly aligned for the various types of objects the memory will store.
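
If C++17 is available, the standard library's polymorphic memory resources may cover much of this without a hand-written allocator. The sketch below is an alternative to the MyAllocator idea, not the answer's own method: it uses std::pmr::monotonic_buffer_resource to hand out map nodes from a single caller-supplied buffer. The function name fill_from_buffer and the buffer size used with it are assumptions for illustration.

```cpp
#include <cstddef>
#include <map>
#include <memory_resource>

// Hypothetical helper: builds a map of n entries whose nodes all live in `buffer`.
std::size_t fill_from_buffer(std::byte* buffer, std::size_t buffer_size, int n) {
    // Using null_memory_resource() as the upstream means that running out of
    // buffer space throws std::bad_alloc instead of silently using the heap.
    std::pmr::monotonic_buffer_resource pool(buffer, buffer_size,
                                             std::pmr::null_memory_resource());

    // The map takes its node memory from `pool`; the resource handles alignment.
    std::pmr::map<int, int> m(&pool);
    for (int i = 0; i < n; ++i) {
        m.emplace(i, i * i);
    }
    return m.size();
}
```

A monotonic resource never reuses memory when elements are erased, which matches the second point above: supporting deletion needs a more elaborate scheme. The buffer still has to be sized generously, because the node size remains an implementation detail.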

All that being said, before you try this, make sure it's necessary. Memory allocation can be very expensive, but you still shouldn't assume that it's a problem for your program. Measure to find out. You should also consider alternative strategies that more naturally allow pre-allocation. For example, a sorted list or a std::unordered_map.

Answer by gast128

Not sure if this answers your question, but Boost.Container has a flat_map in which you can reserve space. Basically you can see this as a sorted vector of (key, value) pairs. Tip: if you also know that your input is sorted, you can use insert with hint for maximal performance.
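
The hint technique is not specific to flat_map. To keep the example free of a Boost dependency, the sketch below applies it to a plain std::map instead: when the input is already sorted by key, passing end() as the hint lets each insertion complete in amortized constant time. The function name build_from_sorted is hypothetical.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper: builds a std::map from input that is sorted by key.
std::map<int, int> build_from_sorted(const std::vector<std::pair<int, int>>& sorted_input) {
    std::map<int, int> m;
    for (const auto& kv : sorted_input) {
        // Hint: the new key is expected to belong just before end().
        // If the hint is wrong the insertion is still correct, only slower.
        m.emplace_hint(m.end(), kv.first, kv.second);
    }
    return m;
}
```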

Answer by Denis Ermolin

You are talking about block allocators, but they are hard to implement. Measure before thinking about such hard things. Anyway, Boost has some articles about implementing a block allocator. Or use an already implemented preallocated map, such as Stree.

Answer by darune

There are several good answers to this question already, but they miss some primary points.

Initialize the map directly

The map knows the size up front if initialized directly with iterators:

auto mymap = std::map(it_begin, it_end);

This is the best way to dodge the issue. If you are agnostic about the implementation, the map can then know the size up front from the iterators, and you have moved the issue to the std:: implementation to worry about.

Alternatively, use insert with iterators instead, that is:

mymap.insert(it_begin, it_end);

See: https://en.cppreference.com/w/cpp/container/map/insert
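
Both forms can be sketched as follows, assuming the data already sits in a std::vector of pairs. The names make_map and add_items are hypothetical. Whether a given implementation uses the range length to pre-allocate is not something the standard promises; what it does specify is that range construction takes linear time when the range is already sorted by key.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using Items = std::vector<std::pair<std::string, int>>;

// Range construction: the map is built directly from the iterator pair.
std::map<std::string, int> make_map(const Items& items) {
    return std::map<std::string, int>(items.begin(), items.end());
}

// Range insertion: adds a batch of items to an existing map.
// Keys that are already present in the map are left unchanged.
void add_items(std::map<std::string, int>& m, const Items& items) {
    m.insert(items.begin(), items.end());
}
```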

Beware of premature optimization

but I feel I am wasting some computational time.

This sounds a lot like you are optimizing prematurely (meaning you do not know where the bottleneck is - you are guessing or seeing an issue that isn't really one). Instead, measure first and then optimize - repeat if necessary.

Memory allocation may already be optimized to a large degree

Rolling your own block allocator for the map could be close to fruitless. On modern systems (here I include OS/hardware and the C++ language level), memory allocation is already very well optimized for the general case, and you could be looking at little or no improvement from rolling your own block allocator. Even if you take a lot of care and get the map into one contiguous array - an improvement in itself - you could still face the problem that the elements end up placed randomly in the array (e.g. in insertion order) and are less cache friendly anyway (this very much depends on your actual use case though - I'm assuming a super large data set).

Use another container or third party map

If you are still facing this issue - the best approach is probably to use another container (e.g. a sorted std::vector - use std::lower_bound for lookups) or use a third party map optimized for how you are using the map. A good example is flat_map from boost - see this answer.
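
The sorted std::vector option can be sketched with the standard library alone. Unlike std::map, a vector can reserve its storage up front, so the known size is actually put to use. The names Entry, build_table and find_value are hypothetical, and the squared values are placeholder data.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

using Entry = std::pair<int, int>;  // (key, value)

// Hypothetical helper: fills a table of n entries with one up-front allocation.
std::vector<Entry> build_table(int n) {
    std::vector<Entry> table;
    table.reserve(static_cast<std::size_t>(n));  // the size is known in advance
    for (int i = n - 1; i >= 0; --i) {
        table.emplace_back(i, i * i);  // arbitrary insertion order
    }
    // Sort once by key after filling, so lookups can use binary search.
    std::sort(table.begin(), table.end(),
              [](const Entry& a, const Entry& b) { return a.first < b.first; });
    return table;
}

// Binary search by key; returns nullptr when the key is absent.
const int* find_value(const std::vector<Entry>& table, int key) {
    auto it = std::lower_bound(table.begin(), table.end(), key,
                               [](const Entry& e, int k) { return e.first < k; });
    if (it != table.end() && it->first == key) {
        return &it->second;
    }
    return nullptr;
}
```

This layout suits data that is filled once and then mostly read. Inserting into the middle of a sorted vector later costs linear time, so it is a poor fit for workloads with many random inserts.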

Conclusion

  1. Let the std::map worry about the issue.
  2. When performance is the main issue: use a data structure (perhaps 3rd party) that best suits how your data is being used (random inserts or bulk inserts / mostly iteration or mostly lookups / etc.). You then need to profile and gather performance metrics to compare.
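
As a starting point for gathering such metrics, the sketch below times two ways of filling a std::map with std::chrono::steady_clock. It is a rough illustration only: the names time_ms, BuildTimings, make_data and compare_builds are hypothetical, and a single run like this is noisy, so repeat it with realistic data and optimization enabled, or use a proper profiler, before drawing conclusions.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Runs f once and returns the elapsed wall-clock time in milliseconds.
template <class F>
double time_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

struct BuildTimings {
    double insert_loop_ms;
    double range_ctor_ms;
    std::size_t insert_loop_size;
    std::size_t range_ctor_size;
};

// Hypothetical test data: n distinct keys in ascending order.
std::vector<std::pair<int, int>> make_data(int n) {
    std::vector<std::pair<int, int>> data;
    data.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        data.emplace_back(i, i);
    }
    return data;
}

// Times element-by-element insertion against range construction.
BuildTimings compare_builds(const std::vector<std::pair<int, int>>& data) {
    std::map<int, int> a;
    const double loop_ms = time_ms([&] {
        for (const auto& kv : data) {
            a.insert(kv);
        }
    });

    std::map<int, int> b;
    const double range_ms = time_ms([&] {
        b = std::map<int, int>(data.begin(), data.end());
    });

    return BuildTimings{loop_ms, range_ms, a.size(), b.size()};
}
```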