C++: When does using a std::multimap make sense
Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow.
Original URL: http://stackoverflow.com/questions/8342445/
When does using a std::multimap make sense
Asked by LiKao
I am currently experimenting with some of the STL data structures. However, I am still not sure when to use which one, or when to use a certain combination. Currently I am trying to figure out when using a std::multimap makes sense. As far as I can see, one can easily build one's own multimap implementation by combining std::map and std::vector, so I am left with the question of when each of these data structures should be used.
- Simplicity: A std::multimap is definitely simpler to use, because one does not have to handle the additional nesting. However, to access a range of elements in bulk, one might need to copy the data from the iterators into another data structure (for example a std::vector); see the sketch after this list.
- Speed: The locality of the vector most likely makes iterating over a range of equal elements much faster, because cache usage is optimized. However, I am guessing that std::multimap also has plenty of optimization tricks behind the scenes to make iterating over equal elements as fast as possible. Getting to the correct element range might also be optimized for std::multimap.
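To make the "bulk access" point concrete, here is a minimal sketch (my own example, not part of the original question) contrasting the two approaches; the key value 42 and the container contents are made up for illustration:

#include <stdint.h>
#include <map>
#include <vector>

void bulk_access_example() {
    std::multimap<uint32_t, uint64_t> mm = { { 42, 1 }, { 42, 2 }, { 7, 3 } };
    std::map<uint32_t, std::vector<uint64_t> > nested = { { 42, { 1, 2 } }, { 7, { 3 } } };

    // multimap: the values for key 42 live in separate tree nodes,
    // so bulk use may mean copying them into a contiguous buffer first
    std::vector<uint64_t> bulk;
    auto range = mm.equal_range( 42 );
    for( auto it = range.first; it != range.second; ++it )
        bulk.push_back( it->second );

    // map of vectors: the values for key 42 are already stored contiguously
    const std::vector<uint64_t>& direct = nested[ 42 ];
    (void) direct;
}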
In order to try out the speed issues I did some simple comparisons using the following program:
#include <stdint.h>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <map>
#include <vector>
#include <utility>

// hand-rolled "multimap": one vector of values per key
typedef std::map<uint32_t, std::vector<uint64_t> > my_mumap_t;

const uint32_t num_partitions = 100000;
const size_t   num_elements   = 500000;

int main() {
    srand( 1337 );

    // generate the random (key, value) pairs up front,
    // so both containers are filled from identical input
    std::vector<std::pair<uint32_t,uint64_t>> values;
    for( size_t i = 0; i < num_elements; ++i ) {
        uint32_t key   = rand() % num_partitions;
        uint64_t value = rand();
        values.push_back( std::make_pair( key, value ) );
    }

    clock_t start;
    clock_t stop;

    {
        start = clock();
        std::multimap< uint32_t, uint64_t > mumap;
        for( auto iter = values.begin(); iter != values.end(); ++iter ) {
            mumap.insert( *iter );
        }
        stop = clock();
        std::cout << "Filling std::multimap: " << stop - start << " ticks" << std::endl;

        std::vector<uint64_t> sums;
        start = clock();
        for( uint32_t i = 0; i < num_partitions; ++i ) {
            uint64_t sum = 0;
            auto range = mumap.equal_range( i );  // all values with key i
            for( auto iter = range.first; iter != range.second; ++iter ) {
                sum += iter->second;
            }
            sums.push_back( sum );
        }
        stop = clock();
        std::cout << "Reading std::multimap: " << stop - start << " ticks" << std::endl;
    }

    {
        start = clock();
        my_mumap_t mumap;
        for( auto iter = values.begin(); iter != values.end(); ++iter ) {
            mumap[ iter->first ].push_back( iter->second );
        }
        stop = clock();
        std::cout << "Filling my_mumap_t: " << stop - start << " ticks" << std::endl;

        std::vector<uint64_t> sums;
        start = clock();
        for( uint32_t i = 0; i < num_partitions; ++i ) {
            uint64_t sum = 0;
            auto range = std::make_pair( mumap[i].begin(), mumap[i].end() );
            for( auto iter = range.first; iter != range.second; ++iter ) {
                sum += *iter;
            }
            sums.push_back( sum );
        }
        stop = clock();
        std::cout << "Reading my_mumap_t: " << stop - start << " ticks" << std::endl;
    }
}
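A note on building it: the code relies on C++11 features (auto, >> in nested templates), so it needs a C++11-capable compiler. With GCC or Clang, something along the lines of the following should work (the file name is my own choice, and older GCC versions need -std=c++0x instead):

    g++ -std=c++11 -O2 multimap_bench.cpp -o multimap_bench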
As I suspected, it depends mainly on the ratio between num_partitions and num_elements, so I am still at a loss here. Here are some example outputs:
For num_partitions = 100000 and num_elements = 1000000:
Filling std::multimap: 1440000 ticks
Reading std::multimap: 230000 ticks
Filling my_mumap_t: 1500000 ticks
Reading my_mumap_t: 170000 ticks
For num_partitions = 100000 and num_elements = 500000:
Filling std::multimap: 580000 ticks
Reading std::multimap: 150000 ticks
Filling my_mumap_t: 770000 ticks
Reading my_mumap_t: 140000 ticks
For num_partitions = 100000 and num_elements = 200000:
Filling std::multimap: 180000 ticks
Reading std::multimap: 90000 ticks
Filling my_mumap_t: 290000 ticks
Reading my_mumap_t: 130000 ticks
For num_partitions = 1000 and num_elements = 1000000:
Filling std::multimap: 970000 ticks
Reading std::multimap: 150000 ticks
Filling my_mumap_t: 710000 ticks
Reading my_mumap_t: 10000 ticks
I am unsure how to interpret these results. How would you go about deciding on the correct data structure? Are there any additional constraints for the decision which I might have missed?
Accepted answer by Kerrek SB
It's hard to tell whether your benchmark is doing the right thing, so I can't comment on the numbers. However, a few general points:
- Why multimap rather than a map of vectors: Maps, multimaps, sets and multisets are all essentially the same data structure, and once you have one, it's trivial to just spell out all four. So the first answer is, "why not have it?"
- How is it useful: Multimaps are one of those things that you need rarely, but when you need them, you really need them.
- Why not roll my own solution? As I said, I'm not sure about those benchmarks, but even if you could make something else that isn't worse than the standard container (which I question), then you should consider the overall burden of getting it right, testing it and maintaining it. Imagine a world in which you would be taxed for every line of code you wrote (that's Stepanov's suggestion). Re-use industry-standard components whenever possible.
Finally, here's the typical way you iterate a multimap:
for (auto it1 = m.cbegin(), it2 = it1, end = m.cend(); it1 != end; it1 = it2)
{
    // unique key values at this level
    for ( ; it2 != end && it2->first == it1->first; ++it2)
    {
        // equal key value (`== it1->first`) at this level
    }
}
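A small, self-contained usage sketch of that idiom (the data is made up for illustration; it prints one sum per unique key, much like the benchmark's read loop):

#include <stdint.h>
#include <iostream>
#include <map>

int main() {
    std::multimap<uint32_t, uint64_t> m = { { 1, 10 }, { 1, 11 }, { 2, 20 } };

    for (auto it1 = m.cbegin(), it2 = it1, end = m.cend(); it1 != end; it1 = it2)
    {
        uint64_t sum = 0;                        // one pass per unique key
        for ( ; it2 != end && it2->first == it1->first; ++it2)
            sum += it2->second;                  // all values sharing it1->first
        std::cout << it1->first << ": " << sum << "\n";  // prints "1: 21" and "2: 20"
    }
}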
Answer by Matthieu M.
You have forgotten one very important alternative: not all sequences are created equal.
Especially, why a vector and not a deque or a list?
Using list
A std::map<int, std::list<int> > should perform roughly equivalently to a std::multimap<int, int>, since list is node-based as well.
Using deque
A deque is the default container to use when you don't know which one to go for and have no special requirements.
Relative to a vector, you give up some read speed (not much) in exchange for faster push and pop operations.
Using a deque instead, and some obvious optimizations, I get:
const uint32_t num_partitions = 100000;
const size_t num_elements = 500000;
Filling std::multimap: 360000 ticks
Filling MyMumap: 530000 ticks
Reading std::multimap: 70000 ticks (0)
Reading MyMumap: 30000 ticks (0)
Or in the "bad" case:
const uint32_t num_partitions = 100000;
const size_t num_elements = 200000;
Filling std::multimap: 100000 ticks
Filling MyMumap: 240000 ticks
Reading std::multimap: 30000 ticks (0)
Reading MyMumap: 10000 ticks (0)
Thus reading is unconditionally faster, but filling is also way slower.
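The modified code itself is not shown in the answer; the following is only a rough sketch of what a deque-based variant with a non-inserting read path might look like (the typedef name and the find-based lookup are my assumptions, not Matthieu's actual code):

#include <stdint.h>
#include <deque>
#include <map>

// swap std::deque for std::list to get the list-based variant discussed above
typedef std::map<uint32_t, std::deque<uint64_t> > my_deque_mumap_t;

uint64_t sum_partition( const my_deque_mumap_t& mumap, uint32_t key ) {
    // find() instead of operator[], so a read never inserts an empty partition
    my_deque_mumap_t::const_iterator it = mumap.find( key );
    if( it == mumap.end() )
        return 0;
    uint64_t sum = 0;
    for( auto viter = it->second.begin(); viter != it->second.end(); ++viter )
        sum += *viter;
    return sum;
}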
Answer by Michael Kristofik
A map of vectors comes with the memory overhead for the capacity of each vector. std::vector typically allocates space for more elements than you actually have. It may not be a big deal for your application, but it's another tradeoff you haven't considered.
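A tiny illustration of that overhead (not from the original answer; the exact growth policy is implementation-defined, so the printed capacity may vary):

#include <stdint.h>
#include <iostream>
#include <vector>

int main() {
    std::vector<uint64_t> v;
    for( uint32_t i = 0; i < 5; ++i )
        v.push_back( i );
    // size() is what you stored; capacity() is what was actually allocated
    std::cout << "size: " << v.size() << ", capacity: " << v.capacity() << std::endl;
}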
If you're doing a lot of reads, then unordered_multimap, with its average O(1) lookup time, might be a better choice.
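For example, the per-partition read from the benchmark could be written against std::unordered_multimap like this (a sketch; the function name is made up, and the key/value types are taken from the question's benchmark):

#include <stdint.h>
#include <unordered_map>

uint64_t sum_partition_hashed( const std::unordered_multimap<uint32_t, uint64_t>& mumap,
                               uint32_t key ) {
    uint64_t sum = 0;
    // equal_range works just like on std::multimap, but the lookup is O(1) on average
    auto range = mumap.equal_range( key );
    for( auto iter = range.first; iter != range.second; ++iter )
        sum += iter->second;
    return sum;
}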
If you have a reasonably modern compiler (and given the presence of the auto keyword, you do), then in general you're going to have a difficult time beating the standard containers in terms of performance and reliability. The people who wrote them are experts. I would always start with the standard container that most easily expresses what you want to do. Profile your code early and often, and if it's not running fast enough, then look for ways to improve it (e.g., using the unordered_ containers when doing mostly reads).
So, to answer your original question: if you need an associative array of values where the keys won't be unique, then using std::multimap definitely makes sense.