C++: Most efficient way to erase/delete multiple std::vector elements while retaining the original order?

Question

Asked by sascha

I have a std::vector<int> and a second container holding iterators or indexes into this vector (no keys; I want constant-time access to the elements), which mark the elements to delete. Let's assume I have a vector of 1000 elements and want to erase 200 of them. The order of the remaining elements must be the same after the deletions as before.

One more thing I missed in the first version of my question: the values are unique. They are identities.

How would you do that in a safe (with regard to the STL rules) and efficient manner (the decision to use a vector is final)?

Possibilities or methods I thought about:

1 - The erase-remove idiom (http://en.wikipedia.org/wiki/Erase-remove_idiom): originally meant for deleting elements that fulfill a condition (which includes a linear search), but I think with ranges of size 1 this method could be used with already given iterators and a dummy condition. Question: is the original order of elements kept, and is it more performant than the last method?
2 - Loop over the indexes and erase the elements with vector.erase(vector.begin()+index+offset) while keeping the removed indexes in a container for calculating the offset. This offset could be determined for every removal with std::lower_bound on the container of already removed elements. The problem: a lot of binary searches to get the offset, and a lot of move operations because of the deletion at random locations.
3 - At the moment I'm doing the following: get all the iterators for the elements to remove, sort them in descending order according to their location in the vector, and loop over them for the final deletion with vector.erase. This way I'm not invalidating any iterator I still need, and there are no vector rearrangements except for the deletion itself. The problem: a lot of sorting.
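
The third approach (erase from the back) might look roughly like the sketch below. It is only an illustration: erase_descending is a hypothetical name, indexes are used instead of iterators, and the indexes are assumed to be unique and in range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: erases the elements at the given positions one by one.
// Assumes the indexes are unique and all smaller than vec.size().
template <typename T>
void erase_descending(std::vector<T>& vec, std::vector<std::size_t> indexes) {
    // Largest index first, so an erase never shifts a position
    // that still has to be processed.
    std::sort(indexes.begin(), indexes.end(), std::greater<std::size_t>());
    for (std::size_t idx : indexes) {
        assert(idx < vec.size());
        vec.erase(vec.begin() + idx);  // shifts everything behind idx down by one
    }
}
```

Each call to erase still moves the whole tail of the vector, so the total work grows with both the vector size and the number of indexes.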

So, how would you tackle this? Any new ideas? Any recommendations?

Thanks for your input.

Sascha

Edit / Update / Own results: I implemented the erase-remove idiom, which was also mentioned by KennyTM, with a predicate based on a lookup in a boost::dynamic_bitset, and it's insanely fast. Furthermore, I tried PigBen's move-and-truncate method (also mentioned by Steve Jessop), which also accesses the bitset in its while loop. Both seem to be equally fast with my kind of data. I tried deleting 100 of 1000 elements (unsigned ints), repeated those 100 deletions 1M times, and there was no significant difference. Because I think the STL-based erase-remove idiom is kind of more "natural", I'm choosing this method (an argument also made by KennyTM).

Answer 1

Accepted answer by kennytm

In <algorithm> there is a remove_if function which squeezes all values that are not removed to the front, maintaining their order. This works if those 200 elements can be determined purely by their values, not by index.

This is essentially the erase-remove idiom you linked to. remove_if is guaranteed to perform O(N) comparisons (and at most O(N) copies), which is more efficient than sorting (O(N log N)), although your last option doesn't actually require sorting if the indices are determined from the values (just scan in the reverse direction while copying).

Nevertheless, using remove_if (if you can) is better than the other 2 options because the implementation has already been written for you, so there's less chance of a logic error, and it conveys better what (not how) to do.
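
A minimal sketch of the erase-remove idiom for this case, assuming the elements to delete are known by value. The helper name erase_values and the use of std::unordered_set for the lookup are assumptions made for illustration; a bitset lookup would serve the same purpose.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper: erases every element whose value is in `doomed`
// and returns how many elements were erased.
std::size_t erase_values(std::vector<int>& vec,
                         const std::unordered_set<int>& doomed) {
    // remove_if shifts the kept elements to the front, preserving their order,
    // and returns the new logical end; vec.size() is still unchanged here.
    auto new_end = std::remove_if(vec.begin(), vec.end(),
        [&doomed](int value) { return doomed.count(value) != 0; });

    const std::size_t erased = static_cast<std::size_t>(vec.end() - new_end);
    // Holds because the question states that the values are unique.
    assert(erased <= doomed.size());

    vec.erase(new_end, vec.end());  // this call actually shrinks the vector
    return erased;
}
```

The predicate only looks at the value, so it carries no mutable state and does not depend on the order in which remove_if visits the elements.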

Answer 2

Answer by Benjamin Lindley

How about looping through the vector and, for each element that needs to be removed, copying the next element that doesn't need to be removed into that position? Then, when you get to the end, truncate it.

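std::size_t last = 0;
for(std::size_t i=0; i<vec.size(); ++i, ++last)
{
   // Skip every element marked for removal, without running past the end.
   while(i < vec.size() && needs_to_be_removed(i))
      ++i;
   if(i >= vec.size()) break;

   // Copy the next kept element down into the first free slot.
   vec[last] = vec[i];
}

// Drop the leftover tail.
vec.resize(last);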

Answer 3

Answer by Steve Jessop

First thing is, don't call erase more times than you have to, because for a vector it shuffles all the later elements down, giving the whole operation an Ω(n*m) worst-case run time (n the size of the vector, m the size of the list of indexes to remove).

I think the first thing I'd try would be similar to your current code:

1 - Sort the indexes.
2 - Create a new vector of size n - m.
3 - Iterate over the original vector, copying indexes[0] elements, skipping an element, then copying indexes[1] - indexes[0] - 1 elements, skipping an element, and so on.
4 - swap the original vector with the new one.
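
The steps above can be sketched as follows. This is an illustrative sketch, not the answerer's code: erase_by_copy is a hypothetical name, the indexes are assumed to be unique and in range, and the new vector reserves its space and is filled by insertion instead of being created at its final size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper following the four steps above.
// Assumes the indexes are unique and all smaller than vec.size().
template <typename T>
void erase_by_copy(std::vector<T>& vec, std::vector<std::size_t> indexes) {
    std::sort(indexes.begin(), indexes.end());                 // step 1
    assert(indexes.empty() || indexes.back() < vec.size());

    std::vector<T> kept;
    kept.reserve(vec.size() - indexes.size());                 // step 2

    std::size_t start = 0;  // first element of the next run to keep
    for (std::size_t idx : indexes) {                          // step 3
        kept.insert(kept.end(), vec.begin() + start, vec.begin() + idx);
        start = idx + 1;    // skip the element at idx
    }
    // Copy the run after the last removed index.
    kept.insert(kept.end(), vec.begin() + start, vec.end());

    vec.swap(kept);                                            // step 4
}
```

Every kept element is copied exactly once, so apart from the sort the work is linear in the size of the vector.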

You might be able to do the third step with remove_copy_if and a predicate which contains state (counting how many items it has copied and how far it is through the sorted list of indexes), but for extremely tedious and obscure reasons this isn't guaranteed to work (algorithm predicates with mutable state are problematic; the consensus seems to be that the standard doesn't guarantee that the same copy of the predicate is used throughout the algorithm). So I really don't advise trying it, but it might help to bear in mind that what you're writing is basically a modified version of remove_copy_if.

You could avoid the second step by using a back_inserter rather than presizing the vector, although you'd presumably still reserve the space in advance.

[Edit: come to think of it, why am I copying anything? Rather than implementing a modified remove_copy_if, implement a modified remove_if, and just copy to an earlier point in the vector. Then erase/resize at the end. I wouldn't worry about the O(m log m) sort of the indexes until it's proven to be a problem, because it's unlikely to be significantly slower than the Ω(m) operation to read all the values to be removed and store them in some kind of container. Then, using this container in the predicate to remove_if may or may not be O(1). Sorting might turn out faster for plausible values of m.]

Answer 4

Answer by patros

You can copy all elements of the vector to a list, except those whose index is in your second container, and then copy them back to a vector. Even with your algorithm of going from the end of the vector to the front, there's a lot of work going on behind the scenes in your vector.

Make your second container a map so it keeps the indices sorted for you automatically.

edit:

To respond to the comment

The cost of maintaining a map is, in the worst case, the same as maintaining another structure (list or vector) and then sorting it. If you're already doing that, you might as well keep it as a map. It doesn't make sense to complain about the overhead of a map vs. the overhead of sorting a list.

As for the performance of my suggested algorithm, if m is the number of elements to be deleted and n is the total number of elements, then it results in O(n - m).

Of course, this is mostly just humoring your attempt to optimize with a vector.

1 - You shouldn't be using a vector if you want to do random access deletes. That's not what they're good at, use a list if at all possible. And since you seem to be much more interested in relative order rather than absolute index, I am wondering why a vector is needed at all. If you gave the entire problem, there's probably a common solution to let you use the most efficient data structure to solve it.

2 - Instead of maintaining a second data structure, mark elements that need to be deleted directly in their container. A trivial way is, instead of using a container<T>, to use a container<std::pair<T, char>> and use the char to keep track of the element status.

If you do 1 and 2, you remove all copying completely and get a much more efficient implementation.
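
One possible reading of points 1 and 2 combined is sketched below, assuming a std::list<std::pair<int, char>> whose char is nonzero for elements marked for deletion. The names Entry, mark_for_deletion and purge_marked are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>

// Hypothetical element type: the char is nonzero when the element
// is marked for deletion.
using Entry = std::pair<int, char>;

// Marks one element through an iterator the caller already holds (constant time).
inline void mark_for_deletion(std::list<Entry>& items,
                              std::list<Entry>::iterator it) {
    assert(it != items.end());
    it->second = 1;
}

// Unlinks every marked node and returns how many were removed.
// The remaining elements keep their order and are neither copied nor moved.
inline std::size_t purge_marked(std::list<Entry>& items) {
    const std::size_t before = items.size();
    items.remove_if([](const Entry& e) { return e.second != 0; });
    return before - items.size();
}
```

The purge is still one pass over the whole list, but no surviving element is touched.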

Answer 5

Answer by David Frantz

Elements of what? Maybe I'm taking your post too seriously, but if you have a vector of 1000 elements, why not mark the ones that are no longer valid and do away with erasing in the first place? Obviously I'm making an assumption here that your elements are not demanding a lot of memory.

I only bring this up because you seem to be concerned with speed. If the suggestions already given don't do the trick, maybe this idea is worth a thought! In essence, speed things up by not doing the operation in the first place.

Answer 6

Answer by MrX

If you have a (e.g. unordered) set of indices that you want to erase, you can use this:

template <typename Type>
void erase_indices(
        const std::unordered_set<size_t>& indices_to_erase,
        std::vector<Type>& vec) {
    // One flag per element; every index must be smaller than vec.size().
    std::vector<bool> erase_index(vec.size(), false);
    for (const size_t i: indices_to_erase) {
        erase_index[i] = true;
    }
    // The predicate ignores the element and walks the flags instead, so it
    // relies on remove_if visiting each element exactly once, in order.
    std::vector<bool>::const_iterator it_to_erase = erase_index.cbegin();
    typename std::vector<Type>::iterator it_erase_from = std::remove_if(
        vec.begin(), vec.end(),
        [&it_to_erase](const Type&) -> bool {
          return *it_to_erase++ == true;
        }
    );
    vec.erase(it_erase_from, vec.end());
}

It is the fastest solution that came to my mind. You need C++11, though. Usage example to erase elements at index 2 and 5:

constexpr size_t num = 10u;
std::vector<int> vec(num);
std::iota(vec.begin(), vec.end(), 0);

std::unordered_set<size_t> indices_to_erase;
indices_to_erase.insert(2u);
indices_to_erase.insert(5u);

erase_indices(indices_to_erase, vec);

Before:

0 1 2 3 4 5 6 7 8 9

After:

0 1 3 4 6 7 8 9

Edit: If you want to be more flexible regarding the type of container that holds the indices to erase:

template <typename Type, typename Container>
void erase_indices(
        const Container& indices_to_erase,
        std::vector<Type>& vec) {
    typedef typename Container::value_type IndexType;
    static_assert(std::is_same<IndexType, std::size_t>::value,
        "Indices to be erased have to be of type std::size_t");
    std::vector<bool> erase_index(vec.size(), false);
    for (const IndexType idx_erase: indices_to_erase) {
        erase_index[idx_erase] = true;
    }
    std::vector<bool>::const_iterator it_to_erase = erase_index.cbegin();
    typename std::vector<Type>::iterator it_erase_from = std::remove_if(
        vec.begin(), vec.end(),
        [&it_to_erase](const Type&) -> bool {
          return *it_to_erase++ == true;
        }
    );
    vec.erase(it_erase_from, vec.end());
}

Now you can use any kind of container from the Containers Library to provide the indices to be erased, as long as the value_type of that container is std::size_t. Usage remains the same.

Answer 7

Answer by Valeriy Ivanov

I've written a function based on Benjamin Lindley's answer, https://stackoverflow.com/a/4115582/2835054.

#include <iostream>
#include <algorithm>
#include <vector>

template <typename elementType, typename indexType>
void remove_multiple_elements_from_vector(std::vector<elementType> &vector,
std::vector<indexType> &indexes)
{
    // 1. indexType is any integer.
    // 2. elementType is any type.
    // 3. Indexes should be unique.
    // 4. The largest index inside indexes shouldn't be larger than
    //    the largest index in the vector.
    // 5. Indexes should be sorted in ascending order
    //    (it is done inside function).
    std::sort(indexes.begin(), indexes.end());
    std::size_t currentIndexInIndexesVector = 0;
    std::size_t last = 0;
    for(std::size_t i=0; i<vector.size(); ++i, ++last)
    {
       // Skip every position listed in indexes; the bounds check avoids
       // reading past the end of indexes once all of them are consumed.
       while(currentIndexInIndexesVector < indexes.size() &&
             static_cast<std::size_t>(indexes[currentIndexInIndexesVector]) == i)
       {
          ++i;
          ++currentIndexInIndexesVector;
       }
       if(i >= vector.size()) break;

       vector[last] = vector[i];
    }

    vector.resize(last);
}


int main()
{
    std::vector<int> vector = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    std::vector<int> indexes = {0, 10, 5};

    for (auto &vectorElement : vector)
    {
        std::cout << vectorElement << " ";
    }    
    std::cout << "\n";

    remove_multiple_elements_from_vector<int, int>(vector, indexes);

    for (auto &vectorElement : vector)
    {
        std::cout << vectorElement << " ";
    }
}
