C++ 如何使向量的元素唯一?(删除不相邻的重复项)

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1453333/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-27 20:02:21  来源:igfitidea点击:

How to make elements of vector unique? (remove non adjacent duplicates)

c++stlvectorunique

提问by aJ.

I have a vector containing few non-adjacent duplicates.

我有一个包含很少非相邻重复项的向量。

As a simple example, consider:

作为一个简单的例子,考虑:

2 1 6 1 4 6 2 1 1

I am trying to make this vectorunique by removing the non-adjacent duplicates and maintaining the order of elements.

我试图vector通过删除不相邻的重复项并保持元素的顺序来使其独一无二。

Result would be:

结果将是:

2 1 6 4 

The solutions I tried are:

我尝试的解决方案是:

  1. Inserting into a std::set but the problem with this approach is that it will disturb the order of elements.
  2. Use the combination of std::sort and std::unique. But again same order problem.
  3. Manual duplicate elimination:

        Define a temporary vector TempVector.
        for (each element in a vector)
        {
            if (the element does not exists in TempVector)
            {
                add to TempVector;
            }
        }
        swap orginial vector with TempVector.
    
  1. 插入 std::set 但这种方法的问题是它会扰乱元素的顺序。
  2. 使用 std::sort 和 std::unique 的组合。但同样的顺序问题。
  3. 手动重复消除:

        Define a temporary vector TempVector.
        for (each element in a vector)
        {
            if (the element does not exists in TempVector)
            {
                add to TempVector;
            }
        }
        swap orginial vector with TempVector.
    

My question is:

我的问题是:

Is there any STL algorithm which can remove the non-adjacent duplicates from the vector ? what is its complexity?

是否有任何 STL 算法可以从向量中删除不相邻的重复项?它的复杂性是什么?

回答by fa.

I think you would do it like this:

我想你会这样做:

I would use two iterators on the vector :

我会在向量上使用两个迭代器:

The first of one reads the data and inserts it a temporary set.

第一个读取数据并将其插入一个临时集。

When the read data was not in the set you copy it from the first iterator to the second and increment it.

当读取的数据不在集合中时,您将它从第一个迭代器复制到第二个迭代器并递增它。

At the end you keep only the data up to the second iterator.

最后,您只将数据保留到第二个迭代器。

The complexity is O( n .log( n ) ) as the lookup for duplicated elements uses the set, not the vector.

复杂性是 O( n .log( n ) ) 因为查找重复元素使用的是集合,而不是向量。

#include <vector>
#include <set>
#include <iostream>

int main(int argc, char* argv[])
{
    std::vector< int > k ;

    k.push_back( 2 );
    k.push_back( 1 );
    k.push_back( 6 );
    k.push_back( 1 );
    k.push_back( 4 );
    k.push_back( 6 );
    k.push_back( 2 );
    k.push_back( 1 );
    k.push_back( 1 );

{
    std::vector< int >::iterator r , w ;

    std::set< int > tmpset ;

    for( r = k.begin() , w = k.begin() ; r != k.end() ; ++r )
    {
        if( tmpset.insert( *r ).second )
        {
            *w++ = *r ;
        }
    }

    k.erase( w , k.end() );
}


    {
        std::vector< int >::iterator r ;

        for( r = k.begin() ; r != k.end() ; ++r )
        {
            std::cout << *r << std::endl ;
        }
    }
}

回答by CB Bailey

Without using a temporary setit's possible to do this with (possibly) some loss of performance:

在不使用临时文件set的情况下,有可能(可能)会损失一些性能:

template<class Iterator>
Iterator Unique(Iterator first, Iterator last)
{
    while (first != last)
    {
        Iterator next(first);
        last = std::remove(++next, last, *first);
        first = next;
    }

    return last;
}

used as in:

用于:

vec.erase( Unique( vec.begin(), vec.end() ), vec.end() );

For smaller data sets, the implementation simplicity and lack of extra allocation required may offset the theoretical higher complexity of using an additional set. Measurement with a representative input is the only way to be sure, though.

对于较小的数据集,实现的简单性和不需要额外的分配可能会抵消使用额外的set. 但是,使用具有代表性的输入进行测量是唯一确定的方法。

回答by Andreas Spindler

As the question was "is there any STL algorithm...? what is its complexity?" it makes sense to implement the function like std::unique:

因为问题是“有没有任何 STL 算法......?它的复杂性是什么?” 实现如下功能是有意义的std::unique

template <class FwdIterator>
inline FwdIterator stable_unique(FwdIterator first, FwdIterator last)
{
    FwdIterator result = first;
    std::unordered_set<typename FwdIterator::value_type> seen;

    for (; first != last; ++first)
        if (seen.insert(*first).second)
            *result++ = *first;
    return result;
}

So this is how std::uniqueis implemented plus an extra set. The unordered_setshall be faster than a regular set. All elements are removed that compare equal to the element right preceding them (the first element is kept because we cannot unify to nothing). The iterator returned points to the new end within the range [first,last).

所以这就是如何std::unique实现加上一个额外的集合。该unordered_set应是比常规更快set。所有与它们前面的元素相等的元素都被删除(保留第一个元素,因为我们无法统一为空)。迭代器返回指向范围内新端点的点[first,last)

EDIT: The last sentence means that the container itself is NOT modified by unique. This can be confusing. The following example actually reduces the container to the unified set.

编辑:最后一句意味着容器本身没有被unique. 这可能会令人困惑。下面的例子实际上是将容器缩减为统一集。

1: std::vector<int> v(3, 5);
2: v.resize(std::distance(v.begin(), unique(v.begin(), v.end())));
3: assert(v.size() == 1);

Line 1 creates a vector { 5, 5, 5 }. In line 2 calling uniquereturns an iterator to the 2nd element, which is the first element that is not unique. Hence distancereturns 1 and resizeprunes the vector.

第 1 行创建了一个向量{ 5, 5, 5 }。在第 2 行调用unique返回一个迭代器到第二个元素,它是第一个不唯一的元素。因此distance返回 1 并 resize修剪向量。

回答by Richard Corden

You can remove some of the loops in fa'sanswer using remove_copy_if:

您可以使用以下命令删除fa答案中的一些循环remove_copy_if

class NotSeen : public std::unary_function <int, bool>
{
public:
  NotSeen (std::set<int> & seen) : m_seen (seen) { }

  bool operator ()(int i) const  {
    return (m_seen.insert (i).second);
  }

private:
  std::set<int> & m_seen;
};

void removeDups (std::vector<int> const & iv, std::vector<int> & ov) {
  std::set<int> seen;
  std::remove_copy_if (iv.begin ()
      , iv.end ()
      , std::back_inserter (ov)
      , NotSeen (seen));
}

This has no affect on the complexity of the algorithm (ie. as written it's also O(n log n)). You can improve upon this using unordered_set, or if the range of your values is small enough you could simply use an array or a bitarray.

这对算法的复杂性没有影响(即,如所写,它也是 O(n log n))。您可以使用 unordered_set 对此进行改进,或者如果您的值的范围足够小,您可以简单地使用数组或位数组。

回答by Heiko Sch?fer

Based on the answer of @fa. It can also get rewritten using the STL algorithm std::stable_partition:

基于@fa 的回答。它还可以使用 STL 算法重写std::stable_partition

struct dupChecker_ {
    inline dupChecker_() : tmpSet() {}
    inline bool operator()(int i) {
        return tmpSet.insert(i).second;
    }
private:
    std::set<int> tmpSet;
};

k.erase(std::stable_partition(k.begin(), k.end(), dupChecker_()), k.end());

This way it is more compact and we don't need to care of the iterators.

这样它更紧凑,我们不需要关心迭代器。

It seems even not to introduce to much performance penalty. I use it in my project which needs to handle quite large vectorsof complex types often and it makes no real difference.

似乎甚至不会引入太多的性能损失。我在我的项目中使用它,它需要经常处理相当大的复杂类型向量,但它没有真正的区别。

Another nice feature is, that it is possible to adjust the uniquenessby using std::set<int, myCmp_> tmpSet;. For instance, in my project to ignore certain rounding errors.

另一个不错的功能是,它是可以调整的唯一使用std::set<int, myCmp_> tmpSet;。例如,在我的项目中忽略某些舍入错误。

回答by sbi

There's no STL algorithm doing what you want preserving the sequence's original order.

没有 STL 算法可以执行您想要保留序列的原始顺序的操作。

You could create a std::setof iterators or indexes into the vector, with a comparison predicate that uses the referenced data rather than the iterators/indexes to sort stuff. Then you delete everything from the vector that isn't referenced in the set. (Of course, you could just as well use another std::vectorof iterators/indexes, std::sortand std::uniquethat, and use this as a reference as to what to keep.)

您可以std::set在向量中创建一个迭代器或索引,并使用一个比较谓词使用引用的数据而不是迭代器/索引对内容进行排序。然后从向量中删除集合中未引用的所有内容。(当然,您也可以使用另一个std::vector迭代器/索引,std::sort然后std::unique将其用作保留内容的参考。)

回答by Max Lybbert

My question is:

Is there any STL algorithm which can remove the non-adjacent duplicates from the vector ? what is its complexity?

我的问题是:

是否有任何 STL 算法可以从向量中删除不相邻的重复项?它的复杂性是什么?

The STL options are the ones you mentioned: put the items in a std::set, or call std::sort, std::uniqueand calling erase()on the container. Neither of these fulfills your requirement of "removing the non-adjacent duplicates and maintaining the order of elements."

STL 选项是您提到的选项:将项目放入 astd::set或 call std::sortstd::unique然后调用erase()容器。这些都不能满足您“删除不相邻的重复项并保持元素顺序”的要求。

So why doesn't the STL offer some other option? No standard library will offer everything for every user's needs. The STL's design considerations include "be fast enough for nearly all users," "be useful for nearly all users," and "provide exception safety as much as possible" (and "be small enough for the Standards Committee" as the library Stepanov originally wrote was much larger, and Stroustrup axed out something like 2/3 of it).

那么为什么 STL 不提供其他选择呢?没有标准库能够满足每个用户的需求。STL 的设计考虑包括“对几乎所有用户都足够快”、“对几乎所有用户都有用”和“尽可能提供异常安全”(以及“对标准委员会来说足够小”作为库 Stepanov 最初写的要大得多,Stroustrup 砍掉了大约 2/3 的东西)。

The simplest solution I can think of would look like this:

我能想到的最简单的解决方案如下所示:

// Note:  an STL-like method would be templatized and use iterators instead of
// hardcoding std::vector<int>
std::vector<int> stable_unique(const std::vector<int>& input)
{
    std::vector<int> result;
    result.reserve(input.size());
    for (std::vector<int>::iterator itor = input.begin();
                                    itor != input.end();
                                    ++itor)
        if (std::find(result.begin(), result.end(), *itor) == result.end())
            result.push_back(*itor);
        return result;
}

This solution should be O(M * N) where M is the number of unique elements and N is the total number of elements.

此解决方案应为 O(M * N),其中 M 是唯一元素的数量,N 是元素的总数。

回答by vschoech

There is a nice article by John Torjo which deals with this very question in a systematic way. The result he comes up with seems more generic and more efficient than any of the solutions suggested here so far:

John Torjo 有一篇很好的文章,它以系统的方式处理了这个问题。他提出的结果似乎比这里迄今为止提出的任何解决方案都更通用、更有效:

http://www.builderau.com.au/program/java/soa/C-Removing-duplicates-from-a-range/0,339024620,320271583,00.htm

http://www.builderau.com.au/program/java/soa/C-Removing-duplicates-from-a-range/0,339024620,320271583,00.htm

https://web.archive.org/web/1/http://articles.techrepublic%2ecom%2ecom/5100-10878_11-1052159.html

https://web.archive.org/web/1/http://articles.techrepublic%2ecom%2ecom/5100-10878_11-1052159.html

Unfortunately, the complete code of John's solution seems to be no longer available, and John did not respond to may email. Therefore, I wrote my own code which is based on similar grounds like his, but intentionally differs in some details. Feel free to contact me (vschoech think-cell com) and discuss the details if you wish.

不幸的是,John 的解决方案的完整代码似乎不再可用,并且 John 没有回复可能的电子邮件。因此,我编写了自己的代码,该代码基于与他类似的理由,但故意在某些细节上有所不同。如果您愿意,请随时与我联系 (vschoech think-cell com) 并讨论详细信息。

To make the code compile for you, I added some of my own library stuff which I use regularly. Also, instead of going with plain stl, I use boost a lot to create more generic, more efficient, and more readable code.

为了让代码为您编译,我添加了一些我自己经常使用的库内容。此外,我没有使用普通的 stl,而是大量使用 boost 来创建更通用、更高效、更易读的代码。

Have fun!

玩得开心!

#include <vector>
#include <functional>

#include <boost/bind.hpp>
#include <boost/range.hpp>
#include <boost/iterator/counting_iterator.hpp>

/////////////////////////////////////////////////////////////////////////////////////////////
// library stuff

template< class Rng, class Func >
Func for_each( Rng& rng, Func f ) {
    return std::for_each( boost::begin(rng), boost::end(rng), f );
};

template< class Rng, class Pred >
Rng& sort( Rng& rng, Pred pred ) {
    std::sort( boost::begin( rng ), boost::end( rng ), pred );
    return rng; // to allow function chaining, similar to operator+= et al.
}

template< class T >
boost::iterator_range< boost::counting_iterator<T> > make_counting_range( T const& tBegin, T const& tEnd ) {
    return boost::iterator_range< boost::counting_iterator<T> >( tBegin, tEnd );
}

template< class Func >
class compare_less_impl {
private:
    Func m_func;
public:
    typedef bool result_type;
    compare_less_impl( Func func ) 
    :   m_func( func )
    {}
    template< class T1, class T2 > bool operator()( T1 const& tLeft, T2 const& tRight ) const {
        return m_func( tLeft ) < m_func( tRight );
    }
};

template< class Func >
compare_less_impl<Func> compare_less( Func func ) {
    return compare_less_impl<Func>( func );
}


/////////////////////////////////////////////////////////////////////////////////////////////
// stable_unique

template<class forward_iterator, class predicate_type>
forward_iterator stable_unique(forward_iterator itBegin, forward_iterator itEnd, predicate_type predLess) {
    typedef std::iterator_traits<forward_iterator>::difference_type index_type;
    struct SIteratorIndex {
        SIteratorIndex(forward_iterator itValue, index_type idx) : m_itValue(itValue), m_idx(idx) {}
        std::iterator_traits<forward_iterator>::reference Value() const {return *m_itValue;}
        index_type m_idx;
    private:
        forward_iterator m_itValue;
    };

    // {1} create array of values (represented by iterators) and indices
    std::vector<SIteratorIndex> vecitidx;
    vecitidx.reserve( std::distance(itBegin, itEnd) );
    struct FPushBackIteratorIndex {
        FPushBackIteratorIndex(std::vector<SIteratorIndex>& vecitidx) : m_vecitidx(vecitidx) {}
        void operator()(forward_iterator itValue) const {
            m_vecitidx.push_back( SIteratorIndex(itValue, m_vecitidx.size()) );
        }
    private:
        std::vector<SIteratorIndex>& m_vecitidx;
    };
    for_each( make_counting_range(itBegin, itEnd), FPushBackIteratorIndex(vecitidx) );

    // {2} sort by underlying value
    struct FStableCompareByValue {
        FStableCompareByValue(predicate_type predLess) : m_predLess(predLess) {}
        bool operator()(SIteratorIndex const& itidxA, SIteratorIndex const& itidxB) {
            return m_predLess(itidxA.Value(), itidxB.Value())
                // stable sort order, index is secondary criterion
                || !m_predLess(itidxB.Value(), itidxA.Value()) && itidxA.m_idx < itidxB.m_idx;
        }
    private:
        predicate_type m_predLess;
    };
    sort( vecitidx, FStableCompareByValue(predLess) );

    // {3} apply std::unique to the sorted vector, removing duplicate values
    vecitidx.erase(
        std::unique( vecitidx.begin(), vecitidx.end(),
            !boost::bind( predLess,
                // redundand boost::mem_fn required to compile
                boost::bind(boost::mem_fn(&SIteratorIndex::Value), _1),
                boost::bind(boost::mem_fn(&SIteratorIndex::Value), _2)
            )
        ),
        vecitidx.end()
    );

    // {4} re-sort by index to match original order
    sort( vecitidx, compare_less(boost::mem_fn(&SIteratorIndex::m_idx)) );

    // {5} keep only those values in the original range that were not removed by std::unique
    std::vector<SIteratorIndex>::iterator ititidx = vecitidx.begin();
    forward_iterator itSrc = itBegin;
    index_type idx = 0;
    for(;;) {
        if( ititidx==vecitidx.end() ) {
            // {6} return end of unique range
            return itSrc;
        }
        if( idx!=ititidx->m_idx ) {
            // original range must be modified
            break;
        }
        ++ititidx;
        ++idx;
        ++itSrc;
    }
    forward_iterator itDst = itSrc;
    do {
        ++idx;
        ++itSrc;
        // while there are still items in vecitidx, there must also be corresponding items in the original range
        if( idx==ititidx->m_idx ) {
            std::swap( *itDst, *itSrc ); // C++0x move
            ++ititidx;
            ++itDst;
        }
    } while( ititidx!=vecitidx.end() );

    // {6} return end of unique range
    return itDst;
}

template<class forward_iterator>
forward_iterator stable_unique(forward_iterator itBegin, forward_iterator itEnd) {
    return stable_unique( itBegin, itEnd, std::less< std::iterator_traits<forward_iterator>::value_type >() );
}

void stable_unique_test() {
    std::vector<int> vecn;
    vecn.push_back(1);
    vecn.push_back(17);
    vecn.push_back(-100);
    vecn.push_back(17);
    vecn.push_back(1);
    vecn.push_back(17);
    vecn.push_back(53);
    vecn.erase( stable_unique(vecn.begin(), vecn.end()), vecn.end() );
    // result: 1, 17, -100, 53
}

回答by winterlight

Based on the @Corden's answer, but uses lambda expression and removes duplicates instead of writing them in the output vector

基于@Corden 的回答,但使用 lambda 表达式并删除重复项而不是将它们写入输出向量中

    set<int> s;
    vector<int> nodups;
    remove_copy_if(v.begin(), v.end(), back_inserter(nodups), 
        [&s](int x){ 
            return !s.insert(x).second; //-> .second false means already exists
        } ); 

回答by Yelonek

As far as i know there is none in stl. Look up reference.

据我所知,stl 中没有。查找参考