More elegant way to check for duplicates in a C++ array?

Question

Asked by Saladin Akara

I wrote this code in C++ as part of a uni task where I need to ensure that there are no duplicates within an array:

// Check for duplicate numbers in user inputted data
    int i; // Need to declare i here so that it can be accessed by the 'inner' loop that starts on line 21
    for(i = 0;i < 6; i++) { // Check each other number in the array
        for(int j = i; j < 6; j++) { // Check the rest of the numbers
            if(j != i) { // Makes sure don't check number against itself
                if(userNumbers[i] == userNumbers[j]) {
                    b = true;
                }
            }
            if(b == true) { // If there is a duplicate, change that particular number
                cout << "Please re-enter number " << i + 1 << ". Duplicate numbers are not allowed:" << endl;
                cin >> userNumbers[i];
            }
        } // Comparison loop
        b = false; // Reset the boolean after each number entered has been checked
    } // Main check loop

It works perfectly, but I'd like to know if there is a more elegant or efficient way to check.

Answer 1

Answered by Puppy

You could sort the array in O(n log n), then simply compare each number with the next one. That is asymptotically faster than your existing O(n^2) algorithm, and the code is also a lot cleaner. Your code also doesn't ensure that no duplicates are introduced when numbers are re-entered. You need to prevent duplicates from existing in the first place.

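// Assumes userNumbers is a std::vector<int> (needs <algorithm> and <vector>).
std::sort(userNumbers.begin(), userNumbers.end());
// i + 1 < size() avoids the size() - 1 underflow when the vector is empty.
for (std::size_t i = 0; i + 1 < userNumbers.size(); ) {
    if (userNumbers[i] == userNumbers[i + 1]) {
        userNumbers.erase(userNumbers.begin() + i); // drop one copy, re-check index i
    } else {
        ++i;
    }
}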

I also second the recommendation to use a std::set - no duplicates there.
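
A minimal sketch of preventing duplicates at input time with a std::set, as suggested above: a value is accepted only if std::set::insert reports that it was not already present; otherwise the user is asked again. The helper readUniqueNumbers and its stream parameters are assumptions for illustration; taking a std::istream lets it read from std::cin or, for testing, from a string.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <set>
#include <sstream>  // std::istringstream / std::ostringstream, handy for testing
#include <vector>

// Hypothetical helper: reads `count` distinct ints from `in`, printing a
// re-entry prompt on `out` whenever a value was already accepted.
std::vector<int> readUniqueNumbers(std::istream& in, std::ostream& out, std::size_t count) {
    std::set<int> seen;        // every value accepted so far
    std::vector<int> numbers;  // accepted values, in input order
    int value;
    while (numbers.size() < count && in >> value) {
        // insert() returns std::pair<iterator, bool>; .second is false
        // when the value was already in the set.
        if (seen.insert(value).second) {
            numbers.push_back(value);  // first occurrence: accept it
        } else {
            out << "Please re-enter number " << numbers.size() + 1
                << ". Duplicate numbers are not allowed:\n";
        }
    }
    return numbers;  // shorter than `count` if the input ran out first
}
```

In the original program this would be called as readUniqueNumbers(std::cin, std::cout, 6), which additionally needs <iostream>.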

Answer 2

Answered by fredoverflow

The following solution is based on sorting the numbers and then removing the duplicates:

#include <algorithm>

int main()
{
    int userNumbers[6];

    // ...

    int* end = userNumbers + 6;
    std::sort(userNumbers, end);
    bool containsDuplicates = (std::unique(userNumbers, end) != end);
}
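
Note that std::sort and std::unique both reorder userNumbers in place. If the original order matters (for example, to re-prompt for a particular entry), one option is to run the same check on a copy. A minimal sketch, assuming the six numbers are held in a std::array; the name containsDuplicates is hypothetical:

```cpp
#include <algorithm>
#include <array>

// Same sort + std::unique idea, applied to a copy: std::array is copyable,
// so taking it by value leaves the caller's array in its original order.
bool containsDuplicates(std::array<int, 6> numbers) {
    std::sort(numbers.begin(), numbers.end());
    // std::unique moves the distinct values to the front and returns the new
    // logical end; that equals end() only if no element was dropped.
    return std::unique(numbers.begin(), numbers.end()) != numbers.end();
}
```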

Answer 3

Answered by Paul Michalik

Indeed, the fastest and, as far as I can see, most elegant method is the one advised above:

std::vector<int> tUserNumbers;
// ...
// Copy into a std::set (sorted, duplicates dropped), then swap the result back.
std::set<int> tSet(tUserNumbers.begin(), tUserNumbers.end());
std::vector<int>(tSet.begin(), tSet.end()).swap(tUserNumbers);

It is O(n log n). However, this does not work if the order of the numbers in the input array needs to be preserved. In that case I did:

std::set<int> tTmp;  // values seen so far
std::vector<int>::iterator tNewEnd =
    std::remove_if(tUserNumbers.begin(), tUserNumbers.end(),
        [&tTmp] (int pNumber) -> bool {
            // insert().second is false if pNumber was already seen: remove it
            return (!tTmp.insert(pNumber).second);
        });
tUserNumbers.erase(tNewEnd, tUserNumbers.end());

which is still O(n log n) and keeps the original ordering of elements in tUserNumbers.

Cheers,

Paul
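
For comparison, the same order-preserving idea can also be written as an explicit loop that builds a new vector, which avoids relying on a stateful lambda inside std::remove_if. This is a minimal sketch; the function name uniqueKeepOrder is hypothetical.

```cpp
#include <set>
#include <vector>

// Keeps the first occurrence of each value, in its original position.
// A value is copied to the result only the first time std::set::insert
// succeeds for it; overall O(n log n).
std::vector<int> uniqueKeepOrder(const std::vector<int>& input) {
    std::set<int> seen;
    std::vector<int> result;
    result.reserve(input.size());
    for (int x : input) {
        if (seen.insert(x).second) {  // .second is true if x was not yet in the set
            result.push_back(x);
        }
    }
    return result;
}
```

For example, uniqueKeepOrder({3, 1, 3, 2, 1}) yields {3, 1, 2}.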

Answer 4

Answered by Benoit Thiery

You can add all elements to a set and, when adding each one, check whether it is already present. That would be more elegant and efficient.
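
A minimal sketch of this approach with std::set, stopping at the first repeated value (the function name hasDuplicates is hypothetical):

```cpp
#include <set>
#include <vector>

// Returns true as soon as a value turns out to be in the set already.
// std::set::insert returns std::pair<iterator, bool>; the bool is false
// when an equal element was already present, so no separate find() is needed.
bool hasDuplicates(const std::vector<int>& numbers) {
    std::set<int> seen;
    for (int x : numbers) {
        if (!seen.insert(x).second) {
            return true;
        }
    }
    return false;
}
```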

Answer 5

Answered by Josh Sanders

I'm not sure why this hasn't been suggested, but here is a way to find duplicate digits in a base-10 number in O(n). The problem I see with the O(n) scan already suggested is that it requires the values to be sorted first. This method is O(n) and does not require sorting. The cool thing is that, once the table is built, checking whether a specific digit has duplicates is O(1). I know this thread is probably dead, but maybe it will help somebody! :)

/*
============================
Foo
============================

   Takes in an unsigned int. A table is created to store counters
   for each digit. If any digit's counter is higher than 1, the function
   returns false. For example, with 48778584:
    0   1   2   3   4   5   6   7   8   9
   [0] [0] [0] [0] [2] [1] [0] [2] [3] [0]

   When we iterate over this array, we find that 4 is duplicated and immediately
   return false.

*/
bool Foo(unsigned int number)
{
    unsigned int temp = number; // stay unsigned so large inputs don't overflow an int
    int digitTable[10] = {0};

    while (temp > 0)
    {
        digitTable[temp % 10]++; // Last digit's respective index.
        temp /= 10; // Move to next digit
    }

    for (int i = 0; i < 10; i++)
    {
        if (digitTable[i] > 1)
        {
            return false; // this digit occurs more than once
        }
    }
    return true; // every digit occurs at most once
}
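
Foo above checks the digits of a single number. The same counting-table idea can be applied to the question's array, but only under an extra assumption that the question does not state: every value lies in a small, known range [0, maxValue] (for example lottery-style numbers from 1 to 49). A minimal sketch; the helper hasDuplicatesInRange is hypothetical. It runs in O(n + maxValue) time with O(maxValue) extra memory.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper. Precondition: maxValue >= 0. Values outside
// [0, maxValue] throw std::out_of_range instead of indexing out of bounds.
bool hasDuplicatesInRange(const std::vector<int>& numbers, int maxValue) {
    // One flag per possible value, like digitTable in Foo (but bool, not a count).
    std::vector<bool> seen(static_cast<std::size_t>(maxValue) + 1, false);
    for (int x : numbers) {
        if (x < 0 || x > maxValue) {
            throw std::out_of_range("value outside [0, maxValue]");
        }
        if (seen[x]) {
            return true;  // second occurrence of x
        }
        seen[x] = true;
    }
    return false;
}
```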

Answer 6

Answered by ViFI

This is an extension of the answer by @Puppy, which is currently the best answer.

PS: I tried to post this as a comment on @Puppy's answer but couldn't, as I don't have 50 reputation points yet. A bit of experimental data is also shared here for further help.

Both std::set and std::map are typically implemented as balanced binary search trees (red-black trees in the common standard library implementations), so each lookup or insertion is O(log n) and checking all n numbers costs O(n log n). Better performance can be achieved with a hash table: std::unordered_map is hash-table based, with average O(1) lookup, so the whole check is O(n) on average. I experimented with all three implementations and found that std::unordered_map performed better than std::set and std::map. Results and code are shared below. The images are snapshots of the performance measured by LeetCode on the solutions.

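#include <cstddef>
#include <unordered_map>
#include <vector>
//#include <set>  // for the std::set variant

bool hasDuplicate(const std::vector<int>& nums) {
    std::size_t count = nums.size();
    if (!count)
        return false;
    std::unordered_map<int, int> tbl;  // value -> "seen" marker
    //std::set<int> tbl;
    for (std::size_t i = 0; i < count; i++) {
        if (tbl.find(nums[i]) != tbl.end())
            return true;               // nums[i] was seen before
        tbl[nums[i]] = 1;
        //tbl.insert(nums[i]);
    }
    return false;
}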

[Image: unordered_map performance on LeetCode (run time was 52 ms here)]

[Image: set/map performance on LeetCode]
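
Since the mapped values in the table are never read, a std::unordered_set fits the same hash-based check. A minimal sketch (the name hasDuplicateHashed is hypothetical): average O(1) per insertion, so O(n) on average, with a worst case of O(n^2) if many values hash to the same bucket.

```cpp
#include <unordered_set>
#include <vector>

bool hasDuplicateHashed(const std::vector<int>& nums) {
    std::unordered_set<int> seen;
    seen.reserve(nums.size());  // pre-size the buckets to avoid rehashing
    for (int x : nums) {
        // insert().second is false if x is already in the set
        if (!seen.insert(x).second) {
            return true;
        }
    }
    return false;
}
```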

Answer 7

Answered by leonbloy

It's OK, especially for small array lengths. I'd use more efficient approaches (fewer than n^2/2 comparisons) if the array is much bigger - see DeadMG's (Puppy's) answer.

Some small corrections for your code:

Instead of int j = i, write int j = i + 1, and you can omit your if(j != i) test.
You shouldn't need to declare the i variable outside the for statement.
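
Applying both corrections to the question's nested loop gives the sketch below (detection only, without the re-prompting; the function name hasDuplicatesPairwise is hypothetical). Because j starts at i + 1, each pair is compared exactly once, n(n-1)/2 comparisons in total.

```cpp
#include <cstddef>

bool hasDuplicatesPairwise(const int* numbers, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {          // i declared in the for statement
        for (std::size_t j = i + 1; j < n; ++j) {  // never compares an element with itself
            if (numbers[i] == numbers[j]) {
                return true;
            }
        }
    }
    return false;
}
```

With the question's array this would be called as hasDuplicatesPairwise(userNumbers, 6).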

Answer 8

Answered by FaridLU

#include<iostream>
#include<algorithm>

int main(){

    int arr[] = {3, 2, 3, 4, 1, 5, 5, 5};
    int len = sizeof(arr) / sizeof(*arr); // Finding length of array

    std::sort(arr, arr+len);

    int unique_elements = std::unique(arr, arr+len) - arr;

    if(unique_elements == len) std::cout << "Duplicate number is not present here\n";
    else std::cout << "Duplicate number present in this array\n";

    return 0;
}

Answer 9

Answered by coh

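// Needs <algorithm> (std::sort, std::unique, std::unique_copy) and <iterator> (std::back_inserter).
// std::unique(_copy) only removes *adjacent* duplicates, so the container must be sorted first.
std::sort(cont.begin(), cont.end());

// Option 1: test whether cont has duplicates. std::unique reorders cont and
// leaves the elements after the returned iterator in an unspecified state.
bool hasDuplicates = std::unique(cont.begin(), cont.end()) != cont.end();

// Option 2 (use instead of option 1): copy the distinct values into a new container cont2.
std::unique_copy(cont.begin(), cont.end(), std::back_inserter(cont2));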
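
A self-contained sketch of option 2 that leaves the caller's container untouched by sorting a copy (the function name distinctValues is hypothetical). Comparing the size of the result with the size of the input also answers the duplicate question.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Returns the distinct values of `cont` in ascending order. The parameter is
// taken by value, so only the copy is sorted; std::unique_copy then skips
// consecutive equal elements while appending to cont2.
std::vector<int> distinctValues(std::vector<int> cont) {
    std::sort(cont.begin(), cont.end());
    std::vector<int> cont2;
    std::unique_copy(cont.begin(), cont.end(), std::back_inserter(cont2));
    return cont2;
}
```

For a vector v, distinctValues(v).size() != v.size() is true exactly when v contains a duplicate.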

Answer 10

Answered by Michael Jaison G

As mentioned by @underscore_d, an elegant and efficient solution would be:

#include <algorithm>
#include <iostream>   // std::cout
#include <iterator>   // std::iterator_traits
#include <vector>

template <class Iterator>
bool has_duplicates(Iterator begin, Iterator end) {
    using T = typename std::iterator_traits<Iterator>::value_type;
    std::vector<T> values(begin, end);

    std::sort(values.begin(), values.end());
    return (std::adjacent_find(values.begin(), values.end()) != values.end());
}

int main() {
    int user_ids[6];
    // ...
    std::cout << has_duplicates(user_ids, user_ids + 6) << std::endl;
}
