Note: this page is a bilingual copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow. Original address: http://stackoverflow.com/questions/50098/

Time: 2020-09-03 09:46:09  Source: igfitidea

Comparing two collections for equality irrespective of the order of items in them

Tags: .net, collections, comparison, equality

Asked by mbillard

I would like to compare two collections (in C#), but I'm not sure of the best way to implement this efficiently.

I've read the other thread about Enumerable.SequenceEqual, but it's not exactly what I'm looking for.

In my case, two collections would be equal if they both contain the same items (no matter the order).

Example:

collection1 = {1, 2, 3, 4};
collection2 = {2, 4, 1, 3};

collection1 == collection2; // true

What I usually do is to loop through each item of one collection and see if it exists in the other collection, then loop through each item of the other collection and see if it exists in the first collection. (I start by comparing the lengths).

if (collection1.Count != collection2.Count)
    return false; // the collections are not equal

foreach (Item item in collection1)
{
    if (!collection2.Contains(item))
        return false; // the collections are not equal
}

foreach (Item item in collection2)
{
    if (!collection1.Contains(item))
        return false; // the collections are not equal
}

return true; // the collections are equal

However, this is not entirely correct, and it's probably not the most efficient way to compare two collections for equality.

An example I can think of that would be wrong is:

collection1 = {1, 2, 3, 3, 4}
collection2 = {1, 2, 2, 3, 4}

With my implementation, these two collections would be considered equal. Should I just count the number of times each item is found and make sure the counts are equal in both collections?

The examples are in some sort of C# (let's call it pseudo-C#), but give your answer in whatever language you wish, it does not matter.

Note: I used integers in the examples for simplicity, but I want to be able to use reference-type objects too (they do not behave correctly as keys because only the reference of the object is compared, not the content).
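
To illustrate the note about reference types, here is a minimal sketch of how a custom IEqualityComparer<T> lets a Dictionary count objects by content rather than by reference. The `Person` class, `PersonComparer`, and `ReferenceKeyDemo` are hypothetical names made up for this example; they are not part of the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reference type, used only for illustration.
public sealed class Person
{
    public string Name { get; }
    public Person(string name) { Name = name; }
}

// Hypothetical comparer: two Person objects are equal when their names match.
public sealed class PersonComparer : IEqualityComparer<Person>
{
    public bool Equals(Person x, Person y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(x.Name, y.Name, StringComparison.Ordinal);
    }

    public int GetHashCode(Person obj)
    {
        return obj.Name == null ? 0 : StringComparer.Ordinal.GetHashCode(obj.Name);
    }
}

public static class ReferenceKeyDemo
{
    // Returns the number of distinct keys the dictionary ends up with.
    // Passing null for the comparer falls back to the default (reference) equality.
    public static int CountDistinct(IEnumerable<Person> people, IEqualityComparer<Person> comparer)
    {
        var counts = new Dictionary<Person, int>(comparer);
        foreach (var p in people)
        {
            counts.TryGetValue(p, out int n);
            counts[p] = n + 1;
        }
        return counts.Count;
    }
}
```

With the default comparer, two `Person("Ann")` instances are two different keys; with `PersonComparer` they collapse into one key with a count of 2. Overriding `Equals` and `GetHashCode` on the class itself would have the same effect.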

Accepted answer by Ohad Schneider

It turns out Microsoft already has this covered in its testing framework: CollectionAssert.AreEquivalent

Remarks

Two collections are equivalent if they have the same elements in the same quantity, but in any order. Elements are equal if their values are equal, not if they refer to the same object.

Using Reflector, I modified the code behind AreEquivalent() to create a corresponding equality comparer. It is more complete than existing answers, since it takes nulls into account, implements IEqualityComparer and has some efficiency and edge case checks. Plus, it's Microsoft :)

public class MultiSetComparer<T> : IEqualityComparer<IEnumerable<T>>
{
    private readonly IEqualityComparer<T> m_comparer;
    public MultiSetComparer(IEqualityComparer<T> comparer = null)
    {
        m_comparer = comparer ?? EqualityComparer<T>.Default;
    }

    public bool Equals(IEnumerable<T> first, IEnumerable<T> second)
    {
        if (first == null)
            return second == null;

        if (second == null)
            return false;

        if (ReferenceEquals(first, second))
            return true;

        if (first is ICollection<T> firstCollection && second is ICollection<T> secondCollection)
        {
            if (firstCollection.Count != secondCollection.Count)
                return false;

            if (firstCollection.Count == 0)
                return true;
        }

        return !HaveMismatchedElement(first, second);
    }

    private bool HaveMismatchedElement(IEnumerable<T> first, IEnumerable<T> second)
    {
        int firstNullCount;
        int secondNullCount;

        var firstElementCounts = GetElementCounts(first, out firstNullCount);
        var secondElementCounts = GetElementCounts(second, out secondNullCount);

        if (firstNullCount != secondNullCount || firstElementCounts.Count != secondElementCounts.Count)
            return true;

        foreach (var kvp in firstElementCounts)
        {
            var firstElementCount = kvp.Value;
            int secondElementCount;
            secondElementCounts.TryGetValue(kvp.Key, out secondElementCount);

            if (firstElementCount != secondElementCount)
                return true;
        }

        return false;
    }

    private Dictionary<T, int> GetElementCounts(IEnumerable<T> enumerable, out int nullCount)
    {
        var dictionary = new Dictionary<T, int>(m_comparer);
        nullCount = 0;

        foreach (T element in enumerable)
        {
            if (element == null)
            {
                nullCount++;
            }
            else
            {
                int num;
                dictionary.TryGetValue(element, out num);
                num++;
                dictionary[element] = num;
            }
        }

        return dictionary;
    }

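    public int GetHashCode(IEnumerable<T> enumerable)
    {
        if (enumerable == null) throw new ArgumentNullException(nameof(enumerable));

        // XOR is order-independent, so no sorting is needed (T does not have
        // to be comparable), and hashing through m_comparer keeps the result
        // consistent with Equals when a custom comparer is supplied.
        int hash = 17;

        foreach (T val in enumerable)
            hash ^= (val == null ? 42 : m_comparer.GetHashCode(val));

        return hash;
    }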
}

Sample usage:

var set = new HashSet<IEnumerable<int>>(new[] {new[]{1,2,3}}, new MultiSetComparer<int>());
Console.WriteLine(set.Contains(new [] {3,2,1})); //true
Console.WriteLine(set.Contains(new [] {1, 2, 3, 3})); //false

Or if you just want to compare two collections directly:

var comp = new MultiSetComparer<string>();
Console.WriteLine(comp.Equals(new[] {"a","b","c"}, new[] {"a","c","b"})); //true
Console.WriteLine(comp.Equals(new[] {"a","b","c"}, new[] {"a","b"})); //false

Finally, you can use an equality comparer of your choice:

var strcomp = new MultiSetComparer<string>(StringComparer.OrdinalIgnoreCase);
Console.WriteLine(strcomp.Equals(new[] {"a", "b"}, new []{"B", "A"})); //true

Answer by Ohad Schneider

A simple and fairly efficient solution is to sort both collections and then compare them for equality:

bool equal = collection1.OrderBy(i => i).SequenceEqual(
                 collection2.OrderBy(i => i));

This algorithm is O(N*logN), while your solution above is O(N^2).

If the collections have certain properties, you may be able to implement a faster solution. For example, if both of your collections are hash sets, they cannot contain duplicates. Also, checking whether a hash set contains some element is very fast. In that case an algorithm similar to yours would likely be fastest.

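
The hash-set case described above could be sketched roughly as follows. This is only an illustration, assuming both inputs are already HashSet<T> instances built with the same comparer; the `HashSetEquality` class and `SameElements` method are hypothetical names.

```csharp
using System.Collections.Generic;

public static class HashSetEquality
{
    // Assumes both arguments are HashSet<T> built with the same comparer,
    // so neither contains duplicates and Contains is an expected O(1) lookup.
    public static bool SameElements<T>(HashSet<T> first, HashSet<T> second)
    {
        if (first.Count != second.Count)
            return false;

        foreach (T item in first)
        {
            if (!second.Contains(item))
                return false;
        }

        // Equal counts plus "every item of first is in second" implies equality,
        // so the second loop from the question is not needed for sets.
        return true;
    }
}
```

Because sets hold no duplicates, a single pass is enough, which makes the whole check expected O(N).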

Answer by Daniel Jennings

Create a Dictionary "dict" and then for each member in the first collection, do dict[member]++;

Then, loop over the second collection in the same way, but for each member do dict[member]--.

At the end, loop over all of the members in the dictionary and check that every count is zero:

    private bool SetEqual (List<int> left, List<int> right) {

        if (left.Count != right.Count)
            return false;

        Dictionary<int, int> dict = new Dictionary<int, int>();

        foreach (int member in left) {
            if (dict.ContainsKey(member) == false)
                dict[member] = 1;
            else
                dict[member]++;
        }

        foreach (int member in right) {
            if (dict.ContainsKey(member) == false)
                return false;
            else
                dict[member]--;
        }

        foreach (KeyValuePair<int, int> kvp in dict) {
            if (kvp.Value != 0)
                return false;
        }

        return true;

    }

Edit: As far as I can tell this is on the same order as the most efficient algorithm. This algorithm is O(N), assuming that the Dictionary uses O(1) lookups.

Answer by mbillard

This is my (heavily influenced by D.Jennings) generic implementation of the comparison method (in C#):

/// <summary>
/// Represents a service used to compare two collections for equality.
/// </summary>
/// <typeparam name="T">The type of the items in the collections.</typeparam>
public class CollectionComparer<T>
{
    /// <summary>
    /// Compares the content of two collections for equality.
    /// </summary>
    /// <param name="foo">The first collection.</param>
    /// <param name="bar">The second collection.</param>
    /// <returns>True if both collections have the same content, false otherwise.</returns>
    public bool Execute(ICollection<T> foo, ICollection<T> bar)
    {
        // Declare a dictionary to count the occurrences of the items in the collection
        Dictionary<T, int> itemCounts = new Dictionary<T,int>();

        // Increase the count for each occurrence of the item in the first collection
        foreach (T item in foo)
        {
            if (itemCounts.ContainsKey(item))
            {
                itemCounts[item]++;
            }
            else
            {
                itemCounts[item] = 1;
            }
        }

        // Decrease the count for each occurrence of the item in the second collection
        foreach (T item in bar)
        {
            // Look the item up directly in the dictionary. Keys are compared
            // with the default equality comparer for "T", so you may want to
            // override ".Equals" and ".GetHashCode" to define what it means
            // for two "T" objects to be equal. TryGetValue also behaves
            // correctly for value types, where a missing key is never null.
            int count;
            if (itemCounts.TryGetValue(item, out count))
            {
                itemCounts[item] = count - 1;
            }
            else
            {
                // There was no occurrence of this item in the first collection, thus the collections are not equal
                return false;
            }
        }

        // The count of each item should be 0 if the contents of the collections are equal
        foreach (int value in itemCounts.Values)
        {
            if (value != 0)
            {
                return false;
            }
        }

        // The collections are equal
        return true;
    }
}

Answer by Joel Gauvreau

You could use a HashSet. Look at the SetEquals method.
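
As a rough sketch of that suggestion (the `SetEqualsDemo` class and `SetEqual` method are hypothetical names), note that going through a HashSet<T> compares distinct elements only, so duplicate counts are ignored:

```csharp
using System.Collections.Generic;

public static class SetEqualsDemo
{
    public static bool SetEqual<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        // HashSet<T> drops duplicates, and SetEquals ignores duplicates in
        // its argument, so only the distinct elements are compared.
        return new HashSet<T>(first).SetEquals(second);
    }
}
```

This returns true for {1, 2, 3, 4} and {2, 4, 1, 3}, but it also returns true for {1, 2, 3, 3, 4} and {1, 2, 2, 3, 4}, so it only fits when duplicates do not matter.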

Answer by Pier-Lionel Sgard

If you use Shouldly, you can use ShouldAllBe with Contains.

collection1 = {1, 2, 3, 4};
collection2 = {2, 4, 1, 3};

collection1.ShouldAllBe(item=>collection2.Contains(item)); // true

And finally, you can write an extension.

public static class ShouldlyIEnumerableExtensions
{
    public static void ShouldEquivalentTo<T>(this IEnumerable<T> list, IEnumerable<T> equivalent)
    {
        list.ShouldAllBe(l => equivalent.Contains(l));
    }
}

UPDATE

An optional parameter exists on the ShouldBe method.

collection1.ShouldBe(collection2, ignoreOrder: true); // true

Answer by Pier-Lionel Sgard

EDIT: I realized as soon as I posted that this really only works for sets -- it will not properly deal with collections that have duplicate items. For example { 1, 1, 2 } and { 2, 2, 1 } will be considered equal from this algorithm's perspective. If your collections are sets (or their equality can be measured that way), however, I hope you find the below useful.

The solution I use is:

return c1.Count == c2.Count && c1.Intersect(c2).Count() == c1.Count;

Linq does the dictionary thing under the covers, so this is also O(N). (Note, it's O(1) if the collections aren't the same size).

I did a sanity check using the "SetEqual" method suggested by Daniel, the OrderBy/SequenceEquals method suggested by Igor, and my suggestion. The results are below, showing O(N*LogN) for Igor and O(N) for mine and Daniel's.

I think the simplicity of the Linq intersect code makes it the preferable solution.

Test latency (ms)

N          SetEquals     OrderBy      Intersect
1024       0             0            0
2048       0             0            0
4096       31.2468       0            0
8192       62.4936       0            0
16384      156.234       15.6234      0
32768      312.468       15.6234      46.8702
65536      640.5594      46.8702      31.2468
131072     1312.3656     93.7404      203.1042
262144     3765.2394     187.4808     187.4808
524288     5718.1644     374.9616     406.2084
1048576    11420.7054    734.2998     718.6764
2097152    35090.1564    1515.4698    1484.223

Answer by Ohad Schneider

In the case of no repeats and no order, the following EqualityComparer can be used to allow collections as dictionary keys:

public class SetComparer<T> : IEqualityComparer<IEnumerable<T>> 
where T:IComparable<T>
{
    public bool Equals(IEnumerable<T> first, IEnumerable<T> second)
    {
        if (first == second)
            return true;
        if ((first == null) || (second == null))
            return false;
        return first.ToHashSet().SetEquals(second);
    }

    public int GetHashCode(IEnumerable<T> enumerable)
    {
        int hash = 17;

        foreach (T val in enumerable.OrderBy(x => x))
            hash = hash * 23 + val.GetHashCode();

        return hash;
    }
}

Here is the ToHashSet() implementation I used. The hash code algorithm comes from Effective Java (by way of Jon Skeet).
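
If the keys are already HashSet<T> instances, the framework's own HashSet<T>.CreateSetComparer() may be enough for the dictionary-key scenario. The following is a small sketch of that alternative, not the comparer from this answer; the `SetKeyDemo` class and `Lookup` method are hypothetical names.

```csharp
using System.Collections.Generic;

public static class SetKeyDemo
{
    public static string Lookup()
    {
        // CreateSetComparer() returns an IEqualityComparer<HashSet<int>> that
        // treats two sets as equal when they contain the same elements.
        var map = new Dictionary<HashSet<int>, string>(HashSet<int>.CreateSetComparer());
        map[new HashSet<int> { 1, 2, 3 }] = "first";

        // A different HashSet instance with the same elements finds the entry.
        return map[new HashSet<int> { 3, 2, 1 }];
    }
}
```

As with any collection used as a key, the sets should not be mutated after insertion, since that would change their hash code.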

Answer by palswim

static bool SetsContainSameElements<T>(IEnumerable<T> set1, IEnumerable<T> set2) {
    var setXOR = new HashSet<T>(set1);
    setXOR.SymmetricExceptWith(set2);
    return (setXOR.Count == 0);
}

Solution requires .NET 3.5 and the System.Collections.Generic namespace. According to Microsoft, SymmetricExceptWith is an O(n + m) operation, with n representing the number of elements in the first set and m representing the number of elements in the second. You could always add an equality comparer to this function if necessary.
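
Adding the equality comparer mentioned above might look like the sketch below. The optional `comparer` parameter and the `SymmetricDifferenceDemo` class name are assumptions made for this example; the rest mirrors the function in this answer.

```csharp
using System.Collections.Generic;

public static class SymmetricDifferenceDemo
{
    public static bool SetsContainSameElements<T>(
        IEnumerable<T> set1, IEnumerable<T> set2, IEqualityComparer<T> comparer = null)
    {
        // A null comparer makes HashSet<T> fall back to EqualityComparer<T>.Default.
        var setXOR = new HashSet<T>(set1, comparer);

        // Afterwards setXOR holds only the elements found in exactly one input.
        setXOR.SymmetricExceptWith(set2);
        return setXOR.Count == 0;
    }
}
```

For example, passing StringComparer.OrdinalIgnoreCase makes { "a", "b" } and { "B", "A" } compare as the same set, while the default comparer treats them as different.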

Answer by Korayem

Why not use .Except()?

// Create the IEnumerable data sources.
string[] names1 = System.IO.File.ReadAllLines(@"../../../names1.txt");
string[] names2 = System.IO.File.ReadAllLines(@"../../../names2.txt");
// Create the query. Note that method syntax must be used here.
IEnumerable<string> differenceQuery =   names1.Except(names2);
// Execute the query.
Console.WriteLine("The following lines are in names1.txt but not names2.txt");
foreach (string s in differenceQuery)
     Console.WriteLine(s);

http://msdn.microsoft.com/en-us/library/bb397894.aspx
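
The snippet above only lists the lines missing from the second file. To turn Except into an equality check, one possible sketch (with the hypothetical names `ExceptDemo` and `SameDistinctElements`) is to require both differences to be empty:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ExceptDemo
{
    // Assumes both sequences can be enumerated more than once.
    public static bool SameDistinctElements<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        // Except yields the set difference, so both differences must be empty.
        return !first.Except(second).Any() && !second.Except(first).Any();
    }
}
```

Like the other set-based approaches, Except works on distinct elements, so it does not notice differing duplicate counts.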
