C# 匹配来自两个列表(或数组)的项目

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/419295/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-04 02:40:23  来源:igfitidea点击:

matching items from two lists (or arrays)

c#.netarrayslist

提问by dnord

I'm having a problem with my work that hopefully reduces to the following: I have two List<int>s, and I want to see if any of the ints in ListAare equal to any intin ListB. (They can be arrays if that makes life easier, but I think List<>has some built-in magic that might help.) And I'm sure this is a LINQ-friendly problem, but I'm working in 2.0 here.

我的工作遇到了一个问题,希望可以简化为以下问题:我有两个List<int>s,我想看看是否有任何ints inListA等于 any intin ListB。(如果这让生活更轻松,它们可以是数组,但我认为List<>有一些内置的魔法可能会有所帮助。)而且我确定这是一个 LINQ 友好的问题,但我在这里使用 2.0。

My solution so far has been to foreachthrough ListA, and then foreachthrough ListB,

到目前为止,我的解决方案是foreach通过 ListA,然后foreach通过 ListB,

foreach (int a in ListA)
{
    foreach (int b in ListB)
    {
        if (a == b)
        {
            return true;
        }
    }
}

which was actually pretty slick when they were each three items long, but now they're 200 long and they frequently don't match, so we get the worst-case of N^2 comparisons. Even 40,000 comparisons go by pretty fast, but I think I might be missing something, since N^2 seems pretty naive for this particular problem.

当它们每三个项目长时,这实际上非常光滑,但现在它们长 200 并且它们经常不匹配,所以我们得到了 N^2 比较的最坏情况。即使 40,000 次比较也进行得非常快,但我想我可能会遗漏一些东西,因为 N^2 对于这个特定问题似乎很幼稚。

Thanks!

谢谢!

采纳答案by casperOne

With LINQ, this is trivial, as you can call the Intersectextension methodon the Enumerableclassto give you the set intersection of the two arrays:

使用LINQ,这是微不足道的,因为您可以调用上的Intersect扩展方法来为您提供两个数组的集合交集:Enumerable

var intersection = ListA.Intersect(ListB);

However, this is the setintersection, meaning if ListAand ListBdon't have unique values in it, you won't get any copies. In other words if you have the following:

但是,这是集合交集,这意味着如果ListA其中ListB没有唯一值,您将不会获得任何副本。换句话说,如果你有以下几点:

var ListA = new [] { 0, 0, 1, 2, 3 };
var ListB = new [] { 0, 0, 0, 2 };

Then ListA.Intersect(ListB)produces:

然后ListA.Intersect(ListB)产生:

{ 0, 2 }

If you're expecting:

如果你期待:

{ 0, 0, 2 }

Then you're going to have to maintain a count of the items yourself and yield/decrement as you scan the two lists.

然后,您将不得不自己维护项目的数量,并在扫描两个列表时产生/递减。

First, you'd want to collect a Dictionary<TKey, int>with the lists of the individual items:

首先,您需要收集Dictionary<TKey, int>单个项目的列表:

var countsOfA = ListA.GroupBy(i => i).ToDictionary(g => g.Key, g => g.Count());

From there, you can scan ListBand place that in a list when you come across an item in countsOfA:

从那里,ListB当您遇到以下项目时,您可以扫描并将其放在列表中countsOfA

// The items that match.
IList<int> matched = new List<int>();

// Scan 
foreach (int b in ListB)
{
    // The count.
    int count;

    // If the item is found in a.
    if (countsOfA.TryGetValue(b, out count))
    {
        // This is positive.
        Debug.Assert(count > 0);

        // Add the item to the list.
        matched.Add(b);

        // Decrement the count.  If
        // 0, remove.
        if (--count == 0) countsOfA.Remove(b);
    }
}

You can wrap this up in an extension method that defers execution like so:

您可以将其封装在一个延迟执行的扩展方法中,如下所示:

public static IEnumerable<T> MultisetIntersect(this IEnumerable<T> first,
    IEnumerable<T> second)
{
    // Call the overload with the default comparer.
    return first.MultisetIntersect(second, EqualityComparer<T>.Default);
}

public static IEnumerable<T> MultisetIntersect(this IEnumerable<T> first,
    IEnumerable<T> second, IEqualityComparer<T> comparer)
{
    // Validate parameters.  Do this separately so check
    // is performed immediately, and not when execution
    // takes place.
    if (first == null) throw new ArgumentNullException("first");
    if (second == null) throw new ArgumentNullException("second");
    if (comparer == null) throw new ArgumentNullException("comparer");

    // Defer execution on the internal
    // instance.
    return first.MultisetIntersectImplementation(second, comparer);
}

private static IEnumerable<T> MultisetIntersectImplementation(
    this IEnumerable<T> first, IEnumerable<T> second, 
    IEqualityComparer<T> comparer)
{
    // Validate parameters.
    Debug.Assert(first != null);
    Debug.Assert(second != null);
    Debug.Assert(comparer != null);

    // Get the dictionary of the first.
    IDictionary<T, long> counts = first.GroupBy(t => t, comparer).
        ToDictionary(g => g.Key, g.LongCount(), comparer);

    // Scan 
    foreach (T t in second)
    {
        // The count.
        long count;

        // If the item is found in a.
        if (counts.TryGetValue(t, out count))
        {
            // This is positive.
            Debug.Assert(count > 0);

            // Yield the item.
            yield return t;

            // Decrement the count.  If
            // 0, remove.
            if (--count == 0) counts.Remove(t);
        }
    }
}

Note that both of these approaches are (and I apologize if I'm butchering Big-O notation here) O(N + M)where Nis the number of items in the first array, and Mis the number of items in the second array. You have to scan each list only once, and it's assumed that getting the hash codes and performing lookups on the hash codes is an O(1)(constant) operation.

请注意,这两种方法都是(如果我在这里使用 Big-O 表示法,我深表歉意)O(N + M)whereN是第一个数组中M的项目数,是第二个数组中的项目数。您只需扫描每个列表一次,并且假定获取哈希码并在哈希码上执行查找是一个O(1)(恒定)操作。

回答by shahkalpesh

How about using BinarySearch method instead of iterating over all elements in the inner loop?

如何使用 BinarySearch 方法而不是迭代内部循环中的所有元素?

回答by ChrisW

Load the whole of ListA into a HashSet instance, and then test foreach item in ListB against the HastSet: I'm pretty sure that this would be O(N).

将整个 ListA 加载到 HashSet 实例中,然后针对 HastSet 测试 ListB 中的每个项目:我很确定这将是 O(N)。

//untested code ahead
HashSet<int> hashSet = new HashSet<int>(ListA);
foreach (int i in ListB)
{
    if (hashSet.Contains(i))
        return true;
}

Here's the same thing in one line:

这是同一行中的同一件事:

return new HashSet<int>(ListA).Overlaps(ListB);

HashSet doesn't exist in .NET 3.5, so in .NET 2.0 you can use Dictionary<int,object>(instead of using HashSet<int>), and always store null as the object/value in the Dictionary since you're only interested in the keys.

HashSet 在 .NET 3.5 中不存在,因此在 .NET 2.0 中您可以使用Dictionary<int,object>(而不是使用HashSet<int>),并且始终将 null 存储为字典中的对象/值,因为您只对键感兴趣。

回答by Metro Smurf

Instead of iterating through each list, take a look at the List.Containsmethod:

不是遍历每个列表,而是查看List.Contains方法:

foreach (int a in ListA)
{
  if (ListB.Contains(a))
    return true;
}

回答by PolyThinker

Chris gives an O(N) solution by hashing. Now, depending on the constant factor (due to hashing), it might be worth considering an O(N log(N)) solution by sorting. There are a few different variants that you may consider depending on your use case.

Chris 通过散列给出了一个 O(N) 的解决方案。现在,根据常数因子(由于散列),可能值得通过排序来考虑 O(N log(N)) 解决方案。根据您的用例,您可以考虑几种不同的变体。

  1. Sort ListB ( O(N log(N) ), and use a searching algorithm to parse each element in ListA (which is again O(N) * O(log(N))).

  2. Sort both ListA and ListB ( O(N log(N) ), and use an O(N) algorithm to compare these lists for duplicates.

  1. 对 ListB 进行排序( O(N log(N) ),并使用搜索算法解析 ListA 中的每个元素(同样是 O(N) * O(log(N)))。

  2. 对 ListA 和 ListB 进行排序( O(N log(N) ),并使用 O(N) 算法比较这些列表的重复项。

If both lists are going to be used more than once, the second method is preferred.

如果两个列表都将被多次使用,则首选第二种方法。