Fastest way to find common items across multiple lists in C#
Original question: http://stackoverflow.com/questions/41159/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not this page): StackOverFlow
Asked by JC.
Given the following:
List<List<Option>> optionLists;
What would be a quick way to determine the subset of Option objects that appear in all N lists? Equality is determined through some string property such as option1.Value == option2.Value.
So we should end up with a List<Option> where each item appears only once.
Accepted answer by Matt Hamilton
Ok, this will find the list of Option objects that have a Value appearing in every list.
var x = from list in optionLists
        from option in list
        where optionLists.All(l => l.Any(o => o.Value == option.Value))
        orderby option.Value
        select option;
It doesn't do a "distinct" select so it'll return multiple Option objects, some of them with the same Value.
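If one Option per Value is wanted, as the question asks, the same query can be followed by a grouping step. The following is only a sketch: the Option class is a hypothetical stand-in for the type in the question, and DistinctCommonOptions.Find is a name made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Option type from the question.
public class Option
{
    public string Value { get; set; }
}

public static class DistinctCommonOptions
{
    // Returns one Option per Value that occurs in every list, ordered by Value.
    public static List<Option> Find(List<List<Option>> optionLists)
    {
        return optionLists
            .SelectMany(list => list)
            .Where(option => optionLists.All(l => l.Any(o => o.Value == option.Value)))
            .GroupBy(option => option.Value)   // collapse Options that share a Value
            .Select(group => group.First())    // keep the first Option seen for each Value
            .OrderBy(option => option.Value)
            .ToList();
    }
}
```

This keeps the nested All/Any scan of the original query, so it is still best suited to small lists.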
Answer by Lasse V. Karlsen
Sort, then do something akin to a merge-sort.
Basically you would do this:
1. Retrieve the first item from each list.
2. Compare the items; if they are all equal, output one of them.
3. If any of the items sort before the others, retrieve a new item from the corresponding list to replace it; otherwise, retrieve new items from all the lists to replace them all.
4. As long as you still have items, go back to step 2.
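The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the answerer's code: it works on plain string values (for Option objects you would compare option.Value), uses ordinal comparison, and the name SortedMerge.FindCommon is made up here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedMerge
{
    // Intersects lists of strings by sorting each one and advancing one cursor per list.
    public static List<string> FindCommon(List<List<string>> lists)
    {
        var result = new List<string>();
        if (lists.Count == 0) return result;

        StringComparer comparer = StringComparer.Ordinal;
        // Sort copies so the caller's lists are left untouched.
        List<List<string>> sorted = lists
            .Select(l => l.OrderBy(v => v, comparer).ToList())
            .ToList();
        int[] pos = new int[sorted.Count]; // current position in each sorted list

        while (true)
        {
            // Stop as soon as any list is exhausted; otherwise find the largest current item.
            string max = null;
            for (int i = 0; i < sorted.Count; i++)
            {
                if (pos[i] >= sorted[i].Count) return result;
                string current = sorted[i][pos[i]];
                if (max == null || comparer.Compare(current, max) > 0) max = current;
            }

            // Advance every cursor whose item sorts before the largest one.
            bool allEqual = true;
            for (int i = 0; i < sorted.Count; i++)
            {
                if (comparer.Compare(sorted[i][pos[i]], max) < 0)
                {
                    pos[i]++;
                    allEqual = false;
                }
            }

            if (allEqual)
            {
                // All cursors sit on the same value: output it once, then skip past it in every list.
                result.Add(max);
                for (int i = 0; i < sorted.Count; i++)
                {
                    while (pos[i] < sorted[i].Count && comparer.Compare(sorted[i][pos[i]], max) == 0)
                    {
                        pos[i]++;
                    }
                }
            }
        }
    }
}
```

The sorting dominates the cost; the merge itself touches each item at most a constant number of times per pass over its list.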
Answer by sven
What about using a HashSet? That way you can do what you want in O(n), where n is the total number of items in all the lists combined, and I think that's the fastest way to do it.
You just have to iterate over every list and insert the values you find into the hash set. When you insert a key that already exists, the Add method returns false; otherwise it returns true.
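One way to turn this suggestion into a working intersection is to seed a HashSet<T> with the first list and narrow it with each following list. The sketch below assumes the comparison key is a plain string (for Option objects, the Value property); HashSetIntersection.FindCommonValues is a hypothetical name, and this is one possible reading of the answer rather than the answerer's own code.

```csharp
using System.Collections.Generic;

public static class HashSetIntersection
{
    // Returns the set of values that occur in every list.
    public static HashSet<string> FindCommonValues(List<List<string>> valueLists)
    {
        HashSet<string> common = null;
        foreach (List<string> values in valueLists)
        {
            if (common == null)
            {
                // Seed with the first list; duplicates collapse automatically.
                common = new HashSet<string>(values);
            }
            else
            {
                // Keep only the values that also occur in this list.
                common.IntersectWith(values);
            }

            if (common.Count == 0) break; // nothing can be common any more
        }
        return common ?? new HashSet<string>();
    }
}
```

Because the result is a set, each common value appears only once, which matches what the question asks for.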
Answer by Skizz
Here's a much more efficient implementation:
static SortedDictionary<T,bool>.KeyCollection FindCommon<T> (List<List<T>> items)
{
    SortedDictionary<T, bool>
        current_common = new SortedDictionary<T, bool> (),
        common = new SortedDictionary<T, bool> ();
    foreach (List<T> list in items)
    {
        if (current_common.Count == 0)
        {
            // First list: every item is a candidate.
            foreach (T item in list)
            {
                common [item] = true;
            }
        }
        else
        {
            foreach (T item in list)
            {
                if (current_common.ContainsKey(item))
                    common[item] = true;
                else
                    // Caution: this still inserts the key, so items that are NOT in
                    // current_common end up in the result (see the correction further down).
                    common[item] = false;
            }
        }
        if (common.Count == 0)
        {
            current_common.Clear ();
            break;
        }
        // Swap the two dictionaries and reuse the old one for the next list.
        SortedDictionary<T, bool>
            swap = current_common;
        current_common = common;
        common = swap;
        common.Clear ();
    }
    return current_common.Keys;
}
It works by creating a set of all items common to all lists processed so far and comparing each list with this set, creating a temporary set of the items common to the current list and the set of common items so far. Effectively O(n·m), where n is the number of lists and m the number of items in each list (ignoring the O(log m) cost of each SortedDictionary operation).
An example of using it:
static void Main (string [] args)
{
    Random
        random = new Random();
    List<List<int>>
        items = new List<List<int>>();
    // Build 10 lists of 100 random values in the range 0..69.
    for (int i = 0 ; i < 10 ; ++i)
    {
        List<int>
            list = new List<int> ();
        items.Add (list);
        for (int j = 0 ; j < 100 ; ++j)
        {
            list.Add (random.Next (70));
        }
    }
    SortedDictionary<int, bool>.KeyCollection
        common = FindCommon (items);
    foreach (List<int> list in items)
    {
        list.Sort ();
    }
    // Print the sorted lists side by side, one list per column.
    for (int i = 0 ; i < 100 ; ++i)
    {
        for (int j = 0 ; j < 10 ; ++j)
        {
            System.Diagnostics.Trace.Write (String.Format ("{0,-4:D} ", items [j] [i]));
        }
        System.Diagnostics.Trace.WriteLine ("");
    }
    // Print the items reported as common.
    foreach (int item in common)
    {
        System.Diagnostics.Trace.WriteLine (String.Format ("{0,-4:D} ", item));
    }
}
Answer by Anthony Mastrean
I don't have the performance stats, but if you don't want to roll your own method, various collections libraries have a 'Set' or 'Set(T)' object that offers the usual set operations (listed in the order I would use them):
- IESI Collections (literally just Set classes)
- PowerCollections (not updated in a while)
- C5 (never personally used)
Answer by Emperor XLII
Building on Matt's answer, since we are only interested in options that all lists have in common, we can simply check for any options in the first list that the others share:
var sharedOptions =
from option in optionLists.First( ).Distinct( )
where optionLists.Skip( 1 ).All( l => l.Contains( option ) )
select option;
If an option list cannot contain duplicate entries, the Distinct call is unnecessary. If the lists vary greatly in size, it would be better to iterate over the options in the shortest list, rather than whatever list happens to be First. Sorted or hashed collections could be used to improve the lookup time of the Contains call, though it should not make much difference for a moderate number of items.
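The two refinements mentioned here, iterating over the shortest list and using hashed lookups, might look like the sketch below. The name ShortestFirst.FindShared is hypothetical, and the code relies on the default equality of T; for Option objects compared by Value you would presumably pass a custom IEqualityComparer<Option> to Distinct and to the HashSet<T> constructor.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ShortestFirst
{
    // Returns the distinct items of the shortest list that occur in every other list.
    public static List<T> FindShared<T>(List<List<T>> lists)
    {
        if (lists.Count == 0) return new List<T>();

        // Iterate over the shortest list: it bounds the size of the result.
        List<T> shortest = lists.OrderBy(l => l.Count).First();

        // Hash every other list once so each Contains check is O(1) on average.
        List<HashSet<T>> others = lists
            .Where(l => !object.ReferenceEquals(l, shortest))
            .Select(l => new HashSet<T>(l))
            .ToList();

        return shortest
            .Distinct()
            .Where(item => others.All(set => set.Contains(item)))
            .ToList();
    }
}
```

Building the hash sets costs one pass over each list, so this pays off mainly when the lists are large enough for repeated linear Contains scans to matter.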
Answer by logicnp
You can do this by counting occurrences of all items in all lists. Those items whose occurrence count is equal to the number of lists are common to all lists:
static List<T> FindCommon<T>(IEnumerable<List<T>> lists)
{
    Dictionary<T, int> map = new Dictionary<T, int>();
    int listCount = 0; // number of lists
    foreach (IEnumerable<T> list in lists)
    {
        listCount++;
        // Count each item at most once per list; otherwise a duplicate inside
        // one list could push its count up to listCount by itself.
        foreach (T item in new HashSet<T>(list))
        {
            // Item encountered, increment count
            int currCount;
            if (!map.TryGetValue(item, out currCount))
                currCount = 0;
            currCount++;
            map[item] = currCount;
        }
    }
    List<T> result = new List<T>();
    foreach (KeyValuePair<T,int> kvp in map)
    {
        // Items whose occurrence count is equal to the number of lists are common to all the lists
        if (kvp.Value == listCount)
            result.Add(kvp.Key);
    }
    return result;
}
Answer by user2102327
/// <summary>
/// The method FindCommonItems returns a list of all the COMMON ITEMS in the lists contained in allSets.
/// The method expects lists containing NO DUPLICATE ITEMS and throws if a list contains any.
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="allSets"></param>
/// <returns></returns>
public static List<T> FindCommonItems<T>(IEnumerable<List<T>> allSets)
{
    Dictionary<T, int> map = new Dictionary<T, int>();
    int listCount = 0; // Number of lists.
    foreach (List<T> currentSet in allSets)
    {
        HashSet<T> uniqueItems = new HashSet<T>();
        List<T> duplicateItems = new List<T>();
        listCount++;
        foreach (T item in currentSet)
        {
            // HashSet.Add returns false when the item is already present.
            if (!uniqueItems.Add(item))
            {
                duplicateItems.Add(item);
            }
            if (map.ContainsKey(item))
            {
                map[item]++;
            }
            else
            {
                map.Add(item, 1);
            }
        }
        if (duplicateItems.Count > 0)
        {
            // string.Join places the separators, so no index bookkeeping is needed.
            throw new Exception("The list " + string.Join(", ", currentSet) + " contains the following duplicate items: " + string.Join(", ", duplicateItems));
        }
    }
    List<T> result = new List<T>();
    foreach (KeyValuePair<T, int> itemAndItsCount in map)
    {
        if (itemAndItsCount.Value == listCount) // Items whose occurrence count is equal to the number of lists are common to all the lists.
        {
            result.Add(itemAndItsCount.Key);
        }
    }
    return result;
}
Answer by user2102327
@Skizz The method is not correct. It also returns items that are not common to all the lists in items. Here is the corrected method:
/// <summary>
/// The method FindAllCommonItemsInAllTheLists, returns a HashSet that contains all the common items in the lists contained in the listOfLists,
/// regardless of the order of the items in the various lists.
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="listOfLists"></param>
/// <returns></returns>
public static HashSet<T> FindAllCommonItemsInAllTheLists<T>(List<List<T>> listOfLists)
{
    if (listOfLists == null || listOfLists.Count == 0)
    {
        return null;
    }
    HashSet<T> currentCommon = new HashSet<T>();
    HashSet<T> common = new HashSet<T>();
    foreach (List<T> currentList in listOfLists)
    {
        if (currentCommon.Count == 0)
        {
            foreach (T item in currentList)
            {
                common.Add(item);
            }
        }
        else
        {
            foreach (T item in currentList)
            {
                if (currentCommon.Contains(item))
                {
                    common.Add(item);
                }
            }
        }
        if (common.Count == 0)
        {
            currentCommon.Clear();
            break;
        }
        currentCommon.Clear(); // Empty currentCommon for a new iteration.
        /* Copy all the items contained in common to currentCommon.
         * currentCommon = common;
         * does not work because thus currentCommon and common would point at the same object and
         * the next statement:
         * common.Clear();
         * will also clear currentCommon.
         */
        foreach (T item in common)
        {
            if (!currentCommon.Contains(item))
            {
                currentCommon.Add(item);
            }
        }
        common.Clear();
    }
    return currentCommon;
}
Answer by birdus
After searching the 'net and not really coming up with something I liked (or that worked), I slept on it and came up with this. My SearchResult is similar to your Option. It has an EmployeeId in it, and that's the thing I need to be common across lists. I return all records that have an EmployeeId in every list. It's not fancy, but it's simple and easy to understand, just what I like. For small lists (my case) it should perform just fine, and anyone can understand it!
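private List<SearchResult> GetFinalSearchResults(IEnumerable<IEnumerable<SearchResult>> lists)
{
    // Start with every record from the first list, keyed by EmployeeId.
    Dictionary<int, SearchResult> oldList = lists.First().ToDictionary(x => x.EmployeeId, x => x);
    Dictionary<int, SearchResult> newList = new Dictionary<int, SearchResult>();
    // Iterate as IEnumerable<SearchResult>: casting each element to
    // List<SearchResult> would fail for any other sequence type.
    foreach (IEnumerable<SearchResult> list in lists.Skip(1))
    {
        foreach (SearchResult emp in list)
        {
            if (oldList.ContainsKey(emp.EmployeeId))
            {
                // The indexer tolerates a repeated EmployeeId within one list; Add would throw.
                newList[emp.EmployeeId] = emp;
            }
        }
        // Keep only the records that survived this list, then reuse newList.
        oldList = new Dictionary<int, SearchResult>(newList);
        newList.Clear();
    }
    return oldList.Values.ToList();
}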