Split a List into Sublists with LINQ in C#
Disclaimer: This page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me) on StackOverflow.
Original question: http://stackoverflow.com/questions/419019/
Split List into Sublists with LINQ
Asked by Felipe Lima
Is there any way I can separate a List<SomeObject> into several separate lists of SomeObject, using the item index as the delimiter of each split?
Let me exemplify:
I have a List<SomeObject> and I need a List<List<SomeObject>> or List<SomeObject>[], so that each of these resulting lists will contain a group of 3 items of the original list (sequentially).
e.g.:
Original List:
[a, g, e, w, p, s, q, f, x, y, i, m, c]
Resulting lists:
[a, g, e], [w, p, s], [q, f, x], [y, i, m], [c]
I'd also need the size of the resulting lists to be a parameter of this function.
Accepted answer by JaredPar
Try the following code.
public static IList<IList<T>> Split<T>(IList<T> source)
{
    return source
        .Select((x, i) => new { Index = i, Value = x })
        .GroupBy(x => x.Index / 3)
        // Cast each inner List<T> to IList<T> so the result converts to IList<IList<T>>.
        .Select(g => (IList<T>)g.Select(v => v.Value).ToList())
        .ToList();
}
The idea is to first group the elements by index. Dividing by three has the effect of grouping them into groups of 3. Then convert each group to a list, and the IEnumerable of List to a List of Lists.
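The question asks for the group size to be a parameter rather than the hard-coded 3; a minimal variation of the above with the size passed in (a sketch, not part of the original answer) could look like this:

public static IList<IList<T>> Split<T>(IList<T> source, int chunkSize)
{
    // Same idea as above, but the divisor is supplied by the caller.
    return source
        .Select((x, i) => new { Index = i, Value = x })
        .GroupBy(x => x.Index / chunkSize)
        .Select(g => (IList<T>)g.Select(v => v.Value).ToList())
        .ToList();
}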
Answered by Jobo
If the list is a System.Collections.Generic.List<T>, you can use the CopyTo method to copy elements of your list into other sub-arrays: you specify the start element and the number of elements to copy.
You could also make 3 clones of your original list and use RemoveRange on each one to shrink it to the size you want.
Or just create a helper method to do it for you.
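For example, a sketch of such a helper (the name SplitIntoChunks is hypothetical, not from the original answer) using List<T>.CopyTo to copy each slice into a new sub-array:

public static List<T[]> SplitIntoChunks<T>(List<T> source, int chunkSize)
{
    var result = new List<T[]>();
    for (int start = 0; start < source.Count; start += chunkSize)
    {
        // Copy at most chunkSize elements, starting at the current offset.
        int length = Math.Min(chunkSize, source.Count - start);
        var chunk = new T[length];
        source.CopyTo(start, chunk, 0, length);
        result.Add(chunk);
    }
    return result;
}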
Answered by casperOne
You could use a number of queries that use Take and Skip, but that would add too many iterations on the original list, I believe.
Rather, I think you should create an iterator of your own, like so:
public static IEnumerable<IEnumerable<T>> GetEnumerableOfEnumerables<T>(
IEnumerable<T> enumerable, int groupSize)
{
// The list to return.
List<T> list = new List<T>(groupSize);
// Cycle through all of the items.
foreach (T item in enumerable)
{
// Add the item.
list.Add(item);
// If the list has the number of elements, return that.
if (list.Count == groupSize)
{
// Return the list.
yield return list;
// Set the list to a new list.
list = new List<T>(groupSize);
}
}
// Return the remainder if there is any,
if (list.Count != 0)
{
// Return the list.
yield return list;
}
}
You can then call this, and since it is LINQ-enabled, you can perform other operations on the resulting sequences.
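For example, a usage sketch (assuming the method above lives on an accessible static class):

var numbers = new List<int> { 1, 2, 3, 4, 5, 6, 7 };
// Chunk into groups of 3 and sum each group: prints 6, 15, 7.
foreach (var group in GetEnumerableOfEnumerables(numbers, 3))
{
    Console.WriteLine(group.Sum());
}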
In light of Sam's answer, I felt there was an easier way to do this without:
- Iterating through the list again (which I didn't do originally)
- Materializing the items in groups before releasing the chunk (for large chunks of items, there would be memory issues)
- All of the code that Sam posted
That said, here's another pass, which I've codified in an extension method to IEnumerable<T> called Chunk:
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source,
int chunkSize)
{
// Validate parameters.
if (source == null) throw new ArgumentNullException("source");
if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize",
"The chunkSize parameter must be a positive value.");
// Call the internal implementation.
return source.ChunkInternal(chunkSize);
}
Nothing surprising up there, just basic error checking.
Moving on to ChunkInternal:
private static IEnumerable<IEnumerable<T>> ChunkInternal<T>(
this IEnumerable<T> source, int chunkSize)
{
// Validate parameters.
Debug.Assert(source != null);
Debug.Assert(chunkSize > 0);
// Get the enumerator. Dispose of when done.
using (IEnumerator<T> enumerator = source.GetEnumerator())
do
{
// Move to the next element. If there's nothing left
// then get out.
if (!enumerator.MoveNext()) yield break;
// Return the chunked sequence.
yield return ChunkSequence(enumerator, chunkSize);
} while (true);
}
Basically, it gets the IEnumerator<T> and manually iterates through each item. It checks to see if there are currently any items to be enumerated. After each chunk is enumerated through, if there aren't any items left, it breaks out.
Once it detects there are items in the sequence, it delegates the responsibility for the inner IEnumerable<T> implementation to ChunkSequence:
private static IEnumerable<T> ChunkSequence<T>(IEnumerator<T> enumerator,
int chunkSize)
{
// Validate parameters.
Debug.Assert(enumerator != null);
Debug.Assert(chunkSize > 0);
// The count.
int count = 0;
// There is at least one item. Yield and then continue.
do
{
// Yield the item.
yield return enumerator.Current;
} while (++count < chunkSize && enumerator.MoveNext());
}
Since MoveNext was already called on the IEnumerator<T> passed to ChunkSequence, it yields the item returned by Current and then increments the count, making sure never to return more than chunkSize items and moving to the next item in the sequence after every iteration (the && short-circuits, so MoveNext is not called again once chunkSize items have been yielded).
If there are no items left, then the ChunkInternal method will make another pass in the outer loop, but when MoveNext is called a second time, it will still return false, as per the documentation (emphasis mine):
If MoveNext passes the end of the collection, the enumerator is positioned after the last element in the collection and MoveNext returns false. When the enumerator is at this position, subsequent calls to MoveNext also return false until Reset is called.
At this point, the loop will break, and the sequence of sequences will terminate.
This is a simple test:
static void Main()
{
string s = "agewpsqfxyimc";
int count = 0;
// Group by three.
foreach (IEnumerable<char> g in s.Chunk(3))
{
// Print out the group.
Console.Write("Group: {0} - ", ++count);
// Print the items.
foreach (char c in g)
{
// Print the item.
Console.Write(c + ", ");
}
// Finish the line.
Console.WriteLine();
}
}
Output:
Group: 1 - a, g, e,
Group: 2 - w, p, s,
Group: 3 - q, f, x,
Group: 4 - y, i, m,
Group: 5 - c,
An important note: this will not work if you don't drain the entire child sequence, or if you break at any point in the parent sequence. This is an important caveat, but if your use case is that you will consume every element of the sequence of sequences, then this will work for you.
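For example, a call pattern like the following (a hypothetical sketch using the Chunk extension above) violates that caveat: only the first element of each chunk is read, so each inner sequence is never drained, the shared enumerator is left mid-chunk, and subsequent chunks start from the wrong position.

// Reads only the first element of each chunk; the remaining elements of each
// chunk are never consumed, so the chunk boundaries end up wrong.
var firstOfEachChunk = "agewpsqfxyimc".Chunk(3)
    .Select(g => g.First())
    .ToList();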
Additionally, it will do strange things if you play with the order, just as Sam's did at one point.
Answered by Amy B
Here's a list splitting routine I wrote a couple months ago:
public static List<List<T>> Chunk<T>(
List<T> theList,
int chunkSize
)
{
List<List<T>> result = theList
.Select((x, i) => new {
data = x,
indexgroup = i / chunkSize
})
.GroupBy(x => x.indexgroup, x => x.data)
.Select(g => new List<T>(g))
.ToList();
return result;
}
Answered by mwHymanson
We found David B's solution worked the best. But we adapted it to a more general solution:
list.GroupBy(item => item.SomeProperty)
.Select(group => new List<T>(group))
.ToArray();
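Note that this version groups by a key rather than by position. To recover the fixed-size, sequential chunking the question asks for, the grouping key can be the element index divided by the chunk size; a sketch (chunkSize and SomeObject are assumed placeholders):

List<SomeObject>[] chunks = list
    .Select((item, index) => new { item, index })
    .GroupBy(x => x.index / chunkSize)
    .Select(g => new List<SomeObject>(g.Select(x => x.item)))
    .ToArray();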
Answered by CaseyB
This question is a bit old, but I just wrote this, and I think it's a little more elegant than the other proposed solutions:
/// <summary>
/// Break a list of items into chunks of a specific size
/// </summary>
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, int chunksize)
{
while (source.Any())
{
yield return source.Take(chunksize);
source = source.Skip(chunksize);
}
}
Answered by Sam Saffron
In general the approach suggested by CaseyB works fine; in fact, if you are passing in a List<T> it is hard to fault it. Perhaps I would change it to:
public static IEnumerable<IEnumerable<T>> ChunkTrivialBetter<T>(this IEnumerable<T> source, int chunksize)
{
var pos = 0;
while (source.Skip(pos).Any())
{
yield return source.Skip(pos).Take(chunksize);
pos += chunksize;
}
}
This will avoid massive call chains. Nonetheless, this approach has a general flaw: it materializes two enumerations per chunk. To highlight the issue, try running:
foreach (var item in Enumerable.Range(1, int.MaxValue).Chunk(8).Skip(100000).First())
{
Console.WriteLine(item);
}
// wait forever
To overcome this we can try Cameron's approach, which passes the above test with flying colors as it only walks the enumeration once.
Trouble is that it has a different flaw: it materializes every item in each chunk. The problem with that approach is that you run high on memory.
To illustrate that, try running:
foreach (var item in Enumerable.Range(1, int.MaxValue)
.Select(x => x + new string('x', 100000))
.Clump(10000).Skip(100).First())
{
Console.Write('.');
}
// OutOfMemoryException
Finally, any implementation should be able to handle out of order iteration of chunks, for example:
Enumerable.Range(1,3).Chunk(2).Reverse().ToArray()
// should return [3],[1,2]
Many highly optimal solutions, like my first revision of this answer, failed there. The same issue can be seen in casperOne's optimized answer.
To address all these issues you can use the following:
namespace ChunkedEnumerator
{
public static class Extensions
{
class ChunkedEnumerable<T> : IEnumerable<T>
{
class ChildEnumerator : IEnumerator<T>
{
ChunkedEnumerable<T> parent;
int position;
bool done = false;
T current;
public ChildEnumerator(ChunkedEnumerable<T> parent)
{
this.parent = parent;
position = -1;
parent.wrapper.AddRef();
}
public T Current
{
get
{
if (position == -1 || done)
{
throw new InvalidOperationException();
}
return current;
}
}
public void Dispose()
{
if (!done)
{
done = true;
parent.wrapper.RemoveRef();
}
}
object System.Collections.IEnumerator.Current
{
get { return Current; }
}
public bool MoveNext()
{
position++;
if (position + 1 > parent.chunkSize)
{
done = true;
}
if (!done)
{
done = !parent.wrapper.Get(position + parent.start, out current);
}
return !done;
}
public void Reset()
{
// per http://msdn.microsoft.com/en-us/library/system.collections.ienumerator.reset.aspx
throw new NotSupportedException();
}
}
EnumeratorWrapper<T> wrapper;
int chunkSize;
int start;
public ChunkedEnumerable(EnumeratorWrapper<T> wrapper, int chunkSize, int start)
{
this.wrapper = wrapper;
this.chunkSize = chunkSize;
this.start = start;
}
public IEnumerator<T> GetEnumerator()
{
return new ChildEnumerator(this);
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
class EnumeratorWrapper<T>
{
public EnumeratorWrapper (IEnumerable<T> source)
{
SourceEnumerable = source;
}
IEnumerable<T> SourceEnumerable { get; set; }
Enumeration currentEnumeration;
class Enumeration
{
public IEnumerator<T> Source { get; set; }
public int Position { get; set; }
public bool AtEnd { get; set; }
}
public bool Get(int pos, out T item)
{
if (currentEnumeration != null && currentEnumeration.Position > pos)
{
currentEnumeration.Source.Dispose();
currentEnumeration = null;
}
if (currentEnumeration == null)
{
currentEnumeration = new Enumeration { Position = -1, Source = SourceEnumerable.GetEnumerator(), AtEnd = false };
}
item = default(T);
if (currentEnumeration.AtEnd)
{
return false;
}
while(currentEnumeration.Position < pos)
{
currentEnumeration.AtEnd = !currentEnumeration.Source.MoveNext();
currentEnumeration.Position++;
if (currentEnumeration.AtEnd)
{
return false;
}
}
item = currentEnumeration.Source.Current;
return true;
}
int refs = 0;
// needed for dispose semantics
public void AddRef()
{
refs++;
}
public void RemoveRef()
{
refs--;
if (refs == 0 && currentEnumeration != null)
{
var copy = currentEnumeration;
currentEnumeration = null;
copy.Source.Dispose();
}
}
}
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, int chunksize)
{
if (chunksize < 1) throw new InvalidOperationException();
var wrapper = new EnumeratorWrapper<T>(source);
int currentPos = 0;
T ignore;
try
{
wrapper.AddRef();
while (wrapper.Get(currentPos, out ignore))
{
yield return new ChunkedEnumerable<T>(wrapper, chunksize, currentPos);
currentPos += chunksize;
}
}
finally
{
wrapper.RemoveRef();
}
}
}
class Program
{
static void Main(string[] args)
{
int i = 10;
foreach (var group in Enumerable.Range(1, int.MaxValue).Skip(10000000).Chunk(3))
{
foreach (var n in group)
{
Console.Write(n);
Console.Write(" ");
}
Console.WriteLine();
if (i-- == 0) break;
}
var stuffs = Enumerable.Range(1, 10).Chunk(2).ToArray();
foreach (var idx in new [] {3,2,1})
{
Console.Write("idx " + idx + " ");
foreach (var n in stuffs[idx])
{
Console.Write(n);
Console.Write(" ");
}
Console.WriteLine();
}
/*
10000001 10000002 10000003
10000004 10000005 10000006
10000007 10000008 10000009
10000010 10000011 10000012
10000013 10000014 10000015
10000016 10000017 10000018
10000019 10000020 10000021
10000022 10000023 10000024
10000025 10000026 10000027
10000028 10000029 10000030
10000031 10000032 10000033
idx 3 7 8
idx 2 5 6
idx 1 3 4
*/
Console.ReadKey();
}
}
}
There is also a round of optimisations you could introduce for out-of-order iteration of chunks, which is out of scope here.
As to which method you should choose: it totally depends on the problem you are trying to solve. If you are not concerned with the first flaw, the simple answer is incredibly appealing.
Note: as with most methods, this is not safe for multi-threading and stuff can get weird; if you wish to make it thread safe, you would need to amend EnumeratorWrapper.
Answered by dahlbyk
System.Interactive provides Buffer() for this purpose. Some quick testing shows its performance is similar to Sam's solution.
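A usage sketch (assuming the System.Interactive / Ix NuGet package is referenced; Buffer returns an IEnumerable<IList<T>>):

// Groups of 3: [1,2,3], [4,5,6], [7,8,9], [10]
var chunks = Enumerable.Range(1, 10).Buffer(3);
foreach (var chunk in chunks)
{
    Console.WriteLine(string.Join(", ", chunk));
}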
Answered by Cameron MacFarland
I wrote a Clump extension method several years ago. Works great, and is the fastest implementation here. :P
/// <summary>
/// Clumps items into same size lots.
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="source">The source list of items.</param>
/// <param name="size">The maximum size of the clumps to make.</param>
/// <returns>A list of list of items, where each list of items is no bigger than the size given.</returns>
public static IEnumerable<IEnumerable<T>> Clump<T>(this IEnumerable<T> source, int size)
{
if (source == null)
throw new ArgumentNullException("source");
if (size < 1)
throw new ArgumentOutOfRangeException("size", "size must be greater than 0");
return ClumpIterator<T>(source, size);
}
private static IEnumerable<IEnumerable<T>> ClumpIterator<T>(IEnumerable<T> source, int size)
{
Debug.Assert(source != null, "source is null.");
T[] items = new T[size];
int count = 0;
foreach (var item in source)
{
items[count] = item;
count++;
if (count == size)
{
yield return items;
items = new T[size];
count = 0;
}
}
if (count > 0)
{
if (count == size)
yield return items;
else
{
T[] tempItems = new T[count];
Array.Copy(items, tempItems, count);
yield return tempItems;
}
}
}
Answered by Colonel Panic
We can improve @JaredPar's solution to do true lazy evaluation. We use a GroupAdjacentBy method that yields groups of consecutive elements with the same key:
sequence
.Select((x, i) => new { Value = x, Index = i })
.GroupAdjacentBy(x=>x.Index/3)
.Select(g=>g.Select(x=>x.Value))
Because the groups are yielded one-by-one, this solution works efficiently with long or infinite sequences.
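GroupAdjacentBy is not a built-in LINQ operator; a minimal sketch of such a method (an assumption about its shape, not part of the original answer) might look like:

public static IEnumerable<IEnumerable<T>> GroupAdjacentBy<T, TKey>(
    this IEnumerable<T> source, Func<T, TKey> keySelector)
{
    var comparer = EqualityComparer<TKey>.Default;
    List<T> group = null;
    TKey lastKey = default(TKey);
    foreach (var item in source)
    {
        TKey key = keySelector(item);
        if (group == null || !comparer.Equals(lastKey, key))
        {
            // The key changed: yield the previous run and start a new one.
            if (group != null) yield return group;
            group = new List<T>();
            lastKey = key;
        }
        group.Add(item);
    }
    if (group != null) yield return group;
}

Because each run is yielded as soon as its key changes, only one group is buffered at a time.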