C#: Split a List into smaller lists of N size
Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/11463734/
Split a List into smaller lists of N size
Asked by sazr
I am attempting to split a list into a series of smaller lists.
My problem: my function doesn't split the list into sublists of the correct size. It should produce lists of size 30, but instead it produces lists of size 114.
How can I make my function split a list into X lists of size 30 or less?
    public static List<List<float[]>> splitList(List<float[]> locations, int nSize=30)
    {
        List<List<float[]>> list = new List<List<float[]>>();
        for (int i=(int)(Math.Ceiling((decimal)(locations.Count/nSize))); i>=0; i--) {
            List<float[]> subLocat = new List<float[]>(locations);
            if (subLocat.Count >= ((i*nSize)+nSize))
                subLocat.RemoveRange(i*nSize, nSize);
            else subLocat.RemoveRange(i*nSize, subLocat.Count-(i*nSize));
            Debug.Log ("Index: "+i.ToString()+", Size: "+subLocat.Count.ToString());
            list.Add (subLocat);
        }
        return list;
    }
If I use the function on a list of size 144 then the output is:
Index: 4, Size: 120
Index: 3, Size: 114
Index: 2, Size: 114
Index: 1, Size: 114
Index: 0, Size: 114
Accepted answer by Serj-Tm
    public static List<List<float[]>> SplitList(List<float[]> locations, int nSize=30)
    {
        var list = new List<List<float[]>>();
        for (int i = 0; i < locations.Count; i += nSize)
        {
            list.Add(locations.GetRange(i, Math.Min(nSize, locations.Count - i)));
        }
        return list;
    }
Generic version:
    public static IEnumerable<List<T>> SplitList<T>(List<T> locations, int nSize=30)
    {
        for (int i = 0; i < locations.Count; i += nSize)
        {
            yield return locations.GetRange(i, Math.Min(nSize, locations.Count - i));
        }
    }
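As a quick check of the generic version, here is a minimal sketch that runs it on the 144-item case from the question. The class name SplitListDemo and the helper ChunkSizes are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitListDemo
{
    // Same loop as the generic version: GetRange copies each slice into a new list.
    public static IEnumerable<List<T>> SplitList<T>(List<T> locations, int nSize = 30)
    {
        for (int i = 0; i < locations.Count; i += nSize)
        {
            yield return locations.GetRange(i, Math.Min(nSize, locations.Count - i));
        }
    }

    // Hypothetical helper for illustration: returns the size of every chunk.
    public static List<int> ChunkSizes(int itemCount, int nSize)
    {
        List<int> source = Enumerable.Range(0, itemCount).ToList();
        return SplitList(source, nSize).Select(chunk => chunk.Count).ToList();
    }
}
```

Calling SplitListDemo.ChunkSizes(144, 30) should give four chunks of 30 followed by one chunk of 24, instead of the sizes 120 and 114 reported in the question.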
Answer by Rafal
How about:
    while (locations.Any())
    {
        list.Add(locations.Take(nSize).ToList());
        locations = locations.Skip(nSize).ToList();
    }
Answer by Tianzhen Lin
I have a generic method that takes any type, including float, and it has been unit-tested. Hope it helps:
    /// <summary>
    /// Breaks the list into groups with each group containing no more than the specified group size
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="values">The values.</param>
    /// <param name="groupSize">Size of the group.</param>
    /// <param name="maxCount">Optional limit: no new group is started at or beyond this element index (not applied when all values fit in a single group).</param>
    /// <returns></returns>
    public static List<List<T>> SplitList<T>(IEnumerable<T> values, int groupSize, int? maxCount = null)
    {
        List<List<T>> result = new List<List<T>>();
        // Quick and special scenario
        if (values.Count() <= groupSize)
        {
            result.Add(values.ToList());
        }
        else
        {
            List<T> valueList = values.ToList();
            int startIndex = 0;
            int count = valueList.Count;
            int elementCount = 0;
            while (startIndex < count && (!maxCount.HasValue || (maxCount.HasValue && startIndex < maxCount)))
            {
                elementCount = (startIndex + groupSize > count) ? count - startIndex : groupSize;
                result.Add(valueList.GetRange(startIndex, elementCount));
                startIndex += elementCount;
            }
        }
        return result;
    }
Answer by Dmitry Pavlov
I would suggest using this extension method to chunk the source list into sub-lists of a specified chunk size:
    /// <summary>
    /// Helper methods for the lists.
    /// </summary>
    public static class ListExtensions
    {
        public static List<List<T>> ChunkBy<T>(this List<T> source, int chunkSize)
        {
            return source
                .Select((x, i) => new { Index = i, Value = x })
                .GroupBy(x => x.Index / chunkSize)
                .Select(x => x.Select(v => v.Value).ToList())
                .ToList();
        }
    }
For example, if you chunk a list of 18 items by 5 items per chunk, you get a list of 4 sub-lists with the following numbers of items: 5-5-5-3.
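The 18-by-5 example can be reproduced with a short sketch. The ChunkBy body is the one from this answer; the class name ChunkByDemo and the helper ChunkSizes are hypothetical and exist only for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkByDemo
{
    // Groups items by (index / chunkSize), so items 0..4 land in group 0, 5..9 in group 1, and so on.
    public static List<List<T>> ChunkBy<T>(this List<T> source, int chunkSize)
    {
        return source
            .Select((x, i) => new { Index = i, Value = x })
            .GroupBy(x => x.Index / chunkSize)
            .Select(x => x.Select(v => v.Value).ToList())
            .ToList();
    }

    // Hypothetical helper for illustration: returns the size of every chunk.
    public static List<int> ChunkSizes(int itemCount, int chunkSize)
    {
        List<int> items = Enumerable.Range(1, itemCount).ToList();
        return items.ChunkBy(chunkSize).Select(chunk => chunk.Count).ToList();
    }
}
```

ChunkByDemo.ChunkSizes(18, 5) should return the sizes 5, 5, 5, 3. Note that GroupBy has to read the whole source before the first group is available, so this approach is simple rather than lazy.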
Answer by equintas
Serj-Tm's solution is fine. Here is the generic version as an extension method for lists (put it into a static class):
    public static List<List<T>> Split<T>(this List<T> items, int sliceSize = 30)
    {
        List<List<T>> list = new List<List<T>>();
        for (int i = 0; i < items.Count; i += sliceSize)
            list.Add(items.GetRange(i, Math.Min(sliceSize, items.Count - i)));
        return list;
    }
Answer by Linas
I find the accepted answer (Serj-Tm) the most robust, but I'd like to suggest a generic version.
    public static List<List<T>> splitList<T>(List<T> locations, int nSize = 30)
    {
        var list = new List<List<T>>();
        for (int i = 0; i < locations.Count; i += nSize)
        {
            list.Add(locations.GetRange(i, Math.Min(nSize, locations.Count - i)));
        }
        return list;
    }
Answer by Sidron
The MoreLinq library has a method called Batch:
    List<int> ids = new List<int>() { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 }; // 10 elements
    int counter = 1;
    foreach(var batch in ids.Batch(2))
    {
        foreach(var eachId in batch)
        {
            Console.WriteLine("Batch: {0}, Id: {1}", counter, eachId);
        }
        counter++;
    }
The result is:
Batch: 1, Id: 1
Batch: 1, Id: 2
Batch: 2, Id: 3
Batch: 2, Id: 4
Batch: 3, Id: 5
Batch: 3, Id: 6
Batch: 4, Id: 7
Batch: 4, Id: 8
Batch: 5, Id: 9
Batch: 5, Id: 0
ids is split into 5 chunks of 2 elements each.
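If adding MoreLinq is not an option, the same batching can be sketched with Enumerable.Chunk, which is part of System.Linq from .NET 6 onward. This is not part of the original answer and assumes a .NET 6 or later target; the class name BuiltInChunkDemo and the helper DescribeBatches are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BuiltInChunkDemo
{
    // Assumes .NET 6+: Enumerable.Chunk yields arrays of at most `size` elements.
    public static List<string> DescribeBatches(IEnumerable<int> ids, int size)
    {
        var lines = new List<string>();
        int counter = 1;
        foreach (int[] batch in ids.Chunk(size))
        {
            foreach (int id in batch)
            {
                lines.Add($"Batch: {counter}, Id: {id}");
            }
            counter++;
        }
        return lines;
    }
}
```

For the ten ids used in the answer and a size of 2, DescribeBatches should produce the same ten lines as the MoreLinq output, from "Batch: 1, Id: 1" to "Batch: 5, Id: 0".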
Answer by Harald Coppoolse
Addition after a very useful comment by mhand, at the end
Original answer
Although most solutions might work, I think they are not very efficient. Suppose you only want the first few items of the first few chunks. Then you wouldn't want to iterate over all (zillion) items in your sequence.
The following will enumerate at most twice: once for the Take and once for the Skip. It won't enumerate any more elements than you will use:
    public static IEnumerable<IEnumerable<TSource>> ChunkBy<TSource>
        (this IEnumerable<TSource> source, int chunkSize)
    {
        while (source.Any())                     // while there are elements left
        {   // still something to chunk:
            yield return source.Take(chunkSize); // return a chunk of chunkSize
            source = source.Skip(chunkSize);     // skip the returned chunk
        }
    }
How many times will this enumerate the sequence?
Suppose you divide your source into chunks of chunkSize. You enumerate only the first N chunks. From every enumerated chunk you'll only enumerate the first M elements.
    while (source.Any())
    {
        ...
    }
Any() will get the enumerator, call MoveNext() once, and return the result after disposing the enumerator. This will be done N times.
    yield return source.Take(chunkSize);
According to the reference source, this will do something like:
    public static IEnumerable<TSource> Take<TSource>(this IEnumerable<TSource> source, int count)
    {
        return TakeIterator<TSource>(source, count);
    }
    static IEnumerable<TSource> TakeIterator<TSource>(IEnumerable<TSource> source, int count)
    {
        foreach (TSource element in source)
        {
            yield return element;
            if (--count == 0) break;
        }
    }
This doesn't do a lot until you start enumerating over the fetched chunk. If you fetch several chunks, but decide not to enumerate over the first chunk, the foreach is not executed, as your debugger will show you.
If you decide to take the first M elements of the first chunk, then the yield return is executed exactly M times. This means:
- get the enumerator
- call MoveNext() and Current M times
- dispose the enumerator
After the first chunk has been yield returned, we skip this first chunk:
    source = source.Skip(chunkSize);
Once again, we'll look at the reference source to find the SkipIterator:
    static IEnumerable<TSource> SkipIterator<TSource>(IEnumerable<TSource> source, int count)
    {
        using (IEnumerator<TSource> e = source.GetEnumerator())
        {
            while (count > 0 && e.MoveNext()) count--;
            if (count <= 0)
            {
                while (e.MoveNext()) yield return e.Current;
            }
        }
    }
As you see, the SkipIterator calls MoveNext() once for every element in the chunk. It doesn't call Current.
So per chunk we see that the following is done:
- Any(): GetEnumerator(); 1 MoveNext(); dispose enumerator.
- Take():
  - nothing if the content of the chunk is not enumerated;
  - if the content is enumerated: GetEnumerator(), one MoveNext() and one Current per enumerated item, dispose enumerator.
- Skip(): for every chunk that is enumerated (NOT the contents of the chunk): GetEnumerator(), MoveNext() chunkSize times, no Current! Dispose enumerator.
If you look at what happens with the enumerator, you'll see that there are a lot of calls to MoveNext(), and calls to Current only for the TSource items you actually decide to access.
If you take N chunks of size chunkSize, then the calls to MoveNext() are:
- N times for Any()
- none yet for Take(), as long as you don't enumerate the chunks
- N times chunkSize for Skip()
If you decide to enumerate only the first M elements of every fetched chunk, then you need to call MoveNext() M times per enumerated chunk.
The total:
MoveNext calls: N + N*M + N*chunkSize
Current calls: N*M; (only the items you really access)
So if you decide to enumerate all elements of all chunks:
MoveNext: numberOfChunks + all elements + all elements = about twice the sequence
Current: every item is accessed exactly once
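One way to check these totals is to count MoveNext() calls on the source directly. The sketch below wraps a sequence in a hypothetical CountingEnumerable<T> (not part of the answer) and runs the ChunkBy loop from this answer over it, enumerating every element of every chunk. Keep in mind that Skip is deferred and the loop stacks each Skip on top of the previous one, so every later Any() and Take() starts again from the beginning of the original sequence. A measured count can therefore come out well above the estimate of about twice the sequence length; the exact number depends on the LINQ implementation of the runtime you use.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: wraps a sequence and counts MoveNext() calls made on it.
public sealed class CountingEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _inner;
    public int MoveNextCalls { get; private set; }

    public CountingEnumerable(IEnumerable<T> inner) { _inner = inner; }

    public IEnumerator<T> GetEnumerator()
    {
        using (var e = _inner.GetEnumerator())
        {
            while (true)
            {
                MoveNextCalls++;               // one increment per MoveNext() from the consumer
                if (!e.MoveNext()) yield break;
                yield return e.Current;
            }
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class MoveNextCountDemo
{
    // The ChunkBy loop from the answer, unchanged.
    public static IEnumerable<IEnumerable<T>> ChunkBy<T>(this IEnumerable<T> source, int chunkSize)
    {
        while (source.Any())
        {
            yield return source.Take(chunkSize);
            source = source.Skip(chunkSize);
        }
    }

    // Enumerates all elements of all chunks and reports the MoveNext() calls on the source.
    public static int CountMoveNextCalls(int itemCount, int chunkSize)
    {
        var counted = new CountingEnumerable<int>(Enumerable.Range(0, itemCount));
        foreach (var chunk in counted.ChunkBy(chunkSize))
        {
            foreach (var item in chunk) { }
        }
        return counted.MoveNextCalls;
    }
}
```

Comparing MoveNextCountDemo.CountMoveNextCalls(100, 10) with 2 * 100 shows how far the stacked Skip calls push the real cost above the estimate for a given source and chunk size.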
Whether MoveNext is a lot of work or not depends on the type of source sequence. For lists and arrays it is a simple index increment, with maybe an out-of-range check.
But if your IEnumerable is the result of a database query, make sure that the data is really materialized on your computer, otherwise the data will be fetched several times. DbContext and Dapper will properly transfer the data to the local process before it can be accessed. If you enumerate the same sequence several times, it is not fetched several times. Dapper returns an object that is a List; DbContext remembers that the data has already been fetched.
It depends on your repository whether it is wise to call AsEnumerable() or ToList() before you start to divide the items into chunks.
Answer by mhand
While plenty of the answers above do the job, they all fail horribly on a never-ending sequence (or a really long sequence). The following is a completely on-line implementation which guarantees the best time and memory complexity possible. We iterate the source enumerable exactly once and use yield return for lazy evaluation. The consumer could throw away the list on each iteration, making the memory footprint equal to that of a list with batchSize elements.
    public static IEnumerable<List<T>> BatchBy<T>(this IEnumerable<T> enumerable, int batchSize)
    {
        using (var enumerator = enumerable.GetEnumerator())
        {
            List<T> list = null;
            while (enumerator.MoveNext())
            {
                if (list == null)
                {
                    list = new List<T> {enumerator.Current};
                }
                else if (list.Count < batchSize)
                {
                    list.Add(enumerator.Current);
                }
                else
                {
                    yield return list;
                    list = new List<T> {enumerator.Current};
                }
            }
            if (list?.Count > 0)
            {
                yield return list;
            }
        }
    }
EDIT: Just now realizing the OP asks about breaking a List<T> into smaller List<T>s, so my comments regarding infinite enumerables aren't applicable to the OP, but may help others who end up here. These comments were in response to other posted solutions that do use IEnumerable<T> as an input to their function, yet enumerate the source enumerable multiple times.
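To illustrate the point about never-ending sequences, here is a minimal sketch that feeds BatchBy an endless generator and takes only the first few batches. The BatchBy body follows this answer; NaturalNumbers, FirstBatches and the class name BatchByDemo are hypothetical names added for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchByDemo
{
    // Single-pass batching as in the answer: the source is enumerated exactly once.
    public static IEnumerable<List<T>> BatchBy<T>(this IEnumerable<T> enumerable, int batchSize)
    {
        using (var enumerator = enumerable.GetEnumerator())
        {
            List<T> list = null;
            while (enumerator.MoveNext())
            {
                if (list == null)
                {
                    list = new List<T> { enumerator.Current };
                }
                else if (list.Count < batchSize)
                {
                    list.Add(enumerator.Current);
                }
                else
                {
                    yield return list;
                    list = new List<T> { enumerator.Current };
                }
            }
            if (list?.Count > 0)
            {
                yield return list;
            }
        }
    }

    // Hypothetical endless source: 0, 1, 2, ...
    public static IEnumerable<int> NaturalNumbers()
    {
        int n = 0;
        while (true) yield return n++;
    }

    // Only the requested number of batches is ever built, so this terminates.
    public static List<List<int>> FirstBatches(int batchSize, int batchCount)
    {
        return NaturalNumbers().BatchBy(batchSize).Take(batchCount).ToList();
    }
}
```

BatchByDemo.FirstBatches(3, 2) should return the two batches 0, 1, 2 and 3, 4, 5; a version that materializes or re-enumerates the whole source would never finish on this input.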
Answer by Scott Hannen
    public static IEnumerable<IEnumerable<T>> SplitIntoSets<T>
        (this IEnumerable<T> source, int itemsPerSet)
    {
        var sourceList = source as List<T> ?? source.ToList();
        for (var index = 0; index < sourceList.Count; index += itemsPerSet)
        {
            yield return sourceList.Skip(index).Take(itemsPerSet);
        }
    }

