Original URL: http://stackoverflow.com/questions/13709626/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Split an IEnumerable<T> into fixed-sized chunks (return an IEnumerable<IEnumerable<T>> where the inner sequences are of fixed length)
Question by Alastair Maw
I want to take an IEnumerable<T> and split it up into fixed-sized chunks.
I have this, but it seems inelegant due to all the list creation/copying:
private static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items, int partitionSize)
{
    List<T> partition = new List<T>(partitionSize);
    foreach (T item in items)
    {
        partition.Add(item);
        if (partition.Count == partitionSize)
        {
            yield return partition;
            partition = new List<T>(partitionSize);
        }
    }
    // Cope with items.Count % partitionSize != 0
    if (partition.Count > 0) yield return partition;
}
Is there something more idiomatic?
EDIT: Although this has been marked as a duplicate of Divide array into an array of subsequence array, it is not: that question deals with splitting an array, whereas this one is about IEnumerable<T>. In addition, that question requires the last subsequence to be padded. The two questions are closely related, but they aren't the same.
Accepted answer by takemyoxygen
You could try to implement the Batch method mentioned above on your own, like this:
using System.Collections.Generic;

static class MyLinqExtensions
{
    public static IEnumerable<IEnumerable<T>> Batch<T>(
        this IEnumerable<T> source, int batchSize)
    {
        using (var enumerator = source.GetEnumerator())
            while (enumerator.MoveNext())
                // The first element of each batch is already Current,
                // so the inner iterator takes up to batchSize - 1 more.
                yield return YieldBatchElements(enumerator, batchSize - 1);
    }

    private static IEnumerable<T> YieldBatchElements<T>(
        IEnumerator<T> source, int batchSize)
    {
        yield return source.Current;
        for (int i = 0; i < batchSize && source.MoveNext(); i++)
            yield return source.Current;
    }
}
I've grabbed this code from http://blogs.msdn.com/b/pfxteam/archive/2012/11/16/plinq-and-int32-maxvalue.aspx.
UPDATE: Please note that this implementation lazily evaluates not only the batches but also the items inside each batch. As a result, it only produces correct results when each batch is enumerated after all previous batches have been fully enumerated. For example:
public static void Main(string[] args)
{
    var xs = Enumerable.Range(1, 20);
    Print(xs.Batch(5).Skip(1)); // should skip first batch with 5 elements
}
public static void Print<T>(IEnumerable<IEnumerable<T>> batches)
{
    foreach (var batch in batches)
    {
        Console.WriteLine($"[{string.Join(", ", batch)}]");
    }
}
will output:
[2, 3, 4, 5, 6] // only the first element is skipped, not the whole first batch
[7, 8, 9, 10, 11]
[12, 13, 14, 15, 16]
[17, 18, 19, 20]
So, if your use case processes batches strictly in sequence, the lazy solution above will work. Otherwise, if you can't guarantee strictly sequential batch processing (e.g. when you want to process batches in parallel), you will probably need a solution that eagerly enumerates each batch's contents, similar to the one in the question above or the one in MoreLINQ.
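For comparison, here is a minimal sketch of such an eager variant. `EagerBatchExtensions` and `BatchEager` are hypothetical names, not part of MoreLINQ or the BCL. Each batch is copied into its own array before it is yielded, so a batch stays correct no matter when (or whether) earlier batches are enumerated; the cost is one array allocation per batch. Argument validation is omitted to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq; // only needed for the Skip/ToList usage in the note below

static class EagerBatchExtensions
{
    // Hypothetical eager counterpart of the lazy Batch above: every batch is
    // fully materialized into an array before being handed to the caller.
    public static IEnumerable<T[]> BatchEager<T>(this IEnumerable<T> source, int batchSize)
    {
        T[] bucket = null;
        int count = 0;
        foreach (T item in source)
        {
            if (bucket == null)
                bucket = new T[batchSize];
            bucket[count++] = item;
            if (count == batchSize)
            {
                yield return bucket;
                bucket = null; // start a fresh array so callers can keep the old one
                count = 0;
            }
        }
        // The last batch may be shorter than batchSize; trim it to size.
        if (count > 0)
        {
            Array.Resize(ref bucket, count);
            yield return bucket;
        }
    }
}
```

With this version, `Enumerable.Range(1, 20).BatchEager(5).Skip(1)` skips the whole first batch and yields [6..10], [11..15] and [16..20].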
Answer by Christoffer
How about the partitioner classes in the System.Collections.Concurrent namespace?
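One way to apply those classes here is the range partitioner: `Partitioner.Create(fromInclusive, toExclusive, rangeSize)` splits an index range into `Tuple<int, int>` subranges of `rangeSize` elements (the last one may be shorter). It works on indices, so this sketch assumes the source is an `IList<T>` rather than an arbitrary `IEnumerable<T>`; `RangePartitionerChunks` and `ChunkInParallel` are hypothetical names. The chunks are built with `Parallel.ForEach`, so their order is restored afterwards.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class RangePartitionerChunks
{
    // Hypothetical helper: lets the range partitioner compute [start, end)
    // index ranges of chunkSize elements and copies each range in parallel.
    public static List<T[]> ChunkInParallel<T>(IList<T> list, int chunkSize)
    {
        if (list == null) throw new ArgumentNullException(nameof(list));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (list.Count == 0) return new List<T[]>(); // Partitioner.Create rejects an empty range

        var chunks = new ConcurrentBag<(int Start, T[] Items)>();
        Parallel.ForEach(Partitioner.Create(0, list.Count, chunkSize), range =>
        {
            // range.Item1 is inclusive, range.Item2 is exclusive.
            var items = new T[range.Item2 - range.Item1];
            for (int i = range.Item1; i < range.Item2; i++)
                items[i - range.Item1] = list[i];
            chunks.Add((range.Item1, items));
        });

        // Parallel.ForEach does not preserve order, so sort by start index.
        return chunks.OrderBy(c => c.Start).Select(c => c.Items).ToList();
    }
}
```

This only pays off when the per-chunk work is substantial; for plain sequential chunking, the iterator-based answers are simpler.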
Answer by JustAnotherUser
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items, 
                                                       int partitionSize)
{
    int i = 0;
    return items.GroupBy(x => i++ / partitionSize).ToArray();
}
Answer by L.B
Maybe?
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items, int partitionSize)
{
    return items.Select((item, inx) => new { item, inx })
                .GroupBy(x => x.inx / partitionSize)
                .Select(g => g.Select(x => x.item));
}
There is also an existing implementation: MoreLINQ's Batch.
Answer by Adam Maras
You can do this using an overload of Enumerable.GroupBy and taking advantage of integer division.
return items.Select((element, index) => new { Element = element, Index = index })
    .GroupBy(obj => obj.Index / partitionSize,
             (_, partition) => partition.Select(obj => obj.Element)); // project back to T
Answer by Tilak
For an elegant solution, you can also have a look at MoreLinq.Batch.
It batches the source sequence into sized buckets.
Example:
int[] ints = new int[] {1,2,3,4,5,6};
var batches = ints.Batch(2); // batches -> [0] : 1,2 ; [1]:3,4 ; [2] :5,6
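On .NET 6 and later, LINQ itself ships an equivalent: `Enumerable.Chunk`, which returns `IEnumerable<TSource[]>`. Each chunk is a materialized array, so it avoids the ordering pitfall of the lazy Batch in the accepted answer. A minimal sketch mirroring the example above (the `ChunkDemo` wrapper is only there for illustration):

```csharp
using System.Linq;

static class ChunkDemo
{
    // Requires .NET 6 or later, where System.Linq provides Enumerable.Chunk.
    // The last chunk holds the remainder and may be shorter than size.
    public static int[][] ChunkInts(int[] ints, int size) => ints.Chunk(size).ToArray();
}
```

For `new int[] { 1, 2, 3, 4, 5, 6 }` and a size of 2, this yields `{ 1, 2 }`, `{ 3, 4 }` and `{ 5, 6 }`.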
Answer by Sergey Teplyakov
Craziest solution (with Reactive Extensions):
using System.Reactive.Linq; // from the System.Reactive (Rx.NET) package

public static IEnumerable<IList<T>> Partition<T>(this IEnumerable<T> items, int partitionSize)
{
    return items
            .ToObservable() // Converting sequence to observable sequence
            .Buffer(partitionSize) // Splitting it into the specified "partitions"
            .ToEnumerable(); // Converting it back to ordinary sequence
}
I know that I changed the signature, but we all know that each chunk will be some fixed-size collection anyway.
BTW, if you use an iterator block, don't forget to split your implementation into two methods so that arguments are validated eagerly!
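The reason is that an iterator method's body, including any argument checks at its top, does not run until the caller starts enumerating. A minimal sketch of the split, applied to the list-based approach from the question (`PartitionExtensions` and `PartitionIterator` are illustrative names):

```csharp
using System;
using System.Collections.Generic;

static class PartitionExtensions
{
    // Public, non-iterator entry point: these checks run as soon as Partition is called.
    public static IEnumerable<IList<T>> Partition<T>(this IEnumerable<T> items, int partitionSize)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        if (partitionSize <= 0) throw new ArgumentOutOfRangeException(nameof(partitionSize));
        return PartitionIterator(items, partitionSize);
    }

    // Private iterator block: its body only runs once the caller starts enumerating.
    private static IEnumerable<IList<T>> PartitionIterator<T>(IEnumerable<T> items, int partitionSize)
    {
        var partition = new List<T>(partitionSize);
        foreach (var item in items)
        {
            partition.Add(item);
            if (partition.Count == partitionSize)
            {
                yield return partition;
                partition = new List<T>(partitionSize); // new list so yielded ones stay intact
            }
        }
        if (partition.Count > 0) yield return partition; // remainder
    }
}
```

Calling `PartitionExtensions.Partition<int>(null, 3)` now throws `ArgumentNullException` immediately, rather than on the first `MoveNext()`.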
Answer by Jeppe Stig Nielsen
It feels like you want two iterator blocks ("yield return methods"). I wrote this extension method:
using System.Collections;
using System.Collections.Generic;
using System.Linq;

static class Extensions
{
  public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items, int partitionSize)
  {
    return new PartitionHelper<T>(items, partitionSize);
  }
  private sealed class PartitionHelper<T> : IEnumerable<IEnumerable<T>>
  {
    readonly IEnumerable<T> items;
    readonly int partitionSize;
    bool hasMoreItems;
    internal PartitionHelper(IEnumerable<T> i, int ps)
    {
      items = i;
      partitionSize = ps;
    }
    public IEnumerator<IEnumerable<T>> GetEnumerator()
    {
      using (var enumerator = items.GetEnumerator())
      {
        hasMoreItems = enumerator.MoveNext();
        while (hasMoreItems)
          yield return GetNextBatch(enumerator).ToList();
      }
    }
    IEnumerable<T> GetNextBatch(IEnumerator<T> enumerator)
    {
      for (int i = 0; i < partitionSize; ++i)
      {
        yield return enumerator.Current;
        hasMoreItems = enumerator.MoveNext();
        if (!hasMoreItems)
          yield break;
      }
    }
    IEnumerator IEnumerable.GetEnumerator()
    {
      return GetEnumerator();      
    }
  }
}

