C#: How to loop through an IEnumerable in batches

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/15414347/

Time: 2020-08-10 16:45:20  Source: igfitidea

How to loop through IEnumerable in batches

Tags: c#, ienumerable

Asked by user1526912

I am developing a C# program which has an "IEnumerable users" that stores the ids of 4 million users. I need to loop through the IEnumerable and extract a batch of 1000 ids each time to perform some operations in another method.

How do I extract 1000 ids at a time from the start of the IEnumerable, do something else, then fetch the next batch of 1000, and so on?

Is this possible?

Accepted answer by Bill

Sounds like you need to use the Skip and Take methods of your object. Example:

users.Skip(1000).Take(1000)

This would skip the first 1000 and take the next 1000. You'd just need to increase the amount skipped with each call.

You could pass an integer variable as the argument to Skip to control how much is skipped, and wrap the call in a method:

public IEnumerable<user> GetBatch(int pageNumber)
{
    return users.Skip(pageNumber * 1000).Take(1000);
}
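
As a rough sketch of how this kind of paging could drive the whole loop: keep requesting pages until one comes back empty. The Paging class, the int ids, the batchSize parameter and the 2500-item sample source are assumptions made for illustration, not part of the answer. Note that on a plain lazy IEnumerable each Skip call walks the source again from the start, so the total work grows quickly with the number of batches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    // Hypothetical helper: returns the zero-based page `pageNumber` of `source`.
    public static IEnumerable<int> GetBatch(IEnumerable<int> source, int pageNumber, int batchSize)
    {
        return source.Skip(pageNumber * batchSize).Take(batchSize);
    }

    // Collects the size of every batch, stopping at the first empty page.
    public static List<int> BatchSizes(IEnumerable<int> source, int batchSize)
    {
        var sizes = new List<int>();
        for (int page = 0; ; page++)
        {
            // Materialise the page so it is enumerated only once.
            List<int> batch = GetBatch(source, page, batchSize).ToList();
            if (batch.Count == 0)
                break;

            sizes.Add(batch.Count); // stand-in for the real per-batch work
        }
        return sizes;
    }

    public static void Main()
    {
        IEnumerable<int> users = Enumerable.Range(1, 2500); // stand-in for the real ids
        Console.WriteLine(string.Join(", ", BatchSizes(users, 1000))); // prints "1000, 1000, 500"
    }
}
```

If the source is expensive to enumerate (a database cursor, a generator), a single-pass approach that buffers items as it goes avoids the repeated skipping.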

Answer by p.s.w.g

The easiest way to do this is probably just to use the GroupBy method in LINQ:

var batches = myEnumerable
    .Select((x, i) => new { x, i })
    .GroupBy(p => p.i / 1000, p => p.x);
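
To make the index-based grouping concrete, here is a small sketch on 7 items with a batch size of 3, using the GroupBy(keySelector, elementSelector) overload. The GroupByBatching class, the ToBatches name and the sample data are made up for illustration. Integer division of the index gives the batch number, so indexes 0 to 2 land in batch 0, 3 to 5 in batch 1, and so on. Keep in mind that GroupBy reads the whole source before it yields the first group, which may matter for 4 million items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupByBatching
{
    // Hypothetical helper: splits `source` into lists of at most `batchSize` items.
    public static List<List<T>> ToBatches<T>(IEnumerable<T> source, int batchSize)
    {
        return source
            .Select((item, index) => new { item, index })   // pair each item with its position
            .GroupBy(p => p.index / batchSize, p => p.item) // key = batch number, element = item
            .Select(g => g.ToList())
            .ToList();
    }

    public static void Main()
    {
        var letters = new[] { "a", "b", "c", "d", "e", "f", "g" };
        foreach (List<string> batch in ToBatches(letters, 3))
        {
            Console.WriteLine(string.Join(", ", batch));
        }
        // prints:
        // a, b, c
        // d, e, f
        // g
    }
}
```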

But for a more sophisticated solution, see this blog post on how to create your own extension method to do this. Duplicated here for posterity:

public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> collection, int batchSize)
{
    List<T> nextbatch = new List<T>(batchSize);
    foreach (T item in collection)
    {
        nextbatch.Add(item);
        if (nextbatch.Count == batchSize)
        {
            yield return nextbatch;
            // Allocate a fresh list instead of calling nextbatch.Clear():
            // the caller may still hold a reference to the list just yielded.
            nextbatch = new List<T>(batchSize);
        }
    }

    if (nextbatch.Count > 0)
        yield return nextbatch;
}
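
The choice between a new list and Clear() is easier to see with a small experiment. The sketch below (class and method names are mine, not from the blog post) runs two variants over the numbers 1 to 5 with a batch size of 2 and collects the batches with ToList() before looking at them. The variant that reuses one list ends up with three references to the same list, holding only what was added last.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchPitfall
{
    // Allocates a new list for every batch.
    public static IEnumerable<List<T>> BatchFresh<T>(IEnumerable<T> source, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        if (batch.Count > 0)
            yield return batch;
    }

    // Reuses one list via Clear(); shown only to illustrate the pitfall.
    public static IEnumerable<List<T>> BatchReused<T>(IEnumerable<T> source, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch.Clear(); // also empties the list the caller just received
            }
        }
        if (batch.Count > 0)
            yield return batch;
    }

    public static void Main()
    {
        var data = Enumerable.Range(1, 5);
        var fresh = BatchFresh(data, 2).ToList();
        var reused = BatchReused(data, 2).ToList();

        Console.WriteLine(string.Join(" | ", fresh.Select(b => string.Join(",", b))));  // prints "1,2 | 3,4 | 5"
        Console.WriteLine(string.Join(" | ", reused.Select(b => string.Join(",", b)))); // prints "5 | 5 | 5"
    }
}
```

Reusing the list is only safe when every batch is fully consumed before the next one is requested, as in a plain foreach that does not store the batch.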

Answer by Krishnaswamy Subramanian

You can achieve that using the Take and Skip Enumerable extension methods. For more information on usage, check out LINQ 101.

Answer by Sergey Berezovskiy

You can use MoreLINQ's Batch operator (available from NuGet):

foreach(IEnumerable<User> batch in users.Batch(1000))
   // use batch


If simply using the library is not an option, you can reuse its implementation:

public static IEnumerable<IEnumerable<T>> Batch<T>(
        this IEnumerable<T> source, int size)
{
    T[] bucket = null;
    var count = 0;

    foreach (var item in source)
    {
       if (bucket == null)
           bucket = new T[size];

       bucket[count++] = item;

       if (count != size)                
          continue;

       yield return bucket.Select(x => x);

       bucket = null;
       count = 0;
    }

    // Return the last bucket with all remaining elements
    if (bucket != null && count > 0)
    {
        Array.Resize(ref bucket, count);
        yield return bucket.Select(x => x);
    }
}

BTW, for performance you can simply return bucket without calling Select(x => x). Select is optimized for arrays, but the selector delegate would still be invoked on each item. So, in your case it's better to use:

yield return bucket;

Answer by Aghilas Yakoub

You can use the LINQ Take operator.

Link: http://msdn.microsoft.com/fr-fr/library/vstudio/bb503062.aspx

Answer by Zaki

Try using this:

public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(
    this IEnumerable<TSource> source,
    int batchSize)
{
    var batch = new List<TSource>();
    foreach (var item in source)
    {
        batch.Add(item);
        if (batch.Count == batchSize)
        {
            yield return batch;
            batch = new List<TSource>();
        }
    }

    if (batch.Any()) yield return batch;
}

And to use the above function:

foreach (var list in Users.Batch(1000))
{
    // work with each batch of up to 1000 users here
}

Answer by JLRishe

Something like this would work:

List<MyClass> batch = new List<MyClass>();
foreach (MyClass item in items)
{
    batch.Add(item);

    if (batch.Count == 1000)
    {
        // Perform operation on batch
        batch.Clear();
    }
}

// Process last batch
if (batch.Any())
{
    // Perform operation on batch
}

And you could generalize this into a generic method, like this:

static void PerformBatchedOperation<T>(IEnumerable<T> items, 
                                       Action<IEnumerable<T>> operation, 
                                       int batchSize)
{
    List<T> batch = new List<T>();
    foreach (T item in items)
    {
        batch.Add(item);

        if (batch.Count == batchSize)
        {
            operation(batch);
            batch.Clear();
        }
    }

    // Process last batch
    if (batch.Any())
    {
        operation(batch);
    }
}
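
A possible way to call the generic method, as a minimal sketch: sum each batch of a small sequence. The BatchedOperationDemo class, the SumPerBatch helper and the sample numbers are assumptions for illustration; PerformBatchedOperation is copied from above so the example compiles on its own. Because the method reuses one list and clears it after each call, the operation should finish with the batch (or copy it) before returning.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchedOperationDemo
{
    // Copied from the answer above.
    static void PerformBatchedOperation<T>(IEnumerable<T> items,
                                           Action<IEnumerable<T>> operation,
                                           int batchSize)
    {
        List<T> batch = new List<T>();
        foreach (T item in items)
        {
            batch.Add(item);

            if (batch.Count == batchSize)
            {
                operation(batch);
                batch.Clear();
            }
        }

        // Process last batch
        if (batch.Any())
        {
            operation(batch);
        }
    }

    // Hypothetical helper: returns the sum of each batch, in order.
    public static List<int> SumPerBatch(IEnumerable<int> items, int batchSize)
    {
        var sums = new List<int>();
        // The lambda reads the batch immediately, so reusing the list is safe here.
        PerformBatchedOperation(items, batch => sums.Add(batch.Sum()), batchSize);
        return sums;
    }

    public static void Main()
    {
        // Batches of 1..10 with size 4 are {1,2,3,4}, {5,6,7,8}, {9,10}.
        Console.WriteLine(string.Join(", ", SumPerBatch(Enumerable.Range(1, 10), 4))); // prints "10, 26, 19"
    }
}
```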

Answer by Kabindas

How about

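int batchSize = 5;
List<string> collection = new List<string> { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12" };
// Number of batches, rounded up so the final partial batch is included.
int batchCount = (int)Math.Ceiling((decimal)collection.Count / batchSize);
for (int x = 0; x < batchCount; x++)
{
    var t = collection.Skip(x * batchSize).Take(batchSize);
    // process batch t here
}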
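
For the 12-item list above with a batch size of 5, the loop runs ceil(12 / 5) = 3 times and produces batches of 5, 5 and 2 items. The sketch below makes that visible by joining each batch into a string. The CeilingPaging class and Describe method are made-up names, and the batch count uses integer arithmetic, (count + size - 1) / size, as a swapped-in alternative to the Math.Ceiling call with a decimal cast.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CeilingPaging
{
    // Hypothetical helper: returns one comma-joined string per batch.
    public static List<string> Describe(List<string> collection, int batchSize)
    {
        var lines = new List<string>();

        // Integer ceiling: rounds up without converting to decimal.
        int batchCount = (collection.Count + batchSize - 1) / batchSize;

        for (int x = 0; x < batchCount; x++)
        {
            IEnumerable<string> batch = collection.Skip(x * batchSize).Take(batchSize);
            lines.Add(string.Join(",", batch));
        }
        return lines;
    }

    public static void Main()
    {
        var collection = new List<string> { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12" };
        foreach (string line in Describe(collection, 5))
        {
            Console.WriteLine(line);
        }
        // prints:
        // 1,2,3,4,5
        // 6,7,8,9,10
        // 11,12
    }
}
```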

Answer by Mihai Ciureanu

In a streaming context, the enumerator may block in the middle of a batch simply because the next value has not been produced (yielded) yet. In that case it is useful to have a timeout, so that a partial batch is emitted after a given time instead of waiting for it to fill. I used this, for example, for tailing a cursor in MongoDB. It's a little bit complicated, because the enumeration has to be done in another thread.

    public static IEnumerable<List<T>> TimedBatch<T>(this IEnumerable<T> collection, double timeoutMilliseconds, long maxItems)
    {
        object _lock = new object();
        List<T> batch = new List<T>();
        AutoResetEvent yieldEventTriggered = new AutoResetEvent(false);
        AutoResetEvent yieldEventFinished = new AutoResetEvent(false);
        bool yieldEventTriggering = false; 

        var task = Task.Run(delegate
        {
            foreach (T item in collection)
            {
                lock (_lock)
                {
                    batch.Add(item);

                    if (batch.Count == maxItems)
                    {
                        yieldEventTriggering = true;
                        yieldEventTriggered.Set();
                    }
                }

                if (yieldEventTriggering)
                {
                    yieldEventFinished.WaitOne(); //wait for the yield to finish, and batch to be cleaned 
                    yieldEventTriggering = false;
                }
            }
        });

        while (!task.IsCompleted)
        {
            //Wait for the event to be triggered, or the timeout to finish
            yieldEventTriggered.WaitOne(TimeSpan.FromMilliseconds(timeoutMilliseconds));
            lock (_lock)
            {
                if (batch.Count > 0) //yield return only if the batch accumulated something
                {
                    yield return batch;
                    batch.Clear();
                    yieldEventFinished.Set();
                }
            }
        }
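        task.Wait();

        // The task can finish right after the last check inside the loop,
        // so flush any items that were added but not yet yielded.
        if (batch.Count > 0)
            yield return batch;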
    }