Entity framework large data set, out of memory exception

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/18169859/



c# entity-framework

Asked by Mike Norgate

I am working with a very large data set, roughly 2 million records. I have the code below, but I get an out of memory exception after it has processed around three batches, about 600,000 records. I understand that as it loops through each batch, entity framework lazy loads, and it then tries to build up the full 2 million records in memory. Is there any way to unload a batch once I've processed it?


ModelContext dbContext = new ModelContext();
IEnumerable<IEnumerable<Town>> towns = dbContext.Towns.OrderBy(t => t.TownID).Batch(200000);
foreach (var batch in towns)
{
    SearchClient.Instance.IndexMany(batch, SearchClient.Instance.Settings.DefaultIndex, "Town", new SimpleBulkParameters() { Refresh = false });
}

Note: The Batch method comes from this project: https://code.google.com/p/morelinq/


The search client is this: https://github.com/Mpdreamz/NEST


Accepted answer by Not loved

The issue is that when you get data from EF there are actually two copies of the data created: one which is returned to the user, and a second which EF holds onto and uses for change detection (so that it can persist changes to the database). EF holds this second set for the lifetime of the context, and it's this set that's running you out of memory.


You have two options to deal with this:


  1. Renew your context each batch
  2. Use .AsNoTracking() in your query, e.g.:

    IEnumerable<IEnumerable<Town>> towns = dbContext.Towns.AsNoTracking().OrderBy(t => t.TownID).Batch(200000);
    
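For the first option, here is a rough sketch of what "renew your context each batch" could look like, reusing `ModelContext`, `SearchClient` and the query from the question. The Skip/Take paging is my assumption, standing in for MoreLinq's `Batch()`, since each fresh context needs to issue its own query:

```csharp
// Sketch: dispose and recreate the context per batch so EF's change-tracker
// copies are released between batches. Skip/Take paging is assumed here.
const int batchSize = 200000;
int processed = 0;
while (true)
{
    using (var dbContext = new ModelContext())
    {
        var batch = dbContext.Towns
            .OrderBy(t => t.TownID)
            .Skip(processed)
            .Take(batchSize)
            .ToList();   // materialise before the context is disposed

        if (batch.Count == 0)
            break;

        SearchClient.Instance.IndexMany(batch,
            SearchClient.Instance.Settings.DefaultIndex, "Town",
            new SimpleBulkParameters() { Refresh = false });

        processed += batch.Count;
    }
}
```

Each batch's tracked entities die with its context, so memory stays roughly flat at one batch's worth of data.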

This tells EF not to keep a copy for change detection. You can read a little more about what AsNoTracking does and its performance impact on my blog: http://blog.staticvoid.co.nz/2012/4/2/entity_framework_and_asnotracking


Answered by Wolfgang Grinfeld

I wrote a migration routine that reads from one DB and writes (with minor changes in layout) into another DB (of a different type), and in this case renewing the connection for each batch and using AsNoTracking() did not cut it for me.


Note that this problem occurs using a '98 version of JET. It may work flawlessly with other DBs.


However, the following algorithm did solve the Out-of-memory issue:


  • Use one connection for reading and one for writing/updating
  • Read with AsNoTracking()
  • Every 50 rows or so written/updated, check the memory usage, and recover memory + reset the output DB context (and connected tables) as needed:

    // Check the process's current virtual memory footprint
    var before = System.Diagnostics.Process.GetCurrentProcess().VirtualMemorySize64;
    if (before > 800000000) // ~800 MB threshold
    {
        dbcontextOut.SaveChanges();   // flush pending writes
        dbcontextOut.Dispose();       // release the tracked entities
        GC.Collect();
        GC.WaitForPendingFinalizers();
        dbcontextOut = dbcontextOutFunc();                           // fresh output context
        tableOut = Dynamic.InvokeGet(dbcontextOut, outputTableName); // re-resolve the output table
    }
    