Note: this page is a translated mirror of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/2823440/

Time: 2020-09-03 14:17:50. Source: igfitidea.

Troubleshooting .NET "Fatal Execution Engine Error"

Tags: .net, .net-4.0, .net-3.5, fatal-error

Asked by JYelton

Summary:

I periodically get a .NET Fatal Execution Engine Error on an application which I cannot seem to debug. The dialog that comes up only offers to close the program or send information about the error to Microsoft. I've tried looking at the more detailed information but I don't know how to make use of it.

Error:

The error is visible in Event Viewer under Applications and is as follows:

.NET Runtime version 2.0.50727.3607 - Fatal Execution Engine Error (7A09795E) (80131506)

The computer running it is Windows XP Professional SP 3. (Intel Core2Quad Q6600 2.4GHz w/ 2.0 GB of RAM) Other .NET-based projects that lack multi-threaded downloading (see below) seem to run just fine.

Application:

The application is written in C#/.NET 3.5 using VS2008, and installed via a setup project.

The app is multi-threaded and downloads data from multiple web servers using System.Net.HttpWebRequest and its methods. I've determined that the .NET error has something to do with either threading or HttpWebRequest, but I haven't been able to get any closer, as this particular error seems impossible to debug.

I've tried handling errors on many levels, including the following in Program.cs:

// handle UI thread exceptions
Application.ThreadException += Application_ThreadException;

// handle non-UI thread exceptions
AppDomain.CurrentDomain.UnhandledException += CurrentDomain_UnhandledException;

Application.EnableVisualStyles();
Application.SetCompatibleTextRenderingDefault(false);

// force all windows forms errors to go through our handler
Application.SetUnhandledExceptionMode(UnhandledExceptionMode.CatchException);

More Notes and What I've Tried...

  • Installed Visual Studio 2008 on the target machine and tried running in debug mode, but the error still occurs, with no hint as to where in source code it occurred.
  • When running the program from its installed version (Release) the error occurs more frequently, usually within minutes of launching the application. When running the program in debug mode inside of VS2008, it can run for hours or days before generating the error.
  • Reinstalled .NET 3.5 and made sure all updates are applied.
  • Broke random cubicle objects in frustration.
  • Rewrote parts of the code that deal with threading and downloading in an attempt to catch and log exceptions, though logging seemed to aggravate the problem (and never provided any data).

Question:

What steps can I take to troubleshoot or debug this kind of error? Memory dumps and the like seem to be the next step, but I'm not experienced at interpreting them. Perhaps there's something more I can do in the code to try and catch errors... It would be nice if the "Fatal Execution Engine Error" was more informative, but internet searches have only told me that it's a common error for a lot of .NET-related items.

Answer by Hans Passant

Well, you've got a Big Problem. That exception is raised by the CLR when it detects that the garbage collected heap integrity is compromised. Heap corruption, the bane of any programmer that ever wrote code in an unmanaged language like C or C++.

Those languages make it very easy to corrupt the heap; all it takes is writing past the end of an array that's allocated on the heap. Or using memory after it has been released. Or having a bad value for a pointer. The kind of bugz that managed code was invented to solve.

But you are using managed code, judging from your question. Well, mostly: your code is managed. But you are executing lots of unmanaged code. All the low-level code that actually makes an HttpWebRequest work is unmanaged. And so is the CLR; it was written in C++, so it is technically just as likely to corrupt the heap. But after over four thousand revisions of it, and millions of programs using it, the odds that it still suffers from heap cooties are very small.

The same isn't true for all the other unmanaged code that wants a piece of HttpWebRequest. The code you don't know about because you didn't write it and isn't documented by Microsoft. Your firewall. Your virus scanner. Your company's Internet usage monitor. Lord knows whose "download accelerator".

Isolate the problem, assume it is neither your code nor Microsoft's code that causes the problem. Assume it is environmental first and get rid of the crapware.

For an epic environmental FEEE story, read this thread.
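
One way to start looking for environmental culprits is to list the native modules loaded into your own process and see which ones come from neither Windows nor your install folder. The following is only a rough sketch: the class name LoadedModuleReport and the idea of "trusted" directories are assumptions made for illustration, and the prefix match is a heuristic, not a verdict on any module.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

// Hypothetical helper, not part of the original application.
public static class LoadedModuleReport
{
    // Returns the paths of modules loaded into the current process
    // that do not live under any of the given directories.
    public static List<string> FindForeignModules(IEnumerable<string> trustedDirectories)
    {
        var trusted = new List<string>();
        foreach (string dir in trustedDirectories)
        {
            // Skip empty entries (e.g. a special folder that does not exist on this OS).
            if (!string.IsNullOrEmpty(dir))
            {
                trusted.Add(Path.GetFullPath(dir));
            }
        }

        var foreign = new List<string>();
        using (Process current = Process.GetCurrentProcess())
        {
            foreach (ProcessModule module in current.Modules)
            {
                string path = module.FileName;
                if (string.IsNullOrEmpty(path))
                {
                    continue;
                }

                bool isTrusted = false;
                foreach (string dir in trusted)
                {
                    // Simple prefix match; good enough for a first look.
                    if (path.StartsWith(dir, StringComparison.OrdinalIgnoreCase))
                    {
                        isTrusted = true;
                        break;
                    }
                }

                if (!isTrusted)
                {
                    foreign.Add(path);
                }
            }
        }

        return foreign;
    }
}
```

A call such as LoadedModuleReport.FindForeignModules(new[] { Environment.GetFolderPath(Environment.SpecialFolder.Windows), AppDomain.CurrentDomain.BaseDirectory }) would return whatever else has been loaded into the process. A DLL from a virus scanner or "download accelerator" showing up there is a hint worth following, not proof.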

Answer by ouflak

Since the previous suggestions are fairly generic in nature, I thought it might be of use to post my own battle against this exception with specific code examples, the background changes I implemented to cause this exception to occur, and how I solved it.

First, the short version: I was using an in-house dll that was written in C++ (unmanaged). I passed in an array of a specific size from my .NET executable. The unmanaged code attempted to write to an array location that was not allocated by the managed code. This corrupted memory that was later due to be garbage collected. When the garbage collector prepares to collect memory, it first checks the status of the memory (and bounds). When it finds the corruption, BOOM.

Now the TL;DR version:

I am using an unmanaged dll developed in-house, written in C++. My own GUI development is in C# .Net 4.0. I am calling a variety of those unmanaged methods. That dll effectively acts as my data source. An example extern definition from the dll:

    [DllImport(@"C:\Program Files\MyCompany\dataSource.dll",
        EntryPoint = "get_sel_list",
        CallingConvention = CallingConvention.Winapi)]
    private static extern int ExternGetSelectionList(
        uint parameterNumber,
        uint[] list,
        uint[] limits,
        ref int size);

I then wrap the methods in my own interface for use throughout my project:

    /// <summary>
    /// Get the data for a ComboBox (Drop down selection).
    /// </summary>
    /// <param name="parameterNumber"> The parameter number</param>
    /// <param name="messageList"> Message number </param>
    /// <param name="valueLimits"> The limits </param>
    /// <param name="size"> The maximum size of the memory buffer to 
    /// allocate for the data </param>
    /// <returns> 0 - If successful, something else otherwise. </returns>
    public int GetSelectionList(uint parameterNumber, 
           ref uint[] messageList, 
           ref uint[] valueLimits, 
           int size)
    {
        int returnValue = -1;
        returnValue = ExternGetSelectionList(parameterNumber,
                                         messageList, 
                                         valueLimits, 
                                         ref size);
        return returnValue;
    }
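
In hindsight, a wrapper like this could refuse undersized buffers before they ever reach the native side. The sketch below assumes the dll never writes more than a known number of entries; the constant MaxNativeEntries and the class SelectionListGuard are hypothetical names, and the real limit would have to come from whoever maintains the dll.

```csharp
using System;

// Hypothetical guard, shown only to illustrate a precondition check.
public static class SelectionListGuard
{
    // Assumption: the largest number of entries the native code may write.
    public const int MaxNativeEntries = 3;

    // Throws before the native call if either buffer is too small
    // to hold the worst-case number of entries.
    public static void EnsureCapacity(uint[] messageList, uint[] valueLimits)
    {
        if (messageList == null)
        {
            throw new ArgumentNullException("messageList");
        }

        if (valueLimits == null)
        {
            throw new ArgumentNullException("valueLimits");
        }

        if (messageList.Length < MaxNativeEntries)
        {
            throw new ArgumentException(
                "Buffer must hold at least " + MaxNativeEntries + " entries.", "messageList");
        }

        if (valueLimits.Length < MaxNativeEntries)
        {
            throw new ArgumentException(
                "Buffer must hold at least " + MaxNativeEntries + " entries.", "valueLimits");
        }
    }
}
```

Calling SelectionListGuard.EnsureCapacity(messageList, valueLimits) as the first line of GetSelectionList turns a silent heap overwrite into an ordinary managed exception with a stack trace.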

An example call of this method:

            uint[] messageList = new uint[3];
            uint[] valueLimits = new uint[3];
            uint dataReferenceParameter = 1;

            // BUFFERSIZE = 255.
            MainNavigationWindow.MainNavigationProperty.DataSourceWrapper.GetSelectionList(
                          dataReferenceParameter, 
                          ref messageList, 
                          ref valueLimits, 
                          BUFFERSIZE);

In the GUI, one navigates through different pages containing a variety of graphics and user inputs. The previous method allowed me to get the data to populate ComboBoxes. Here is my navigation setup and call as they were before this exception appeared:

In my host window, I set up a property:

    /// <summary>
    /// Gets or sets the User interface page
    /// </summary>
    internal UserInterfacePage UserInterfacePageProperty
    {
        get
        {
            if (this.userInterfacePage == null)
            {
                this.userInterfacePage = new UserInterfacePage();
            }

            return this.userInterfacePage;
        }

        set { this.userInterfacePage = value; }
    }

Then, when needed, I navigate to the page:

MainNavigationWindow.MainNavigationProperty.Navigate(
        MainNavigation.MainNavigationProperty.UserInterfacePageProperty);

Everything worked well enough, though I did have some serious creeping issues. When navigating using the object (NavigationService.Navigate Method (Object)), the default setting for the IsKeepAlive property is true. But the issue is more nefarious than that. Even if you set the IsKeepAlive value in the constructor of that page specifically to false, it is still left alone by the garbage collector as if it were true.

Now for many of my pages, this was no big deal. They had small memory footprints with not all that much going on. But many others had some large, highly detailed graphics on them for illustration purposes. It wasn't too long before normal usage of this interface by operators of our equipment caused huge allocations of memory that never cleared and eventually clogged up all the processes on the machine.

After the rush of initial development subsided from a tsunami to more of a tidal bore, I finally decided to tackle the memory leaks once and for all. I won't go into the details of all the tricks I implemented to clean up the memory (WeakReferences to images, unhooking event handlers on Unload(), using a custom timer implementing the IWeakEventListener interface, etc.). The key change I made was to navigate to the pages using the Uri instead of the object (NavigationService.Navigate Method (Uri)). There are two important differences when using this type of navigation:

  1. IsKeepAlive is set to false by default.
  2. The garbage collector now will try to clean up the navigation object as if IsKeepAlive was set to false.

So now my navigation looks like:

MainNavigation.MainNavigationProperty.Navigate(
    new Uri("/Pages/UserInterfacePage.xaml", UriKind.Relative));

Something else to note here: This not only affects how the objects are cleaned up by the garbage collector, this affects how they are initially allocated in memory, as I would soon find out.

Everything seemed to work great. My memory would quickly get cleaned up to near my initial state as I navigated through the graphics-intensive pages, until I hit this particular page with that particular call to the dataSource dll to fill in some ComboBoxes. Then I got this nasty FatalExecutionEngineError. After days of research and finding vague suggestions, or highly specific solutions that didn't apply to me, as well as unleashing just about every debugging weapon in my personal programming arsenal, I finally decided that the only way I was really going to nail this down was the extreme measure of rebuilding an exact copy of this particular page, element by element, method by method, line by line, until I finally came across the code that threw this exception. It was as tedious and painful as I'm implying, but I finally tracked it down.

It turned out to be in the way the unmanaged dll was allocating memory to write data into the arrays I was sending in for populating. That particular method would actually look at the parameter number and, from that information, allocate an array of a particular size based on the amount of data it expected to write into the array I sent in. The code that crashed:

            uint[] messageList = new uint[2];
            uint[] valueLimits = new uint[2];
            uint dataReferenceParameter = 1;

            // BUFFERSIZE = 255.
            MainNavigationWindow.MainNavigationProperty.DataSourceWrapper.GetSelectionList(
                           dataReferenceParameter, 
                           ref messageList, 
                           ref valueLimits, 
                           BUFFERSIZE);

This code might seem identical to the sample above, but it has one tiny difference. The array size I allocate is 2, not 3. I did this because I knew that this particular ComboBox would only have two selection items, as opposed to the other ComboBoxes on the page, which all had three selection items. However, the unmanaged code didn't see things the way I saw them. It got the array I handed in, and tried to write a size[3] array into my size[2] allocation, and that was it. *bang!* *crash!* I changed the allocation size to 3, and the error went away.
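
A cheap way to catch this kind of overrun while debugging is to over-allocate the array and fill the extra slots with a marker value, then check the marker after the native call returns. The sketch below is not what I did at the time; the class CanaryBuffer and the marker value are made up for illustration, and a native write that happens to store the marker value itself would go unnoticed.

```csharp
using System;

// Hypothetical debugging aid: guard slots after the usable part of a buffer.
public static class CanaryBuffer
{
    // Arbitrary marker; any value the native code is unlikely to write will do.
    private const uint Canary = 0xDEADBEEF;

    // Allocates 'usable' slots followed by 'padding' guard slots filled with the marker.
    public static uint[] Allocate(int usable, int padding)
    {
        var buffer = new uint[usable + padding];
        for (int i = usable; i < buffer.Length; i++)
        {
            buffer[i] = Canary;
        }

        return buffer;
    }

    // Returns true if every guard slot still holds the marker,
    // i.e. nothing wrote past the first 'usable' slots.
    public static bool IsIntact(uint[] buffer, int usable)
    {
        for (int i = usable; i < buffer.Length; i++)
        {
            if (buffer[i] != Canary)
            {
                return false;
            }
        }

        return true;
    }
}
```

With uint[] messageList = CanaryBuffer.Allocate(2, 4); passed to the wrapper, a native write of three entries lands in the padding instead of in a neighbouring object, and CanaryBuffer.IsIntact(messageList, 2) returns false right after the call, pointing straight at the offending method.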

Now this particular code had already been running without this error for at least a year. But the simple act of navigating to this page via a Uri as opposed to an Object caused the crash to appear. This implies that the initial object must be allocated differently because of the navigation method I used. Since with my old navigation method the memory was just piled into place and left to do with as I saw fit for eternity, it didn't seem to matter if it was a bit corrupted in one or two small locations. Once the garbage collector had to actually do something with that memory (such as clean it up), it detected the memory corruption and threw the exception. Ironically, my major memory leak was covering up a fatal memory error!

Obviously we are going to review this interface to avoid such simple assumptions causing such crashes in the future. Hope this helps guide some others to find out what's going on in their own code.

Answer by Eamon Nerbonne

A presentation that might be a nice tutorial on where to start with this kind of issue is this: Hardcore production debugging in .NET by Ingo Rammer.

I do a bit of C++/CLI coding, and heap corruption doesn't usually result in this error; usually heap corruption causes either data corruption and a subsequent normal exception, or a memory protection error - which probably doesn't mean anything.

In addition to trying .net 4.0 (which loads unmanaged code differently) you should compare x86 and x64 editions of the CLR - if possible - the x64 version has a larger address space and thus completely different malloc (+fragmentation) behavior and so you just might get lucky and have a different (more debuggable) error there (if it occurs at all).
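
When comparing runs across CLR versions and bitness, it helps to log which combination a given run actually used. A minimal sketch, assuming .NET 4.0 or later (Environment.Is64BitProcess and Environment.Is64BitOperatingSystem do not exist in 3.5, where checking IntPtr.Size is the usual fallback); the class name RuntimeInfo is just an illustration.

```csharp
using System;

// Hypothetical helper for tagging crash reports or log files.
public static class RuntimeInfo
{
    // Builds a one-line description of the runtime the process is executing on.
    public static string Describe()
    {
        return string.Format(
            "CLR {0}, {1}-bit process on {2}-bit OS, pointer size {3} bytes",
            Environment.Version,
            Environment.Is64BitProcess ? 64 : 32,
            Environment.Is64BitOperatingSystem ? 64 : 32,
            IntPtr.Size);
    }
}
```

Writing RuntimeInfo.Describe() to the log at startup makes it clear afterwards whether a crash-free run was really the x64 build or just an x86 build that got lucky.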

Also, have you turned on unmanaged code debugging in the debugger (a project option) when you run under Visual Studio? And do you have Managed Debug Assistants on?

Answer by youen

In my case I had installed an exception handler with AppDomain.CurrentDomain.FirstChanceException. This handler was logging some exceptions, and all was fine for a few years (actually this debugging code should not have stayed in production).

But following a configuration error, the logger started to fail, and the handler itself was throwing, which apparently resulted in a FatalExecutionEngineError seemingly coming from nowhere.

So anyone encountering this error could spend a few seconds searching for occurrences of FirstChanceException anywhere in the code and maybe save a few hours of head scratching :)
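
If such a handler has to stay, it should never let anything escape, and it should not re-enter itself when the logging code throws. Below is a minimal sketch of that idea, not the code we had in production; the class SafeFirstChanceLogger and its Log hook are hypothetical stand-ins for whatever logger is really in use.

```csharp
using System;
using System.Runtime.ExceptionServices;
using System.Threading;

// Hypothetical wrapper around a first-chance exception handler.
public static class SafeFirstChanceLogger
{
    // Per-thread flag: true while this thread is already inside the handler.
    [ThreadStatic]
    private static bool insideHandler;

    private static int observed;

    // Stand-in for a real logger; it is allowed to throw.
    public static Action<Exception> Log = delegate { };

    // Number of exceptions the handler has seen (excluding re-entrant ones).
    public static int Observed
    {
        get { return observed; }
    }

    public static void Install()
    {
        AppDomain.CurrentDomain.FirstChanceException += OnFirstChance;
    }

    public static void Uninstall()
    {
        AppDomain.CurrentDomain.FirstChanceException -= OnFirstChance;
    }

    private static void OnFirstChance(object sender, FirstChanceExceptionEventArgs e)
    {
        // An exception thrown by the logger raises this event again; ignore it.
        if (insideHandler)
        {
            return;
        }

        insideHandler = true;
        try
        {
            Interlocked.Increment(ref observed);
            Log(e.Exception);
        }
        catch
        {
            // Swallow everything: nothing may escape a first-chance handler.
        }
        finally
        {
            insideHandler = false;
        }
    }
}
```

With this in place, a broken logger costs you log entries rather than the whole process. Removing the handler from production builds, as noted above, is still the simpler fix.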

Answer by Mehmet ünlüel

If you are using Thread.Sleep(), that can be the reason. Unmanaged code can only be put to sleep with the kernel32 Sleep() function.