Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/3444190/

Date: 2020-09-15 15:00:30  Source: igfitidea

Windows SuspendThread doesn't? (GetThreadContext fails)

Tags: windows, multithreading, winapi, suspend

Asked by Ira Baxter

We have a Windows32 application in which one thread can stop another to inspect its state [PC, etc.], by doing SuspendThread/GetThreadContext/ResumeThread.

// SuspendThread returns an unsigned DWORD; failure is (DWORD)-1, so "< 0" can never be true
if (SuspendThread((HANDLE)hComputeThread[threadId]) == (DWORD)-1)  // freeze thread
   ThreadOperationFault("SuspendThread","InterruptGranule");
CONTEXT Context, *pContext;
Context.ContextFlags = (CONTEXT_INTEGER | CONTEXT_CONTROL);  // request integer and control registers
if (!GetThreadContext((HANDLE)hComputeThread[threadId],&Context))
   ThreadOperationFault("GetThreadContext","InterruptGranule");

Extremely rarely, on a multicore system, GetThreadContext returns error code 5 (Windows system error code "Access Denied").

The SuspendThread documentation seems to clearly indicate that the targeted thread is suspended, if no error is returned. We are checking the return status of SuspendThread and ResumeThread; they aren't complaining, ever.

How can it be the case that I can suspend a thread, but can't access its context?

This blog post (http://www.dcl.hpi.uni-potsdam.de/research/WRK/2009/01/what-does-suspendthread-really-do/) suggests that SuspendThread, when it returns, may have started the suspension of the other thread, but that thread hasn't yet suspended. In this case, I can kind of see how GetThreadContext would be problematic, but this seems like a stupid way to define SuspendThread. (How would the caller of SuspendThread know when the target thread was actually suspended?)

EDIT: I lied. I said this was for Windows.

Well, the strange truth is that I don't see this behavior under Windows XP 64 (at least not in the last week, and I don't really know what happened before that)... but we have been testing this Windows application under Wine on Ubuntu 10.x. The Wine source for the guts of GetThreadContext contains an Access Denied return response on line 819 when an attempt to grab the thread state fails for some reason. I'm guessing, but it appears that Wine GetThreadStatus believes that a thread just might not be accessible repeatedly. Why that would be true after a SuspendThread is beyond me, but there's the code. Thoughts?

EDIT2: I lied again. I said we only saw the behavior on Wine. Nope... we have now found a Vista Ultimate system that seems to produce the same error (again, rarely). So, it appears that Wine and Windows agree on an obscure case. It also appears that merely enabling the Sysinternals Process Monitor program aggravates the situation and causes the problem to appear on Windows XP 64; I suspect a Heisenbug. (Process Monitor doesn't even exist on the Wine-tasting (:-) machine or the XP 64 system I use for development.)

What on earth is it?

EDIT3: Sept 15 2010. I've added careful checking to the error return status, without otherwise disturbing the code, for SuspendThread, ResumeThread, and GetContext. I haven't seen any hint of this behavior on Windows systems since I did that. Haven't gotten back to the Wine experiment.
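
For readers wondering what such checking might look like, here is a minimal sketch in C. It is not the code from the application above. It relies on the documented convention that SuspendThread and ResumeThread return the previous suspend count as an unsigned DWORD, or (DWORD)-1 on failure, so a comparison such as "< 0" can never be true. The names suspend_call_failed and snapshot_thread_context are hypothetical, and the Win32 part is guarded by _WIN32 so the small helper also compiles elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* SuspendThread and ResumeThread return the previous suspend count, or
   (DWORD)-1 on failure. DWORD is a 32-bit unsigned type, so the failure
   value is 0xFFFFFFFF and the result is never negative. */
int suspend_call_failed(uint32_t result)
{
    return result == UINT32_MAX;
}

#ifdef _WIN32
/* Hypothetical wrapper: suspends the thread, reads its user-mode context,
   and resumes it again. Returns 1 on success. On failure returns 0 and
   stores the GetLastError() value of the failing call in *err. */
int snapshot_thread_context(HANDLE thread, CONTEXT *ctx, DWORD *err)
{
    int ok;
    assert(ctx != NULL && err != NULL);
    *err = ERROR_SUCCESS;

    if (suspend_call_failed(SuspendThread(thread))) {
        *err = GetLastError();
        return 0;                  /* never suspended, nothing to resume */
    }

    ctx->ContextFlags = CONTEXT_INTEGER | CONTEXT_CONTROL;
    ok = GetThreadContext(thread, ctx) != 0;
    if (!ok)
        *err = GetLastError();     /* record before making further calls */

    /* Resume even if GetThreadContext failed, so the thread is not left frozen. */
    if (suspend_call_failed(ResumeThread(thread))) {
        if (ok)
            *err = GetLastError();
        ok = 0;
    }
    return ok;
}
#endif
```

With a wrapper like this, an "Access Denied" from GetThreadContext would show up as a return of 0 with *err equal to 5, and the target thread would still be resumed.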

Nov 2010: Strange. It seems that if I compile this under Visual Studio 2005, it fails on Windows Vista and 7, but not on earlier OSes. If I compile under Visual Studio 2010, it doesn't fail anywhere. One might point a finger at Visual Studio 2005, but I'm suspicious of a location-sensitive problem, with the different optimizers in VS 2005 and VS 2010 placing the code in slightly different places.

Nov 2012: Saga continues. We see this failure on a number of XP and Windows 7 machines, at a pretty low rate (once every several thousand runs). Our Suspend activities are applied to threads that mostly execute pure computational code but that sometimes make calls into Windows. I don't recall seeing this issue when the PC of the thread was in our computational code. Of course, I can't see the PC of the thread when it hangs because GetContext won't give it to me, so I can't directly confirm that the problem only happens when executing system calls. But, all our system calls are channeled through one point, and so far the evidence is that point was executed when we get the hang. So the indirect evidence suggests GetContext on a thread only fails if a system call is being executed by that thread. I haven't had the energy to build a critical experiment to test this hypothesis yet.

Answered by Lior Kogan

Let me quote from Richter/Nasarre's "Windows via C/C++", 5th Ed., which may shed some light:

DWORD SuspendThread(HANDLE hThread);

Any thread can call this function to suspend another thread (as long as you have the thread's handle). It goes without saying (but I'll say it anyway) that a thread can suspend itself but cannot resume itself. Like ResumeThread, SuspendThread returns the thread's previous suspend count. A thread can be suspended as many as MAXIMUM_SUSPEND_COUNT times (defined as 127 in WinNT.h). Note that SuspendThread is asynchronous with respect to kernel-mode execution, but user-mode execution does not occur until the thread is resumed.

In real life, an application must be careful when it calls SuspendThread because you have no idea what the thread might be doing when you attempt to suspend it. If the thread is attempting to allocate memory from a heap, for example, the thread will have a lock on the heap. As other threads attempt to access the heap, their execution will be halted until the first thread is resumed. SuspendThread is safe only if you know exactly what the target thread is (or might be) doing and you take extreme measures to avoid problems or deadlocks caused by suspending the thread.

...

Windows actually lets you look inside a thread's kernel object and grab its current set of CPU registers. To do this, you simply call GetThreadContext:

BOOL GetThreadContext( HANDLE hThread, PCONTEXT pContext);

To call this function, just allocate a CONTEXT structure, initialize some flags (the structure's ContextFlags member) indicating which registers you want to get back, and pass the address of the structure to GetThreadContext. The function then fills in the members you've requested.

You should call SuspendThread before calling GetThreadContext; otherwise, the thread might be scheduled and the thread's context might be different from what you get back. A thread actually has two contexts: user mode and kernel mode. GetThreadContext can return only the user-mode context of a thread. If you call SuspendThread to stop a thread but that thread is currently executing in kernel mode, its user-mode context is stable even though SuspendThread hasn't actually suspended the thread yet. But the thread cannot execute any more user-mode code until it is resumed, so you can safely consider the thread suspended and GetThreadContext will work.

My guess is that GetThreadContext may fail if you just called SuspendThread, while the thread is in kernel mode, and the kernel is locking the thread context block at this time.

Maybe on multicore systems, one core is still handling the kernel-mode execution of the thread whose user-mode execution was just suspended, and it keeps the thread's CONTEXT structure locked at exactly the moment the other core calls GetThreadContext.

Since this behaviour is not documented, I suggest contacting Microsoft.

Answered by D.Shawley

There are some particular problems surrounding suspending a thread that owns a CriticalSection. I can't find a good reference to it now, but there is one mention of it on Raymond Chen's blog and another mention on Chris Brumme's blog. Basically, if you are unlucky enough to call SuspendThread while the thread is accessing an OS lock (e.g., heap lock, DllMain lock, etc.), then really strange things can happen. I would assume that this is the case that you are running into extremely rarely.

Does retrying the call to GetThreadContext work after a processor yield like Sleep(0)?
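
A retry of that kind could be sketched in C as follows. This is only an illustration under assumptions: retry_with_yield, context_request, try_get_context, yield_processor and the stand-in fail_n_times are hypothetical names, and whether retrying actually cures the failure is exactly the open question. The Win32 part is guarded by _WIN32; fail_n_times is a fake operation for trying out the loop without Windows.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* try_fn attempts the operation and returns nonzero on success;
   yield_fn gives up the rest of the caller's time slice. */
typedef int  (*try_fn)(void *arg);
typedef void (*yield_fn)(void);

/* Calls attempt(arg) up to max_attempts times, yielding between failures
   (yield may be NULL to skip yielding). Returns the 1-based number of the
   attempt that succeeded, or 0 if every attempt failed. */
int retry_with_yield(try_fn attempt, void *arg, yield_fn yield, int max_attempts)
{
    int i;
    assert(attempt != NULL && max_attempts > 0);
    for (i = 1; i <= max_attempts; ++i) {
        if (attempt(arg))
            return i;
        if (yield != NULL && i < max_attempts)
            yield();               /* let other threads make progress */
    }
    return 0;
}

/* Stand-in operation: fails while the counter behind arg is positive. */
int fail_n_times(void *arg)
{
    int *remaining = (int *)arg;
    if (*remaining > 0) {
        --*remaining;
        return 0;
    }
    return 1;
}

#ifdef _WIN32
struct context_request {
    HANDLE   thread;    /* handle of an already-suspended thread */
    CONTEXT *context;   /* ContextFlags must be set by the caller */
};

int try_get_context(void *arg)
{
    struct context_request *req = (struct context_request *)arg;
    return GetThreadContext(req->thread, req->context) != 0;
}

void yield_processor(void)
{
    Sleep(0);           /* relinquish the remainder of this time slice */
}
#endif
```

On Windows the call would look like retry_with_yield(try_get_context, &req, yield_processor, 10), made after SuspendThread has succeeded. Logging the returned attempt number would show whether a retry ever rescues a failed GetThreadContext.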

Answered by Seph

Old issue, but good to see you still kept it updated with status changes after experiencing the issue for more than another 2 years.

The cause of your problem is that there is a bug in the translation layer of the x64 version of WoW64, as per:

http://social.msdn.microsoft.com/Forums/en/windowscompatibility/thread/1558e9ca-8180-4633-a349-534e8d51cf3a

There is a rather critical bug in GetThreadContext under WoW64 which makes it return stale contents, which makes it unusable in many situations. The contents are stored in user mode. This is why you think the value is not null, but in the stale contents it is still null.

This is why it fails on newer OSes but not on older ones. Try running it on a 32-bit Windows 7 OS.

As for why this bug seems to happen less often with solutions built with Visual Studio 2010 / 2012, it is likely that the compiler is doing something that mitigates most of the problem. To find out, you should inspect the IL generated by both 2005 and 2010 and see what the differences are. For example, does the problem still happen if the project is built without optimizations?

Finally, some further reading:

http://www.nynaeve.net/?p=129

Answered by SridharKritha

Calling SuspendThread on a thread that owns a synchronization object, such as a mutex or critical section, can lead to a deadlock if the calling thread tries to obtain a synchronization object owned by a suspended thread. - MSDN

Answered by Mike

Maybe a thread safety issue. Are you sure that the hComputeThread struct isn't changing out from under you? Maybe the thread was exiting when you called suspend? This may cause suspend to succeed, but by the time you call get context it is gone and the handle is invalid.
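
One way to test this hypothesis is to ask whether the thread has already terminated before blaming GetThreadContext. A thread handle becomes signaled when its thread exits, so a zero-timeout WaitForSingleObject can serve as a probe. A minimal sketch in C, assuming hypothetical names (probe_result, classify_wait, probe_thread); the wait constants are repeated for non-Windows builds only so the classifier compiles there.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Values copied from the Win32 headers so the classifier builds elsewhere. */
#define WAIT_OBJECT_0 0x00000000UL
#define WAIT_TIMEOUT  0x00000102UL
#define WAIT_FAILED   0xFFFFFFFFUL
#endif

enum probe_result {
    PROBE_RUNNING,   /* wait timed out: the thread has not terminated */
    PROBE_EXITED,    /* handle is signaled: the thread has terminated */
    PROBE_FAILED     /* WAIT_FAILED or any unexpected value */
};

/* Maps the return value of a zero-timeout wait on a thread handle. */
enum probe_result classify_wait(unsigned long wait_result)
{
    if (wait_result == WAIT_TIMEOUT)
        return PROBE_RUNNING;
    if (wait_result == WAIT_OBJECT_0)
        return PROBE_EXITED;
    return PROBE_FAILED;
}

#ifdef _WIN32
/* The handle needs the SYNCHRONIZE access right for the wait to work. */
enum probe_result probe_thread(HANDLE thread)
{
    assert(thread != NULL);
    return classify_wait(WaitForSingleObject(thread, 0));
}
#endif
```

Calling probe_thread right after a failed GetThreadContext and logging the result would show whether the failures coincide with the thread exiting or with a bad handle.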
