Source: Stack Overflow question, http://stackoverflow.com/questions/10871412/. The content is licensed under CC BY-SA 4.0; if you reuse it, attribute it to the original authors and keep the same license.

Resetting GPU and driver after CUDA error

Tags: windows, cuda, gpu

Asked by Roger Dahl

Sometimes, bugs in my CUDA programs cause the desktop graphics to break (in Windows). Typically, the screen remains somewhat readable, but when graphics change, such as when dragging a window, lots of semi-random colored pixels and small blocks appear.

I have tried to reset the GPU and driver by changing the desktop resolution, but that doesn't help. The only fix I have found is to reboot the computer.

Is there a program out there or some trick I can use to get the driver and GPU to reset without rebooting?

Background:

I have had cards of compute capability 1.0, 1.1, 1.3 and 2.0, but I only have a 1.1 and a 2.0 card now. I've seen the issue on 1.0 and 1.1, and I'm pretty sure I've seen it on 1.3. I'm unsure about 2.0. Did memory protection get added some time around 1.3? I am almost sure it's not due to unstable hardware, as the problems seemed to be triggered by bugs in my code and disappeared when the bugs were fixed. When running finished code, the cards have been stable. I wrote this question after seeing the issue on my 1.1 card, but it disappeared after I fixed a bug, and now I don't have any code that reproduces it. Maybe I should try to write to random locations on the 1.1 card and see if anything happens...

Accepted answer by harrism

Edit:

If you are on Tesla hardware on Linux and can run nvidia-smi, then you can reset the GPU using

nvidia-smi -r

or

nvidia-smi --gpu-reset

Here is the man output for this switch:

Resets GPU state. Can be used to clear double bit ECC errors or recover hung GPU. Requires -i switch to target specific device. Available on Linux only.
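
Since the switch requires -i, a full invocation has to name the device explicitly. Below is a minimal sketch, assuming GPU index 0 is the one to reset and that nothing is still using it (nvidia-smi typically refuses the reset otherwise, and usually needs root). The reset_gpu function and the DRY_RUN variable are illustrative names, not part of nvidia-smi.

```shell
#!/bin/sh
# Hypothetical helper: reset one GPU by index with nvidia-smi.
# DRY_RUN=1 prints the command instead of running it.
reset_gpu() {
    idx="$1"
    # Accept only a non-negative integer index.
    case "$idx" in
        ''|*[!0-9]*)
            echo "usage: reset_gpu <gpu-index>" >&2
            return 2
            ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "sudo nvidia-smi --gpu-reset -i $idx"
    else
        # Expected to fail if a process still holds the GPU.
        sudo nvidia-smi --gpu-reset -i "$idx"
    fi
}
```

Calling reset_gpu 0 would attempt the reset on the first device; nvidia-smi -L lists the available indices.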

Otherwise...

The way to truly reset the hardware is to reboot.

What you describe shouldn't happen. I recommend testing with different hardware and letting us know if it still occurs.

Answer by fraank

The same problem sometimes occurs on Unix, and Google forwarded me to this thread, so I hope this helps somebody else.

On Ubuntu, unloading and reloading the nvidia kernel module solved the problem for me:

sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm
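
The two commands above can be wrapped so that the reload only runs if the unload succeeded. This is a minimal sketch, assuming the module is named nvidia_uvm as in this answer and that no process is still holding it (rmmod is expected to fail otherwise). The reload_module function and the DRY_RUN variable are illustrative names.

```shell
#!/bin/sh
# Hypothetical helper: unload a kernel module, then load it again.
# DRY_RUN=1 prints the commands instead of running them.
reload_module() {
    mod="${1:-nvidia_uvm}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "sudo rmmod $mod"
        echo "sudo modprobe $mod"
        return 0
    fi
    # Stop here if the unload fails, e.g. because the module is still in use.
    sudo rmmod "$mod" || return 1
    sudo modprobe "$mod"
}
```

Running reload_module with no argument targets nvidia_uvm; passing a different module name is possible but untested here.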

Answer by jorge

I have a GeForce GTX 260 with the NVIDIA GPU SDK 4.2 and I am experiencing the same problems. Sometimes, while developing, I have bugs in my programs. These cause the screen to show the random colored pixels described in this post.

As stated here, they do not disappear if I change the resolution. However, if I only change the colour depth from 32 to 16 bits, the random colored pixels disappear, but going back to 32 bits (without rebooting) makes them appear again. The last bug that caused this behaviour was using __constant__ memory but passing it to the kernel as a pointer:

test<<<grid, threadsPerBlock>>>( cuda_malloc_data, cuda_constant_data );

If I do not pass cuda_constant_data, then there is no bug (and consequently, the random coloured pixels do not appear).

Answer by Matija Grcic

To reset the graphics stack in Windows, press Win+Ctrl+Shift+B.

Answer by Ava Assadi

  1. In Device Manager, under "Display adapters", find the driver.
  2. Disable it.
  3. Press Win+Ctrl+Shift+B (the monitor will blink).
  4. Enable the driver.

There you go.