Python: How to get the currently available GPUs in TensorFlow?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/38559755/
Asked by Sangwon Kim
I have a plan to use distributed TensorFlow, and I saw that TensorFlow can use GPUs for training and testing. In a cluster environment, each machine could have 0, 1, or more GPUs, and I want to run my TensorFlow graph on GPUs on as many machines as possible.
I found that when running tf.Session(), TensorFlow gives information about the GPU in log messages like the ones below:
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
My question is: how do I get information about the currently available GPUs from TensorFlow? I can get the loaded GPU information from the log, but I want to do it in a more sophisticated, programmatic way. I could also restrict the visible GPUs intentionally using the CUDA_VISIBLE_DEVICES environment variable, so I don't want a way of getting GPU information from the OS kernel.
In short, I want a function like tf.get_available_gpus() that will return ['/gpu:0', '/gpu:1'] if there are two GPUs available in the machine. How can I implement this?
Answered by mrry
There is an undocumented method called device_lib.list_local_devices() that enables you to list the devices available in the local process. (N.B. As an undocumented method, this is subject to backwards-incompatible changes.) The function returns a list of DeviceAttributes protocol buffer objects. You can extract a list of string device names for the GPU devices as follows:
from tensorflow.python.client import device_lib

def get_available_gpus():
    # List all devices visible to the local process and keep only the GPUs.
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']
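For example (my note: the exact name format depends on the TensorFlow version; older releases report names like '/gpu:0', while newer ones report '/device:GPU:0'):

print(get_available_gpus())
# e.g. ['/device:GPU:0', '/device:GPU:1'] on a machine with two GPUs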
Note that (at least up to TensorFlow 1.4), calling device_lib.list_local_devices() will run some initialization code that, by default, allocates all of the GPU memory on all of the devices (GitHub issue). To avoid this, first create a session with an explicitly small per_process_gpu_memory_fraction, or with allow_growth=True, to prevent all of the memory from being allocated. See this question for more details.
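A minimal sketch of that workaround (my addition, using the TF 1.x ConfigProto API):

import tensorflow as tf
from tensorflow.python.client import device_lib

# Create the session first so device initialization honors these GPU options.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
# Alternatively: config.gpu_options.per_process_gpu_memory_fraction = 0.1
sess = tf.Session(config=config)

gpu_names = [x.name for x in device_lib.list_local_devices()
             if x.device_type == 'GPU']
print(gpu_names)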
Answered by hyun woo Cho
You can check the full device list using the following code:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
Answered by Soham Bhattacharyya
There is also a method in the test utilities, so all you have to do is:
tf.test.is_gpu_available()
and/or
tf.test.gpu_device_name()
Look up the TensorFlow docs for the available arguments.
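A quick usage sketch (my addition; note that tf.test.is_gpu_available is deprecated in TF 2.x in favor of tf.config.list_physical_devices('GPU')):

import tensorflow as tf

if tf.test.is_gpu_available():
    # Returns the name of the default GPU device, e.g. '/device:GPU:0'.
    print("Default GPU device:", tf.test.gpu_device_name())
else:
    print("No GPU available")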
Answered by MiniQuark
In TensorFlow 2.0, you can use tf.config.experimental.list_physical_devices('GPU'):
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    print("Name:", gpu.name, " Type:", gpu.device_type)
If you have two GPUs installed, it outputs this:
Name: /physical_device:GPU:0 Type: GPU
Name: /physical_device:GPU:1 Type: GPU
From 2.1, you can drop experimental:
gpus = tf.config.list_physical_devices('GPU')
See: https://www.tensorflow.org/api_docs/python/tf/config/list_physical_devices
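As a usage sketch (my addition, assuming TF >= 2.1), you can pin work to the first detected GPU:

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    with tf.device('/GPU:0'):  # logical device name of the first GPU
        x = tf.random.uniform((1000, 1000))
        y = tf.matmul(x, x)
    print("Computed on", y.device)
else:
    print("No GPU detected; running on CPU")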
Answered by mamad amin
The accepted answer gives you the number of GPUs, but it also allocates all of the memory on those GPUs, which may be unwanted for some applications. You can avoid this by creating a session with a fixed, lower memory limit before calling device_lib.list_local_devices().
I ended up using nvidia-smi to get the number of GPUs without allocating any memory on them.
import subprocess
n = str(subprocess.check_output(["nvidia-smi", "-L"])).count('UUID')
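A slightly more defensive variant (my addition) decodes the output properly and returns 0 when nvidia-smi is not installed:

import subprocess

def count_gpus():
    # Count GPUs listed by `nvidia-smi -L` without touching GPU memory.
    try:
        out = subprocess.check_output(["nvidia-smi", "-L"], encoding="utf-8")
    except (OSError, subprocess.CalledProcessError):
        return 0  # nvidia-smi missing or failed: assume no usable GPUs
    return sum(1 for line in out.splitlines() if line.startswith("GPU "))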
Answered by Salvador Dali
Apart from the excellent explanation by mrry, where he suggested using device_lib.list_local_devices(), I can show you how you can check for GPU-related information from the command line.
Because currently only Nvidia's GPUs work for NN frameworks, the answer covers only them. Nvidia has a page where they document how you can use the /proc filesystem interface to obtain run-time information about the driver, any installed NVIDIA graphics cards, and the AGP status.
/proc/driver/nvidia/gpus/0..N/information
Provides information about each of the installed NVIDIA graphics adapters (model name, IRQ, BIOS version, bus type). Note that the BIOS version is only available while X is running.
So you can run cat /proc/driver/nvidia/gpus/0/information from the command line and see information about your first GPU. It is easy to run this from Python, and you can also check the second, third, fourth GPU until it fails.
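A small Linux-only sketch of that idea (my addition; on newer drivers the per-GPU directories may be named by PCI bus ID rather than 0..N, so this simply lists whatever is present):

import os

def read_nvidia_proc_info():
    # Read /proc/driver/nvidia/gpus/*/information for each installed GPU.
    base = "/proc/driver/nvidia/gpus"
    if not os.path.isdir(base):
        return []  # the Nvidia driver's /proc interface is not present
    infos = []
    for gpu_dir in sorted(os.listdir(base)):
        path = os.path.join(base, gpu_dir, "information")
        if os.path.isfile(path):
            with open(path) as f:
                infos.append(f.read())
    return infos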
Definitely mrry's answer is more robust, and I am not sure whether my answer will work on a non-Linux machine, but Nvidia's page provides other interesting information that not many people know about.
Answered by Mike Gates
The following works in TensorFlow 2:
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    print("Name:", gpu.name, " Type:", gpu.device_type)
From 2.1, you can drop experimental:
gpus = tf.config.list_physical_devices('GPU')
https://www.tensorflow.org/api_docs/python/tf/config/list_physical_devices
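Related (my addition): tf.config.list_logical_devices('GPU'), also available from 2.1, lists the logical devices that device strings such as '/GPU:0' refer to; with virtual-device configuration, one physical GPU can back several logical ones:

import tensorflow as tf

physical = tf.config.list_physical_devices('GPU')
logical = tf.config.list_logical_devices('GPU')
print(len(physical), "physical GPUs,", len(logical), "logical GPUs")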
Answered by Arash Hatami
Use the following snippet to check all the parts of your setup:
from __future__ import absolute_import, division, print_function, unicode_literals

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds  # imported to verify the package is installed

version = tf.__version__
executing_eagerly = tf.executing_eagerly()
hub_version = hub.__version__
available = tf.config.experimental.list_physical_devices("GPU")

print("Version: ", version)
print("Eager mode: ", executing_eagerly)
print("Hub Version: ", hub_version)
print("GPU is", "available" if available else "NOT AVAILABLE")
Answered by lakshmikandan
Ensure you have the latest TensorFlow 2.x GPU build installed on a machine with a supported GPU, then execute the following code in Python:
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
You will get output that looks like this:
2020-02-07 10:45:37.587838: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2020-02-07 10:45:37.588896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1, 2, 3, 4, 5, 6, 7 Num GPUs Available: 8
Answered by Hafizur Rahman
I have an NVIDIA GeForce GTX 1650 Ti in my system, with tensorflow-gpu==2.2.0 installed:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
Num GPUs Available: 1