Python: Meaning of inter_op_parallelism_threads and intra_op_parallelism_threads
Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original source address, and attribute it to the original authors (not me): StackOverflow
原文地址: http://stackoverflow.com/questions/41233635/
Meaning of inter_op_parallelism_threads and intra_op_parallelism_threads
Asked by itsamineral
Can somebody please explain the following TensorFlow terms
inter_op_parallelism_threads
intra_op_parallelism_threads
or, please, provide links to the right source of explanation.
I have conducted a few tests by changing the parameters, but the results have not been consistent to arrive at a conclusion.
Answered by mrry
The inter_op_parallelism_threads and intra_op_parallelism_threads options are documented in the source of the tf.ConfigProto protocol buffer. These options configure two thread pools used by TensorFlow to parallelize execution, as the comments describe:
// The execution of an individual op (for some op types) can be
// parallelized on a pool of intra_op_parallelism_threads.
// 0 means the system picks an appropriate number.
int32 intra_op_parallelism_threads = 2;
// Nodes that perform blocking operations are enqueued on a pool of
// inter_op_parallelism_threads available in each process.
//
// 0 means the system picks an appropriate number.
//
// Note that the first Session created in the process sets the
// number of threads for all future sessions unless use_per_session_threads is
// true or session_inter_op_thread_pool is configured.
int32 inter_op_parallelism_threads = 5;
There are several possible forms of parallelism when running a TensorFlow graph, and these options provide some control over multi-core CPU parallelism:
If you have an operation that can be parallelized internally, such as a matrix multiplication (tf.matmul()) or a reduction (e.g. tf.reduce_sum()), TensorFlow will execute it by scheduling tasks in a thread pool with intra_op_parallelism_threads threads. This configuration option therefore controls the maximum parallel speedup for a single operation. Note that if you run multiple operations in parallel, these operations will share this thread pool.

If you have many operations that are independent in your TensorFlow graph (because there is no directed path between them in the dataflow graph), TensorFlow will attempt to run them concurrently, using a thread pool with inter_op_parallelism_threads threads. If those operations have a multithreaded implementation, they will (in most cases) share the same thread pool for intra-op parallelism.
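The two-pool behavior can be pictured with a rough standard-library analogy. This is only an illustrative sketch using Python's concurrent.futures, not TensorFlow's actual scheduler: the outer pool plays the role of inter-op parallelism (independent "ops" running concurrently), while a shared inner pool plays the role of intra-op parallelism (one op splitting its own work):

```python
from concurrent.futures import ThreadPoolExecutor

INTER_OP_THREADS = 2   # max number of independent "ops" running concurrently
INTRA_OP_THREADS = 4   # max threads a single "op" may use internally

# A shared "intra-op" pool: every op draws its internal workers from here.
intra_pool = ThreadPoolExecutor(max_workers=INTRA_OP_THREADS)

def parallel_sum(chunks):
    """A toy 'op' that parallelizes its own work, like tf.reduce_sum."""
    return sum(intra_pool.map(sum, chunks))

# An "inter-op" pool that schedules independent ops concurrently.
with ThreadPoolExecutor(max_workers=INTER_OP_THREADS) as inter_pool:
    a = list(range(100))        # input to the first independent op
    b = list(range(100, 200))   # input to the second independent op
    futures = [inter_pool.submit(parallel_sum, [d[:50], d[50:]]) for d in (a, b)]
    results = [f.result() for f in futures]

print(results)  # [4950, 14950]
```

Note how both "ops" share the single intra-op pool, mirroring the caveat above that parallel operations compete for the same intra-op threads.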
Finally, both configuration options take a default value of 0, which means "the system picks an appropriate number." Currently, this means that each thread pool will have one thread per CPU core in your machine.
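Since the default of 0 currently maps to one thread per core, the size each pool would default to on a given machine can be approximated from the standard library (an estimate only; TensorFlow's own heuristic may differ):

```python
import os

# Logical CPU cores visible to this process; with the default setting of 0,
# each TensorFlow thread pool gets roughly this many threads.
default_pool_size = os.cpu_count()
print(default_pool_size)
```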
Answered by mrk
To get the best performance from a machine, change the parallelism threads and OpenMP settings as below for the TensorFlow backend (from here):
import os
import tensorflow as tf

# NUM_PARALLEL_EXEC_UNITS denotes the number of cores per socket in the machine;
# when NUM_PARALLEL_EXEC_UNITS=0 the system chooses appropriate settings.
NUM_PARALLEL_EXEC_UNITS = 4
# OpenMP settings for the MKL-enabled TensorFlow backend
os.environ["OMP_NUM_THREADS"] = str(NUM_PARALLEL_EXEC_UNITS)
os.environ["KMP_BLOCKTIME"] = "30"
os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"
config = tf.ConfigProto(intra_op_parallelism_threads=NUM_PARALLEL_EXEC_UNITS,
                        inter_op_parallelism_threads=2,
                        allow_soft_placement=True,
                        device_count={'CPU': NUM_PARALLEL_EXEC_UNITS})
session = tf.Session(config=config)
Answer to the comment below: [source]
allow_soft_placement=True
If you would like TensorFlow to automatically choose an existing and supported device to run the operations in case the specified one doesn't exist, you can set allow_soft_placement to True in the configuration option when creating the session. In other words, it controls device placement fallback; dynamic allocation of GPU memory is configured separately (via gpu_options.allow_growth).
Answered by Tensorflow Support
Tensorflow 2.0 Compatible Answer: If we want to execute in Graph Mode of Tensorflow Version 2.0, the function in which we can configure inter_op_parallelism_threads and intra_op_parallelism_threads is tf.compat.v1.ConfigProto.
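For eager mode, TensorFlow 2.x also exposes the same two settings through the tf.config.threading API. A minimal sketch (the calls must run at program startup, before any ops execute, or TensorFlow raises an error):

```python
import tensorflow as tf

# Configure both thread pools before TensorFlow initializes them.
tf.config.threading.set_intra_op_parallelism_threads(4)
tf.config.threading.set_inter_op_parallelism_threads(2)

print(tf.config.threading.get_intra_op_parallelism_threads())  # 4
print(tf.config.threading.get_inter_op_parallelism_threads())  # 2
```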