Python: Meaning of inter_op_parallelism_threads and intra_op_parallelism_threads

Notice: this page is a translation of a popular StackOverflow question and answer, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/41233635/

Date: 2020-08-20 00:39:21  Source: igfitidea

Meaning of inter_op_parallelism_threads and intra_op_parallelism_threads

Tags: python · parallel-processing · tensorflow · distributed-computing

Asked by itsamineral

Can somebody please explain the following TensorFlow terms


  1. inter_op_parallelism_threads

  2. intra_op_parallelism_threads


or, please, provide links to the right source of explanation.


I have conducted a few tests by changing the parameters, but the results have not been consistent to arrive at a conclusion.


Answered by mrry

The inter_op_parallelism_threads and intra_op_parallelism_threads options are documented in the source of the tf.ConfigProto protocol buffer. These options configure two thread pools that TensorFlow uses to parallelize execution, as the comments describe:


// The execution of an individual op (for some op types) can be
// parallelized on a pool of intra_op_parallelism_threads.
// 0 means the system picks an appropriate number.
int32 intra_op_parallelism_threads = 2;

// Nodes that perform blocking operations are enqueued on a pool of
// inter_op_parallelism_threads available in each process.
//
// 0 means the system picks an appropriate number.
//
// Note that the first Session created in the process sets the
// number of threads for all future sessions unless use_per_session_threads is
// true or session_inter_op_thread_pool is configured.
int32 inter_op_parallelism_threads = 5;

There are several possible forms of parallelism when running a TensorFlow graph, and these options provide some control over multi-core CPU parallelism:


  • If you have an operation that can be parallelized internally, such as matrix multiplication (tf.matmul()) or a reduction (e.g. tf.reduce_sum()), TensorFlow will execute it by scheduling tasks in a thread pool with intra_op_parallelism_threads threads. This configuration option therefore controls the maximum parallel speedup for a single operation. Note that if you run multiple operations in parallel, these operations will share this thread pool.

  • If you have many operations that are independent in your TensorFlow graph—because there is no directed path between them in the dataflow graph—TensorFlow will attempt to run them concurrently, using a thread pool with inter_op_parallelism_threads threads. If those operations have a multithreaded implementation, they will (in most cases) share the same thread pool for intra-op parallelism.

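The two pools can be pictured with a plain-Python analogy (this is illustrative stdlib code, not TensorFlow's actual implementation): an "intra-op" pool parallelizes the work inside one operation, while an "inter-op" pool runs independent operations concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative analogy, NOT TensorFlow internals: the pool sizes mirror
# intra_op_parallelism_threads and inter_op_parallelism_threads.
INTRA_OP_THREADS = 4
INTER_OP_THREADS = 2

intra_pool = ThreadPoolExecutor(max_workers=INTRA_OP_THREADS)

def reduce_sum(data):
    # One "op" parallelized internally: each intra-op thread sums a chunk.
    chunk = max(1, len(data) // INTRA_OP_THREADS)
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    return sum(intra_pool.map(sum, chunks))

# Two independent "ops" (no dataflow path between them) are scheduled
# concurrently on the inter-op pool and share the single intra-op pool.
with ThreadPoolExecutor(max_workers=INTER_OP_THREADS) as inter_pool:
    a = inter_pool.submit(reduce_sum, list(range(1000)))
    b = inter_pool.submit(reduce_sum, list(range(2000)))
```

As in TensorFlow, making the intra-op pool larger speeds up a single operation, while the inter-op pool only helps when the graph contains independent operations.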

Finally, both configuration options take a default value of 0, which means "the system picks an appropriate number." Currently, this means that each thread pool will have one thread per CPU core in your machine.


Answered by mrk

To get the best performance from a machine, change the parallelism threads and OpenMP settings as below for the TensorFlow backend (from here):


import tensorflow as tf

# NUM_PARALLEL_EXEC_UNITS denotes the number of cores per socket in the
# machine; 0 lets the system choose appropriate settings.
NUM_PARALLEL_EXEC_UNITS = 4  # illustrative value -- set to your core count

config = tf.ConfigProto(intra_op_parallelism_threads=NUM_PARALLEL_EXEC_UNITS,
                        inter_op_parallelism_threads=2,
                        allow_soft_placement=True,
                        device_count={'CPU': NUM_PARALLEL_EXEC_UNITS})

session = tf.Session(config=config)
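The performance guide this answer draws on also tunes OpenMP via environment variables. A hedged sketch (the values are illustrative; these variables only take effect when TensorFlow is built with Intel MKL support, and must be set before TensorFlow initializes):

```python
import os

# Assumed: NUM_PARALLEL_EXEC_UNITS is the number of cores per socket,
# matching the ConfigProto setting above.
NUM_PARALLEL_EXEC_UNITS = 4  # illustrative value

# OpenMP / Intel MKL thread settings, set before importing TensorFlow.
os.environ["OMP_NUM_THREADS"] = str(NUM_PARALLEL_EXEC_UNITS)
os.environ["KMP_BLOCKTIME"] = "30"
os.environ["KMP_SETTINGS"] = "1"
os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"
```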

Answer to the comment below: [source]


allow_soft_placement=True

If you would like TensorFlow to automatically choose an existing and supported device to run the operations in case the specified one doesn't exist, you can set allow_soft_placement to True in the configuration option when creating the session. In simple words, it lets TensorFlow fall back to a supported device (such as the CPU) instead of raising an error. (Dynamic GPU memory allocation is a separate option, gpu_options.allow_growth, not allow_soft_placement.)


Answered by Tensorflow Support

Tensorflow 2.0 Compatible Answer: If we want to execute in Graph Mode of TensorFlow 2.0, the class through which we can configure inter_op_parallelism_threads and intra_op_parallelism_threads is


tf.compat.v1.ConfigProto.

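For the default eager mode of TF 2.x, the equivalent settings live in the tf.config.threading module. A configuration sketch (the thread counts are illustrative; these calls must run at program startup, before the runtime creates its thread pools):

```python
import tensorflow as tf

# Must be called before any ops execute, i.e. before the runtime's
# thread pools are created; later calls raise an error.
tf.config.threading.set_intra_op_parallelism_threads(4)  # illustrative value
tf.config.threading.set_inter_op_parallelism_threads(2)  # illustrative value
```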