Python: Meaning of inter_op_parallelism_threads and intra_op_parallelism_threads
Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original source address, and attribute it to the original authors (not me): StackOverflow
原文地址: http://stackoverflow.com/questions/41233635/
Meaning of inter_op_parallelism_threads and intra_op_parallelism_threads
Asked by itsamineral
Can somebody please explain the following TensorFlow terms
inter_op_parallelism_threads
intra_op_parallelism_threads
or, please, provide links to the right source of explanation.
I have conducted a few tests by changing the parameters, but the results have not been consistent to arrive at a conclusion.
Answered by mrry
The inter_op_parallelism_threads and intra_op_parallelism_threads options are documented in the source of the tf.ConfigProto protocol buffer. These options configure two thread pools used by TensorFlow to parallelize execution, as the comments describe:
// The execution of an individual op (for some op types) can be
// parallelized on a pool of intra_op_parallelism_threads.
// 0 means the system picks an appropriate number.
int32 intra_op_parallelism_threads = 2;
// Nodes that perform blocking operations are enqueued on a pool of
// inter_op_parallelism_threads available in each process.
//
// 0 means the system picks an appropriate number.
//
// Note that the first Session created in the process sets the
// number of threads for all future sessions unless use_per_session_threads is
// true or session_inter_op_thread_pool is configured.
int32 inter_op_parallelism_threads = 5;
There are several possible forms of parallelism when running a TensorFlow graph, and these options provide some control over multi-core CPU parallelism:
If you have an operation that can be parallelized internally, such as a matrix multiplication (tf.matmul()) or a reduction (e.g. tf.reduce_sum()), TensorFlow will execute it by scheduling tasks in a thread pool with intra_op_parallelism_threads threads. This configuration option therefore controls the maximum parallel speedup for a single operation. Note that if you run multiple operations in parallel, these operations will share this thread pool.

If you have many operations that are independent in your TensorFlow graph (because there is no directed path between them in the dataflow graph), TensorFlow will attempt to run them concurrently, using a thread pool with inter_op_parallelism_threads threads. If those operations have a multithreaded implementation, they will (in most cases) share the same thread pool for intra-op parallelism.
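The two-pool behavior can be pictured with a rough standard-library analogy. This is only an illustrative sketch using Python's concurrent.futures, not TensorFlow's actual scheduler: the outer pool plays the role of inter-op parallelism (independent "ops" running concurrently), while a shared inner pool plays the role of intra-op parallelism (one op splitting its own work):

```python
from concurrent.futures import ThreadPoolExecutor

INTER_OP_THREADS = 2   # max number of independent "ops" running concurrently
INTRA_OP_THREADS = 4   # max threads a single "op" may use internally

# A shared "intra-op" pool: every op draws its internal workers from here.
intra_pool = ThreadPoolExecutor(max_workers=INTRA_OP_THREADS)

def parallel_sum(chunks):
    """A toy 'op' that parallelizes its own work, like tf.reduce_sum."""
    return sum(intra_pool.map(sum, chunks))

# An "inter-op" pool that schedules independent ops concurrently.
with ThreadPoolExecutor(max_workers=INTER_OP_THREADS) as inter_pool:
    a = list(range(100))        # input to the first independent op
    b = list(range(100, 200))   # input to the second independent op
    futures = [inter_pool.submit(parallel_sum, [d[:50], d[50:]]) for d in (a, b)]
    results = [f.result() for f in futures]

print(results)  # [4950, 14950]
```

Note how both "ops" share the single intra-op pool, mirroring the caveat above that parallel operations compete for the same intra-op threads.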
Finally, both configuration options take a default value of 0, which means "the system picks an appropriate number." Currently, this means that each thread pool will have one thread per CPU core in your machine.
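Since the default of 0 currently maps to one thread per core, the size each pool would default to on a given machine can be approximated from the standard library (an estimate only; TensorFlow's own heuristic may differ):

```python
import os

# Logical CPU cores visible to this process; with the default setting of 0,
# each TensorFlow thread pool gets roughly this many threads.
default_pool_size = os.cpu_count()
print(default_pool_size)
```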
Answered by mrk
To get the best performance from a machine, change the parallelism threads and OpenMP settings as below for the TensorFlow backend (from here):
import os
import tensorflow as tf

# NUM_PARALLEL_EXEC_UNITS denotes the number of cores per socket in the machine;
# when NUM_PARALLEL_EXEC_UNITS=0 the system chooses appropriate settings.
NUM_PARALLEL_EXEC_UNITS = 4
# OpenMP settings for the MKL-enabled TensorFlow backend
os.environ["OMP_NUM_THREADS"] = str(NUM_PARALLEL_EXEC_UNITS)
os.environ["KMP_BLOCKTIME"] = "30"
os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"
config = tf.ConfigProto(intra_op_parallelism_threads=NUM_PARALLEL_EXEC_UNITS,
                        inter_op_parallelism_threads=2,
                        allow_soft_placement=True,
                        device_count={'CPU': NUM_PARALLEL_EXEC_UNITS})
session = tf.Session(config=config)
Answer to the comment below: [source]
allow_soft_placement=True
If you would like TensorFlow to automatically choose an existing and supported device to run the operations in case the specified one doesn't exist, you can set allow_soft_placement to True in the configuration option when creating the session. In other words, it controls device placement fallback; dynamic allocation of GPU memory is configured separately (via gpu_options.allow_growth).
Answered by Tensorflow Support
Tensorflow 2.0 Compatible Answer: If we want to execute in Graph Mode of Tensorflow Version 2.0, the function in which we can configure inter_op_parallelism_threads and intra_op_parallelism_threads is tf.compat.v1.ConfigProto.
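For eager mode, TensorFlow 2.x also exposes the same two settings through the tf.config.threading API. A minimal sketch (the calls must run at program startup, before any ops execute, or TensorFlow raises an error):

```python
import tensorflow as tf

# Configure both thread pools before TensorFlow initializes them.
tf.config.threading.set_intra_op_parallelism_threads(4)
tf.config.threading.set_inter_op_parallelism_threads(2)

print(tf.config.threading.get_intra_op_parallelism_threads())  # 4
print(tf.config.threading.get_inter_op_parallelism_threads())  # 2
```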