What exactly is Python multiprocessing Module's .join() Method Doing?
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/25391025/
Asked by MikeiLL
Learning about Python multiprocessing (from a PMOTW article) and would love some clarification on what exactly the join() method is doing.
In an old tutorial from 2008 it states that without the p.join() call in the code below, "the child process will sit idle and not terminate, becoming a zombie you must manually kill".
from multiprocessing import Process

def say_hello(name='world'):
    print "Hello, %s" % name

p = Process(target=say_hello)
p.start()
p.join()
I added a printout of the PID as well as a time.sleep to test and, as far as I can tell, the process terminates on its own:
from multiprocessing import Process
import sys
import time

def say_hello(name='world'):
    print "Hello, %s" % name
    print 'Starting:', p.name, p.pid
    sys.stdout.flush()
    print 'Exiting :', p.name, p.pid
    sys.stdout.flush()
    time.sleep(20)

p = Process(target=say_hello)
p.start()
# no p.join()
Within 20 seconds:
936 ttys000 0:00.05 /Library/Frameworks/Python.framework/Versions/2.7/Reso
938 ttys000 0:00.00 /Library/Frameworks/Python.framework/Versions/2.7/Reso
947 ttys001 0:00.13 -bash
After 20 seconds:
947 ttys001 0:00.13 -bash
Behavior is the same with p.join() added back at the end of the file. Python Module of the Week offers a very readable explanation of the module: "To wait until a process has completed its work and exited, use the join() method." But it seems that at least OS X was doing that anyway.
I am also wondering about the name of the method. Is the .join() method concatenating anything here? Is it concatenating a process with its end? Or does it just share a name with Python's native .join() method?
Accepted answer by dano
The join() method, when used with threading or multiprocessing, is not related to str.join(); it's not actually concatenating anything together. Rather, it just means "wait for this [thread/process] to complete". The name join is used because the multiprocessing module's API is meant to look as similar as possible to the threading module's API, and the threading module uses join for its Thread object. Using the term join to mean "wait for a thread to complete" is common across many programming languages, so Python adopted it as well.
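A minimal sketch of that "wait for this process to complete" meaning, written in Python 3 syntax. The names worker and run_and_join are hypothetical and exist only for this illustration; the sleep stands in for real work.

```python
import multiprocessing
import time


def worker(seconds):
    """Hypothetical child task: sleep to simulate some work."""
    time.sleep(seconds)


def run_and_join(seconds=0.5):
    """Start one child, join it, and report its state before and after."""
    p = multiprocessing.Process(target=worker, args=(seconds,))
    p.start()
    alive_before = p.is_alive()  # normally True: the child is still sleeping
    p.join()                     # blocks here until the child has exited
    # After join() returns, the child is no longer alive and has an exit code.
    return alive_before, p.is_alive(), p.exitcode


if __name__ == "__main__":
    print(run_and_join())
```

Nothing is concatenated: join() simply blocks the caller until the child has exited, after which is_alive() is False and exitcode is set.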
Now, the reason you see the 20-second delay both with and without the call to join() is that, by default, when the main process is ready to exit, it implicitly calls join() on all running multiprocessing.Process instances. This isn't stated as clearly in the multiprocessing docs as it should be, but it is mentioned in the Programming Guidelines section:
Remember also that non-daemonic processes will be automatically be joined.
You can override this behavior by setting the daemon flag on the Process to True prior to starting the process:
p = Process(target=say_hello)
p.daemon = True
p.start()
# Both parent and child will exit here, since the main process has completed.
If you do that, the child process will be terminated as soon as the main process completes:
daemon
The process's daemon flag, a Boolean value. This must be set before start() is called.
The initial value is inherited from the creating process.
When a process exits, it attempts to terminate all of its daemonic child processes.
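The daemon behavior quoted above can be tried with a small script along these lines (Python 3 syntax). This is only a sketch: slow_worker and start_child are hypothetical names, and the 20-second sleep mirrors the delay used in the question.

```python
import multiprocessing
import time


def slow_worker(seconds=20):
    """Hypothetical long-running child task."""
    time.sleep(seconds)


def start_child(daemon):
    """Create a child process, set its daemon flag, then start it."""
    p = multiprocessing.Process(target=slow_worker)
    p.daemon = daemon  # must be set before start() is called
    p.start()
    return p


if __name__ == "__main__":
    child = start_child(daemon=True)
    print("daemon flag:", child.daemon)
    # The main process ends here. Because the child is daemonic, it is
    # terminated at exit instead of being implicitly joined.
```

Calling start_child(daemon=False) instead should bring back the roughly 20-second wait at exit, since non-daemonic children are joined automatically.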
Answer by Russell Borogove
Without the join(), the main process can complete before the child process does. I'm not sure under what circumstances that leads to zombieism.
The main purpose of join() is to ensure that a child process has completed before the main process does anything that depends on the work of the child process.
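As a rough illustration of "depends on the work of the child process" (Python 3 syntax), the sketch below has the child write into a shared multiprocessing.Value that the parent reads only after join(). The names compute and compute_in_child are hypothetical, and the summation is a stand-in for real work.

```python
import multiprocessing


def compute(result):
    """Hypothetical child task: store a computed value in shared memory."""
    result.value = sum(range(1000))


def compute_in_child():
    """Run compute() in a child process and return what it produced."""
    result = multiprocessing.Value("i", -1)  # shared C int, placeholder -1
    p = multiprocessing.Process(target=compute, args=(result,))
    p.start()
    p.join()  # without this, the parent might still read the placeholder
    return result.value


if __name__ == "__main__":
    print(compute_in_child())
```

Without the join(), the read of result.value would race against the child, and the parent could see the placeholder -1 instead of the computed sum.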
The etymology of join() is that it's the opposite of fork, which is the common term in Unix-family operating systems for creating child processes. A single process "forks" into several, then "joins" back into one.
Answer by Fred Foo
I'm not going to explain in detail what join does, but here's the etymology and the intuition behind it, which should help you remember its meaning more easily.
The idea is that execution "forks" into multiple processes of which one is the master, the rest workers (or "slaves"). When the workers are done, they "join" the master so that serial execution may be resumed.
The join method causes the master process to wait for a worker to join it. The method might better have been called "wait", since that's the actual behavior it causes in the master (and that's what it's called in POSIX, although POSIX threads call it "join" as well). The joining only occurs as an effect of the threads cooperating properly; it's not something the master does.
The names "fork" and "join" have been used with this meaning in multiprocessing since 1963.
Answer by Ani Menon
For a multiprocessing.Pool, join() is used to wait for the worker processes to exit. One must call close() or terminate() before using join().
As @Russell mentioned, join is like the opposite of fork (which spawns sub-processes).
For join() to run, you first have to call close(), which prevents any more tasks from being submitted to the pool; the workers then exit once all tasks complete. Alternatively, calling terminate() stops all worker processes immediately.
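The close()-then-join() order on a pool might look like the following sketch (Python 3 syntax). The names square and squares_with_pool are hypothetical, and the pool size of 2 is an arbitrary choice for the example.

```python
import multiprocessing


def square(x):
    """Hypothetical task executed by the pool workers."""
    return x * x


def squares_with_pool(values):
    """Submit work to a pool, then close() and join() it before reading results."""
    pool = multiprocessing.Pool(processes=2)
    async_result = pool.map_async(square, values)
    pool.close()  # no further tasks may be submitted to the pool
    pool.join()   # wait for the worker processes to finish and exit
    return async_result.get()


if __name__ == "__main__":
    print(squares_with_pool([1, 2, 3]))
```

Replacing close() with terminate() would stop the workers immediately, so outstanding tasks might never complete.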
"the child process will sit idle and not terminate, becoming a zombie you must manually kill": this is possible when the main (parent) process exits while the child process is still running, so that once the child completes it has no parent process to return its exit status to.
Answer by Yi Xiang Chong
The join() call ensures that subsequent lines of your code (in the main process) are not executed before all the multiprocessing processes have completed.
For example, without the join(), the following code will call restart_program() even before the processes finish, which is similar to asynchronous execution and is not what we want (you can try it):
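import multiprocessing

num_processes = 5
processes = []  # keep a handle on every child so it can be joined later

# calculate_stuff and restart_program are assumed to be defined elsewhere
for i in range(num_processes):
    p = multiprocessing.Process(target=calculate_stuff, args=(i,))
    p.start()
    processes.append(p)

for p in processes:
    p.join()  # call to ensure subsequent lines (e.g. restart_program)
              # are not called until all processes finish

restart_program()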

