Filling a queue and managing multiprocessing in python

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/17241663/
Asked by Tibo
I'm having this problem in python:
- I have a queue of URLs that I need to check from time to time
- if the queue is filled up, I need to process each item in the queue
- Each item in the queue must be processed by a single process (multiprocessing)
So far I managed to achieve this "manually" like this:
while 1:
    self.updateQueue()
    while not self.mainUrlQueue.empty():
        domain = self.mainUrlQueue.get()
        # if we didn't launch any process yet, we need to do so
        if len(self.jobs) < maxprocess:
            self.startJob(domain)
            #time.sleep(1)
        else:
            # If we already have processes started, we need to clear the old
            # processes from our pool and start new ones
            jobdone = 0
            # We cycle through the processes until we find a free one;
            # only then do we leave the loop
            while jobdone == 0:
                for p in self.jobs:
                    #print "entering loop"
                    # if the process finished
                    if not p.is_alive() and jobdone == 0:
                        #print str(p.pid) + " job dead, starting new one"
                        self.jobs.remove(p)
                        self.startJob(domain)
                        jobdone = 1
However, that leads to tons of problems and errors. I wondered whether I would be better off using a Pool of processes. What would be the right way to do this?
However, a lot of the time my queue is empty, and it can be filled with 300 items in a second, so I'm not too sure how to do things here.
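For comparison, the manual is_alive()/remove() bookkeeping above is roughly what multiprocessing.Pool does internally. A minimal sketch of the Pool approach, assuming a hypothetical process_domain function standing in for the asker's self.startJob:

```python
import multiprocessing

# Hypothetical stand-in for the per-domain work done by self.startJob()
def process_domain(domain):
    return len(domain)

domains = ["example.com", "example.org", "example.net"]

# The Pool caps concurrency (here at 2 workers) and recycles finished
# workers automatically, replacing the manual job-tracking loop above.
pool = multiprocessing.Pool(processes=2)
results = pool.map(process_domain, domains)
pool.close()
pool.join()
print(results)  # [11, 11, 11]
```

Note that pool.map blocks until every domain has been processed, which only fits the "process the whole queue once it is filled" part of the problem; the answer below addresses the continuously-fed case.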
Answered by Sylvain Leroux
You could use the blocking capabilities of queue to spawn multiple processes at startup (using multiprocessing.Pool) and let them sleep until some data is available on the queue to process. If you're not familiar with that, you could try to "play" with this simple program:
import multiprocessing
import os
import time

the_queue = multiprocessing.Queue()

def worker_main(queue):
    print os.getpid(), "working"
    while True:
        item = queue.get(True)
        print os.getpid(), "got", item
        time.sleep(1)  # simulate a "long" operation

the_pool = multiprocessing.Pool(3, worker_main, (the_queue,))
#                        don't forget the comma here ^

for i in range(5):
    the_queue.put("hello")
    the_queue.put("world")

time.sleep(10)
Tested with Python 2.7.3 on Linux
This will spawn 3 processes (in addition to the parent process). Each child executes the worker_main function. It is a simple loop, getting a new item from the queue on each iteration. Workers will block if nothing is ready to process.
At startup, all 3 processes will sleep until the queue is fed with some data. When data is available, one of the waiting workers gets that item and starts to process it. After that, it tries to get another item from the queue, waiting again if nothing is available...
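The same pattern can be adapted to Python 3. A sketch (not from the original answer) that also adds a None sentinel per worker, so the processes shut down cleanly instead of the parent just sleeping; the results queue is added here only to make the output observable:

```python
import multiprocessing
import os

def worker_main(queue, results):
    # Block on queue.get() until an item (or the None sentinel) arrives.
    while True:
        item = queue.get(True)
        if item is None:            # sentinel: this worker is done
            break
        results.put((os.getpid(), item))

# On Windows/macOS (spawn start method) this setup code must live under
# an `if __name__ == "__main__":` guard; with fork on Linux it is fine here.
the_queue = multiprocessing.Queue()
results = multiprocessing.Queue()

# 3 workers; each runs worker_main(the_queue, results) as its initializer
the_pool = multiprocessing.Pool(3, worker_main, (the_queue, results))

for i in range(5):
    the_queue.put("hello")
    the_queue.put("world")

for _ in range(3):                  # one sentinel per worker
    the_queue.put(None)

the_pool.close()
the_pool.join()

processed = [results.get() for _ in range(10)]
print(len(processed))  # 10
```

Because a single feeder process fills the queue, the 10 work items come out before the 3 sentinels, so every item is processed before the workers exit.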

