Python Celery: difference between concurrency, workers and autoscaling
Note: this page mirrors a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow.
Original question: http://stackoverflow.com/questions/31898311/
Celery difference between concurrency, workers and autoscaling
Asked by Joseph
In my /etc/defaults/celeryd config file, I've set:
CELERYD_NODES="agent1 agent2 agent3 agent4 agent5 agent6 agent7 agent8"
CELERYD_OPTS="--autoscale=10,3 --concurrency=5"
I understand that the daemon spawns 8 Celery workers, but I'm not entirely sure what autoscale and concurrency do together. I thought that concurrency was a way to specify the maximum number of threads a worker can use, and that autoscale was a way for the worker to scale the number of child workers up and down as necessary.
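For concreteness, here is a minimal sketch of what each option means on its own, assuming a project module proj that exposes a Celery app named app (both names are illustrative; as far as I can tell, --autoscale takes precedence when both options are passed together):

from proj import app  # hypothetical: your Celery application instance

# Fixed pool: the worker keeps exactly 5 child processes for its lifetime.
# Note that worker_main blocks, so run one variant or the other.
app.worker_main(["worker", "--concurrency=5"])

# Elastic pool: the worker grows to at most 10 and shrinks to at least 3
# child processes depending on load.
# app.worker_main(["worker", "--autoscale=10,3"])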
The tasks have a largish payload (some 20-50 kB) and there are around 2-3 million such tasks, but each task runs in under a second. I'm seeing memory usage spike because the broker distributes the tasks to every worker, replicating the payload multiple times.
I think the issue is in the config: the combination of workers + concurrency + autoscaling is excessive, and I would like to get a better understanding of what these three options do.
Accepted answer by scytale
Let's distinguish between workers and worker processes. You spawn a Celery worker; this then spawns a number of worker processes (depending on options such as --concurrency and --autoscale; by default it spawns as many processes as there are cores on the machine). There is no point in running more than one worker on a particular machine unless you want to do routing.
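To illustrate the routing case, a hedged sketch (the broker URL, queue names, and task name are all made up) might dedicate one queue, and hence one worker, to a heavy task type:

from celery import Celery

app = Celery("proj", broker="redis://localhost:6379/0")  # illustrative broker

# Send one task type to its own queue so a dedicated worker can consume it
# (task_routes is the modern setting name; older releases used CELERY_ROUTES).
app.conf.task_routes = {
    "proj.tasks.heavy_task": {"queue": "heavy"},
}

# Each worker is then started against a different queue, e.g.:
#   celery worker -A proj -Q celery -n default@%h
#   celery worker -A proj -Q heavy  -n heavy@%h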
I would suggest running only 1 worker per machine with the default number of processes. This will reduce memory usage by eliminating the duplication of data between workers.
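Concretely, that advice would shrink the configuration from the question to something like this (a sketch; the node name is carried over from the question):

CELERYD_NODES="agent1"
# No --concurrency or --autoscale: the pool defaults to one process per core.
CELERYD_OPTS=""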
If you still have memory issues, save the data to an external store and pass only an id to the workers.
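A rough sketch of that store-and-pass-an-id pattern, assuming Redis as the intermediate store (the client, task, and key scheme are all illustrative):

import json
import uuid

import redis
from celery import Celery

app = Celery("proj", broker="redis://localhost:6379/0")  # illustrative broker
store = redis.Redis()  # assumed Redis instance used as the payload store

@app.task
def process(key):
    # Fetch the 20-50 kB payload only inside the worker process.
    payload = json.loads(store.get(key))
    ...  # do the actual work on the payload here
    store.delete(key)  # clean up once the task is done

def enqueue(payload):
    key = "payload:%s" % uuid.uuid4()
    store.set(key, json.dumps(payload), ex=3600)  # expire stale payloads
    process.delay(key)  # the queued message carries only a short key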