Python Celery: the difference between concurrency, workers and autoscaling

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA terms and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/31898311/

Posted: 2020-08-19 10:42:35 · Source: igfitidea

Celery difference between concurrency, workers and autoscaling

Tags: python, concurrency, celery

Asked by Joseph

In my /etc/defaults/celeryd config file, I've set:

CELERYD_NODES="agent1 agent2 agent3 agent4 agent5 agent6 agent7 agent8"
CELERYD_OPTS="--autoscale=10,3 --concurrency=5"

I understand that the daemon spawns 8 celery workers, but I'm not entirely sure what autoscale and concurrency do together. I thought that concurrency was a way to specify the maximum number of threads that a worker can use, and that autoscale was a way for the worker to scale the number of child workers up and down as necessary.

The tasks have a largish payload (some 20-50kB) and there are like 2-3 million such tasks, but each task runs in less than a second. I'm seeing memory usage spike up because the broker distributes the tasks to every worker, thus replicating the payload multiple times.

I think the issue is in the config: the combination of workers + concurrency + autoscaling is excessive, and I would like to get a better understanding of what these three options do.

Accepted answer by scytale

Let's distinguish between workers and worker processes. You spawn a celery worker; this then spawns a number of processes (depending on things like --concurrency and --autoscale; the default is to spawn as many processes as there are cores on the machine). There is no point in running more than one worker on a particular machine unless you want to do routing.

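For illustration, here is roughly how those two options look when a worker is started directly from the command line (the application name proj is just a placeholder):

celery -A proj worker --concurrency=5
celery -A proj worker --autoscale=10,3

With --concurrency the pool is a fixed 5 processes; with --autoscale the worker grows the pool up to 10 processes under load and shrinks it back down to 3 when idle.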

I would suggest running only 1 worker per machine with the default number of processes. This will reduce memory usage by eliminating the duplication of data between workers.

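Following that suggestion, a minimal /etc/defaults/celeryd might look like the sketch below: a single node, and no explicit --concurrency or --autoscale, so the pool defaults to one process per CPU core:

CELERYD_NODES="agent1"
CELERYD_OPTS=""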

If you still have memory issues, save the data to a store and pass only an id to the workers.

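As a rough sketch of that idea in Python (the application name, broker URL and the fetch_payload helper are placeholders; substitute your own store):

from celery import Celery

app = Celery('proj', broker='amqp://localhost')  # placeholder app name and broker

def fetch_payload(record_id):
    # Hypothetical helper: load the 20-50kB payload from a shared
    # database or cache instead of sending it through the broker.
    raise NotImplementedError

@app.task
def process_record(record_id):
    # Only the small id travels in the task message; the payload is
    # fetched from the store inside the worker process.
    payload = fetch_payload(record_id)
    # ... process payload ...

# Callers enqueue just the id, keeping broker messages small:
# process_record.delay(12345)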