Python Celery: difference between concurrency, workers and autoscaling
Note: this page mirrors a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow.
Original question: http://stackoverflow.com/questions/31898311/
Celery difference between concurrency, workers and autoscaling
Asked by Joseph
In my /etc/defaults/celeryd config file, I've set:
CELERYD_NODES="agent1 agent2 agent3 agent4 agent5 agent6 agent7 agent8"
CELERYD_OPTS="--autoscale=10,3 --concurrency=5"
I understand that the daemon spawns 8 Celery workers, but I'm not entirely sure what autoscale and concurrency do together. I thought that concurrency was a way to specify the maximum number of threads a worker can use, and that autoscale was a way for the worker to scale the number of child workers up and down as necessary.
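For concreteness, here is a minimal sketch of what each option means on its own, assuming a project module proj that exposes a Celery app named app (both names are illustrative; as far as I can tell, --autoscale takes precedence when both options are passed together):

from proj import app  # hypothetical: your Celery application instance

# Fixed pool: the worker keeps exactly 5 child processes for its lifetime.
# Note that worker_main blocks, so run one variant or the other.
app.worker_main(["worker", "--concurrency=5"])

# Elastic pool: the worker grows to at most 10 and shrinks to at least 3
# child processes depending on load.
# app.worker_main(["worker", "--autoscale=10,3"])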
The tasks have a largish payload (some 20-50 kB) and there are around 2-3 million such tasks, but each task runs in under a second. I'm seeing memory usage spike because the broker distributes the tasks to every worker, replicating the payload multiple times.
I think the issue is in the config: the combination of workers + concurrency + autoscaling is excessive, and I would like to get a better understanding of what these three options do.
Accepted answer by scytale
Let's distinguish between workers and worker processes. You spawn a Celery worker; this then spawns a number of worker processes (depending on options such as --concurrency and --autoscale; by default it spawns as many processes as there are cores on the machine). There is no point in running more than one worker on a particular machine unless you want to do routing.
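To illustrate the routing case, a hedged sketch (the broker URL, queue names, and task name are all made up) might dedicate one queue, and hence one worker, to a heavy task type:

from celery import Celery

app = Celery("proj", broker="redis://localhost:6379/0")  # illustrative broker

# Send one task type to its own queue so a dedicated worker can consume it
# (task_routes is the modern setting name; older releases used CELERY_ROUTES).
app.conf.task_routes = {
    "proj.tasks.heavy_task": {"queue": "heavy"},
}

# Each worker is then started against a different queue, e.g.:
#   celery worker -A proj -Q celery -n default@%h
#   celery worker -A proj -Q heavy  -n heavy@%h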
I would suggest running only 1 worker per machine with the default number of processes. This will reduce memory usage by eliminating the duplication of data between workers.
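Concretely, that advice would shrink the configuration from the question to something like this (a sketch; the node name is carried over from the question):

CELERYD_NODES="agent1"
# No --concurrency or --autoscale: the pool defaults to one process per core.
CELERYD_OPTS=""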
If you still have memory issues, save the data to an external store and pass only an id to the workers.
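A rough sketch of that store-and-pass-an-id pattern, assuming Redis as the intermediate store (the client, task, and key scheme are all illustrative):

import json
import uuid

import redis
from celery import Celery

app = Celery("proj", broker="redis://localhost:6379/0")  # illustrative broker
store = redis.Redis()  # assumed Redis instance used as the payload store

@app.task
def process(key):
    # Fetch the 20-50 kB payload only inside the worker process.
    payload = json.loads(store.get(key))
    ...  # do the actual work on the payload here
    store.delete(key)  # clean up once the task is done

def enqueue(payload):
    key = "payload:%s" % uuid.uuid4()
    store.set(key, json.dumps(payload), ex=3600)  # expire stale payloads
    process.delay(key)  # the queued message carries only a short key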