Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/4095940/

Time: 2020-08-18 14:14:19  Source: igfitidea

Running "unique" tasks with celery

Tags: python, django, celery

Asked by Luper Rouch

I use celery to update RSS feeds in my news aggregation site. I use one @task for each feed, and things seem to work nicely.

There's one detail that I'm not sure I handle well, though: all feeds are updated once every minute with a @periodic_task, but what if a feed is still updating from the last periodic task when a new one is started? (For example, if the feed is really slow, or offline and the task is held in a retry loop.)

Currently I store tasks results and check their status like this:

import socket
from datetime import timedelta
from celery.decorators import task, periodic_task
from aggregator.models import Feed


_results = {}


@periodic_task(run_every=timedelta(minutes=1))
def fetch_articles():
    for feed in Feed.objects.all():
        if feed.pk in _results:
            if not _results[feed.pk].ready():
                # The task is not finished yet
                continue
        _results[feed.pk] = update_feed.delay(feed)


@task()
def update_feed(feed):
    try:
        feed.fetch_articles()
    except socket.error as exc:
        update_feed.retry(args=[feed], exc=exc)

Maybe there is a more sophisticated/robust way of achieving the same result using some celery mechanism that I missed?

Accepted answer by MattH

From the official documentation: Ensuring a task is only executed one at a time.

Answer by SteveJ

Based on MattH's answer, you could use a decorator like this:

import functools

from django.core.cache import cache


def single_instance_task(timeout):
    def task_exc(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            lock_id = "celery-single-instance-" + func.__name__
            # cache.add() only succeeds if the key does not exist yet,
            # so it doubles as the lock acquisition.
            acquire_lock = lambda: cache.add(lock_id, "true", timeout)
            release_lock = lambda: cache.delete(lock_id)
            if acquire_lock():
                try:
                    func(*args, **kwargs)
                finally:
                    release_lock()
        return wrapper
    return task_exc

Then use it like so:

@periodic_task(run_every=timedelta(minutes=1))
@single_instance_task(60*10)
def fetch_articles():
    pass  # yada yada...
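
The decorator above keys the lock on the function name only, so it guards the task as a whole. For the per-feed case in the question, the lock key would presumably also need to include the task arguments. Here is a minimal sketch of that idea. `InMemoryCache` is a hypothetical stand-in that mimics only the `add`/`delete` semantics of Django's cache so the example runs on its own, and `single_instance_per_args` and the `update_feed` body are illustrative names, not part of Celery or Django.

```python
import functools
import hashlib
import threading
import time


class InMemoryCache:
    """Hypothetical stand-in for django.core.cache.cache (add/delete only)."""

    def __init__(self):
        self._data = {}
        self._guard = threading.Lock()

    def add(self, key, value, timeout):
        # Like Django's cache.add: store only if the key is absent or expired.
        now = time.monotonic()
        with self._guard:
            entry = self._data.get(key)
            if entry is not None and entry[1] > now:
                return False
            self._data[key] = (value, now + timeout)
            return True

    def delete(self, key):
        with self._guard:
            self._data.pop(key, None)


cache = InMemoryCache()  # assumption: use a cache shared by all workers in real use


def single_instance_per_args(timeout):
    """Skip a call while another call with the same arguments holds the lock."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Hash the arguments so each distinct call gets its own lock key.
            raw = repr((args, sorted(kwargs.items()))).encode("utf-8")
            lock_id = "single-instance-{}-{}".format(
                func.__name__, hashlib.md5(raw).hexdigest())
            if not cache.add(lock_id, "true", timeout):
                return None  # the same call is still running elsewhere
            try:
                return func(*args, **kwargs)
            finally:
                cache.delete(lock_id)
        return wrapper
    return decorator


calls = []


@single_instance_per_args(timeout=600)
def update_feed(feed_id):
    calls.append(feed_id)
    if feed_id == 1 and calls.count(1) == 1:
        update_feed(1)  # same argument while the lock is held: skipped
        update_feed(2)  # different argument, different lock: runs
    return feed_id


update_feed(1)
print(calls)
```

Passing a primary key instead of a model instance keeps the lock key stable. The lock only protects across workers if the cache backend is shared between them (for example memcached or Redis); a per-process cache like the stand-in here would not be enough.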

Answer by keithl8041

If you're looking for an example that doesn't use Django, then try this example (caveat: it uses Redis instead, which I was already using).

The decorator code is as follows (full credit to the author of the article, go read it):

import redis

REDIS_CLIENT = redis.Redis()

def only_one(function=None, key="", timeout=None):
    """Enforce only one celery task at a time."""

    def _dec(run_func):
        """Decorator."""

        def _caller(*args, **kwargs):
            """Caller."""
            ret_value = None
            have_lock = False
            lock = REDIS_CLIENT.lock(key, timeout=timeout)
            try:
                have_lock = lock.acquire(blocking=False)
                if have_lock:
                    ret_value = run_func(*args, **kwargs)
            finally:
                if have_lock:
                    lock.release()

            return ret_value

        return _caller

    return _dec(function) if function is not None else _dec

Answer by user12397901

This solution is for celery running on a single host with concurrency greater than 1. Other kinds of locks that avoid dependencies like Redis, apart from file-based ones, don't work with concurrency greater than 1.

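from datetime import datetime, timedelta
from fcntl import flock, LOCK_EX, LOCK_NB
from hashlib import md5
from os.path import join
from time import sleep

# Legacy class-based task API (Celery 3.x and earlier).
from celery.task import PeriodicTask


class Lock(object):
    def __init__(self, filename):
        self.f = open(filename, 'w')

    def __enter__(self):
        try:
            # Non-blocking exclusive lock: fails at once if already held.
            flock(self.f.fileno(), LOCK_EX | LOCK_NB)
            return True
        except IOError:
            pass
        return False

    def __exit__(self, *args):
        # Closing the file also releases the lock.
        self.f.close()


class SinglePeriodicTask(PeriodicTask):
    abstract = True
    run_every = timedelta(seconds=1)

    def __call__(self, *args, **kwargs):
        # One lock file per task name.
        lock_filename = join('/tmp',
                             md5(self.name.encode('utf-8')).hexdigest())
        with Lock(lock_filename) as is_locked:
            if is_locked:
                super(SinglePeriodicTask, self).__call__(*args, **kwargs)
            else:
                print('already working')


class SearchTask(SinglePeriodicTask):
    restart_delay = timedelta(seconds=60)

    def run(self, *args, **kwargs):
        print(self.name, 'start', datetime.now())
        sleep(5)
        print(self.name, 'end', datetime.now())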
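
To see the locking primitive on its own, without Celery, here is a small sketch of the non-blocking `flock` behaviour the `Lock` class relies on. It assumes a Unix-like system (the `fcntl` module is not available on Windows); `try_lock` and the lock file name are hypothetical names chosen for this illustration.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open file holding an exclusive flock, or None if it is taken."""
    f = open(path, "w")
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:  # raised when another open file already holds the lock
        f.close()
        return None
    return f


lock_path = os.path.join(tempfile.gettempdir(),
                         "flock-demo-%d.lock" % os.getpid())

first = try_lock(lock_path)    # acquires the lock
second = try_lock(lock_path)   # a separate open() of the same file conflicts
first.close()                  # closing the file releases the lock
third = try_lock(lock_path)    # succeeds now that the lock is free
third.close()
os.remove(lock_path)

print(first is not None, second is None, third is not None)
```

Because the kernel drops the lock when the file is closed or the process exits, a crashed worker should not leave a stale lock behind, which is one reason no timeout is needed here. The lock is only visible on the machine that holds the file, hence the single-host restriction.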

Answer by vdboor

Using https://pypi.python.org/pypi/celery_once seems to do the job really nicely, including reporting errors and testing against some parameters for uniqueness.

You can do things like:

from celery_once import QueueOnce
from myapp.celery import app
from time import sleep

@app.task(base=QueueOnce, once=dict(keys=('customer_id',)))
def start_billing(customer_id, year, month):
    sleep(30)
    return "Done!"

which just needs the following settings in your project:

ONCE_REDIS_URL = 'redis://localhost:6379/0'
ONCE_DEFAULT_TIMEOUT = 60 * 60  # remove lock after 1 hour in case it was stale