
Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/43524457/


Airflow: Tasks queued but not running

Tags: postgresql, rabbitmq, celery, airflow, airbnb

Asked by Deepak S

I am new to Airflow and trying to set it up to run ETL pipelines. I was able to install:

  1. airflow
  2. postgres
  3. celery
  4. rabbitmq

I am able to test-run the tutorial DAG. When I try to schedule the jobs, the scheduler picks them up and queues them, which I can see in the UI, but the tasks never run. Could somebody help me fix this issue? I believe I am missing a basic Airflow concept here.

Here is my config file:


[core]
airflow_home = /root/airflow
dags_folder = /root/airflow/dags
base_log_folder = /root/airflow/logs
executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://xxxx.amazonaws.com:5432/airflow
api_client = airflow.api.client.local_client

[webserver]
web_server_host = 0.0.0.0
web_server_port = 8080
web_server_worker_timeout = 120
worker_refresh_batch_size = 1
worker_refresh_interval = 30

[celery]
celery_app_name = airflow.executors.celery_executor
celeryd_concurrency = 16
worker_log_server_port = 8793
broker_url = amqp://rabbit:[email protected]/rabbitmq_vhost
celery_result_backend = db+postgresql+psycopg2://postgres:[email protected]:5432/airflow
flower_host = 0.0.0.0
flower_port = 5555
default_queue = default
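Before troubleshooting the scheduler itself, one quick sanity check is to confirm that the broker URL above is actually reachable. Below is a minimal sketch using kombu, the messaging library Celery is built on; the URL is the redacted one from the config, so substitute your real credentials and host:

from kombu import Connection

# Broker URL copied from airflow.cfg above (credentials and host are
# redacted in the original post; replace them with your real values).
conn = Connection('amqp://rabbit:[email protected]/rabbitmq_vhost')
conn.connect()  # raises an exception if RabbitMQ is unreachable or auth fails
print('broker connection OK')
conn.release()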

DAG: This is the tutorial DAG I used,

and the start date for my DAG is: 'start_date': datetime(2017, 4, 11).
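For context, this is roughly what the relevant part of the Airflow 1.x tutorial DAG looks like; only the start_date value is taken verbatim from the question, the rest is a trimmed sketch of the shipped example:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2017, 4, 11),  # the start date quoted above
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG('tutorial', default_args=default_args,
          schedule_interval=timedelta(days=1))

t1 = BashOperator(task_id='print_date', bash_command='date', dag=dag)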

Answered by Xia Wang

Have you run all three components of Airflow, namely:

airflow webserver
airflow scheduler
airflow worker

If you only run the first two, the tasks will be queued but not executed. airflow worker provides the Celery workers that actually execute the DAGs.
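All three can run on the same machine. With the Airflow 1.x CLI you can keep each one in its own terminal, or daemonize them with the -D flag, for example:

airflow webserver -D
airflow scheduler -D
airflow worker -D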

Also, note that celery 4.0.2 is currently not compatible with Airflow 1.7 or 1.8. Use celery 3 instead.
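For example, downgrading with pip (celery 3.1.x was the series commonly paired with Airflow 1.7/1.8):

pip install 'celery<4'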

Answered by Olaf

I tried to upgrade to Airflow v1.8 today as well and struggled with celery and rabbitmq. What helped was switching from librabbitmq (which is used by default with the plain amqp scheme) to pyamqp in airflow.cfg:

broker_url = pyamqp://rabbit:[email protected]/rabbitmq_vhost

(This is where I got the idea from: https://github.com/celery/celery/issues/3675)
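An equivalent workaround, if you prefer to keep the amqp:// scheme, is to remove the C extension entirely; Celery falls back to the pure-Python py-amqp transport when librabbitmq is not installed:

pip uninstall librabbitmq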

Answered by Davos

I realise your problem is already answered and was related to a celery version mismatch, but I've also seen tasks queue and never run because I changed the logs location to a place where the airflow service user did not have permission to write.


In the example airflow.cfg given in the question above: base_log_folder = /root/airflow/logs


I am using an AWS EC2 machine and changed the logs to write to base_log_folder = /mnt/airflow/logs

In the UI there is no indication as to why tasks are queued; it just says "unknown, all dependencies are met ...". Giving the airflow daemon/service user permission to write to the new location fixed it.
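A quick way to check for this condition is to test whether the log folder is writable as the same user the scheduler and workers run under; a small sketch, using the path from this answer:

import os

log_dir = '/mnt/airflow/logs'
# os.access checks against the real uid/gid of the current process,
# so run this as the Airflow service user, not as root.
print('writable' if os.access(log_dir, os.W_OK) else 'NOT writable')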

Answered by Jakub Bielan

If LocalExecutor is a good enough option for you, you can always fall back to it. I've heard about some problems with CeleryExecutor.

Just change executor = CeleryExecutor to executor = LocalExecutor in your airflow.cfg file (most of the time ~/airflow/airflow.cfg).
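The relevant stanza in airflow.cfg would then read:

[core]
executor = LocalExecutor

Note that LocalExecutor still needs a real database backend, such as the Postgres connection already configured in the question; only SequentialExecutor works with SQLite.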

Restart the scheduler and that's it!