postgresql - Airflow: Tasks queued but not running
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow.
Original URL: http://stackoverflow.com/questions/43524457/
Airflow: Tasks queued but not running
Asked by Deepak S
I am new to Airflow and am trying to set it up to run ETL pipelines. I was able to install:
- airflow
- postgres
- celery
- rabbitmq
I am able to test-run the tutorial DAG. When I try to schedule the jobs, the scheduler picks them up and queues them (I can see this in the UI), but the tasks do not run. Could somebody help me fix this issue? I believe I am missing a very basic Airflow concept here. Below is the airflow.cfg.
Here is my config file:
[core]
airflow_home = /root/airflow
dags_folder = /root/airflow/dags
base_log_folder = /root/airflow/logs
executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://xxxx.amazonaws.com:5432/airflow
api_client = airflow.api.client.local_client
[webserver]
web_server_host = 0.0.0.0
web_server_port = 8080
web_server_worker_timeout = 120
worker_refresh_batch_size = 1
worker_refresh_interval = 30
[celery]
celery_app_name = airflow.executors.celery_executor
celeryd_concurrency = 16
worker_log_server_port = 8793
broker_url = amqp://rabbit:[email protected]/rabbitmq_vhost
celery_result_backend = db+postgresql+psycopg2://postgres:[email protected]:5432/airflow
flower_host = 0.0.0.0
flower_port = 5555
default_queue = default
DAG: This is the tutorial DAG I used, and the start date for my DAG is 'start_date': datetime(2017, 4, 11).
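For reference, a minimal sketch of what a tutorial-style DAG with that start date looks like. This is an assumption based on the Airflow 1.x tutorial of that era, not the asker's exact file; the task IDs and schedule are illustrative:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2017, 4, 11),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

# With a daily schedule, the run for 2017-04-11 is only triggered after that
# whole interval has elapsed, and the scheduler backfills runs up to "now".
dag = DAG('tutorial', default_args=default_args, schedule_interval=timedelta(days=1))

t1 = BashOperator(task_id='print_date', bash_command='date', dag=dag)
t2 = BashOperator(task_id='sleep', bash_command='sleep 5', dag=dag)
t2.set_upstream(t1)  # t1 must succeed before t2 is queued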
Answered by Xia Wang
Have you run all three components of Airflow, namely:
airflow webserver
airflow scheduler
airflow worker
If you only run the first two, the tasks will be queued but not executed. airflow worker provides the workers that actually execute the DAGs.
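A quick way to verify that workers are actually alive and consuming from the queue is Flower, Celery's monitoring UI, which the config in the question already exposes on port 5555 (assuming default settings):

airflow flower
# then open http://<your-host>:5555 and check that at least one worker shows as online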
Also, as a side note: Celery 4.0.2 is currently not compatible with Airflow 1.7 or 1.8. Use Celery 3 instead.
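For example, one way to pin Celery to the 3.x line (the exact version spec here is just a suggestion):

pip install 'celery>=3.1,<4'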
Answered by Olaf
I tried to upgrade to Airflow v1.8 today as well and struggled with Celery and RabbitMQ. What helped was changing from librabbitmq (which is used by default when the URL scheme is just amqp) to pyamqp in airflow.cfg:
broker_url = pyamqp://rabbit:[email protected]/rabbitmq_vhost
(This is where I got the idea from: https://github.com/celery/celery/issues/3675)
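The pyamqp:// transport is backed by the pure-Python py-amqp library, which normally ships as a Kombu/Celery dependency; if it is somehow missing, it can be installed explicitly (the package is published on PyPI as amqp):

pip install amqp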
Answered by Davos
I realise your problem is already answered and was related to a Celery version mismatch, but I've also seen tasks queue and never run because I changed the log location to a place where the Airflow service user did not have permission to write.
In the example airflow.cfg given in the question above:
base_log_folder = /root/airflow/logs
I am using an AWS EC2 machine and changed the logs to write to:
base_log_folder = /mnt/airflow/logs
In the UI there is no indication of why the tasks are queued; it just says "unknown, all dependencies are met ...". Giving the Airflow daemon/service user permission to write to that location fixed it.
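A sketch of that fix, assuming the scheduler and workers run as a dedicated airflow user (adjust the user and path to your setup):

sudo mkdir -p /mnt/airflow/logs
sudo chown -R airflow:airflow /mnt/airflow/logs
sudo -u airflow touch /mnt/airflow/logs/.write_test   # verify the service user can actually write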
Answered by Jakub Bielan
If LocalExecutor is a sufficient option for you, you can always try going back to it. I've heard about some problems with CeleryExecutor.
Just change executor = CeleryExecutor to executor = LocalExecutor in your airflow.cfg file (most of the time ~/airflow/airflow.cfg).
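In terms of the config from the question, the relevant lines would look like this (a sketch; LocalExecutor still needs a real database backend such as the Postgres connection already configured, it just no longer needs the broker/worker settings):

[core]
executor = LocalExecutor
sql_alchemy_conn = postgresql+psycopg2://xxxx.amazonaws.com:5432/airflow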
Restart the scheduler and that's it!