Apache Airflow: Which Executor to Use in Production?

Celery Executor

Celery is a framework for running distributed, asynchronous Python tasks.

CeleryExecutor has been part of Airflow for a long time, even before the Kubernetes executor was added. With CeleryExecutor, you run a fixed number of worker instances that you set up front.

Pros

You can set how many tasks each worker runs concurrently. This works well when the number of tasks on each worker is predictable.
Celery supervises its worker processes. If one fails, Celery starts a replacement.
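
The capacity planning this implies is simple arithmetic: each Celery worker runs at most `worker_concurrency` tasks at once (the `[celery]` setting in `airflow.cfg`), so the fleet's ceiling is workers times concurrency. Here is a minimal sketch of that calculation. The function names and the numbers are hypothetical, and other Airflow limits such as `parallelism` or pools may cap the real figure lower.

```python
import math


def celery_task_slots(num_workers: int, worker_concurrency: int) -> int:
    """Upper bound on tasks the Celery workers can run at the same time."""
    return num_workers * worker_concurrency


def workers_needed(peak_parallel_tasks: int, worker_concurrency: int) -> int:
    """Smallest worker count whose combined slots cover the expected peak."""
    return math.ceil(peak_parallel_tasks / worker_concurrency)


# Hypothetical deployment: 3 workers, 16 concurrent tasks per worker.
print(celery_task_slots(num_workers=3, worker_concurrency=16))        # 48
# Hypothetical peak of 50 parallel tasks with the same concurrency.
print(workers_needed(peak_parallel_tasks=50, worker_concurrency=16))  # 4
```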

Cons

Celery needs a message broker such as RabbitMQ or Redis to queue tasks, which is an added dependency.
Multiple tasks run on the same worker, so one task can hog the resources that other tasks need.
Keeping multiple workers running all the time wastes resources when there isn't much to process.
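
To put a number on the last point, you can compare the slot-hours an always-on fleet provides with the slot-hours your tasks actually use. The sketch below is only an illustration. The helper names and the figures are made up, not measurements.

```python
def idle_slot_hours(total_slots: int, hours: float, busy_slot_hours: float) -> float:
    """Slot-hours paid for but not used by any task."""
    capacity = total_slots * hours
    if busy_slot_hours > capacity:
        raise ValueError("busy_slot_hours cannot exceed total capacity")
    return capacity - busy_slot_hours


def utilization(total_slots: int, hours: float, busy_slot_hours: float) -> float:
    """Fraction of the provisioned capacity that did useful work."""
    return busy_slot_hours / (total_slots * hours)


# Hypothetical day: 48 slots kept up for 24 hours, tasks used 288 slot-hours.
print(idle_slot_hours(48, 24, 288))  # 864
print(utilization(48, 24, 288))      # 0.25
```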

Kubernetes Executor

With KubernetesExecutor, Airflow spins up a new pod for each task it runs.

Pros

Unlike CeleryExecutor, you don't have a set of workers running all the time. KubernetesExecutor creates pods on demand, which reduces cost.

KubernetesExecutor lets you specify the resources required for each task, giving you more control.
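
In Airflow, per-task resources are passed through an operator's `executor_config` argument under the `pod_override` key, as a `V1Pod` object from the `kubernetes` client whose task container is named `base`. The sketch below builds the equivalent manifest as a plain dict so that it runs without a cluster or the `kubernetes` package. The helper name and the values are hypothetical, and setting limits equal to requests is a simplification.

```python
def pod_override_manifest(cpu: str, memory: str) -> dict:
    """Build a pod manifest fragment that sets resources for the task container.

    Assumption: this dict only mirrors the shape of a Kubernetes Pod manifest.
    A real DAG would build a kubernetes.client.models.V1Pod with the same
    fields and pass it as executor_config={"pod_override": pod}.
    """
    resources = {
        "requests": {"cpu": cpu, "memory": memory},
        "limits": {"cpu": cpu, "memory": memory},
    }
    # "base" is the container name Airflow uses for the task container.
    return {"spec": {"containers": [{"name": "base", "resources": resources}]}}


# Hypothetical sizes for a light task and a heavy task.
light = pod_override_manifest(cpu="250m", memory="256Mi")
heavy = pod_override_manifest(cpu="4", memory="16Gi")
print(light["spec"]["containers"][0]["resources"]["requests"])
print(heavy["spec"]["containers"][0]["resources"]["requests"])
```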

Cons

One downside of KubernetesExecutor is the time it takes to spin up a pod for each task, though for most workloads this overhead is small compared to the advantages.

Setting up the infrastructure can be complicated if you don't have Kubernetes skills on your team.
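
How much the pod startup time matters depends on how long your tasks run. A rough way to check is to compute the share of wall-clock time spent waiting for the pod. This is a sketch with hypothetical timings, not benchmark results.

```python
def startup_overhead(pod_startup_s: float, task_runtime_s: float) -> float:
    """Fraction of total wall-clock time spent waiting for the pod to start."""
    return pod_startup_s / (pod_startup_s + task_runtime_s)


# Hypothetical 30-second pod startup in both cases.
print(round(startup_overhead(30, 600), 3))  # 10-minute task: 0.048
print(round(startup_overhead(30, 30), 3))   # 30-second task: 0.5
```

For long-running tasks the startup cost mostly disappears. For many very short tasks it can dominate.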

Celery Kubernetes Executor

CeleryKubernetesExecutor brings the best of both the Celery and Kubernetes worlds, and also the worst. Use it only when it is necessary, for example when:

You have a few resource-hungry tasks that need more resources and isolation so that they don't starve other tasks.
You have a mix of peak-time tasks with long queues, which can run on Kubernetes, and predictable tasks with predictable resource needs, which Celery can handle.
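
Airflow's hybrid executor (class name `CeleryKubernetesExecutor`) decides where a task goes by looking at the task's `queue`: tasks on the queue named by the `kubernetes_queue` setting (default `kubernetes`) run on Kubernetes, and everything else goes to Celery. The sketch below imitates that routing rule in plain Python. The function and the task names are hypothetical, and this is not Airflow's own code.

```python
# Airflow's default value for [celery_kubernetes_executor] kubernetes_queue.
KUBERNETES_QUEUE = "kubernetes"


def pick_executor(task_queue: str, kubernetes_queue: str = KUBERNETES_QUEUE) -> str:
    """Return which underlying executor would handle a task on this queue."""
    if task_queue == kubernetes_queue:
        return "KubernetesExecutor"
    return "CeleryExecutor"


# Hypothetical tasks: one heavy job sent to Kubernetes, one routine job on Celery.
tasks = {"train_model": "kubernetes", "send_report": "default"}
for task_id, queue in tasks.items():
    print(task_id, "->", pick_executor(queue))
```

In a DAG, this corresponds to setting `queue="kubernetes"` on the operators that need their own pod and leaving the rest on their default queue.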

In this post, you have seen how to utilize different Airflow executors to improve your tasks' performance while simultaneously optimizing the costs.

