How to Set Up and Run Apache Airflow Locally?

Get started with Apache Airflow

tl;dr: get the bash script (consolidated below the setup steps)

  1. Have Python 3.8+ installed on your system

  2. Create a project folder and set AIRFLOW_HOME

mkdir -p "/Users/$(whoami)/projects/airflow-local"
export AIRFLOW_HOME="/Users/$(whoami)/projects/airflow-local"
  3. Change into the folder: cd airflow-local

  4. Set the Airflow version: AIRFLOW_VERSION=2.4.3

  5. Set the Python version: PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"

  6. Set the constraints URL: CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

  7. Create a virtualenv and install Airflow

cd "/Users/$(whoami)/projects/airflow-local"
python -m venv venv
source venv/bin/activate
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
  8. Create the dags folder: mkdir -p "${AIRFLOW_HOME}/dags"

  9. Run Airflow with airflow standalone

This starts Airflow with a SQLite DB. SQLite is fine for basic pipelines and playground setups, but it is not really production friendly. We can also make our local environment production friendly; we will see that in a bit.
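For the tl;dr crowd, here are the steps above rolled into a single bash sketch. It assumes the same macOS-style paths used throughout this post; adjust paths and versions to taste.

#!/usr/bin/env bash
# Consolidates the setup steps above into one script
set -euo pipefail

export AIRFLOW_HOME="/Users/$(whoami)/projects/airflow-local"
AIRFLOW_VERSION=2.4.3
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

# Create the project folder and move into it
mkdir -p "${AIRFLOW_HOME}"
cd "${AIRFLOW_HOME}"

# Create a virtualenv and install Airflow with the pinned constraints
python -m venv venv
source venv/bin/activate
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"

# Create the dags folder and start Airflow
mkdir -p "${AIRFLOW_HOME}/dags"
airflow standalone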

The Airflow Project

  1. standalone_admin_password.txt contains the password for your local Airflow; the username is admin

  2. dags is the folder where you add your DAGs, a.k.a. pipelines

  3. logs is the folder where Airflow writes its logs
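After the first standalone run, AIRFLOW_HOME should look roughly like this (exact files can vary slightly between versions):

airflow-local/
├── airflow.cfg                      # Airflow configuration
├── airflow.db                       # SQLite metadata DB created in standalone mode
├── dags/                            # your DAGs (pipelines)
├── logs/                            # scheduler and task logs
├── standalone_admin_password.txt    # password for the admin user
├── venv/                            # the virtualenv created earlier
└── webserver_config.py              # webserver settings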

Using LocalExecutor

Airflow, by default, uses SequentialExecutor, which is not great for production-level systems. When exploring Airflow, it is a good idea to use LocalExecutor along with Postgres or MySQL right away.

Stop the Airflow instance and update airflow.cfg with the following configs. Replace the sql_alchemy_conn value with your DB credentials.

[core]
load_examples = False
executor = LocalExecutor

[database]
sql_alchemy_conn = postgresql://<pg-user>:<pg-password>@<host>:<port>/<db-name>

If you use Postgres, you will need the psycopg2 library, which you can install with the following command

pip install "apache-airflow[postgres]"
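If you do not already have a Postgres instance, one quick way to get one is Docker. A minimal sketch, using made-up credentials and database name (airflow / airflow_db) that you should replace with your own:

# Hypothetical: run a local Postgres 14 container for the Airflow metadata DB
docker run -d \
  --name airflow-postgres \
  -e POSTGRES_USER=airflow \
  -e POSTGRES_PASSWORD=airflow \
  -e POSTGRES_DB=airflow_db \
  -p 5432:5432 \
  postgres:14

# The matching line in airflow.cfg would then be:
# sql_alchemy_conn = postgresql://airflow:airflow@localhost:5432/airflow_db

# Initialize the metadata database against the new backend
airflow db init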

Sample Dag

Copy the example dag from the Airflow repo and place it under the dags folder
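If you would rather not copy from the repo, here is a minimal sketch of a DAG you could drop into the dags folder (for example as dags/hello_airflow.py; the dag_id and task_id are made up for illustration):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Minimal DAG with a single Bash task, scheduled daily
with DAG(
    dag_id="hello_airflow",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    BashOperator(
        task_id="say_hello",
        bash_command="echo 'Hello from Airflow'",
    )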

Restart Airflow

airflow standalone
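To sanity check the setup, you can list the DAGs Airflow has parsed and open the web UI, which airflow standalone serves on port 8080 by default:

# Confirm Airflow has picked up the DAGs in $AIRFLOW_HOME/dags
airflow dags list

# Then open http://localhost:8080 and log in as admin
# (password is in standalone_admin_password.txt)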


Airflow has a steep learning curve. I can help you adopt Airflow for your data engineering pipelines and your team's ecosystem. Schedule a free call today.
