tl;dr: get the bash script.
Have Python 3.8+ installed on your system.
Create a folder
mkdir -p "/Users/$(whoami)/projects/airflow-local"
export AIRFLOW_HOME="/Users/$(whoami)/projects/airflow-local"
cd "$AIRFLOW_HOME"
Set airflow version
AIRFLOW_VERSION=2.4.3
Set Python version
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
Set constraints version
CONSTRAINT_URL="
https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt
"
Create virtualenv and install airflow
cd "/Users/$(whoami)/projects/airflow-local"
python -m venv venv
source venv/bin/activate
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
mkdir -p "${AIRFLOW_HOME}/dags"
Run Airflow with
airflow standalone
This will start Airflow, creating a SQLite DB. SQLite is not really production friendly, but it is fine for basic pipelines and playgrounds. We can also make our local environment production friendly; we will see that in a bit.
The Airflow Project
standalone_admin_password.txt
has the password for your local Airflow; the username is admin
dags
the folder where you add your dags, a.k.a. pipelines
logs
the folder that will have your logs
Using LocalExecutor
Airflow, by default, uses SequentialExecutor, which is not great for production-level systems. When exploring Airflow, it is a good idea to use LocalExecutor along with Postgres or MySQL right away.
Stop the Airflow instance and update the airflow.cfg with the following configs, replacing sql_alchemy_conn with your DB credentials:
[core]
load_examples = False
executor = LocalExecutor
[database]
sql_alchemy_conn = postgresql://<pg-user>:<pg-password>@<host>:<port>/<db-name>
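For example, with a hypothetical local Postgres (user airflow_user, password airflow_pass, database airflow_db on Postgres's default port 5432), the line would look like:
sql_alchemy_conn = postgresql://airflow_user:airflow_pass@localhost:5432/airflow_db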
If you use Postgres, you will need the psycopg library, which you can install with the following command
pip install "apache-airflow[postgres]"
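If you do not have a database yet, here is a minimal sketch for creating one, assuming a local Postgres install and the hypothetical names from the example above:
# Hypothetical names; adjust to your setup.
createdb airflow_db
psql -d airflow_db -c "CREATE USER airflow_user WITH PASSWORD 'airflow_pass';"
psql -d airflow_db -c "GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow_user;"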
Sample DAG
Copy the example dag from the Airflow repo and place it under the dags folder
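If you would rather not fetch the file, here is a minimal sketch of such a DAG; the dag_id, task names, and schedule are illustrative, not the exact example from the repo. Save it as, for example, ${AIRFLOW_HOME}/dags/sample_dag.py:
# sample_dag.py — a minimal illustrative DAG, not the exact example from the Airflow repo.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def print_hello():
    # Runs inside the worker; the output lands in the task log.
    print("Hello from Airflow!")

with DAG(
    dag_id="sample_dag",
    start_date=datetime(2022, 1, 1),
    schedule="@daily",  # "schedule" replaces "schedule_interval" as of Airflow 2.4
    catchup=False,      # do not backfill runs between start_date and now
) as dag:
    say_date = BashOperator(task_id="say_date", bash_command="date")
    say_hello = PythonOperator(task_id="say_hello", python_callable=print_hello)

    say_date >> say_hello  # run the bash task first, then the python task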
Restart Airflow
airflow standalone
Airflow has a steep learning curve. I can help you adopt Airflow for your Data engineering pipeline and your team's ecosystem. Schedule a free call today.