Apache Airflow

New
assess
First Added: May 17, 2026 | Updated: May 18, 2026

Apache Airflow is a Python-centric workflow orchestrator for DAGs of tasks (operators, sensors, schedules), built around a central metadata database, a scheduler, and workers. We rate it assess: it is the default incumbent for batch/ETL data engineering, but it is heavy to operate. For new DAG work where you want lighter ops, prefer Argo Workflows (trial) on Kubernetes or Dagu (assess).

Blurb

Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows.

Summary

Role: orchestrate workflow steps (extract, transform, load, ML feature jobs, reports), not application deployment. This is a different lane from GitHub Actions / CI-CD Tools (PR build/test) and ArgoCD (GitOps sync).
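To make "orchestrate workflow steps" concrete, here is a minimal, library-agnostic sketch of the underlying idea: tasks declare their upstream dependencies, and the orchestrator derives a valid run order. It uses only Python's standard library, not Airflow's API. The `PIPELINE` mapping and the `run_order` helper are hypothetical names chosen for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a task, each value is the set of
# tasks that must finish before it can start (its upstream dependencies).
PIPELINE = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def run_order(dependencies):
    """Return one valid execution order for the given task graph."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for task in run_order(PIPELINE):
        print(f"running {task}")
```

A real orchestrator adds scheduling, retries, state tracking, and parallel execution on top of this ordering, which is where the operational cost discussed in this entry comes from.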

When to assess / keep:

  • Mature data platform already on Airflow (operators, plugins, SLAs)
  • Python-first DAGs with rich scheduling (cron, backfill, data-aware scheduling with Datasets in Airflow 2.4+)
  • Team staffed to run scheduler, metadata DB, and worker pools
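As a rough illustration of what "backfill" means in the list above, the sketch below enumerates the daily data intervals that a backfill over a date range would cover. It assumes a simple daily schedule and uses only the standard library. `daily_intervals` is a hypothetical helper, not an Airflow function; Airflow's own schedule handling (cron expressions, time zones, catchup) is considerably richer.

```python
from datetime import date, timedelta


def daily_intervals(start, end):
    """Yield (interval_start, interval_end) pairs for each day in [start, end)."""
    current = start
    while current < end:
        following = current + timedelta(days=1)
        yield current, following
        current = following


if __name__ == "__main__":
    # Each pair corresponds to one historical run a backfill would schedule.
    for begin, finish in daily_intervals(date(2026, 5, 1), date(2026, 5, 4)):
        print(f"would run for data interval {begin} to {finish}")
```

Each yielded pair maps to one run, so a backfill over a long range is many independent runs. That is part of why scheduler and worker capacity matter when deciding whether to keep Airflow.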

When to choose alternatives instead:

Need | Prefer
Steps are K8s pods, artifacts between tasks | Argo Workflows
Single VM, YAML DAGs, minimal infra | Dagu
CI on merge to main | GitHub Actions
K8s-native CI CRDs | Tekton (assess)

Ops cost: Airflow expects a persistent metadata database (Postgres or MySQL), a scheduler process, an executor (CeleryExecutor, KubernetesExecutor, etc.), and monitoring. That is justified at data-platform scale; it is overkill for small automation.

Details

Topic | Notes
Model | DAGs in Python (@dag, operators); UI for run history
Executors | Local, Celery, Kubernetes; pick based on isolation and scale
Secrets | Connections/variables in metadata DB; integrate Vault/cloud secret managers
Testing | Unit-test DAG structure; use staging env for integration
Security | Lock down web UI, RBAC, and who can trigger DAGs
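The "unit-test DAG structure" note can be illustrated with a small, dependency-free sketch: check that expected tasks exist, that every upstream reference is declared, and that the graph has no cycles. This is an assumption-laden stand-in rather than Airflow code. `validate_structure` and the sample graphs are hypothetical, and in an Airflow project the same kind of assertions would typically run against DAG objects loaded from your DAG folder.

```python
from graphlib import CycleError, TopologicalSorter


def validate_structure(dependencies, expected_tasks):
    """Return a list of problems found; an empty list means the graph looks sane."""
    problems = []
    declared = set(dependencies)
    referenced = set().union(*dependencies.values())

    # Tasks the pipeline is supposed to contain but does not declare.
    missing = set(expected_tasks) - declared
    if missing:
        problems.append(f"missing tasks: {sorted(missing)}")

    # Upstream names that are used but never declared as tasks.
    undeclared = referenced - declared
    if undeclared:
        problems.append(f"undeclared upstream tasks: {sorted(undeclared)}")

    # prepare() raises CycleError when the graph is not acyclic.
    try:
        TopologicalSorter(dependencies).prepare()
    except CycleError as exc:
        problems.append(f"cycle detected: {exc.args[1]}")

    return problems


if __name__ == "__main__":
    graph = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
    print(validate_structure(graph, {"extract", "transform", "load"}))
```

Checks like these are cheap to run in CI on every change, leaving the staging environment for integration behavior that only shows up against real connections and data.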

Garden pattern: assess for data/ETL estates; do not default to Airflow for new org-wide automation if Argo Workflows or Dagu fits. Cross-link from Dagu and Argo Workflows when comparing orchestrators.

References