Installing Airflow (Local, Docker, Cloud)

Apache Airflow is a fantastic open-source platform that data engineers use to orchestrate workflows—like scheduling data tasks or automating complex pipelines—through Python code. To unlock its potential, you first need to install it, and this guide makes that process crystal clear. Hosted on SparkCodeHub, it walks you through three distinct ways to set up Airflow: locally on your computer, with Docker for a contained setup, and in the cloud for a scalable solution. Each method comes with precise, step-by-step instructions laid out in points, so you know exactly what to do at every turn. If you’re new to Airflow, this pairs perfectly with Airflow Fundamentals for broader context, and you can explore related setup details like Airflow Metadata Database Setup once you’re installed.


Why Install Airflow?

Before we dive into the steps, let’s talk about why installing Airflow matters. It’s all about turning your tasks—like fetching data, processing it, and storing it—into Python scripts called Directed Acyclic Graphs (DAGs), which you can learn more about in Introduction to DAGs in Airflow. Airflow then runs these tasks on a schedule and lets you monitor them through a web interface, detailed in Airflow Web UI Overview. Whether you’re executing a simple script with BashOperator or connecting to Airflow with Apache Spark for big data, Airflow simplifies it all. Installing it is your starting point, and we’ll cover three methods: local for quick testing, Docker for consistency, and cloud for serious scale.

Installing Airflow Locally

Running Airflow on your own machine—whether it’s Windows, macOS, or Linux—is the easiest way to get started. It’s ideal for experimenting, learning, or managing small projects, putting everything you need right on your computer. Here’s how to do it, step by step.

Step 1: Verify Your Python Version

  1. Open your terminal: On Windows, press the Windows key, type “cmd” into the search bar, and press Enter to launch Command Prompt. On Mac, click the magnifying glass in the top-right corner, type “Terminal,” and hit Enter. On Linux, press Ctrl+Alt+T or search for “Terminal” in your applications menu.
  2. Check Python version: In the terminal, type python --version and press Enter. You’ll see output like “Python 3.8.5.” If it starts with 3.7, 3.8, or higher, you’re ready. If it’s “Python 2.7” or you get “command not found,” try python3 --version instead—some systems use “python3” for newer versions.
  3. Install Python if needed: If neither command works or your version is below 3.7, visit python.org. Click “Downloads” at the top, then pick the latest version (e.g., 3.10) for your system—Windows, macOS, or Linux. Download the installer by clicking the yellow button, double-click the file to run it, and follow the prompts. On Windows, check the box “Add Python to PATH” during installation. After installing, close and reopen your terminal, then type python --version (or python3 --version) again to confirm it’s 3.7 or higher.

Airflow relies on Python to run its scripts, so getting this right is essential before moving forward.

Step 2: Navigate to Your Home Directory

  1. Open your terminal: Use the same terminal from Step 1—Command Prompt on Windows, Terminal on Mac or Linux.
  2. Change to your home directory: Type cd ~ and press Enter if you’re on Mac or Linux—this takes you to your home folder, like /home/username or /Users/username. On Windows, type cd %userprofile% and press Enter instead—this goes to C:\Users\YourUsername. You’ll see the prompt change to reflect your home directory (e.g., C:\Users\YourUsername> on Windows).

This step sets you up in a familiar spot on your computer where we’ll install Airflow, keeping things organized.

Step 3: Create a Virtual Environment

  1. Run the venv command: In your terminal, type python -m venv airflow_env and press Enter. This creates a folder named airflow_env in your home directory with a fresh Python setup. On some systems (like Linux/Mac), you might need python3 -m venv airflow_env if python alone doesn’t work—use whichever matches your Python 3.7+ command from Step 1.
  2. Wait for creation: It’ll take a few seconds, and you’ll see the airflow_env folder appear in your home directory (e.g., /home/username/airflow_env or C:\Users\YourUsername\airflow_env).

A virtual environment keeps Airflow’s files separate from other Python projects, avoiding conflicts—it’s like a clean workspace just for Airflow.

Step 4: Activate the Virtual Environment

  1. Activate on Mac/Linux: In your terminal, type source airflow_env/bin/activate and press Enter. You’ll see (airflow_env) appear before your prompt, like (airflow_env) username@machine:~$.
  2. Activate on Windows: Type airflow_env\Scripts\activate and press Enter. You’ll see (airflow_env) in your prompt, like (airflow_env) C:\Users\YourUsername>.
  3. Confirm activation: Type python --version (or python3 --version) and press Enter to ensure it shows the Python version from your virtual environment—same as Step 1.

Activating switches your terminal to use this isolated Python setup, so everything we install next stays contained.

Step 5: Install Airflow Using pip

  1. Run the install command: In your activated terminal (with (airflow_env) showing), type pip install apache-airflow and press Enter. This downloads Airflow and its core dependencies from the internet—expect some text to scroll by as it works.
  2. Wait for completion: It’ll take a minute or two, depending on your internet speed. When it’s done, you’ll see your prompt return, like (airflow_env) username@machine:~$.
  3. Optional extras: If you want PostgreSQL support later, type pip install "apache-airflow[postgres]" instead and press Enter (keep the quotes so your shell doesn’t misread the square brackets)—this adds extra packages for a PostgreSQL database. Stick with the basic apache-airflow for now if you’re unsure.

This installs Airflow into your virtual environment, ready to use with its default settings—tweak them later in Airflow Configuration Options.
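
The plain pip install apache-airflow command resolves dependency versions on the fly, which can occasionally pull in incompatible releases. The Airflow project also publishes constraint files that pin tested versions of every dependency; here is a minimal sketch of a pinned install, where the Airflow and Python versions are only examples you should swap for your own:
# Pinned install using the project's constraint files (2.7.3 and 3.8 are illustrative versions)
pip install "apache-airflow==2.7.3" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.7.3/constraints-3.8.txt"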

Step 6: Initialize the Airflow Database

  1. Run the init command: In your activated terminal, type airflow db init and press Enter. This sets up a small SQLite database to track your workflows—it’ll create a folder at ~/airflow (e.g., /home/username/airflow or C:\Users\YourUsername\airflow) with a file called airflow.db.
  2. Check the output: You’ll see some lines about creating tables, and it’ll finish in a few seconds. If it works, your prompt returns without errors.
  3. Verify the folder: Look in your home directory (use ls -a on Mac/Linux or dir on Windows) to see the airflow folder with airflow.cfg and airflow.db inside.

This database keeps Airflow’s records—task statuses and run history—and SQLite is simple for testing. For bigger setups, switch to PostgreSQL with Airflow Metadata Database Setup.
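
When you do outgrow SQLite, switching the metadata database is mostly a matter of changing one connection string. Here is a hedged sketch for pointing Airflow at PostgreSQL, assuming the postgres extra from Step 5 is installed; the host, user, password, and database name are placeholders, and on recent 2.x releases the setting lives under [database] in airflow.cfg (older 2.x releases keep it under [core]):
# Mac/Linux example: set the connection string via environment variable (all values are placeholders)
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow_db"
airflow db init   # rerun the initialization against the new database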

Step 7: Create a Folder for Your DAGs

  1. Make the folder: In your terminal, type mkdir ~/airflow/dags and press Enter. On Windows, use mkdir %userprofile%\airflow\dags. This creates a dags folder inside ~/airflow (e.g., /home/username/airflow/dags or C:\Users\YourUsername\airflow\dags).
  2. Confirm it’s there: Type ls ~/airflow (Mac/Linux) or dir %userprofile%\airflow (Windows) and press Enter—you’ll see dags listed.

This is where your workflow scripts go—Airflow scans it for DAGs, so keep it ready as shown in DAG File Structure Best Practices.

Step 8: Start the Airflow Webserver

  1. Open a terminal: Use your current terminal (still activated with (airflow_env)).
  2. Run the webserver: Type airflow webserver -p 8080 and press Enter. You’ll see startup messages—it’s now running a web interface.
  3. Check the UI: Open your web browser (Chrome, Firefox, etc.), type localhost:8080 into the address bar, and press Enter. After a few seconds, you’ll see Airflow’s homepage with a list of DAGs (empty for now).

This launches the UI where you’ll monitor workflows—keep this terminal open. Learn more in Airflow Web UI Overview.
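
Depending on your Airflow version, the homepage may ask you to log in before listing any DAGs. If it does, create an account with the Airflow CLI in your activated terminal; the username, password, and email below are placeholders you should change:
# Create an admin account for the web UI (all values are placeholders, pick your own)
airflow users create --username admin --password admin --firstname Air --lastname Flow --role Admin --email admin@example.com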

Step 9: Start the Airflow Scheduler in a New Terminal

  1. Open a second terminal: On Windows, press Windows key, type “cmd,” and hit Enter for a new Command Prompt. On Mac/Linux, search “Terminal” again and open a fresh one.
  2. Navigate to your home directory: Type cd ~ (Mac/Linux) or cd %userprofile% (Windows) and press Enter.
  3. Activate the environment: Type source airflow_env/bin/activate (Mac/Linux) or airflow_env\Scripts\activate (Windows) and press Enter—look for (airflow_env) in the prompt.
  4. Run the scheduler: Type airflow scheduler and press Enter. You’ll see logs as it starts scanning your dags folder—keep this terminal running.

The scheduler runs your tasks on time—details in Introduction to Airflow Scheduling.
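
As a side note, recent Airflow 2.x releases also ship an all-in-one command that initializes the database, creates an admin user, and runs the webserver and scheduler together. It is handy for quick local testing, though the two-terminal setup above is closer to how real deployments run:
# Single-process local mode; it prints the generated admin password in its startup output
airflow standalone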

Step 10: Test with a Sample DAG

  1. Create a file: Open a text editor (Notepad on Windows, TextEdit on Mac, or any code editor like VS Code). Copy this code:
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def hello():
    # The task just prints a message, which ends up in the task log
    print("Hello, local Airflow!")

with DAG(
    dag_id="local_test",              # the name you'll see in the UI
    start_date=datetime(2025, 1, 1),  # first date the DAG is eligible to run
    schedule_interval="@daily",       # run once per day
) as dag:
    task = PythonOperator(
        task_id="hello_task",
        python_callable=hello,        # the function above runs as this task
    )
  2. Save the file: Save it as test_dag.py in your dags folder—e.g., /home/username/airflow/dags/test_dag.py or C:\Users\YourUsername\airflow\dags\test_dag.py. On Windows, ensure it’s .py, not .txt (in Notepad, use “Save As,” pick “All Files,” and type test_dag.py).
  3. Check the UI: Go back to localhost:8080 in your browser, wait a minute or so, and refresh—new files can take a few minutes to show up, depending on how often the scheduler scans the dags folder. You’ll see “local_test” listed under DAGs.

This tests your setup—write more DAGs with Defining DAGs in Python.
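
You can also exercise the new DAG straight from the command line, without waiting for its schedule. In the activated terminal, try something like the following; the date is just an example logical date:
# Confirm Airflow parsed the file, then run the single task once in isolation
airflow dags list
airflow tasks test local_test hello_task 2025-01-01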

Installing Airflow with Docker

Docker runs Airflow in a container, ensuring the same setup works on any machine—great for development or staging with consistency.

Step 1: Install Docker on Your Computer

  1. Visit Docker’s website: Open your browser, go to docker.com, and click “Get Started” at the top.
  2. Download Docker Desktop: Click “Download Docker Desktop,” pick your system (Windows, Mac, or Linux), and hit the download button. For Windows/Mac, you’ll get an installer; for Linux, follow their distro-specific guide.
  3. Run the installer: Double-click the downloaded file (e.g., Docker Desktop Installer.exe on Windows or .dmg on Mac), follow the prompts (accept defaults), and install. On Linux, use commands like sudo apt install docker.io (Ubuntu) per their guide.
  4. Start Docker: Open Docker Desktop from your applications menu—it’ll run in the background. On Linux, type sudo systemctl start docker and press Enter.
  5. Verify it’s working: In a terminal (Command Prompt on Windows, Terminal on Mac/Linux), type docker --version and press Enter—you’ll see something like “Docker version 20.10.7.”

Docker is your container engine—Airflow will run inside it.
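
If you want one more sanity check that the engine can actually run containers, Docker publishes a tiny test image for exactly that purpose:
# Pulls and runs Docker's official test image; it prints a "Hello from Docker!" message
docker run hello-world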

Step 2: Pull the Official Airflow Image

  1. Open your terminal: Use Command Prompt (Windows) or Terminal (Mac/Linux).
  2. Pull the image: Type docker pull apache/airflow:latest and press Enter. This downloads Airflow’s latest version from Docker Hub—expect a wait (5-10 minutes) as it’s a big file.
  3. Check it’s downloaded: Type docker images and press Enter—you’ll see apache/airflow with “latest” listed.

This grabs Airflow’s pre-built container, ready to run.
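
One caveat: the latest tag moves every time a new release ships, so many teams pin a specific version to keep setups reproducible. The tag below is only an example; check Docker Hub for the release you want:
# Pin an explicit release instead of the moving "latest" tag (2.7.3 is illustrative)
docker pull apache/airflow:2.7.3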

Step 3: Create a Directory for Airflow Files

  1. Go to your home directory: In your terminal, type cd ~ (Mac/Linux) or cd %userprofile% (Windows) and press Enter.
  2. Make a folder: Type mkdir airflow-docker and press Enter—this creates ~/airflow-docker (e.g., /home/username/airflow-docker or C:\Users\YourUsername\airflow-docker).
  3. Add subfolders: Type mkdir airflow-docker/dags airflow-docker/logs airflow-docker/plugins and press Enter (on Windows, use backslashes: mkdir airflow-docker\dags airflow-docker\logs airflow-docker\plugins)—this sets up dags, logs, and plugins inside airflow-docker.

These folders hold your workflows (dags), output (logs), and extras (plugins).

Step 4: Initialize the Database with a Temporary Container

  1. Run the init command: In your terminal, type docker run --rm -it apache/airflow:latest airflow db init and press Enter. This starts a temporary container, sets up the SQLite database, and exits when done—takes a few seconds.
  2. Check the output: You’ll see logs about creating tables, then it’ll stop with your prompt back.

This step is mostly a smoke test—it confirms the image runs and the airflow command works. Because --rm removes the container when it exits, the SQLite database it creates isn’t kept; the standalone command in the next step initializes its own.

Step 5: Start Airflow with Docker Compose

  1. Create a docker-compose file: In your text editor, paste this:
version: '3'
services:
  airflow:
    image: apache/airflow:latest
    ports:
      - "8080:8080"                      # expose the web UI on localhost:8080
    volumes:
      - ./dags:/opt/airflow/dags         # your DAG files
      - ./logs:/opt/airflow/logs         # task logs
      - ./plugins:/opt/airflow/plugins   # custom plugins
    command: "standalone"                # run webserver and scheduler in one container
    environment:
      - AIRFLOW__CORE__EXECUTOR=SequentialExecutor
  2. Save the file: Save it as docker-compose.yml in ~/airflow-docker (e.g., /home/username/airflow-docker/docker-compose.yml or C:\Users\YourUsername\airflow-docker\docker-compose.yml).
  3. Run Docker Compose: In your terminal, type cd ~/airflow-docker (or cd %userprofile%\airflow-docker on Windows) and press Enter, then type docker-compose up (or docker compose up on newer Docker installs) and press Enter. It starts Airflow—wait a minute for the logs to settle.
  4. Check the UI: Open your browser, go to localhost:8080, and log in with the admin username and the generated password that standalone prints in the container logs—you’ll see the Airflow interface.

This runs Airflow standalone—stop it with Ctrl+C. Explore Airflow Executors (Sequential, Local, Celery) for other options.
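
Here are a few Docker Compose commands you will likely reach for while working with this file (newer Docker installs spell these as docker compose with a space instead of docker-compose):
docker-compose up -d      # start Airflow in the background instead of the foreground
docker-compose logs -f    # follow the logs (the generated admin password shows up here)
docker-compose ps         # check whether the airflow service is running
docker-compose down       # stop and remove the container when you are done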

Installing Airflow in the Cloud (AWS Example)

Running Airflow in the cloud—like on AWS—gives you scale and reliability for production. We’ll use AWS Managed Workflows for Apache Airflow (MWAA).

Step 1: Sign Into AWS Console

  1. Open your browser: Use Chrome, Firefox, etc.
  2. Go to AWS: Type aws.amazon.com in the address bar and press Enter.
  3. Log in: Click “Sign In to the Console” at the top-right, enter your AWS email and password, and click “Sign In.” If you don’t have an account, click “Create a new AWS account,” fill out the form (email, password, etc.), and follow the setup.

You need an AWS account to proceed—sign up if new.

Step 2: Navigate to MWAA

  1. Search for MWAA: In the AWS Console, click the search bar at the top, type “MWAA,” and press Enter.
  2. Select MWAA: Click “Amazon Managed Workflows for Apache Airflow (MWAA)” from the results—this opens the MWAA dashboard.

This is AWS’s managed Airflow service—no server setup needed.

Step 3: Create an S3 Bucket

  1. Go to S3: In the AWS Console, click the search bar, type “S3,” and press Enter, then click “S3” in the results.
  2. Create a bucket: Click “Create bucket” (orange button), type a globally unique name (e.g., my-airflow-bucket-123), pick a region (e.g., “US East (N. Virginia) us-east-1”), leave “Block all public access” enabled, enable “Bucket Versioning” (MWAA requires both), and click “Create bucket” at the bottom.
  3. Upload a DAG: Click your bucket’s name, create a folder named dags, open it, click “Upload,” click “Add files,” pick a DAG file (e.g., test_dag.py from the local steps), and click “Upload”—MWAA will read DAGs from this folder (e.g., s3://my-airflow-bucket-123/dags).

MWAA needs an S3 bucket for DAGs—DAG File Structure Best Practices applies here.
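
If you have the AWS CLI installed and configured, the same bucket setup can be scripted; here is a sketch where the bucket name and region are placeholders matching the console example above:
# Create the bucket, enable versioning (MWAA requires it), and upload a DAG (names are placeholders)
aws s3 mb s3://my-airflow-bucket-123 --region us-east-1
aws s3api put-bucket-versioning --bucket my-airflow-bucket-123 --versioning-configuration Status=Enabled
aws s3 cp test_dag.py s3://my-airflow-bucket-123/dags/test_dag.py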

Step 4: Set Up an MWAA Environment

  1. Go back to MWAA: In the AWS Console, search “MWAA” again and click it.
  2. Create environment: Click “Create environment” (orange button), name it (e.g., “my-airflow”), pick an Airflow version (e.g., “2.4.3”), and under “DAG code in Amazon S3” select your S3 bucket (my-airflow-bucket-123) and point the DAGs folder at the dags folder you created (e.g., s3://my-airflow-bucket-123/dags).
  3. Configure settings: Scroll to “Networking,” pick a VPC (or use defaults), set “Access mode” to “Public,” and click “Next.” Accept defaults for “Environment class” (e.g., mw1.small) and “Execution role,” then click “Create environment.”
  4. Wait for setup: It takes 20-30 minutes—watch the status change to “Available.”

This spins up Airflow on AWS—manage it via Airflow CLI: Overview and Usage with AWS tools.
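
While you wait, you can poll the environment’s status from the AWS CLI instead of refreshing the console; the environment name matches the example above:
# Shows environment details, including its Status field (CREATING, then AVAILABLE)
aws mwaa get-environment --name my-airflow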

Step 5: Access the Airflow UI

  1. Find the UI link: In the MWAA dashboard, click your environment (“my-airflow”), and copy the “Airflow webserver URL” (e.g., https://xxxx.mwaa.amazonaws.com).
  2. Open the UI: Paste the URL into your browser, press Enter, log in with IAM credentials (set via AWS IAM if needed), and see your DAGs.

Your cloud Airflow is live—logs are in AWS CloudWatch.


Conclusion

Installing Airflow—whether locally, with Docker, or in the cloud—sets you up to orchestrate workflows with ease. Local is quick and hands-on, Docker ensures consistency, and cloud offers scale. Follow these steps, and you’re ready to write DAGs in Defining DAGs in Python and monitor them in Monitoring Task Status in UI. Dive deeper with Airflow Concepts: DAGs, Tasks, and Workflows!