Mastering Airflow with REST API: A Comprehensive Guide

Apache Airflow is a robust platform for orchestrating workflows, and its built-in REST API enhances those capabilities by enabling programmatic control, monitoring, and triggering of DAGs and tasks from external systems or scripts. Whether you’re running tasks with PythonOperator, sending notifications via SlackOperator, or connecting Airflow to external systems as in Airflow with Apache Spark, the REST API provides a powerful interface for automation and integration. This comprehensive guide, hosted on SparkCodeHub, explores Airflow with REST API—how it works, how to set it up, and best practices for optimal use. We’ll provide detailed step-by-step instructions, practical examples with code, and an extensive FAQ section. For foundational knowledge, start with Airflow Web UI Overview and pair this with Defining DAGs in Python.


What is Airflow with REST API?

Airflow with REST API refers to the integration of Apache Airflow’s workflow orchestration capabilities with its built-in RESTful API, allowing external systems, scripts, or users to interact with Airflow programmatically. Managed by Airflow’s Scheduler and Executor components (Airflow Architecture (Scheduler, Webserver, Executor)), this integration enables operations such as triggering DAG runs, fetching task statuses, managing DAGs, and monitoring workflows defined in the ~/airflow/dags directory (DAG File Structure Best Practices). The REST API, exposed via Airflow’s Webserver, leverages the apache-airflow-providers-http package with operators like SimpleHttpOperator and hooks like HttpHook for making API calls within DAGs, while external clients (e.g., Python scripts, cURL) use endpoints like /api/v1/dags/{dag_id}/dagRuns to interact with Airflow. Task states are tracked in the metadata database (airflow.db), with execution monitored via the Web UI (Monitoring Task Status in UI) and logs centralized (Task Logging and Monitoring). This integration bridges Airflow’s internal orchestration with external automation, making it ideal for dynamic workflows, CI/CD pipelines, and system integrations.

Core Components in Detail

Airflow’s REST API integration relies on several core components, each with specific roles and configurable parameters. Below, we explore these components in depth, including their functionality, parameters, and practical code examples.

1. SimpleHttpOperator: Makes HTTP Requests from Airflow

The SimpleHttpOperator executes HTTP requests (e.g., GET, POST) within Airflow DAGs, enabling tasks to call the Airflow REST API or external APIs.

  • Key Functionality: Sends HTTP requests to specified endpoints, supporting authentication, headers, and data payloads—ideal for triggering DAGs or fetching statuses internally.
  • Parameters:
    • task_id (str): Unique identifier (e.g., "call_rest_api").
    • http_conn_id (str): Airflow Connection ID for HTTP endpoint (default: "http_default").
    • endpoint (str): API endpoint (e.g., "api/v1/dags/my_dag/dagRuns").
    • method (str): HTTP method (e.g., "POST", "GET")—defines request type.
    • data (dict or str): Request payload (e.g., {"conf": {"key": "value"}})—supports JSON or string.
    • headers (dict): HTTP headers (e.g., {"Content-Type": "application/json", "Authorization": "Bearer <token>"}).
    • response_check (callable): Function to validate response (e.g., lambda resp: resp.status_code == 200).
    • log_response (bool): Logs response content (default: False).
    • do_xcom_push (bool): Pushes response to XCom (default: True).
  • Code Example:
from airflow import DAG
from airflow.providers.http.operators.http import SimpleHttpOperator
from datetime import datetime

with DAG(
    dag_id="http_operator_example",
    start_date=datetime(2025, 4, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    trigger_dag = SimpleHttpOperator(
        task_id="trigger_dag_via_api",
        http_conn_id="airflow_api",
        endpoint="api/v1/dags/http_operator_example/dagRuns",
        method="POST",
        data='{"conf": {"param": "test"}, "execution_date": "{ { execution_date.isoformat() } }"}',
        headers={"Content-Type": "application/json", "Authorization": "Bearer <your-token>"},
        response_check=lambda response: response.status_code in (200, 201),
        log_response=True,
        do_xcom_push=True,
    )

This triggers the same DAG via the REST API, validating a successful (200/201) response. Note that each run re-triggers the DAG, so pause it after testing to avoid an endless chain of runs.

2. HttpHook: Programmatic HTTP Requests

The HttpHook provides programmatic access to HTTP requests within Airflow, enabling custom API interactions in Python tasks.

  • Key Functionality: Executes HTTP requests with fine-grained control—e.g., triggering DAGs, fetching statuses—using Airflow’s connection system.
  • Parameters:
    • http_conn_id (str): Connection ID (default: "http_default").
    • method (str): HTTP method (e.g., "GET", "POST").
    • Methods: run(endpoint, data, headers)—sends requests and returns responses.
  • Code Example:
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.http.hooks.http import HttpHook
from datetime import datetime

def trigger_dag_api():
    hook = HttpHook(http_conn_id="airflow_api", method="POST")
    response = hook.run(
        endpoint="api/v1/dags/http_hook_example/dagRuns",
        data='{"conf": {"param": "hook_test"}, "execution_date": "2025-04-07T00:00:00Z"}',
        headers={"Content-Type": "application/json", "Authorization": "Bearer <your-token>"}
    )
    print(f"Response: {response.json()}")

with DAG(
    dag_id="http_hook_example",
    start_date=datetime(2025, 4, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    trigger_task = PythonOperator(
        task_id="trigger_dag_via_hook",
        python_callable=trigger_dag_api,
    )

This triggers a DAG run via HttpHook and prints the response.

3. REST API Endpoints: External Interaction with Airflow

Airflow’s REST API endpoints allow external systems to interact with Airflow, managed via the Webserver—e.g., /api/v1/dags/{dag_id}/dagRuns for triggering runs.

  • Key Functionality: Provides CRUD operations for DAGs, tasks, and runs—e.g., trigger, monitor, update—via HTTP requests with JSON payloads.
  • Key Endpoints:
    • GET /api/v1/dags: Lists all DAGs.
    • POST /api/v1/dags/{dag_id}/dagRuns: Triggers a DAG run.
    • GET /api/v1/dags/{dag_id}/dagRuns/{dag_run_id}: Fetches run status.
    • GET /api/v1/dags/{dag_id}/tasks/{task_id}: Fetches task details.
  • Parameters:
    • Headers: Authorization: Bearer <token>—JWT token for auth (Basic Auth is also supported).
    • Body: JSON payload (e.g., {"conf": {"key": "value"}})—for POST requests.
  • Code Example (External cURL):
curl -X POST "http://localhost:8080/api/v1/dags/rest_api_example/dagRuns" \
  -H "Authorization: Bearer <your-token>" \
  -H "Content-Type: application/json" \
  -d '{"conf": {"param": "test"}, "execution_date": "2025-04-07T00:00:00Z"}'

This triggers a DAG run externally via cURL.
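
The same endpoints are scriptable from Python. A minimal sketch with the requests library; the base URL, credentials, and DAG ID below are placeholders for your deployment:

import requests

BASE_URL = "http://localhost:8080/api/v1"  # Webserver base URL (placeholder)
AUTH = ("admin", "admin")  # Basic Auth credentials (placeholder)

# List all DAGs registered with Airflow
dags = requests.get(f"{BASE_URL}/dags", auth=AUTH).json()
for dag in dags["dags"]:
    print(dag["dag_id"], "paused:", dag["is_paused"])

# Fetch the runs of a specific DAG and their states
runs = requests.get(f"{BASE_URL}/dags/rest_api_example/dagRuns", auth=AUTH).json()
for run in runs["dag_runs"]:
    print(run["dag_run_id"], run["state"])

This mirrors the cURL call above while making responses easy to post-process in scripts.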

4. Connections: Airflow Connection IDs (e.g., airflow_api)

Airflow Connections configure REST API access, centralizing credentials for operators and hooks.

  • Key Functionality: Stores API endpoint details—e.g., host, token—for secure, reusable access within DAGs or external clients.
  • Parameters:
    • conn_id (str): Unique identifier (e.g., airflow_api).
    • conn_type (str): http—specifies HTTP connection.
    • host (str): Airflow Webserver URL (e.g., http://localhost:8080).
    • password (str): API token (e.g., <jwt-token>—generated via CLI or UI) or a Basic Auth password.
    • extra (dict): JSON config (e.g., {"endpoint": "api/v1"})—optional.
  • Code Example (UI Setup):
    • In Airflow UI: Admin > Connections > +
    • Conn Id: airflow_api
    • Conn Type: HTTP
    • Host: http://localhost:8080
    • Password: <your-jwt-token> (or the admin user’s password when using Basic Auth)
    • Save

This connection enables secure API calls within Airflow or externally.
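
Connections can also be supplied without the UI: Airflow resolves environment variables named AIRFLOW_CONN_<CONN_ID> as connection URIs, with the scheme doubling as the connection type. A minimal sketch, assuming the Basic Auth credentials used elsewhere in this guide:

import os

# The URI scheme (http) sets conn_type; user:password carry the Basic Auth
# credentials. Values are placeholders, so adjust to your deployment.
os.environ["AIRFLOW_CONN_AIRFLOW_API"] = "http://admin:admin@localhost:8080"

Set this in the environment of the Scheduler and workers so tasks can resolve airflow_api without touching the metadata database.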


Key Parameters for Airflow with REST API

Parameters in airflow.cfg and operator configurations fine-tune the integration:

  • http_conn_id: Connection ID for HTTP API (default: "http_default")—used by SimpleHttpOperator and HttpHook.
  • endpoint: API endpoint (e.g., "api/v1/dags/{dag_id}/dagRuns").
  • method: HTTP method (e.g., "POST", "GET").
  • data: Request payload (e.g., {"conf": {"key": "value"}}).
  • headers: HTTP headers (e.g., {"Authorization": "Bearer <token>"}).
  • response_check: Response validation function (e.g., lambda resp: resp.status_code == 200).
  • base_url: Webserver base URL in airflow.cfg under [webserver] (e.g., http://localhost:8080)—defines the API access point.
  • auth_backends: Authentication backend(s) in airflow.cfg under [api] (e.g., airflow.api.auth.backend.basic_auth)—enables API security.

These parameters ensure precise control over REST API interactions within Airflow workflows.


Setting Up Airflow with REST API: Step-by-Step Guide

Let’s configure Airflow with REST API access, secure it with JWT authentication, and run a sample DAG to trigger itself via the API.

Step 1: Set Up Your Airflow Environment with REST API

  1. Install Docker: Install Docker Desktop—e.g., on macOS: brew install --cask docker. Start Docker and verify: docker --version. (Docker is optional for the pip-based local setup below.)
  2. Install Airflow with HTTP Support: Open your terminal, navigate to your home directory (cd ~), and create a virtual environment (python -m venv airflow_env). Activate it—source airflow_env/bin/activate on Mac/Linux or airflow_env\Scripts\activate on Windows—then install Airflow with HTTP support (pip install "apache-airflow[http]").
  3. Enable REST API: Edit ~/airflow/airflow.cfg to expose the Webserver and select an auth backend:

```ini
[webserver]
web_server_host = 0.0.0.0
web_server_port = 8080
expose_config = True

[api]
auth_backends = airflow.api.auth.backend.basic_auth
```

  4. Initialize the Database: Run airflow db init to create the metadata database at ~/airflow/airflow.db.
  5. Create Admin User: Generate an admin user for API access: airflow users create -u admin -p admin -r Admin -e admin@example.com -f Admin -l User.
  6. Generate a Token: Use Basic Auth with the admin credentials for now, or configure JWT manually (e.g., via an external script); future versions may simplify token generation. Example token: <jwt-token> (placeholder).
  7. Configure Airflow API Connection: In Airflow UI (localhost:8080, post-service start), go to Admin > Connections, click “+”, set:

  • Conn Id: airflow_api
  • Conn Type: HTTP
  • Host: http://localhost:8080
  • Password: <jwt-token> (or use Basic Auth with admin:admin)
  • Save

  8. Start Airflow Services: In one terminal, run airflow webserver -p 8080. In another, run airflow scheduler.
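
Before building DAGs on top of the API, confirm authentication end to end. A minimal check with Python’s requests library, assuming the admin user from step 5 (adjust host and credentials to your deployment):

import requests

# Expect HTTP 200 and a DAG count if Basic Auth is wired up correctly;
# a 401 here points at the auth_backends setting or the credentials.
response = requests.get("http://localhost:8080/api/v1/dags", auth=("admin", "admin"))
print(response.status_code, response.json().get("total_entries"))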

Step 2: Create a Sample DAG

  1. Open a Text Editor: Use Visual Studio Code or any plain-text editor—save the file with a .py extension.
  2. Write the DAG Script: Define a DAG to trigger itself via the REST API:
  • Copy this code:
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.http.operators.http import SimpleHttpOperator
from datetime import datetime

def log_execution():
    print("DAG executed successfully")

with DAG(
    dag_id="rest_api_demo",
    start_date=datetime(2025, 4, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    log_task = PythonOperator(
        task_id="log_execution",
        python_callable=log_execution,
    )

    trigger_self = SimpleHttpOperator(
        task_id="trigger_self",
        http_conn_id="airflow_api",
        endpoint="api/v1/dags/rest_api_demo/dagRuns",
        method="POST",
        data='{"conf": {"triggered_by": "API"}, "execution_date": "{ { execution_date.isoformat() } }"}',
        headers={"Content-Type": "application/json", "Authorization": "Bearer <your-token>"},
        response_check=lambda response: response.status_code in (200, 201),
        log_response=True,
        do_xcom_push=True,
    )

    log_task >> trigger_self
  • Save as rest_api_demo.py in ~/airflow/dags. Replace <your-token> with your JWT token, or rely on the Basic Auth credentials stored in the airflow_api connection.

Step 3: Execute and Monitor the DAG with REST API

  1. Verify API Access: Test the API—e.g., curl -X GET "http://localhost:8080/api/v1/dags" -H "Authorization: Bearer <your-token>" (or -u admin:admin for Basic Auth)—and confirm it returns DAGs.
  2. Trigger the DAG: At localhost:8080, toggle “rest_api_demo” to “On,” click “Trigger DAG” for April 7, 2025. In Graph View, monitor:
  • log_execution: Executes, turns green.
  • trigger_self: Triggers another DAG run, turns green.

  3. Check DAG Runs: In Airflow UI, see multiple runs—the initial trigger and the API-triggered run—or check via curl -X GET "http://localhost:8080/api/v1/dags/rest_api_demo/dagRuns" -H "Authorization: Bearer <your-token>".
  4. View Logs: In Graph View, click trigger_self > “Log”—see API request and response details.
  5. Retry Task: If trigger_self fails (e.g., due to auth issues), fix it, click “Clear,” and retry—updates status on success.

This setup demonstrates Airflow triggering itself via the REST API, monitored via the UI. Pause the DAG once you have verified the behavior, since each API-triggered run will otherwise trigger another.


Key Features of Airflow with REST API

Airflow’s REST API integration offers powerful features, detailed below.

Programmatic DAG Triggering

The SimpleHttpOperator and REST API endpoints (e.g., /api/v1/dags/{dag_id}/dagRuns) enable programmatic DAG triggering, using method="POST" and data (e.g., {"conf": {"key": "value"}}). This supports automation from external systems or self-triggering within DAGs.

Example: Self-Triggering

trigger_self triggers another run—demonstrates dynamic workflow control.
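
The same trigger works from any external client. A hedged sketch with Python’s requests library, where the URL, credentials, and conf payload are placeholders:

import requests

# Trigger a run of rest_api_demo with a conf payload. Omitting the logical
# date lets the API assign one, avoiding duplicate-run conflicts.
response = requests.post(
    "http://localhost:8080/api/v1/dags/rest_api_demo/dagRuns",
    auth=("admin", "admin"),
    json={"conf": {"triggered_by": "external_script"}},
)
response.raise_for_status()
run = response.json()
print(run["dag_run_id"], run["state"])  # a freshly "queued" run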

Task and DAG Status Monitoring

Endpoints like GET /api/v1/dags/{dag_id}/dagRuns/{dag_run_id} fetch real-time status, while HttpHook retrieves data programmatically. Configurable with endpoint and headers, this provides visibility into workflow execution.

Example: Status Check

External curl or HttpHook fetches run status—integrates monitoring into scripts.
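
A small polling helper makes the pattern concrete. This sketch assumes the Basic Auth setup from this guide and a known run ID (the manual__... format below is an assumption for manually triggered runs):

import time

import requests

AUTH = ("admin", "admin")  # placeholder credentials

def wait_for_run(dag_id: str, dag_run_id: str, timeout: int = 300) -> str:
    """Poll a DAG run until it reaches a terminal state or times out."""
    url = f"http://localhost:8080/api/v1/dags/{dag_id}/dagRuns/{dag_run_id}"
    deadline = time.time() + timeout
    while time.time() < deadline:
        state = requests.get(url, auth=AUTH).json()["state"]
        if state in ("success", "failed"):
            return state
        time.sleep(10)  # avoid hammering the Webserver
    raise TimeoutError(f"{dag_run_id} not finished after {timeout}s")

print(wait_for_run("rest_api_demo", "manual__2025-04-07T00:00:00+00:00"))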

Flexible HTTP Integration

SimpleHttpOperator and HttpHook support diverse HTTP methods (e.g., GET, POST) with data and headers (e.g., {"Authorization": "Bearer <token>"}), enabling interaction with Airflow’s API or external services—e.g., triggering Slack notifications.

Example: API Flexibility

trigger_dag_via_api posts to Airflow’s API—extends to any HTTP endpoint.
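
Because the operator is endpoint-agnostic, the same pattern reaches any HTTP service. A sketch against a hypothetical third-party API, where the external_api connection and the v2/status endpoint are assumptions:

from datetime import datetime

from airflow import DAG
from airflow.providers.http.operators.http import SimpleHttpOperator

with DAG(
    dag_id="external_http_example",
    start_date=datetime(2025, 4, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Assumes an HTTP connection "external_api" pointing at the third-party host.
    fetch_status = SimpleHttpOperator(
        task_id="fetch_external_status",
        http_conn_id="external_api",
        endpoint="v2/status",  # hypothetical endpoint
        method="GET",
        headers={"Accept": "application/json"},
        response_check=lambda response: response.status_code == 200,
        log_response=True,
    )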

Real-Time Monitoring in UI

Graph View tracks API task statuses—green for success, red for failure—updated from the metadata database, with logs detailing API calls and responses. This ensures full visibility into API-driven workflows (Airflow Metrics and Monitoring Tools).

Example: API Task Oversight

trigger_self turns green—logs show API response, tracked in Graph View (Airflow Graph View Explained).

Secure Authentication

The REST API uses JWT or Basic Auth via Authorization headers (e.g., Bearer <token>), configured in airflow.cfg (auth_backends) and Connections (e.g., airflow_api). This secures API access, protecting workflow operations.

Example: Secure Calls

headers with token ensures authenticated API access—prevents unauthorized triggers.
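
Both header styles can be assembled in a few lines. A sketch in which the credentials and token are placeholders:

import base64

def basic_auth_header(user: str, password: str) -> dict:
    """Build a Basic Auth header from a username/password pair."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

def bearer_auth_header(jwt_token: str) -> dict:
    """Build a Bearer header from an existing JWT."""
    return {"Authorization": f"Bearer {jwt_token}"}

print(basic_auth_header("admin", "admin"))
print(bearer_auth_header("<your-token>"))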


Best Practices for Airflow with REST API

Optimize this integration with these detailed guidelines:

  • Secure API Access: Use an authentication backend—set auth_backends = airflow.api.auth.backend.basic_auth (or a JWT-based backend) and store tokens in Connections—avoid hardcoding credentials Airflow Configuration Basics.
  • Test API Calls: Validate endpoints—e.g., via curl or Postman—before DAG runs DAG Testing with Python.
  • Tune API Performance: Set response_check (e.g., lambda resp: resp.status_code == 200) and retry logic—monitor with Webserver logs Airflow Performance Tuning.
  • Limit API Usage: Avoid excessive calls—e.g., batch triggers—to prevent Webserver overload.
  • Monitor Post-Trigger: Check Graph View and API logs—e.g., failed trigger_self signals an issue—for quick resolution Airflow Graph View Explained.
  • Persist Logs: Enable Webserver logging—e.g., log_level=DEBUG—to capture API interactions Task Logging and Monitoring.
  • Document Endpoints: Track http_conn_id, endpoint, and headers—e.g., in a README—for team clarity DAG File Structure Best Practices.
  • Handle Time Zones: Align execution_date in API calls—e.g., adjust for PDT Time Zones in Airflow Scheduling.

These practices ensure secure, efficient REST API integration.


FAQ: Common Questions About Airflow with REST API

Here’s an expanded set of answers to frequent questions from Airflow users.

1. Why does SimpleHttpOperator fail to call the API?

http_conn_id may be invalid—e.g., wrong token—test with curl (Airflow Configuration Basics).

2. How do I debug REST API failures?

Check trigger_self logs—e.g., “401 Unauthorized”—then verify auth in Airflow UI (Task Logging and Monitoring).

3. Why are API calls slow?

Webserver overload—adjust workers in airflow.cfg (e.g., 4)—monitor with logs (Airflow Performance Tuning).

4. How do I retrieve API responses dynamically?

Use do_xcom_push=True—e.g., trigger_self pushes response to XCom (Airflow XComs: Task Communication).
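
A sketch of a downstream task that reads the pushed response, meant to be added inside the rest_api_demo DAG from the setup guide:

from airflow.operators.python import PythonOperator

def read_api_response(ti):
    # SimpleHttpOperator pushes the response body to XCom under "return_value".
    response_text = ti.xcom_pull(task_ids="trigger_self")
    print(f"API response: {response_text}")

read_response = PythonOperator(
    task_id="read_api_response",
    python_callable=read_api_response,
)

# Wire it after the API call: trigger_self >> read_response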

5. Can I trigger multiple DAGs via API in one task?

Yes—chain SimpleHttpOperator tasks or use a loop in HttpHook—e.g., trigger dag1, dag2 (Airflow Executors (Sequential, Local, Celery)).
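
A sketch of the loop approach inside a PythonOperator callable, where the DAG IDs are placeholders and the airflow_api connection supplies credentials:

from airflow.providers.http.hooks.http import HttpHook

def trigger_many(dag_ids):
    """Trigger several DAGs through one hook, one POST per DAG."""
    hook = HttpHook(http_conn_id="airflow_api", method="POST")
    for dag_id in dag_ids:
        # Basic Auth credentials stored on the connection are applied automatically.
        response = hook.run(
            endpoint=f"api/v1/dags/{dag_id}/dagRuns",
            data='{"conf": {}}',
            headers={"Content-Type": "application/json"},
        )
        print(dag_id, response.status_code)

# e.g., call trigger_many(["dag1", "dag2"]) from a PythonOperator task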

6. Why does API authentication fail?

Token may be expired—regenerate via CLI or UI—test with curl (DAG Views and Task Logs).

7. How do I monitor API usage?

Use Webserver logs or integrate Prometheus—e.g., airflow_api_requests_total (Airflow Metrics and Monitoring Tools).

8. Can an external system trigger an Airflow DAG via API?

Yes—use POST /api/v1/dags/{dag_id}/dagRuns—e.g., via cURL or Python requests (Triggering DAGs via UI).


Conclusion

Mastering Airflow with REST API enables programmatic workflow control—set it up with Installing Airflow (Local, Docker, Cloud), craft DAGs via Defining DAGs in Python, and monitor with Airflow Graph View Explained. Explore more with Airflow Concepts: DAGs, Tasks, and Workflows and Customizing Airflow Web UI!