Airflow Executors (Sequential, Local, Celery)

Apache Airflow is a leading open-source platform for orchestrating workflows, and its Executors are the engines that power task execution. Whether you’re running a simple script with BashOperator or a complex pipeline integrating Airflow with Apache Spark, the Executor determines how your tasks get done—sequentially, in parallel, or across multiple machines. This guide, hosted on SparkCodeHub, dives deep into Airflow’s three core Executors: SequentialExecutor, LocalExecutor, and CeleryExecutor. We’ll explore what they do, how to set them up, and when to use each, with step-by-step instructions where needed. New to Airflow? Start with Airflow Fundamentals, and pair this with Airflow Architecture (Scheduler, Webserver, Executor) for a complete view.


What Are Airflow Executors?

In Airflow, Executors are the components that take tasks queued by the Scheduler—explained in Introduction to Airflow Scheduling—and actually run them. Think of the Scheduler as the planner who decides when tasks should happen, and the Executor as the worker who makes it happen. When you define a Directed Acyclic Graph (DAG) in Python—covered in Introduction to DAGs in Airflow—it includes tasks with dependencies (Task Dependencies (set_upstream, set_downstream)). The Executor picks up these tasks and executes their code—whether it’s a Python function via PythonOperator or a shell command. Airflow offers three main Executors: SequentialExecutor runs one task at a time, LocalExecutor runs multiple tasks on your machine, and CeleryExecutor distributes tasks across workers. Each has its strengths, and choosing the right one shapes how efficiently your workflows run.

Why Executors Matter

The Executor you pick directly impacts Airflow’s performance and scalability. Without an Executor, the Scheduler’s queues would sit idle—tasks wouldn’t run, and your DAGs would be stuck as code. The SequentialExecutor is simple but slow, perfect for testing a small DAG on your laptop. The LocalExecutor ramps up speed by running tasks in parallel on one machine, ideal for moderate workloads. The CeleryExecutor takes it further, spreading tasks across multiple machines for large-scale production—crucial for high-volume jobs or integrating with Airflow with Celery Executor. Picking the right one ensures your workflows—tracked in the metadata database (Airflow Metadata Database Setup) and viewed in the UI (Airflow Web UI Overview)—run efficiently.

SequentialExecutor

The SequentialExecutor is Airflow’s default—it’s straightforward and runs tasks one by one.

How SequentialExecutor Works

When you start Airflow after installation—steps in Installing Airflow (Local, Docker, Cloud)—it uses SequentialExecutor unless you change it. The Scheduler queues tasks from your DAGs (stored in ~/airflow/dags, per DAG File Structure Best Practices), and SequentialExecutor picks them up, running each task in order on your machine. If your DAG has three tasks—task1, task2, task3—with task1 >> task2 >> task3, it finishes task1 before starting task2, then task3. It’s single-threaded, so no overlap—perfect for simplicity but not speed.

When to Use SequentialExecutor

Use it for testing or tiny workloads—like debugging a DAG with DAG Testing with Python. It’s low-resource, needing no extra setup, but it’s slow for multiple tasks or complex DAGs—think one worker on an assembly line.

Setting Up SequentialExecutor

It’s the default—no steps needed beyond installation. After airflow db init, check airflow.cfg in ~/airflow—under [core], you’ll see executor = SequentialExecutor. Start services with airflow scheduler and airflow webserver -p 8080 (Airflow CLI: Overview and Usage), and it’s running.
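
For reference, the relevant lines in airflow.cfg look like this; you can also confirm the active Executor from the CLI with airflow config get-value core executor:

[core]
executor = SequentialExecutor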

Example DAG

from airflow import DAG
from airflow.operators.dummy import DummyOperator
from datetime import datetime

with DAG(
    dag_id="sequential_test",
    start_date=datetime(2025, 1, 1),
    schedule_interval="@daily",
) as dag:
    t1 = DummyOperator(task_id="task1")
    t2 = DummyOperator(task_id="task2")
    t3 = DummyOperator(task_id="task3")
    t1 >> t2 >> t3

Trigger it with airflow dags trigger -e 2025-04-07 sequential_test—tasks run one at a time.
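
To verify the one-at-a-time behavior, watch the scheduler logs, or list the run from the CLI (assuming the Airflow 2.x CLI) and confirm the tasks complete strictly in order:

airflow dags list-runs -d sequential_test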

LocalExecutor

The LocalExecutor steps up performance by running multiple tasks in parallel on one machine.

How LocalExecutor Works

LocalExecutor uses your machine’s CPU cores to run tasks concurrently, forking a separate process for each task up to the parallelism limit in airflow.cfg. If your DAG has task1 >> task2 and task3 with no dependency, LocalExecutor might run task2 and task3 together after task1—say, four tasks at once on a four-core system. It’s still single-machine but multi-process, making it faster than SequentialExecutor for moderate workloads.

When to Use LocalExecutor

It’s great for small-to-medium setups—faster than SequentialExecutor without needing extra machines. Use it when testing’s done and you’re running real DAGs locally or on a beefy server, but switch to CeleryExecutor for distributed scale.

Setting Up LocalExecutor

Steps to Configure LocalExecutor

  1. Install Airflow: Follow Installing Airflow (Local, Docker, Cloud)—activate your environment (source airflow_env/bin/activate or airflow_env\Scripts\activate).
  2. Set Up the Database: LocalExecutor needs a database that allows parallel connections, and SQLite doesn’t (Airflow’s scheduler refuses to start with LocalExecutor on SQLite). Install PostgreSQL or MySQL per Airflow Metadata Database Setup, point sql_alchemy_conn in airflow.cfg at it, then type airflow db init and press Enter.
  3. Edit airflow.cfg: Open ~/airflow/airflow.cfg with a text editor—type nano ~/airflow/airflow.cfg (Mac/Linux) or use Notepad (Windows). Find [core], locate executor = SequentialExecutor, change it to executor = LocalExecutor (see the snippet after these steps), save (Ctrl+O, Enter, Ctrl+X in nano), and close.
  4. Start the Scheduler: In your terminal, type airflow scheduler and press Enter—it uses LocalExecutor to run tasks.
  5. Start the Webserver: In another terminal (activate environment), type airflow webserver -p 8080 and press Enter—check localhost:8080.
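
As a reference for step 3, a minimal sketch of the resulting airflow.cfg (the parallelism value here is just an illustration; it caps how many task processes LocalExecutor runs at once, default 32):

[core]
executor = LocalExecutor
# Optional: cap concurrent task processes (default 32)
parallelism = 8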

Alternative with Environment Variable

  • Type export AIRFLOW__CORE__EXECUTOR=LocalExecutor (Mac/Linux) or set AIRFLOW__CORE__EXECUTOR=LocalExecutor (Windows) and press Enter before starting services—this overrides airflow.cfg (Airflow Configuration Options).

Example DAG

from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

with DAG(
    dag_id="local_test",
    start_date=datetime(2025, 1, 1),
    schedule_interval="@daily",
) as dag:
    t1 = BashOperator(task_id="task1", bash_command="sleep 5")
    t2 = BashOperator(task_id="task2", bash_command="echo 'Task 2'")
    t3 = BashOperator(task_id="task3", bash_command="echo 'Task 3'")
    t1 >> [t2, t3]

Trigger with airflow dags trigger -e 2025-04-07 local_test—t2 and t3 run together after t1.

CeleryExecutor

CeleryExecutor distributes tasks across multiple worker machines, ideal for large-scale production.

How CeleryExecutor Works

CeleryExecutor uses Celery—a distributed task queue—to send tasks to worker nodes. The Scheduler queues tasks, CeleryExecutor pushes them to a message broker (e.g., RabbitMQ), and workers pick them up—each worker runs tasks in parallel, scaling beyond one machine. If your DAG has task1 >> [t2, t3], t2 might run on Worker 1, t3 on Worker 2, all coordinated via the broker.
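
To make the routing concrete, here’s a minimal sketch (the DAG id and the reports queue name are hypothetical): CeleryExecutor honors the operator-level queue argument, so a worker started with airflow celery worker --queues reports picks up only task2, while any worker on the default queue can run task1.

from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

with DAG(
    dag_id="celery_queue_demo",
    start_date=datetime(2025, 1, 1),
    schedule_interval=None,
) as dag:
    # Runs on the default queue; any worker picks it up
    t1 = BashOperator(task_id="task1", bash_command="echo 'default queue'")
    # Routed to the "reports" queue via the operator-level queue argument
    t2 = BashOperator(task_id="task2", bash_command="echo 'reports queue'", queue="reports")
    t1 >> t2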

When to Use CeleryExecutor

Use it for heavy workloads—hundreds of tasks, multiple DAGs, or production environments needing high throughput. It’s complex but scales horizontally—perfect for enterprise setups or cloud deployments like Airflow with Celery Executor.

Setting Up CeleryExecutor

Steps to Configure CeleryExecutor

  1. Install Airflow with Celery: Activate your environment, type pip install apache-airflow[celery] and press Enter—it adds Celery support.
  2. Install RabbitMQ: On Mac, type brew install rabbitmq, press Enter, then brew services start rabbitmq. On Linux (Ubuntu), type sudo apt update, press Enter, then sudo apt install rabbitmq-server, and sudo systemctl start rabbitmq-server. On Windows, download from rabbitmq.com, run the installer, and start via the Start menu. Verify with rabbitmqctl status—it shows running services.
  3. Install Redis (Alternative Broker): On Mac, brew install redis, then brew services start redis. On Linux, sudo apt install redis-server, then sudo systemctl start redis. On Windows, Redis has no official native build—run it under WSL (sudo apt install redis-server inside WSL) and start it with redis-server.
  4. Configure airflow.cfg: Open ~/airflow/airflow.cfg, set [core] executor = CeleryExecutor, under [celery] set broker_url = amqp://guest:guest@localhost:5672// (RabbitMQ) or redis://localhost:6379/0 (Redis), and result_backend = db+postgresql://postgres:password@localhost:5432/airflow (needs PostgreSQL—see Step 5). Save and close (a consolidated snippet appears after these steps).
  5. Set Up PostgreSQL: Follow Airflow Metadata Database Setup—install PostgreSQL, create an airflow database, update sql_alchemy_conn, and run airflow db init.
  6. Start the Scheduler: Type airflow scheduler and press Enter—it queues tasks via Celery.
  7. Start a Worker: In another terminal, activate, type airflow celery worker, and press Enter—it picks up tasks from the broker.
  8. Start the Webserver: In a third terminal, activate, type airflow webserver -p 8080, and press Enter—check localhost:8080.
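
Pulling steps 4 and 5 together, the relevant airflow.cfg lines look roughly like this (the hosts and the postgres password are the placeholders from the steps above; substitute your own):

[core]
executor = CeleryExecutor

[celery]
# RabbitMQ broker; for Redis use redis://localhost:6379/0
broker_url = amqp://guest:guest@localhost:5672//
result_backend = db+postgresql://postgres:password@localhost:5432/airflow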

Example DAG

from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

with DAG(
    dag_id="celery_test",
    start_date=datetime(2025, 1, 1),
    schedule_interval="@daily",
) as dag:
    t1 = BashOperator(task_id="task1", bash_command="sleep 5")
    t2 = BashOperator(task_id="task2", bash_command="echo 'Task 2'")
    t3 = BashOperator(task_id="task3", bash_command="echo 'Task 3'")
    t1 >> [t2, t3]

Trigger with airflow dags trigger -e 2025-04-07 celery_test—workers handle t2 and t3 in parallel.

Comparing Executors

SequentialExecutor is simplest—one task, one machine, no setup. LocalExecutor adds parallelism on one machine—faster, still local. CeleryExecutor scales across machines—complex but powerful. Use Sequential for tests, Local for medium loads, Celery for big production—tune with Airflow Performance Tuning.

FAQ: Common Questions About Airflow Executors

Here are frequent questions about Executors, with detailed answers.

1. How do I know which Executor Airflow is using?

Type airflow config get-value core executor and press Enter—it shows “SequentialExecutor,” “LocalExecutor,” or “CeleryExecutor” from airflow.cfg (Airflow Configuration Options). If unset, it’s SequentialExecutor after Installing Airflow (Local, Docker, Cloud).

2. Can I switch Executors without reinstalling Airflow?

Yes—edit airflow.cfg, change executor under [core] (e.g., executor = LocalExecutor), save, and restart airflow scheduler and airflow webserver -p 8080 (Airflow CLI: Overview and Usage). For CeleryExecutor, install extras (pip install apache-airflow[celery]) and set up a broker first.

3. Why is SequentialExecutor so slow compared to LocalExecutor?

SequentialExecutor runs one task at a time—no overlap, even if tasks are independent. LocalExecutor forks a separate process per task, up to your parallelism setting (e.g., 4 tasks at once on a 4-core machine)—independent tasks like t2 and t3 run together after t1, cutting wait time—see Task Concurrency and Parallelism.

4. What do I need besides Airflow to use CeleryExecutor?

You need a message broker (RabbitMQ or Redis) and a result backend (e.g., PostgreSQL). Install RabbitMQ or Redis (Steps 2-3 for CeleryExecutor), set broker_url and result_backend in airflow.cfg, and install apache-airflow[celery]—full setup in Airflow with Celery Executor.

5. How do I troubleshoot a “worker not picking up tasks” issue with CeleryExecutor?

Check if the worker’s running—type airflow celery worker in a terminal and look for logs. Ensure RabbitMQ/Redis is active (rabbitmqctl status or redis-cli ping—should say “PONG”). Verify broker_url in airflow.cfg matches (e.g., amqp://guest:guest@localhost:5672//), and restart all services—logs in Task Logging and Monitoring help pinpoint issues.

6. Can I use LocalExecutor on a multi-core server for production?

Yes—it’s fine for moderate production on a strong server (e.g., 16 cores), running many tasks in parallel. But it’s still one machine—if it fails, everything stops. CeleryExecutor spreads risk across workers—use LocalExecutor if one server’s enough, Celery for redundancy and scale.

7. How do I see how many tasks an Executor is running at once?

Check the UI at localhost:8080—under “DAGs,” click your DAG, and see “Running” tasks in Airflow Graph View Explained. Or, tweak parallelism in airflow.cfg (default 32) and max_active_tasks_per_dag (Task Concurrency and Parallelism)—LocalExecutor uses your cores, Celery scales with workers.
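
You can also cap concurrency per DAG in code rather than in airflow.cfg. A minimal sketch, assuming Airflow 2.2+ (where max_active_tasks replaced the older concurrency argument); the DAG id is hypothetical:

from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

with DAG(
    dag_id="capped_dag",
    start_date=datetime(2025, 1, 1),
    schedule_interval="@daily",
    max_active_tasks=4,  # at most 4 of this DAG's tasks run at once
) as dag:
    # Ten independent tasks; the Executor runs them 4 at a time
    tasks = [BashOperator(task_id=f"task{i}", bash_command="sleep 5") for i in range(10)]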


Conclusion

Airflow Executors—Sequential, Local, Celery—drive your workflows, from simple tests to distributed production. Set them up with Installing Airflow (Local, Docker, Cloud), craft DAGs in Defining DAGs in Python, and monitor with Monitoring Task Status in UI. Optimize with Airflow Performance Tuning and explore more in Airflow Concepts: DAGs, Tasks, and Workflows!