Airflow Configuration Options

Apache Airflow is a versatile open-source platform that empowers data engineers to orchestrate workflows with precision, and its flexibility stems from a robust set of configuration options. These settings let you tailor Airflow to your specific needs—whether you’re tweaking performance, setting up security, or integrating with external systems. This guide, hosted on SparkCodeHub, dives deep into Airflow’s configuration options, exploring how they work, why they matter, and how to adjust them effectively. We’ll cover the essentials, from the airflow.cfg file to environment variables, and provide practical insights to get you started. New to Airflow? Pair this with Airflow Fundamentals, and explore related topics like Airflow Architecture (Scheduler, Webserver, Executor) for a fuller picture.


What Are Airflow Configuration Options?

Airflow’s configuration options are the knobs and dials that control how the platform behaves. They’re settings you can adjust to define everything from how tasks are executed to how the web interface looks, all centralized in a file called airflow.cfg or overridden with environment variables. When you install Airflow—detailed in Installing Airflow (Local, Docker, Cloud)—it creates this file in your Airflow home directory (typically ~/airflow), packed with defaults that work out of the box. But the real power comes when you tweak these options to fit your workflows—whether you’re running a simple DAG with BashOperator or scaling up with Airflow with Apache Spark. These settings touch every part of Airflow’s architecture—the Scheduler, Webserver, Executor, and more—making them essential for customization.

The airflow.cfg File

The airflow.cfg file is Airflow’s configuration hub, a text file where most options live by default.

Where to Find airflow.cfg

When you first run airflow db init—part of the setup process in Airflow Metadata Database Setup—Airflow creates a folder at ~/airflow (e.g., /home/username/airflow or C:\Users\YourUsername\airflow) and drops airflow.cfg inside it. This is your Airflow home directory, and you can open the file with any text editor—Notepad on Windows, TextEdit on Mac, or something like VS Code. Inside, you’ll see sections like [core], [webserver], and [scheduler], each holding settings in a simple key = value format. For example, under [core], you might see executor = SequentialExecutor, setting how tasks run, as explained in Airflow Executors (Sequential, Local, Celery).
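
For example, here’s a quick way to locate the file and confirm which home directory Airflow is actually using (paths assume the default ~/airflow):

cd ~/airflow                 # or whatever AIRFLOW_HOME points to
ls airflow.cfg               # confirm the file exists
grep -n "^\[" airflow.cfg    # list section headers like [core], [webserver], [scheduler]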

How to Edit airflow.cfg

Editing airflow.cfg is straightforward—just open it in your editor, find the section and key you want, and change the value. Say you want tasks to run in parallel instead of one-by-one. Locate [core], find executor = SequentialExecutor, and change it to executor = LocalExecutor. Save the file, then restart the Scheduler (airflow scheduler) and Webserver (airflow webserver -p 8080)—commands you can manage with Airflow CLI: Overview and Usage—so Airflow picks up the change. It’s like adjusting settings on a control panel, and every tweak shapes how Airflow behaves.
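
As a rough sketch, the edit-and-restart cycle looks like this (assuming Airflow 2.x, where the airflow config subcommand is available; note that LocalExecutor also needs a PostgreSQL or MySQL metadata database rather than SQLite):

nano ~/airflow/airflow.cfg               # under [core], change executor = SequentialExecutor to executor = LocalExecutor
airflow config get-value core executor   # verify the value Airflow will actually use
airflow scheduler                        # restart the Scheduler in one terminal
airflow webserver -p 8080                # restart the Webserver in another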

Why airflow.cfg Matters

This file is your starting point because it holds Airflow’s defaults—things like where DAGs live (dags_folder = /home/username/airflow/dags) or how often the Scheduler checks them (dag_dir_list_interval = 300). Without touching it, Airflow works fine for basic setups, but editing it lets you optimize for your needs—say, speeding up the Scheduler with Reducing Scheduler Latency or securing the Webserver with Security Best Practices in Airflow. It’s the foundation for customization.
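
If you’d rather inspect the effective values than read the raw file, Airflow 2.x ships a config subcommand (a sketch; older releases may lack it):

airflow config list                                         # dump every section and its effective values
airflow config get-value core dags_folder                   # e.g. /home/username/airflow/dags
airflow config get-value scheduler dag_dir_list_interval    # e.g. 300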

Environment Variables

Environment variables offer another way to configure Airflow, overriding airflow.cfg without touching the file.

How Environment Variables Work

Airflow checks environment variables when it starts, using them to set options dynamically. They follow a naming pattern: AIRFLOW__SECTION__KEY, where SECTION is the airflow.cfg section (like core) and KEY is the setting (like executor). For example, to set the Executor to LocalExecutor, you’d use AIRFLOW__CORE__EXECUTOR=LocalExecutor. On Mac/Linux, in your terminal, type export AIRFLOW__CORE__EXECUTOR=LocalExecutor and press Enter before running airflow scheduler. On Windows, in Command Prompt, type set AIRFLOW__CORE__EXECUTOR=LocalExecutor and press Enter. These stay active until you close the terminal—run Airflow commands in the same session to use them. Learn more in Airflow Environment Variables.
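
Here’s a minimal sketch of the pattern: the section and key are upper-cased and joined with double underscores, and the override only applies to commands run in the same shell session:

export AIRFLOW__CORE__EXECUTOR=LocalExecutor         # overrides [core] executor
export AIRFLOW__WEBSERVER__WEB_SERVER_PORT=8081      # overrides [webserver] web_server_port
airflow config get-value core executor               # prints LocalExecutor, confirming the override
airflow scheduler                                    # run in the same shell so the variables apply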

Setting Environment Variables

To make a change stick, add it to your shell profile. On Mac/Linux, open your terminal, type nano ~/.bashrc (or nano ~/.zshrc for Zsh), and press Enter. Scroll to the bottom, add export AIRFLOW__CORE__EXECUTOR=LocalExecutor, save with Ctrl+O then Enter, exit with Ctrl+X, and reload with source ~/.bashrc. On Windows, use setx AIRFLOW__CORE__EXECUTOR "LocalExecutor" in Command Prompt (note setx makes it permanent but doesn’t apply to the current session—reopen Command Prompt to check). Verify with echo $AIRFLOW__CORE__EXECUTOR (Mac/Linux) or echo %AIRFLOW__CORE__EXECUTOR% (Windows)—you’ll see “LocalExecutor.”
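
On Mac/Linux, the additions to your shell profile end up looking roughly like this (swap ~/.bashrc for ~/.zshrc if you use Zsh):

echo 'export AIRFLOW__CORE__EXECUTOR=LocalExecutor' >> ~/.bashrc   # append the override to your profile
source ~/.bashrc                                                   # reload it in the current shell
echo $AIRFLOW__CORE__EXECUTOR                                      # should print LocalExecutor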

Why Environment Variables Matter

They’re handy for quick changes or deployments—like Docker or cloud setups—without editing files. They override airflow.cfg, so you can test settings (e.g., AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL=60) without committing to permanent edits. They’re also key for secrets—use AIRFLOW__DATABASE__SQL_ALCHEMY_CONN to hide database passwords, aligning with Encrypting Sensitive Data in Airflow.
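
In a Docker-based setup, for instance, the same overrides can be passed with -e flags. This is only a sketch, assuming the official apache/airflow image (the tag is illustrative) and the standalone command available in Airflow 2.2+:

docker run -it --rm \
  -e AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL=60 \
  -e AIRFLOW__WEBSERVER__WEB_SERVER_PORT=8080 \
  -p 8080:8080 \
  apache/airflow:2.9.3 standalone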

Core Configuration Options

Core options under [core] in airflow.cfg control Airflow’s fundamental behavior.

Executor Settings

The executor setting decides how tasks run—default is SequentialExecutor, running one task at a time. Change it to LocalExecutor for parallelism on your machine, or CeleryExecutor for multiple machines (needs setup in Airflow with Celery Executor). Note that LocalExecutor and CeleryExecutor require a metadata database other than SQLite, such as PostgreSQL. Edit executor = LocalExecutor in [core] or set AIRFLOW__CORE__EXECUTOR=LocalExecutor. This shapes your workflow’s speed—see Airflow Executors (Sequential, Local, Celery).
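
For instance, switching executors with an environment variable (a sketch, assuming Airflow 2.x and a non-SQLite metadata database):

export AIRFLOW__CORE__EXECUTOR=LocalExecutor
airflow config get-value core executor   # should print LocalExecutor
airflow scheduler                        # refuses to start if the metadata database is still SQLite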

DAG Folder Location

The dags_folder option points to where your DAGs live—default is ~/airflow/dags. Change it with dags_folder = /custom/path/dags in [core] or AIRFLOW__CORE__DAGS_FOLDER=/custom/path/dags. Airflow scans this for scripts—keep it tidy with DAG File Structure Best Practices.

Parallelism Limits

The parallelism setting caps how many tasks run at once—default is 32. Set parallelism = 16 in [core] or AIRFLOW__CORE__PARALLELISM=16 to limit it. This controls resource use—tune it with Task Concurrency and Parallelism.
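
As a quick sketch, the two ceilings are often set together: parallelism for the whole installation and max_active_tasks_per_dag for each DAG (values are illustrative):

export AIRFLOW__CORE__PARALLELISM=16               # at most 16 running task instances across all DAGs
export AIRFLOW__CORE__MAX_ACTIVE_TASKS_PER_DAG=8   # at most 8 running tasks in any single DAG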

Scheduler Configuration Options

The [scheduler] section tweaks how the Scheduler behaves, managing task timing.

DAG Scan Interval

The dag_dir_list_interval setting—default 300 seconds (5 minutes)—tells the Scheduler how often to check the dags folder. Lower it to 60 with dag_dir_list_interval = 60 or AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL=60 for faster updates—optimize with Reducing Scheduler Latency.

Task Concurrency

The max_active_tasks_per_dag option—default 16—limits how many tasks a single DAG can run at once. Although it governs scheduling behavior, the setting lives under [core]: set max_active_tasks_per_dag = 8 there or AIRFLOW__CORE__MAX_ACTIVE_TASKS_PER_DAG=8 to cap it. This balances load—see Task Concurrency and Parallelism.

Scheduler Heartbeat

The scheduler_heartbeat_sec—default 5 seconds—sets how often the Scheduler checks in. Increase to 10 with scheduler_heartbeat_sec = 10 or AIRFLOW__SCHEDULER__SCHEDULER_HEARTBEAT_SEC=10 for less frequent checks—tune it in Introduction to Airflow Scheduling.

Webserver Configuration Options

The [webserver] section customizes the UI, served by the Webserver.

Webserver Port

The web_server_port option—default 8080—sets the UI’s port. Change it to 8081 with web_server_port = 8081 in [webserver] or AIRFLOW__WEBSERVER__WEB_SERVER_PORT=8081, then run airflow webserver -p 8081. This avoids conflicts—customize further with Customizing Airflow Web UI.

Authentication

In Airflow 1.x, the authenticate option (default False) toggled login: you set authenticate = True or AIRFLOW__WEBSERVER__AUTHENTICATE=True. In Airflow 2.x that option is gone; the UI requires login by default, and authentication is configured through webserver_config.py in your Airflow home directory—secure it with Airflow Authentication and Authorization.

UI Timezone

The default_ui_timezone—default “UTC”—sets the UI’s time display. Change to “America/New_York” with default_ui_timezone = America/New_York or AIRFLOW__WEBSERVER__DEFAULT_UI_TIMEZONE=America/New_York—align it with Time Zones in Airflow Scheduling.

Database Configuration Options

The [database] section (formerly [core] for sql_alchemy_conn) manages the metadata database.

Database Connection String

The sql_alchemy_conn—default sqlite:////home/username/airflow/airflow.db—links to the database. For PostgreSQL, set sql_alchemy_conn = postgresql+psycopg2://user:password@localhost:5432/airflow or AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://user:password@localhost:5432/airflow—details in Airflow Metadata Database Setup.
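
For example, pointing Airflow at PostgreSQL through an environment variable and checking the connection (a sketch; the user, password, and database name are placeholders):

export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow
airflow db check   # verifies Airflow can reach the database
airflow db init    # initializes the schema in the new database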

Connection Pool Size

The sql_alchemy_pool_size—default 5—limits database connections. Increase to 10 with sql_alchemy_pool_size = 10 or AIRFLOW__DATABASE__SQL_ALCHEMY_POOL_SIZE=10 for more load—optimize with Database Performance in Airflow.

Example: Customizing a Configuration

Let’s tweak a few settings. Open airflow.cfg, set executor = LocalExecutor under [core], dag_dir_list_interval = 60 under [scheduler], and web_server_port = 8081 under [webserver]. (LocalExecutor needs a PostgreSQL or MySQL metadata database; SQLite only supports SequentialExecutor.) Save, then restart with:

airflow scheduler
airflow webserver -p 8081

Or use environment variables:

export AIRFLOW__CORE__EXECUTOR=LocalExecutor
export AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL=60
export AIRFLOW__WEBSERVER__WEB_SERVER_PORT=8081
airflow scheduler
airflow webserver -p 8081

This speeds up DAG scans, runs tasks in parallel, and shifts the UI to port 8081—test it at localhost:8081.
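
To confirm the overrides took effect, you can query the live configuration (Airflow 2.x):

airflow config get-value core executor                     # LocalExecutor
airflow config get-value scheduler dag_dir_list_interval   # 60
airflow config get-value webserver web_server_port         # 8081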

FAQ: Common Questions About Airflow Configuration Options

Here are some frequently asked questions about configuring Airflow, with detailed answers to clear up confusion from online forums.

1. Where do I find the airflow.cfg file after installing Airflow?

After running airflow db init—part of Installing Airflow (Local, Docker, Cloud)—Airflow creates airflow.cfg in your Airflow home directory, usually ~/airflow (e.g., /home/username/airflow on Linux/Mac or C:\Users\YourUsername\airflow on Windows). Open your terminal, type cd ~/airflow (or cd %userprofile%\airflow on Windows), and press Enter, then type ls -a (Mac/Linux) or dir (Windows) to see it. If it’s not there, check your AIRFLOW_HOME environment variable with echo $AIRFLOW_HOME (Mac/Linux) or echo %AIRFLOW_HOME% (Windows)—it might point elsewhere, like /custom/path. Set it with export AIRFLOW_HOME=~/airflow or set AIRFLOW_HOME=%userprofile%\airflow if needed.

2. How do I change the default port for the Airflow Webserver?

To shift the Webserver from port 8080, open airflow.cfg with a text editor (e.g., nano ~/airflow/airflow.cfg), find the [webserver] section, and locate web_server_port = 8080. Change it to, say, web_server_port = 8081, save the file (Ctrl+O, Enter, Ctrl+X in nano), and restart the Webserver with airflow webserver -p 8081. Alternatively, set AIRFLOW__WEBSERVER__WEB_SERVER_PORT=8081 in your terminal before running airflow webserver -p 8081. Check localhost:8081 in your browser—it avoids conflicts if 8080’s busy.

3. What’s the difference between editing airflow.cfg and using environment variables?

Editing airflow.cfg makes permanent changes—open it, tweak executor = LocalExecutor under [core], save, and restart Airflow. It’s static until you edit again. Environment variables, like AIRFLOW__CORE__EXECUTOR=LocalExecutor, override airflow.cfg dynamically—set them with export (Mac/Linux) or set (Windows) in your terminal, and they apply only for that session unless added to .bashrc or via setx. Variables are faster for testing or deployments (e.g., Docker), while airflow.cfg suits long-term setups—see Airflow Environment Variables.

4. How do I configure Airflow to use PostgreSQL instead of SQLite?

SQLite is default (sql_alchemy_conn = sqlite:////home/username/airflow/airflow.db), but for PostgreSQL, install it first—on Mac/Linux, sudo apt install postgresql (Ubuntu) or brew install postgresql (Mac); on Windows, download from postgresql.org. Create a database: psql -U postgres, then CREATE DATABASE airflow;. In airflow.cfg, under [database], set sql_alchemy_conn = postgresql+psycopg2://postgres:password@localhost:5432/airflow (replace password with yours), install the driver with pip install "apache-airflow[postgres]" (the quotes keep the shell from expanding the brackets), and run airflow db init. Use AIRFLOW__DATABASE__SQL_ALCHEMY_CONN for environment variables—full steps in Airflow Metadata Database Setup.
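
Put together, the terminal session looks roughly like this on Linux (names and passwords are placeholders; adapt them to your environment):

sudo -u postgres psql -c "CREATE DATABASE airflow;"
sudo -u postgres psql -c "CREATE USER airflow_user WITH PASSWORD 'airflow_pass';"
sudo -u postgres psql -c "GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow_user;"
pip install "apache-airflow[postgres]"
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow
airflow db init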

5. Can I increase the number of tasks Airflow runs at the same time?

Yes—edit parallelism in [core] of airflow.cfg, default 32. Change to parallelism = 64, save, and restart Airflow. Or set AIRFLOW__CORE__PARALLELISM=64 before running airflow scheduler. This caps total tasks across all DAGs—adjust per-DAG limits with max_active_tasks_per_dag, also under [core] (default 16), like max_active_tasks_per_dag = 32. Tune it with Task Concurrency and Parallelism.

6. How do I secure sensitive data like database passwords in the configuration?

Don’t hardcode passwords in airflow.cfg—use environment variables. Set AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://user:secret@localhost:5432/airflow in your terminal (or .bashrc); the environment variable overrides whatever sql_alchemy_conn says in airflow.cfg, so the file never needs to hold the real password. Airflow also encrypts sensitive connection fields stored in the metadata database using a Fernet key—configure this with Airflow Connections: Setup and Security and Encrypting Sensitive Data in Airflow.
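
A small sketch of the pattern, keeping the secret out of airflow.cfg and supplying it only through the environment (the connection string is a placeholder):

# DB_PASSWORD would come from your secret store or deployment environment, not a file in your repo
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow_user:${DB_PASSWORD}@localhost:5432/airflow"
airflow db check   # confirms the override works without the password ever living in airflow.cfg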

7. What happens if I misconfigure an option in airflow.cfg?

A typo—like excutor = LocalExecutor instead of executor—means Airflow falls back to defaults (e.g., SequentialExecutor). If it’s critical—like a wrong sql_alchemy_conn—Airflow might crash on startup, showing errors like “database connection failed.” Check logs in ~/airflow/logs (via Task Logging and Monitoring), fix the mistake, and restart with airflow scheduler and airflow webserver -p 8080. Test small changes first.


Conclusion

Airflow’s configuration options—whether in airflow.cfg or environment variables—give you the power to shape its behavior, from task execution to UI access. Start with defaults after Installing Airflow (Local, Docker, Cloud), then tweak settings like executor in Airflow Executors (Sequential, Local, Celery) or dag_dir_list_interval in Introduction to Airflow Scheduling. Write DAGs with Defining DAGs in Python, monitor them in Monitoring Task Status in UI, and explore more with Airflow Concepts: DAGs, Tasks, and Workflows!