Airflow Logging Configuration: A Comprehensive Guide
Apache Airflow is a powerful platform for orchestrating workflows, and configuring its logging system effectively ensures that you can monitor, debug, and audit the execution of Directed Acyclic Graphs (DAGs) and tasks with precision and reliability. Whether you’re running tasks with PythonOperator, sending notifications via SlackOperator, or integrating Airflow with systems like Snowflake, a well-configured logging setup is essential for production-grade deployments. This comprehensive guide, hosted on SparkCodeHub, explores Airflow Logging Configuration—how to set it up, how to customize it, and best practices for optimal logging. We’ll provide detailed step-by-step instructions, practical examples with code, and an extensive FAQ section. For foundational knowledge, start with Airflow Web UI Overview and pair this with Defining DAGs in Python.
What is Airflow Logging Configuration?
Airflow Logging Configuration refers to the process of setting up and customizing the logging system within an Airflow deployment—rooted in the ~/airflow directory (DAG File Structure Best Practices)—to capture, store, and manage execution logs for DAGs, tasks, and system components like the Scheduler, Webserver, and Executor. Managed by Airflow’s core architecture (Airflow Architecture (Scheduler, Webserver, Executor)), logging configuration defines log levels, formats, storage locations, and remote logging options (e.g., to cloud services), with task states tracked in the metadata database (airflow.db) and logs accessible via the Web UI (Monitoring Task Status in UI) or centralized storage (Task Logging and Monitoring). This configuration enhances observability, making it a critical practice for production-grade Airflow deployments managing complex, high-stakes workflows.
Core Components in Detail
Airflow Logging Configuration relies on several core components, each with specific roles and configurable parameters. Below, we explore these components in depth, including their functionality, parameters, and practical code examples.
1. Log Levels and Formatting: Controlling Log Detail
Configuring log levels and formatting determines the verbosity and structure of log messages, allowing you to balance detail with performance and readability.
- Key Functionality: Sets verbosity (e.g., INFO vs. DEBUG) and message format (e.g., timestamped entries) so logs carry actionable detail for debugging without excessive noise.
- Parameters (in airflow.cfg under [logging]):
- logging_level (str): Log level (e.g., "INFO")—defines detail.
- log_format (str): Log format (e.g., "[%(asctime)s] %(levelname)s - %(message)s")—customizes output.
- Code Example (Log Levels and Format):
# airflow.cfg
[logging]
logging_level = INFO
log_format = [%(asctime)s] %(levelname)s - %(name)s - %(message)s
- DAG Example (Using Logging):
# dags/log_level_dag.py
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime
import logging

def log_task():
    logging.debug("Debug message")
    logging.info("Info message")
    logging.warning("Warning message")
    print("Task executed")

with DAG(
    dag_id="log_level_dag",
    start_date=datetime(2025, 4, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    task = PythonOperator(
        task_id="log_task",
        python_callable=log_task,
    )
This configures INFO logging, capturing info and warning messages from log_task.
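With logging_level = INFO and the format above, the rendered task log should contain lines roughly like the following (timestamps and logger names will differ in your environment, and the DEBUG call is filtered out):
[2025-04-07 10:15:02,123] INFO - root - Info message
[2025-04-07 10:15:02,124] WARNING - root - Warning message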
2. Local Log Storage: Centralizing Log Files
Local log storage defines where Airflow writes logs on the filesystem, centralizing them for easy access and review, typically in the ~/airflow/logs directory.
- Key Functionality: Writes task execution logs under base_log_folder (e.g., logs/) so they can be read directly from the filesystem when debugging.
- Parameters (in airflow.cfg under [logging]):
- base_log_folder (str): Log directory (e.g., "/home/user/airflow/logs")—storage path.
- filename_template (str): Log file format (e.g., "{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}.log")—file naming.
- Code Example (Local Log Storage):
# airflow.cfg
[logging]
base_log_folder = /home/user/airflow/logs
filename_template = {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
- DAG Example (Logging to Local Storage):
# dags/local_log_dag.py
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime
import logging

def local_log_task():
    logging.info("Task logged locally")
    print("Task executed")

with DAG(
    dag_id="local_log_dag",
    start_date=datetime(2025, 4, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    task = PythonOperator(
        task_id="local_log_task",
        python_callable=local_log_task,
    )
This logs local_log_task to ~/airflow/logs/local_log_dag/local_log_task/....
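If you want to confirm exactly where a run’s logs landed, a small script like the sketch below walks the directory layout produced by the filename_template above; the path is the example base_log_folder, so adjust it to your setup.
# locate_logs.py (illustrative sketch; paths assume the configuration above)
from pathlib import Path

base_log_folder = Path("/home/user/airflow/logs")

# With the filename_template above, task logs land under
# <base_log_folder>/<dag_id>/<task_id>/<logical date>/<try_number>.log
for log_file in sorted(base_log_folder.glob("local_log_dag/local_log_task/*/*.log")):
    print(log_file)
    print(log_file.read_text()[:200])  # preview the first 200 characters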
3. Remote Logging: Sending Logs to External Systems
Remote logging configures Airflow to send logs to external systems (e.g., AWS S3, Elasticsearch) for centralized storage, analysis, and long-term retention, enhancing scalability and accessibility.
- Key Functionality: Ships logs to remote stores such as S3 via remote handlers (e.g., S3TaskHandler), centralizing them for durable storage and analysis.
- Parameters (in airflow.cfg under [logging]):
- remote_logging (bool): Enables remote logging (e.g., True), activating the remote handler.
- remote_log_conn_id (str): Connection ID (e.g., "s3_log_conn")—remote target.
- remote_base_log_folder (str): Remote path (e.g., "s3://my-bucket/logs")—storage location.
- Code Example (Remote Logging to S3):
# airflow.cfg
[logging]
remote_logging = True
remote_log_conn_id = s3_log_conn
remote_base_log_folder = s3://my-bucket/logs
- Connection Setup (CLI):
airflow connections add "s3_log_conn" \
    --conn-type "s3" \
    --conn-extra '{"aws_access_key_id": "your_access_key", "aws_secret_access_key": "your_secret_key"}'
- DAG Example (Remote Logging):
# dags/remote_log_dag.py
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime
import logging

def remote_log_task():
    logging.info("Task logged remotely to S3")
    print("Task executed")

with DAG(
    dag_id="remote_log_dag",
    start_date=datetime(2025, 4, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    task = PythonOperator(
        task_id="remote_log_task",
        python_callable=remote_log_task,
    )
This configures remote_log_dag to log to S3.
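To spot-check that logs are actually reaching the bucket, a short boto3 script along these lines can list the uploaded objects; the bucket name and prefix are the placeholders from the example configuration, and boto3 is installed with the Amazon provider.
# check_s3_logs.py (illustrative sketch; bucket and prefix are placeholders)
import boto3

s3 = boto3.client("s3")  # uses the AWS credentials configured on this machine
response = s3.list_objects_v2(Bucket="my-bucket", Prefix="logs/remote_log_dag/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])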
4. Custom Logging Handlers: Extending Log Capabilities
Custom logging handlers extend Airflow’s logging by integrating with external systems or adding custom formatting, offering flexibility beyond built-in options.
- Key Functionality: Adds handlers (e.g., a custom rotating file handler) that enrich records with extra fields or route output to dedicated files.
- Parameters (Python Logging):
- logging.handlers: Custom handler (e.g., FileHandler)—defines output.
- Code Example (Custom Handler):
# plugins/custom_logging.py
import logging
import logging.handlers
from airflow.plugins_manager import AirflowPlugin
from airflow.utils.log.logging_mixin import LoggingMixin

class CustomFileHandler(logging.handlers.RotatingFileHandler):
    def __init__(self, filename):
        # Rotate at ~1 MB, keeping 5 backup files
        super().__init__(filename, maxBytes=1048576, backupCount=5)

    def emit(self, record):
        # Attach an extra attribute so the formatter can reference %(custom_field)s
        record.custom_field = "AirflowCustom"
        super().emit(record)

class CustomLoggingPlugin(AirflowPlugin):
    name = "custom_logging_plugin"

    def on_load(self, *args, **kwargs):
        # Attach the custom handler when the plugin is loaded
        handler = CustomFileHandler("/home/user/airflow/logs/custom.log")
        handler.setLevel(logging.INFO)
        formatter = logging.Formatter("[%(asctime)s] %(levelname)s - %(custom_field)s - %(message)s")
        handler.setFormatter(formatter)
        LoggingMixin().log.addHandler(handler)
# dags/custom_log_dag.py
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime
import logging

def custom_log_task():
    logging.info("Task with custom logging")
    print("Task executed")

with DAG(
    dag_id="custom_log_dag",
    start_date=datetime(2025, 4, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    task = PythonOperator(
        task_id="custom_log_task",
        python_callable=custom_log_task,
    )
- Directory Structure:
airflow/
├── dags/
│ └── custom_log_dag.py
├── plugins/
│ └── custom_logging.py
This adds a custom rotating file handler for custom_log_dag.
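If you would rather not attach handlers from a plugin, Airflow also supports pointing [logging] logging_config_class at a custom logging dictionary. The sketch below assumes a config/ directory on the Python path and simply copies Airflow’s default logging config before adding a rotating file handler to the task logger; treat it as a starting point rather than a drop-in file.
# config/log_config.py (sketch; set logging_config_class = log_config.LOGGING_CONFIG in airflow.cfg)
from copy import deepcopy

from airflow.config_templates.airflow_local_settings import DEFAULT_LOGGING_CONFIG

LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)

# Add a rotating file handler alongside Airflow's defaults
LOGGING_CONFIG["handlers"]["custom_file"] = {
    "class": "logging.handlers.RotatingFileHandler",
    "formatter": "airflow",
    "filename": "/home/user/airflow/logs/custom.log",
    "maxBytes": 1048576,
    "backupCount": 5,
}

# Route task logs through the new handler as well as the default task handler
LOGGING_CONFIG["loggers"]["airflow.task"]["handlers"].append("custom_file")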
Key Parameters for Airflow Logging Configuration
Key parameters in logging configuration:
- logging_level: Log detail (e.g., "INFO")—sets verbosity.
- base_log_folder: Local log path (e.g., "/home/user/airflow/logs")—storage dir.
- remote_logging: Remote toggle (e.g., True)—enables external logging.
- remote_log_conn_id: Connection ID (e.g., "s3_log_conn")—remote target.
- log_format: Format string (e.g., "[%(asctime)s] %(levelname)s - %(message)s")—log structure.
These parameters manage logging.
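Because airflow.cfg values can be overridden by environment variables, it is worth confirming what Airflow actually loaded; the quick diagnostic below reads the effective [logging] settings from a Python shell in the same environment.
# Quick check of the effective [logging] settings
from airflow.configuration import conf

print(conf.get("logging", "logging_level"))
print(conf.get("logging", "base_log_folder"))
print(conf.getboolean("logging", "remote_logging"))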
Setting Up Airflow Logging Configuration: Step-by-Step Guide
Let’s configure Airflow with a comprehensive logging setup, testing with a sample DAG.
Step 1: Set Up Your Airflow Environment
- Install Docker: Install Docker Desktop (e.g., on macOS: brew install --cask docker). Start Docker and verify: docker --version.
- Install Airflow with S3 Support: Open your terminal, navigate to your home directory (cd ~), and create a virtual environment (python -m venv airflow_env). Activate it—source airflow_env/bin/activate on Mac/Linux or airflow_env\Scripts\activate on Windows—then install Airflow with the Postgres and Amazon (S3) extras (pip install "apache-airflow[postgres,amazon]>=2.0.0").
- Set Up PostgreSQL: Start PostgreSQL:
docker run -d -p 5432:5432 -e POSTGRES_USER=airflow -e POSTGRES_PASSWORD=airflow -e POSTGRES_DB=airflow --name postgres postgres:13
- Create Structure: Run:
mkdir -p ~/airflow/{dags,plugins,logs}
- Configure Airflow: Edit ~/airflow/airflow.cfg:
[core]
executor = LocalExecutor
dags_folder = /home/user/airflow/dags
plugins_folder = /home/user/airflow/plugins
[database]
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
[webserver]
web_server_host = 0.0.0.0
web_server_port = 8080
[logging]
logging_level = INFO
base_log_folder = /home/user/airflow/logs
log_format = [%(asctime)s] %(levelname)s - %(name)s - %(message)s
filename_template = {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
remote_logging = True
remote_log_conn_id = s3_log_conn
remote_base_log_folder = s3://my-bucket/logs
Replace /home/user with your actual home directory and configure an S3 bucket (my-bucket).
- Add S3 Connection: Run:
airflow connections add "s3_log_conn" \
    --conn-type "s3" \
    --conn-extra '{"aws_access_key_id": "your_access_key", "aws_secret_access_key": "your_secret_key"}'
- Initialize the Database: Run airflow db init.
- Start Airflow Services: In separate terminals:
- airflow webserver -p 8080
- airflow scheduler
Step 2: Add Custom Logging Plugin
- Create Plugin: Add ~/airflow/plugins/custom_logging.py:
import logging
import logging.handlers
from airflow.plugins_manager import AirflowPlugin
from airflow.utils.log.logging_mixin import LoggingMixin

class CustomFileHandler(logging.handlers.RotatingFileHandler):
    def __init__(self, filename):
        # Rotate at ~1 MB, keeping 5 backup files
        super().__init__(filename, maxBytes=1048576, backupCount=5)

    def emit(self, record):
        # Attach an extra attribute so the formatter can reference %(custom_field)s
        record.custom_field = "AirflowCustom"
        super().emit(record)

class CustomLoggingPlugin(AirflowPlugin):
    name = "custom_logging_plugin"

    def on_load(self, *args, **kwargs):
        # Attach the custom handler when the plugin is loaded
        handler = CustomFileHandler("/home/user/airflow/logs/custom.log")
        handler.setLevel(logging.INFO)
        formatter = logging.Formatter("[%(asctime)s] %(levelname)s - %(custom_field)s - %(message)s")
        handler.setFormatter(formatter)
        LoggingMixin().log.addHandler(handler)
Step 3: Create a DAG with Logging
- Open a Text Editor: Use Visual Studio Code or any plain-text editor—ensure .py output.
- Write the DAG Script: Create ~/airflow/dags/log_test_dag.py:
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime
import logging

def log_test_task():
    logging.debug("Debug message (not logged)")
    logging.info("Info message")
    logging.warning("Warning message")
    print("Task executed")

with DAG(
    dag_id="log_test_dag",
    start_date=datetime(2025, 4, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    task = PythonOperator(
        task_id="log_test_task",
        python_callable=log_test_task,
    )
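Before triggering anything in the UI, it can help to confirm the file parses cleanly. The sketch below uses DagBag to surface import errors; run it inside the activated virtual environment so it picks up the dags_folder from airflow.cfg.
# validate_dags.py (illustrative sketch)
from airflow.models import DagBag

bag = DagBag()  # reads dags_folder from the current Airflow configuration
print("Import errors:", bag.import_errors)
assert "log_test_dag" in bag.dags, "log_test_dag failed to load"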
Step 4: Test and Monitor Logging Configuration
- Access Web UI: Go to localhost:8080—verify log_test_dag appears.
- Trigger the DAG: In Graph View, toggle “log_test_dag” to “On,” click “Trigger DAG” for April 7, 2025. Monitor:
- log_test_task executes, logging to local and S3.
- Check Local Logs: In ~/airflow/logs/log_test_dag/log_test_task/...:
- See “Info message” and “Warning message” (not DEBUG due to INFO level).
- In ~/airflow/logs/custom.log: See custom-formatted logs (e.g., [2025-04-07...] INFO - AirflowCustom - Info message).
- Check S3 Logs: Access s3://my-bucket/logs/log_test_dag/log_test_task/... and verify the logs match the local output.
- Verify Logs in UI: In Graph View, click log_test_task > “Log” to see the logged messages.
- Optimize Logging:
- Change logging_level to DEBUG, restart Scheduler—verify debug logs appear.
- Add a second task, re-trigger—check log separation.
- Retry DAG: If logging fails (e.g., S3 access error), fix s3_log_conn, click “Clear,” and retry.
This tests a comprehensive logging setup with local, remote, and custom logging.
Key Features of Airflow Logging Configuration
Airflow Logging Configuration offers powerful features, detailed below.
Flexible Log Levels
Setting logging_level (e.g., INFO) controls how much detail is captured, balancing insight against log volume and performance.
Example: Level Flex
log_level_dag—logs INFO and above.
Centralized Local Logs
base_log_folder (e.g., logs/) keeps task log files in one predictable location for easy access.
Example: Local Central
local_log_dag—logs to logs/.
Scalable Remote Logging
remote_logging sends logs to external stores such as S3, scaling storage and enabling centralized analysis.
Example: Remote Scale
remote_log_dag—logs to S3 bucket.
Customizable Log Handling
Custom handlers such as CustomFileHandler extend logging with extra fields or alternate destinations.
Example: Custom Handle
custom_log_dag—logs with custom_field.
Robust Monitoring Support
The [logging] configuration underpins the logs shown in the Web UI, supporting monitoring and observability.
Example: Monitor Support
log_test_dag—logs visible in UI.
Best Practices for Airflow Logging Configuration
Optimize logging with these detailed guidelines:
- Set Appropriate Levels: Use INFO in production and raise verbosity only when troubleshooting (Airflow Configuration Basics).
- Test Logging: Run DAGs such as log_test_dag and verify the output they produce (DAG Testing with Python).
- Centralize Locally: Set base_log_folder (e.g., logs/) so log files stay organized in one place (Airflow Performance Tuning).
- Use Remote Logging: Enable remote_logging (e.g., to S3) so log storage scales with your deployment (Airflow Pools: Resource Management).
- Monitor Logs: Review task logs in the UI and on disk, and adjust the configuration as needed (Airflow Graph View Explained).
- Extend with Handlers: Add custom handlers such as CustomFileHandler when you need enriched or redirected output (Task Logging and Monitoring).
- Document Logging: List your logging settings, for example in a README, so the team knows where logs live (DAG File Structure Best Practices).
- Handle Time Zones: Align timestamps in log_format with your scheduling time zone for accurate correlation (Time Zones in Airflow Scheduling).
These practices ensure effective logging.
FAQ: Common Questions About Airflow Logging Configuration
Here’s an expanded set of answers to frequent questions from Airflow users.
1. Why aren’t my logs appearing?
Usually base_log_folder points to a missing or unwritable path; set it to a valid directory (e.g., logs/) and check that the Airflow user can write to it.
2. How do I debug logging issues?
Check the Scheduler and Webserver logs for handler errors (e.g., “Handler error” messages) and verify the [logging] settings and any remote connections.
3. Why use remote logging?
Scalability and durability: remote stores such as S3 centralize logs from all workers and keep them available after instances are replaced; test the remote path before relying on it.
4. How do I add custom log fields?
Use a custom handler (e.g., CustomFileHandler above) or pass extra fields through the standard logging extra argument; see the sketch below.
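As a lighter-weight alternative to a custom handler, the standard library’s extra argument can inject fields, provided the active format string references them; this is a plain-Python sketch, and inside Airflow tasks the handler’s format (as in the custom handler above) must include the field for it to appear.
import logging

# The format string references custom_field, so every record must supply it
logging.basicConfig(format="[%(asctime)s] %(levelname)s - %(custom_field)s - %(message)s")
log = logging.getLogger(__name__)

log.warning("Row count mismatch", extra={"custom_field": "AirflowCustom"})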
5. Can logging scale across instances?
Yes, provided all instances write to shared or remote storage (e.g., S3), so logs stay accessible no matter which worker ran the task.
6. Why are my debug logs missing?
logging_level is set above DEBUG; change it to DEBUG, restart the Scheduler, and re-check the task log in the UI.
7. How do I monitor log performance?
Track log volume and write latency (e.g., a log write-time metric in Prometheus) alongside spot checks of the logs themselves.
8. Can logging trigger a DAG?
Yes, indirectly: a sensor can poll the logs for an event (e.g., a log_event_detected() check) and gate downstream tasks; see the sketch below.
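As a sketch of that pattern (the log path and marker string are hypothetical), a PythonSensor can poll a log file for an event and gate downstream tasks:
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.sensors.python import PythonSensor
from datetime import datetime
from pathlib import Path

def log_event_detected():
    # Hypothetical check: look for a marker string in a log file
    log_file = Path("/home/user/airflow/logs/custom.log")
    return log_file.exists() and "AirflowCustom" in log_file.read_text()

with DAG(
    dag_id="log_triggered_dag",
    start_date=datetime(2025, 4, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    wait_for_event = PythonSensor(
        task_id="wait_for_log_event",
        python_callable=log_event_detected,
        poke_interval=60,
    )
    react = PythonOperator(
        task_id="react_to_event",
        python_callable=lambda: print("Event found in logs"),
    )
    wait_for_event >> react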
Conclusion
Airflow Logging Configuration enhances workflow observability—set it up with Installing Airflow (Local, Docker, Cloud), craft DAGs via Defining DAGs in Python, and monitor with Airflow Graph View Explained. Explore more with Airflow Concepts: DAGs, Tasks, and Workflows and Security Best Practices in Airflow!