## Overview
An Apache Airflow DAG that automates daily data ingestion from Amazon S3 into Snowflake, with real-time Slack notifications for pipeline monitoring. Built with production-grade patterns including externalized configuration, failure callbacks, and modern Airflow 2.x providers.
## Architecture
The pipeline follows a three-stage pattern:
1. **Sense** — An S3KeySensor polls the target bucket every 30 minutes (up to 12 hours) waiting for the expected data file to land.
2. **Load** — Executes Snowflake's COPY INTO command to bulk-load the CSV directly from S3 into the target table.
3. **Notify** — Posts a structured success message to Slack via webhook. Any task failure triggers an immediate alert with DAG and task context.
## Tech Stack
- **Orchestration**: Apache Airflow 2.x
- **Data Warehouse**: Snowflake
- **Cloud Storage**: Amazon S3
- **Alerting**: Slack (webhook + API)
- **Language**: Python
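The three-stage pipeline can be sketched as a DAG file like the one below. This is a minimal illustration, not the project's actual code: the DAG id, connection ids, bucket, stage, and table names are assumptions (the real project externalizes these as configuration), and the import paths assume recent Airflow 2.x provider packages.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator
from airflow.providers.slack.operators.slack_webhook import SlackWebhookOperator

with DAG(
    dag_id="s3_to_snowflake_daily",      # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # 1. Sense: poke every 30 minutes, give up after 12 hours.
    wait_for_file = S3KeySensor(
        task_id="wait_for_file",
        bucket_name="my-data-bucket",            # assumed bucket
        bucket_key="daily/{{ ds }}/data.csv",    # assumed key pattern
        aws_conn_id="aws_default",
        poke_interval=30 * 60,
        timeout=12 * 60 * 60,
    )

    # 2. Load: COPY INTO bulk-loads the CSV straight from S3.
    load_to_snowflake = SnowflakeOperator(
        task_id="load_to_snowflake",
        snowflake_conn_id="snowflake_default",
        sql="""
            COPY INTO analytics.daily_events     -- assumed target table
            FROM @my_s3_stage/daily/{{ ds }}/data.csv
            FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
        """,
    )

    # 3. Notify: post a success message to the Slack webhook.
    notify_success = SlackWebhookOperator(
        task_id="notify_success",
        slack_webhook_conn_id="slack_default",
        message=":white_check_mark: {{ ds }} load into Snowflake complete.",
    )

    wait_for_file >> load_to_snowflake >> notify_success
```

The `>>` chaining enforces the sense-load-notify ordering; the Jinja `{{ ds }}` template keys each run to its logical date, so backfills and retries target the correct day's file.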