Skip to content

DataExpert-io-Community/skyII

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

skylogix Transportation

Weather Data Ingestion & Analytics Pipeline

Overview

This project implements an end-to-end data pipeline for ingesting, staging, replicating, and modeling weather API data for analytics and dashboarding use cases.

The pipeline uses MongoDB as a staging database, a Python ingestion service for polling external weather APIs, Airbyte (Docker) for data replication, and DuckDB as the analytical warehouse.

architecture diagram

Python Weather Ingestion Script

The ingestion service:

  • Periodically polls one or more weather APIs
  • Normalizes API responses
  • Performs upserts into MongoDB to avoid duplicates
  • Tracks record freshness using an updatedAt timestamp

Key Behaviors

  • Idempotent
  • Handles updates to existing weather observations
  • Designed to be schedulable (cron / Airflow-ready)

MongoDB Staging Layer

MongoDB acts as a raw but structured staging layer, optimized for:

  • Flexible schema evolution from APIs
  • Fast upserts
  • Reliable CDC-style replication via Airbyte

Document Structure:

  • document structure

Future Improvements

  • Orchestrate ingestion and syncs with Airflow
  • Add data quality checks (freshness, nulls, anomalies)
  • Partition analytical tables by date
  • Extend to multiple weather API providers
  • Publish dashboards using Metabase or Superset

Author

Taofeecoh Adesanu Data Engineer

About

No description, website, or topics provided.

Resources

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 95.5%
  • Shell 4.5%