OL Data Platform — Architecture & Data Flows
Generated 2026-06-24 13:33 UTC · c4gen dev
The OL Data Platform is the ingest and ETL backbone of the MIT Open Learning SOA. Dagster (webserver, daemon, and per-domain code locations) orchestrates Airbyte and dlt ingestion of every SOA app's Postgres/MySQL/forum database and tracking logs into a raw Iceberg lake on S3. From there, dbt transforms raw → staging → intermediate → marts/dimensional in Trino/Starburst (catalog ol_data_lake_production). Superset reads the marts, and OpenMetadata catalogs the warehouse and its lineage. The platform also pushes data back to the apps; today it does so via HMAC-signed content webhooks to MIT Learn. The strategic target (tagged target below) is to relocate heavy ETL that currently runs inside each app's Celery workers (edX sync, catalog ingest, certificate generation, CRM sync) onto this platform.
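As a rough illustration of what an HMAC-signed webhook involves, the sketch below signs a JSON body with a shared secret and verifies it on the receiving side. It is not taken from the platform's code: the SHA-256 digest, the secret value, the payload fields, and the function names are all assumptions chosen for the example.

```python
import hashlib
import hmac
import json


def sign_payload(secret: bytes, body: bytes) -> str:
    """Return a hex HMAC-SHA256 digest of the raw request body.

    The digest algorithm is an assumption; the real webhook may use another.
    """
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Recompute the digest and compare it in constant time."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, signature)


# Hypothetical secret and payload, for illustration only.
secret = b"example-shared-secret"
body = json.dumps(
    {"event": "content.updated", "id": 123}, sort_keys=True
).encode("utf-8")

# Sender side: compute the signature to send alongside the body.
signature = sign_payload(secret, body)

# Receiver side: accept the request only if the signature matches.
is_valid = verify_signature(secret, body, signature)
```

The receiver should verify against the raw bytes it received rather than a re-serialized copy, since any change in key order or whitespace produces a different digest.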
This is a C4 view of OL Data Platform within the MIT Open Learning SOA, focused on how data is created and propagated — synchronous request paths and asynchronous (queued, scheduled, event-driven) flows alike. Use it for onboarding and as a holistic reference when realigning flows or hunting harmful cycles and fragile linkages.
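To make "hunting cycles" concrete, here is a minimal sketch of cycle detection over a directed graph of data flows, using a depth-first search. The edge list is hypothetical and only mirrors the shape described above (an app's database is ingested by the platform, which pushes data back to that app); it is not the extracted witan-code graph, and the node names are assumptions.

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of nodes (first node repeated last), or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # Back edge: nxt is already on the current path.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical data-flow edges: source -> list of destinations.
flows = {
    "mit_learn": ["data_platform"],              # database ingested by the platform
    "data_platform": ["superset", "mit_learn"],  # marts read; webhook push back
    "superset": [],
}

cycle = find_cycle(flows)
```

Whether such a cycle is harmful depends on what moves along each edge; a detector like this only surfaces candidates for review.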
How to read these diagrams
These are C4 model diagrams (C4-PlantUML). Read them top-down: System Context (the whole SOA) → Container (one system's runtime units) → Dynamic (a single data flow, step by step).
- People are rounded boxes; systems and containers are rectangles; databases and queues have distinct shapes.
- Each arrow is a data flow labelled with what moves.
- Solid arrows are synchronous (request/response, caller blocks).
- Amber dashed arrows are asynchronous (queued, scheduled, or event-driven — caller does not block).
- Drag to pan, scroll to zoom. Boxes with a link drill into the next level.
Contents
- System Context — OL Data Platform and the systems it exchanges data with.
- Containers — the runtime units inside OL Data Platform.
- Data Flows — key interactions, step by step (sync & async).
- Dependencies & Cycles — graph-derived coupling, cycles, fragile links.
Keeping this current
These pages are generated from a structured model by architecture_maps/c4gen. The cross-service edges are extracted deterministically from the witan-code graph; node prose and scenarios are curated. See the generator README to regenerate after the system changes.