System Context — OL Data Platform

Generated 2026-06-24 13:33 UTC · c4gen dev

The widest view: OL Data Platform and every external actor and system it exchanges data with. Edges shown are curated and code-verified; raw graph-derived candidates are listed under Dependencies & Cycles.

External systems & peers

| System | Role |
| --- | --- |
| HashiCorp Vault | Holds source-DB and SaaS credentials; every Dagster code location authenticates against it at startup. |
| MIT Learn | Discovery platform. Its Postgres is ingested into raw; the data platform POSTs HMAC-signed content/OVS webhooks back into MIT Learn's Django API to trigger ingest. |
| MITx Online | Course/enrollment platform. Its app Postgres and its Open edX MySQL/forum are ingested into raw. Heavy edX-sync/certificate ETL still runs in MITx Online's own Celery workers. |
| MITx Pro (xPRO) | App Postgres and Open edX MySQL/forum ingested into raw. |
| MicroMasters | App Postgres ingested into raw (courses, programs, certificates). |
| OCW Studio | App Postgres ingested into raw; OCW site JSON also flows via S3. |
| ODL Video Service | Video/transcript metadata ingested via API; OVS webhooks pushed back. |
| Bootcamps | Bootcamps app Postgres ingested into raw. |
| edX.org Archives & BigQuery | edX.org course tarballs and tracking logs from GCS/S3, plus Emeritus/IRX BigQuery exports, landed via dlt and Airbyte. |
| SaaS Sources (Salesforce / Mailgun / feeds) | Salesforce and Mailgun via Airbyte; MIT Climate, MIT Professional Ed, Open Learning Library, and podcast RSS via dlt. |
| Hightouch | Third-party reverse-ETL SaaS. Reads curated models from the warehouse (Starburst) and syncs rows into operational systems, notably writing ProgramCertificate records into MIT Learn's Postgres. Operated outside this repo (no Hightouch code lives here); it connects to the warehouse as an external consumer. |
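
The MIT Learn entry mentions HMAC-signed webhooks. As a rough illustration of what that signing could look like, here is a minimal sketch using only the Python standard library. The hash algorithm (SHA-256), the `X-Signature` header name, the JSON canonicalization, and the event fields are all assumptions for illustration; the scheme the platform and MIT Learn actually share is not specified on this page.

```python
import hashlib
import hmac
import json


def sign_payload(secret: bytes, body: bytes) -> str:
    """Return the hex HMAC digest of the raw request body (SHA-256 is assumed)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Recompute the digest on the receiving side and compare in constant time."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, signature)


def build_webhook_request(secret: bytes, event: dict) -> tuple[bytes, dict[str, str]]:
    """Serialize an event and attach its signature in a header.

    The header name "X-Signature" is hypothetical, not taken from the source.
    """
    # Serialize once and sign exactly the bytes that will be sent.
    body = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "X-Signature": sign_payload(secret, body),
    }
    return body, headers


if __name__ == "__main__":
    # Hypothetical secret and event payload, for demonstration only.
    secret = b"example-shared-secret"
    body, headers = build_webhook_request(secret, {"type": "content", "id": 123})
    print(verify_signature(secret, body, headers["X-Signature"]))
```

The signature is computed over the exact bytes sent, so a receiver should verify against the raw request body rather than a re-serialized copy of the parsed JSON. In practice the shared secret would presumably come from Vault rather than a literal in code.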