Data Flows — MITx Online

Generated 2026-06-24 16:33 UTC · c4gen dev

Each scenario below replays one interaction as a C4 Dynamic diagram. Amber steps are asynchronous (queued / scheduled / event-driven).

How to read these diagrams

These are C4 model diagrams (C4-PlantUML). Read them top-down: System Context (the whole SOA) → Container (one system's runtime units) → Dynamic (a single data flow, step by step).

  • People are rounded boxes; systems and containers are rectangles; databases and queues have distinct shapes.
  • Each arrow is a data flow labelled with what moves.
  • Solid arrows are synchronous (request/response, caller blocks).
  • Amber dashed arrows are asynchronous (queued, scheduled, or event-driven — caller does not block).
  • Drag to pan, scroll to zoom. Boxes with a link drill into the next level.

Learner enrollment & paid checkout (synchronous)

A learner browses the catalog and enrolls. APISIX authenticates the request via Keycloak and proxies it to Django, which records the order and starts a CyberSource payment. When the signed callback arrives, Django fulfills the order and pushes the enrollment to Open edX.
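The signed-callback step is what protects fulfillment from forged requests, so a minimal sketch of such a check may help. It assumes a signing scheme in the style of CyberSource Secure Acceptance (a base64 HMAC-SHA256 over the fields listed in signed_field_names); the function names and the sample field names are hypothetical and not taken from the MITx Online code.

```python
import base64
import hashlib
import hmac


def sign_fields(params: dict[str, str], secret_key: str) -> str:
    """Return the base64 HMAC-SHA256 signature over the signed fields."""
    # The payload itself names which fields are covered, and in what order.
    names = params["signed_field_names"].split(",")
    message = ",".join(f"{name}={params[name]}" for name in names)
    digest = hmac.new(secret_key.encode(), message.encode(), hashlib.sha256).digest()
    return base64.b64encode(digest).decode()


def is_valid_callback(params: dict[str, str], secret_key: str) -> bool:
    """Check the callback signature before any order is fulfilled."""
    received = params.get("signature", "")
    try:
        expected = sign_fields(params, secret_key)
    except KeyError:
        # A missing signed field (or no field list at all) is treated as invalid.
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())
```

In this sketch, fulfillment and the push to Open edX would run only when is_valid_callback returns True; anything else is rejected without touching the order.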

Open edX enrollment & grade sync (mixed)

Open edX posts enrollment/certificate webhooks back to Django; a periodic Celery beat task retries failed enrollments and repairs faulty edX users, and the certificate jobs read grades from edX to generate certificates in Postgres.
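The periodic retry described above can be sketched as a plain function that a scheduled task would call. The Enrollment class, the edx_enrolled flag, and the push_to_edx callable are hypothetical stand-ins for illustration, not the project's actual models or API client.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Enrollment:
    user: str
    course: str
    edx_enrolled: bool = False  # True once Open edX has accepted the enrollment


def retry_failed_enrollments(
    enrollments: Iterable[Enrollment],
    push_to_edx: Callable[[Enrollment], None],
) -> tuple[int, int]:
    """Re-push every enrollment not yet mirrored in Open edX.

    Returns a (repaired, still_failing) pair of counts.
    """
    repaired = still_failing = 0
    for enrollment in enrollments:
        if enrollment.edx_enrolled:
            continue  # already in sync, nothing to retry
        try:
            push_to_edx(enrollment)
        except Exception:
            # Leave the flag unset so the next scheduled run tries again.
            still_failing += 1
        else:
            enrollment.edx_enrolled = True
            repaired += 1
    return repaired, still_failing
```

Because a failed push leaves the record unchanged, running the function repeatedly is safe: each run only touches enrollments that are still out of sync.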

CRM, identity & sheets sync (asynchronous)

Celery beat fans out to external services: HubSpot contact/product/deal sync, Keycloak B2B organization reconciliation, and Google Sheets refund/deferral and B2B enrollment-code processing. SCIM provisioning arrives inbound from Keycloak.
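As a rough illustration of the fan-out, a beat schedule is a mapping from entry names to a task path and a cadence. Only the dictionary layout below follows Celery's beat_schedule setting (which accepts a timedelta as the schedule value); every task path and every cadence here is a hypothetical placeholder, since the actual values are not stated in this document.

```python
from datetime import timedelta

# Hypothetical entries: task paths and intervals are placeholders for illustration.
CELERY_BEAT_SCHEDULE = {
    "sync-hubspot": {
        "task": "hubspot_sync.tasks.sync_contacts_products_deals",
        "schedule": timedelta(hours=1),
    },
    "reconcile-keycloak-orgs": {
        "task": "b2b.tasks.reconcile_keycloak_organizations",
        "schedule": timedelta(hours=6),
    },
    "process-refund-deferral-sheets": {
        "task": "sheets.tasks.process_refund_deferral_requests",
        "schedule": timedelta(hours=1),
    },
    "process-b2b-enrollment-code-sheets": {
        "task": "sheets.tasks.process_enrollment_code_requests",
        "schedule": timedelta(hours=1),
    },
}
```

Each entry is independent, so one slow or failing external service does not hold up the others; the caller (beat) only enqueues work and never blocks on it.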

MIT Learn catalog ingestion (cross-service, asynchronous)

MIT Learn's edx_content ETL worker pulls the live MITx Online course and program catalog from the v2 API every 6 hours and fetches S3 content archives, indexing them as learning resources. MITx Online is the catalog's source of truth.
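The catalog pull can be sketched as a loop over paginated API responses. This assumes the v2 API returns pages in the envelope used by Django REST Framework's page-number pagination (a results list plus a next URL); that shape, the example URL, and the fetch_json callable are assumptions made for illustration.

```python
from typing import Callable, Iterator, Optional


def iter_catalog(
    first_url: str,
    fetch_json: Callable[[str], dict],
) -> Iterator[dict]:
    """Yield every catalog item, following 'next' links until they run out."""
    url: Optional[str] = first_url
    while url:
        page = fetch_json(url)  # one decoded JSON page
        yield from page.get("results", [])
        url = page.get("next")  # None (or missing) on the last page
```

Passing the fetcher in as a parameter keeps the loop independent of any HTTP library, so the ETL worker could supply an authenticated client while a test supplies canned pages.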

Ingestion sources (ETL)

Every external source the edx_content / default Celery workers pull from, with transport and cadence. ⚠️ marks brittle linkages (HTML/token scrapes, hardcoded URLs).

| Source                               | Transport         | Cadence               | Data                     | Source of truth             |
|--------------------------------------|-------------------|-----------------------|--------------------------|-----------------------------|
| Google Sheets (refunds/deferrals)    | Google Sheets API | every ~hour           | refund/deferral requests | sheets/tasks.py             |
| Open Exchange Rates                  | REST              | daily 03:00           | currency exchange rates  | flexiblepricing/tasks.py:57 |
| Open edX (certificate webhook)       | HTTPS webhook     | on event              | certificate events       | openedx/views.py:128        |
| Open edX (enrollment webhook)        | HTTPS webhook     | on event              | enrollment events        | openedx/views.py:34         |
| Open edX (grades)                    | REST OAuth2       | on enroll / scheduled | course current grades    | openedx/api.py:1096         |
| Unified Ecommerce (product metadata) | REST              | on request            | product meta             | flexiblepricing/api.py:458  |