Data Flows — MITx Online
Generated 2026-06-24 16:33 UTC · c4gen dev
Each scenario below replays one interaction as a C4 Dynamic diagram. Amber steps are asynchronous (queued / scheduled / event-driven).
How to read these diagrams
These are C4 model diagrams (C4-PlantUML). Read them top-down: System Context (the whole SOA) → Container (one system's runtime units) → Dynamic (a single data flow, step by step).
- People are rounded boxes; systems and containers are rectangles; databases and queues have distinct shapes.
- Each arrow is a data flow labelled with what moves.
- Solid arrows are synchronous (request/response, caller blocks).
- Amber dashed arrows are asynchronous (queued, scheduled, or event-driven — caller does not block).
- Drag to pan, scroll to zoom. Boxes with a link drill into the next level.
Learner enrollment & paid checkout (synchronous)
A learner browses the catalog and enrolls. APISIX authenticates the request via Keycloak and proxies it to Django, which records the order and starts a CyberSource payment. When the signed payment callback arrives, Django fulfills the order and pushes the enrollment to Open edX.
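The "signed callback" step is what lets Django trust a payment result before fulfilling the order. The sketch below shows one common way such a check can work: an HMAC-SHA256 over a comma-joined list of named fields, base64-encoded. The field layout (`signed_field_names`, `signature`), the function names, and the secret handling are assumptions for illustration only; the actual MITx Online verification code may differ.

```python
import base64
import hashlib
import hmac
from typing import Mapping


def sign_fields(payload: Mapping[str, str], secret_key: str) -> str:
    """Compute a base64 HMAC-SHA256 over the fields listed in signed_field_names.

    Assumed scheme: the message is "name=value" pairs joined by commas,
    in the order given by the signed_field_names field.
    """
    names = payload["signed_field_names"].split(",")
    message = ",".join(f"{name}={payload[name]}" for name in names)
    digest = hmac.new(secret_key.encode(), message.encode(), hashlib.sha256).digest()
    return base64.b64encode(digest).decode()


def is_callback_trusted(payload: Mapping[str, str], secret_key: str) -> bool:
    """Return True only if the supplied signature matches our own computation."""
    try:
        expected = sign_fields(payload, secret_key)
    except KeyError:
        # A required field is missing, so the payload cannot be verified.
        return False
    supplied = payload.get("signature", "")
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), supplied.encode())
```

In a flow like the one described, the order would be fulfilled and the enrollment pushed to Open edX only when `is_callback_trusted` returns `True`.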
Open edX enrollment & grade sync (mixed)
Open edX posts enrollment and certificate webhooks back to Django. A periodic Celery beat task retries failed enrollments and repairs faulty edX users, while the certificate jobs read grades from edX to generate certificates in Postgres.
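The retry half of this flow can be pictured as a loop over enrollments that never reached Open edX. The following is a minimal sketch with no Celery dependency; `Enrollment`, its `edx_enrolled` flag, and the injected `push_to_edx` callable are hypothetical names, not the project's real models or API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Enrollment:
    """Hypothetical stand-in for a local enrollment row."""
    id: int
    edx_enrolled: bool = False


def retry_failed_enrollments(
    enrollments: Iterable[Enrollment],
    push_to_edx: Callable[[Enrollment], None],
) -> List[int]:
    """Re-push every enrollment not yet in Open edX; return the ids repaired."""
    repaired = []
    for enrollment in enrollments:
        if enrollment.edx_enrolled:
            continue  # already synced, nothing to do
        try:
            push_to_edx(enrollment)
        except Exception:
            # Leave the row untouched so the next scheduled run retries it.
            continue
        enrollment.edx_enrolled = True
        repaired.append(enrollment.id)
    return repaired
```

Because a failure leaves the row unchanged, the task is safe to run repeatedly on a schedule: each run picks up whatever the previous one could not repair.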
CRM, identity & sheets sync (asynchronous)
Celery beat fans out to external services: HubSpot contact/product/deal sync, Keycloak B2B organization reconciliation, and Google Sheets refund/deferral and B2B enrollment-code processing. SCIM provisioning arrives inbound from Keycloak.
MIT Learn catalog ingestion (cross-service, asynchronous)
MIT Learn's edx_content ETL worker pulls the live MITx Online course and program catalog from the v2 API every 6 hours and fetches S3 content archives, indexing them as learning resources. MITx Online is the source of truth for the catalog.
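A catalog pull of this kind usually walks a paginated API until no further page is offered. The sketch below assumes a response shape with `results` and `next` keys, which is common for paginated REST APIs but is not confirmed here for the v2 API. The `get_json` callable is injected so the loop stays independent of any HTTP client; both names are hypothetical.

```python
from typing import Callable, Iterator

# Cadence stated in the scenario description: every 6 hours.
SIX_HOURS_SECONDS = 6 * 60 * 60


def iter_catalog(start_url: str, get_json: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every catalog item, following `next` links until they run out.

    Assumed page shape: {"results": [...], "next": "<url>" or None}.
    """
    url = start_url
    while url:
        page = get_json(url)
        yield from page.get("results", [])
        url = page.get("next")
```

An ETL worker scheduled at `SIX_HOURS_SECONDS` intervals could iterate `iter_catalog(...)` and hand each item to the indexing step.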
Ingestion sources (ETL)
Every external source the edx_content / default Celery workers pull from, with transport and cadence. ⚠️ marks brittle linkages (HTML/token scrapes, hardcoded URLs).
| Source | Transport | Cadence | Data | Source of truth (code location) |
|---|---|---|---|---|
| Google Sheets (refunds/deferrals) | Google Sheets API | about every hour | refund/deferral requests | sheets/tasks.py |
| Open Exchange Rates | REST | daily 03:00 | currency exchange rates | flexiblepricing/tasks.py:57 |
| Open edX (certificate webhook) | HTTPS webhook | on event | certificate events | openedx/views.py:128 |
| Open edX (enrollment webhook) | HTTPS webhook | on event | enrollment events | openedx/views.py:34 |
| Open edX (grades) | REST OAuth2 | on enroll / scheduled | course current grades | openedx/api.py:1096 |
| Unified Ecommerce (product metadata) | REST | on request | product meta | flexiblepricing/api.py:458 |