Data Flows — MITx Pro
Generated 2026-06-24 16:33 UTC · c4gen dev
Each scenario below replays one interaction as a C4 Dynamic diagram. Amber steps are asynchronous (queued / scheduled / event-driven).
How to read these diagrams
These are C4 model diagrams (C4-PlantUML). Read them top-down: System Context (the whole service-oriented architecture, SOA) → Container (one system's runtime units) → Dynamic (a single data flow, step by step).
- People are rounded boxes; systems and containers are rectangles; databases and queues have distinct shapes.
- Each arrow is a data flow labelled with what moves.
- Solid arrows are synchronous (request/response, caller blocks).
- Amber dashed arrows are asynchronous (queued, scheduled, or event-driven — caller does not block).
- Drag to pan, scroll to zoom. Boxes with a link drill into the next level.
Paid enrollment & checkout (synchronous)
A learner pays through CyberSource Secure Acceptance. xPRO signs a payload and redirects the browser to CyberSource; CyberSource posts a signed result back to xPRO's OrderFulfillmentView, which fulfills the order, enrolls the learner in Open edX, syncs the deal to HubSpot, and emails a receipt.
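The signing and verification steps can be sketched as follows. This is a minimal sketch, assuming the HMAC-SHA256 scheme used by Secure Acceptance, in which the fields named in `signed_field_names` are joined as comma-separated `name=value` pairs and the digest is base64-encoded. The function names, secret key, and field values are hypothetical and are not taken from xPRO's code.

```python
import base64
import hashlib
import hmac
from typing import Mapping


def sign_fields(fields: Mapping[str, str], secret_key: str) -> str:
    """Sign the fields listed in signed_field_names, in the listed order."""
    names = fields["signed_field_names"].split(",")
    message = ",".join(f"{name}={fields[name]}" for name in names)
    digest = hmac.new(
        secret_key.encode("utf-8"), message.encode("utf-8"), hashlib.sha256
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_fields(fields: Mapping[str, str], secret_key: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_fields(fields, secret_key)
    return hmac.compare_digest(expected, fields.get("signature", ""))


# Hypothetical payload; real requests carry more fields.
payload = {
    "signed_field_names": "reference_number,amount,currency,signed_field_names",
    "reference_number": "xpro-order-123",
    "amount": "950.00",
    "currency": "USD",
}
payload["signature"] = sign_fields(payload, "not-a-real-secret")
```

The same check applies in both directions: xPRO signs the outbound payload, and the fulfillment view should reject any posted result whose signature does not verify before it fulfills the order.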
Open edX SSO & user provisioning (synchronous)
xPRO authenticates learners through Open edX via social-auth. On first login/signup xPRO creates the corresponding edX user and an OpenEdxApiAuth access token, used later for enrollment and grade reads.
Courseware sync & certificates (asynchronous)
RedBeat-scheduled Celery tasks repair failed enrollments and faulty edX users, sync course-run data and grades from Open edX, and generate course certificates. External vendor courses (Emeritus / Global Alumni) are synced from report APIs on a daily cron.
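A schedule of this kind is typically declared as a mapping in the shape of Celery's `beat_schedule` setting. The sketch below is illustrative only: the task paths are hypothetical, the repair cadence is an assumption, and a fixed one-day interval stands in for the daily cron.

```python
from datetime import timedelta

# Shape follows Celery's beat_schedule setting: entry name -> task + schedule.
# All task paths below are hypothetical placeholders.
BEAT_SCHEDULE = {
    "repair-failed-enrollments": {
        "task": "example.tasks.repair_failed_enrollments",  # hypothetical
        "schedule": timedelta(hours=1),  # assumed cadence, not from the source
    },
    "sync-course-runs": {
        "task": "example.tasks.sync_course_runs",  # hypothetical
        "schedule": timedelta(days=1),  # stands in for the daily cron
    },
    "sync-grades-and-certificates": {
        "task": "example.tasks.sync_grades_and_certificates",  # hypothetical
        "schedule": timedelta(days=1),  # stands in for the daily cron
    },
}
```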
Google Sheets coupon/refund/deferral ops (asynchronous)
Staff manage coupon assignment, refund, and deferral requests in Google Sheets. Google Drive push notifications and a periodic Celery beat task each trigger a Celery task that reads the sheets through a service account, applies coupons and enrollments, emails bulk enrollment codes, and writes the resulting status back to the sheet.
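Because a push notification and the polling task can both trigger a read of the same sheet, row processing is safest when it is idempotent. The following is a minimal sketch of that idea, assuming each row has a status column that stays blank until the row is handled. The `RequestRow` type, the `process_rows` function, and the status strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class RequestRow:
    row_number: int
    request_type: str  # e.g. "coupon", "refund", "deferral"
    email: str
    status: str = ""  # blank means the row has not been processed yet


def process_rows(
    rows: Iterable[RequestRow],
    handlers: Dict[str, Callable[[RequestRow], None]],
) -> Dict[int, str]:
    """Apply each unprocessed row and return the statuses to write back."""
    updates: Dict[int, str] = {}
    for row in rows:
        if row.status:
            continue  # already handled by an earlier run; skip it
        handler = handlers.get(row.request_type)
        if handler is None:
            updates[row.row_number] = "error: unknown request type"
            continue
        try:
            handler(row)
            updates[row.row_number] = "complete"
        except Exception as exc:  # record the failure instead of aborting the batch
            updates[row.row_number] = f"error: {exc}"
    return updates
```

Returning the updates as a mapping keeps the sheet write separate from the business logic, so the write-back can be batched into a single API call.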
MIT Learn catalog ingestion (cross-service, asynchronous)
MIT Learn's ETL pulls xPRO's course/program catalog from the public REST API and content files from S3 to surface xPRO offerings in discovery. This is a one-directional pull owned by MIT Learn's Celery scheduler.
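The pull side can be sketched as a paginated read. This is a rough sketch that assumes the catalog API returns pages shaped like `{"results": [...], "next": <url or None>}`; the actual response shape is not stated here. The `iter_catalog` function and the injected `fetch_page` callable are hypothetical.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_catalog(
    first_url: str,
    fetch_page: Callable[[str], Dict[str, Any]],
) -> Iterator[Dict[str, Any]]:
    """Yield catalog items page by page until no next link remains."""
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)  # e.g. an HTTP GET that returns parsed JSON
        yield from page.get("results", [])
        url = page.get("next")  # assumed pagination field; None ends the loop
```

Injecting `fetch_page` keeps the HTTP client out of the loop, so the same function can be run against a stub in tests and a real client in the scheduler.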
Ingestion sources (ETL)
This table lists every external source that the edx_content and default Celery workers pull from, with its transport and cadence. ⚠️ marks brittle linkages (HTML/token scrapes, hardcoded URLs).
| Source | Transport | Cadence | Data | Source of truth |
|---|---|---|---|---|
| Emeritus (external courses) | REST report API | daily cron | external course/run batch reports | courses/sync_external_courses/external_course_sync_api.py |
| Global Alumni (external courses) | REST report API | daily cron | external course/run batch reports | courses/sync_external_courses/external_course_sync_api.py |
| ⚠️ Google Sheets (coupon/refund/deferral) | Sheets/Drive API + push webhook | every SHEETS_MONITORING_FREQUENCY | coupon assignment, refund, deferral requests | sheets/tasks.py |
| Open edX course-run data | REST (edx-api-client) | daily cron | course-run titles & dates | courses/tasks.py:109 |
| Open edX grades | REST (edx-api-client) | daily cron | course-run grades for certificates | courses/tasks.py:31 |