Data Flows — ODL Video Service

Generated 2026-06-24 17:02 UTC · c4gen dev

Each scenario below replays one interaction as a C4 Dynamic diagram. Amber steps are asynchronous (queued / scheduled / event-driven).

How to read these diagrams

These are C4 model diagrams (C4-PlantUML). Read them top-down: System Context (the whole SOA) → Container (one system's runtime units) → Dynamic (a single data flow, step by step).

  • People are rounded boxes; systems and containers are rectangles; databases and queues have distinct shapes.
  • Each arrow is a data flow labelled with what moves.
  • Solid arrows are synchronous (request/response, caller blocks).
  • Amber dashed arrows are asynchronous (queued, scheduled, or event-driven — caller does not block).
  • Drag to pan, scroll to zoom. Boxes with a link drill into the next level.

Video upload & transcode (asynchronous)

An admin selects a file via the Dropbox Chooser; Django enqueues a Celery chain. The worker streams the shared link into the S3 upload bucket (under a per-video Redis lock), submits an AWS MediaConvert job, and goes idle. When the job finishes, MediaConvert emits a CloudWatch/EventBridge event to SNS, which POSTs the result to the transcode-jobs webhook; Django creates VideoFile rows, marks the video COMPLETE, and enqueues a status email.
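
The webhook step above can be sketched as a small parser, assuming the POST body is a standard SNS notification whose Message field carries the MediaConvert job state change event as a JSON string. The function name parse_transcode_notification and the returned keys are hypothetical and are not taken from the OVS code. A production handler would also need to verify the SNS message signature and answer subscription confirmations, which this sketch omits.

```python
import json


def parse_transcode_notification(body: bytes) -> dict:
    """Extract the job id and status from an SNS-wrapped MediaConvert event.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    envelope = json.loads(body)
    if envelope.get("Type") != "Notification":
        # SubscriptionConfirmation / UnsubscribeConfirmation carry no job data.
        raise ValueError(f"unsupported SNS message type: {envelope.get('Type')!r}")
    # SNS delivers the original event as a JSON-encoded string in "Message".
    event = json.loads(envelope["Message"])
    detail = event["detail"]
    return {"job_id": detail["jobId"], "status": detail["status"]}
```

A view built on this could look up the video by job id and create the VideoFile rows only when the status is COMPLETE.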

Authenticated playback (synchronous)

A viewer authenticates through Keycloak (federating Touchstone). Django checks KeycloakGroup access, signs CloudFront URLs, and the player streams HLS segments and subtitles directly from CloudFront.
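
As a rough illustration of the URL-signing step, the sketch below builds a CloudFront canned-policy signed URL. It assumes the canned-policy scheme is used (the source does not say whether OVS uses canned or custom policies, or signed cookies), and it takes the signer as a parameter: the sign callable is a hypothetical stand-in for an RSA-SHA1 signature made with the CloudFront key pair's private key. The function names are illustrative.

```python
import base64
import json
from typing import Callable


def cloudfront_b64(data: bytes) -> str:
    """Base64-encode, then apply CloudFront's URL-safe substitutions."""
    encoded = base64.b64encode(data).decode("ascii")
    # CloudFront replaces '+' with '-', '=' with '_', and '/' with '~'.
    return encoded.translate(str.maketrans("+=/", "-_~"))


def canned_policy(url: str, expires_epoch: int) -> str:
    """Return the canned policy statement as compact JSON (no whitespace)."""
    policy = {
        "Statement": [
            {
                "Resource": url,
                "Condition": {"DateLessThan": {"AWS:EpochTime": expires_epoch}},
            }
        ]
    }
    return json.dumps(policy, separators=(",", ":"))


def signed_url(
    url: str,
    expires_epoch: int,
    key_pair_id: str,
    sign: Callable[[bytes], bytes],  # hypothetical: signs the policy bytes
) -> str:
    """Append Expires, Signature, and Key-Pair-Id query parameters to url."""
    signature = cloudfront_b64(sign(canned_policy(url, expires_epoch).encode("utf-8")))
    separator = "&" if "?" in url else "?"
    return (
        f"{url}{separator}Expires={expires_epoch}"
        f"&Signature={signature}&Key-Pair-Id={key_pair_id}"
    )
```

Injecting the signer keeps the sketch free of third-party dependencies; in practice a library helper such as botocore's CloudFrontSigner covers the same steps.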

Sync videos to Open edX (asynchronous, cross-service)

The Sync Videos with edX action enqueues post_collection_videos_to_edx. The worker fetches a short-lived JWT (client_credentials) from each collection's EdxEndpoint and POSTs the transcoded HLS/MP4 CloudFront URLs to the edxval API.
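
The two HTTP calls in this flow might be shaped roughly as follows. This is a sketch under stated assumptions, not the OVS implementation: the token path /oauth2/access_token, the token_type=jwt form field, the "JWT" authorization scheme, and the /api/val/v0/videos/ path reflect common Open edX deployments and are not confirmed by the source. The helper names are hypothetical. The functions only build request objects and send nothing.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(base_url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client_credentials token request."""
    form = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "token_type": "jwt",  # assumption: asks the LMS for a JWT
        }
    ).encode("ascii")
    # Assumed token endpoint path; adjust to the EdxEndpoint configuration.
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/oauth2/access_token", data=form, method="POST"
    )


def build_video_request(base_url: str, jwt: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST that registers encoded video URLs."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/val/v0/videos/",  # assumed edxval path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"JWT {jwt}", "Content-Type": "application/json"},
        method="POST",
    )
```

Passing either request to urllib.request.urlopen would perform the call; because the JWT is short-lived, a worker would typically fetch a fresh token per sync rather than cache it.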

MIT Learn ingests OVS videos (asynchronous, cross-service)

MIT Learn's Celery ETL calls the OVS public videos API once a day, pages through public non-YouTube videos and their collections, and fetches each video's subtitle VTT files (via CloudFront) as transcripts to index as learning resources.
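
The paging step can be sketched as a generator that follows "next" links until they run out. The response shape (a "results" list plus a "next" URL that is null on the last page) is an assumption modelled on typical paginated REST APIs, and iter_results and fetch are hypothetical names. The fetch callable is injected so the sketch runs without network access.

```python
from typing import Callable, Iterator, Optional


def iter_results(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every item from a paginated API, following "next" links.

    fetch is a hypothetical callable that returns the decoded JSON page
    for a URL, assumed to look like {"results": [...], "next": url-or-None}.
    """
    url: Optional[str] = first_url
    while url:
        page = fetch(url)
        yield from page.get("results", [])
        url = page.get("next")  # None (or missing) ends the loop
```

For example, with two canned pages:

```python
pages = {
    "/videos?page=1": {"results": [{"key": "a"}], "next": "/videos?page=2"},
    "/videos?page=2": {"results": [{"key": "b"}], "next": None},
}
keys = [item["key"] for item in iter_results("/videos?page=1", pages.__getitem__)]
```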

Ingestion sources (ETL)

Every external source the edx_content / default Celery workers pull from, with transport and cadence. ⚠️ marks brittle linkages (HTML/token scrapes, hardcoded URLs).

Source | Transport | Cadence | Data | Source of truth
Dropbox uploads | HTTPS shared-link stream | on upload | source video files | cloudsync/tasks.py:233
S3 watch bucket (lecture capture) | S3 poll | every 15m | lecture-capture videos | cloudsync/tasks.py:559