AINS graduate course

AINS6006 · Big Data Management for AI Applications

Core sequence · catalog 2026–27

Description

Data platforms, governance, and pipelines that feed reliable training and evaluation datasets for AI workloads.

Course shells on the Castalia LMS are provisioned per license; the Castalia LMS link opens the guest demo or landing page.

Buy license: continue on the purchase hub to request a license or an institutional quote.

Syllabus outline

  1. Modules 1–2 · Platforms

    • Lakehouse concepts and query engines
    • Batch vs streaming (intro)
    • Schema evolution and contracts
  2. Modules 3–4 · Quality

    • Data validation and anomaly detection
    • Labeling operations and inter-rater reliability
    • Lineage and reproducibility
  3. Modules 5–6 · Scale

    • Partitioning and cost controls
    • Access control patterns
    • Lab: end-to-end pipeline slice