Assumption: 14‑week semester + 1 week for project presentations/exams.
| Week | Theme | Core Concepts | Lab / Assignment |
|------|-------|---------------|------------------|
| 1 | Course Intro & Review of Relational Theory | ER modelling, relational algebra, SQL basics | Mini‑SQL quiz (in‑class) |
| 2 | Advanced Normalisation & Physical Design | BCNF, decomposition, indexing, partitioning | Design a normalised schema for a sample e‑commerce dataset |
| 3 | Query Optimisation | Cost‑based optimisation, EXPLAIN, statistics | Write and optimise 5 queries; compare their plans |
| 4 | Transaction Management & Concurrency | ACID, isolation levels, locking, MVCC | Simulate deadlocks in PostgreSQL; resolve them |
| 5 | NoSQL Overview | Key‑value, document, column‑family, graph DBs | Implement a simple CRUD app on MongoDB |
| 6 | Data Integration Foundations | Schema matching, data cleaning, ETL basics | Clean a noisy CSV using Python/pandas; generate a report |
| 7 | Batch Processing with Spark | RDDs, DataFrames, Spark SQL, Catalyst optimiser | Build a Spark job that aggregates click‑stream data |
| 8 | Streaming & Real‑Time Ingestion | Kafka fundamentals, Structured Streaming, windowing | Set up a Kafka producer/consumer pair; stream to Spark |
| 9 | Data Modelling for Analytics | Star & snowflake schemas, slowly changing dimensions | Model a sales warehouse; load sample data |
| 10 | Data Lake & Lakehouse Concepts | Delta Lake, Apache Iceberg, storage formats (Parquet, ORC) | Convert raw JSON logs into a Delta Lake table |
| 11 | Orchestration & Workflow | Airflow DAGs, task dependencies, retries | Create an Airflow DAG that runs the ETL pipeline from weeks 6‑9 |
| 12 | Containerisation & CI/CD for Data Pipelines | Docker, Docker Compose, GitHub Actions, Helm basics | Containerise the Spark job + Airflow; push to a test registry |
| 13 | Performance Tuning & Monitoring | Metrics, Prometheus + Grafana, query‑plan hints | Profile a slow query; improve it with indexes & partitioning |
| 14 | Emerging Topics & Future Trends | Cloud‑native warehouses (Snowflake, BigQuery), Data Mesh, ML‑Ops | Guest lecture / student‑led lightning talks |
| 15 | Project Presentations & Final Exam Review | – | Students demo their end‑to‑end pipelines; Q&A |
Flexibility: If your institution splits the semester differently (e.g., 12 weeks), condense weeks 13‑14 into a single “Trends & Review” session and allocate the remaining week for the final exam.
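For the Week 6 lab (clean a noisy CSV with Python/pandas and generate a report), a starting point might look like the sketch below. It is a minimal illustration assuming only pandas; the sample rows, column names and the `clean` helper are hypothetical stand-ins, not the actual lab dataset. It trims stray whitespace, lower-cases e‑mail addresses, coerces dates and amounts (unparseable values become missing rather than raising), drops exact duplicate rows and returns a small summary dict that students could extend into the required report.

```python
import io

import pandas as pd

# Hypothetical noisy sample standing in for the Week 6 lab CSV (not real course data).
RAW_CSV = """customer_id, name ,email,signup_date,amount
1, Alice ,ALICE@EXAMPLE.COM,2024-01-05,10.50
2,Bob,bob@example.com,not a date,abc
2,Bob,bob@example.com,not a date,abc
3,  ,carol@example.com,2024-02-30,7
4,Dave,dave@example.com,2024-03-01,
"""


def clean(df: pd.DataFrame) -> tuple[pd.DataFrame, dict]:
    """Hypothetical helper: return a cleaned copy of df plus a summary report."""
    report = {"rows_in": len(df)}
    out = df.copy()

    # Normalise header names: strip surrounding spaces, lower-case.
    out.columns = [c.strip().lower() for c in out.columns]

    # Trim text columns and treat whitespace-only cells as missing.
    for col in ("name", "email"):
        stripped = out[col].str.strip()
        out[col] = stripped.mask(stripped == "")
    out["email"] = out["email"].str.lower()

    # Coerce types; unparseable values become NaT/NaN instead of raising.
    out["signup_date"] = pd.to_datetime(
        out["signup_date"], format="%Y-%m-%d", errors="coerce"
    )
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce")

    # Drop exact duplicate rows (pandas treats NaN/NaT as equal when comparing).
    before = len(out)
    out = out.drop_duplicates()
    report["duplicates_dropped"] = before - len(out)

    # Per-column missing counts surface the values that failed coercion.
    report["missing_per_column"] = out.isna().sum().to_dict()
    report["rows_out"] = len(out)
    return out, report


if __name__ == "__main__":
    raw = pd.read_csv(io.StringIO(RAW_CSV))
    cleaned, summary = clean(raw)
    print(cleaned)
    print(summary)
```

Coercing with `errors="coerce"` keeps the pipeline running on dirty input and pushes bad values into the missing-value counts, which is usually what a cleaning report should surface; a stricter variant could reject the file instead.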
| Item | Details |
|------|---------|
| Course Code | MIDE‑400 |
| Title (example) | Advanced Data Management & Integration |
| Credits | 3 (or 4) semester units |
| Prerequisites | MIDE‑200 (Intro to DBMS) or CS‑210 (Data Structures) and CS‑230 (Programming) |
| Delivery Mode | Lectures + Lab Sessions + Project |
| Target Audience | Upper‑level undergraduates / first‑year graduate students in CS, IS, Data Science, Business Analytics |
| Core Tools | PostgreSQL / MySQL, Apache Spark, Python (pandas, SQLAlchemy), Docker, Git, Jupyter, Airflow, dbt, Kafka (optional) |
| Topic | Extra Material |
|-------|----------------|
| Data Mesh | Data Mesh: Delivering Data‑Driven Value at Scale – Zhamak Dehghani (2022) |
| SQL on Big Data | Presto / Trino official docs + “Trino: The Definitive Guide” (free PDF) |
| Graph Databases | Neo4j Graph Academy (free courses) |
| ML‑Ops for Data Pipelines | Machine Learning Engineering – Andriy Burkov (Chapter 7) |
| Cloud‑Native Warehouses | Snowflake University (free modules) |
| Testing Data Pipelines | Great Expectations tutorial (open‑source data validation) |