SWEN 432 (2019): Advanced Database Design and Implementation

This page will be continually updated with SWEN432 lecture notes as they come to hand. If you find errors or want to make comments on the content, email me, Pavle.Mogin@ecs.vuw.ac.nz

In lectures, I have already handed out, or I will hand out lecture notes for:

  • Advanced Database Design and Implementation - Course Overview and you can find them as a pptx file, and as a pdf file. These slides introduce the course, list main areas to be covered, explain how the students will be assessed. This material is covered in the course outline and related documents, which are given in Course Outline.

  • Exam and Lecture Overview and you can find them here as a Power Point file, and here as a .pdf file.

Data Warehouse

  • OLAP and Data Warehouse Architectures, and you can find them as a Power Point file here, and as a pdf file here. These lecture notes consider basic OLAP and Data Warehouse architectures, the common sources of Data Warehouse failures, and a Data Warehouse design methodology. There are the following common sources of Data Warehouse failures considered:
    • Missing business drivers,
    • Use of wrong architecture,
    • Use of dirty data, and
    • Top - down development.

  • Populating a Data Warehouse, and you can find them as a Power Point file here, and as a pdf file here.

  • Query Rewriting, and you can find them as a Power Point file here, and as a pdf file here. These lecture notes introduce:
    • Two query rewrite methods:
    • Text match, and
    • General query rewrite with:
      • Join compatibility check,
      • Data sufficiency check,
      • Grouping compatibility check, and
      • Aggregate computability check.
    • A number of examples illustrate the Query Rewriting procedure.

  • Aggregate Functions, and you can find them as a Power Point file here, and a pdf file here. These lecture notes introduce:
    • Distributive,
    • Algebraic, and
    • Holistic
      aggregates.

  • Dimension Hierarchies, and you can find them as a Power Point file here, and a pdf file here. These lecture notes introduce:
    • A syntax for defining Attribute Hierarchies in ROLAP,
    • Functional dependencies among dimension attributes, and
    • An analysis of the impact of database constraints on query rewriting.

  • Materialized Views, and you can find them as a a Power Point file here, and as a pdf file here. These lecture notes introduce:
    • Motives for using materialized views in OLAP queries,
    • Examples of using materialized views.
      This material is covered in S. Chaudhuri, U. Dayal: An Overview of Datawarehousing and OLAP Technology here.

  • OLAP Queries and SQL1999, and you can find them as a Power Point file here, and as a pdf file here. These lecture notes introduce OLAP specific extensions of SQL:
    • CUBE,
    • ROLLUP,
    • WINDOW, and
    • RANK

  • Basic OLAP Queries, and you can find them here, as a Power Point file , and as a pdf file here. These lecture notes introduce OLAP specific queries:
    • Roll-up and Drill-down,
    • Slice and Dice, and
    • Pivoting.

  • OLAP Database Structures, and you can find them as a Power Point file here, and a pdf file here. These Lecture Notes introduce the star schema and discuss the following components of a star schema:
    • Facts,
    • Dimensions,
    • Attributes and attribute hierarchies,
    • Snowflakes, and
    • Constellations.

* Introduction to Data Warehouse, and you can find them as a Power Point file here, and a pdf file here. These lecture notes introduce Decision Support Systems, their structure and their database called Data Warehouse. Further, the lecture slides discuss the basic features of a Data Warehouse: time interval, granularity, dimensionality, integration, orientation towards subjects, dependency on time, and non volatility.
    • S. Chaudhuri, U. Dayal: An Overview of Datawarehousing and OLAP Technology here.

MongoDB

  • MongoDB Distributed Reads and Writes, and you can find them here as a Power Point file, and here as a pdf file. These lecture notes consider:
    • Writes on a Sharded Cluster,
    • Writes on a Replica Set,
    • Write Concern,
    • Distributed Queries, and
    • MongoDB and Transaction Processing

  • MongoDB Architecture, and you can find them here as a Power Point file, and here as a pdf file. These lecture notes consider:
    • Sharding:
      • Shard Key,
      • Sharding Architecture,
      • Balancing Data Distribution,
    • Replication:
      • Master - Slave Mode,
      • Replica Set Operation,
      • Failover - Election of a new Master

  • MongoDB Aggregation, and you can find them here as a Power Point file, and here as a pdf file. These lecture notes consider
    • Pipelined Aggregation:
      • Aggregation Stages,
      • Stage Expressions,
    • Examples

  • MongoDB Arrays, and you can find them here as a Power Point file, and here as a pdf file. These lecture notes consider:
    • Using arrays in query selection criteria,
    • Array projection operators,
    • Updating arrays

  • MongoDB Write Operations, and you can find them here as a Power Point file, and here as a pdf file. These lecture notes consider the following MongoDB operations:
    • Insert,
    • Update, and
    • Delete

  • MongoDB Read Operations, and you can find them here as a Power Point file, and here as a pdf file. These lecture notes introduce MongoDB db.collection.find() method and consider:
    • Query selection criteria,
    • Projections,
    • Query Result Processing, and
    • Cursor

  • MongoDB Data Modeling, and you can find them here as a Power Point file, and here as a pdf file. These lecture notes introduce MongoDB NoSQL DBMS and consider:
    • Data Model,
    • Document Structure,
    • _id field,
    • Embedded data,
    • Referencing, and
    • Indexing

Cassandra

  • Cassandra Repair Mechanisms, and you can find them here as a .ppt file and here as a .pdf file. These lecture notes consider:
    • Hinted Handoff Writes
      • How Hinted Handoff works
      • Hinted Handoff and the Consistency Level
    • Anti-entropy Node Repair
      • Merkle Trees
      • How node repair works

  • Cassandra Architecture, and you can find them here as a .ppt file and here as a .pdf file. These lecture notes consider:
    • Internode Communication
    • Data Distribution and Replication
    • Partitioning
    • Snitches
    • CREATE KEYSPACE statement

  • Cassandra Tools and Files, and you can find them here as a .ppt file and here as a .pdf file. These lecture notes consider:
    • Cassandra Cluster Manager (CCM)
    • Cassandra Files
    • cqlsh Commands
    • Node Tool

  • Cassandra Consistency Levels (Read), and you can find them here as a .ppt file and here as a .pdf file. These lecture notes consider:
    • Clusters, Data Centers, and Racks
    • Configuring Data Consistency
    • Read Requests
    • Read Consistency Levels

  • Cassandra Consistency Levels (Write), and you can find them here as a .ppt file and here as a .pdf file. These lecture notes consider:
    • Configuring Data Consistency
    • Write Requests
    • Write Consistency Levels

  • Cassandra Storage Engine, and you can find them here as a .ppt file and here as a .pdf file. These lecture notes consider:
    • Table Primary Key and Partitioning
    • Storage Engine Rows
    • Log Structured Merge Trees
    • Write Paths for Insert and Update
    • About Reads
    • About Deletes
    • Compaction

  • Cassandra CQL Queries, and you can find them here as a .pptx file and here as a .pdf file. These lecture notes consider:
    • The Syntax of the SELECT Statement
    • Simple SELECT expressions
    • Filtering Data using WHERE Clause
    • Using Indexes
    • Filtering Collections
    • Querying Tables with Columns of the counter Type
    • Keyspace Design Heuristics

  • Cassandra Data Model, and you can find them here as a pptx file and here as a pdf file. These lecture notes introduce:
    • Basic features of Cassandra NoSQL DBMS,
    • Cassandra CQL data model:
      • Key space,
      • Table definition,
      • Column, and
      • Data Types

NoSQL Databases - General

  • Data Versioning, and you can find them here as a Power Point file, and here as a pdf file. These lecture notes introduce data versioning as a technique to combat conflicting replica updates. Lecture Notes consider:
    • Influence of the replication on reads and writes,
    • Models of writing and reading as a function of the client consistency model:
      • Two phase commit (for the strict consistency),
      • Master Slave Model (for a conditional strict consistency),
      • No Master (All Master) Model (for the eventual consistency),
      • Quorum based two phase commit (for the strong consistency)
    • Data versioning techniques

  • Partitioning and Replication, and you can find them here as a Power Point file, and here as a pdf file. These lecture notes introduce data partitioning and replication as two important techniques for implementing highly available, scalable, and network partition tolerant NoSQL databases:
    • Data partitioning and replication techniques,
    • Consistent Hashing, and
    • Membership changes

  • Trade-offs in Cloud Databases, and you can find them here as a Power Point file, and here as a pdf file. These lecture notes introduce differences between conventional (relational) databases and NoSQL databases by discussing:
    • ACID database properties,
    • CAP Conjecture, and
    • BASE properties of NoSQL databases

* Introduction to NoSQL Databases, and you can find them here as a Power Point file, and here as a pdf file. These lecture notes introduce cloud computing, cloud services, and cloud databases by discussing:
    • Basic principles of cloud computing, and
    • Basic terms and techiques of NoSQL databases