Introduction

CSC443, fall 2018

1. Database management

In its simplest form, data management means storing bytes of data for later retrieval. The simplest model that describes most of the media (known as memory or storage) in which data can reside is:

  1. the medium stores data in units of bytes, with a linearly addressable space from 0 to N-1, where N is the capacity of the medium in bytes.
  2. the medium supports read and write operations.
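interface Memory {
    // Returns the byte stored at the given address, 0 <= address < N.
    byte read(long address);
    // Overwrites the byte stored at the given address.
    void write(long address, byte value);
}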
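As a minimal sketch of this model, assuming the medium is simulated by a plain Java byte array, the read/write interface can be implemented as follows. The class names `ArrayMemory` and `ArrayMemoryDemo` are hypothetical, chosen only for illustration.

```java
// Sketch: the byte-addressable medium simulated with a byte array.
// ArrayMemory is a hypothetical name; Memory mirrors the read/write model.
class ArrayMemoryDemo {
    public static void main(String[] args) {
        Memory m = new ArrayMemory(1024);   // N = 1024 bytes, addresses 0..1023
        m.write(42, (byte) 0x7F);
        System.out.println(m.read(42));     // prints 127
    }
}

interface Memory {
    byte read(long address);
    void write(long address, byte value);
}

class ArrayMemory implements Memory {
    private final byte[] cells;

    ArrayMemory(int capacity) {
        cells = new byte[capacity];         // every byte starts as 0
    }

    // Rejects addresses outside [0, N-1].
    private int check(long address) {
        if (address < 0 || address >= cells.length) {
            throw new IndexOutOfBoundsException(
                "address " + address + " not in [0, " + (cells.length - 1) + "]");
        }
        return (int) address;
    }

    @Override
    public byte read(long address) {
        return cells[check(address)];
    }

    @Override
    public void write(long address, byte value) {
        cells[check(address)] = value;
    }
}
```

Real media differ mainly in what sits behind `read` and `write`; the interface itself stays the same.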

Each operation takes some time to complete. The time it takes to complete a single read or write is known as the data access latency.
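The crudest possible latency model charges the same fixed cost to every read and write. The sketch below assumes a hypothetical latency of 100 ns per access and simply accumulates simulated time; `TimedMemory` is an illustrative name, not an established API.

```java
// Sketch: charge a fixed, hypothetical latency to every read and write.
class LatencyDemo {
    public static void main(String[] args) {
        TimedMemory m = new TimedMemory(256, 100);   // assume 100 ns per access
        for (int a = 0; a < 10; a++) {
            m.write(a, (byte) a);
        }
        int sum = 0;
        for (int a = 0; a < 10; a++) {
            sum += m.read(a);
        }
        // 20 accesses x 100 ns: prints "sum = 45, simulated time = 2000 ns"
        System.out.println("sum = " + sum + ", simulated time = " + m.elapsedNanos() + " ns");
    }
}

class TimedMemory {
    private final byte[] cells;
    private final long latencyNanos;   // cost of one read or write (an assumption)
    private long elapsedNanos = 0;     // total simulated access time so far

    TimedMemory(int capacity, long latencyNanos) {
        this.cells = new byte[capacity];
        this.latencyNanos = latencyNanos;
    }

    byte read(int address) {
        elapsedNanos += latencyNanos;
        return cells[address];
    }

    void write(int address, byte value) {
        elapsedNanos += latencyNanos;
        cells[address] = value;
    }

    long elapsedNanos() {
        return elapsedNanos;
    }
}
```

Under this model, total time depends only on the number of operations, not on which addresses are touched.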

In a sense, every database management system can be reduced to byte-level reads and writes on some linearly addressable storage medium. However, this oversimplification is of little practical use, because a real system needs much richer abstractions to deal with real-life issues.
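To see why byte-level I/O alone is too low-level, consider storing a single 32-bit integer. Even this forces decisions about width and byte order. The sketch below assumes four consecutive bytes in big-endian order; `ByteMemory` and `IntStore` are hypothetical helpers.

```java
// Sketch: a 32-bit integer stored as four consecutive bytes (big-endian),
// built only from byte-level read/write. ByteMemory and IntStore are hypothetical.
class IntOnBytesDemo {
    public static void main(String[] args) {
        ByteMemory mem = new ByteMemory(16);
        IntStore.writeInt(mem, 4, 2018);
        System.out.println(IntStore.readInt(mem, 4));   // prints 2018
    }
}

class ByteMemory {
    private final byte[] cells;
    ByteMemory(int capacity) { cells = new byte[capacity]; }
    byte read(int address) { return cells[address]; }
    void write(int address, byte value) { cells[address] = value; }
}

final class IntStore {
    private IntStore() {}

    // Writes the most significant byte first (big-endian).
    static void writeInt(ByteMemory mem, int address, int value) {
        for (int i = 0; i < 4; i++) {
            mem.write(address + i, (byte) (value >>> (24 - 8 * i)));
        }
    }

    // Reassembles the four bytes; "& 0xFF" undoes Java's sign extension of byte.
    static int readInt(ByteMemory mem, int address) {
        int value = 0;
        for (int i = 0; i < 4; i++) {
            value = (value << 8) | (mem.read(address + i) & 0xFF);
        }
        return value;
    }
}
```

Strings, dates, and whole records need similar encoding rules, which is one reason a practical system layers richer abstractions over raw bytes.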

1.1. Data modeling

Real-life data are naturally expressed as data structures. In the relational data model, there is a natural data hierarchy:

  • Scalars: numbers, strings, dates and times, blobs
  • Tuples or records
  • Relations or tables
  • Databases
  • Database clusters
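One way to picture this hierarchy is as nested types. The sketch below uses Java records (Java 16 or later); the type names `Row`, `Relation`, `Database`, and `Cluster` and the sample values are hypothetical, chosen only to mirror the levels listed above.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;

// Sketch of the relational hierarchy as nested Java types (Java 16+ records).
// Row, Relation, Database and Cluster are hypothetical names for illustration.
class HierarchyDemo {
    public static void main(String[] args) {
        // Scalars: an int, a String, and a date, grouped into tuples.
        Row r1 = new Row(List.of(1, "Alice", LocalDate.of(2018, 9, 6)));
        Row r2 = new Row(List.of(2, "Bob", LocalDate.of(2018, 9, 7)));
        // Relation: a schema (column names) plus a list of tuples.
        Relation students = new Relation("students",
            List.of("id", "name", "enrolled"), List.of(r1, r2));
        // Database: a collection of named relations.
        Database db = new Database("school", Map.of(students.name(), students));
        // Cluster: several databases managed together.
        Cluster cluster = new Cluster(List.of(db));
        System.out.println(
            cluster.databases().get(0).relations().get("students").rows().size());  // prints 2
    }
}

record Row(List<Object> values) {}
record Relation(String name, List<String> columns, List<Row> rows) {}
record Database(String name, Map<String, Relation> relations) {}
record Cluster(List<Database> databases) {}
```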

Data processing algorithms need to operate on data in the units of the data model. It is simply infeasible to express any nontrivial algorithm in terms of byte-level data I/O.
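For example, a relational selection is naturally written in terms of tuples and a predicate, with no mention of bytes or addresses. The sketch below assumes an in-memory list of tuples; `Employee` and `select` are hypothetical names, and the comment's SQL is only an analogy.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Sketch: a relational selection written against tuples, not bytes.
// Employee and select(...) are hypothetical names used for illustration.
class SelectionDemo {
    public static void main(String[] args) {
        List<Employee> employees = List.of(
            new Employee(1, "Ada", 120_000),
            new Employee(2, "Grace", 95_000),
            new Employee(3, "Edsger", 130_000));
        // Similar in spirit to: SELECT * FROM employees WHERE salary > 100000
        List<Employee> result = select(employees, e -> e.salary() > 100_000);
        result.forEach(e -> System.out.println(e.name()));  // prints Ada, then Edsger
    }

    // Keeps exactly the tuples that satisfy the predicate, preserving order.
    static <T> List<T> select(List<T> relation, Predicate<T> predicate) {
        return relation.stream().filter(predicate).collect(Collectors.toList());
    }
}

record Employee(int id, String name, int salary) {}
```

Mapping such tuple-level operations onto byte-level storage is exactly the gap a database system has to bridge.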

CSC343 covers the data-modeling side of data management.

1.2. Storage modeling

Many families of storage-medium technologies have been introduced since the invention of computers. They are designed to coexist and complement one another to maximize the efficiency of data processing. It is important to distinguish them by the following properties:

  Property        Difference in technology
  --------------  --------------------------------
  Affordability   inexpensive / expensive
  Capacity        small / large
  Performance     slow / fast
  Persistence     volatile / nonvolatile
  Reliability     low error rate / high error rate

While a linearly addressable byte array is quite an accurate model for most storage media, accurately modeling data access latency is much harder.
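One reason latency is hard to model is that many devices transfer data in fixed-size blocks, so the cost of reading a byte depends on which block was read before it. The sketch below assumes a hypothetical device that buffers only the most recently transferred block and counts block transfers instead of byte reads; the sizes are illustrative, not measurements.

```java
// Sketch: count block transfers for a device that moves whole blocks,
// assuming it keeps only the most recently transferred block in a buffer.
class BlockDemo {
    public static void main(String[] args) {
        int n = 4096, blockSize = 512;   // hypothetical sizes

        BlockDevice seq = new BlockDevice(n, blockSize);
        for (int a = 0; a < n; a++) seq.read(a);     // sequential scan of all 4096 bytes
        System.out.println(seq.transfers());         // prints 8 (= 4096 / 512)

        BlockDevice strided = new BlockDevice(n, blockSize);
        for (int i = 0; i < 8; i++) {
            for (int b = 0; b < 8; b++) {
                strided.read(b * blockSize + i);     // only 64 bytes, but every read changes block
            }
        }
        System.out.println(strided.transfers());     // prints 64
    }
}

class BlockDevice {
    private final byte[] cells;
    private final int blockSize;
    private long bufferedBlock = -1;   // block currently in the buffer, -1 if none
    private long transfers = 0;        // number of block transfers so far

    BlockDevice(int capacity, int blockSize) {
        this.cells = new byte[capacity];
        this.blockSize = blockSize;
    }

    byte read(int address) {
        long block = address / blockSize;
        if (block != bufferedBlock) {  // not buffered: transfer the whole block
            transfers++;
            bufferedBlock = block;
        }
        return cells[address];
    }

    long transfers() {
        return transfers;
    }
}
```

Reading 64 bytes with a strided pattern costs eight times as many transfers as scanning all 4096 bytes sequentially, so a per-byte latency constant cannot capture this behavior.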

In the first part of this course, we will present a much more accurate model of the data storage technologies in a modern computer system. We will see that there are different forms of storage media with different properties. We refer to this collection of storage media as the memory hierarchy.
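A minimal two-level memory hierarchy can be sketched as a small, fast cache in front of a large, slow medium. The sketch below assumes hypothetical costs (1 time unit for the fast level, 100 for the slow level), handles reads only for brevity, and uses least-recently-used (LRU) replacement, one common policy chosen here for illustration, built on `java.util.LinkedHashMap` in access order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a small fast cache in front of a large slow medium (reads only).
// Costs and capacities are hypothetical; LRU is one common replacement policy.
class HierarchyLatencyDemo {
    public static void main(String[] args) {
        TwoLevelMemory mem = new TwoLevelMemory(1 << 16, 4, 1, 100);
        for (int pass = 0; pass < 10; pass++) {
            for (int a = 0; a < 4; a++) mem.read(a);   // working set fits in the cache
        }
        // prints "36 hits, 4 misses, 440 time units"
        System.out.println(mem.hits() + " hits, " + mem.misses() + " misses, "
            + mem.elapsed() + " time units");
    }
}

class TwoLevelMemory {
    private final byte[] slow;                 // large, slow level
    private final Map<Integer, Byte> cache;    // small, fast level (address -> byte)
    private final long fastCost, slowCost;
    private long hits = 0, misses = 0, elapsed = 0;

    TwoLevelMemory(int capacity, int cacheSize, long fastCost, long slowCost) {
        this.slow = new byte[capacity];
        this.fastCost = fastCost;
        this.slowCost = slowCost;
        // accessOrder = true keeps entries in least-recently-used-first order,
        // so removeEldestEntry evicts the LRU entry once the cache is over capacity.
        this.cache = new LinkedHashMap<Integer, Byte>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Byte> eldest) {
                return size() > cacheSize;
            }
        };
    }

    byte read(int address) {
        elapsed += fastCost;                  // every access checks the fast level first
        Byte cached = cache.get(address);     // get() also refreshes recency
        if (cached != null) {
            hits++;
            return cached;
        }
        misses++;
        elapsed += slowCost;                  // miss: fetch from the slow level
        byte value = slow[address];
        cache.put(address, value);
        return value;
    }

    long hits() { return hits; }
    long misses() { return misses; }
    long elapsed() { return elapsed; }
}
```

Because the working set fits in the cache, only the first pass pays the slow cost; a working set larger than the cache would change the total dramatically, which is the kind of effect a memory-hierarchy model has to capture.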

2. About this course

This course discusses the detailed modeling and characteristics of the memory hierarchy. We will revisit the relational data model and its query language, and investigate their connection to the memory hierarchy.

We will begin with:

  • data arrangement in different levels of the memory hierarchy
  • algorithms that work with the whole memory hierarchy
  • compilation of relational queries to such algorithms

We will then cover:

  • concurrency
  • crash and recovery
  • parallel databases
  • non-relational databases