Instructor: Alin Deutsch, lastname at cs dot ucsd dot edu

Time: Instruction: Fridays 9:00-4:30. Office Hours: Tuesdays and Thursdays 9:10-10:10am.

The course covers data models, query languages and models of computation beyond those employed in relational databases. It addresses new developments that have gained attention with the advent of the Web 2.0 and Big Data revolutions. The topics are presented in a unifying framework and include:

  • key-value pairs as a data model, as used in Google’s Bigtable;
  • the Object-Oriented Data Model, with its practical support in relational databases via Object-Relational Mapping (involves the ODMG standards ODL and OQL, and recent systems such as Ruby on Rails);
  • semi-structured databases (data organized as a graph with labels on nodes and edges), with query languages based on reachability constraints between nodes (conjunctive regular path queries);
  • XML databases, as a special case of semi-structured databases in which the graph is a tree (this involves associated standards such as XML Schema, XPath and XQuery);
  • RDF databases (with the associated OWL and SPARQL standards).
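
To give a flavor of the reachability-based queries mentioned for semi-structured data, here is a minimal Python sketch that evaluates a single regular path query (not a full conjunctive one) over a small edge-labeled graph. The graph, the `eval_path` function and the `(label, starred)` step encoding are all illustrative assumptions, not course material or any system's API.

```python
from collections import deque

# Hypothetical edge-labeled graph, given as (source, label, target) triples.
EDGES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "ucsd"),
    ("alice", "worksAt", "acme"),
]


def eval_path(edges, start, steps):
    """Return the nodes reachable from `start` along a path matching `steps`.

    `steps` is a list of (label, starred) pairs. starred=True means the
    label may be followed zero or more times (Kleene star).
    """
    # Index outgoing edges by source node.
    out = {}
    for src, label, dst in edges:
        out.setdefault(src, []).append((label, dst))

    # Breadth-first search over (node, position-in-expression) states.
    seen = {(start, 0)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, i = queue.popleft()
        if i == len(steps):
            answers.add(node)  # the whole expression has been matched
            continue
        label, starred = steps[i]
        successors = []
        if starred:
            # Stop repeating the starred label and move to the next step.
            successors.append((node, i + 1))
        for edge_label, dst in out.get(node, []):
            if edge_label == label:
                # A starred step stays at position i; a plain step advances.
                successors.append((dst, i if starred else i + 1))
        for state in successors:
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return answers
```

For example, `eval_path(EDGES, "alice", [("knows", True), ("worksAt", False)])` corresponds to the path expression `knows*.worksAt` and finds the workplaces of everyone Alice can reach through `knows` edges, including Alice herself.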

Announcements and Discussion Forum

Please sign up for the course’s Piazza page for regular announcements and a discussion board.

Prerequisites

Prior exposure to SQL (either through an introductory database class or through programming experience).

Textbook

There is no required or recommended textbook. The class Web site offers all necessary reading material (slides/papers/book chapters). If interested in delving deeper, contact Alin for relevant (text)book recommendations.

Topics Covered

Relational (Very quick recap, to contrast and compare)

  • Relational Data Model
  • Relational Databases (slides used in class: Relational Model)
  • The standard query language SQL (slides used in class: SQL Data Manipulation Language)
  • A query language based on pattern matching: QBE (Query by Example)

Beyond relational

  • Object-oriented and object-relational databases
    • Object-oriented Data Model
    • OO Databases
    • The ODMG standard:
      • Object Definition Language (ODL) and Object Query Language (OQL) (slides, ODMG’s user manual, textbook material from “A First Course in Database Systems” by Ullman and Widom)
    • Ruby on Rails (a typical representative of modern ORM database technology)
      • For the global overview and the architecture of the Web application, I highly recommend this deck of slides by James Reynolds from Univ. of Utah
      • For the ORM mapping, I recommend this slide deck from the UCB RAD Lab.
      • For the Ruby host language, a good place to start is this tutorial by Chris Pine.
  • Graph Databases (a.k.a. “Semi-structured” Data Model)
    • Query languages based on regular path expressions
  • Semantic web data and query languages (RDF & SPARQL, OWL-S, Description Logics)
    • JSON & JAQL
    • Neo4J’s Cypher (interactive reference manual)
  • XML Data Model
    • The data model (slides, chapter - from the new book “Web Data Management” by Abiteboul et al.)
    • The XPath query language (slides, chapter)
    • The XQuery language (slides, chapter)
    • The XUpdate language (slides)
  • Text Databases
    • Keyword search languages
    • Full-Text search languages and indexing (Lucene)

Time permitting, we will also discuss the following:

  • Programming Paradigms for Parallel Computing
    • Map-Reduce programming for arbitrary data collections (emphasis on high-level languages such as HiveQL and PigLatin that are compiled to map-reduce jobs)
    • Parallel Graph Processing based on Valiant’s Bulk Synchronous Parallel model of computation (a la Pregel, Giraph, GPS, GraphLab)
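
As a rough illustration of the map-reduce paradigm listed above, the following single-process Python sketch separates a word count into map, shuffle and reduce phases. All function names here are hypothetical and chosen for illustration; a real framework would distribute these phases across machines, and high-level languages such as HiveQL or PigLatin generate jobs of this shape rather than having the programmer write them by hand.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to the list of that key's values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for each whitespace-separated word in a line."""
    for word in line.split():
        yield word.lower(), 1


def count_reducer(_key, values):
    """Sum the partial counts collected for one word."""
    return sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)
```

Calling `word_count(["a b a", "B c"])` counts words case-insensitively across both lines.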

Grading

There will be 4 homeworks (2 written, 2 programming assignments, each worth 15% of the grade), a class participation grade (20%) and a final (20%).