Introduction
- Slides: What is Data Science?
- Slides: Memory Latency and Distributed Data Analysis
- Slides: MapReduce (slideshow in browser) (pdf)
Spark Basics
- Slides: Plain RDDs and (Key, Value) Pair RDDs (slideshow in browser) (pdf)
- Slides: Spark Intro (slideshow in browser) (pdf)
- Notebook: Spark Basics 1 (slideshow in browser) (pdf)
- Notebook: Spark Basics 2 (slideshow in browser) (pdf)
Spark Architecture
- Slides: Word Count Using Spark (slideshow in browser) (pdf)
- Slides: Distributed Sort (pdf)
- Slides: Spark Architecture (slideshow in browser) (pdf)
- Slides: Partitioners and Glom (slideshow in browser) (pdf)
- Notebook: Execution Plans, Lazy Evaluation, Caching, and Glomming (slideshow in browser) (pdf)
Advanced Spark
- Notebook: More RDD Operations (slideshow in browser) (pdf)
- Notebook: Spark SQL (slideshow in browser) (pdf)