Section 1: Basics
- What is data science?
- Computation locality and the memory hierarchy.
- Map-reduce, RDDs
- Word-count example: loading, processing, collecting.
Section 2: DataFrames and PCA
- DataFrames, Spark-SQL, Parquet.
- PCA, working with NaN entries
- The weather database and its analysis using PCA.
- Combining effects and percentage of variance explained
Section 3: Clustering and intrinsic dimension
- K-means++ and intrinsic dimension.
- Non-linear dimensionality reduction.
- Locally linear embeddings
- Spectral analysis - The graph Laplacian
Section 4: Classification
- Logistic regression
- Tree-based regression
- Ensemble methods for classification
- Random forests
- Gradient-boosted trees
- Boosting and resampling.
Section 5: Deep Neural Networks and TensorFlow
- DNNs: the good, the bad and the ugly.
- Convolutional Networks.