This page has two goals:
- Help MAS-DSE students brush up on material needed for their courses (items marked with BU).
- Provide students with references to useful background material.
Math
Linear Algebra
- BU Gilbert Strang / Linear Algebra: an excellent introduction to linear algebra. Check out the videos!
Khan Academy
Linear Algebra Reference
Probability and Statistics
- BU OpenIntro Statistics
- BU Think Stats/Probability and Statistics for Programmers by Allen B. Downey
Programming
Python
Python Books
- BU Interactive Python An interactive book: you do the exercises right inside the online book.
- BU Introduction to Computer Science and Programming Using Python An excellent book, and an excellent MOOC based on that book.
- BU Dive Into Python
Python Reference
Some useful Python libraries
Pandas
- PANDAS: Python Data Analysis Library
- Pandas and Python: Top 10 - Curiosity
- Book: Python for Data Analysis (Pandas)
Unix
- BU Introduction to Linux: slides by Caroline Sporleder & Ines Rehbein, and book by Machtelt Garrels
Databases
BU Online course on Databases
An excellent self-paced online course from Stanford’s database lab. Please make sure to cover at least the mini-courses entitled:
- Introduction and Relational Databases
- SQL, and
- Relational Algebra
Git
Jupyter / IPython
- IPython documentation
- IPython magics
- Jupyter extensions configurator: the best way to add and manage IPython notebook extensions.
- ipython/ipython/help - Gitter
Spark
Spark Overview
- Overview - Spark Documentation
- Online Guide: Spark Programming Guide
- Python examples are available in the Spark GitHub repository
- Book: Learning Spark
- MOOC: Big Data Analysis using Spark
- Spark using Scala Slides (Databricks)
Spark Reference
- Spark Python API Docs
- Spark Scala API
- Apache Spark tutorial (Tutorialspoint)
- Spark Cheat-Sheets (DZone)
Spark-SQL
- pyspark.sql DataFrame documentation
- Spark Python API Docs!
- Complete Guide to DataFrame Operations in PySpark
- Supported syntax of Spark SQL
- Using SQL and User-Defined Functions with Spark DataFrames