This class will use Python 2.7 for all homeworks. Make sure you use Python 2 and not Python 3. The easiest way to install everything is to use Anaconda on a Linux or Mac computer.
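Because the homeworks assume Python 2.7, it can help to confirm which interpreter your environment actually launches. The following is a minimal sketch (the function name is_required_python is our own, not part of the course materials) that compares the running interpreter's major and minor version against 2.7. It runs under both Python 2 and Python 3, so it can report a mismatch either way.

```python
import sys

REQUIRED = (2, 7)  # the course's required major.minor Python version


def is_required_python(version_info=None, required=REQUIRED):
    """Return True if version_info (defaults to the running interpreter)
    matches the required major.minor version."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only major and minor; the micro release does not matter here.
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    if is_required_python():
        print("OK: running Python %d.%d" % sys.version_info[:2])
    else:
        print("Warning: expected Python 2.7, found %d.%d" % sys.version_info[:2])
```

Save it as a script (any file name works) and run it with the same python command you plan to use for the homeworks.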
Install and configure git and clone the course repository.
If you are already familiar with git, just clone the repository:
git clone https://github.com/ucsd-edx/CSE255-DSE230.git
If you are new to GitHub, follow these directions.
If you install Anaconda, Jupyter and almost all of the necessary packages are installed for you.
Otherwise, follow the DSE200 software installation directions: skip the Startup directions for GitHub, and choose the installation directions that are right for your computer.
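Whichever route you take, you can check afterwards that the packages you need are importable. This is a sketch using only the standard library's importlib; the helper name missing_packages is hypothetical, and the package list in it is illustrative, not the course's official list.

```python
from __future__ import print_function

import importlib


def missing_packages(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Not installed (or not on sys.path) for this interpreter.
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Illustrative list only; substitute the packages your homework needs.
    wanted = ["numpy", "pandas", "findspark"]
    absent = missing_packages(wanted)
    if absent:
        print("Missing packages:", ", ".join(absent))
    else:
        print("All listed packages are importable.")
```

Note that it checks import names, which can differ from install names (for example, the scikit-learn package is imported as sklearn).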
Install notebook extensions
This step is not required, but extensions can make your work on notebooks significantly easier.
To install a collection of useful extensions, together with a configurator for managing these extensions, follow the directions on:
Install Python packages
Make sure to install the Python package findspark. Typing the following command in the terminal installs the package:
Anaconda: conda install -c conda-forge findspark=1.0.0
pip: sudo pip install findspark
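findspark is typically used at the top of a notebook, before importing pyspark, to put Spark's Python libraries on sys.path. A minimal sketch, assuming Spark itself is already installed and that either the SPARK_HOME environment variable is set or Spark sits in a location findspark searches. The wrapper init_findspark is our own helper; findspark.init is the package's entry point.

```python
from __future__ import print_function


def init_findspark(spark_home=None):
    """Try to make pyspark importable with findspark.

    Returns True on success, and False if findspark is not installed or
    Spark could not be located.
    """
    try:
        import findspark
    except ImportError:
        print("findspark is not installed.")
        return False
    try:
        # With spark_home=None, findspark looks for Spark via SPARK_HOME and
        # a few common install locations, then adds pyspark to sys.path.
        findspark.init(spark_home)
    except Exception as err:
        # The exception type for a missing or misconfigured Spark install
        # differs between findspark versions, so this sketch catches broadly.
        print("findspark could not locate Spark:", err)
        return False
    return True


if __name__ == "__main__":
    if init_findspark():
        import pyspark
        print("pyspark imported from", pyspark.__file__)
```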
If you are using pip instead of Anaconda, you must also install the following packages:
Test drive Jupyter notebooks
After you have cloned this class's public GitHub repository (the first
step in this section), cd into the directory called Classes and then
start Jupyter by running the command jupyter notebook in the terminal.
This should automatically launch Jupyter in one of your web browsers.
Try exploring the directory. The sub-directory 00.Background contains
some useful Python notebooks that introduce the pandas package. You
could also try to get started on the first small homework, in the
sub-directory 0.MemoryLatency.
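To see which notebooks are available before opening them in the browser, a small standard-library script can list them. This is a hypothetical helper (list_notebooks is not part of the repository); it assumes you run it from the top level of your clone, where it expects the Classes directory to sit, and it skips the .ipynb_checkpoints folders that Jupyter creates for autosaves.

```python
import os


def list_notebooks(root):
    """Return sorted paths (relative to root) of all .ipynb files under root,
    skipping Jupyter's .ipynb_checkpoints autosave directories."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune checkpoint directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d != ".ipynb_checkpoints"]
        for name in filenames:
            if name.endswith(".ipynb"):
                full = os.path.join(dirpath, name)
                found.append(os.path.relpath(full, root))
    return sorted(found)


if __name__ == "__main__":
    # Assumes the current directory is the top level of your clone.
    for path in list_notebooks("Classes"):
        print(path)
```

If the directory does not exist, os.walk yields nothing and the script prints nothing.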