Vocareum

Vocareum will provide the cloud infrastructure for the class. This means that you do not have to install anything on your laptop and can do all of your assignments on Vocareum in a configured and controlled environment. Each student registered for the class should have received an email with their account information by the time of the first class. If this is not the case, please contact your TA.

How to get help

  • Debugging your code: It is your responsibility to debug your code. You can ask TAs general questions such as “Would you suggest using an RDD or a DataFrame for this problem?” You cannot ask questions of the type “Here is my code, why does it not work?”
  • Stack Overflow: It contains vast amounts of searchable help. Use it!
  • Piazza: If you can’t find an answer on the web, check the posts on Piazza. If you can’t find a relevant post, start your own. Please search before you post: it wastes everybody’s time when someone has to work out that your question was already answered and then post a link to that answer.
  • Talk with other students: Check with other students whether they are experiencing the same problem.
  • Ask a TA to look at your code: If at least three students are experiencing the same problem, put a message on Piazza and include your Vocareum IDs in it. Using those IDs, the TA can view your work and give you specific advice. Again, it is not the job of the TA to debug your code. This option is intended for the rare occasion where something is not working as expected and there is no simple workaround.

Types of HW

There will be two types of HW: Regular “R” and Scaling “S”.

  • Regular HW: In regular HW your goal is to write correct code. Regular HW assignments require only the resources available on a common laptop. See here for more information on small homework assignments.

  • Scaling HW: In scaling HW you are given correct but inefficient code. Your goal is to write code that runs efficiently on a Spark cluster. You will be given data files whose sizes range from 100 MB to 100 GB, and for each size there will be a target compute time. Your goal is to come within the target running time. You will still develop the code on your laptop, but you will test it on a Spark cluster. HW will be submitted using the mechanism described here. Using the same mechanism, students can test their HW prior to submission.

Grading

  • Grading: Each HW will be graded on a scale of 0-100. The HW with the lowest grade will be dropped from the final class grade.
  • Plagiarism: The solution you write has to be 100% your own. We will test for plagiarism both between students and against web resources. If two (or more) students are judged to have submitted a copied piece, they will all have points deducted. We will not attempt to judge who had the original solution and who copied. It is therefore your responsibility both not to copy and to make sure nobody copies from you.
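
As an illustration of the drop-lowest rule above, here is a minimal sketch of how a homework score could be computed. It assumes that all homeworks carry equal weight and that the homework component is a plain average of the remaining grades; neither assumption is stated in the policy, and the function name final_hw_score is hypothetical.

```python
def final_hw_score(grades):
    """Average the homework grades after dropping the single lowest one.

    Assumption (not from the course policy): every homework has equal
    weight and the homework score is a simple mean of the kept grades.
    """
    if len(grades) < 2:
        raise ValueError("need at least two homework grades to drop one")
    for g in grades:
        if not 0 <= g <= 100:
            raise ValueError(f"grade {g} is outside the 0-100 scale")
    # Sort ascending and discard the first (lowest) grade.
    kept = sorted(grades)[1:]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # Hypothetical grades: the 40 is dropped, the other three are averaged.
    print(final_hw_score([85, 92, 40, 100]))
```

Under these assumptions a single bad homework does not affect the result, but a second one does, since only one grade is dropped.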

Schedule

  • CSE255 Schedule: There will be one homework every 1-2 weeks. Homeworks will be made available on Friday morning. They will be due by midnight between Thursday and Friday of the following week.
  • DSE230 Schedule: Homeworks will be grouped according to the 2-week schedule. They will be made available on the Friday before each Saturday meeting and will be due on the Monday following the next meeting. This gives students the opportunity to ask questions before starting the assignment as well as before the submission due date.

  • Submission deadlines: Not submitting HW within 1 hour of a deadline will result in a grade of zero for that HW.