Types of HW

There will be two types of HW: Regular (“R”) and Scaling (“S”).

  • Regular HW: In regular HW, your goal is to write correct code. Regular HW assignments require only the resources available on a common laptop. See here for more information on small homework assignments.

  • Scaling HW: In scaling HW, you are given correct but inefficient code. Your goal is to write code that runs efficiently on a Spark cluster. You will be given data files ranging in size from 100MB to 100GB, and each size will have a target compute time. Your goal is to come within the target running time. You will still develop the code on your laptop, but you will test it on a Spark cluster. HW will be submitted using the mechanism described here. Students can use the same mechanism to test their HW prior to submission.
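As a rough illustration of checking a run against a target compute time, here is a minimal sketch in plain Python. The helper `meets_target`, the stand-in job `word_count`, and the 5-second target are all hypothetical; the actual targets and the cluster test mechanism are the ones provided with each assignment.

```python
import time


def meets_target(func, target_seconds, *args, **kwargs):
    """Run func once and report whether it finished within target_seconds.

    Returns a tuple (result, elapsed_seconds, within_target).
    """
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= target_seconds


def word_count(lines):
    """Hypothetical stand-in for a homework job: count words in a list of lines."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


# Assumed target of 5 seconds for a tiny in-memory input.
result, elapsed, ok = meets_target(word_count, 5.0, ["a b a", "b c"])
print(f"elapsed={elapsed:.4f}s within_target={ok}")
```

Timing on a laptop only indicates whether a change helps. The running time that counts is the one measured on the Spark cluster.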


  • Grading: Each HW will be graded on a scale of 0-100. The HW with the lowest grade will be dropped when computing the final class grade.
  • Plagiarism: The solution you write has to be 100% your own. We will test for plagiarism both between students and against web resources. If two (or more) students are judged to have submitted a copied piece, all of them will have points deducted. We will not attempt to judge who had the original solution and who copied. It is therefore your responsibility both not to copy and to make sure nobody copies from you.
  • Debugging your code: It is your responsibility to debug your code. You can ask TAs general questions such as: “Would you suggest using an RDD or a DataFrame for this problem?” You cannot ask questions of the type “Why does my code not work?”
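To make the grading rule above concrete, here is a minimal sketch of dropping the lowest HW grade. It assumes that the remaining grades are weighted equally and averaged, which this page does not state. The function name `hw_average_dropping_lowest` is hypothetical.

```python
def hw_average_dropping_lowest(grades):
    """Average of HW grades after dropping the single lowest one.

    Assumption: all remaining HW grades carry equal weight.
    """
    if len(grades) < 2:
        raise ValueError("need at least two grades to drop one")
    for g in grades:
        if not 0 <= g <= 100:
            raise ValueError(f"grade out of range 0-100: {g}")
    kept = sorted(grades)[1:]  # discard the lowest grade only
    return sum(kept) / len(kept)


# Four assignments: the 70 is dropped, and 80, 90, 100 are averaged.
print(hw_average_dropping_lowest([90, 70, 100, 80]))  # prints 90.0
```

If the lowest grade occurs more than once, only one copy is dropped.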


All homework assignments will be distributed through GitHub.


  • Schedule: There will be one homework every week. Homework will be made available on Friday morning and will be due by midnight between Thursday and Friday of the following week.

Detailed due dates.

| Assignment number | Assignment type | Release date | Due date for CSE255 | Due date for DSE230 |
|---|---|---|---|---|

  • Submission deadlines: Not submitting HW within 1 hour of a deadline will result in a zero grade for that HW.