Data Visualization - Winter 2017

Instructor: Amit Chourasia, San Diego Supercomputer Center, UCSD

Teaching assistant: Dhruv Sharma, Computer Science and Engineering, UCSD

Textbook: Visualization Analysis and Design, Tamara Munzner (A K Peters Visualization Series, CRC Press, 2014)


Day 1 Jan 7, Day 2 Jan 21, Day 3 Feb 4, Day 4 Feb 18, Day 5 Mar 4, Day 6 - Finals Mar 18


Class schedule

Day 1 Supplement (Jan 6)

Morning

Tutorial by TA

Day 1 (Jan 7)

Morning

Afternoon

  • Marks and Channels - Slides (PDF)
  • Rules of thumb - Slides (PDF)
  • 3pm - 4:30pm Guest tutorial - Vega-Lite, Dominik Moritz, Graduate student, Department of Computer Science & Engineering, University of Washington
  • Exercise 2 using Vega-Lite (Time permitting)

Homework


Day 2 (Jan 21)

Morning

  • Guest Lecture - Applying Color Theory to Visualization. Theresa-Marie Rhyne, Computer Graphics and Visualization Consultant. Slides (PDF)

    Abstract: We examine the foundation of color theory and how these methods apply to building effective visualizations. We define color harmony and demonstrate the application of color harmony to case studies. The material presented is from my book on “Applying Color Theory to Digital Media and Visualization”.

  • Colors supplement - Slides (PDF)
  • Cognition videos (Time permitting)
  • Tables - Slides (PDF)

Afternoon

  • Network and Trees - Slides (PDF)
  • Tutorial (TA): Tableau
  • Exercise 2 using Tableau (Time permitting)

Homework

  • Reading - Chapter 10 (Map color and other channels)
  • Exercise 3 & Exercise 4
  • [Final project proposal](https://mas-dse.github.io/DSE241/2017/final#finalproposal)

Day 3 (Feb 4)

Morning

Afternoon

  • 1:00 pm Guest lecture - Dimensionality Reduction From Several Angles. Dr. Tamara Munzner, Professor, Department of Computer Science, Univ. of British Columbia. Slides (PDF)

    Abstract: I will present several projects that attack the problem of dimensionality reduction (DR) in visualization. Much of this work was informed by a two-year qualitative study of high-dimensional data analysts in many domains, to encapsulate the use of DR “in the wild” as a small set of abstract tasks. We used different methodological angles of attack in order to answer different kinds of questions, according to our Nested Model of visualization design and evaluation. First, can we design better DR algorithms? Glimmer is a multilevel multidimensional scaling (MDS) algorithm that exploits the GPU. Glint is a new MDS framework that achieves high performance on costly distance functions. Second, can we build a DR system for real people? DimStiller is a toolkit for DR that provides local and global guidance to users who may not be experts in the mathematics of high-dimensional data analysis, in hopes of “DR for the rest of us”. Third, how should we show people DR results? An empirical lab study provides guidance on visual encoding for system developers, showing that points are more effective than spatialized landscapes for visual search tasks with DR data. A data study, where a small number of people make judgements about a large number of datasets rather than vice versa as with a typical user study, produced a taxonomy of visual cluster separation factors. Fourth, when do people need to use DR? Sometimes it is not the right solution, as we found when grappling with the design of the QuestVis system for an environmental sustainability simulation. We provide guidance for researchers and practitioners engaged in this kind of problem-driven visualization work with a nine-stage framework for Design Study Methodology.

  • Final project proposal presentations (By students) Presentation order

Homework

  • Exercise 5 & 6

Day 4 (Feb 18)

Morning

Afternoon

  • Guest lecture - Dr. Alark Joshi, Associate Professor, Department of Computer Science, University of San Francisco

    Abstract: Unboxing cluster heatmaps. Cluster heatmaps are commonly used in biology and related fields to reveal hierarchical clusters in data matrices. However, cluster heatmaps have known issues making them both time consuming to use and prone to error. We developed an approach to “unbox” the heatmap values and embed them directly in the hierarchical clustering results, allowing us to use standard hierarchical visualization techniques as alternatives to cluster heatmaps. We then tested our hypothesis by conducting a survey of 45 practitioners to determine how cluster heatmaps are used, and evaluating those alternatives with interviews of practitioners and an Amazon Mechanical Turk user study. We found that gapmaps were preferred by the interviewed practitioners and outperformed or performed as well as cluster heatmaps for clustering-related tasks. Based on these results, we recommend users adopt gapmaps as an alternative to cluster heatmaps.

    Do Defaults Matter? Evaluating the Effect of Defaults on User Preference for Multi-Class Scatterplots. In this paper, we evaluate the effect of using defaults when visualizing the same data in four widely-used visualization tools: Tableau Desktop, Microsoft Excel, the ggplot2 R library, and the matplotlib Python library. We used the default settings in these tools to create multi-class scatterplots for several synthetic datasets generated using the scikit-learn package in Python. We conducted a within-subjects pilot study with 39 users and a follow-up study with 202 users to explore whether users have strong preferences for different default settings.

  • Case study presentations (By students) - Presentation order

Homework


Day 5 (Mar 4)

Morning

Afternoon

  • Guest lecture - Visual Analytics: A Data Scientist’s Secret Tool. Dr. Abon Chaudhuri (Walmart Labs)

    Abstract: Visual analytics offers a wide range of techniques to explore data and results. In this talk, I will explain how some of these techniques can be leveraged to build superior machine learning based applications. A few use-cases related to large-scale classification - a problem commonly faced in e-commerce (for classifying commodities into several categories) and many other domains - will be discussed to show the role of visual analytics in exploring data, building a model, and evaluating it. I will also discuss the use of visual analytics in model diagnostics and for comparison among models trained with different features or parameters.

Homework


Day 6 - Final exam (Mar 18)

Student presentations: Final project presentations


Course Grading

  • A (Excellent 4.0) >= 90%
  • B (Good 3.0) >= 80%
  • C (Fair 2.0) >= 70%
  • D (Barely passing 1.0) >= 60%
  • F (Fail) < 60%

Grade calculation will be as follows

Class policy

  • Attendance is mandatory
  • Must complete all exercises
  • Must complete final project

Guest Lecturers

  1. Vega-Lite Tutorial - Dominik Moritz, Graduate student, Department of Computer Science & Engineering, University of Washington
  2. Applying Color Theory to Visualization - Theresa-Marie Rhyne, Computer Graphics and Visualization Consultant
  3. Dimensionality Reduction From Several Angles - Dr. Tamara Munzner, Professor, Department of Computer Science, Univ. of British Columbia
  4. Clustered Heatmaps - Dr. Alark Joshi, Associate Professor, Department of Computer Science, University of San Francisco
  5. Visual Analytics: A Data Scientist’s Secret Tool - Dr. Abon Chaudhuri, Applied Researcher, Walmart Labs