Past Seminars | UCSD DSE MAS

Peyman Hesami / Qualcomm Technologies, Inc.

November 2, 2018

Personalization via Reinforcement Learning

In this talk, we look at applications of reinforcement learning for personalizing personal electronic devices. We will look at a fictional electronic device (a smart speaker) where the goal is to route the audio to the best available speaker on the device. Specifically, a machine learning agent can learn the optimal speaker configuration from a pre-specified range of parameters. The ML agent can learn (and update) the optimal, personalized configurations down to the user level. Our analysis shows that the ML agent outperforms a system that uses a static configuration for the speakers. Finally, this approach can be applied to any real personal electronic device or feature whose user experience would benefit from personalization.
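The abstract does not say which reinforcement learning method the agent uses. As a rough illustration only, the sketch below swaps in an epsilon-greedy multi-armed bandit, one of the simplest such methods, and compares it with a fixed configuration. The three configurations, their satisfaction probabilities (`PROBS`), and the function names are all hypothetical and are not taken from the talk.

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent choosing among speaker configurations.

    reward_probs[i] is the (hypothetical) chance that the user is satisfied
    with configuration i. The agent only observes the 0/1 rewards.
    """
    rng = random.Random(seed)
    n = len(reward_probs)
    counts = [0] * n        # how often each configuration was tried
    values = [0.0] * n      # running mean reward per configuration
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                        # explore
        else:
            arm = max(range(n), key=lambda i: values[i])  # exploit best estimate
        reward = 1 if rng.random() < reward_probs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return total / steps, values, counts

def run_static(reward_probs, arm, steps=5000, seed=0):
    """Baseline that always uses one fixed configuration."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(steps) if rng.random() < reward_probs[arm])
    return hits / steps

# Hypothetical user: configuration 2 suits them best, but the factory
# default (configuration 0) does not.
PROBS = [0.2, 0.5, 0.8]
adaptive_rate, estimates, counts = run_bandit(PROBS)
static_rate = run_static(PROBS, arm=0)
print(f"adaptive: {adaptive_rate:.2f}  static: {static_rate:.2f}")
```

The comparison is only as meaningful as its assumptions: if the static default already happened to be the user's best configuration, the adaptive agent would pay a small exploration cost rather than gain anything.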

About Peyman Hesami

Peyman Hesami received his B.S. in electrical engineering from the University of Tehran, Tehran, Iran, in 2009. He received an M.S. degree, also in electrical engineering, from the University of Notre Dame, Notre Dame, IN, in 2011 and holds an M.S. degree in Data Science and Engineering from UCSD. Peyman is currently a senior data scientist at Qualcomm Inc., developing products that use real-time, device-based machine learning on cellular networks and mobile phones.

Keith Muller / Teradata Corporation

October 26, 2018

Technology and Realizing a Successful Application with Data Science

The goal of a successful application of data science is to solve a problem by finding value in data. In the commercial world, these values include the obvious: increased profits, improved product safety, lower prices from improved efficiency, and in general the typical metrics expected in any profit driven environment. More recently, commercial applications also look for values in data that have benefits to public health and welfare.

The premise of this talk is that a successful data scientist needs to have a better understanding of the relationship between the value of the problem, the data (you have or don’t have), and the technology available to realize the solution. Specifically, we propose this multi-way relationship has an increasing influence on the success of old application evolution and in the selection of new applications.

We will briefly introduce the trends, the pros, and the cons of some basic key technologies. We will illustrate the how and why behind a couple of innovative and influential commercial applications of data science. For each application we will look at how the data was obtained, how value was found in the data, and how technology influenced application selection and evolution.

About Keith Muller

Keith Muller has been a practicing engineer for over 40 years, with close to 30 of those years in the data analytics industry. His new role at Teradata is focused on long-term technology development and strategy. His prior assignment at Teradata was as CTO and Chief Platform Architect, a position he held for over 20 years, in which he had the opportunity to help design most of the largest commercial data analytic systems deployed. He taught at UCSD in the CSE department from 1984 to 1998. He started his career as a CPU hardware engineer, but soon realized software and data analytic systems were more fun to work on. His research interests include architecture and optimization (performance, availability, and continuity) of data analytic systems (especially scaling problems from large-scale data sets; >10 PB currently), storage technologies, real-time systems, and technology in general.

Zank Bennett / Bennett Data Science

October 12, 2018

Five Steps to Becoming a Killer Data Scientist & Data Science Deployment Without Dev Ops

About Zank Bennett

After earning Bioengineering and Electrical and Computer Engineering degrees from UCSD, Zank started his career doing extensive mathematical modeling and data visualization at Abbott Labs. He followed this by spending nearly a decade at SAIC, leading teams working on “DARPA hard” problems vital to United States security. Zank was awarded a medal of achievement for his part in a project that used mass spectrometry to detect, identify, and quantify airborne pathogens, as reported in Scientific American. The technology was used to first identify and classify H1N1 as a swine and avian flu. He continues to expand his career by helping companies find novel ways to grow their products and revenue using artificial intelligence. Most recently, Zank has taken a role as the Director of Data Science at TrunkClub, a Nordstrom Company.

Slides as PDF

Peter Calhoun / Dexcom, Inc.

May 25, 2018

About Peter Calhoun

Peter Calhoun is a biostatistician at Dexcom, Inc., evaluating and improving the effectiveness of continuous glucose monitors. He has worked as a biostatistician for seven years doing diabetes research and working closely with the FDA. Peter graduated from the University of Florida with a Master’s degree in statistics and earned a PhD in computational statistics at the San Diego State University and Claremont Graduate University joint doctoral program. In his free time, he uses statistics to predict NFL games and March Madness.

Zank Bennett / Bennett Data Science

May 11, 2018

About Zank Bennett

Slides: Data Science in Practice

Slides: Machine Learning for Marketing Optimization

Henry Abarbanel / UC San Diego

April 13, 2018

Colin Jemmott / Seismic Software

April 27, 2018

Peyman Hesami / Qualcomm Technologies, Inc.

February 16, 2018

Network Failure Prediction in Superbowl LII (Stadium Analytics)

About Peyman Hesami

Slides as PDF

Nathan Lewis / UC San Diego

January 20, 2018

About Nathan Lewis

Dr. Lewis is an Assistant Professor of Pediatrics and Bioengineering at the University of California, San Diego. During his BS (Biochemistry, Brigham Young University), PhD (Bioengineering, UC San Diego) and postdoctoral work (Genetics, Harvard Medical School), he developed novel approaches for analyzing biological big data using genome-scale systems biology modeling techniques. In parallel, he helped lead efforts to sequence the genome of the Chinese hamster and several CHO cell lines. Dr. Lewis’ lab integrates all of his previous work by focusing heavily on the development of novel diagnostics for childhood disorders and also the use of systems biology and genome editing techniques to map out and engineer the cell pathways controlling mammalian cell growth, metabolism, protein synthesis, and protein glycosylation, in an effort to develop improved drug production host cells. See more on research in the Lewis lab at http://lewislab.ucsd.edu.

Bradley Voytek / UC San Diego

January 19, 2018

About Bradley Voytek

Bradley Voytek is an assistant professor in the Department of Cognitive Science, the Neurosciences Graduate Program, and the Halicioglu Data Science Institute at UC San Diego and an Alfred P. Sloan Neuroscience Research Fellow. He’s a founding faculty member of the UC San Diego undergraduate Data Science major and in 2011 was the first Data Scientist at Uber. He received his PhD from UC Berkeley in neuroscience and was a post-doctoral fellow at UCSF. His research centers around the computational role that neural oscillations play in coordinating information transfer in the brain. His research program combines large-scale data mining and machine learning techniques with hypothesis-driven experimental research. He is also known for his zombie brain “research” with his friend and fellow neuroscientist Timothy Verstynen, with whom he has published the book Do Zombies Dream of Undead Sheep? with Princeton University Press. He blogs at Oscillatory Thoughts and is active on Twitter as @bradleyvoytek.

Cinnamon Bloss / UC San Diego

November 17, 2017

The divide between health-related big data capabilities and individual privacy controls and protections is widening. Highly granular personal health data (e.g., from wearable sensors or personal genome sequencing) hold promise for improving health. Ironically, however, while the rhetoric around this promise focuses on empowerment of people to take greater control over their own health, we have a big data ecosystem in which people may have little control over the flow of their personal health information, and thus their privacy. Moreover, despite the widely recognized importance of privacy, there is little consensus among scholars and stakeholders as to what privacy actually is or means. With the goal of understanding individual conceptualizations of privacy with respect to personal health data technologies, we conducted focus groups, interviews, and surveys with individuals sampled from a diverse set of patient and demographic groups. This presentation will highlight findings from this work.

About Cinnamon Bloss, Ph.D.

Cinnamon Bloss, Ph.D. is Associate Professor in the Departments of Psychiatry and Family Medicine and Public Health, Division of Health Policy at the University of California, San Diego. She is an adjunct Policy Analyst at the J. Craig Venter Institute and a California-licensed clinical psychologist. Dr. Bloss’s career has been focused on transdisciplinary research. She has managed a number of multidisciplinary research teams in the context of large-scale projects in areas such as direct-to-consumer genomics, genome sequencing in diagnostic odyssey cases, privacy and big data, and genome editing for control of infectious disease. She manages an active independent research laboratory that includes several postdoctoral fellows, graduate students, undergraduates, and research staff.

Mai Nguyen / San Diego Supercomputer Center

October 27, 2017

Practical Data Science Examples: Santa Ana Detection and Demographics Analysis from Satellite Images

Two research projects will be presented to demonstrate the application of data science techniques to real-world problems. In one use case, cluster analysis is applied to sensor measurements from weather stations to provide location-specific and time-specific detection of Santa Ana conditions. The Kepler workflow system and Spark distributed framework are used to add usability and scalability. In another use case, satellite images are processed to analyze the demographic distribution of a region. This approach uses deep learning as well as conventional machine learning techniques to analyze satellite imagery, and has applications to several remote sensing problems.
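The abstract does not name the clustering algorithm or the sensor variables used for Santa Ana detection. As a rough illustration of cluster analysis on weather-station readings, the sketch below runs a small k-means on made-up (relative humidity %, wind speed mph) pairs. The readings, the choice of k-means, and the starting centroids are all assumptions for the example.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: dist2(point, centroids[i]))

def kmeans(points, centroids, iterations=20):
    """Plain k-means: assign points to the nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]      # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Made-up readings: (relative humidity %, wind speed mph).
readings = [(60, 5), (65, 8), (55, 6), (70, 4),   # ordinary conditions
            (10, 30), (8, 35), (12, 28)]          # dry and windy
centroids, labels = kmeans(readings, centroids=[readings[0], readings[4]])
print(labels)
print(centroids)
```

In practice the features would need scaling before clustering, since humidity and wind speed are on different ranges, and a cluster would still have to be interpreted by someone who knows what Santa Ana conditions look like.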

About Mai Nguyen

Mai Nguyen is the Lead for Data Analytics in the San Diego Supercomputer Center (SDSC) at the University of California, San Diego (UCSD). Her research centers on applying machine learning techniques to interdisciplinary problems and combining machine learning methods with distributed computing to analyze big data. Prior to joining SDSC, she worked in industry on applications in machine learning, data mining, business intelligence, and data warehousing. She has also been teaching in these areas since 2009. Mai received her M.S. and Ph.D. degrees in Computer Science from UCSD, with focus on machine learning.

Slides as PDF

Matthias Blume / LoanHero

April 28, 2017

Data Science for Consumer Lending

Lending is one of the first areas in which machine learning proved to make better decisions than humans do. Today, trillions of dollars per year of credit granting decisions are based on computational models, and San Diego hosts a score of fintech companies. Matthias will present practical aspects of how data science is used by lenders from prospecting through fraud detection, underwriting, and servicing and will highlight how data science is changing the industry via marketplace lending, additional data sources, and new techniques such as deep learning.

About Matthias Blume

Matthias Blume is Chief Risk Officer at LoanHero. Previously, he was Director of Analytics at CoreLogic and FICO. He holds nine fintech patents and has published numerous papers. He received his PhD from UCSD in 1999, where he pursued a truly interdisciplinary study of neural networks.

Slides as PDF

Ilkay Altintas / San Diego Supercomputer Center

Analyzing Big Data Using Workflows: From Fighting Wildfires to Helping Patients

We will look at the scope of Data Science as a field, Big Data and Big Compute as disciplines, their respective trends, and the new era of Data Science. This new era brings new and unique challenges involving factors such as volume, velocity, and variety, as well as new tools to meet these challenges. From people, purpose, process, platforms, and programmability, we will look at how to navigate Big Data as a discipline and a process using workflows, and at various applications of workflows to big data analysis, such as the WIFIRE project.

Wildfires are critical for ecosystems in many geographical regions. However, our current urbanized existence in these environments is inducing the ecological balance to evolve into a different dynamic leading to the biggest fires in history. Wildfire wind speeds and directions change in an instant, and first responders can only be effective if they take action as quickly as the conditions change. What is lacking in disaster management today is a system integration of real-time sensor networks, satellite imagery, near-real time data management tools, wildfire simulation tools, and connectivity to emergency command centers before, during and after a wildfire. As a first time example of such an integrated system, the WIFIRE project is building an end-to-end cyberinfrastructure for real-time and data-driven simulation, prediction and visualization of wildfire behavior. This paper summarizes the approach and early results of the WIFIRE project to integrate networked observations, e.g., heterogeneous satellite data and real-time remote sensor data with computational techniques in signal processing, visualization, modeling and data assimilation to provide a scalable, technological, and educational solution to monitor weather patterns to predict a wildfire’s Rate of Spread.

About Ilkay Altintas de Callafon

In addition to being a co-director of the Data Science & Engineering (DSE) program, Ilkay Altintas is the Chief Data Science Officer at the San Diego Supercomputer Center (SDSC), UC San Diego, where she is also the Founder and Director for the Workflows for Data Science Center of Excellence. Since joining SDSC in 2001, she has worked on different aspects of scientific workflows as a principal investigator and in other leadership roles across a wide range of cross-disciplinary NSF, DOE, NIH and Moore Foundation projects. She is a co-initiator of and an active contributor to the popular open-source Kepler Scientific Workflow System, and the co-author of publications related to computational data science and e-Sciences at the intersection of scientific workflows, provenance, distributed computing, bioinformatics, observatory systems, conceptual data querying, and software modeling. Ilkay is the recipient of the first SDSC Pi Person of the Year in 2014, and the IEEE TCSC Award for Excellence in Scalable Computing for Early Career Researchers in 2015. Ilkay Altintas received her Ph.D. degree from the University of Amsterdam in the Netherlands with an emphasis on provenance of workflow-driven collaborative science.

Slides as PDF

John Hildebrand / Scripps Oceanography

Recent advances in digital data storage capacity and low power electronics have made it possible to collect long-term continuous broadband passive acoustic data and thereby capture the full range of marine mammal sound production in an ocean setting.

Advances are also required in data curation, search, analysis, and visualization. We have developed methods to analyze and manage passive acoustic monitoring data that we are now acquiring at a rate of approximately 25 Tb/month.

Initial stages of data processing include converting the data from the internal instrument format to a standard audio file format with metadata extensions and preparation of both working and archival copies of the data. Standardized spectra are calculated for all data using a minimum of three frequency bands: (1) high frequency up to 160 kHz, (2) mid frequency up to 5 kHz, and (3) low frequency up to 1 kHz.

A set of 16 CPUs is operated in parallel for timely initial data processing. A number of automatic detectors are run on the data, including spectrogram correlation for blue whale call detection, energy detection for fin whale calls and anthropogenic sounds, a power-law detector for humpback whale units, Teager-energy-based echolocation click detection, and an expert system for beaked whale echolocation clicks.
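Of the detectors listed above, energy detection is the simplest to show in a few lines. The sketch below is a generic frame-energy threshold detector, not the detector used in this work. The sample rate, frame length, threshold, and synthetic signal are assumptions chosen for the example.

```python
import math

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def detect(samples, frame_len, threshold):
    """Indices of frames whose energy exceeds the threshold."""
    energies = frame_energies(samples, frame_len)
    return [i for i, e in enumerate(energies) if e > threshold]

# Synthetic 3-second recording at 1000 samples/s: a faint 20 Hz background
# with a much louder 20 Hz burst between t = 1 s and t = 2 s.
RATE = 1000
signal = [0.01 * math.sin(2 * math.pi * 20 * t / RATE) for t in range(3 * RATE)]
for t in range(1 * RATE, 2 * RATE):
    signal[t] += math.sin(2 * math.pi * 20 * t / RATE)

hits = detect(signal, frame_len=500, threshold=0.1)
print(hits)  # frame indices flagged as containing the burst
```

A real detector would typically band-pass the signal to the call's frequency range first and set the threshold relative to the measured noise floor rather than as a fixed constant.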

Software (Triton) has been developed for efficient manual scanning and signal discovery. The key feature of Triton is the capability to display spectrograms on virtually any time scale and provide an index between long-term spectral averages (minutes to days) and short-term spectrograms (seconds to milliseconds). Analysis effort is also standardized using a detection logger feature, allowing multiple analysts to contribute to the same dataset with uniform coverage. Detections are aggregated into a database (Tethys) that allows combination of multiple datasets and association with environmental or other data.

Javier R. Movellan / UC San Diego and Emotient

The human brain could be described as a computer designed to operate with hands and faces. The hands and the face take about 80% of the sensory motor areas of the human brain. The hands specialize in interaction with the physical world and the face in interaction with the social world. Computers can do complex things with the information we provide them with our hands via keyboard and mouse, but until recently they have been blind to the wealth of information we provide with our faces. For the last 20 years the Machine Perception Laboratory at UCSD has been pursuing the development of technology for automatic recognition of facial expressions. In this talk I will present the progress we made, from the early proof of concept prototypes, to the development of the first commercial smile detector embedded in digital cameras, to the implementation of large scale real time expression recognition systems.

Massimo Mascaro and Joe Cessna / Intuit

Intuit is transforming itself from a product-oriented company into a data company, leveraging the great amounts of data about its customers to produce improved and personalized experiences and to advance in new business areas. In the first part of the talk we will cover a broad range of topics related to the Tax Business where data science is being applied in an impactful way. In the second part we will drill down into the methods we’re using to rank tax topics and questions and into how we are using novelty detection to monitor both the system’s performance and the health of our business.

About Massimo Mascaro, PhD

Massimo Mascaro is a Sr. Data Scientist in the Intuit Consumer Tax Group, where he leads the Data Science & Data Architecture team, overseeing data science projects spanning both online and offline analytics.

Prior to Intuit, Massimo worked for The Intellisis Corporation, leading the R&D team, where he developed and patented algorithms for robust speech segmentation. Prior to that, Massimo was at Microsoft, in the Bing Core Ranking team, where he led the data science team responsible for personalized web ranking. Before Bing, Massimo was a Technical Program Manager and Architect in the Technical Computing division of Microsoft, where he worked on .NET Framework Parallel and Distributed programming extensions. Ahead of his Microsoft tenure, Massimo founded and led a small startup in Italy that specialized in OCR for large financial customers.

In his early career Massimo was a postdoc and lecturer at the University of Chicago, doing research on Recurrent Neural Networks, Computer Vision, and biological models of the brain’s visual cortex. Massimo has a PhD in Neuroscience and a Master’s in Theoretical Physics, both from the University of Rome, Italy.

About Joe Cessna, PhD

Joe Cessna is a Data Scientist, working for Intuit’s Consumer Tax Group (TurboTax) here in San Diego. His current work is focused around the processing and understanding of the vast amounts of analytics data continually produced by our core products. This includes automatic segmentation and unsupervised anomaly detection across numerous, disparate metrics and business KPIs.

Prior to Intuit, Joe worked as the Program Director and Technical Lead for the Intelligence, Surveillance, and Reconnaissance (ISR) Business Unit at Numerica Corporation in Colorado. During his time at Numerica, Joe led programs with the Air Force, Navy, National Security Agency (NSA), and National Reconnaissance Office (NRO) working on electronic intelligence (ELINT) interception, multi-sensor data fusion, classification fusion, non-cooperative target recognition, and target anomaly detection.

Joe received his M.S. (in Engineering Physics) and Ph.D. (in Computational Science, Applied Math, and Engineering) from UCSD in 2008 and 2010 respectively. His thesis developed novel algorithms for data assimilation and estimation of high-dimensional chaotic systems as well as efficient computational techniques for implementing the algorithms on switchless, distributed spherical grids. Prior to moving to San Diego, Joe earned a B.S. in Engineering Mechanics/Astronautics and a B.S. in Mathematics from the University of Wisconsin, Madison.

Charles Elkan / Amazon and UC San Diego

What Really Matters in Data Science

The learning algorithms in widespread use in companies nowadays include linear methods for classification and regression, nonlinear methods for the same tasks, clustering techniques, topic models, and recommendation methods. I’ll outline what each of these methods is, and discuss how successful, or not, it tends to be in practice. Then I will explain unsolved issues that arise repeatedly across applications.

About Charles Elkan, PhD

Charles Elkan is the first Amazon Fellow, on leave from being a professor of computer science at the University of California, San Diego. In the past, he has been a visiting associate professor at Harvard and a researcher at MIT. His published research has been mainly in machine learning, data science, and computational biology. The MEME algorithm that he developed with Ph.D. students has been used in over 3000 published research projects in biology and computer science. He is fortunate to have had inspiring undergraduate and graduate students who are in leadership positions now such as vice president at Google.

Slides as PDF

Irene Clepper / Mitchell International

Mitchell International’s Journey to Business Intelligence & Analytics

About Irene Clepper

Irene Clepper is a data enthusiast with over 20 years of experience in delivering business intelligence solutions for the medical informatics, electronic medical records, and property and casualty industries. Since 2001 Irene has served in various engineering leadership roles at Mitchell International. Currently Senior Director of Enterprise Business Intelligence and Analytics, she leads a corporate initiative to build an enterprise analytics platform that will leverage the depth and breadth of Mitchell’s data assets. Passionate about creating an inspired workplace, Irene co-founded Mitchell’s first diversity group: Women (m)Power Network. Its mission is to propel talented women and men to leadership positions, developing a strong pipeline of talent.

Before coming to Mitchell, Irene worked at Oracle Corporation and Science Applications International Corporation (SAIC) in a variety of software engineering positions. Irene holds a Bachelor of Arts degree in Economics and earned a Master of Science degree in Computer Science from the University of California, Davis. She is a Certified Oracle Professional. Outside of work, Irene has served on the board of the San Diego Chinese Culture Association, a non-profit organization promoting Chinese culture and language learning, since

She is a member of the Society of Women Engineers (SWE) and Athena San Diego.

Slides as PDF