SCAIM Seminar: Hans De Sterck
- Date: 07/12/2016
- Time: 12:30
University of British Columbia
Accelerated Parallel Optimization Algorithms for Distributed Data Analytics in Apache Spark
Scalable parallel optimization methods are gaining importance for a wide range of machine learning applications, for example, as implemented in the machine learning library of the Apache Spark distributed data processing environment. I will discuss our work on accelerating parallel algorithms for two common applications in this area: matrix factorization for recommendation systems, and line search methods for problems such as logistic regression.
For the recommendation application, we accelerate the standard Alternating Least Squares (ALS) optimization algorithm using a nonlinear conjugate gradient (NCG) wrapper around the ALS iterations. In parallel numerical experiments on a 16-node cluster with 256 computing cores, we demonstrate that the combined ALS-NCG method requires far fewer iterations and less time (with acceleration factors of 4 or more) than standalone ALS to reach high-accuracy movie rankings on the MovieLens 20M dataset and on synthetic datasets with up to nearly 1 billion ratings (http://arxiv.org/abs/1508.03110).
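The NCG-wrapped ALS iteration described above can be illustrated with a minimal single-machine sketch. This is not the distributed Spark implementation from the paper: the dense, fully observed rating matrix, the Polak-Ribiere-style beta formula, and the simple backtracking line search are all simplifying assumptions made here for clarity. The ALS update plays the role of a nonlinear preconditioner, with its step standing in for the negative gradient in the conjugate gradient recurrence.

```python
import numpy as np

def loss(R, U, V, lam):
    """Regularized Frobenius-norm factorization loss."""
    return 0.5 * np.sum((R - U @ V.T) ** 2) + 0.5 * lam * (np.sum(U * U) + np.sum(V * V))

def als_step(R, U, V, lam):
    """One Alternating Least Squares sweep: exact solve for U, then for V."""
    k = U.shape[1]
    U = R @ V @ np.linalg.inv(V.T @ V + lam * np.eye(k))
    V = R.T @ U @ np.linalg.inv(U.T @ U + lam * np.eye(k))
    return U, V

def als_ncg(R, U, V, lam=0.1, iters=30):
    """ALS iteration accelerated by a nonlinear conjugate gradient wrapper."""
    m, k = U.shape

    def unpack(x):
        return x[:m * k].reshape(m, k), x[m * k:].reshape(-1, k)

    x = np.concatenate([U.ravel(), V.ravel()])
    d = g_prev = None
    for _ in range(iters):
        U, V = unpack(x)
        Ub, Vb = als_step(R, U, V, lam)
        # the ALS update acts as a nonlinear preconditioner: its step is
        # used in place of the negative gradient in the NCG recurrence
        g = np.concatenate([Ub.ravel(), Vb.ravel()]) - x
        if d is None:
            d = g
        else:
            denom = g_prev @ g_prev
            beta = max(0.0, g @ (g - g_prev) / denom) if denom > 1e-12 else 0.0
            d = g + beta * d
        # backtracking line search; fall back to the plain ALS step
        # (which never increases the loss) if no decrease is found
        f0, alpha = loss(R, U, V, lam), 1.0
        while alpha > 1e-4 and loss(R, *unpack(x + alpha * d), lam) >= f0:
            alpha *= 0.5
        x = x + alpha * d if alpha > 1e-4 else x + g
        g_prev = g
    return unpack(x)

rng = np.random.default_rng(0)
R = rng.normal(size=(20, 3)) @ rng.normal(size=(15, 3)).T  # rank-3 "ratings"
U0, V0 = rng.normal(size=(20, 3)), rng.normal(size=(15, 3))
U1, V1 = als_ncg(R, U0, V0, lam=0.1)
```

Because each accepted step either passes the backtracking decrease test or falls back to a plain ALS sweep, the loss is non-increasing across iterations, while the conjugate direction reuses information from previous ALS steps to take longer, better-aimed steps.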
The second part of the talk discusses a new type of parallel line search for large-scale unconstrained minimization of smooth loss functions such as logistic regression. The technique computes more accurate minima by evaluating a Taylor polynomial approximation to the loss function, which also reduces parallel communication costs, resulting in overall efficiency gains of a factor of 2 or more in parallel compared to existing approaches (http://arxiv.org/abs/1510.08345).
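The communication-saving idea behind such a line search can be sketched on a single machine. The version below is an illustrative assumption, not the paper's exact method: it uses a second-order Taylor model of the logistic loss along the search direction, built from per-sample margins gathered in one pass over the data, after which the one-dimensional minimization needs no further data access (and hence, in a distributed setting, no further communication rounds).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(w, X, y):
    """Logistic loss for labels y in {-1, +1}."""
    return np.sum(np.log1p(np.exp(-y * (X @ w))))

def polynomial_line_search(w, d, X, y, newton_iters=5):
    """Minimize phi(alpha) = logistic_loss(w + alpha*d) using quantities
    gathered in a single pass over the data; the subsequent scalar Newton
    iterations touch only the cached margins."""
    t = y * (X @ w)   # current margins, one pass over the data
    u = y * (X @ d)   # directional margin changes, same pass
    alpha = 0.0
    for _ in range(newton_iters):
        p = sigmoid(-(t + alpha * u))        # per-sample probabilities
        g = -np.sum(u * p)                   # phi'(alpha)
        h = np.sum(u * u * p * (1.0 - p))    # phi''(alpha) >= 0 (phi is convex)
        if h < 1e-12:
            break
        alpha -= g / h                       # scalar Newton update
    return alpha

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = np.sign(X @ np.array([1.0, -1.0, 0.5]) + 0.3 * rng.normal(size=200))
w = np.zeros(3)
grad = -X.T @ (y * sigmoid(-y * (X @ w)))    # gradient of the logistic loss
d = -grad                                    # steepest-descent direction
alpha = polynomial_line_search(w, d, X, y)
```

A conventional backtracking search would instead evaluate the full loss at several trial step lengths, each requiring another pass over the distributed data; here the extra evaluations are replaced by cheap scalar operations on the cached `t` and `u` vectors.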
This is joint work with Mike Hynes.
Location: ESB 4133
Lunch will be provided