Chaotic gradient descent methods give the conjugate gradient method a run for its money
- Date: 03/16/2010
University of British Columbia
The conjugate gradient (CG) algorithm is usually the method of choice for solving large symmetric positive definite linear systems Ax = b. If, however, the matrix-vector products Av required at each iteration cannot be computed accurately, the delicate mechanisms on which CG is built are easily disturbed, with disastrous results. In such cases we may turn to gradient descent methods, which are more robust against such perturbations. The classical steepest descent (SD) method, which takes the best possible (greedy) step at each iteration in terms of reducing the error, is well known to wiggle agonizingly slowly toward the solution. Fortunately, its behaviour improves dramatically (by orders of magnitude) with some tinkering with the step size. This has given rise to a zoo of fast gradient descent methods known as BB, LSD(s), HLSD(k), SDOM, SD(ω), and so on. In practice these methods are much closer to CG in performance than to SD (though nobody has been able to prove this), and they can even outperform CG under certain conditions. I will present numerical experiments to establish which of these methods perform best on average, and then show that the fast gradient descent methods generate chaotic dynamical systems. Very little is required to generate chaos here: simply damping the steepest descent step by a constant factor close to 1 will do. I will give some insight into how chaos speeds these methods up, and show beautiful animations of the chaotic dynamics.
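To make the step-size tinkering concrete, here is a minimal NumPy sketch (not from the talk) of two of the ideas the abstract mentions: steepest descent with its greedy Cauchy step damped by a constant factor omega, and gradient descent with the Barzilai-Borwein (BB1) step length, one member of the "zoo" of fast methods. Function names, tolerances, and the test problem are illustrative assumptions.

```python
import numpy as np

def damped_sd(A, b, x0, omega=1.0, tol=1e-10, max_iter=20000):
    """Steepest descent for the SPD system Ax = b, i.e. minimizing
    1/2 x'Ax - b'x, with the greedy step damped by a constant omega.
    omega = 1 recovers classical SD; omega slightly below 1 is the
    minimal chaos-inducing tweak mentioned in the abstract."""
    x = x0.copy()
    g = A @ x - b                            # gradient = residual
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol * np.linalg.norm(b):
            return x, k
        Ag = A @ g
        alpha = omega * (g @ g) / (g @ Ag)   # damped Cauchy step
        x = x - alpha * g
        g = g - alpha * Ag                   # gradient update, reuses A @ g
    return x, max_iter

def bb_sd(A, b, x0, tol=1e-10, max_iter=20000):
    """Gradient descent with the Barzilai-Borwein (BB1) step length."""
    x = x0.copy()
    g = A @ x - b
    alpha = (g @ g) / (g @ (A @ g))          # start with a Cauchy step
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol * np.linalg.norm(b):
            return x, k
        x_new = x - alpha * g
        g_new = A @ x_new - b
        s, y = x_new - x, g_new - g
        alpha = (s @ s) / (s @ y)            # BB1 step; on a quadratic this
                                             # equals the previous Cauchy step
        x, g = x_new, g_new
    return x, max_iter

# Illustrative comparison on a random SPD matrix with condition number ~100.
rng = np.random.default_rng(0)
n = 200
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.linspace(1.0, 100.0, n)) @ Q.T
b = rng.standard_normal(n)
_, it_sd = damped_sd(A, b, np.zeros(n), omega=1.0)
_, it_bb = bb_sd(A, b, np.zeros(n))
print(it_sd, it_bb)   # BB typically needs far fewer iterations than SD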
12:30 - 2:00pm, WMAX 216.
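```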