Toward an optimal method for convex optimization using an inexact first-order oracle

Event Speaker:

François Glineur (Université Catholique de Louvain)

Event Location:

32D-677 (Stata Center, LIDS seminar room)

Event Date/Time:

Wednesday, November 28, 2012 - 4:00pm

Abstract: Standard analysis of first-order methods assumes the availability of exact first-order information: at each query point, the oracle must provide the exact values of the function and its gradient. However, in many convex problems, including those obtained by smoothing techniques, the objective function and its gradient are computed by solving an auxiliary optimization problem, and in practice these subproblems can often only be solved approximately. In that context, numerical methods for the outer problem must therefore work with inexact first-order information.

We present in the first part of this talk a specific class of inexact first-order oracles, namely the $(\delta,L)$-oracle, which can be viewed as a common extension of the classical notions of $\epsilon$-subgradient and Lipschitz-continuous gradient. We show that such an oracle is naturally available in several situations involving inexact computations, including many standard techniques where an auxiliary problem is solved approximately, such as convex-concave saddle point problems, augmented Lagrangians, and Moreau-Yosida regularization.
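For reference, the $(\delta,L)$-oracle can be stated as follows (this sketch follows the definition used in the speaker's related work on inexact oracles): at each point $y$, the oracle returns a pair $(f_{\delta}(y), g_{\delta}(y))$ such that

$$0 \;\le\; f(x) - f_{\delta}(y) - \langle g_{\delta}(y),\, x - y \rangle \;\le\; \frac{L}{2}\|x - y\|^2 + \delta \quad \text{for all } x.$$

With $\delta = 0$ this reduces to the standard two-sided inequality satisfied by a convex function with Lipschitz-continuous gradient, while keeping only the left-hand inequality with $\delta > 0$ recovers the $\epsilon$-subgradient condition, which is why the oracle extends both classical notions.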

We then study the behavior of classical first-order methods for smooth convex optimization when such an inexact oracle is used instead of the exact gradient. In particular, we show that the convergence of the classical gradient method is mostly unaffected: it is guaranteed to converge to a solution whose accuracy is comparable to that of the oracle. In contrast, the behavior of the fast gradient method deteriorates seriously: it suffers from error accumulation and is no longer guaranteed to converge. Moreover, the best accuracy reachable by a fast method is much worse than that of the oracle: to obtain a better accuracy, one has to fall back on the (much slower) classical gradient method.
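The contrast above can be illustrated numerically. The sketch below (not the speaker's code; the quadratic test problem, error magnitude delta, and iteration counts are arbitrary choices for illustration) runs the classical gradient method and Nesterov's fast gradient method on an ill-conditioned quadratic, with each gradient evaluation perturbed by a bounded error:

```python
import numpy as np

# Illustrative sketch: classical vs. fast gradient method with an inexact
# gradient oracle on f(x) = 0.5 * sum_i a_i * x_i^2.  All problem parameters
# here are arbitrary choices made for the sake of the example.

rng = np.random.default_rng(0)
n, L, mu = 50, 1.0, 1e-3            # dimension; Hessian eigenvalues in [mu, L]
a = np.linspace(mu, L, n)           # diagonal of the Hessian
delta = 1e-3                        # magnitude of the oracle error

def f(x):
    return 0.5 * np.dot(a * x, x)

def inexact_grad(x):
    noise = rng.standard_normal(n)
    noise *= delta / np.linalg.norm(noise)   # perturbation of norm delta
    return a * x + noise

def gradient_method(x0, iters):
    x = x0.copy()
    for _ in range(iters):
        x -= (1.0 / L) * inexact_grad(x)
    return f(x)

def fast_gradient_method(x0, iters):
    # standard accelerated scheme: 1/L steps plus Nesterov momentum
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        x_new = y - (1.0 / L) * inexact_grad(y)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return f(x)

x0 = np.ones(n)
print("gradient method :", gradient_method(x0, 500))
print("fast gradient   :", fast_gradient_method(x0, 500))
```

The momentum term in the fast method reuses past (perturbed) gradient information, which is the mechanism behind the error accumulation discussed in the talk.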

In the second part of this talk, we propose a way to remedy this unsatisfactory situation. We introduce a new method that runs the fast gradient method for the first $\theta$ steps, followed by a modified dual gradient method for the remaining steps. We show that, given an oracle accuracy $\delta$ and a target accuracy $\epsilon$ unattainable by the fast gradient method, this hybrid method requires a number of steps that is (much) smaller than that of the classical gradient method. We also show that the optimal switching point $\theta$ has a simple expression: it may be chosen as $\delta/\epsilon - 2$, independently of the problem data.
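The switching rule itself is easy to compute from the two accuracies alone. A minimal sketch (the function name and the rounding/clamping to a nonnegative integer are our own choices; only the formula $\theta = \delta/\epsilon - 2$ comes from the abstract):

```python
def switching_point(delta, eps):
    # Switching point from the abstract: theta = delta/eps - 2, rounded to a
    # nonnegative integer number of fast-gradient steps.
    return max(0, int(round(delta / eps - 2.0)))

# e.g. oracle accuracy delta = 1e-2, target accuracy eps = 1e-4
theta = switching_point(1e-2, 1e-4)
print("switch after", theta, "fast-gradient steps")
```

Note that $\theta$ depends only on the ratio $\delta/\epsilon$, which is what "independently of the problem data" means: no Lipschitz constant or distance to the solution enters the rule.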