SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator

Organizer: Chris Junchi Li
Time: Fri 13:30-15:05, 2019-9-6
Venue: Conference Room 1, Jinchunyuan West Building, Tsinghua University

Abstract

Title: SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator
In this talk, I will introduce a new technique named Stochastic Path-Integrated Differential EstimatoR (SPIDER), which can be used to track many deterministic quantities of interest with significantly reduced computational cost. We apply SPIDER to two tasks, namely stochastic first-order and zeroth-order methods. For the stochastic first-order method, combining SPIDER with normalized gradient descent, we propose two new algorithms, SPIDER-SFO and SPIDER-SFO+, that solve non-convex stochastic optimization problems using stochastic gradients only. We provide sharp error-bound results on their convergence rates. In particular, we prove that the SPIDER-SFO and SPIDER-SFO+ algorithms achieve a record-breaking gradient computation cost of $O( \min( n^{1/2} \epsilon^{-2}, \epsilon^{-3} ) )$ for finding an $\epsilon$-approximate first-order stationary point and $O( \min( n^{1/2} \epsilon^{-2}+\epsilon^{-2.5}, \epsilon^{-3} ) )$ for finding an $(\epsilon, O(\epsilon^{0.5}))$-approximate second-order stationary point, respectively. In addition, we prove that SPIDER-SFO nearly matches the algorithmic lower bound for finding approximate first-order stationary points under the gradient Lipschitz assumption in the finite-sum setting. For the stochastic zeroth-order method, we prove a cost of $O( d \min( n^{1/2} \epsilon^{-2}, \epsilon^{-3}) )$, which outperforms all existing results. Joint work with Cong Fang, Zhouchen Lin, and Tong Zhang.
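To make the first-order method concrete, below is a minimal Python/NumPy sketch of the SPIDER estimator combined with a normalized gradient step in the finite-sum setting. The helper `grad_batch`, the default choices q = s2 = sqrt(n), the step size epsilon/(2L), and the stopping test are illustrative assumptions for this sketch, not the exact constants or interface from the paper.

```python
import numpy as np

def spider_sfo(grad_batch, x0, n, epsilon, L, q=None, s2=None, max_iters=10_000, seed=0):
    """Sketch of the SPIDER-SFO idea for min_x (1/n) sum_i f_i(x).

    `grad_batch(x, idx)` is assumed to return the averaged gradient
    (1/|idx|) sum_{i in idx} grad f_i(x). Parameter names, defaults, and
    the step-size constant are illustrative, not the paper's exact choices.
    """
    rng = np.random.default_rng(seed)
    q = q or int(np.sqrt(n))      # how often to refresh the estimator
    s2 = s2 or int(np.sqrt(n))    # small mini-batch size between refreshes
    eta = epsilon / (2.0 * L)     # per-step movement scale
    x_prev = x = x0.astype(float)
    v = None
    for k in range(max_iters):
        if k % q == 0:
            # Refresh the path-integrated estimator with a large batch
            # (here the full gradient, as in the finite-sum setting).
            v = grad_batch(x, np.arange(n))
        else:
            # SPIDER update: add the gradient difference along the path,
            # estimated on a small mini-batch, to the previous estimator.
            idx = rng.integers(0, n, size=s2)
            v = grad_batch(x, idx) - grad_batch(x_prev, idx) + v
        if np.linalg.norm(v) <= 2.0 * epsilon:
            return x              # approximate first-order stationary point
        # Normalized gradient step: each iteration moves a distance of eta.
        x_prev, x = x, x - eta * v / np.linalg.norm(v)
    return x
```

The normalized step is the key design choice in this sketch: every iteration moves a controlled distance on the order of epsilon/L, which keeps the error accumulated by the path-integrated gradient differences small and is what the sharp convergence analysis exploits.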
