Now we have only first-order information from the oracle.
Due to their computational efficiency, stochastic first-order algorithms remain the cornerstone of modern large-scale deep learning.
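To make the first-order oracle concrete, here is a minimal sketch of a gradient descent loop that queries only function values and gradients. The quadratic objective $f(x) = \tfrac{1}{2}\|x\|^2$, the step size, and the iteration count are illustrative assumptions, not part of the notes.

```python
import numpy as np

def oracle(x):
    # First-order oracle: returns f(x) and its gradient.
    # f(x) = 0.5 * ||x||^2 is an assumed toy objective (minimizer x* = 0).
    return 0.5 * x @ x, x

def gradient_descent(x0, lr=0.1, steps=100):
    x = x0.copy()
    for _ in range(steps):
        _, g = oracle(x)  # only first-order information is used
        x -= lr * g       # x_{k+1} = x_k - lr * grad f(x_k)
    return x

x = gradient_descent(np.array([3.0, -4.0]))
print(np.linalg.norm(x))  # contracts toward the minimizer x* = 0
```

With this objective each step scales the iterate by $(1 - \text{lr})$, so the distance to the minimizer shrinks geometrically.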
- Visualization of optimization algorithms.
Table of contents
- Gradient descent
- Subgradient descent
- Projected subgradient descent
- Mirror descent
- Stochastic gradient descent
- Stochastic average gradient
- ADAM: A Method for Stochastic Optimization
- Lookahead Optimizer: $k$ steps forward, $1$ step back
- Shampoo: Preconditioned Stochastic Tensor Optimization