Summary

Maintaining second-order statistics from accumulated stochastic gradients is a cornerstone of stochastic first-order optimization. Conceptually, the authors treat the parameters of each layer as a matrix and compute a left and a right preconditioner, instead of a single full-matrix preconditioner for the vectorized parameters. This reduces both the amount of computation and the memory required to store the preconditioner.
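
To make the summary concrete, here is a minimal sketch of a left/right preconditioned step for an m x n parameter matrix. It is my reading of the idea, not the authors' implementation. The accumulation rule (L += G G^T, R += G^T G), the exponent -1/4 on each side, the damping eps, and the names inv_root, preconditioned_step and memory_counts are all assumptions made for illustration.

```python
import numpy as np

def inv_root(mat, power, eps=1e-6):
    """Return (mat + eps*I)^(-1/power) for a symmetric PSD matrix.

    Uses an eigendecomposition; eps is an assumed damping term that keeps
    the smallest eigenvalues away from zero.
    """
    vals, vecs = np.linalg.eigh(mat + eps * np.eye(mat.shape[0]))
    vals = np.maximum(vals, eps)
    return (vecs * vals ** (-1.0 / power)) @ vecs.T

def preconditioned_step(W, G, L, R, lr=0.1):
    """One update with separate left and right statistics.

    W, G: (m, n) parameter and gradient matrices.
    L: (m, m) left statistic, R: (n, n) right statistic.
    The exponents -1/4 are an assumption, not taken from the review text.
    """
    L = L + G @ G.T      # accumulate row-space statistics
    R = R + G.T @ G      # accumulate column-space statistics
    W = W - lr * inv_root(L, 4) @ G @ inv_root(R, 4)
    return W, L, R

def memory_counts(m, n):
    """Entries stored by a full preconditioner vs. the left/right pair."""
    full = (m * n) ** 2          # one (mn x mn) matrix on vec(W)
    factored = m * m + n * n     # one (m x m) plus one (n x n) matrix
    return full, factored
```

For a 100 x 50 layer, memory_counts(100, 50) returns (25000000, 12500), which illustrates the storage saving claimed in the summary.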

Pros

Cons