# Intuition

Let’s consider an illustrative example: a simple function of two variables,

$$f(x_1, x_2) = x_1^2 + x_2^2, \qquad \nabla_x f = \begin{pmatrix} 2x_1 \\ 2x_2 \end{pmatrix}.$$

Now, let’s introduce new variables $\tilde{x}_1 = 2x_1$ and $\tilde{x}_2 = x_2$, or $\tilde{x} = Bx$, where $B = \operatorname{diag}(2, 1)$. The same function, written in the new coordinates, is

$$f(\tilde{x}_1, \tilde{x}_2) = \frac{\tilde{x}_1^2}{4} + \tilde{x}_2^2, \qquad \nabla_{\tilde{x}} f = \begin{pmatrix} \tilde{x}_1 / 2 \\ 2\tilde{x}_2 \end{pmatrix} = B^{-T} \nabla_x f.$$
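As a quick numerical sanity check of this behaviour, here is a Python sketch (the concrete function $f(x) = x_1^2 + x_2^2$ and the matrix $B = \operatorname{diag}(2, 1)$ are illustrative assumptions): the gradient computed in the new coordinates equals $B^{-T}$ times the old gradient, not $B$ times it.

```python
import numpy as np

# Illustrative assumptions: f(x) = x1^2 + x2^2, new coordinates x_tilde = B x.
B = np.diag([2.0, 1.0])

def f(x):
    return x[0]**2 + x[1]**2

def f_tilde(xt):
    # The same function expressed in the new coordinates: x = B^{-1} x_tilde.
    x = np.linalg.solve(B, xt)
    return f(x)

def num_grad(g, x, h=1e-6):
    # Central finite differences, so the check is independent of any
    # hand-derived formula for the gradient.
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (g(x + e) - g(x - e)) / (2 * h)
    return grad

x = np.array([1.0, 3.0])
xt = B @ x                      # the same point in the new coordinates

g_old = num_grad(f, x)          # gradient w.r.t. x
g_new = num_grad(f_tilde, xt)   # gradient w.r.t. x_tilde

print(np.allclose(g_new, np.linalg.inv(B).T @ g_old, atol=1e-4))  # True:  transforms with B^{-T}
print(np.allclose(g_new, B @ g_old, atol=1e-4))                   # False: does NOT transform with B
```

Using finite differences here is deliberate: it measures how the gradient actually transforms, rather than assuming the rule we are trying to verify.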

Let’s summarize what happened:

- We have a transformation of a vector space described by a coordinate transformation matrix $B$.
- Coordinate vectors transform as $\tilde{x} = Bx$.
- However, the partial gradient of a function w.r.t. the coordinates transforms as $\nabla_{\tilde{x}} f = B^{-T} \nabla_x f$.
- Therefore, there seems to exist one type of mathematical object (e.g. coordinate vectors) which transforms with $B$, and a second type of mathematical object (e.g. the partial gradient of a function w.r.t. coordinates) which transforms with $B^{-T}$.

These two types are called *contra-variant* and *co-variant*. This should at least tell us that the so-called “gradient vector” is indeed somewhat different from a “normal” vector: it behaves inversely under coordinate transformations.

A nice thing here is that the steepest descent direction on a sphere transforms as a covariant vector, since it is proportional to the gradient:

$$d^* = \arg\min_{\|d\| = \varepsilon} f(x + d) \approx \arg\min_{\|d\| = \varepsilon} \nabla f^T d = -\varepsilon \, \frac{\nabla f}{\|\nabla f\|}.$$
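To make the sphere claim concrete, here is a brute-force Python sketch (the quadratic $f$ and the evaluation point are illustrative assumptions): among all steps of a fixed small length $\varepsilon$, the one that decreases $f$ the most points along $-\nabla f / \|\nabla f\|$ — and since that direction is proportional to the gradient, it inherits the covariant transformation rule.

```python
import numpy as np

# Illustrative assumption: a smooth function of two variables.
def f(x):
    return x[..., 0]**2 + 0.5 * x[..., 1]**2 + x[..., 0] * x[..., 1]

def grad_f(x):
    # Analytic gradient of the function above.
    return np.array([2 * x[0] + x[1], x[1] + x[0]])

x = np.array([1.0, 2.0])
eps = 1e-3

# Brute force: evaluate f at many points on the sphere ||d|| = eps around x.
angles = np.linspace(0.0, 2 * np.pi, 200_000, endpoint=False)
dirs = eps * np.stack([np.cos(angles), np.sin(angles)], axis=1)
best = dirs[np.argmin(f(x + dirs))]   # step with the largest decrease in f

g = grad_f(x)
steepest = -g / np.linalg.norm(g)

# The empirical minimizer points along the negative normalized gradient.
print(np.allclose(best / np.linalg.norm(best), steepest, atol=1e-3))  # True
```

The tolerance absorbs the second-order (curvature) term, which perturbs the exact minimizer away from $-\nabla f / \|\nabla f\|$ by $O(\varepsilon)$.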

# Steepest descent in distribution space

Suppose we have a probabilistic model represented by its likelihood $p(x \mid \theta)$. We want to maximize this likelihood function to find the most likely parameters $\theta^*$ given the observations. An equivalent formulation is to minimize the loss function $\mathcal{L}(\theta) = -\log p(x \mid \theta)$, the negative log-likelihood.
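As a minimal illustration of this equivalence (the Gaussian model and the synthetic data are assumptions for the sketch, not from the text), maximizing a Gaussian likelihood is the same as minimizing its negative log-likelihood, whose minimizer for $\mathcal{N}(\mu, \sigma^2)$ is the familiar sample mean and (biased) sample standard deviation:

```python
import numpy as np

# Synthetic observations (an assumption for illustration).
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=10_000)

def nll(mu, sigma):
    # Negative log-likelihood of N(mu, sigma^2) for the observed data.
    return 0.5 * np.sum(np.log(2 * np.pi * sigma**2) + (data - mu)**2 / sigma**2)

# Closed-form maximum-likelihood estimates for a Gaussian:
mu_hat = data.mean()
sigma_hat = data.std()   # ddof=0: the ML (not the unbiased) estimator

# The ML estimates minimize the negative log-likelihood; nearby
# parameter values give a larger loss.
print(nll(mu_hat, sigma_hat) <= nll(mu_hat + 0.1, sigma_hat))   # True
print(nll(mu_hat, sigma_hat) <= nll(mu_hat, 1.1 * sigma_hat))   # True
```

Because $\log$ is monotonic, the maximizer of $p(x \mid \theta)$ and the minimizer of $-\log p(x \mid \theta)$ coincide exactly; the log form is simply more convenient to differentiate and to optimize numerically.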