# Automatic differentiation

## Automatic differentiation

Calculate the gradient of a Taylor series of a \cos (x) using

`autograd`

library:`import autograd.numpy as np # Thinly-wrapped version of Numpy from autograd import grad def taylor_cosine(x): # Taylor approximation to cosine function # Your np code here return ans`

In the following code for the gradient descent for linear regression change the manual gradient computation to the PyTorch/jax autograd way. Compare those two approaches in time.

In order to do this, set the tolerance rate for the function value \varepsilon = 10^{-9}. Compare the total time required to achieve the specified value of the function for analytical and automatic differentiation. Perform measurements for different values of n from

`np.logspace(1,4)`

.For each n value carry out at least 3 runs.

`import numpy as np # Compute every step manually # Linear regression # f = w * x # here : f = 2 * x = np.array([1, 2, 3, 4], dtype=np.float32) X = np.array([2, 4, 6, 8], dtype=np.float32) Y = 0.0 w # model output def forward(x): return w * x # loss = MSE def loss(y, y_pred): return ((y_pred - y)**2).mean() # J = MSE = 1/N * (w*x - y)**2 # dJ/dw = 1/N * 2x(w*x - y) def gradient(x, y, y_pred): return np.dot(2*x, y_pred - y).mean() print(f'Prediction before training: f(5) = {forward(5):.3f}') # Training = 0.01 learning_rate = 20 n_iters for epoch in range(n_iters): # predict = forward pass = forward(X) y_pred # loss = loss(Y, y_pred) l # calculate gradients = gradient(X, Y, y_pred) dw # update weights -= learning_rate * dw w if epoch % 2 == 0: print(f'epoch {epoch+1}: w = {w:.3f}, loss = {l:.8f}') print(f'Prediction after training: f(5) = {forward(5):.3f}')`

Calculate the 4th derivative of hyperbolic tangent function using

`Jax`

autograd.Compare analytic and autograd (with any framework) approach for the hessian of:

f(x) = \dfrac{1}{2}x^TAx + b^Tx + c

Compare analytic and autograd (with any framework) approach for the gradient of:

f(X) = tr(AXB)

Compare analytic and autograd (with any framework) approach for the gradient and hessian of:

f(x) = \dfrac{1}{2} \|Ax - b\|^2_2

Compare analytic and autograd (with any framework) approach for the gradient and hessian of:

f(x) = \ln \left( 1 + \exp\langle a,x\rangle\right)

You will work with the following function for this exercise,

f(x,y)=e^{âˆ’\left(sin(x)âˆ’cos(y)\right)^2}

Draw the computational graph for the function. Note, that it should contain only primitive operations - you need to do it automatically - jax example, PyTorch example - you can google/find your own way to visualise it.

Compare analytic and autograd (with any framework) approach for the gradient of:

f(X) = - \log \det X

Suppose, we have the following function f(x) = \frac{1}{2}\|x\|^2, select a random point x_0 \in \mathbb{B}^{1000} = \{0 \leq x_i \leq 1 \mid \forall i\}. Consider 10 steps of the gradient descent starting from the point x_0:

x_{k+1} = x_k - \alpha_k \nabla f(x_k)

Your goal in this problem is to write the function, that takes 10 scalar values \alpha_i and return the result of the gradient descent on function L = f(x_{10}). And optimize this function using gradient descent on \alpha \in \mathbb{R}^{10}. Suppose, \alpha_0 = \mathbb{1}^{10}.

\alpha_{k+1} = \alpha_k - \beta \frac{\partial L}{\partial \alpha}

\frac{\partial L}{\partial \alpha} should be computed at each step using automatic differentiation. Choose any \beta and the number of steps your need. Describe obtained results.

Compare analytic and autograd (with any framework) approach for the gradient and hessian of:

f(x) = x^\top x x^\top x