# Problem

We need to estimate the probability density $p(x)$ of a random variable from observed values.

# Approach

We will use the idea of parametric distribution estimation: we choose a family of densities $p_\theta(x)$, indexed by a parameter $\theta$, and then pick the best parameters within that family. The idea is very natural: we choose the parameters that maximize the probability (or the logarithm of the probability) of the observed values.
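As a minimal illustration of this idea, the sketch below fits a Gaussian family $p_\theta(x) = \mathcal{N}(\mu, \sigma^2)$ to samples by maximizing the log-likelihood; for this family the maximizers have a well-known closed form (sample mean and sample standard deviation). The data and parameter values here are made up for the example.

```python
import numpy as np

# Hypothetical example: fit the Gaussian family p_theta(x) = N(mu, sigma^2)
# to observed samples by maximizing the log-likelihood.
rng = np.random.default_rng(0)
samples = rng.normal(loc=3.0, scale=2.0, size=10_000)

# For the Gaussian family the log-likelihood maximizers have a closed form:
# mu_hat is the sample mean, sigma_hat the (biased) sample standard deviation.
mu_hat = samples.mean()
sigma_hat = samples.std()

print(mu_hat, sigma_hat)  # close to the true parameters (3.0, 2.0)
```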

## Linear measurements with i.i.d. noise

Suppose we are given a set of observations

$$
x_i = \theta^\top a_i + \xi_i, \quad i = 1, \ldots, m,
$$

where

* $\theta \in \mathbb{R}^n$ is the unknown vector of parameters,
* $a_i \in \mathbb{R}^n$ are known vectors,
* $\xi_i$ are i.i.d. noise with density $p(z)$,
* $x_i$ are the measurements, $x \in \mathbb{R}^m$.

This implies the following optimization problem:

$$
\hat{\theta} = \arg\max_\theta L(\theta) = \arg\max_\theta \sum\limits_{i=1}^m \log p(x_i - \theta^\top a_i),
$$

where the sum arises from the fact that all observations are independent, which gives $p(\xi) = \prod\limits_{i=1}^m p(\xi_i)$. The objective is called the log-likelihood function $L(\theta)$.
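The generic problem above can be solved numerically for any noise density. The sketch below (names `A`, `x`, `log_p` are illustrative, not from the text) minimizes the negative log-likelihood with `scipy.optimize.minimize`, assuming standard Gaussian noise up to a constant:

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: maximize L(theta) = sum_i log p(x_i - theta^T a_i) numerically.
rng = np.random.default_rng(1)
m, n = 200, 3
A = rng.normal(size=(m, n))            # rows are the vectors a_i
theta_true = np.array([1.0, -2.0, 0.5])
x = A @ theta_true + rng.normal(scale=0.1, size=m)

def log_p(z):
    # Assumed noise log-density (standard Gaussian, up to an additive constant).
    return -0.5 * z**2

def neg_log_likelihood(theta):
    return -np.sum(log_p(x - A @ theta))

theta_hat = minimize(neg_log_likelihood, x0=np.zeros(n)).x
print(theta_hat)  # close to theta_true
```

Swapping `log_p` for another log-density changes the estimator, which is exactly what the special cases below work out analytically.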

### Gaussian noise

Assume the noise is Gaussian: $p(z) = \dfrac{1}{\sigma\sqrt{2\pi}} e^{-\frac{z^2}{2\sigma^2}}$. Then the log-likelihood is

$$
L(\theta) = -\frac{m}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum\limits_{i=1}^m (x_i - \theta^\top a_i)^2,
$$

which means the maximum likelihood estimate in the case of Gaussian noise is the least squares solution:

$$
\hat{\theta} = \arg\min_\theta \sum\limits_{i=1}^m (x_i - \theta^\top a_i)^2.
$$

### Laplacian noise

Assume the noise is Laplacian: $p(z) = \dfrac{1}{2a} e^{-\frac{\vert z \vert}{a}}$. Then the log-likelihood is

$$
L(\theta) = -m\log(2a) - \frac{1}{a}\sum\limits_{i=1}^m \vert x_i - \theta^\top a_i \vert,
$$

which means the maximum likelihood estimate in the case of Laplacian noise is the $l_1$-norm solution:

$$
\hat{\theta} = \arg\min_\theta \sum\limits_{i=1}^m \vert x_i - \theta^\top a_i \vert.
$$
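A practical consequence of the $l_1$ objective is robustness to outliers. The sketch below (made-up one-dimensional data) minimizes the sum of absolute residuals with a derivative-free method, since the objective is non-smooth, and compares it with the least squares fit:

```python
import numpy as np
from scipy.optimize import minimize

# Under Laplacian noise the MLE minimizes sum_i |x_i - theta^T a_i|.
rng = np.random.default_rng(3)
m = 100
a = rng.normal(size=(m, 1))
theta_true = np.array([1.5])
x = a @ theta_true + rng.laplace(scale=0.2, size=m)
x[:5] += 20.0                          # a few gross outliers

def l1_loss(theta):
    return np.sum(np.abs(x - a @ theta))

# Nelder-Mead handles the non-smooth absolute value terms.
theta_l1 = minimize(l1_loss, x0=np.zeros(1), method="Nelder-Mead").x
theta_l2, *_ = np.linalg.lstsq(a, x, rcond=None)
print(theta_l1, theta_l2)  # the l1 estimate stays near 1.5
```

The least squares estimate may be dragged toward the outliers, while the $l_1$ estimate effectively ignores them.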

### Uniform noise

Assume the noise is uniform on $[-a, a]$: $p(z) = \dfrac{1}{2a}$ for $\vert z \vert \leq a$ and $p(z) = 0$ otherwise. The likelihood equals $(2a)^{-m}$ whenever all residuals lie in $[-a, a]$ and is zero otherwise, which means the maximum likelihood estimate in the case of uniform noise is any vector $\theta$ satisfying $\vert x_i - \theta^\top a_i \vert \leq a$ for all $i$.
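In other words, with uniform noise the MLE is a feasibility problem and is generally non-unique: every $\theta$ in the feasible set attains the same likelihood. A small sketch with made-up data:

```python
import numpy as np

# With uniform noise on [-a, a], any theta whose residuals all lie within
# [-a, a] attains the maximal likelihood (2a)^{-m}, so the MLE is non-unique.
rng = np.random.default_rng(4)
m, noise_a = 200, 0.5
A = rng.normal(size=(m, 2))
theta_true = np.array([1.0, 2.0])
x = A @ theta_true + rng.uniform(-noise_a, noise_a, size=m)

def is_mle(theta):
    # Membership test for the feasible set |x_i - theta^T a_i| <= a.
    return np.all(np.abs(x - A @ theta) <= noise_a)

print(is_mle(theta_true))  # True: the true parameters are feasible
```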

## Binary logistic regression

Suppose we are given a set of binary random variables $y_i \in \{0,1\}$. Let us parametrize the distribution as a sigmoid, using a linear transformation of the input as the argument of the sigmoid:

$$
P(y_i = 1) = \sigma(\theta^\top x_i) = \frac{1}{1 + e^{-\theta^\top x_i}}, \quad P(y_i = 0) = 1 - \sigma(\theta^\top x_i).
$$

Let's assume that the first $k$ observations are ones: $y_1 = \ldots = y_k = 1$ and $y_{k+1} = \ldots = y_m = 0$. Then the log-likelihood function is written as follows:

$$
L(\theta) = \sum\limits_{i=1}^k \log \sigma(\theta^\top x_i) + \sum\limits_{i=k+1}^m \log\bigl(1 - \sigma(\theta^\top x_i)\bigr).
$$
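This log-likelihood is concave in $\theta$, so plain gradient ascent converges to the MLE. The sketch below (data, step size, and iteration count are made up) maximizes $L(\theta)$ using the well-known gradient $\nabla L(\theta) = \sum_i (y_i - \sigma(\theta^\top x_i))\, x_i$:

```python
import numpy as np

# Sketch: fit binary logistic regression by gradient ascent on the
# log-likelihood L(theta) = sum_{y_i=1} log p_i + sum_{y_i=0} log(1 - p_i),
# where p_i = sigmoid(theta^T x_i).
rng = np.random.default_rng(5)
m, n = 500, 2
X = rng.normal(size=(m, n))
theta_true = np.array([2.0, -1.0])
y = (rng.uniform(size=m) < 1 / (1 + np.exp(-X @ theta_true))).astype(float)

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

theta = np.zeros(n)
for _ in range(2000):
    p = sigmoid(X @ theta)
    theta += 0.01 * X.T @ (y - p)      # gradient of L(theta) w.r.t. theta

print(theta)  # roughly recovers theta_true
```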