We need to estimate the probability density of a random variable from observed values.
We will use the idea of parametric distribution estimation: we fix a family of densities indexed by a parameter and choose the best parameter within that family. The idea is very natural: we pick the parameter value that maximizes the probability (or, equivalently, the logarithm of the probability) of the observed values.
Suppose we are given a set of observations generated by a linear measurement model:

$$
y_i = a_i^\top x + v_i, \quad i = 1, \ldots, m,
$$

where
- $x \in \mathbb{R}^n$ - unknown vector of parameters
- $v_i$ are IID noise with density $p$
- $y_i \in \mathbb{R}$ - measurements
This implies the following optimization problem:

$$
\max_x L(x) = \max_x \sum_{i=1}^m \log p(y_i - a_i^\top x)
$$

The sum appears because all observations are independent, so the joint density factorizes: $p(y \mid x) = \prod\limits_{i=1}^m p(y_i - a_i^\top x)$, and the logarithm turns the product into a sum. The target function $L(x)$ is called the log-likelihood function.
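The problem above can be sketched numerically. Below is a minimal sketch with a scalar unknown and a grid search over the log-likelihood; the names (`a`, `y`, `x_true`), the problem sizes, and the choice of noise density are illustrative assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Measurement model y_i = a_i * x + v_i with a scalar unknown x
# (sizes and the ground-truth value are illustrative).
m = 200
a = rng.normal(size=m)
x_true = 1.7
sigma = 0.5
v = rng.normal(scale=sigma, size=m)      # IID noise with density p
y = a * x_true + v

def log_likelihood(x, p):
    """L(x) = sum_i log p(y_i - a_i * x)."""
    return np.sum(np.log(p(y - a * x)))

# Plug in the Gaussian density matching the noise above.
p_gauss = lambda z: np.exp(-z**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

# Maximize over a grid (a sketch; a real solver would use the gradient).
grid = np.linspace(0.0, 3.0, 3001)
x_hat = grid[np.argmax([log_likelihood(x, p_gauss) for x in grid])]
```

Any density `p` can be plugged into `log_likelihood`, which is exactly the point of the formulation: the noise model determines the objective.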
If the noise is Gaussian, $v_i \sim \mathcal{N}(0, \sigma^2)$ with density $p(z) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-z^2/(2\sigma^2)}$, the log-likelihood is

$$
L(x) = -\frac{m}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\|Ax - y\|_2^2,
$$

which means the maximum likelihood estimate in the case of Gaussian noise is a least squares solution:

$$
\hat{x} = \arg\min_x \|Ax - y\|_2^2
$$
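This equivalence is easy to check numerically. A small sketch (problem sizes, seed, and names like `x_true` are illustrative assumptions): the least squares solution must zero the gradient of the Gaussian log-likelihood, i.e. satisfy the normal equations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative linear measurement model y = A x + v.
m, n = 100, 3
A = rng.normal(size=(m, n))
x_true = np.array([1.0, -2.0, 0.5])
sigma = 0.3
y = A @ x_true + rng.normal(scale=sigma, size=m)

# Least squares solution = MLE under Gaussian noise.
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

# Gradient of the Gaussian log-likelihood, (1/sigma^2) A^T (y - A x),
# must vanish at the maximizer -- these are the normal equations.
grad = A.T @ (y - A @ x_hat) / sigma**2
```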
If the noise is Laplacian with density $p(z) = \frac{1}{2a} e^{-|z|/a}$, the log-likelihood is

$$
L(x) = -m \log(2a) - \frac{1}{a}\|Ax - y\|_1,
$$

which means the maximum likelihood estimate in the case of Laplacian noise is an $\ell_1$-norm solution:

$$
\hat{x} = \arg\min_x \|Ax - y\|_1
$$
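A quick numerical illustration, under the simplifying assumption of a scalar location model $y_i = x + v_i$ (names and sizes are illustrative): minimizing the $\ell_1$ residual is exactly maximizing the Laplacian log-likelihood, and for this scalar case the minimizer is the sample median.

```python
import numpy as np

rng = np.random.default_rng(2)

# Scalar location model y_i = x + v_i with Laplacian noise.
m, x_true, b = 501, 2.0, 1.0
y = x_true + rng.laplace(scale=b, size=m)

# Laplacian log-likelihood: L(x) = -m*log(2b) - (1/b) * sum_i |y_i - x|,
# so maximizing L(x) is the same as minimizing the l1 residual.
def l1_residual(x):
    return np.sum(np.abs(y - x))

# Minimize over a grid (a sketch; the objective is convex piecewise linear).
grid = np.linspace(0.0, 4.0, 4001)
x_hat = grid[np.argmin([l1_residual(x) for x in grid])]

# For this scalar problem the l1 minimizer is the sample median of y.
```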
If the noise is uniform on $[-a, a]$, i.e. $p(z) = \frac{1}{2a}$ for $|z| \leq a$ and $0$ otherwise, the log-likelihood is

$$
L(x) = \begin{cases} -m \log(2a), & \|Ax - y\|_\infty \leq a, \\ -\infty, & \text{otherwise,} \end{cases}
$$

which means the maximum likelihood estimate in the case of uniform noise is any vector $x$ that satisfies $\|Ax - y\|_\infty \leq a$.
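The flatness of the likelihood on the feasible set can be seen in a small sketch, again under the simplifying assumption of a scalar location model (names and sizes are illustrative): every point of the feasible interval attains the same likelihood value, so any of them is an MLE.

```python
import numpy as np

rng = np.random.default_rng(3)

# Scalar location model y_i = x + v_i, v_i ~ Uniform[-a, a].
m, x_true, a = 50, 1.0, 0.4
y = x_true + rng.uniform(-a, a, size=m)

def log_likelihood(x):
    # -m*log(2a) on the feasible set {x : max_i |y_i - x| <= a}, -inf outside.
    return -m * np.log(2 * a) if np.max(np.abs(y - x)) <= a else -np.inf

# For the scalar model the feasible set is the interval [max(y)-a, min(y)+a];
# every point of it maximizes the likelihood.
lo, hi = y.max() - a, y.min() + a
mid = 0.5 * (lo + hi)
```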
Suppose we are given a set of binary random variables $y_i \in \{0, 1\}$. Let us parametrize the distribution as a sigmoid whose argument is a linear transformation of the input:

$$
\mathbb{P}(y_i = 1) = \sigma(a_i^\top x), \qquad \sigma(t) = \frac{1}{1 + e^{-t}}
$$
Let's assume that the first $k$ observations are ones: $y_1 = \ldots = y_k = 1$ and $y_{k+1} = \ldots = y_m = 0$. Then the log-likelihood function is written as follows:

$$
L(x) = \sum_{i=1}^{k} \log \sigma(a_i^\top x) + \sum_{i=k+1}^{m} \log\left(1 - \sigma(a_i^\top x)\right) \to \max_x
$$
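This logistic log-likelihood can be maximized with plain gradient ascent, since it is concave. A minimal sketch (data, sizes, step size, and the ground-truth parameters are illustrative assumptions; Newton's method would be the usual choice in practice):

```python
import numpy as np

rng = np.random.default_rng(4)

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Illustrative data: scalar inputs plus an intercept column.
m = 400
A = np.column_stack([rng.normal(size=m), np.ones(m)])
x_true = np.array([2.0, -0.5])
y = (rng.uniform(size=m) < sigmoid(A @ x_true)).astype(float)

def log_likelihood(x):
    """Sum of log sigma(a_i^T x) over y_i = 1 plus log(1 - sigma) over y_i = 0."""
    p = sigmoid(A @ x)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Gradient ascent on the concave log-likelihood;
# grad L(x) = A^T (y - sigma(A x)).
x = np.zeros(2)
for _ in range(1000):
    x += (A.T @ (y - sigmoid(A @ x))) / m
```

Note that the per-label split of the sum above is the same objective as the two-sum form in the text: observations with $y_i = 1$ contribute $\log \sigma(a_i^\top x)$ and the rest contribute $\log(1 - \sigma(a_i^\top x))$.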