Linear least squares
1 Problem
In a least-squares, or linear regression, problem, we have measurements X \in \mathbb{R}^{m \times n} and y \in \mathbb{R}^{m} and seek a vector \theta \in \mathbb{R}^{n} such that X \theta is close to y. Closeness is defined as the sum of the squared differences:
\sum\limits_{i=1}^m (x_i^\top \theta - y_i)^2
also known as the squared \ell_2-norm, \|X \theta - y\|^2_2
For example, we might have a dataset of m users, each represented by n features. Each row x_i^\top of X is the features for user i, while the corresponding entry y_i of y is the measurement we want to predict from x_i^\top, such as ad spending. The prediction is given by x_i^\top \theta.
We find the optimal \theta by solving the optimization problem
\|X \theta - y\|^2_2 \to \min_{\theta \in \mathbb{R}^{n}}
Let \theta^* denote the optimal \theta. The quantity r = X \theta^* - y is known as the residual. If \|r\|_2 = 0, we have a perfect fit.
Note that the function needn't be linear in the argument x, but only in the parameters \theta that are to be determined in the best fit.
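For instance, fitting a cubic polynomial to noisy data is still a linear least-squares problem: the features are powers of x, while the model stays linear in \theta. A minimal NumPy sketch (the data, noise level, and degree are illustrative assumptions):

```python
import numpy as np

# Cubic polynomial fit: nonlinear in x, but linear in the parameters theta.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.size)

# Design matrix with columns 1, x, x^2, x^3 (m = 30, n = 4).
X = np.vander(x, N=4, increasing=True)

# Solve min ||X theta - y||_2^2 with a library routine.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)                          # fitted coefficients
print(np.linalg.norm(X @ theta - y))  # residual norm ||r||_2
```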
2 Approaches
2.1 Moore–Penrose inverse
If the matrix X is relatively small (and has full column rank, so that X^\top X is invertible), we can write down and compute the exact solution:
\theta^* = (X^\top X)^{-1} X^\top y = X^\dagger y,
where X^\dagger is called the pseudo-inverse (Moore–Penrose inverse) of X. However, forming X^\top X squares the condition number of the problem, which can be an obstacle for ill-conditioned, large-scale problems.
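As a quick sanity check, here is a NumPy sketch (the random, well-conditioned data is an illustrative assumption) comparing the normal-equations solution with np.linalg.pinv, which computes X^\dagger via the SVD rather than forming X^\top X:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))  # tall matrix, full column rank with high probability
y = rng.standard_normal(100)

# Closed-form solution via the normal equations.
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Same solution via the Moore-Penrose pseudo-inverse (SVD-based).
theta_pinv = np.linalg.pinv(X) @ y

print(np.allclose(theta_normal, theta_pinv))  # True for this well-conditioned X
```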
2.2 QR decomposition
For any matrix X \in \mathbb{R}^{m \times n} there exists a QR decomposition:
X = Q \cdot R,
where Q \in \mathbb{R}^{m \times n} has orthonormal columns (meaning Q^\top Q = I) and R \in \mathbb{R}^{n \times n} is an upper triangular matrix. It is important to notice that, since Q^\top Q = I, multiplying on the left by Q^\top gives:
QR\theta = y \quad \longrightarrow \quad R \theta = Q^\top y
Now the process of finding \theta consists of two steps:
- Find the QR decomposition of X.
- Solve the system R \theta = Q^\top y, which is triangular and therefore easy to solve (see the sketch below).
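A short NumPy/SciPy sketch of these two steps (the random data is an illustrative assumption):

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 5))
y = rng.standard_normal(100)

# Step 1: thin QR decomposition; Q is 100x5 with orthonormal columns, R is 5x5.
Q, R = np.linalg.qr(X)

# Step 2: back substitution for the upper triangular system R theta = Q^T y.
theta = solve_triangular(R, Q.T @ y, lower=False)

# Agrees with a reference least-squares solver.
print(np.allclose(theta, np.linalg.lstsq(X, y, rcond=None)[0]))  # True
```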
2.3 Cholesky decomposition
If X has full column rank, the matrix A = X^\top X \in \mathbb{R}^{n \times n} is positive definite, and for any positive definite matrix there exists a Cholesky decomposition:
X^\top X = A = L \cdot L^\top,
where L is a lower triangular matrix. Substituting into the normal equations X^\top X \theta = X^\top y, we have:
L L^\top \theta = X^\top y \quad \longrightarrow \quad L z_\theta = X^\top y, \qquad z_\theta = L^\top \theta
Now the process of finding \theta consists of three steps:
- Find the Cholesky decomposition of X^\top X.
- Find z_\theta = L^\top \theta by solving the triangular system L z_\theta = X^\top y.
- Find \theta by solving the triangular system L^\top \theta = z_\theta.
Note that in this case the error is still proportional to the squared condition number, since we explicitly form the matrix X^\top X.
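The same three steps in a NumPy/SciPy sketch (the random data is an illustrative assumption):

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 5))
y = rng.standard_normal(100)

# Step 1: Cholesky factor of the Gram matrix, X^T X = L L^T, L lower triangular.
L = np.linalg.cholesky(X.T @ X)

# Step 2: forward substitution, solve L z = X^T y for z = L^T theta.
z = solve_triangular(L, X.T @ y, lower=True)

# Step 3: back substitution, solve L^T theta = z.
theta = solve_triangular(L.T, z, lower=False)

print(np.allclose(theta, np.linalg.lstsq(X, y, rcond=None)[0]))  # True
```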
3 Code
Open In Colab{: .btn }
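In lieu of the notebook, here is a self-contained NumPy/SciPy sketch comparing the three approaches above on synthetic data (the problem size and noise level are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(42)
m, n = 200, 10
X = rng.standard_normal((m, n))
theta_true = rng.standard_normal(n)
y = X @ theta_true + 0.01 * rng.standard_normal(m)

# 1. Moore-Penrose pseudo-inverse (SVD-based).
theta_pinv = np.linalg.pinv(X) @ y

# 2. QR decomposition: solve R theta = Q^T y.
Q, R = np.linalg.qr(X)
theta_qr = solve_triangular(R, Q.T @ y, lower=False)

# 3. Cholesky of the Gram matrix: X^T X = L L^T.
L = np.linalg.cholesky(X.T @ X)
z = solve_triangular(L, X.T @ y, lower=True)
theta_chol = solve_triangular(L.T, z, lower=False)

for name, theta in [("pinv", theta_pinv), ("QR", theta_qr), ("Cholesky", theta_chol)]:
    print(f"{name:8s} residual norm: {np.linalg.norm(X @ theta - y):.6f}")
```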