# Least squares

Consider $y = Ax$, where $A \in \mathbb{R}^{m \times n}$ is (strictly) skinny (i.e., $m > n$); that is, $y = Ax$ is an *overdetermined* set of linear equations.

For most $y$, we cannot solve for $x$^{[1]}. This makes sense: even if $A$ is full rank (i.e., $\operatorname{rank}(A) = n$), its rank is still smaller than the dimension of $y$, i.e., $m$.

The least-squares approach is about *approximately* solving $y = Ax$: we define the residual $r = y - Ax$ and find the least-squares (approximate) solution, $x = x_{\text{ls}}$, that minimizes $\|r\|$.

Taking the partial derivative of $\|r\|^2 = x^T A^T A x - 2 y^T A x + y^T y$ w.r.t. $x$ and equating it to zero, we get:

$$\nabla_x \|r\|^2 = 2 A^T A x - 2 A^T y = 0 \implies A^T A x = A^T y.$$

The latter are called the *normal equations*, and they have a unique solution (because $A^T A$ is square and full rank when $A$ is skinny and full rank), famous as the least-squares (approximate) solution:

$$x_{\text{ls}} = (A^T A)^{-1} A^T y =: A^\dagger y.$$

The matrix above, $A^\dagger = (A^T A)^{-1} A^T$, is called the pseudo-inverse of $A$, and it is a generalized inverse. It is also a left inverse of $A$.
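As a quick numerical check, here is a NumPy sketch (the matrix and vector below are arbitrary illustrations): the normal-equations solution $(A^T A)^{-1} A^T y$ and the explicit pseudo-inverse both match NumPy's built-in least-squares solver.

```python
import numpy as np

# Arbitrary skinny, full-rank A (m > n) and measurement y for illustration
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))
y = rng.standard_normal(10)

# Least-squares solution via the normal equations: solve A^T A x = A^T y
x_normal = np.linalg.solve(A.T @ A, A.T @ y)

# Same solution via the explicit pseudo-inverse A^dagger y
x_pinv = np.linalg.pinv(A) @ y

# Reference: NumPy's built-in solver (uses a numerically stable factorization)
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
```

In practice `np.linalg.lstsq` (or a QR/SVD route) is preferred over forming $A^T A$ explicitly, since squaring the matrix squares its condition number.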

### Least-squares via QR factorization

One way to obtain the least-squares approximate solution is to use the QR factorization $A = QR$ (where $Q^T Q = I$ and $R$ is upper triangular and invertible), because:

$$A^T A = R^T Q^T Q R = R^T R,$$

so $A^T A x = A^T y$ becomes $R^T R x = R^T Q^T y$, giving

$$x_{\text{ls}} = R^{-1} Q^T y.$$

In this case, the projection matrix can be obtained as:

$$P = A (A^T A)^{-1} A^T = Q Q^T.$$
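The QR route can be sketched in NumPy as follows (arbitrary illustrative data); solving the triangular system $Rx = Q^T y$ avoids forming $A^T A$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))
y = rng.standard_normal(8)

# Reduced QR factorization: A = Q R, with Q^T Q = I and R upper triangular
Q, R = np.linalg.qr(A)

# x_ls = R^{-1} Q^T y (solve the triangular system rather than inverting R)
x_qr = np.linalg.solve(R, Q.T @ y)

# Projection matrix onto range(A): P = A (A^T A)^{-1} A^T = Q Q^T
P = Q @ Q.T
```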

### Geometric Interpretation

The vector $A x_{\text{ls}}$ is the *projection* of $y$ on $\mathcal{R}(A)$, the range of $A$. In this case, the residual $r = y - A x_{\text{ls}}$ is orthogonal to $\mathcal{R}(A)$.
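This orthogonality of the residual is easy to verify numerically; a minimal NumPy sketch with arbitrary data:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 2))
y = rng.standard_normal(6)

x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
r = y - A @ x_ls

# r is orthogonal to every column of A, hence to all of range(A)
orthogonality = A.T @ r  # numerically zero
```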

### Recursive Least Squares

Another way to write the least-squares solution is

$$x_{\text{ls}} = \left( \sum_{i=1}^{m} a_i a_i^T \right)^{-1} \sum_{i=1}^{m} y_i a_i,$$

where $a_i$ are the columns of $A^T$ (i.e., $a_i^T$ are the rows of $A$). This formulation is particularly useful for *on-line* applications of least squares, i.e., when measurements keep coming and we want to update our estimate incrementally. An algorithm to do this would be^{[1]}:

- Initialize $P(0) = 0 \in \mathbb{R}^{n \times n}$, $q(0) = 0 \in \mathbb{R}^n$.
- For $m = 0, 1, \ldots$: $P(m+1) = P(m) + a_{m+1} a_{m+1}^T$, $q(m+1) = q(m) + y_{m+1} a_{m+1}$.
- If $P(m)$ is invertible, we have $x_{\text{ls}}(m) = P(m)^{-1} q(m)$.

As can be seen above, all we need to keep are the most recent versions of $P(m)$ and $q(m)$, and we can give an estimate whenever we are asked for one with one matrix inversion and one matrix-vector multiplication, $P(m)^{-1} q(m)$. In fact, the matrix inversion can be done quite efficiently via the rank-one update formula, because $P(m+1)$ adheres to the form $P + a a^T$.
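A minimal NumPy sketch of this recursion (arbitrary streaming data for illustration): we accumulate $P$ and $q$ one measurement at a time and recover the batch least-squares solution.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((20, 3))   # rows a_i^T arrive one at a time
y = rng.standard_normal(20)

n = A.shape[1]
P = np.zeros((n, n))               # P(0) = 0
q = np.zeros(n)                    # q(0) = 0

for a_i, y_i in zip(A, y):         # each new measurement (a_i, y_i)
    P += np.outer(a_i, a_i)        # P(m+1) = P(m) + a a^T
    q += y_i * a_i                 # q(m+1) = q(m) + y a

x_rls = np.linalg.solve(P, q)      # estimate once P is invertible
```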

### Multi-objective Least Squares

When we have two (generally) competing objectives $J_1 = \|Ax - y\|^2$ and $J_2 = \|Fx - g\|^2$, what we do is minimize the weighted sum $J_1 + \mu J_2$ for various values of $\mu \geq 0$, which controls how much we care about each objective.

We can write this as the minimization of a single objective: $J_1 + \mu J_2 = \|\tilde{A} x - \tilde{y}\|^2$, where $\tilde{A} = \begin{bmatrix} A \\ \sqrt{\mu} F \end{bmatrix}$ and $\tilde{y} = \begin{bmatrix} y \\ \sqrt{\mu} g \end{bmatrix}$. We can now solve this (assuming $\tilde{A}$ is full rank) using the least-squares solution listed above. However, we need to solve it for various values of $\mu$ (possibly over a logarithmic range). Of course, we must define what a good value of $\mu$ would be. This will depend on how much we care about each of the two objectives $J_1$ and $J_2$. In some cases there may be a point that satisfies both with relatively little compromise from each. The clue that we are actually close to a good trade-off point is that both $J_1$ and $J_2$ vary little with $\mu$ (16:30+ of Lecture 08^{[2]}); in other words, when our objectives become insensitive to $\mu$.
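A NumPy sketch of the stacked formulation (the matrices and the weight `mu` below are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((10, 3)); y = rng.standard_normal(10)
F = rng.standard_normal((5, 3));  g = rng.standard_normal(5)

mu = 0.1  # illustrative trade-off weight

# Stack into one least-squares problem: minimize ||A~ x - y~||^2
A_tilde = np.vstack([A, np.sqrt(mu) * F])
y_tilde = np.concatenate([y, np.sqrt(mu) * g])
x_mu, *_ = np.linalg.lstsq(A_tilde, y_tilde, rcond=None)

# The two individual objectives at the trade-off solution
J1 = np.linalg.norm(A @ x_mu - y) ** 2
J2 = np.linalg.norm(F @ x_mu - g) ** 2
```

Sweeping `mu` over, say, `np.logspace(-4, 4, 50)` traces out the trade-off curve between the two objectives.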

Both least-norm and least-squares are special cases of general norm minimization with equality constraints.

## Optimality of Least-Squares

### BLUE property

Least-squares is the *best linear unbiased estimator* (BLUE). The BLUE property is understood better when we assume that there is some noise in our measurement, i.e., $y = Ax + v$. The least-squares approximate solution is nothing but multiplying $y$ with a left inverse of $A$, as seen above. Least squares involves a *specific* left inverse; the matrix $A$, however, may have more (in fact, infinitely many) left inverses. In this case, all of them would yield a solution. Indeed, assume that $B$ is a left inverse of $A$ (i.e., $BA = I$); in the noiseless case, if $y = Ax$, we have that $By = BAx = x$.

In the case with noise (which is much more realistic), we have that $By = B(Ax + v) = x + Bv$. Since we don't know the noise, and as far as we are concerned it can be random, we cannot expect to fully recover $x$. However, we can aim to have a left inverse that minimizes the effect of the noise, that is, the estimation error $By - x = Bv$. In this case, we'd like $B$ to be 'small'. The pseudo-inverse is the *smallest* left inverse of $A$, in the sense that it minimizes $\sum_{i,j} B_{ij}^2$. This property makes least-squares the *best linear unbiased estimator* (BLUE).
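To illustrate that the pseudo-inverse is one of infinitely many left inverses, here is a NumPy sketch. The construction $B = A^\dagger + C(I - AA^\dagger)$ with an arbitrary $C$ is a standard way to generate left inverses (it works because $(I - AA^\dagger)A = 0$); every such $B$ satisfies $BA = I$, but $A^\dagger$ is the smallest in Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((7, 3))

A_dag = np.linalg.pinv(A)                 # the least-squares left inverse

# Another left inverse: B = A^dag + C (I - A A^dag), for any C,
# since (I - A A^dag) A = A - A = 0 implies B A = I
C = rng.standard_normal((3, 7))
B = A_dag + C @ (np.eye(7) - A @ A_dag)
```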

### Smallest uncertainty ellipsoid

Least-squares is also optimal in that it gives the smallest *uncertainty ellipsoid*^{[3]}. This property applies when we have a *norm-bound* model of noise; that is, we assume that $\|v\| \le \alpha$ but otherwise know nothing about $v$.

Consider any linear estimator of $x$, i.e., $\hat{x} = By$, where $B$ is unbiased (i.e., $BA = I$). The estimation error is $\tilde{x} = \hat{x} - x = Bv$. Due to the norm-bound noise model, we have that

$$\tilde{x} \in \mathcal{E}_{\text{unc}} = \{ Bv : \|v\| \le \alpha \}.$$

A good estimator should have a small *uncertainty ellipsoid* $\mathcal{E}_{\text{unc}}$. It can be shown that^{[3]} least-squares has the smallest possible uncertainty ellipsoid -- smallest in the sense of matrix comparison; that is, if $B_{\text{ls}} = A^\dagger$ is the least-squares estimator and $\mathcal{E}_{\text{ls}}$ is its uncertainty ellipsoid, for any unbiased estimator $B$ with ellipsoid $\mathcal{E}_B$ we have that

$$B_{\text{ls}} B_{\text{ls}}^T \preceq B B^T,$$

and (see Ellipsoid) $\mathcal{E}_{\text{ls}} \subseteq \mathcal{E}_B$.
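The matrix comparison can be checked numerically; a NumPy sketch, again generating an alternative unbiased estimator via the left-inverse construction $B = A^\dagger + C(I - AA^\dagger)$ with an arbitrary $C$:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((7, 3))

B_ls = np.linalg.pinv(A)                  # least-squares estimator
C = rng.standard_normal((3, 7))
B = B_ls + C @ (np.eye(7) - A @ B_ls)     # some other unbiased estimator (B A = I)

# B B^T - B_ls B_ls^T should be positive semidefinite
diff = B @ B.T - B_ls @ B_ls.T
min_eig = np.linalg.eigvalsh(diff).min()  # >= 0 up to floating-point error
```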

## What's the point of Least Squares?

Least squares has numerous applications, such as system identification, interpolation, extrapolation, and smoothing of data^{[1]} ... TBC

## References

- ↑ ^{1.0} ^{1.1} ^{1.2} Stephen Boyd, EE263 Lecture Notes 6
- ↑ Stephen Boyd, EE263 Lecture notes
- ↑ ^{3.0} ^{3.1} Stephen Boyd, EE263 Lecture Notes 16