Least-squares Approximation
We can minimize ∥η∥ by differentiating ∥η∥² with respect to each c_k and then setting the result to 0:

d∥η∥²/dc_k = Σ_{i=1}^{m} 2(Σ_{j=1}^{n} a_{ij} c_j - y_i) a_{ik} = 0.    (28.32)

The n equations (28.32) for k = 1, 2, ..., n are equivalent to the single matrix equation
(Ac - y)ᵀA = 0
or, equivalently, to
Aᵀ(Ac - y) = 0,
which implies

AᵀAc = Aᵀy.    (28.33)

In statistics, this is called the normal equation. The solution to equation (28.33) is

c = ((AᵀA)⁻¹Aᵀ)y = A⁺y,    (28.34)
where the matrix A⁺ = ((AᵀA)⁻¹Aᵀ) is called the pseudoinverse of the matrix A. The pseudoinverse is a natural generalization of the notion of a matrix inverse to the case in which A is nonsquare. (Compare equation (28.34) as the approximate solution to Ac = y with the solution A⁻¹b as the exact solution to Ax = b.)
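To make the algebra concrete, here is a minimal sketch in Python with NumPy (the function name least_squares_fit is our own, not from the text). It forms the pseudoinverse (AᵀA)⁻¹Aᵀ explicitly, mirroring equation (28.34):

    import numpy as np

    def least_squares_fit(A, y):
        # Solve the normal equation A^T A c = A^T y for the coefficient
        # vector c that minimizes ||Ac - y||, by forming the pseudoinverse
        # A^+ = (A^T A)^{-1} A^T as in equation (28.34).
        A = np.asarray(A, dtype=float)
        y = np.asarray(y, dtype=float)
        pseudoinverse = np.linalg.inv(A.T @ A) @ A.T   # A^+ = (A^T A)^{-1} A^T
        return pseudoinverse @ y                       # c = A^+ y

In practice one would call a library routine such as numpy.linalg.lstsq(A, y) instead, since explicitly inverting AᵀA can be numerically unstable, but the sketch follows the text's algebra directly.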
As an example of producing a least-squares fit, suppose that we have five data points
(x1, y1) = (-1, 2),
(x2, y2) = (1, 1),
(x3, y3) = (2, 1),
(x4, y4) = (3, 0),
(x5, y5) = (5, 3),
shown as black dots in Figure 28.3. We wish to fit these points with a quadratic polynomial

F(x) = c1 + c2x + c3x².

Figure 28.3: The least-squares fit of a quadratic polynomial to the set of five data points {(-1, 2), (1, 1), (2, 1), (3, 0), (5, 3)}. The black dots are the data points, and the white dots are their estimated values predicted by the polynomial F(x) = 1.2 - 0.757x + 0.214x², the quadratic polynomial that minimizes the sum of the squared errors. The error for each data point is shown as a shaded line.
We start with the matrix of basis-function values
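As an illustrative sketch (ours, not from the text), the following Python/NumPy snippet builds this matrix for the five data points and the basis functions 1, x, x², then solves the normal equation for the coefficients of the fitted polynomial shown in Figure 28.3:

    import numpy as np

    # The five data points (x_i, y_i) from the example.
    xs = np.array([-1.0, 1.0, 2.0, 3.0, 5.0])
    ys = np.array([ 2.0, 1.0, 1.0, 0.0, 3.0])

    # Matrix of basis-function values a_ij = f_j(x_i) for the basis {1, x, x^2}.
    A = np.column_stack([np.ones_like(xs), xs, xs**2])

    # c = (A^T A)^{-1} A^T y, the least-squares coefficients of equation (28.34).
    c = np.linalg.inv(A.T @ A) @ A.T @ ys
    print(c)   # approximately [ 1.200 -0.757  0.214],
               # i.e. F(x) = 1.2 - 0.757x + 0.214x^2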