Statistics In Six Sigma

Summary Statistics: In addition to correlation matrices and residual plots, several numbers called “summary statistics” often provide critical information about the adequacy of the model form in question. This section describes four summary statistics: R² adjusted, PRESS, R² prediction, and σ_est. Probably the most widely used summary statistic is “R² adjusted,” which is also written “adjusted R-squared” or R²adj. This quantity is also sometimes called the “adjusted coefficient of multiple determination.” To calculate the adjusted R-squared, it is convenient to use an n × n matrix, Q, with every entry equal to 1.0. This permits calculation of the “sum of squares total” (SST) using

$$SST = \left(Y - \frac{1}{n}QY\right)'\left(Y - \frac{1}{n}QY\right)$$

and then

$$R^2_{adj} = 1 - \frac{SSE^*/(n-k)}{SST/(n-1)}$$

where Y is the n × 1 vector of responses, k is the number of terms in the fitted model, and SSE* is the sum of squares error. It is common to interpret R²adj as the “fraction of the variation in the response data explained by the model.”
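The two calculations can be checked numerically. The following is a minimal Python sketch, assuming the standard forms SST = (Y − QY/n)′(Y − QY/n) and R²adj = 1 − [SSE*/(n − k)] / [SST/(n − 1)], with the responses supplied as a plain list. The function names `sst_via_q` and `r2_adjusted` are hypothetical, chosen only for illustration. Q is built explicitly to mirror the matrix form, although QY/n is simply a vector whose every entry is the sample mean.

```python
def sst_via_q(y):
    """Sum of squares total via the all-ones n x n matrix Q:
    SST = (Y - QY/n)' (Y - QY/n)."""
    n = len(y)
    q = [[1.0] * n for _ in range(n)]  # Q: every entry equals 1.0
    # QY/n: every entry equals sum(y)/n, i.e. the sample mean
    qy_over_n = [sum(q[i][j] * y[j] for j in range(n)) / n for i in range(n)]
    centered = [y[i] - qy_over_n[i] for i in range(n)]  # Y - QY/n
    return sum(c * c for c in centered)  # inner product of the centered vector


def r2_adjusted(sse_star, sst, n, k):
    """Adjusted R-squared: 1 - [SSE*/(n - k)] / [SST/(n - 1)].
    n = number of data points, k = number of terms in the fitted model."""
    return 1.0 - (sse_star / (n - k)) / (sst / (n - 1))
```

For the hypothetical responses [1, 2, 3, 4, 5], `sst_via_q` returns 10.0, because QY/n is the constant vector 3.0 and the squared deviations are 4, 1, 0, 1, and 4.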

Example (R² Adjusted Calculations): Calculate and interpret R² adjusted.

Answer: The following quantities follow from previous results and definitions:

Therefore, with n = 5 data points, SST = 13720 and R² adjusted = 0.662, so roughly 66% of the observed variation is explained by the first-order model in x₁.
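To connect such a calculation to an actual fit, the sketch below fits a first-order model in one input (k = 2 terms: an intercept and a slope) by ordinary least squares, then computes SSE*, SST, and R² adjusted. It is a minimal illustration, not the method used to produce the numbers above; the function names are hypothetical, and the data in the usage note are made up, not the five data points of the example.

```python
def fit_first_order(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept
    return b0, b1


def summarize_first_order(x, y):
    """Return (SSE*, SST, R^2 adjusted) for a first-order model in x."""
    n, k = len(y), 2  # k = 2 terms: intercept and slope
    b0, b1 = fit_first_order(x, y)
    y_bar = sum(y) / n
    # SSE*: sum of squared residuals of the fitted model
    sse_star = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # SST: sum of squared deviations of the responses from their mean
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2_adj = 1.0 - (sse_star / (n - k)) / (sst / (n - 1))
    return sse_star, sst, r2_adj
```

For the hypothetical data x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5], the fit is y = 2.2 + 0.6x, giving SSE* ≈ 2.4, SST = 6.0, and R² adjusted ≈ 0.467 (up to floating-point rounding).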

The phrase “cross-validation” refers to efforts to evaluate prediction errors by reserving some of the data points solely for this purpose, i.e., holding out a set of data points that is used only for testing.
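Leave-one-out cross-validation is one common way to carry out this idea: each data point in turn serves as the test set, the model is refit on the remaining n − 1 points, and the squared prediction errors on the held-out points are summed. That sum is the PRESS (prediction error sum of squares) statistic listed among the four summary statistics at the start of this section. The sketch below is a minimal illustration assuming a first-order model in a single input; the function names are hypothetical.

```python
def fit_first_order(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def press_first_order(x, y):
    """PRESS via leave-one-out cross-validation: hold out point i,
    refit on the other n - 1 points, and accumulate the squared
    error of predicting the held-out response."""
    total = 0.0
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]  # all points except the i-th
        y_train = y[:i] + y[i + 1:]
        b0, b1 = fit_first_order(x_train, y_train)
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total
```

For the hypothetical data x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5], PRESS ≈ 7.28, which exceeds the in-sample sum of squared residuals (2.4) for the same data, because each prediction is made for a point the refitted model did not see.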