Use of R-squared in Regression Analysis
At the end of this blog you will find a practical summary of how to use, and what for, each of the three R-squared statistics typically used to assess the quality of a regression model.
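R2 is a measure of how accurately the (regression) model reproduces the actual data: it is the proportion of the total variation in the data that the model explains. It is therefore also related to the level of dispersion of the data around the fitted model.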
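-> TOTAL VARIATION: Difference between each actual value and the mean of all actual data points. The difference between the forecasted value and that same mean is the part of the total variation that the model explains.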
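As a minimal sketch of this idea (not taken from the original post), the snippet below fits a straight line to a small made-up data set and computes R2 as the share of the total variation that the model explains. The data values and the helper name `fit_line` are assumptions chosen only for illustration.

```python
def fit_line(x, y):
    """Least-squares straight line: y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Made-up illustrative data (an assumption, not from the post)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
y_mean = sum(y) / len(y)

# Total variation: actual values vs. the mean of the actual values
sst = sum((yi - y_mean) ** 2 for yi in y)
# Explained variation: forecasted values vs. the same mean
ssr = sum((fi - y_mean) ** 2 for fi in fitted)
# Unexplained variation: actual values vs. forecasted values
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

r2 = 1 - sse / sst
print(f"SST={sst:.3f}  SSR={ssr:.3f}  SSE={sse:.3f}  R2={r2:.3f}")
```

For a least-squares fit that includes a constant term, the total variation splits exactly into the explained and unexplained parts, so `ssr / sst` gives the same R2.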
HOW IT WORKS
R2 adjusted, written R2(adj), is similar to R2, but it takes into account the number of terms included in the model.
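R2(adj) = 1 - MSE / (SS Total / (N - 1))
Where MSE is the Mean Squared Error, that is, the unexplained (error) sum of squares divided by N-p, and SS Total is the total variation expressed as a sum of squares.
Where N is the total number of observations and p is the number of terms (predictors) included in the model.
N-p are the degrees of freedom of the error; for this to hold, p counts the constant term as well.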
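The definitions of MSE, N and p can be turned into a small calculation. This is only a sketch: the function name `adjusted_r_squared` and the example numbers are hypothetical, and it assumes that p counts the constant term, so that N-p are the error degrees of freedom.

```python
def adjusted_r_squared(sse, sst, n, p):
    """R2(adj) = 1 - MSE / (SST / (N - 1)).

    sse: error (unexplained) sum of squares
    sst: total sum of squares
    n:   number of observations
    p:   number of terms, assumed here to include the constant
    """
    mse = sse / (n - p)          # mean squared error
    mst = sst / (n - 1)          # total mean square
    return 1 - mse / mst


# Hypothetical figures, for illustration only
sse, sst, n, p = 20.0, 100.0, 25, 3

r2 = 1 - sse / sst
r2_adj = adjusted_r_squared(sse, sst, n, p)
print(f"R2 = {r2:.4f}, R2(adj) = {r2_adj:.4f}")
```

The same value can be obtained as `1 - (1 - r2) * (n - 1) / (n - p)`, which makes the penalty for extra terms easier to see.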
So R2(adj) takes into consideration the number of predictors included in the model. R2(adj) will increase (improve) only when adding a term reduces the MSE, that is, when the extra variation explained by the model outweighs the degree of freedom that the new term uses up. It will therefore only improve if including more terms in the model is worth it.
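To make the "worth it" idea concrete, here is a hedged sketch comparing a constant-only model with a model that adds one slope term, on two made-up data sets. The data, the function name `r2_and_adjusted` and the choice of a straight-line model are all assumptions for illustration. For the constant-only model both R2 and R2(adj) are 0 by construction, so any R2(adj) below 0 means the added term was not worth it.

```python
def r2_and_adjusted(x, y):
    """Fit y = a + b*x by least squares; return (R2, R2(adj)) with p = 2 terms."""
    n = len(x)
    p = 2  # constant + slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_mean) ** 2 for yi in y)
    r2 = 1 - sse / sst
    r2_adj = 1 - (sse / (n - p)) / (sst / (n - 1))
    return r2, r2_adj


x = [1, 2, 3, 4, 5, 6]
y_unrelated = [3, 1, 4, 1, 5, 2]              # made up: y barely depends on x
y_trend = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]    # made up: y clearly depends on x

r2_u, adj_u = r2_and_adjusted(x, y_unrelated)
r2_t, adj_t = r2_and_adjusted(x, y_trend)

# Baseline (constant-only model): R2 = 0 and R2(adj) = 0
print(f"unrelated x: R2 = {r2_u:.3f}, R2(adj) = {adj_u:.3f}")
print(f"real trend:  R2 = {r2_t:.3f}, R2(adj) = {adj_t:.3f}")
```

In the first case R2 still rises slightly above 0 when the slope is added, while R2(adj) falls below 0; in the second case both improve.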
PRESS = prediction sum of squares.
PRESS is the total squared error between each observation and the value fitted by the model without the ith observation, worked out for all observations:
PRESS = Σ e(i)^2, with e(i) = (Yi - Ŷi) / (1 - hi)
where e(i) is the error related to the fitted value when the ith observation is not included in the model, and Yi - Ŷi is the residual of the ith observation.
hi = leverage. It is the proportional contribution of the ith observation to the total sum of squares of the predictors (X direction).
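The sketch below illustrates PRESS on a small made-up data set, assuming a straight-line model. It computes PRESS twice: once with the leverage shortcut, and once by actually refitting the model without each observation. It then derives R2 predicted using the usual definition 1 - PRESS / SS Total, which is an assumption here because the post does not spell it out. All function names and data values are hypothetical.

```python
def fit_line(x, y):
    """Least-squares straight line: y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def press_with_leverage(x, y):
    """PRESS from ordinary residuals and leverages (no refitting)."""
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    intercept, slope = fit_line(x, y)
    total = 0.0
    for xi, yi in zip(x, y):
        residual = yi - (intercept + slope * xi)
        # Leverage of a point in a straight-line fit with a constant term
        h = 1 / n + (xi - x_mean) ** 2 / sxx
        total += (residual / (1 - h)) ** 2
    return total


def press_by_refitting(x, y):
    """PRESS by leaving each observation out and predicting it."""
    total = 0.0
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_rest, y_rest)
        total += (y[i] - (intercept + slope * x[i])) ** 2
    return total


# Made-up illustrative data (an assumption, not from the post)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

press_fast = press_with_leverage(x, y)
press_slow = press_by_refitting(x, y)

y_mean = sum(y) / len(y)
sst = sum((yi - y_mean) ** 2 for yi in y)
r2_pred = 1 - press_fast / sst

print(f"PRESS (leverage) = {press_fast:.4f}")
print(f"PRESS (refit)    = {press_slow:.4f}")
print(f"R2 predicted     = {r2_pred:.4f}")
```

Both routes give the same PRESS, which is why the leverage formula is convenient: the model only has to be fitted once. On this tiny data set R2 predicted comes out far below the ordinary R2, a hint that the fit would predict new observations poorly.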
Summary: R2, R2 adjusted, R2 predicted
• Use of R2: It indicates, in general, the % of variation which is explained by the regression model. It is used to assess the “quality” of the regression model.
• Use of R2 adjusted: Similar to R2, but it adds the effect of including more terms in the model. It is very useful in determining how many terms or variables to include in the model, and for comparing different models with different numbers of terms (complexity).
• Use of R2 predicted: It is useful for detecting models that are artificially adjusted to the raw data (overfitted), and for detecting models with data points which have a high influence over the model.