Saturday, November 5, 2016

Use of R-squared in Regression Analysis

At the end of this blog you will find a practical summary of how, and for what purpose, to use each of the three R-squared statistics typically used to assess the quality of a regression model.



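R2 is a measure of how well the regression model fits the actual data: it is the proportion of the variation in the data that the model explains. It therefore also reflects how widely the data are dispersed around the fitted model. It is built from three sums of squares:

$SS_{Regression} = \sum_i (\hat{Y}_i - \bar{Y})^2$ -> REGRESSION: difference between forecasted value and the mean of all actual data points.

$SS_{Error} = \sum_i (Y_i - \hat{Y}_i)^2$ -> ERROR (ALSO CALLED RESIDUAL): difference between actual value and forecasted value.

$SS_{Total} = \sum_i (Y_i - \bar{Y})^2$ -> TOTAL VARIATION: difference between actual value and the mean of all actual data points.

For a least-squares model with a constant term, $SS_{Total} = SS_{Regression} + SS_{Error}$, so

$R^2 = \frac{SS_{Regression}}{SS_{Total}} = 1 - \frac{SS_{Error}}{SS_{Total}}$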
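
As a rough illustration of the three sums of squares, here is a minimal Python sketch for a straight-line model. The data values and the helper name `fit_line` are made up for this example and do not come from any real study; only the standard library is used.

```python
# Illustrative data only (an assumption for this sketch): x is the predictor, y the response.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x (hypothetical helper)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]  # forecasted values
y_mean = sum(y) / len(y)                       # mean of all actual data points

# REGRESSION: forecasted value versus the mean of the actual data
ss_regression = sum((fi - y_mean) ** 2 for fi in fitted)
# ERROR (residual): actual value versus forecasted value
ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
# TOTAL VARIATION: actual value versus the mean of the actual data
ss_total = sum((yi - y_mean) ** 2 for yi in y)

r2 = ss_regression / ss_total
print("SS regression:", ss_regression)
print("SS error:     ", ss_error)
print("SS total:     ", ss_total)
print("R2:           ", r2)
```

With a constant term in the model, the regression and error sums of squares add up to the total variation, so computing R2 as 1 - SS error / SS total gives the same number.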

HOW IT WORKS



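R2 adjusted is similar to R2, but it takes into account the number of terms included in the model.

$R^2_{adj} = 1 - \frac{MSE}{SS_{Total} / (N - 1)}$

Where MSE is the Mean Squared Error and $SS_{Total}$ is the total variation (the sum of squared differences between each actual value and the mean of all actual data points).

$MSE = \frac{SS_{Error}}{N - p}$

Where $SS_{Error}$ is the sum of squared errors (residuals), N is the total number of observations and p is the number of terms included in the model.
N-p are the degrees of freedom of the error. For this to hold, p has to count the constant term along with the predictors.
So R2(adj) takes into consideration the number of predictors included in the model.
R2(adj) will increase (improve) only when adding a term lowers MSE, that is, when the drop in the error sum of squares more than offsets the degree of freedom lost. So it will only improve if including more terms in the model is worth it.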
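
To make the trade-off concrete, the sketch below compares two hypothetical models fitted to the same 20 observations. The sums of squares are invented numbers chosen for illustration, and the function names are assumptions of this example; p counts the constant term, so that N - p is the error degrees of freedom.

```python
def r_squared(ss_error, ss_total):
    """Plain R2 = 1 - SS error / SS total."""
    return 1.0 - ss_error / ss_total


def r_squared_adjusted(ss_error, ss_total, n, p):
    """R2(adj) = 1 - MSE / (SS total / (N - 1)), with MSE = SS error / (N - p)."""
    mse = ss_error / (n - p)
    return 1.0 - mse / (ss_total / (n - 1))


n = 20            # total number of observations (hypothetical)
ss_total = 100.0  # total variation (hypothetical)

# Model A: constant + 1 predictor (p = 2)
r2_a = r_squared(20.0, ss_total)
adj_a = r_squared_adjusted(20.0, ss_total, n, p=2)

# Model B: constant + 2 predictors (p = 3); the extra term barely reduces the error
r2_b = r_squared(19.5, ss_total)
adj_b = r_squared_adjusted(19.5, ss_total, n, p=3)

print("Model A: R2 =", round(r2_a, 4), " R2(adj) =", round(adj_a, 4))
print("Model B: R2 =", round(r2_b, 4), " R2(adj) =", round(adj_b, 4))
```

In this made-up case R2 goes up when the second predictor is added, while R2(adj) goes down, because the small reduction in error does not pay for the extra degree of freedom.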


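PRESS = prediction sum of squares

$PRESS = \sum_i (Y_i - \hat{Y}_{(i)})^2 = \sum_i \left( \frac{Y_i - \hat{Y}_i}{1 - h_i} \right)^2$

PRESS is the total squared error between each observation and the value fitted by the model without the ith observation, worked out for all observations.
$Y_i - \hat{Y}_{(i)}$ is not the same as $Y_i - \hat{Y}_i$: the first is the error related to the fitted value when the ith observation is not included in the model, and the second is the residual of the ith observation.
$h_i$ = Leverage. It is the proportional contribution of the ith observation to the total sum of squares of the predictors (X direction). For a model with a single predictor, $h_i = \frac{1}{N} + \frac{(X_i - \bar{X})^2}{\sum_j (X_j - \bar{X})^2}$.

R2 predicted is then obtained by using PRESS in place of the error sum of squares, with $SS_{Total}$ being the total variation of the actual data around their mean:

$R^2_{pred} = 1 - \frac{PRESS}{SS_{Total}}$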
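
The two ways of writing PRESS can be checked against each other numerically. The sketch below, for a straight-line model only, refits the model once per left-out observation and compares the result with the leverage shortcut. The data and the function names are assumptions made for this illustration.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x (hypothetical helper)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def press_by_refitting(x, y):
    """PRESS from the definition: drop observation i, refit, predict observation i."""
    total = 0.0
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_rest, y_rest)
        total += (y[i] - (intercept + slope * x[i])) ** 2
    return total


def press_by_leverage(x, y):
    """PRESS from the shortcut: ordinary residual divided by (1 - leverage)."""
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    intercept, slope = fit_line(x, y)
    total = 0.0
    for xi, yi in zip(x, y):
        h = 1.0 / n + (xi - x_mean) ** 2 / sxx  # leverage, single-predictor case
        residual = yi - (intercept + slope * xi)
        total += (residual / (1.0 - h)) ** 2
    return total


# Illustrative data only (an assumption for this sketch)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

press_refit = press_by_refitting(x, y)
press_lev = press_by_leverage(x, y)

y_mean = sum(y) / len(y)
ss_total = sum((yi - y_mean) ** 2 for yi in y)
intercept, slope = fit_line(x, y)
ss_error = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

r2 = 1.0 - ss_error / ss_total
r2_pred = 1.0 - press_lev / ss_total
print("PRESS (refit):   ", press_refit)
print("PRESS (leverage):", press_lev)
print("R2:", r2, " R2 predicted:", r2_pred)
```

Because every leverage lies between 0 and 1, each term of PRESS is at least as large as the corresponding squared residual, so R2 predicted can never exceed R2.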

Summary: R2 - R2 adjusted - R2 predicted

Use of R2: It indicates, in general, the % of variation which is explained by the regression model. It is used to assess the "quality" of the regression model.

Use of R2 adjusted: Similar to R2, but it adds the effect of the number of terms included in the model. It is very useful in determining how many terms or variables to include in the model, and for comparing models with different numbers of terms (complexity).

Use of R2 predicted: It is useful to detect overfitted models, that is, models artificially adjusted to the raw data that predict new observations poorly. It is also useful to detect models in which a few data points have a high influence over the model.
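
The summary can be tried out on a small made-up data set in which a single far-out point drives the fit. This is only a sketch for a straight-line model; the data, the function name `r2_summary` and the choice p = 2 (constant plus one predictor) are assumptions of the example.

```python
def r2_summary(x, y):
    """Return R2, R2 adjusted, R2 predicted and the leverages for y = a + b*x."""
    n = len(x)
    p = 2  # constant + one predictor
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    leverages = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]

    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    ss_error = sum(e ** 2 for e in residuals)
    press = sum((e / (1.0 - h)) ** 2 for e, h in zip(residuals, leverages))

    r2 = 1.0 - ss_error / ss_total
    r2_adj = 1.0 - (ss_error / (n - p)) / (ss_total / (n - 1))
    r2_pred = 1.0 - press / ss_total
    return r2, r2_adj, r2_pred, leverages


# Four clustered points with little trend, plus one distant point (illustrative data)
x = [1.0, 2.0, 3.0, 4.0, 20.0]
y = [2.0, 1.0, 3.0, 2.0, 12.0]

r2, r2_adj, r2_pred, leverages = r2_summary(x, y)
most_influential = leverages.index(max(leverages))

print("R2:          ", round(r2, 3))
print("R2 adjusted: ", round(r2_adj, 3))
print("R2 predicted:", round(r2_pred, 3))
print("Highest leverage at index", most_influential)
```

For this data R2 comes out at about 0.97 while R2 predicted drops to about 0.43: the line fits well only because the last point pulls it, which is the situation R2 predicted is meant to expose.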