Regression

More than you want to know about REGRESSION…

CORRELATION and REGRESSION are very similar, with one main difference: in correlation the variables have equal status, while in regression the focus is on predicting one variable from another.

  • Independent Variable = PREDICTOR Variable = X
  • Dependent Variable = CRITERION Variable = Y; the predicted value is Ŷ (Y hat) (Y is regressed on X) (Y is a function of X)
  • SIMPLE REGRESSION involves one predictor variable and one criterion variable.
  • MULTIPLE REGRESSION involves more than one predictor variable and one criterion variable.
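
A minimal sketch of both in Python (statsmodels is assumed available; the data and variable names are made up for illustration):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=50)                              # predictor 1 (X)
    x2 = rng.normal(size=50)                              # predictor 2
    y = 2 + 0.5 * x1 + 0.3 * x2 + rng.normal(size=50)     # criterion (Y)

    # SIMPLE REGRESSION: one predictor, one criterion
    simple = sm.OLS(y, sm.add_constant(x1)).fit()
    # MULTIPLE REGRESSION: more than one predictor, one criterion
    multiple = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
    print(simple.params)     # Constant and one b weight
    print(multiple.params)   # Constant and two b weights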

Two Common Types of Multiple Regression

  • STEPWISE MULTIPLE REGRESSION– the computer decides the order in which the predictors enter the equation. The predictor with the largest correlation with the criterion enters the regression formula first, then the next largest, and so on.

  • HIERARCHICAL MULTIPLE REGRESSION– the researcher selects the order in which the predictor variables enter the equation. (A sketch of both entry styles follows this list.)
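
A minimal sketch of the two entry styles (again assuming statsmodels; the data and names are hypothetical):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "gpa":   rng.normal(3.0, 0.4, 100),    # criterion
        "sat":   rng.normal(1100, 150, 100),   # predictor
        "hours": rng.normal(10, 3, 100),       # predictor
    })

    # STEPWISE-style order: rank predictors by correlation with the criterion
    order = df[["sat", "hours"]].corrwith(df["gpa"]).abs().sort_values(ascending=False)
    print(order)   # stepwise would enter predictors in this order

    # HIERARCHICAL: the researcher fixes the order (here SAT first, hours second)
    step1 = sm.OLS(df["gpa"], sm.add_constant(df[["sat"]])).fit()
    step2 = sm.OLS(df["gpa"], sm.add_constant(df[["sat", "hours"]])).fit()
    print(step1.rsquared, step2.rsquared)          # R² at each step
    print(step2.rsquared - step1.rsquared)         # change in R² for step 2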

The research question for regression is: To what extent and in what manner do the predictors explain variation in the criterion?

  • to what extent– H0: R² = 0
  • in what manner– H0: β = 0
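
These two hypotheses map onto the two tests printed by most regression programs: an F test for the equation as a whole and a t test for each coefficient (with n observations and p predictors):

    F = (R² / p) / ((1 − R²) / (n − p − 1))     with df = (p, n − p − 1)
    t = b / SE(b)                               one test per predictor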

EXPLAINED (REGRESSION) is the difference between the mean of Y and the predicted Y.
ERROR (RESIDUAL) is the difference between the predicted Y (Ŷ, called Y hat or Y prime) and the observed Y.
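
In symbols, the total variation in Y splits into exactly these two pieces (the standard sum-of-squares decomposition):

    Σ(Y − Ȳ)²   =   Σ(Ŷ − Ȳ)²   +   Σ(Y − Ŷ)²
      TOTAL         REGRESSION       RESIDUAL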

STANDARD ERROR OF ESTIMATE– the square root of the average squared residual (how far scores fall from the regression line); it is the standard deviation of the residuals (obtained score minus predicted score).
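
In symbols (with n observations and p predictors; for simple regression p = 1, so the denominator is n − 2):

    SE(est) = √( Σ(Y − Ŷ)² / (n − p − 1) )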

MULTIPLE R SQUARE– The proportion of variation in the criterion variable that can be predicted (accounted for) by the set of predictor variables.
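
As a formula, using the sums of squares above:

    R² = SS(REGRESSION) / SS(TOTAL) = 1 − SS(RESIDUAL) / SS(TOTAL)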

ADJUSTED R SQUARE– Because the equation created with one sample will be used with a similar, although not identical, population, there is some SHRINKAGE in the amount of variation that can be explained with the new population.
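
One common shrinkage correction (the form most packages report, with n observations and p predictors):

    Adjusted R² = 1 − (1 − R²) × (n − 1) / (n − p − 1)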

b weights (REGRESSION COEFFICIENTS) can’t be used to compare the relative importance of the predictors because the b weights depend on the measurement scale of each predictor. They can be used to compare different samples from the same population. A b weight represents how much the criterion variable increases for a one-unit increase in the predictor variable. The regression coefficients and the Constant are used to write the REGRESSION EQUATION.
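
For simple regression, the b weight and the Constant can be written directly from descriptive statistics:

    b = r × (sY / sX)        a (Constant) = Ȳ − b × X̄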

Beta weights (BETA COEFFICIENTS, a.k.a. STANDARDIZED PARTIAL REGRESSION COEFFICIENTS) are used to judge the relative importance of predictor variables, but they should not be used to compare one sample to another because they are influenced by changes in the standard deviations. In simple regression the beta weight equals the correlation (r). Beta weights are used to write the STANDARDIZED REGRESSION EQUATION.
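
The standard conversion between the two kinds of weights (for predictor j):

    βj = bj × (sXj / sY)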

CHANGE IN R SQUARE– the increase in R² when a predictor (or block of predictors) enters the equation; for a single predictor it equals that predictor’s squared semi-partial correlation with the criterion.
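
In symbols, when a single predictor enters:

    ΔR² = R²(full) − R²(reduced) = sr²    (sr = the semi-partial correlation of that predictor)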

CONSTANT– Y intercept

INDEPENDENCE– the X variables are not the same variable or closely related to one another.

HOMOSCEDASTICITY– the variation of the observed Y scores above and below the regression line is similar all along the regression line.

MULTICOLLINEARITY– the predictor variables are highly correlated with each other. This results in unstable beta weights that cannot be trusted. Multicollinearity is tested with TOLERANCE. A LOW tolerance indicates a lot of multicollinearity (a high tolerance indicates little). Tolerances above .70 are good.
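
A minimal sketch of checking tolerance, computed as the reciprocal of the variance inflation factor (statsmodels assumed installed; the data and names are hypothetical):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(1)
    df = pd.DataFrame({
        "sat":   rng.normal(1100, 150, 100),
        "hours": rng.normal(10, 3, 100),
    })
    X = sm.add_constant(df)
    for i, name in enumerate(X.columns):
        if name != "const":
            # tolerance = 1 / VIF; low values signal multicollinearity
            print(name, 1.0 / variance_inflation_factor(X.values, i))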

N:P– the ratio of observations (N) to predictor variables (P). A 40:1 ratio is recommended for stepwise regression and 20:1 for hierarchical regression.

Ŷ = a + bX (where Ŷ is the predicted score, a is the Y-axis intercept of the regression line, and b is the slope of the regression line)
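
A quick worked example with made-up numbers: if a = 2 and b = 0.5, then a case with X = 10 has a predicted score of Ŷ = 2 + 0.5(10) = 7.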