Regression with Excel / Multiple Regression with Excel |
A guide to regression analysis using MS Excel. In this section we will describe how to use the Excel™ to carry out some of the procedures we have described in the book. While this approach uses Excel 2003, I don't think that the regression procedure has changed between different versions. To do regression in Excel, you need the Analysis Toolpak add-in to be installed in Excel. This was an option when you installed Excel, but you might not have selected it. If you didn't install it, Excel will ask you for the CD, when you try to add the toolpak.Check that the add-in is installed, and added-in, by choosing Add-ins from the tools menu (as shown below). Then ensure that "Analysis ToolPak" is selected, as shown below. |
Regression Statistics
| Multiple R | 0.807582 |
| R Square | 0.652189 |
| Adjusted R Square | 0.623205 |
| Standard Error | 6.686129 |
| Observations | 40 |
| The first part of the output is the regression statistics. These are standard statistics which are given by most programs. |
ANOVA
| ANOVA | |||||
| df | SS | MS | F | Significance F | |
| Regression | 3 | 3017.745 | 1005.915 | 22.50151 | 2.22E-08 |
| Residual | 36 | 1609.355 | 44.70432 | ||
| Total | 39 | 4627.1 |
| The ANOVA table comes next. This gives a test of significance of the R2. Note that Excel uses scientific notation, by default, so when it says 2.22E-08 it means, 2.22 * 10-8 . (i.e. 0.0000000222). |
Coefficients
| Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% | |
|
Intercept
|
12.39
|
3.90
|
3.18
|
0.00
|
4.48
|
20.29
|
4.48
|
20.29
|
|
HASSLES
|
-0.06
|
0.34
|
-0.19
|
0.85
|
-0.75
|
0.62
|
-0.75
|
0.62
|
|
HASSLES2
|
0.00
|
0.01
|
0.06
|
0.95
|
-0.02
|
0.02
|
-0.02
|
0.02
|
|
HASSLES3
|
0.00
|
0.00
|
0.51
|
0.61
|
0.00
|
0.00
|
0.00
|
0.00
|
| The next stage is the coefficients. Note that here I have converted the numbers to 2 decimal places to save space). It gives the coefficient for each parameter, including the intercept (the constant). The standard errors, and the t-values follow (the t-value is the coefficient divided by the standard error). Next comes the p-value associated with the variable, and the confidence intervals of the parameter estimates (Excel gave these to me twice, even though I didn't ask for them.) |
Residuals
|
Observation
|
Predicted ANX
|
Residuals
|
Standard Residuals
|
|
1.00
|
12.08
|
-2.08
|
-0.32
|
|
2.00
|
11.82
|
0.18
|
0.03
|
|
3.00
|
15.98
|
5.02
|
0.78
|
|
4.00
|
29.86
|
-13.86
|
-2.16
|
|
5.00
|
28.56
|
-1.56
|
-0.24
|
|
6.00
|
34.16
|
-4.16
|
-0.65
|
|
7.00
|
12.32
|
-3.32
|
-0.52
|
|
8.00
|
12.37
|
-5.37
|
-0.84
|
|
9.00
|
27.31
|
4.69
|
0.73
|
|
10.00
|
15.64
|
-4.64
|
-0.72
|
| The final part of the output is the residual information. The observation in the left had column is the case number - although Excel never told us about this, it has labelled the first person Observation 1, the second Observation 2, etc. (Note that this is NOT the original row number - Observation 1 was row 2). The predicted anxiety score is the score that was predicted from the regression equation. The residual is the raw residual - that is the difference between the predicted score and the actual score on the dependent variable. The final value is the standardised residual (the residuals adjusted to ensure that they have a standard deviation of 1; they have a mean of zero already). |
| Graphs Finally we will have a quick look at the graphs. The first graph is an example of the residual plots - it has hassles on the x-axis and the unstandardised residual on the y-axis. The second graphs shows the predicted and actual anxiety scores plotted against hassles3. |
| A Note to End On I have written this file which shows you how to do regression in Excel, but this does not mean that I think that you should be doing regression in Excel. Regression in Excel has a number of shortcomings, which include:
|