This document demonstrates various linear regression techniques using the Boston and Carseats datasets. We will explore simple linear regression, multiple linear regression, interaction terms, and nonlinear terms. The tutorial will also cover plotting regression results and interpreting qualitative predictors.
2 Setup
First, we install and load the required packages and the two datasets used throughout: Boston (housing data) and Carseats (child car seat sales).
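For example, a minimal setup sketch (assuming the MASS package for the Boston data and the ISLR package for the Carseats data, installed beforehand with install.packages(c("MASS", "ISLR"))) might look like this:

# Load the packages that provide the two datasets
library(MASS)    # Boston housing data: medv, lstat, age, ...
library(ISLR)    # Carseats sales data: Sales, Price, ShelveLoc, ...
data(Boston)
attach(Boston)   # so that medv, lstat and age can be referenced directly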
We explore interaction terms and nonlinear relationships by adding an interaction between lstat and age, and then a squared term in lstat. In an R formula, lstat * age is shorthand for lstat + age + lstat:age, so both main effects are retained alongside the interaction.
fit5 = lm(medv ~ lstat * age, data = Boston)
summary(fit5)
Call:
lm(formula = medv ~ lstat * age, data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.806  -4.045  -1.333   2.085  27.552 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 36.0885359  1.4698355  24.553  < 2e-16 ***
lstat       -1.3921168  0.1674555  -8.313 8.78e-16 ***
age         -0.0007209  0.0198792  -0.036   0.9711    
lstat:age    0.0041560  0.0018518   2.244   0.0252 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.149 on 502 degrees of freedom
Multiple R-squared:  0.5557,    Adjusted R-squared:  0.5531 
F-statistic: 209.3 on 3 and 502 DF,  p-value: < 2.2e-16
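If only the point estimates or their confidence intervals are needed, they can be pulled out of the fitted model directly; a small sketch using the standard accessor functions (not part of the original output):

coef(fit5)      # coefficient estimates only
confint(fit5)   # 95% confidence intervals for each coefficient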
fit6 = lm(medv ~ lstat + I(lstat^2), data = Boston)
summary(fit6)
Call:
lm(formula = medv ~ lstat + I(lstat^2), data = Boston)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.2834  -3.8313  -0.5295   2.3095  25.4148 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 42.862007   0.872084   49.15   <2e-16 ***
lstat       -2.332821   0.123803  -18.84   <2e-16 ***
I(lstat^2)   0.043547   0.003745   11.63   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.524 on 503 degrees of freedom
Multiple R-squared:  0.6407,    Adjusted R-squared:  0.6393 
F-statistic: 448.5 on 2 and 503 DF,  p-value: < 2.2e-16
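To check whether the quadratic term is a worthwhile addition, the quadratic model can be compared with a straight-line fit using an F-test via anova(). A minimal sketch, assuming a baseline model fit_lin that is not fitted elsewhere in this document:

fit_lin = lm(medv ~ lstat, data = Boston)   # baseline: linear in lstat only
anova(fit_lin, fit6)                        # F-test comparing the nested models

A small p-value from this comparison indicates that the model containing lstat^2 fits significantly better than the purely linear model.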
# Plot medv against lstat, then overlay fitted values from the quadratic
# model (fit6) and from a fourth-degree polynomial model (fit7)
plot(medv ~ lstat, pch = 20, col = "forestgreen")
points(lstat, fitted(fit6), col = "firebrick", pch = 20)
fit7 = lm(medv ~ poly(lstat, 4), data = Boston)
points(lstat, fitted(fit7), col = "steelblue", pch = 20)
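Because fitted() returns one fitted value per observation, the overlays above appear as clouds of points. If a smooth curve is preferred, one option (a sketch, not part of the original code; it assumes Boston is attached) is to predict over an ordered grid of lstat values and draw the result with lines():

lstat_grid = seq(min(lstat), max(lstat), length.out = 200)    # ordered grid of lstat values
preds = predict(fit7, newdata = data.frame(lstat = lstat_grid))
lines(lstat_grid, preds, col = "steelblue", lwd = 2)          # smooth degree-4 polynomial curve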
9 Qualitative Predictors and Interaction Terms
For qualitative predictors, we turn to the Carseats dataset, in which variables such as ShelveLoc (shelf location), Urban and US are factors, and explore how they affect Sales.
## Qualitative Predictors and Interaction Terms
# Load the Carseats dataset
data(Carseats)
attach(Carseats)
# Summarise the variables (ShelveLoc, Urban and US are factors)
summary(Carseats)
     Sales          CompPrice       Income        Advertising    
 Min.   : 0.000   Min.   : 77   Min.   : 21.00   Min.   : 0.000  
 1st Qu.: 5.390   1st Qu.:115   1st Qu.: 42.75   1st Qu.: 0.000  
 Median : 7.490   Median :125   Median : 69.00   Median : 5.000  
 Mean   : 7.496   Mean   :125   Mean   : 68.66   Mean   : 6.635  
 3rd Qu.: 9.320   3rd Qu.:135   3rd Qu.: 91.00   3rd Qu.:12.000  
 Max.   :16.270   Max.   :175   Max.   :120.00   Max.   :29.000  
   Population        Price        ShelveLoc        Age          Education   
 Min.   : 10.0   Min.   : 24.0   Bad   : 96   Min.   :25.00   Min.   :10.0  
 1st Qu.:139.0   1st Qu.:100.0   Good  : 85   1st Qu.:39.75   1st Qu.:12.0  
 Median :272.0   Median :117.0   Medium:219   Median :54.50   Median :14.0  
 Mean   :264.8   Mean   :115.8                Mean   :53.32   Mean   :13.9  
 3rd Qu.:398.5   3rd Qu.:131.0                3rd Qu.:66.00   3rd Qu.:16.0  
 Max.   :509.0   Max.   :191.0                Max.   :80.00   Max.   :18.0  
 Urban       US     
 No :118   No :142  
 Yes:282   Yes:258  
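ShelveLoc, Urban and US are factors, which R encodes with dummy variables automatically inside lm(). As an illustration of the kind of model this section is concerned with (a sketch, not output reproduced from the original analysis), we could regress Sales on all predictors plus two interaction terms, and inspect the dummy coding of ShelveLoc with contrasts():

fit_cs = lm(Sales ~ . + Income:Advertising + Price:Age, data = Carseats)   # hypothetical example model
summary(fit_cs)
contrasts(Carseats$ShelveLoc)   # shows how the three shelf locations are dummy-coded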
Finally, we create a custom function to simplify plotting simple regression fits. The ... argument lets us pass any additional plotting options (colour, point style, axis labels) straight through to plot().
# regplot: scatterplot x against y and overlay the least-squares line
regplot = function(x, y, ...) {
  fit = lm(y ~ x)                 # fit the simple regression of y on x
  plot(x, y, ...)                 # scatterplot, forwarding extra options to plot()
  abline(fit, col = "firebrick")  # add the fitted regression line
}
regplot(Price, Sales, xlab = "Price", ylab = "Sales", col = "steelblue", pch = 20)
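Because regplot fits y ~ x internally and forwards ... to plot(), the same call works for any pair of numeric variables; for instance (assuming Boston is attached), regplot(lstat, medv, xlab = "lstat", ylab = "medv", pch = 20) would draw the corresponding scatterplot and regression line.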
11 Conclusion
In this lab, we have covered the basics of linear regression, including simple and multiple regression models, interaction terms, and nonlinear relationships. We also explored working with qualitative predictors and developed a custom plotting function for regression analysis.