The leadpol dataset is stored in a text file formatted as follows:
#Source: leadpol.txt
# lead = lead content in tree bark
# traffic = traffic volume/day
lead traffic
227 8.3
312 8.3
362 12.1
521 12.1
640 17
539 17
728 17
945 24.3
738 24.3
759 24.3
1263 33.6
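Read the data into R using read.table(). The argument skip=3 skips the three comment lines at the top of the file, and header=TRUE takes the variable names from the line that follows them.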
leadpol <- read.table("http://www.stats.uwo.ca/faculty/aim/2017/3859/data/leadpol.txt",
skip=3, header=TRUE)
leadpol
## lead traffic
## 1 227 8.3
## 2 312 8.3
## 3 362 12.1
## 4 521 12.1
## 5 640 17.0
## 6 539 17.0
## 7 728 17.0
## 8 945 24.3
## 9 738 24.3
## 10 759 24.3
## 11 1263 33.6
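If the URL is unavailable, the same data frame can be built from the file listing itself. The following is a minimal sketch, assuming the file contents are exactly as listed at the start of this section; the name leadpol_txt is introduced here for illustration only.

```r
# File contents copied from the listing at the start of this section.
# leadpol_txt is a hypothetical name used only in this sketch.
leadpol_txt <- "#Source: leadpol.txt
# lead = lead content in tree bark
# traffic = traffic volume/day
lead traffic
227 8.3
312 8.3
362 12.1
521 12.1
640 17
539 17
728 17
945 24.3
738 24.3
759 24.3
1263 33.6"

# text= makes read.table() parse the string instead of a file or URL
leadpol <- read.table(text = leadpol_txt, skip = 3, header = TRUE)
str(leadpol)
```

Using the text argument keeps the call otherwise identical to the one that reads from the URL.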
A simple scatterplot suggests a linear relationship.
with(leadpol, plot(traffic, lead))
Fit a simple linear regression and print a brief summary.
ans <- lm(lead ~ traffic, data=leadpol)
ans
##
## Call:
## lm(formula = lead ~ traffic, data = leadpol)
##
## Coefficients:
## (Intercept) traffic
## -12.84 36.18
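The estimates printed by lm() can be reproduced from the usual least-squares formulas, \(\hat\beta_1 = S_{xy}/S_{xx}\) and \(\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}\). The sketch below enters the data directly so that it runs on its own; the names x, y, b0 and b1 are illustrative and are not part of the original analysis.

```r
# Data entered directly from the listing so the block is self-contained
leadpol <- data.frame(
  lead    = c(227, 312, 362, 521, 640, 539, 728, 945, 738, 759, 1263),
  traffic = c(8.3, 8.3, 12.1, 12.1, 17, 17, 17, 24.3, 24.3, 24.3, 33.6)
)

x <- leadpol$traffic
y <- leadpol$lead

# Slope: sum of cross-products over sum of squares of x
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# Intercept: the fitted line passes through the point of means
b0 <- mean(y) - b1 * mean(x)

c(intercept = b0, slope = b1)

# Compare with the coefficients reported by lm()
coef(lm(lead ~ traffic, data = leadpol))
```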
Here is a more detailed summary. We see that \(R^2 = 0.914\), so the regression explains 91.4% of the variation in lead content. The model may be useful, but only if its assumptions hold. Diagnostic checking is therefore an essential part of statistical model construction: if the assumptions are empirically false, conclusions drawn from the fitted model may be wrong.
summary(ans)
##
## Call:
## lm(formula = lead ~ traffic, data = leadpol)
##
## Residuals:
## Min 1Q Median 3Q Max
## -128.43 -63.13 24.52 69.32 125.72
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -12.842 72.143 -0.178 0.863
## traffic 36.184 3.693 9.798 4.24e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 92.19 on 9 degrees of freedom
## Multiple R-squared: 0.9143, Adjusted R-squared: 0.9048
## F-statistic: 96.01 on 1 and 9 DF, p-value: 4.239e-06
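To make the meaning of \(R^2\) and the residual standard error concrete, both can be recomputed from the residual and total sums of squares. This is a minimal sketch with the data entered directly; the names rss, tss, r2 and rse are illustrative assumptions, not objects from the original analysis.

```r
# Data entered directly from the listing so the block is self-contained
leadpol <- data.frame(
  lead    = c(227, 312, 362, 521, 640, 539, 728, 945, 738, 759, 1263),
  traffic = c(8.3, 8.3, 12.1, 12.1, 17, 17, 17, 24.3, 24.3, 24.3, 33.6)
)
ans <- lm(lead ~ traffic, data = leadpol)

rss <- sum(resid(ans)^2)                           # residual sum of squares
tss <- sum((leadpol$lead - mean(leadpol$lead))^2)  # total sum of squares

r2  <- 1 - rss / tss                   # proportion of variation explained
rse <- sqrt(rss / df.residual(ans))    # residual standard error (9 df here)

c(r2 = r2, r2_from_summary = summary(ans)$r.squared)
c(rse = rse, rse_from_summary = summary(ans)$sigma)
```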
In simple linear regression, a basic diagnostic plot comparing the data with the fitted line is useful. We look for systematic departures from the fit, including outliers, bias, and heteroscedasticity (non-constant variance). The plot below looks reasonable, so in this simple situation we may conclude that the regression model appears to be valid.
with(leadpol, plot(traffic, lead, pch=19, cex=1.5, col="blue"))
abline(reg=ans, col="magenta")
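The fitted-line plot can be complemented by a plot of residuals against fitted values, which tends to make curvature and non-constant variance easier to see. This is a supplementary sketch using base R's resid() and fitted(), not part of the original analysis; the data are entered directly so that the block runs on its own, and res is an illustrative name.

```r
# Data entered directly from the listing so the block is self-contained
leadpol <- data.frame(
  lead    = c(227, 312, 362, 521, 640, 539, 728, 945, 738, 759, 1263),
  traffic = c(8.3, 8.3, 12.1, 12.1, 17, 17, 17, 24.3, 24.3, 24.3, 33.6)
)
ans <- lm(lead ~ traffic, data = leadpol)

res <- resid(ans)  # observed minus fitted lead values

# Residuals against fitted values, with a reference line at zero
plot(fitted(ans), res, pch = 19, col = "blue",
     xlab = "Fitted lead", ylab = "Residual")
abline(h = 0, lty = 2)
```

A patternless scatter around the zero line is consistent with the model assumptions, although with only 11 observations such a plot gives no more than a rough check.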