The leadpol dataset is stored in a plain-text file formatted as follows:

```
#Source: leadpol.txt
# lead = lead content in tree bark
# traffic = traffic volume/day
lead traffic
227 8.3
312 8.3
362 12.1
521 12.1
640 17
539 17
728 17
945 24.3
738 24.3
759 24.3
1263 33.6
```

Read the data into R using **read.table()**.

```
# skip=3 drops the three comment lines; header=TRUE takes the column names from the next line
leadpol <- read.table("http://www.stats.uwo.ca/faculty/aim/2017/3859/data/leadpol.txt",
                      skip=3, header=TRUE)
leadpol
```

```
##    lead traffic
## 1   227     8.3
## 2   312     8.3
## 3   362    12.1
## 4   521    12.1
## 5   640    17.0
## 6   539    17.0
## 7   728    17.0
## 8   945    24.3
## 9   738    24.3
## 10  759    24.3
## 11 1263    33.6
```
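If the course URL is no longer reachable, the same data frame can be built from the file contents pasted into the script. This is a minimal sketch that relies on `read.table()`'s default `comment.char = "#"`, which skips comment-only lines, so `skip=` is not needed; the name `leadpol_txt` is just an illustrative variable holding the text shown above.

```r
# File contents copied from the listing above (assumed identical to leadpol.txt)
leadpol_txt <- "#Source: leadpol.txt
# lead = lead content in tree bark
# traffic = traffic volume/day
lead traffic
227 8.3
312 8.3
362 12.1
521 12.1
640 17
539 17
728 17
945 24.3
738 24.3
759 24.3
1263 33.6"

# Lines starting with "#" are treated as comments and skipped;
# "lead traffic" becomes the header row
leadpol <- read.table(text = leadpol_txt, header = TRUE)
str(leadpol)
```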

A simple scatterplot suggests a linear relationship.

```
with(leadpol, plot(traffic, lead))
```
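The visual impression of linearity can be backed with a single number, the Pearson correlation between traffic and lead. In simple linear regression its square equals the \(R^2\) of the least-squares fit. In this sketch the data are typed in directly (values copied from the listing above) so it runs without the download.

```r
# Data re-entered from the leadpol listing so the snippet is self-contained
leadpol <- data.frame(
  lead    = c(227, 312, 362, 521, 640, 539, 728, 945, 738, 759, 1263),
  traffic = c(8.3, 8.3, 12.1, 12.1, 17, 17, 17, 24.3, 24.3, 24.3, 33.6)
)

# Strength of the linear association between traffic and lead
r <- with(leadpol, cor(traffic, lead))
r     # correlation coefficient
r^2   # equals R-squared of the simple linear regression
```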

Fit a *simple linear regression* and print a brief summary.

```
ans <- lm(lead ~ traffic, data=leadpol)
ans
```

```
##
## Call:
## lm(formula = lead ~ traffic, data = leadpol)
##
## Coefficients:
## (Intercept)      traffic  
##      -12.84        36.18  
```
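For a single predictor, the least-squares coefficients have a closed form: the slope is \(\hat\beta_1 = s_{xy}/s_x^2\) (sample covariance over sample variance of traffic) and the intercept is \(\hat\beta_0 = \bar y - \hat\beta_1 \bar x\). The sketch below computes them directly and checks them against `lm()`; the variable names `b0` and `b1` are illustrative.

```r
leadpol <- data.frame(
  lead    = c(227, 312, 362, 521, 640, 539, 728, 945, 738, 759, 1263),
  traffic = c(8.3, 8.3, 12.1, 12.1, 17, 17, 17, 24.3, 24.3, 24.3, 33.6)
)

# Closed-form least-squares estimates for y = b0 + b1 * x
b1 <- with(leadpol, cov(traffic, lead) / var(traffic))   # slope
b0 <- with(leadpol, mean(lead) - b1 * mean(traffic))     # intercept
c(intercept = b0, slope = b1)

# Should agree with the coefficients returned by lm()
all.equal(unname(coef(lm(lead ~ traffic, data = leadpol))), c(b0, b1))
```

The slope says that each additional unit of daily traffic volume is associated with roughly 36 more units of lead in the bark.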

Here is a more detailed summary. We see that \(R^2 = 0.914\), so the regression explains 91.4% of the variation in lead content. It may be a useful model, *but only if the model assumptions are correct*. Diagnostic checking is an essential step in building a statistical model: if the assumptions are empirically false, conclusions drawn from the fitted model *may* be wrong.

```
summary(ans)
```

```
##
## Call:
## lm(formula = lead ~ traffic, data = leadpol)
##
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -128.43  -63.13   24.52   69.32  125.72 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -12.842     72.143  -0.178    0.863    
## traffic       36.184      3.693   9.798 4.24e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 92.19 on 9 degrees of freedom
## Multiple R-squared:  0.9143, Adjusted R-squared:  0.9048 
## F-statistic: 96.01 on 1 and 9 DF,  p-value: 4.239e-06
```
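Several numbers in this summary follow from the residual and total sums of squares: \(R^2 = 1 - \text{SSE}/\text{SST}\), the residual standard error is \(\hat\sigma = \sqrt{\text{SSE}/(n-2)}\) on \(n - 2 = 9\) degrees of freedom, and the slope's standard error is \(\hat\sigma/\sqrt{S_{xx}}\). The following sketch reproduces them by hand as a check on the printed output; the intermediate names (`sse`, `sst`, `sxx`) are illustrative.

```r
leadpol <- data.frame(
  lead    = c(227, 312, 362, 521, 640, 539, 728, 945, 738, 759, 1263),
  traffic = c(8.3, 8.3, 12.1, 12.1, 17, 17, 17, 24.3, 24.3, 24.3, 33.6)
)
ans <- lm(lead ~ traffic, data = leadpol)

n   <- nrow(leadpol)
sse <- sum(resid(ans)^2)                                  # residual sum of squares
sst <- sum((leadpol$lead - mean(leadpol$lead))^2)         # total sum of squares
sxx <- sum((leadpol$traffic - mean(leadpol$traffic))^2)   # spread of the predictor

r2    <- 1 - sse / sst                    # proportion of variation explained
sigma <- sqrt(sse / (n - 2))              # residual standard error, 9 df
se_b1 <- sigma / sqrt(sxx)                # standard error of the slope
t_b1  <- coef(ans)[["traffic"]] / se_b1   # t statistic for the slope

round(c(R2 = r2, sigma = sigma, se_slope = se_b1, t_slope = t_b1), 4)
```

With a single predictor, the F statistic is the square of the slope's t value (\(9.798^2 \approx 96.0\)), which is why both tests give the same p-value.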

In simple linear regression, a basic diagnostic plot comparing the data with the fitted line is useful. We look for systematic departures from the fit, including outliers, bias and heteroscedasticity (non-constant variance). The plot below looks reasonable, so in this simple situation we may conclude that the regression model appears to be adequate.

```
# Observed data as filled blue points
with(leadpol, plot(traffic, lead, pch=19, cex=1.5, col="blue"))
# Overlay the fitted least-squares line
abline(reg=ans, col="magenta")
```
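One common complement to the data-plus-line plot, sketched here as an option rather than the approach used above, is to plot the residuals directly: a residual-versus-fitted plot makes curvature (bias) and changing spread (heteroscedasticity) easier to see, and a normal Q-Q plot checks the normality assumption. With only 11 observations these plots can suggest problems but cannot rule them out.

```r
leadpol <- data.frame(
  lead    = c(227, 312, 362, 521, 640, 539, 728, 945, 738, 759, 1263),
  traffic = c(8.3, 8.3, 12.1, 12.1, 17, 17, 17, 24.3, 24.3, 24.3, 33.6)
)
ans <- lm(lead ~ traffic, data = leadpol)

op <- par(mfrow = c(1, 2))   # two panels side by side

# Residuals vs fitted: look for curvature or a funnel shape around zero
plot(fitted(ans), resid(ans), pch = 19,
     xlab = "Fitted values", ylab = "Residuals", main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# Normal Q-Q plot of the residuals: points should lie near the line
qqnorm(resid(ans), pch = 19)
qqline(resid(ans))

par(op)   # restore the previous graphics settings
```

Calling `plot(ans)` on an `lm` object produces R's standard set of regression diagnostic plots, which includes these two.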