Illegal engine oil dumping is an important source of marine pollution, as the pie chart shows.
Pie chart. Causes of Marine Pollution
For a consulting project with Environment Canada, I was given spectroscopy analyses of \(n=17\) samples, labelled a, b, …, q, taken from different oil slicks caused by illegal engine oil dumping off the East Coast of Canada. Each analysis comprised \(p=173\) normalized ion concentrations. These normalized ion measurements act like a spectrographic fingerprint for the vessel that dumped the used engine oil.
The basic problem is this: given 17 oil slicks, determine how many different ships are involved. This is a clustering problem with \(n=17\) samples and \(p=173\) variables. Notice that this is a wide data problem since \(p \gg n\).
The data was provided in a csv-file with rows corresponding to variables and columns corresponding to observations. Note that this is the transpose of the usual data matrix. After reading the file into R and taking the transpose, we find that 89 of the 173 variables have an interquartile range of zero, \({\rm IQR} = 0\), so these variables were discarded. The remaining data matrix is 17-by-84. Here is a display of the first six rows and columns of this data matrix:
## Ion1 Ion2 Ion3 Ion4 Ion5 Ion6
## a 0.00 0.00 0.00 6.60 7.06 2.42
## b 1.04 3.14 41.02 100.00 50.56 5.50
## c 2.12 0.00 1.10 14.63 10.78 2.78
## d 1.45 0.00 9.83 32.20 25.11 3.86
## e 0.00 0.00 0.00 23.74 14.63 0.00
## f 20.61 64.86 100.00 87.23 60.76 5.18
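The filtering step described above can be sketched in a few lines of R. This is a minimal illustration on a small made-up matrix, not the original script: the object names `raw`, `X`, `iqr` and `X_keep` are hypothetical, and the toy values only mimic the layout of the csv-file (variables in rows, samples in columns).

```r
# Toy stand-in for the csv-file: rows are variables (ions), columns are samples.
# The values are invented for illustration; they are not the oil-slick data.
# With the real file one would start from read.csv() instead (layout assumed).
raw <- rbind(
  Ion1 = c(0.0, 1.0, 2.1, 1.4, 0.0, 20.6),
  Ion2 = c(0.0, 0.0, 0.0, 0.0, 0.0, 64.9),   # mostly zero, so its IQR is 0
  Ion3 = c(0.0, 41.0, 1.1, 9.8, 0.0, 100.0),
  Ion4 = rep(0, 6)                            # constant, so its IQR is 0
)
colnames(raw) <- letters[1:6]

# Transpose so that samples are in rows and variables in columns.
X <- t(raw)

# Interquartile range of each variable (column).
iqr <- apply(X, 2, IQR)

# Keep only the variables whose IQR is strictly positive.
X_keep <- X[, iqr > 0, drop = FALSE]
dim(X_keep)
```

Using `drop = FALSE` keeps the result a matrix even if only one variable survives the filter.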
The PCs are computed using prcomp() and the scree plot is shown below in Figure 1. The scree plot suggests that the first two principal components dominate; as the table below shows, together they account for about half (50.6%) of the total variation.
| | PC1 | PC2 | PC3 | PC4 |
|---|---|---|---|---|
| Standard deviation | 5.0569 | 4.1120 | 3.2641 | 2.6299 |
| Proportion of Variance | 0.3044 | 0.2013 | 0.1268 | 0.0823 |
| Cumulative Proportion | 0.3044 | 0.5057 | 0.6326 | 0.7149 |
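The rows of this table are linked: each proportion of variance is the squared standard deviation divided by the total variance. The reported values are consistent with a total variance of 84, which is what one would get if the 84 variables were standardized before the PCA. That is an inference from the numbers, not something stated in the text. The sketch below checks the arithmetic and then repeats the calculation on simulated data; `X_sim` is a hypothetical stand-in for the 17-by-84 matrix.

```r
# Reported standard deviations of the first four PCs, copied from the table.
sdev_reported <- c(PC1 = 5.0569, PC2 = 4.1120, PC3 = 3.2641, PC4 = 2.6299)

# Assumption: the total variance is 84 (84 standardized variables).
prop_reported <- sdev_reported^2 / 84
round(prop_reported, 4)

# The same calculation on simulated data (not the oil-slick data).
set.seed(123)
X_sim <- matrix(rnorm(17 * 84), nrow = 17, ncol = 84)
pca <- prcomp(X_sim, scale. = TRUE)

# Proportion and cumulative proportion of variance, computed by hand.
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
cumsum(prop_var)[1:4]

# summary() reports the same three rows as the table above.
summary(pca)$importance[, 1:4]
```

Because \(n=17\) is smaller than the number of variables, prcomp() returns at most 17 components, and after centering the last one has essentially zero variance.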
The scatterplot matrix panel showing PC2 vs PC1 (2nd row from top, 1st column) suggests that there are either two groups plus an outlier, or three groups. This is also reflected in the histogram of PC1 in the top panel. The overall conclusion is that perhaps two or three ships were involved in these oil spills.
This scatterplot matrix was produced using a customized version of the pairs() function in R. See the Rmd file for details.
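For readers without the Rmd file, one common way to customize pairs() is to supply a diagonal panel function that draws a histogram, adapted from the example in the pairs() help page. The sketch below is not the author's customization; it uses simulated scores, and `grp`, `X_sim` and `scores` are hypothetical names.

```r
# Simulated data with two groups plus a single outlying sample;
# these are not the oil-slick measurements.
set.seed(42)
grp <- rep(1:3, times = c(8, 8, 1))
X_sim <- matrix(rnorm(17 * 10), nrow = 17) + 3 * grp
scores <- prcomp(X_sim, scale. = TRUE)$x[, 1:3]

# Diagonal panel that draws a histogram of each variable,
# adapted from the panel.hist example in ?pairs.
panel.hist <- function(x, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))               # restore the plot coordinates on exit
  par(usr = c(usr[1:2], 0, 1.5))        # rescale the y-axis for the bars
  h <- hist(x, plot = FALSE)
  breaks <- h$breaks
  nB <- length(breaks)
  y <- h$counts / max(h$counts)         # bar heights scaled to at most 1
  rect(breaks[-nB], 0, breaks[-1], y, col = "grey80")
}

# Scatterplot matrix of the first three PC scores with histograms on the diagonal.
pairs(scores, diag.panel = panel.hist, pch = 19)
```

With this layout the PC2 vs PC1 panel sits in the 2nd row, 1st column, and the histogram of PC1 is in the top-left diagonal panel.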
The pie chart and photo of an oil spill are from the website http://www.marinedefenders.com/index.php