Illegal engine oil dumping is an important source of marine pollution, as the pie chart shows.

Pie chart. Causes of Marine Pollution

For a consulting project with Environment Canada, I was given spectroscopic analyses of \(n=17\) oil samples, labelled a, b, …, q, taken from different oil slicks caused by illegal engine oil dumping off the East Coast of Canada. Each analysis comprised \(p=173\) normalized ion concentrations. These normalized ion measurements act like a spectrographic fingerprint of the vessel that dumped the used engine oil.

The basic problem is: given 17 oil slicks, determine how many different ships are involved. This is a clustering problem with \(n=17\) samples and \(p=173\) variables. Note that this is a wide data problem, since \(p \gg n\).
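One consequence of \(p \gg n\) is worth stating before the PCA. Write \(X_c\) for the column-centered \(n \times p\) data matrix (notation introduced here). Centering makes the \(n\) rows sum to the zero vector, which removes one dimension, so

\[
\operatorname{rank}(X_c) \le \min(n-1,\ p) = \min(16,\ 173) = 16 .
\]

At most 16 principal components can therefore have nonzero variance, however many ions are measured; the bound is unchanged after screening the variables, since \(\min(16, 84) = 16\).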

The data was provided in a CSV file with rows corresponding to variables and columns to observations, which is the transpose of the usual data matrix. After reading the file into R and taking the transpose, we find that 89 variables have interquartile range \({\rm IQR} = 0\), so these variables were discarded. The remaining data matrix is 17-by-84 (observations by variables). Its first six rows and columns are shown in Table 1.
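The screening step can be sketched in a few lines. This is a Python sketch, not the author's R code: `drop_zero_iqr` is a hypothetical helper, the toy input is made up, and the quartile rule is assumed to match R's default `IQR()` (linear interpolation, which Python's `statistics.quantiles(..., method="inclusive")` also uses).

```python
import statistics

def drop_zero_iqr(rows_as_variables):
    """Hypothetical helper: screen out zero-IQR variables and transpose.

    rows_as_variables: one list per variable, as laid out in the CSV file.
    Returns (kept, data): indices of kept variables, and an
    observations-by-kept-variables table (the usual data-matrix orientation).
    """
    kept = []
    for j, values in enumerate(rows_as_variables):
        # "inclusive" quartiles use linear interpolation, assumed to match R's default IQR().
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        if q3 - q1 != 0:
            kept.append(j)
    n_obs = len(rows_as_variables[0])
    # Transpose: one row per observation, one column per kept variable.
    data = [[rows_as_variables[j][i] for j in kept] for i in range(n_obs)]
    return kept, data

rows = [
    [0, 0, 0, 0, 5],   # mostly zero: IQR is 0 even though the variance is not
    [1, 2, 3, 4, 5],   # spread out: kept
    [2, 2, 2, 2, 2],   # constant: IQR is 0
]
kept, data = drop_zero_iqr(rows)
print(kept)   # [1]
print(data)   # [[1], [2], [3], [4], [5]]
```

In the toy example the first variable is not constant, yet its IQR is 0 because most of its values are zero. A zero-IQR screen therefore removes mostly-zero ions as well as constant ones, which is worth keeping in mind when interpreting the 89 discarded variables.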

Table 1. First Six Rows and Columns of the Oil Spill Data

   Ion1  Ion2   Ion3   Ion4  Ion5 Ion6
a  0.00  0.00   0.00   6.60  7.06 2.42
b  1.04  3.14  41.02 100.00 50.56 5.50
c  2.12  0.00   1.10  14.63 10.78 2.78
d  1.45  0.00   9.83  32.20 25.11 3.86
e  0.00  0.00   0.00  23.74 14.63 0.00
f 20.61 64.86 100.00  87.23 60.76 5.18
## f 20.61 64.86 100.00  87.23 60.76 5.18

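The principal components (PCs) were computed using prcomp(), and the scree plot is shown in Figure 1. The scree plot suggests that the first two PCs carry most of the structure: together they account for about 51% of the total variance (Table 2), and each later component adds progressively less.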
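R's documentation for prcomp() states that it works from a singular value decomposition of the centered (and possibly scaled) data matrix. The NumPy sketch below imitates that, under the assumption that the variables were standardized (as with `prcomp(X, scale. = TRUE)`); the source does not say which option was used. `pca_scaled` is a hypothetical name, and the random matrix is only a stand-in for the 17-by-84 oil data. Standardizing requires every column to have a nonzero standard deviation.

```python
import numpy as np

def pca_scaled(X):
    """PCA on standardized columns, mirroring what R's prcomp(X, scale. = TRUE) returns.

    Hypothetical helper; assumes every column has nonzero standard deviation.
    Returns (sdev, proportion, cumulative, scores, rotation).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Center each column, then divide by its sample standard deviation (ddof=1, like R's sd()).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Thin SVD: Z = U @ diag(s) @ Vt; the rows of Vt are the PC loadings.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    sdev = s / np.sqrt(n - 1)        # PC standard deviations (prcomp's sdev)
    scores = U * s                   # PC scores, equal to Z @ Vt.T (prcomp's x)
    proportion = sdev**2 / np.sum(sdev**2)
    return sdev, proportion, np.cumsum(proportion), scores, Vt.T

rng = np.random.default_rng(0)
X = rng.normal(size=(17, 84))        # random stand-in for the 17-by-84 oil data
sdev, proportion, cumulative, scores, rotation = pca_scaled(X)
print(sdev.shape)                        # (17,)
print(round(float(np.sum(sdev**2)), 6))  # 84.0, the number of standardized variables
print(bool(sdev[-1] < 1e-8))             # True: centering leaves at most n - 1 = 16 nonzero PCs
```

PC signs are arbitrary, so the scores may come out mirrored relative to R's output; this does not affect the groupings seen in a scatterplot of the scores.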

Figure 1. Scree Plot for Oil Spill Data

Table 2. First Four Principal Components
                          PC1    PC2    PC3    PC4
Standard deviation     5.0569 4.1120 3.2641 2.6299
Proportion of Variance 0.3044 0.2013 0.1268 0.0823
Cumulative Proportion  0.3044 0.5057 0.6326 0.7149
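Each proportion in Table 2 is a PC's variance (its standard deviation squared) divided by the total variance. The arithmetic check below assumes the PCA was run on standardized variables, so that the total variance equals the number of retained variables, 84. It is a Python check, not the author's code.

```python
# Standard deviations of PC1 to PC4, copied from Table 2.
sdev = [5.0569, 4.1120, 3.2641, 2.6299]
# Assumption: the PCA used standardized variables, so the total variance
# equals the number of retained variables.
total_variance = 84

proportion = [s**2 / total_variance for s in sdev]
cumulative = [sum(proportion[:k + 1]) for k in range(len(proportion))]
print([round(v, 4) for v in proportion])   # [0.3044, 0.2013, 0.1268, 0.0823]
print([round(v, 4) for v in cumulative])   # [0.3044, 0.5057, 0.6326, 0.7149]
```

Dividing by 84 reproduces the published proportions, which is consistent with the variables having been scaled before the PCA. With unscaled data the denominator would instead be the sum of the 84 raw ion variances.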

The scatterplot-matrix panel showing PC2 vs PC1 (second row from the top, first column) suggests either two groups plus an outlier, or three groups. The same pattern is reflected in the histogram of PC1 in the top panel. The tentative conclusion is that two or three ships were involved in these oil spills.

Figure 2. Scatterplot Matrix and Histograms for PC1 to PC4.


This scatterplot matrix was produced using a customized version of the pairs() function in R; see the Rmd file for details.
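The groups in the PC2 vs PC1 panel were judged by eye. One way to make that judgement repeatable, which is not part of the original analysis, is to cluster the first two PC scores. The Python sketch below uses single-linkage agglomerative clustering, implemented as Kruskal-style merging of the closest pairs; `single_linkage_clusters` is a hypothetical helper, and the points are synthetic, not the oil-spill scores.

```python
import math
from itertools import combinations

def single_linkage_clusters(points, k):
    """Hypothetical helper: cut a single-linkage clustering of `points` into k groups.

    Merges the closest pair of clusters until k remain, using Kruskal-style
    union-find over all pairwise Euclidean distances. Returns one label per point.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, shortest first.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    clusters = n
    for _, i, j in edges:
        if clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:           # the pair links two different clusters: merge them
            parent[ri] = rj
            clusters -= 1

    # Relabel cluster representatives as 0, 1, 2, ... in order of first appearance.
    labels, ids = [], {}
    for i in range(n):
        labels.append(ids.setdefault(find(i), len(ids)))
    return labels

pts = [(0, 0), (0.1, 0), (0, 0.1),      # first tight group
       (5, 5), (5.1, 5), (5, 5.1),      # second tight group
       (20, -10)]                       # isolated point
print(single_linkage_clusters(pts, 3))  # [0, 0, 0, 1, 1, 1, 2]
print(single_linkage_clusters(pts, 2))  # [0, 0, 0, 0, 0, 0, 1]
```

Single linkage tends to leave an isolated point in a cluster of its own: here three clusters recover the two groups plus the outlier, while two clusters keep the outlier apart and merge the groups. Other linkages or k-means may split the same scores differently, so the choice of method is itself a judgement call.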

Reference

The pie chart and photo of an oil spill are from the website http://www.marinedefenders.com/index.php