Golub (CLEg)    R Documentation

Acute leukemia data set (Golub et al., 1999)

Description

The response is y=1 if the patient has AML and y=-1 if the patient has ALL. There are 7129 gene expression levels.

Data    y=1   y=-1   Total
train    11     27      38
test     14     20      34

Usage

data(Golub)

Format

List with 4 named elements, X, y, Xt, yt, which are respectively the training design matrix, training classes, test design matrix and test classes.
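
As a minimal sketch of how a list of this shape might be checked, the code below builds a small stand-in object with the same four element names and tabulates the class labels. The object name golub_like, the reduced gene count p, and the random expression values are assumptions made for illustration and are not the packaged data. The sketch also assumes that rows are samples and columns are genes, which is the usual design-matrix layout but is not stated above.

```r
set.seed(1)
p <- 50  # hypothetical reduced gene count; the real data have 7129

# Stand-in list with the documented element names (values are random).
golub_like <- list(
  X  = matrix(rexp(38 * p, rate = 1 / 500), nrow = 38),
  y  = rep(c(1, -1), times = c(11, 27)),
  Xt = matrix(rexp(34 * p, rate = 1 / 500), nrow = 34),
  yt = rep(c(1, -1), times = c(14, 20))
)

# Each design matrix should have one row per class label.
stopifnot(nrow(golub_like$X) == length(golub_like$y),
          nrow(golub_like$Xt) == length(golub_like$yt))

# Class counts, to compare with the table in the Description.
train_counts <- table(golub_like$y)
test_counts  <- table(golub_like$yt)
```

With the package installed, the same checks should carry over to the list loaded by data(Golub).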

Details

The initial leukemia data set consisted of 38 bone marrow samples obtained from adult acute leukemia patients at the time of diagnosis, before chemotherapy. RNA prepared from bone marrow mononuclear cells was hybridized to high-density oligonucleotide microarrays, produced by Affymetrix and containing 6817 human genes. An independent collection of 34 leukemia samples contained a broader range of samples: the specimens consisted of 24 bone marrow and 10 peripheral blood samples, derived from both adults and children. This collection also contained samples from different reference laboratories that used different sample preparation protocols.

The training set consists of 38 leukemia patients, of whom 11 suffer from acute myeloid leukemia (AML) and 27 from acute lymphoblastic leukemia (ALL). The test set consists of 34 patients, of whom 14 suffer from AML and 20 from ALL. The number of gene expression levels is 7129. Before normalization, the data set needs to be preprocessed by thresholding and log-transformation. Thresholding restricts gene expression levels to be at least 20, i.e. expression levels smaller than 20 are set to 20. For the log-transformation, the natural logarithm of the expression levels is taken. The classification task is to separate the AML samples from the ALL samples.
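
The thresholding and log-transformation steps described above can be sketched in a few lines of R. This is only an illustration on a made-up toy matrix: the object names X, X_thr and X_log and the values are assumptions, and the page does not say whether the packaged matrices have already been preprocessed, so it is worth checking the range of the values before applying these steps to them.

```r
# Toy expression matrix standing in for raw expression levels (values are made up).
X <- matrix(c(5, 20, 150, -30, 1000, 19.9), nrow = 2, byrow = TRUE)

# Thresholding: every value below 20 is raised to the floor of 20.
X_thr <- pmax(X, 20)

# Log-transformation: natural logarithm of the thresholded values.
X_log <- log(X_thr)
```

Thresholding first also guarantees that the logarithm is defined, since raw expression levels can be zero or negative.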

Source

Nathalie Pochet, Frank De Smet, Johan A.K. Suykens and Bart L.R. De Moor (2004). Systematic benchmarking of microarray data classification: assessing the role of nonlinearity and dimensionality reduction. Bioinformatics Advance Access published July 1, 2004. http://homes.esat.kuleuven.be/~npochet/Bioinformatics/

References

Golub,T.R., Slonim,D.K., Tamayo,P., Huard,C., Gaasenbeek,M., Mesirov,J.P., Coller,H., Loh,M.L., Downing,J.R., Caligiuri,M.A., Bloomfield,C.D. and Lander,E.S. (1999) Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science, 286, 531-537.

Examples

data(Golub)

[Package CLEg version 2.0 Index]