Alon (CLEg)    R Documentation

Colon cancer data set of Alon et al. 1999

Description

The response is y=1 for normal tissue and y=-1 for tumor tissue. There are 2000 gene expression variables.

Data     y=1   y=-1   Total
train     14     26      40
test       8     14      22

Usage

data(Alon)

Format

List with 4 named elements, X, y, Xt, yt, which are respectively the training design matrix, training classes, test design matrix and test classes.

Details

Colon adenocarcinoma tissues were collected from patients, and paired normal colon tissue was also obtained from some of these patients. Gene expression in 40 tumor and 22 normal colon tissue samples was analyzed with an Affymetrix oligonucleotide array complementary to more than 6500 human genes. The data set contains the expression levels of the 2000 genes with the highest minimal intensity across the 62 tissues. Each gene intensity was derived, by a filtering process, from the roughly 20 feature pairs that correspond to the gene on the chip. The data are otherwise unprocessed, i.e. no normalization has been performed.

The training set consists of 40 colon tissues, of which 14 are normal and 26 are tumor samples. The test set consists of 22 tissues, of which 8 are normal and 14 are tumor samples. Each sample has 2000 gene expression levels. The goal is to classify the tissues as cancerous or noncancerous.
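Because the expression values are supplied without normalization, some preprocessing is usually applied before fitting a classifier. The following is a minimal sketch of one common choice, per-gene standardization using training-set statistics only. The helper name standardize_by_train is hypothetical and is not part of the package, the toy matrices are made-up numbers rather than the Alon data, and the code assumes that rows are samples and columns are genes.

```r
# Hypothetical helper (not part of CLEg): centre and scale each gene (column)
# using the mean and standard deviation of the training matrix only, then
# apply the same shift and scale to the test matrix.
standardize_by_train <- function(X, Xt) {
  mu   <- colMeans(X)
  sdev <- apply(X, 2, sd)
  sdev[sdev == 0] <- 1  # leave constant genes unscaled to avoid dividing by zero
  list(
    X  = sweep(sweep(X,  2, mu, "-"), 2, sdev, "/"),
    Xt = sweep(sweep(Xt, 2, mu, "-"), 2, sdev, "/")
  )
}

# Toy illustration with made-up numbers: 3 training samples, 1 test sample, 2 genes.
X_toy  <- matrix(c(1, 2, 3, 10, 20, 30), nrow = 3)
Xt_toy <- matrix(c(2, 20), nrow = 1)
z <- standardize_by_train(X_toy, Xt_toy)
print(z$X)
print(z$Xt)
```

If the matrices in this data set are indeed stored with samples in rows, the same call would be standardize_by_train(Alon$X, Alon$Xt). Using training statistics for both matrices keeps information from the test set out of the preprocessing step.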

Source

Nathalie Pochet, Frank De Smet, Johan A.K. Suykens and Bart L.R. De Moor (2004). Systematic benchmarking of microarray data classification: assessing the role of nonlinearity and dimensionality reduction. Bioinformatics Advance Access published July 1, 2004. http://homes.esat.kuleuven.be/~npochet/Bioinformatics/

References

Alon, U., Barkai, N., Notterman, D.A., Gish, K., Ybarra, S., Mack, D. and Levine, A.J. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. USA, 96, 6745-6750.

Examples

data(Alon)
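After loading, the list can be checked against the Format section and the class-count table. The sketch below assumes the four elements are named as documented. So that it runs even when the CLEg package is not installed, it falls back to a placeholder list of random values with the documented dimensions; that fallback is an assumption made for illustration only and is not the real data.

```r
# Load the real data when CLEg is available; otherwise build a placeholder
# with the documented shape (random values, NOT the Alon measurements).
if (requireNamespace("CLEg", quietly = TRUE)) {
  data(Alon, package = "CLEg")
} else {
  set.seed(1)
  Alon <- list(
    X  = matrix(rnorm(40 * 2000), nrow = 40),   # assumed: rows are samples
    y  = rep(c(1, -1), times = c(14, 26)),
    Xt = matrix(rnorm(22 * 2000), nrow = 22),
    yt = rep(c(1, -1), times = c(8, 14))
  )
}

# Overview of the four list elements
str(Alon, max.level = 1)

# Class counts, to compare with the table in the Description
train_counts <- table(Alon$y)
test_counts  <- table(Alon$yt)
print(train_counts)
print(test_counts)
```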

[Package CLEg version 2.0 Index]