Fluorescence on blood plasma for cancer diagnosis

The datasets are from fluorescence excitation-emission matrix (EEM) measurements on human blood plasma samples (citrate plasma). The samples are part of a larger sample set from a multi-centre cross-sectional study, conducted at six Danish hospitals, of patients undergoing large bowel endoscopy due to symptoms associated with colorectal cancer (CRC) [1]. The present sample set is designed as a case-control study with one case group (verified CRC) and three different control groups. The three control groups are (1) healthy subjects with no findings at endoscopy, (2) subjects with other, non-malignant findings and (3) subjects with pathologically verified adenomas [2]. Each of the groups, case and controls, consisted of samples from 77 individuals. The samples are matched in case-control groups based on age, gender, and location of cancer and adenoma. Additional information is available on age, gender, smoking habits and ICD-10 cancer codes.

The three datasets are the same samples measured in different dilutions or different spectral areas:

X_UD (299 variables): The undiluted samples measured in the spectral area with excitation wavelengths from 250 to 450 nm in 5 nm increments, and emission wavelengths from 300 to 600 nm in 1 nm increments.

X_D (289 variables): The diluted samples (diluted 100-fold in PBS) measured in the same spectral area as X_UD.

X_HW (299 variables): The undiluted samples measured in the spectral area with excitation wavelengths from 385 to 425 nm in 5 nm increments, and emission wavelengths from 585 to 680 nm in 1 nm increments.
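To make the three measurement grids concrete, the following minimal Python sketch builds the excitation and emission axes from the ranges stated above. It assumes both endpoints of each range are included, which the text does not state explicitly; the names wavelength_axis, GRIDS and grid_shape are illustrative and not part of the dataset objects. The axes actually stored in the files may differ.

```python
def wavelength_axis(start_nm, stop_nm, step_nm):
    """Return the wavelengths from start_nm to stop_nm (inclusive, assumed)."""
    return list(range(start_nm, stop_nm + 1, step_nm))


# (start, stop, step) in nm, copied from the dataset descriptions.
GRIDS = {
    "X_UD": {"ex": (250, 450, 5), "em": (300, 600, 1)},
    "X_D": {"ex": (250, 450, 5), "em": (300, 600, 1)},
    "X_HW": {"ex": (385, 425, 5), "em": (585, 680, 1)},
}


def grid_shape(name):
    """Number of (excitation, emission) points implied by the stated ranges."""
    spec = GRIDS[name]
    n_ex = len(wavelength_axis(*spec["ex"]))
    n_em = len(wavelength_axis(*spec["em"]))
    return n_ex, n_em


for name in GRIDS:
    print(name, grid_shape(name))
```

Under the inclusive-endpoint assumption, the X_UD and X_D grids have 41 excitation and 301 emission wavelengths, and the X_HW grid has 9 excitation and 96 emission wavelengths.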

All class data (cancer, case control, age, gender, smoking and cancer codes) are stored in the dataset objects. For example, cancer status is found in X_UD.class{1,1}, and the corresponding class labels in X_UD.classlookup{1,1}.
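The relation between a class vector and its lookup table can be illustrated with a small Python sketch. It assumes the lookup is a list of (numeric code, label) pairs and the class field holds one numeric code per sample; this layout is an assumption about the dataset object, and the codes and labels below are invented placeholders, not values from the files. The function name decode_classes is hypothetical.

```python
from collections import Counter


def decode_classes(class_codes, class_lookup):
    """Map each numeric class code to its text label via the lookup pairs."""
    table = dict(class_lookup)
    return [table[code] for code in class_codes]


# Invented placeholder values, NOT taken from the dataset files.
class_lookup = [(0, "control"), (1, "cancer")]
class_codes = [1, 0, 0, 1]

labels = decode_classes(class_codes, class_lookup)
counts = Counter(labels)  # number of samples per label
print(labels)
print(counts)
```

Counting samples per label in this way is a quick check of the group sizes in each file, since the three files do not contain the same number of samples.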

Rayleigh scatter and second-order fluorescence are removed from the data and replaced with missing data and zeros using in-house software [3]. For the diluted samples, a background spectrum of the diluent PBS, measured the same day as the sample, is subtracted from each sample in order to remove possible Raman scatter [4]. All samples are intensity calibrated by normalizing to the integrated area of the water Raman peak of a sealed water sample measured each day prior to the measurements. This converts the intensity scale into Raman units and allows comparison with the intensity of samples measured on other fluorescence spectrometers [5].
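The preprocessing described above can be sketched in plain Python as follows. This is not the in-house software; it is a minimal illustration under stated assumptions: scatter bands are taken as a fixed half-width around emission = excitation and emission = 2 x excitation, values inside those bands become missing (NaN), values at emission wavelengths below the excitation wavelength become zero, and the Raman peak area is computed with the trapezoidal rule. The half-width, the integration window and all numbers are placeholders chosen for the example.

```python
import math


def mask_scatter(eem, ex_nm, em_nm, half_width_nm=15.0):
    """Return a copy of eem (rows = excitation, columns = emission) with
    scatter bands set to NaN and the region below excitation set to zero."""
    masked = []
    for ex, row in zip(ex_nm, eem):
        new_row = []
        for em, value in zip(em_nm, row):
            first_order = abs(em - ex) <= half_width_nm
            second_order = abs(em - 2 * ex) <= half_width_nm
            if first_order or second_order:
                new_row.append(math.nan)  # missing data in scatter bands
            elif em < ex:
                new_row.append(0.0)  # no emission expected below excitation
            else:
                new_row.append(value)
        masked.append(new_row)
    return masked


def raman_peak_area(em_nm, water_intensity, window_nm):
    """Trapezoidal area of the water Raman peak inside window_nm = (low, high)."""
    low, high = window_nm
    pts = [(w, v) for w, v in zip(em_nm, water_intensity) if low <= w <= high]
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(pts, pts[1:])
    )


def to_raman_units(eem, area):
    """Divide every intensity by the Raman peak area (NaN stays NaN)."""
    return [[value / area for value in row] for row in eem]


# Synthetic example values, not measured data.
ex_nm = [300, 350]
em_nm = [280, 300, 340, 400, 600, 700]
eem = [[10.0] * len(em_nm) for _ in ex_nm]

water_em = [380, 390, 400, 410, 420]
water_intensity = [0.0, 1.0, 2.0, 1.0, 0.0]

masked = mask_scatter(eem, ex_nm, em_nm)
area = raman_peak_area(water_em, water_intensity, (380, 420))
calibrated = to_raman_units(masked, area)
print(area)
print(calibrated)
```

Keeping the scatter regions as missing values rather than zeros matters for later modelling, because methods that handle missing data can ignore those elements, whereas zeros are treated as measured intensities.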

The files contain unequal numbers of samples. Some samples lacked sufficient material to be measured in all three setups, and some spectra were discarded because of obviously erroneous measurements.

Get the data here

The data are available in MATLAB 7 format and stored as dataset objects (the freeware dataset object is available at http://eigenvector.com/software/dataset.htm).

If you use the data, please cite:

Lawaetz, A.; Bro, R.; Kamstrup-Nielsen, M.; Christensen, I.; Jørgensen, L.; Nielsen, H. Fluorescence spectroscopy as a potential metabonomic tool for early detection of colorectal cancer. Metabolomics 8 (supplement 1): 111-121 (2012).

NOTE: There is a mismatch between the number of samples described in the paper and the number in the dataset available for download. This is addressed in the published erratum:

Lawaetz, A.; Bro, R.; Kamstrup-Nielsen, M.; Christensen, I.; Jørgensen, L.; Nielsen, H. Erratum to: Fluorescence spectroscopy as a potential metabonomic tool for early detection of colorectal cancer. Metabolomics 8 (supplement 1): 122 (2012).

Reference List

1. Nielsen, H. J.; Brunner, N.; Frederiksen, C.; Lomholt, A. F.; King, D.; Jorgensen, L. N.; Olsen, J.; Rahr, H. B.; Thygesen, K.; Hoyer, U. Plasma tissue inhibitor of metalloproteinases-1 (TIMP-1): a novel biological marker in the detection of primary colorectal cancer. Protocol outlines of the Danish-Australian endoscopy study group on colorectal cancer detection. Scand. J. Gastroenterol. 2008, 43 (2), 242-248.

2. Lomholt, A. F.; Hoyer-Hansen, G.; Nielsen, H. J.; Christensen, I. J. Intact and cleaved forms of the urokinase receptor enhance discrimination of cancer from non-malignant conditions in patients presenting with symptoms related to colorectal cancer. British Journal of Cancer 2009, 101 (6), 992-997.

3. Andersen, C. M.; Bro, R. Practical aspects of PARAFAC modeling of fluorescence excitation-emission data. J. Chemometrics 2003, 17 (4), 200-215.

4. McKnight, D. M.; Boyer, E. W.; Westerhoff, P. K.; Doran, P. T.; Kulbe, T.; Andersen, D. T. Spectrofluorometric characterization of dissolved organic matter for indication of precursor organic material and aromaticity. Limnology and Oceanography 2001, 46 (1), 38-48.

5. Lawaetz, A. J.; Stedmon, C. A. Fluorescence intensity calibration using the Raman scatter peak of water. Appl. Spectrosc. 2009, 63 (8), 936-940.

Published by Rasmus Bro

