Chemometrics Research

Monday webinar: Causality in the latent space: the nice property of PLS for process optimization in digitalized Industry 4.0

Alberto Ferrer
Multivariate Statistical Engineering Group (GIEM)
Dpt of Applied Statistics, Operations Research and Quality
Universitat Politècnica de València

In this webinar on Feb 17 at 15:00 CET, Alberto Ferrer will address the potential of latent variable-based multivariate statistical models such as Partial Least Squares Regression (PLS) for facing some challenges in Industry 4.0. The focus is on their ability to model “causality” in the latent space even when using historical data, which is typical of these highly digitalized environments.

See details at https://www.linkedin.com/events/causalityinlatentspace-plsindig7295102725204172801/comments/

The webinar is now available at https://www.youtube.com/watch?v=SyW4fySXWvQ and if you want the slides, you can find them here.

Watch “A Data-Driven Framework for Metabolomics Quality Control”

For any mass spectrometry-based analytical assay it is considered best practice to include mechanisms for assessing the quality of acquired analyte concentrations. This is particularly important for untargeted metabolomics, where many hundreds of metabolites may be (relatively) quantified in parallel, with metabolite identification performed post hoc. This makes it impossible to calibrate each metabolite to an internal standard gradient and, equally, impossible to ensure optimal peak shape for all detected features. The acquired data are therefore open to unwanted within- and between-batch variation throughout a given experiment. For over a decade, the use of repeat-injection pooled quality control samples has proven to be a popular means of monitoring the precision of acquired data. In this webinar I will briefly outline current best practices and demonstrate a new software package (qcmxp.org) for assessing and potentially improving the precision of untargeted metabolomics data collected using these protocols.
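As a rough illustration of how pooled QC injections can be used to monitor precision, the sketch below computes the relative standard deviation (RSD) of each feature across repeat QC injections and keeps the features below a cut-off. It is not the qcmxp.org package or its API. The feature names, peak areas and the 20% limit are assumptions made up for the example.

```python
from statistics import mean, stdev


def qc_rsd(qc_intensities):
    """Relative standard deviation (%) of one feature across pooled QC injections."""
    m = mean(qc_intensities)
    if m == 0:
        return float("inf")  # precision is undefined for an all-zero feature
    return 100.0 * stdev(qc_intensities) / m


# Hypothetical peak areas: one list of repeat pooled-QC injections per feature.
qc_table = {
    "feature_A": [100.0, 102.0, 98.0, 101.0, 99.0],
    "feature_B": [50.0, 80.0, 30.0, 65.0, 45.0],
}

RSD_LIMIT = 20.0  # assumed acceptance limit in percent, chosen for illustration

rsd = {name: qc_rsd(values) for name, values in qc_table.items()}
kept = [name for name, value in rsd.items() if value <= RSD_LIMIT]

for name, value in rsd.items():
    status = "keep" if name in kept else "drop"
    print(f"{name}: RSD = {value:.1f}% ({status})")
```

In this toy table the first feature is reproducible across injections and the second is not, so only the first one passes the assumed limit.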

See the webinar here: https://www.youtube.com/watch?v=B6iGZgnLZE8

Biography:

Dr. David Broadhurst is Professor of Metabolomic Epidemiology & Biosystems Data Science at Edith Cowan University, Perth, Western Australia. He has been an active member of the metabolomics community for over 25 years. In 2022 he was made a lifetime honorary fellow of the Metabolomics Society for his work promoting best practice in design of experiments, biostatistics and machine learning.

A Data-Driven Framework for Metabolomics Quality Control

Please join the webinar “A Data-Driven Framework for Metabolomics Quality Control” on Oct 21 at 15:00 CET. We will make the recording available later on our YouTube channel.

How to know how well your clustering is working

Join the webinar by Roberto Todeschini on Monday, Sept 30, 2024 at 15:00 CET. See more here: https://www.linkedin.com/events/mondaywebinar-clustervalidityin7241719089453252608/comments/

Untargeted GC-MS

We have been working with untargeted GC-MS for decades here so it is about time that we now finally put some data out there for you to play with. The first data set comes from a bachelor project on apple wine fermentation. Have fun!

https://ucphchemometrics.com/applewine/

Honey fluorescence

Have a look at https://ucphchemometrics.com/honey/ for a dataset of approximately 100 EEMs of honey from different varieties.

Function for scatter interpolation in fluorescence data

We keep finding that we left out some data sets and functions when we transferred to our new home page here. So keep sending us notifications if you miss something.

We now put the function for scatter interpolation for EEM data online again at https://ucphchemometrics.com/eemscat.

An interface for control charts

We have made a small interactive program for learning about control charts (Shewhart, CUSUM). The program is an educational tool that has been made freely available.

The app runs in MATLAB, either locally or through MATLAB Online. A small user guide is available. Note that the program can also be made available as a standalone application by using the MATLAB Compiler to produce an *.exe file.

See more here.
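For readers who want to see the two chart types side by side, here is a minimal sketch in Python. It is not the code of the MATLAB app. It assumes the in-control mean and standard deviation are already known, uses the textbook 3-sigma Shewhart limits, and uses a tabular CUSUM with the common default settings k = 0.5 and h = 5 (both in units of sigma). The data are invented.

```python
def shewhart_flags(x, mu, sigma, width=3.0):
    """Indices of observations outside mu +/- width * sigma."""
    lower, upper = mu - width * sigma, mu + width * sigma
    return [i for i, value in enumerate(x) if value < lower or value > upper]


def cusum_flags(x, mu, sigma, k=0.5, h=5.0):
    """Tabular CUSUM. k (allowance) and h (decision limit) are in units of sigma."""
    c_plus = c_minus = 0.0
    flags = []
    for i, value in enumerate(x):
        # Accumulate deviations beyond the allowance, never dropping below zero.
        c_plus = max(0.0, value - (mu + k * sigma) + c_plus)
        c_minus = max(0.0, (mu - k * sigma) - value + c_minus)
        if c_plus > h * sigma or c_minus > h * sigma:
            flags.append(i)
    return flags


# Hypothetical process: in control around 10 (sigma = 1), then a sustained
# upward shift of roughly one sigma from the sixth observation onwards.
data = [10.2, 9.7, 10.1, 9.9, 10.0, 11.0, 11.2, 10.9,
        11.1, 11.0, 11.3, 10.8, 11.2, 11.1, 11.0, 11.2]

print("Shewhart out-of-control points:", shewhart_flags(data, mu=10.0, sigma=1.0))
print("CUSUM out-of-control points:   ", cusum_flags(data, mu=10.0, sigma=1.0))
```

With these made-up numbers no single point leaves the 3-sigma limits, while the CUSUM accumulates the small shift and eventually signals. This is the usual argument for using the two charts together.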

Processing GC-MS made easy

Are you interested in learning a simple and free tool for turning untargeted GC-MS data into peak tables, and in doing so in less time, with less user-dependence and with more analytes recovered? Then you may want to learn how to use PARADISe, a stand-alone Windows program for just that. We are running a two-day course on how to use the software:

PARADISe: a user-friendly software for untargeted analysis of Gas Chromatography Mass Spectrometry (GC-MS) data

Traditionally, GC-MS data analysis follows a targeted approach that involves several time-consuming steps (integration and quantification), usually carried out sample by sample and subject to inter-user variability. Furthermore, interesting compounds are often left undiscovered due either to practical reasons or analytical limitations (limit of detection and quantification). PARADISe is a user-friendly tool for GC-MS deconvolution and identification. It allows untargeted analysis, meaning that all compounds present in the samples are considered, while overcoming the problems mentioned above.

Audience
The course is intended for GC-MS users at any level of expertise in any scientific field, working in both academic and industrial environments. Basic knowledge of chemometrics or statistics is advisable, but not mandatory.

This course will provide the participants with a complete overview of the software, from theory to practice. Participants are encouraged to bring and work with their own data; otherwise we will provide them with a dataset.

Teachers: Professor Rasmus Bro and Postdoc Beatriz Quintanilla Casas
Place: Online Microsoft Teams
Participation cost is 100 Euro
Registration: here!

Monday, November 13th 9-12 Theoretical background. The data science behind the tool
Monday, November 13th 12.30-15 Getting started with PARADISe
Tuesday, November 21st 9-15 Discussion of your experience so far, troubleshooting, good practices and challenges

Wine samples analyzed by GC-MS and FT-IR instruments

Wine Samples

A total of 44 red wines, all produced from the same grape (100% Cabernet Sauvignon) but harvested in different geographical areas, were collected from local supermarkets in the Copenhagen area, Denmark. Details on the geographical origins and number of wine samples analysed are given in Table 1.

Table 1. Geographical origin of the analysed red wines

Origin	Wine samples
Argentina	6
Chile	15
Australia	12
South Africa	11
Total	44

The wine samples were analyzed using headspace GC-MS and FT-IR analytical instruments. The FT-IR was a commercial WineScan instrument provided by FOSS Analytical A/S.

GC-MS data

For each sample, a mass spectrum scan (m/z: 5-204) was measured at 2700 elution time-points, giving a data cube of size 44×2700×200. Figure 1 shows an example of a chromatogram for one red wine sample.

In the figure the abundance at each scan is found by summing the contribution of all intensities of mass channels investigated (m/z: 5-204).
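The summation described above can be sketched in a few lines of Python. The example uses a tiny invented sample of 3 scans by 4 mass channels instead of the real 2700 × 200 slab, and the function names are illustrative only. Summing over mass channels gives the kind of profile stored in Elution_profiles, and summing over elution time gives the kind stored in Mass_profiles (see the zip-file information below).

```python
def total_ion_chromatogram(sample):
    """Sum all mass channels at each scan.

    sample is a list of scans; each scan is a list of intensities,
    one per mass channel.
    """
    return [sum(scan) for scan in sample]


def mass_profile(sample):
    """Sum all scans for each mass channel (the transpose of the above)."""
    return [sum(channel) for channel in zip(*sample)]


# Hypothetical sample: 3 elution time-points x 4 mass channels.
sample = [
    [0, 1, 2, 0],
    [5, 0, 3, 1],
    [0, 0, 4, 2],
]

tic = total_ion_chromatogram(sample)  # one abundance value per scan
profile = mass_profile(sample)        # one summed intensity per mass channel

print("Summed abundance per scan:   ", tic)
print("Summed intensity per channel:", profile)
```

Applying the first function to each of the 44 samples in turn would give a 44×2700 table, and the second a 44×200 table.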

FT-IR data

For all wine samples, 14 quality parameters were predicted from the IR spectra (Figure 2) using the FOSS WineScan built-in calibration models (Table 2).

Table 2. Quality parameters measured on the WineScan instrument and used in MVP (units shown in brackets)

#	Quality parameter
1	Ethanol (vol. %)
2	Total acid (g/L)
3	Volatile acid (g/L)
4	Malic acid (g/L)
5	pH
6	Lactic acid (g/L)
7	Rest Sugar (Glucose + Fructose) (g/L)
8	Citric acid (mg/L)
9	CO₂ (g/L)
10	Density (g/mL)
11	Total polyphenol index
12	Glycerol (g/L)
13	Methanol (vol. %)
14	Tartaric acid (g/L)

Get the data

The data are available in zipped MATLAB 6.x format. Download the data, unzip the file and type load Wine_v6 in MATLAB.

DOWNLOAD DATA

If you use the data, we would appreciate it if you report the results to us as a courtesy for the work involved in producing and preparing the data. Please also cite the data by referring to:

T. Skov, D. Ballabio, R. Bro (2008). Multiblock Variance Partitioning. A new approach for comparing variation in multiple data blocks. Analytica Chimica Acta, 615 (1): 18-29

Zip-file information

Variable	Description	Dimensions
Aroma_compounds	Peak areas of aroma compounds	44×57
Class	Classes of wines (see Table 1)	44×1
Data_GC	Three-way data	44×2700×200
Elution_profiles	Summed mass dimension – see Figure 1	44×2700
IR_spectra	IR spectra without waterband	44×842
IR_spectra_with_waterband	IR spectra with waterband – see Figure 2	44×1056
Label_Aroma_comp	Label aroma compounds	1×57
Label_Elution_time	Label elution time in minutes	1×2700
Label_Mass_channels	Label m/z	1×200
Label_Pred_values_IR	Label quality parameters	1×14
Label_Wine_samples	Label wine samples (ARG: Argentina, AUS: Australia, CHI: Chile, SOU: South Africa)	44×1
Mass_profiles	Summed elution time dimension	44×200
Pred_values_IR	Quality parameters (see Table 2)	44×14
axis_spectra_wavenumber	Axis for spectra in cm^-1	1×842
axis_spectra_with_waterband_wavenumber	Axis for spectra with waterband in cm^-1	1×1056