Chemometrics Research

Featured

Processing GC-MS made easy

Are you interested in learning a simple and free tool for turning untargeted GC-MS data into peak tables, in less time, with less user dependence and with more analytes recovered? Then you may want to learn how to use PARADISe, a stand-alone Windows program for just that. We are running a two-day course on how to use the software:

PARADISe: a user-friendly software for untargeted analysis of Gas Chromatography-Mass Spectrometry (GC-MS) data

Traditionally, GC-MS data analysis follows a targeted approach that involves several time-consuming steps (integration and quantification), usually carried out sample by sample and subject to inter-user variability. Furthermore, interesting compounds are often left undiscovered, either for practical reasons or because of analytical limitations (limits of detection and quantification). PARADISe is a user-friendly tool for GC-MS deconvolution and identification. It enables untargeted analysis, meaning that all compounds present in the samples are considered, while overcoming the problems mentioned above.

Audience
The course is intended for GC-MS users at any level of expertise in any scientific field, working in either academic or industrial environments. Basic knowledge of chemometrics or statistics is advisable, but not mandatory.

This course will give participants a complete overview of the software, from theory to practice. Participants are encouraged to bring and work with their own data; otherwise, we will provide a dataset.

Teachers: Professor Rasmus Bro and Postdoc Beatriz Quintanilla Casas
Place: Online (Microsoft Teams)
Participation cost is 100 Euro
Registration: here!

Monday, November 13th 9-12 Theoretical background. The data science behind the tool
Monday, November 13th 12.30-15 Getting started with PARADISe
Tuesday, November 21st 9-15 Discussion of your experience so far, troubleshooting, good practices and challenges

The N-way toolbox new version

Happy to announce that we have released a new version of our N-way toolbox. What is new? Nothing really! Same old, good old. You can do PARAFAC, multi-way PLS, PARAFAC2, Tucker and many other things. It is still state-of-the-art even though we released the first version back in the nineties. Check it out. Check out the course you will find on that page as well.

Monday Webinar: All sparse models are wrong, but some are useful

You can now find the webinar on our YouTube channel.

~~Please join Dec 22, 15.00 (CET) when Pepe will talk about sparse models and all the pitfalls they have. See more here: https://www.linkedin.com/events/7406963924174020608/~~

Webinar: Probabilistic Tensor Decomposition

Join us Monday Sept 29 at 15 CET for a webinar with Jesper Løve Hinrich

The vast majority of tensor decomposition methods are based on least squares estimation – or, equivalently, maximum likelihood under a Gaussian distribution. In this presentation, I will introduce and motivate probabilistic tensor decomposition – based on Bayesian inference – and show the differences and similarities between the non-probabilistic and the probabilistic approaches. Importantly, probabilistic tensor decomposition methods are more robust to noise and model misspecification, provide new strategies for determining the right number of components, and offer built-in tools for characterizing model uncertainty. Some of these tools are already in the freely available probabilistic tensor toolbox https://github.com/JesperLH/prob-tensor-toolbox .
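To make the least-squares baseline mentioned above concrete, here is a minimal sketch of a one-component PARAFAC model fitted by alternating least squares. It is written in plain Python for illustration only: the function names and the small noise-free example tensor are our own assumptions, and the code is not taken from the probabilistic tensor toolbox or from the N-way toolbox.

```python
import random


def rank_one_parafac(X, n_iter=20, seed=0):
    """Fit X[i][j][k] ~ a[i] * b[j] * c[k] by alternating least squares."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    rng = random.Random(seed)
    # Random positive start for two modes; the first mode is solved for.
    b = [rng.random() + 0.1 for _ in range(J)]
    c = [rng.random() + 0.1 for _ in range(K)]
    a = [0.0] * I
    for _ in range(n_iter):
        # Each update is the closed-form least-squares solution for one
        # mode while the other two modes are held fixed.
        bb = sum(v * v for v in b)
        cc = sum(v * v for v in c)
        a = [
            sum(X[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K))
            / (bb * cc)
            for i in range(I)
        ]
        aa = sum(v * v for v in a)
        b = [
            sum(X[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K))
            / (aa * cc)
            for j in range(J)
        ]
        bb = sum(v * v for v in b)
        c = [
            sum(X[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J))
            / (aa * bb)
            for k in range(K)
        ]
    return a, b, c


def residual_sum_of_squares(X, a, b, c):
    """The least-squares loss that the updates above decrease."""
    return sum(
        (X[i][j][k] - a[i] * b[j] * c[k]) ** 2
        for i in range(len(a))
        for j in range(len(b))
        for k in range(len(c))
    )


# Hypothetical noise-free example built from known factors.
a_true, b_true, c_true = [1.0, 2.0, 3.0], [0.5, 1.5], [2.0, 1.0, 4.0, 3.0]
X = [[[ai * bj * ck for ck in c_true] for bj in b_true] for ai in a_true]

a, b, c = rank_one_parafac(X)
rss = residual_sum_of_squares(X, a, b, c)
print("residual sum of squares:", rss)
```

The recovered vectors match the true ones only up to scaling, since a factor can be moved freely between the three modes. A probabilistic model would, roughly speaking, replace these point estimates with distributions over the factors.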

Enroll here

PhD thesis

August 29, 2025, Oksana Mykhalevych (Palamarchuk) defended her thesis “Data, Models, and Meaning: Structure–Function Insight in Carrageenan via PAT”. It was a very convincing and joyful event. If you are interested in the thesis, just go here (search “oksana” to find it easily).

Webinar: ChemTastesDB – a curated database for the prediction of molecular taste

~~Join us Monday July 7 at 15 CET for a webinar with Davide Ballabio and Christian Rojas on their interesting work on taste prediction.~~

The webinar can be found at https://www.youtube.com/watch?v=ZMfVM69hxm4

Computational models that predict the taste of molecular tastants based on their chemical structure and machine learning classifiers serve as powerful tools in the advancing field of foodinformatics. We will describe the development of ChemTastesDB, a database that includes curated information on 4075 molecular tastants. ChemTastesDB is distributed to the scientific community to expand the information available on molecular tastants; it could assist the analysis of the relationships between molecular structure and taste, as well as in silico (QSAR/QSPR) studies for taste prediction.

New industrial short courses

Join our two courses on Design of Experiments and on basic chemometrics and machine learning.

Analyze EEM data with PARAFAC

There are many nice tools for analyzing fluorescence data. Many use the good old N-way toolbox or PLS_Toolbox/Solo. PLS_Toolbox has a very nice user interface for handling Rayleigh, Raman and other EEM-specific artefacts.

There are other more specialized tools. DOMFluor was a thing we wrote years back, but nowadays we recommend drEEM. There are tutorials on the drEEM website at https://dreem.openfluor.org/
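As a rough illustration of what handling Rayleigh scatter can involve, the sketch below marks the first- and second-order Rayleigh bands of an excitation-emission matrix as missing before modelling. It is a plain-Python sketch under our own assumptions: the function name, the 12 nm half-width and the tiny example grid are hypothetical, and this is not how drEEM or PLS_Toolbox implement it.

```python
import math


def mask_rayleigh(eem, excitation_nm, emission_nm, half_width_nm=12.0):
    """Return a copy of eem with Rayleigh scatter bands set to NaN.

    eem[i][j] is the intensity at emission_nm[i] and excitation_nm[j].
    """
    masked = [row[:] for row in eem]  # leave the input untouched
    for i, em in enumerate(emission_nm):
        for j, ex in enumerate(excitation_nm):
            # First-order scatter sits where emission equals excitation,
            # second-order scatter where emission is twice the excitation.
            first_order = abs(em - ex) <= half_width_nm
            second_order = abs(em - 2.0 * ex) <= half_width_nm
            if first_order or second_order:
                masked[i][j] = math.nan
    return masked


# Hypothetical 4 x 2 landscape with constant intensity.
excitation = [250.0, 300.0]
emission = [300.0, 400.0, 500.0, 600.0]
eem = [[1.0, 1.0] for _ in emission]

masked = mask_rayleigh(eem, excitation, emission)
for em, row in zip(emission, masked):
    print(em, row)
```

The masked entries then have to be treated as missing values by whatever PARAFAC routine is used afterwards, and the band width should be chosen by looking at the actual data.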

Semi-Automated Machine Learning for Calibration Model Development

April 14 at 15 CET, Manuel A. Palacios and Barry M. Wise from Eigenvector Research gave a webinar about the very intriguing Diviner. You can see the webinar here.

The story: AutoML, or Automated Machine Learning, represents a paradigm where the entire pipeline of data preprocessing, variable selection, feature engineering, model selection, and hyperparameter tuning is automated, typically leading to a singular optimized model. While highly efficient and simple from the end-user perspective, this approach often lacks transparency and customizability. Furthermore, the black-box nature of AutoML might overlook models that an analyst would have preferred, instead prioritizing what it deems the ‘best’ model, potentially missing nuanced solutions tailored to specific needs.

In this talk we introduce Diviner, (from Divine—to discover or locate something by intuition, insight or supernatural means), a novel semi-autoML approach that transcends the traditional black-box model by actively involving the analyst in the model-building process. Unlike conventional autoML methods that yield a single optimal model, our approach creates a family of models and ranks them by their cross-validation performance, degree of overfitting, and prediction error on validation sets.

Diviner commences with user-assisted outlier assessment via Principal Components Analysis (PCA) and Robust Partial Least Squares (RPLS), followed by variable selection to hone significant predictors. Various data-relevant preprocessing algorithms are explored, and linear models (Partial Least Squares and Elastic Net) are calibrated (with hyperparameter tuning) and ranked. At this point, the end-user selects the models to keep and take for further refinement, including re-assessment of outliers, preprocessing and variable selection fine-tuning. Finally, the user chooses the models to output. This can be a single model, a series of the top-ranked models, or an ensemble of models which can be further optimized.

Diviner bridges the gap between full automation and user-led customization, leading to more insightful and transparent model-building. Through the evaluation of several datasets, we demonstrate promising results in predictive accuracy and model diversity. This research not only contributes to the practical application of autoML but also sparks a dialogue on rethinking how autoML models are constructed, evaluated, and understood, moving away from the black-box solution to a more collaborative, interpretable, and adaptive methodology.
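The idea of ranking a family of models on several criteria, as described above, can be sketched in a few lines. This is only an illustration under our own assumptions: the class, the rank-sum rule, the overfitting ratio and the three candidate models with their error values are all hypothetical, and this is not Diviner's actual scoring.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    rmsec: float   # error on the calibration data
    rmsecv: float  # cross-validation error
    rmsep: float   # error on a separate validation set

    @property
    def overfit_ratio(self) -> float:
        # Well above 1 means the model fits the calibration data
        # much better than it cross-validates.
        return self.rmsecv / self.rmsec


def rank_candidates(candidates):
    """Order candidates by the sum of their per-criterion ranks."""
    criteria = [
        lambda m: m.rmsecv,
        lambda m: m.overfit_ratio,
        lambda m: m.rmsep,
    ]
    totals = {m.name: 0 for m in candidates}
    for key in criteria:
        # Position 0 is the best candidate on this criterion.
        for position, m in enumerate(sorted(candidates, key=key)):
            totals[m.name] += position
    return sorted(candidates, key=lambda m: totals[m.name])


# Hypothetical candidates; the numbers are made up for illustration.
family = [
    Candidate("PLS, SNV", rmsec=0.20, rmsecv=0.25, rmsep=0.27),
    Candidate("PLS, 2nd derivative", rmsec=0.05, rmsecv=0.30, rmsep=0.35),
    Candidate("Elastic Net, mean-centered", rmsec=0.22, rmsecv=0.24, rmsep=0.30),
]

ranked = rank_candidates(family)
for m in ranked:
    print(m.name, round(m.overfit_ratio, 2))
```

Keeping the whole ranked list, rather than returning only the first entry, is what leaves room for the analyst to pick a model that an automatic criterion would have passed over.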

Monday Webinar: Causal latent space-based models in the Quality by Design paradigm

Monday March 24, 15.00 (CET) Joan Borràs from Kensight talked about causal models and QbD:

Latent variable-based models, such as Partial Least Squares (PLS), are essential tools in the Quality by Design (QbD) paradigm, enabling the analysis of highly correlated datasets, typical of Industry 4.0, while preserving causal relationships in the reduced latent space. Aligned with the QbD initiative, this webinar presents a latent variable-based approach to define the raw material design space (i.e., raw material specifications) that ensures quality assurance with a given confidence level for Critical Quality Attributes (CQAs). Additionally, an effective process control system attenuating most raw material variations is implemented by manipulating process variables. A novel latent space-based multivariate capability index is also introduced to rank raw material suppliers, establishing a direct link between raw material properties and CQAs. This index facilitates supplier selection before manufacturing a single unit of the product, enhancing decision-making in quality management.

You can find the webinar here and also the slides

Monday webinar: Three-way data reduction based on essential information

Raffaele Vitale
Univ. Lille, CNRS, LASIRE (UMR 8516), Laboratoire Avancé de Spectroscopie pour les Interactions, la Réactivité et l’Environnement, F-59000 Lille, France

In the domain of bilinear curve resolution, the identification and extraction of essential information from sets of multivariate measurements has recently garnered significant attention. Such an approach compresses the data at hand while preserving their local rank properties, which enables accurate factorisation in a dramatically shorter amount of time. In this presentation, the idea of essential information-based data reduction is extended to trilinear datasets through the description of an original algorithmic procedure leveraging the principles of Higher Order Singular Value Decomposition (HOSVD). The performance of this novel algorithm will be evaluated in both real-world and simulated scenarios, highlighting the benefits it can bring in domains like multiway fluorescence spectroscopy and imaging.

This work was conducted in collaboration with Azar Azizi (University of Sistan and Baluchestan, Zahedan, Iran), Nematollah Omidikia (University of Sistan and Baluchestan, Zahedan, Iran), Mahdiyeh Ghaffari (Radboud University, Nijmegen, The Netherlands), and Cyril Ruckebusch (Université de Lille, Lille, France).

Join the webinar on March 3, 2025, at 15.00 (CET). See more at https://www.linkedin.com/events/three-waydatareductionbasedones7297979296994750464/comments/

The recorded webinar is now available at https://youtu.be/MwQYStYjwnM and the slides can be found here.