On April 14 at 15:00 CET, Manuel A. Palacios and Barry M. Wise from Eigenvector Research gave a webinar about the very intriguing Diviner. You can watch the webinar here.

The story: AutoML, or Automated Machine Learning, represents a paradigm where the entire pipeline of data preprocessing, variable selection, feature engineering, model selection, and hyperparameter tuning is automated, typically leading to a single optimized model. While highly efficient and simple from the end-user's perspective, this approach often lacks transparency and customizability. Furthermore, the black-box nature of AutoML may overlook models that an analyst would have preferred, instead prioritizing what it deems the ‘best’ model and potentially missing nuanced solutions tailored to specific needs.
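To make that contrast concrete, here is a minimal sketch of the fully automated, single-model paradigm. It assumes scikit-learn and synthetic data, and is only an illustration of the idea, not of any particular AutoML framework: one search over a fixed pipeline hands back one ‘best’ estimator, with no analyst input along the way.

```python
# Minimal illustration of the fully automated, single-model paradigm:
# one search over preprocessing + hyperparameters returns one "best" pipeline.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))            # stand-in data: 100 samples x 50 variables
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=100)

pipe = Pipeline([("scale", StandardScaler()), ("pls", PLSRegression())])
search = GridSearchCV(pipe,
                      {"pls__n_components": list(range(1, 11))},
                      scoring="neg_root_mean_squared_error", cv=5)
search.fit(X, y)

best_model = search.best_estimator_       # the single optimized model handed back
print(search.best_params_, -search.best_score_)
```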
In this talk we introduce Diviner (from divine: to discover or locate something by intuition, insight, or supernatural means), a novel semi-AutoML approach that transcends the traditional black-box model by actively involving the analyst in the model-building process. Unlike conventional AutoML methods that yield a single optimal model, our approach creates a family of models and ranks them by their cross-validation performance, degree of overfitting, and prediction error on validation sets.
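The abstract does not spell out the exact metrics or weights Diviner uses for this ranking, but the idea of a ranked family of models can be sketched as follows. The sketch assumes scikit-learn, synthetic data, and RMSE-based criteria: RMSECV for cross-validation performance, the gap between calibration and cross-validation error as a crude overfitting indicator, and RMSEP on a held-out validation set.

```python
# Sketch of the "family of models" idea: fit several candidates, then report
# cross-validation error, an overfitting gap, and held-out validation error
# so the analyst can rank and choose. Metrics and candidates are assumptions.
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import ElasticNet

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true).ravel(), np.asarray(y_pred).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 40))
y = X[:, :4] @ np.array([1.0, -0.5, 0.8, 0.3]) + rng.normal(scale=0.1, size=150)
X_cal, X_val, y_cal, y_val = train_test_split(X, y, test_size=0.3, random_state=1)

# Candidate family: a few PLS models and a few Elastic Net models.
candidates = {f"PLS-{k}": PLSRegression(n_components=k) for k in (2, 4, 6, 8)}
candidates.update({f"EN-{a}": ElasticNet(alpha=a, max_iter=5000) for a in (0.01, 0.1, 1.0)})

ranked = []
for name, model in candidates.items():
    rmsecv = rmse(y_cal, cross_val_predict(model, X_cal, y_cal, cv=5))  # CV error
    model.fit(X_cal, y_cal)
    rmsec = rmse(y_cal, model.predict(X_cal))                           # calibration error
    rmsep = rmse(y_val, model.predict(X_val))                           # validation error
    ranked.append((name, rmsecv, rmsecv - rmsec, rmsep))                # gap ~ overfitting

# Rank by RMSECV here; the other columns support the analyst's own trade-offs.
for name, rmsecv, gap, rmsep in sorted(ranked, key=lambda r: r[1]):
    print(f"{name:8s} RMSECV={rmsecv:.3f} overfit={gap:.3f} RMSEP={rmsep:.3f}")
```

The point of keeping the whole ranked table, rather than returning only the top row, is that the analyst can trade a slightly worse RMSECV for a smaller overfitting gap or a simpler model.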
Diviner commences with user-assisted outlier assessment via Principal Components Analysis (PCA) and Robust Partial Least Squares (RPLS), followed by variable selection to home in on significant predictors. Various data-relevant preprocessing algorithms are explored, and linear models (Partial Least Squares and Elastic Net) are calibrated, with hyperparameter tuning, and ranked. At this point, the end-user selects which models to keep for further refinement, including re-assessment of outliers and fine-tuning of preprocessing and variable selection. Finally, the user chooses the models to output: a single model, a set of the top-ranked models, or an ensemble of models that can be further optimized.
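As an illustration of the first step only, here is a sketch of a PCA-based outlier screen that flags samples with large Hotelling's T² or Q residuals for the analyst to review. It assumes scikit-learn and synthetic data; the three-component model and the 95th-percentile thresholds are assumptions for the example, not Diviner's actual rules, and the RPLS-based screening is not shown.

```python
# Sketch of a user-assisted outlier screen: fit a PCA, compute Hotelling's T^2
# and Q residuals, and flag samples exceeding simple percentile thresholds.
# The analyst, not the code, decides what to exclude.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 30))
X[5] += 8.0                                                  # plant an obvious outlier

Xs = StandardScaler().fit_transform(X)
pca = PCA(n_components=3).fit(Xs)
T = pca.transform(Xs)                                        # scores
T2 = np.sum((T / T.std(axis=0, ddof=1)) ** 2, axis=1)        # Hotelling's T^2
Q = np.sum((Xs - pca.inverse_transform(T)) ** 2, axis=1)     # Q (squared residuals)

# Percentile-based flags; the analyst reviews these before anything is removed.
suspects = np.where((T2 > np.percentile(T2, 95)) | (Q > np.percentile(Q, 95)))[0]
print("Samples flagged for review:", suspects)
```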
Diviner bridges the gap between full automation and user-led customization, leading to more insightful and transparent model-building. Through evaluations on several datasets, we demonstrate promising results in predictive accuracy and model diversity. This research not only contributes to the practical application of AutoML but also sparks a dialogue on rethinking how AutoML models are constructed, evaluated, and understood, moving away from the black-box solution toward a more collaborative, interpretable, and adaptive methodology.