
The Copenhagen Chemometrics Group
10 APPLICATIONS OF THE TUCKER3 MODEL
By C. A. Andersson
Contents
- Model dimensionality
- Constrained vs. unconstrained models
- Residual matrices
- Core rotation
- Theoretical question
- Three-way Tucker regression
- Two-segment validation of component models
- Two-segment validation of regression models
Data: dataset1.mat contains fluorescence excitation-emission data of thick juice.
Purpose: Learning to analyze real data with the Tucker model.
Information: R. Henrion, N-way principal component analysis: theory, algorithms and applications. Chemom. Intell. Lab. Syst. 25:1-23, 1994.
Prerequisites: As for chapter 8.
We will use Tucker3 modeling to investigate a data set from spectrofluorometric analysis of thick juice; see Andersson et al. (Analysis of N-dimensional data arrays from fluorescence spectroscopy on an intermediary sugar product. Fresenius J. Anal. Chem. 359 (2):138-142, 1997) for more details.
Thick juice is an intermediary product from the production of white crystalline sugar. The dimensions of the array are (28,20,311). The first mode is the fraction number (or elution time), the second mode is the excitation wavelength (250 nm – 440 nm) and the third mode is the emission wavelength (250 nm – 560 nm). Start by loading the data set.
Load dataset1 and inspect fluorescence landscapes of the 28 fractions.
Look for features/patterns in the modes that could be exploited.
1. Model dimensionality
Explore all valid dimensionalities w = (1,1,1), (2,2,1), …, (4,4,4) to find the optimal dimensionality of the model of X.
You can do it a slow way or a much slower way!
Remember that the number of factors in any mode cannot exceed the product of the numbers of factors in the other two modes. Let the computer run through the candidates automatically.
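The automated search can be sketched as follows. This is a minimal NumPy illustration, not the course's MATLAB code: the function names (`valid_dims`, `unfold`, `hosvd_explained`) are hypothetical, the data are a small synthetic stand-in for the (28,20,311) array, and a truncated higher-order SVD is used as a quick approximation instead of a least-squares Tucker3 fit, so the percentages are lower bounds on what a properly fitted model would explain.

```python
import itertools
import numpy as np

def valid_dims(max_rank):
    """All (P, Q, R) up to max_rank where no mode exceeds the product of the other two."""
    out = []
    for p, q, r in itertools.product(range(1, max_rank + 1), repeat=3):
        if p <= q * r and q <= p * r and r <= p * q:
            out.append((p, q, r))
    return out

def unfold(X, mode):
    """Matricize X so that `mode` indexes the rows."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def hosvd_explained(X, dims):
    """Percent of the sum of squares explained by a truncated HOSVD (orthonormal loadings)."""
    A, B, C = (np.linalg.svd(unfold(X, m), full_matrices=False)[0][:, :d]
               for m, d in enumerate(dims))
    G = np.einsum('ijk,ip,jq,kr->pqr', X, A, B, C)  # core array
    # With orthonormal loadings the model sum of squares equals sum(G**2).
    return 100.0 * np.sum(G ** 2) / np.sum(X ** 2)

# Synthetic stand-in for the real array (assumption: random data, small size).
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 6, 10))

results = {d: hosvd_explained(X, d) for d in valid_dims(4)}
for d, fit in sorted(results.items(), key=lambda kv: -kv[1])[:5]:
    print(d, round(fit, 2))
```

Plotting the explained variation against the total number of factors P+Q+R is one way to look for the point where adding factors stops paying off.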
2. Constrained vs. unconstrained models
Using the dimensionality previously found, estimate an orthogonal as well as an unconstrained model.
Investigate and explore both models thoroughly.
Argue precisely for your findings and include comments on the differences between the two solutions.
3. Residual matrices
Plot the residual matrices/landscapes and comment on the distribution of the error over the samples.
Suggest a way to make a simple plot to compare the error of the samples.
Decide whether one or more samples have to be removed.
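One possible way to compare the samples is to reduce each residual landscape to a single number, the residual sum of squares per sample. The sketch below is a NumPy illustration under assumptions: `sample_sse` is a hypothetical helper, and `X` and `Xhat` are tiny made-up arrays standing in for the data and the fitted model.

```python
import numpy as np

def sample_sse(X, Xhat):
    """Residual sum of squares for each first-mode slab (one value per sample)."""
    E = X - Xhat
    return np.sum(E ** 2, axis=(1, 2))

# Made-up example: 4 samples, the last one deliberately fitted badly.
X = np.ones((4, 3, 5))
Xhat = X.copy()
Xhat[3] += 0.5  # hypothetical poorly modelled sample

sse = sample_sse(X, Xhat)
print(sse)
```

A bar chart of these values, one bar per sample, makes samples with unusually large residuals stand out.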
4. Core rotation
Experiment with the available core rotations and make a new solution that can easily be interpreted.
What are the explicit arguments for choosing this model? Check that the premises hold: orthogonality and/or non-orthogonality of factors, non-orthogonality of rotation matrices.
Estimate the relative percentage of the explained variation accounted for by the significant combinations of outer products.
Construct a new model using only these combinations of factors (use the correct weights) and explore the model visually.
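When all three loading matrices are orthonormal, the sum of squares of the model equals the sum of the squared core elements, so each squared core element measures the contribution of one combination of factors. The sketch below illustrates that bookkeeping in NumPy. The core values and the total sum of squares are invented for illustration, and `core_contributions` is a hypothetical helper; the calculation does not apply to models with non-orthogonal loadings.

```python
import numpy as np

def core_contributions(G, ss_total):
    """Percent of the total sum of squares carried by each core element.
    Valid only when all three loading matrices are orthonormal."""
    return 100.0 * G ** 2 / ss_total

# Invented (2, 2, 2) core and total sum of squares, for illustration only.
G = np.array([[[9.0, 0.5], [0.2, 1.0]],
              [[0.3, 2.0], [4.0, 0.1]]])
ss_total = 110.0

pct = core_contributions(G, ss_total)
order = np.argsort(pct, axis=None)[::-1]  # largest contribution first
for flat in order[:3]:
    p, q, r = np.unravel_index(flat, G.shape)
    # Report 1-based factor numbers for mode 1, 2 and 3.
    print((int(p) + 1, int(q) + 1, int(r) + 1), round(float(pct[p, q, r]), 1))
```

A reduced model then keeps only the core elements listed as significant and sets the remaining ones to zero before reconstructing the array.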
5. Theoretical question
Prove that, for any model, the largest number of factors in any mode cannot exceed the product of the numbers of factors in the other two modes.
Use the equations given in Chapters 1 and 2.
6. Three-way Tucker regression
Use regression to model the behaviour of the y measurements (pH).
7. Two-segment validation of component models
Make an M-file that can construct at least two sub-arrays (split-half) of the array X and that, using these arrays in turn, can plot a validation error (SSE) as a function of the feasible model dimensions from (1,1,1), …, (5,5,5).
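The logic of such a routine can be sketched as below. This is a NumPy illustration of one possible split-half scheme, not the intended M-file, and every name in it is hypothetical. The assumptions are: the samples are split into odd and even fractions, a truncated higher-order SVD stands in for a least-squares Tucker3 fit, and held-out samples are predicted by projecting them onto the calibration loadings and onto the row space of the unfolded calibration core. The data are synthetic and small, so the dimensions only run to (3,3,3).

```python
import itertools
import numpy as np

def unfold(X, mode):
    """Matricize X so that `mode` indexes the rows."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def fit_hosvd(X, dims):
    """Truncated HOSVD: orthonormal B, C and the core unfolded to (P, Q*R)."""
    A, B, C = (np.linalg.svd(unfold(X, m), full_matrices=False)[0][:, :d]
               for m, d in enumerate(dims))
    G1 = np.einsum('ijk,ip,jq,kr->pqr', X, A, B, C).reshape(dims[0], -1)
    return B, C, G1

def validation_sse(Xcal, Xval, dims):
    """SSE of the held-out samples under the model fitted to Xcal."""
    B, C, G1 = fit_hosvd(Xcal, dims)
    Q, R = B.shape[1], C.shape[1]
    # Project each held-out landscape onto span(B) and span(C) ...
    T = np.einsum('ijk,jq,kr->iqr', Xval, B, C).reshape(Xval.shape[0], Q * R)
    # ... then onto the row space of the calibration core (P dimensions).
    T_hat = T @ np.linalg.pinv(G1) @ G1
    Xhat = np.einsum('iqr,jq,kr->ijk', T_hat.reshape(-1, Q, R), B, C)
    return float(np.sum((Xval - Xhat) ** 2))

# Synthetic array with a true (2,2,2) structure plus a little noise (assumption).
rng = np.random.default_rng(1)
A0 = rng.standard_normal((12, 2))
B0 = rng.standard_normal((8, 2))
C0 = rng.standard_normal((10, 2))
G0 = rng.standard_normal((2, 2, 2))
X = np.einsum('pqr,ip,jq,kr->ijk', G0, A0, B0, C0)
X = X + 0.01 * rng.standard_normal(X.shape)

halves = (X[0::2], X[1::2])  # split-half over the first mode
# max(d)**2 <= P*Q*R is the same as: no mode exceeds the product of the other two.
dims_list = [d for d in itertools.product(range(1, 4), repeat=3)
             if max(d) ** 2 <= d[0] * d[1] * d[2]]
sse = {d: validation_sse(halves[0], halves[1], d)
          + validation_sse(halves[1], halves[0], d)
       for d in dims_list}
for d in dims_list:
    print(d, round(sse[d], 3))
```

Each half serves once for calibration and once for validation, and the two errors are added. Plotting `sse` against the dimensionalities gives the validation curve; with real data the choice of split (odd/even, first/second half, random) is itself worth examining.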
8. Two-segment validation of regression models
Make an extension of the M-file from above that can plot the validation (SSE) and prediction (RMSEP) error as a function of the feasible model dimensions from (1,1,1), …, (5,5,5).
Again, use the split-half principle.
