# The Copenhagen Chemometrics Group

## 10 APPLICATIONS OF THE TUCKER3 MODEL

By C. A. Andersson

**Contents**

- Model dimensionality
- Constrained vs. unconstrained models
- Residual matrices
- Core rotation
- Theoretical question
- Three-way Tucker regression
- Two-segment validation of component models
- Two-segment validation of regression models

| Item | Description |
|---|---|
| Data | dataset1.mat contains fluorescence excitation-emission data of thick juice. |
| Purpose | Learning to analyse real data with the Tucker model. |
| Information | R. Henrion. N-way principal component analysis: theory, algorithms and applications. *Chemom. Intell. Lab. Syst.* 25:1–23, 1994. |
| Prerequisites | As for Chapter 8. |

**We will use Tucker3 modeling to investigate a data set** from spectrofluorometric analysis of *thick juice*. See Andersson et al. (*Analysis of N-dimensional data arrays from fluorescence spectroscopy on an intermediary sugar product*. Fresenius J. Anal. Chem. 359(2):138–142, 1997) for more details. **Thick juice** is an intermediary product in the production of white crystalline sugar.

The dimensions of the array are (28, 20, 311):

- The first mode is fraction number (or elution time).
- The second mode is the excitation wavelength (250–440 nm).
- The third mode is the emission wavelength (250–560 nm).

Start by loading the data set.

**Load dataset1** and inspect the fluorescence landscapes (excitation × emission) of the 28 fractions.

Look for features/patterns in the modes that could be exploited.
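
One way to look for exploitable structure before fitting any model is to compute the singular values of the three unfoldings of the array. A sharp drop after a few values can suggest how many components a mode supports.

The sketch below makes several assumptions:

- The name `unfolding_singular_values` is hypothetical.
- The variable name stored in dataset1.mat is not given here, so a synthetic 28 × 20 × 311 array stands in for the data.
- The wavelength axes assume equal spacing: 10 nm in excitation and 1 nm in emission. That is what 20 and 311 points over the stated ranges imply.

```python
import numpy as np

def unfolding_singular_values(X):
    """Singular values of the mode-1, mode-2 and mode-3 unfoldings of a 3-way array."""
    I, J, K = X.shape
    unfoldings = [
        X.reshape(I, J * K),                     # fractions x (excitation, emission)
        X.transpose(1, 0, 2).reshape(J, I * K),  # excitation x (fractions, emission)
        X.transpose(2, 0, 1).reshape(K, I * J),  # emission x (fractions, excitation)
    ]
    return [np.linalg.svd(M, compute_uv=False) for M in unfoldings]

# Assumed wavelength axes: equally spaced over the stated ranges.
excitation = np.arange(250, 441, 10)   # 20 values, 250..440 nm
emission = np.arange(250, 561, 1)      # 311 values, 250..560 nm

# Synthetic stand-in for the thick-juice array: three components per mode plus noise.
rng = np.random.default_rng(0)
A, B, C = rng.random((28, 3)), rng.random((20, 3)), rng.random((311, 3))
X = np.einsum("pqr,ip,jq,kr->ijk", rng.random((3, 3, 3)), A, B, C, optimize=True)
X += 1e-6 * rng.standard_normal(X.shape)

for mode, s in enumerate(unfolding_singular_values(X), start=1):
    share = 100 * s**2 / np.sum(s**2)
    print(f"mode {mode}: % sum of squares of first six values {np.round(share[:6], 3)}")
```

For the real data, each `X[k]` (20 × 311) is the landscape of one fraction. It can be shown as a contour or surface plot against `excitation` and `emission`.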

### 1. Model dimensionality

**Explore all possible/valid dimensionalities** **w** = (1,1,1), (2,2,1),…,(4,4,4) to find the optimal dimensionality of the model of **X**.

**You can do it a slow way and a much slower way!**

Remember that the largest number of factors in a model cannot be higher than the product of the two smaller ones. For example, (2,1,1) is not a valid dimensionality. Let the computer do it in an automated way.
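
An automated scan can be sketched in Python/NumPy rather than as a MATLAB M-file. The names `valid_dimensionalities`, `tucker3` and `scan_dimensionalities` are hypothetical. `tucker3` is a plain alternating least-squares fit with orthonormal loadings, started from a truncated HOSVD. The demo uses a small synthetic array of known dimensionality (2,2,2) instead of the thick-juice data.

```python
import itertools
import numpy as np

def valid_dimensionalities(max_factors):
    """All (P, Q, R) up to max_factors in which no mode exceeds the product of the other two."""
    return [w for w in itertools.product(range(1, max_factors + 1), repeat=3)
            if w[0] <= w[1] * w[2] and w[1] <= w[0] * w[2] and w[2] <= w[0] * w[1]]

def _leading_vectors(M, r):
    """First r left singular vectors of M."""
    return np.linalg.svd(M, full_matrices=False)[0][:, :r]

def tucker3(X, w, max_iter=200, tol=1e-10):
    """Least-squares Tucker3 with orthonormal loadings: ALS started from a truncated HOSVD."""
    I, J, K = X.shape
    P, Q, R = w
    A = _leading_vectors(X.reshape(I, -1), P)
    B = _leading_vectors(X.transpose(1, 0, 2).reshape(J, -1), Q)
    C = _leading_vectors(X.transpose(2, 0, 1).reshape(K, -1), R)
    ssx, fit_old = np.sum(X**2), 0.0
    for _ in range(max_iter):
        # Update each mode from X projected onto the loadings of the other two modes.
        A = _leading_vectors(np.einsum("ijk,jq,kr->iqr", X, B, C, optimize=True).reshape(I, -1), P)
        B = _leading_vectors(np.einsum("ijk,ip,kr->jpr", X, A, C, optimize=True).reshape(J, -1), Q)
        C = _leading_vectors(np.einsum("ijk,ip,jq->kpq", X, A, B, optimize=True).reshape(K, -1), R)
        G = np.einsum("ijk,ip,jq,kr->pqr", X, A, B, C, optimize=True)
        fit = np.sum(G**2)  # fitted sum of squares, valid because the loadings are orthonormal
        if fit - fit_old < tol * ssx:
            break
        fit_old = fit
    return A, B, C, G

def scan_dimensionalities(X, max_factors=4):
    """Percentage of the sum of squares of X explained by each valid dimensionality."""
    ssx = np.sum(X**2)
    return {w: 100 * np.sum(tucker3(X, w)[3]**2) / ssx
            for w in valid_dimensionalities(max_factors)}

# Demo on a small synthetic array whose true dimensionality is (2, 2, 2).
rng = np.random.default_rng(1)
X = np.einsum("pqr,ip,jq,kr->ijk", rng.standard_normal((2, 2, 2)), rng.standard_normal((10, 2)),
              rng.standard_normal((8, 2)), rng.standard_normal((12, 2)))
for w, pct in sorted(scan_dimensionalities(X).items()):
    print(w, f"{pct:.2f} %")
```

Because the loadings are orthonormal, the explained sum of squares is the sum of the squared core elements, which keeps the scan cheap. On real data, one option is to look for the dimensionality beyond which additional factors no longer raise the explained variation appreciably.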

### 2. Constrained vs. unconstrained models

**Using the dimensionality previously found**, estimate an orthogonal as well as an unconstrained model. **Investigate and explore both models thoroughly.**

Argue precisely for your findings and comment on the differences between the two solutions.
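
A fact worth keeping in mind when comparing the two solutions: in the unconstrained Tucker3 model, any non-singular transformation of a loading matrix can be compensated for in the core. Orthogonality can therefore be imposed without loss of fit (for example via a QR decomposition of each loading matrix). In principle, both models should reproduce **X** equally well while their loadings differ.

The small NumPy sketch below demonstrates this for the first mode. It uses random loadings, not the thick-juice data, and the transformation `S` is arbitrary.

```python
import numpy as np

def reconstruct(G, A, B, C):
    """Tucker3 model: x_ijk = sum_pqr g_pqr a_ip b_jq c_kr."""
    return np.einsum("pqr,ip,jq,kr->ijk", G, A, B, C, optimize=True)

rng = np.random.default_rng(2)
# An orthogonal solution: columnwise orthonormal loadings (via QR) and a core.
A, _ = np.linalg.qr(rng.standard_normal((28, 3)))
B, _ = np.linalg.qr(rng.standard_normal((20, 3)))
C, _ = np.linalg.qr(rng.standard_normal((311, 3)))
G = rng.standard_normal((3, 3, 3))

# Transform the mode-1 loadings with a non-singular S and counter-transform the core with inv(S).
S = rng.standard_normal((3, 3)) + 3 * np.eye(3)
A_oblique = A @ S
G_counter = np.einsum("sp,pqr->sqr", np.linalg.inv(S), G)

X_orth = reconstruct(G, A, B, C)
X_oblique = reconstruct(G_counter, A_oblique, B, C)
print("same model:", np.allclose(X_orth, X_oblique))
print("oblique loadings orthonormal:", np.allclose(A_oblique.T @ A_oblique, np.eye(3)))
```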


### 3. Residual matrices

**Plot the residual matrices/landscapes** and comment on the distribution of the error over the samples.

Suggest a simple plot for comparing the error of the individual samples.

Decide whether one or more samples have to be removed.
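
One possible way to obtain the residuals and a per-sample error summary is sketched below in Python/NumPy. The names `tucker3` and `sample_residuals` are hypothetical. The data are synthetic, with one deliberately deviating sample (index 5) standing in for a possible outlier. `E[k]` is the residual landscape of sample k and can be plotted like the raw landscapes. `sse` can be shown as a bar plot over sample number.

```python
import numpy as np

def _leading_vectors(M, r):
    """First r left singular vectors of M."""
    return np.linalg.svd(M, full_matrices=False)[0][:, :r]

def tucker3(X, w, max_iter=200, tol=1e-10):
    """Least-squares Tucker3 with orthonormal loadings: ALS started from a truncated HOSVD."""
    I, J, K = X.shape
    P, Q, R = w
    A = _leading_vectors(X.reshape(I, -1), P)
    B = _leading_vectors(X.transpose(1, 0, 2).reshape(J, -1), Q)
    C = _leading_vectors(X.transpose(2, 0, 1).reshape(K, -1), R)
    ssx, fit_old = np.sum(X**2), 0.0
    for _ in range(max_iter):
        # Update each mode from X projected onto the loadings of the other two modes.
        A = _leading_vectors(np.einsum("ijk,jq,kr->iqr", X, B, C, optimize=True).reshape(I, -1), P)
        B = _leading_vectors(np.einsum("ijk,ip,kr->jpr", X, A, C, optimize=True).reshape(J, -1), Q)
        C = _leading_vectors(np.einsum("ijk,ip,jq->kpq", X, A, B, optimize=True).reshape(K, -1), R)
        G = np.einsum("ijk,ip,jq,kr->pqr", X, A, B, C, optimize=True)
        fit = np.sum(G**2)  # fitted sum of squares, valid because the loadings are orthonormal
        if fit - fit_old < tol * ssx:
            break
        fit_old = fit
    return A, B, C, G

def sample_residuals(X, w):
    """Residual array E = X - Xhat and the residual sum of squares of each sample (mode 1)."""
    A, B, C, G = tucker3(X, w)
    E = X - np.einsum("pqr,ip,jq,kr->ijk", G, A, B, C, optimize=True)
    return E, np.sum(E**2, axis=(1, 2))

rng = np.random.default_rng(3)
X = np.einsum("pqr,ip,jq,kr->ijk", rng.standard_normal((2, 2, 2)), rng.standard_normal((12, 2)),
              rng.standard_normal((8, 2)), rng.standard_normal((10, 2)))
X += 0.01 * rng.standard_normal(X.shape)
X[5] += rng.standard_normal(X[5].shape)     # hypothetical deviating sample
E, sse = sample_residuals(X, (2, 2, 2))
print("samples ordered by residual sum of squares:", np.argsort(sse)[::-1])
```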

### 4. Core rotation

**Experiment with the available core rotations** and make a new solution that can easily be interpreted.

What are the explicit arguments for choosing this model? Check that the premises hold:

- orthogonality and/or non-orthogonality of factors;
- non-orthogonality of rotation matrices.

**Estimate the relative percentage of the explained variation** accounted for by the significant combinations of outer products.
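
When the loading matrices are columnwise orthonormal, the fitted sum of squares equals the sum of the squared core elements. This is the case for the orthogonal Tucker3 solution and remains so after an orthogonal core rotation. It follows that g_pqr² / ‖**X**‖² is the share of the total variation attributable to the combination (p, q, r). If a non-orthogonal rotation has been applied, this simple partition no longer holds.

The sketch below ranks the core elements accordingly. The function name `core_contributions` is hypothetical, and the loadings are random orthonormal matrices rather than a fitted model.

```python
import numpy as np

def core_contributions(G, X):
    """Percentage of the total sum of squares of X carried by each core element, largest first.

    Assumes columnwise orthonormal loadings, so the squared core elements
    partition the explained sum of squares.
    """
    pct = 100 * G**2 / np.sum(X**2)
    order = np.argsort(pct, axis=None)[::-1]
    return [(tuple(int(i) for i in np.unravel_index(k, G.shape)), float(pct.flat[k]))
            for k in order]

rng = np.random.default_rng(4)
A, _ = np.linalg.qr(rng.standard_normal((28, 3)))
B, _ = np.linalg.qr(rng.standard_normal((20, 3)))
C, _ = np.linalg.qr(rng.standard_normal((311, 3)))
X = np.einsum("pqr,ip,jq,kr->ijk", rng.standard_normal((3, 3, 3)), A, B, C, optimize=True)
X += 0.001 * rng.standard_normal(X.shape)
G = np.einsum("ijk,ip,jq,kr->pqr", X, A, B, C, optimize=True)   # core for these loadings

table = core_contributions(G, X)
for (p, q, r), share in table[:5]:                 # indices printed 1-based as g_pqr
    print(f"g[{p + 1},{q + 1},{r + 1}]: {share:.2f} %")
print("total explained:", round(sum(s for _, s in table), 2), "%")
```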

Construct a new model using only these combinations of factors (use the correct weights) and explore the model visually.
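
A reduced model keeps only selected core elements. Each retained g_pqr acts as the weight of the outer product of the p-th, q-th and r-th loading vectors, and all other core elements are set to zero.

The sketch below is a minimal version. The function name `reduced_model`, the choice of the three largest elements, and the random orthonormal loadings are illustrative assumptions.

```python
import numpy as np

def reduced_model(G, A, B, C, keep):
    """Reconstruct X from only the core elements listed in `keep` (0-based (p, q, r) tuples)."""
    mask = np.zeros(G.shape, dtype=bool)
    for idx in keep:
        mask[idx] = True
    G_kept = np.where(mask, G, 0.0)          # retained core values are the weights
    return np.einsum("pqr,ip,jq,kr->ijk", G_kept, A, B, C, optimize=True)

rng = np.random.default_rng(5)
A, _ = np.linalg.qr(rng.standard_normal((28, 3)))
B, _ = np.linalg.qr(rng.standard_normal((20, 3)))
C, _ = np.linalg.qr(rng.standard_normal((311, 3)))
G = rng.standard_normal((3, 3, 3))

largest = np.argsort(np.abs(G), axis=None)[::-1][:3]
keep = [tuple(int(i) for i in np.unravel_index(k, G.shape)) for k in largest]
X_reduced = reduced_model(G, A, B, C, keep)
# With orthonormal loadings, the reduced model's sum of squares equals the sum of the kept g_pqr^2.
print(np.isclose(np.sum(X_reduced**2), sum(G[idx]**2 for idx in keep)))
```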

### 5. Theoretical question

**Prove that the maximum number of factors that can be found** is restricted by the product of the two lowest numbers of factors, for any model.

Use the equations given in Chapters 1 and 2.
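
This is a numerical sanity check of the claim, not the requested proof. It builds a model with P = 5 mode-1 factors but only Q·R = 4 combinations in the other two modes. With the C-order unfolding used here, X₍₁₎ = A G₍₁₎ (B ⊗ C)ᵀ, where G₍₁₎ is P × QR. The rank therefore cannot exceed QR, and QR mode-1 components already reproduce **X**.

```python
import numpy as np

rng = np.random.default_rng(6)
P, Q, R = 5, 2, 2                          # P > Q*R = 4: one mode-1 factor too many
A = rng.standard_normal((10, P))
B = rng.standard_normal((6, Q))
C = rng.standard_normal((7, R))
G = rng.standard_normal((P, Q, R))
X = np.einsum("pqr,ip,jq,kr->ijk", G, A, B, C)

# Mode-1 unfolding: X_(1) = A G_(1) (B kron C)^T with G_(1) of size P x QR,
# so its rank cannot exceed QR however large P is.
X1 = X.reshape(10, -1)
rank = np.linalg.matrix_rank(X1)

# QR = 4 mode-1 components reproduce X exactly: project onto the leading
# left singular vectors of X_(1).
U = np.linalg.svd(X1, full_matrices=False)[0][:, :Q * R]
X_refit = (U @ (U.T @ X1)).reshape(X.shape)
print(rank, np.allclose(X, X_refit))
```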


### 6. Three-way Tucker regression

Use regression to model the behaviour of the **y** measurements (pH).
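
One simple variant, assumed here rather than taken from the exercise, regresses **y** on the Tucker3 sample-mode loadings (scores) by ordinary least squares with an intercept. The function names are hypothetical, and the pH values are synthetic. For brevity, the scores come from the leading left singular vectors of the mode-1 unfolding, a truncated-HOSVD stand-in. With real data, the sample loadings of the fitted Tucker3 model would be passed instead.

```python
import numpy as np

def fit_scores_regression(T, y):
    """Least-squares coefficients [intercept, b_1, ..., b_P] of y on the sample scores T."""
    Z = np.column_stack([np.ones(len(y)), T])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef

def predict(coef, T):
    """Predicted y for sample scores T."""
    return coef[0] + T @ coef[1:]

rng = np.random.default_rng(7)
A_true = rng.standard_normal((28, 2))
X = np.einsum("pqr,ip,jq,kr->ijk", rng.standard_normal((2, 2, 2)), A_true,
              rng.standard_normal((20, 2)), rng.standard_normal((311, 2)), optimize=True)
y = 7.0 + A_true @ np.array([0.3, -0.2])   # hypothetical pH values driven by the scores

# Stand-in sample scores: leading left singular vectors of the mode-1 unfolding.
T = np.linalg.svd(X.reshape(28, -1), full_matrices=False)[0][:, :2]
coef = fit_scores_regression(T, y)
rmse = np.sqrt(np.mean((y - predict(coef, T)) ** 2))
print(f"RMSE of calibration: {rmse:.2e}")
```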


### 7. Two-segment validation of component models

**Make an M-file** that can construct at least two sub-arrays (split-half) of the array **X** and that, using these arrays in turn, can plot a validation error (SSE) as a function of the feasible model dimensions from (1,1,1),…,(5,5,5).
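
The exercise asks for an M-file. This Python/NumPy sketch shows one possible split-half scheme, which is an assumption rather than the prescribed method:

1. Split the samples into odd and even fractions.
2. Fit a Tucker3 model to one half.
3. For the other half, keep that model's loadings B, C and core G fixed, and estimate only new sample scores by least squares. The remaining squared error is the validation SSE.
4. Swap the roles of the halves and add the two SSEs.

The function names are hypothetical, and the demo array is synthetic with true dimensionality (2,2,2).

```python
import itertools
import numpy as np

def _leading_vectors(M, r):
    """First r left singular vectors of M."""
    return np.linalg.svd(M, full_matrices=False)[0][:, :r]

def tucker3(X, w, max_iter=200, tol=1e-10):
    """Least-squares Tucker3 with orthonormal loadings: ALS started from a truncated HOSVD."""
    I, J, K = X.shape
    P, Q, R = w
    A = _leading_vectors(X.reshape(I, -1), P)
    B = _leading_vectors(X.transpose(1, 0, 2).reshape(J, -1), Q)
    C = _leading_vectors(X.transpose(2, 0, 1).reshape(K, -1), R)
    ssx, fit_old = np.sum(X**2), 0.0
    for _ in range(max_iter):
        # Update each mode from X projected onto the loadings of the other two modes.
        A = _leading_vectors(np.einsum("ijk,jq,kr->iqr", X, B, C, optimize=True).reshape(I, -1), P)
        B = _leading_vectors(np.einsum("ijk,ip,kr->jpr", X, A, C, optimize=True).reshape(J, -1), Q)
        C = _leading_vectors(np.einsum("ijk,ip,jq->kpq", X, A, B, optimize=True).reshape(K, -1), R)
        G = np.einsum("ijk,ip,jq,kr->pqr", X, A, B, C, optimize=True)
        fit = np.sum(G**2)  # fitted sum of squares, valid because the loadings are orthonormal
        if fit - fit_old < tol * ssx:
            break
        fit_old = fit
    return A, B, C, G

def segment_sse(X_train, X_test, w):
    """Fit Tucker3 to X_train, estimate only new sample scores for X_test, return the test SSE."""
    _, B, C, G = tucker3(X_train, w)
    Z = np.einsum("pqr,jq,kr->pjk", G, B, C, optimize=True).reshape(w[0], -1)   # P x (J*K)
    Xt = X_test.reshape(X_test.shape[0], -1)
    A_test = np.linalg.lstsq(Z.T, Xt.T, rcond=None)[0].T      # least-squares sample scores
    return float(np.sum((Xt - A_test @ Z) ** 2))

def split_half_sse(X, max_factors=5):
    """Validation SSE per feasible dimensionality, odd/even halves validating each other."""
    first, second = X[0::2], X[1::2]
    dims = [w for w in itertools.product(range(1, max_factors + 1), repeat=3)
            if w[0] <= w[1] * w[2] and w[1] <= w[0] * w[2] and w[2] <= w[0] * w[1]]
    return {w: segment_sse(first, second, w) + segment_sse(second, first, w) for w in dims}

rng = np.random.default_rng(8)
X = np.einsum("pqr,ip,jq,kr->ijk", rng.standard_normal((2, 2, 2)), rng.standard_normal((12, 2)),
              rng.standard_normal((8, 2)), rng.standard_normal((10, 2)))
sse = split_half_sse(X, max_factors=3)     # max_factors=5 for the exercise; 3 keeps the demo fast
for w, value in sorted(sse.items()):
    print(w, f"{value:.3g}")
```

The left-out half is used to estimate its own scores, so this SSE may keep decreasing as the model grows. In that case, look for where the curve levels off rather than for a clear minimum.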

### 8. Two-segment validation of regression models

**Make an extension of the M-file from above** that can plot the validation error (SSE) and the prediction error (RMSEP) as a function of the feasible model dimensions from (1,1,1),…,(5,5,5).

Again use the split-half principle.
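
A Python/NumPy sketch of one way to add **y** to the split-half scheme follows. Regressing **y** on the Tucker3 sample scores is an assumption, not a method stated in the exercise. The procedure:

1. Split the samples into odd and even fractions, and fit a Tucker3 model to one half.
2. For the left-out half, estimate sample scores by least squares from that model's B, C and G, which gives its SSE.
3. Regress **y** on the calibration scores with an intercept, and predict **y** for the left-out half.
4. Swap the roles of the halves, sum the SSEs, and pool the prediction errors into an RMSEP.

`split_half_validation` is a hypothetical name. The data and the pH values are synthetic.

```python
import numpy as np

def _leading_vectors(M, r):
    """First r left singular vectors of M."""
    return np.linalg.svd(M, full_matrices=False)[0][:, :r]

def tucker3(X, w, max_iter=200, tol=1e-10):
    """Least-squares Tucker3 with orthonormal loadings: ALS started from a truncated HOSVD."""
    I, J, K = X.shape
    P, Q, R = w
    A = _leading_vectors(X.reshape(I, -1), P)
    B = _leading_vectors(X.transpose(1, 0, 2).reshape(J, -1), Q)
    C = _leading_vectors(X.transpose(2, 0, 1).reshape(K, -1), R)
    ssx, fit_old = np.sum(X**2), 0.0
    for _ in range(max_iter):
        # Update each mode from X projected onto the loadings of the other two modes.
        A = _leading_vectors(np.einsum("ijk,jq,kr->iqr", X, B, C, optimize=True).reshape(I, -1), P)
        B = _leading_vectors(np.einsum("ijk,ip,kr->jpr", X, A, C, optimize=True).reshape(J, -1), Q)
        C = _leading_vectors(np.einsum("ijk,ip,jq->kpq", X, A, B, optimize=True).reshape(K, -1), R)
        G = np.einsum("ijk,ip,jq,kr->pqr", X, A, B, C, optimize=True)
        fit = np.sum(G**2)  # fitted sum of squares, valid because the loadings are orthonormal
        if fit - fit_old < tol * ssx:
            break
        fit_old = fit
    return A, B, C, G

def split_half_validation(X, y, w):
    """Return (validation SSE of X, RMSEP of y) for dimensionality w using odd/even halves."""
    idx = np.arange(X.shape[0])
    sse, errors = 0.0, []
    for train, test in ((idx[0::2], idx[1::2]), (idx[1::2], idx[0::2])):
        A, B, C, G = tucker3(X[train], w)
        Z = np.einsum("pqr,jq,kr->pjk", G, B, C, optimize=True).reshape(w[0], -1)
        Xt = X[test].reshape(len(test), -1)
        A_test = np.linalg.lstsq(Z.T, Xt.T, rcond=None)[0].T     # scores of the left-out half
        sse += float(np.sum((Xt - A_test @ Z) ** 2))
        # Regress y on the calibration scores (with intercept), then predict the left-out half.
        coef = np.linalg.lstsq(np.column_stack([np.ones(len(train)), A]), y[train], rcond=None)[0]
        errors.append(y[test] - (coef[0] + A_test @ coef[1:]))
    return sse, float(np.sqrt(np.mean(np.concatenate(errors) ** 2)))

rng = np.random.default_rng(9)
A_true = rng.standard_normal((12, 2))
X = np.einsum("pqr,ip,jq,kr->ijk", rng.standard_normal((2, 2, 2)), A_true,
              rng.standard_normal((8, 2)), rng.standard_normal((10, 2)))
X += 1e-3 * rng.standard_normal(X.shape)
y = 7.0 + A_true @ np.array([0.3, -0.2])     # hypothetical pH values
results = {w: split_half_validation(X, y, w) for w in [(1, 1, 1), (2, 2, 2), (3, 3, 3)]}
for w, (sse, rmsep) in results.items():
    print(w, f"SSE={sse:.3g}", f"RMSEP={rmsep:.3g}")
```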