9. FEATURES OF THE TUCKER3 MODEL

9. FEATURES OF THE TUCKER3 MODEL

Contents
By C. A. Andersson

Data used
howto3.mat, howto4.mat and howto5.mat

Purpose 
Learning the advanced aspects of Tucker modeling
(e.g. core rotations)

Information
R. Henrion
N-way principal component analysis theory,
algorithms and applications. 
Chemom.Intell.Lab.Syst. 25:1-23, 1994

Prerequisites 
Knowledge of MATLAB,
multi-way arrays and the basis of Tucker models

Using the introduction in the prior chapter as a basis
the features and characteristics of the Tucker3 model will now be discussed.
Some of the features, e.g. rotational ambiguity, are known from two-way data analysis.

However, many analysts need not regard, or forget
the rotational possibilities of the PCA factors.
The non-uniqueness (i.e., rotation, scaling and permutation) of the factors resolved from two-way data results in ambiguous solutions, albeit offers possibilities for exploring the model in many interesting ways.

Post-processing solutions
from decomposition / factorization belongs to the discipline of factor analysis (FA).
The FA approach will be discussed in relation to the solutions from the Tucker3 model in order to equip the analyst with more ways to explore the estimated model – eventually get a better understanding of his or her data.

1. The informative core

We have referred to the core as containing the weights of all possible triads.
This is only meaningful if the factors in the component matrices AB and C are limited in some way. Otherwise the factors in e.g. A could be scaled by random scalars and the corresponding elements in the core would be down-scaled by the same scalars thereby not providing any meaningful basis for interpretation. In order to make the weights in G comparable it is therefore generally applied to scale the factors to have unit length, i.e., the root of the sum of squared elements equals one, such that the core elements represents the importance of the respective factor combinations.

The largest squared elements of the core will indicate what the most important factors are in the model of X. Furthermore, by requiring that the component matrices are column-wise orthonormal, it can be shown that, when the model fits data ideally, the sum of the squared entries of the core equals the sum of the squared entries of X.

Another benefit from requiring the component matrices to be columnwise orthonormal is that the computations often take significantly less time. It can be shown that the orthogonality constraint does not influence the fit, so from this viewpoint there is no difference; the models fit data equally well. The orthogonality constraint has an implication if the spectra are correlated, since the orthogonality will disrupt the spectral resemblance between the calculated profiles and the true measured spectra. This renders interpretation diffucult in most cases.

For orthogonal models
it can be shown that the size of the squared core entries are proportional to the variation explained of the model by the factor combination. In other words, to find the most significant factor combination, one must find the largest squared entry in the core. 

How to do in the N-way Toolbox #3

Using Eq. (4) from the previous section

we wish, by experiment, to verify that the orthogonality poses no change in the fit.
Using the same data set we first estimate an unconstrained model and then a constrained model. You will do this by using the commands below. Use this exciting lesson to get acquainted with the way data are handled and the N-way Toolbox. 

load howto3 %Load X and W 

%Make the ordinary orthogonality constrained Tucker3 model 
%The [0 0 0] parameters define an orthogonal model 
%(which is also the default setting) 

[Factorso,Go,SSEo]=tucker(X,W,[],[0 0 0]); 
[Ao Bo Co]=fac2let(Factorso); %Convert to component matrices 

%If the these three matrices exhibit identity the solutions are %orthogonal 
Ao'*Ao, Bo'*Bo, Co'*Co 

%Make an unconstrained model 
%The [2 2 2] parameters define an unconstrained model 

[Factorsu,Gu,SSEu]=tucker(X,W,[],[2 2 2]); 

[Au Bu Cu]=fac2let(Factorsu); %Convert to component matrices 

%Since the following three matrices are not identity matrices %the components are oblique (unconstrained) 
Au'*Au, Bu'*Bu, Cu'*Cu 

%The errors of the two model should be equal 
%(to rounding errors)
[SSEo SSEu] 

Example 3

Question
The core from the orthogonal model above has been slightly modified
and is listed as an unfolded matrix G below. Remember that w = (3,3,3). 

Try to identify the three most significant core elements?
From the positions of the significant elements, find out what the corresponding factors are? Is there one particularly important factor in the model?

Answer

2. Rotational freedom of the solution

The Tucker3 model does not provide unique solutions
Iit is possible to estimate an infinity of different solutions ABC and G that fit the same data array X equally well (here we of course assume that w is the same).
This means that the Tucker model has an infinity of solutions. This however, has no impact on the interpretation as long as one is aware of this because the systematic behavior caught by one model is the same in all models. Since this feature is important for the discussion to follow, we will elaborate on this by proving that a given solution may be rotated to any other of the infinite number of solutions.

From X (r1,r2,r3), a solution with orthonormal A (r1,w1), B (r2,w2), C (r3,w3) and G (w1,w2,w3) has been calculated. We may write the model of X as in Eq. (4).

If the component matrices are rotated by arbitrary columnwise orthonormal matrices O (w1,w1), P (w2,w2) and Q (w3,w3) the least squares fit of the model is maintained. Using the rotated factors, the new corresponding, or counter-rotated, core is calculated as shown in Eq. (9).

Eq. (9) is the basis for rotation of Tucker3 solution as the rotated core is easily calculated without involving direct rotation of the factors themselves. We will discuss this in the next section. To conclude, Eq. (9) is used in Eq. (10) to formulate the rotated solution.

From Eq. (10) we see that rotation of the factors results in a counter-rotation of the core when applied to X. Hence, any initial solution to a Tucker3 model may be rotated using arbitrary orthonormal rotation matrices without affecting the model of X.
Thus, the fit also remains unaffected as postulated above.  

How to do it, in the N-way Toolbox
In order to demonstrate the rotational abilities of the Tucker3 model we will calculate a model and arbitrarily rotate the factors and the core accordingly.
We use r = (8,6,7) and w = (3,2,2). Investigate/plot the differences between the rotated and the unrotated factors. Take a particular look to the differences between the cores.
Don’t clear the screen before you have answered Example 3. Later in the chapter we will use more intelligent schemes for optimizing the simplicity of cores. 

load howto4                %Load X and W
[Factors,G] = tucker(X,W); %Make a Tucker3 model
[A B C]= fac2let(Factors); %Make component matrices

%Lets make three arbitrary rotation matrices, one for each mode
O=orth(rand(W(1),W(1))); 
P=orth(rand(W(2),W(2)));
Q=orth(rand(W(3),W(3)));

Ar = A*O; %Rotate mode 1
Br = B*P; %Rotate mode 2
Cr = C*Q; %Rotate mode 3

Xunf = reshape(X,size(X,1),size(X,2)*size(X,3)); %Unfold X
Gr.  = Ar'*Xunf*ckron(Cr,Br); %Calculate the new unfolded core
Gr.  = reshape(Gr,W); %reshape to array
[sum(G(:).^2) sum(Gr(:).^2)] %Must be equal

SSEr = sum(sum(sum((X-nmodel({Ar,Br,Cr},Gr)).^2 ))); 
%Calculate the error of the rotated model

SSEo = sum(sum(sum((X-nmodel({A,B,C},G)).^2 ))); 
%Calculate the error of the initial model

[SSEo SSEr] %The errors should also be equal

Example 4

Question
Inspect the initial core G and the rotated core Gr from the How to do it above.
Are you free to choose any of these two models for interpretation?
If so, what features would you look for and why? 

Answer
 

3. Rotating models to simplicity

In the sequel we will discuss the orthogonality constrained Tucker3 models.
As indicated by Example #5 and How-to #4 there can be great differences between the cores of identical, but rotated, solutions. By a solution we understand the set of matrices ABC and the core G which together forms a model of X. Most often, a solution derived by orthogonality constrained models is rotated using factor analysis (FA) to yield a new solution that is optimal in a closer defined sense. One can undertake two approaches to FA: i) Optimize a measure defined in terms of the factors or component matrices, or ii) Optimize the core to have an optimal structure depending on the aim of the modeling.

For complex models
they may be use in combination. In this context we will assume that the rotation matrices, e.g. OP and Q as in the example above, are orthonormal. However, also non-orthogonal rotations matrices can be applied, but the features of the rotated matrices and cores becomes somewhat more difficult to predict. Also, it seems difficult to establish robust algorithms for this purpose in FA, at least in cases of core rotation.

The analyst may have a desire
that is formulated in terms of the components matrices.
E.g., it may be desirable to optimize the variance measures of the factors to explore hidden differences or commonalities in selected modes. The perhaps most common measures belong to the family of Orthomax, e.g. Varimax and Quartimax. In addition Quartimin and Covarimin may be useful. Any (orthogonal) two-way rotation can be applied in a straight-forward manner to obtain the rotation matrices for the components. These matrices are then applied to the core to complete the rotation of the model.

In chapter 5 of his textbook Kroonenberg (Three-mode Principal Component Analysis. Theory and Applications, Leiden:DSWO Press, 1983) suggest that one should seek to diagonalize the planes of the core.

To obtain an even simpler core Kiers (TUCKALS core rotations and constrained TUCKALS modelling. Statistica Applicata 4:659-667, 1992) propose to diagonalize the core in whole. By the latter method the solutions are rotated in a controlled way until the sum of the squared diagonal elements is at maximum.

Note that
the core is required to be quadratic, i.e., is required to have w1 = w2 = w3, otherwise the diagonal is not defined. By this clever proposal the Tucker3 solution will be rotated to have optimal resemblance with a PARAFAC solution with orthogonal factors. Depending on the data and the dimensionality of the model, this rotation may provide a strong simplification of the core. However, in cases where the Tucker3-model is needed over a PARAFAC model one often requires the presence of significant off-diagonal elements.

The variance-of-squares core measure
Realizing that the main interest of any analyst of Tucker3 models is to obtain a simple core structure, Henrion and Andersson (A new criterion for simple-structure transformations of core arrays in N-way principal components analysis. Chemom.Intell.Lab.Syst. 47 (2):189-204, 1999) propose to optimize the variance of squares of the core. The measure is calculated as the sum of fourth power of all elements after centering with the overall mean value of the squared elements. 

How to do it in the N-Way Toolbox
First we will calculate a Tucker3 model with orthogonal component matrices.
Then we will rotate the initial core to optimal diagonality and variance-of-squares. Continue with Example #5 before you blank the screen. 

load howto5 %Load X and W

[Factors,Go]=tucker(X,W); %Make the Tucker model
[A B C]=fac2let(Factors); %Convert to component matrices
[Gd,Od1,Od2,Od3]=maxdia3(Go); %Rotate to optimum diagonality
[Gv,Ov1,Ov2,Ov3]=maxvar3(Go); %Rotate to optimum variance-of-
                              %squares
explcore(Go,7); %Inspect the unrotated solution
explcore(Gd,7); %Inspect the diagonalized solution
explcore(Gv,7); %Inspect the variance-of-squares optimized
                %solution

%Reshape X and G to unfolded matrices
Xunf = reshape(X,size(X,1),size(X,2)*size(X,3));
Gounf = reshape(Go,size(Go,1),size(Go,2)*size(Go,3));
Gdunf = reshape(Gd,size(Go,1),size(Go,2)*size(Go,3));
Gvunf = reshape(Gv,size(Go,1),size(Go,2)*size(Go,3)); 
%Find the error
sum(sum((Xunf-A*Gounf*kron(C',B')).^2 ))
sum(sum((Xunf-(A*Od1)*Gdunf*kron((C*Od3)',(B*Od2)')).^2))
sum(sum( (Xunf - (A*Ov1)*Gvunf*kron((C*Ov3)',(B*Ov2)')).^2 )) 

Example 5

Question
Use some time to discuss the pros and cons of the three models obtained above – which model would you prefer to interprete? Why?
What makes you select one model over another?
Is there anything in the analysis that suggest to use the PARAFAC model on these data? 

We have, for some reason become suspicious with regards to the derived transformation matrices derived above. What would you do to check the orthogonality of the transformation matrices and the orthogonality of the rotated factors? 

Answer

As a final practical aspect of Tucker
it is described here, how to fit new samples to a model, i.e. finding the scores of a new samples. As opposed to PARAFAC this has to be done ‘manually’:

If you have a model with loads A, B, C and G and want to predict the scores Anew for a new sample, do the following:

g = reshape(G,size(G,1),size(G,2)*size(G,3)); % Unfold G
Z = g*kron(C,B)';

Now unfold the data used to fit the model

x = reshape(X,I,J*K);

then to verify, you should find that 

a = x*pinv(Z);

equals A. To fit new samples simply do 

Anew = xnew*pinv(Z);