
Encouraging similar covariance matrices between classes

This page complements the article on Hierarchical Gaussian mixtures.

Model

Let n \in \{1, \dots, N\} index subjects, i \in \{1, \dots, I\} index voxels and k \in \{1, \dots, K\} index classes. Let us write the raw images \mathbf{x}_n. Let us assume that the (expected) class membership of each voxel is known and stored in \mathbf{z}_n (i.e., a “segmentation” or “responsibility” image). This is the situation obtained after the “Expectation” step of an EM (or variational EM) algorithm.

The conditional prior probability for each voxel is:
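
Presumably this is the class-conditional likelihood of a voxel given its (expected) class; a sketch, assuming standard mixture notation with subject- and class-specific means \boldsymbol\mu_{k,n} and precisions \boldsymbol\Lambda_{k,n}, would be:

p(\mathbf{x}_{n,i} \mid z_{n,i,k} = 1, \boldsymbol\mu_{k,n}, \boldsymbol\Lambda_{k,n}) = \mathcal{N}\left(\mathbf{x}_{n,i} \mid \boldsymbol\mu_{k,n}, \boldsymbol\Lambda_{k,n}^{-1}\right)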

Each Gaussian is assumed to stem from a Normal-Wishart distribution:
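
With the hyper-parameters used throughout this page (prior mean \boldsymbol\mu_{k,0}, mean degrees of freedom \lambda_{k,0}, scale matrix \mathbf{W}_{k,0} and degrees of freedom \nu_{k,0}), the standard Normal-Wishart form would read, as a sketch:

p(\boldsymbol\mu_{k,n}, \boldsymbol\Lambda_{k,n}) = \mathcal{N}\left(\boldsymbol\mu_{k,n} \mid \boldsymbol\mu_{k,0}, (\lambda_{k,0}\boldsymbol\Lambda_{k,n})^{-1}\right) \mathcal{W}\left(\boldsymbol\Lambda_{k,n} \mid \mathbf{W}_{k,0}, \nu_{k,0}\right)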

All prior inverse scale matrices stem from a shared Wishart distribution:
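
Writing \boldsymbol\Psi_0 for the shared scale matrix and m_0 for its degrees of freedom (the hyper-parameters optimised below), a plausible form is:

p(\mathbf{W}_{k,0}^{-1}) = \mathcal{W}\left(\mathbf{W}_{k,0}^{-1} \mid \boldsymbol\Psi_0, m_0\right)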

We make the mean-field approximation that the posterior factorises over (\boldsymbol\mu_{k,n}, \boldsymbol\Lambda_{k,n}), \mathbf{W}_{k,0}^{-1} and \boldsymbol\Psi_0. We will search for mode estimates of all the other variables (\boldsymbol\mu_{k,0}, \lambda_{k,0}, \nu_{k,0}, m_0).
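
Explicitly, this factorisation can be sketched as:

q = \left[\prod_{k=1}^K \prod_{n=1}^N q(\boldsymbol\mu_{k,n}, \boldsymbol\Lambda_{k,n})\right] \left[\prod_{k=1}^K q(\mathbf{W}_{k,0}^{-1})\right] q(\boldsymbol\Psi_0)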

Posterior

Gaussian parameters (one per subject and class)

with

yielding
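
As a sketch of the expected conjugate updates (the exact expressions in the original article may differ), let z_{n,i,k} denote the responsibility of class k for voxel i of subject n, and define the weighted statistics

N_{k,n} = \sum_{i=1}^I z_{n,i,k}, \qquad \bar{\mathbf{x}}_{k,n} = \frac{1}{N_{k,n}} \sum_{i=1}^I z_{n,i,k}\,\mathbf{x}_{n,i}, \qquad \mathbf{S}_{k,n} = \frac{1}{N_{k,n}} \sum_{i=1}^I z_{n,i,k}\,(\mathbf{x}_{n,i} - \bar{\mathbf{x}}_{k,n})(\mathbf{x}_{n,i} - \bar{\mathbf{x}}_{k,n})^{\mathrm{T}}.

The posterior over each (\boldsymbol\mu_{k,n}, \boldsymbol\Lambda_{k,n}) is then again Normal-Wishart, with parameters of the usual form, except that the fixed prior scale is replaced by its posterior expectation \mathbb{E}[\mathbf{W}_{k,0}^{-1}]:

\lambda_{k,n}^\star = \lambda_{k,0} + N_{k,n}, \qquad \boldsymbol\mu_{k,n}^\star = \frac{\lambda_{k,0}\boldsymbol\mu_{k,0} + N_{k,n}\bar{\mathbf{x}}_{k,n}}{\lambda_{k,0} + N_{k,n}}, \qquad \nu_{k,n}^\star = \nu_{k,0} + N_{k,n},

\mathbf{W}_{k,n}^{\star -1} = \mathbb{E}\left[\mathbf{W}_{k,0}^{-1}\right] + N_{k,n}\mathbf{S}_{k,n} + \frac{\lambda_{k,0} N_{k,n}}{\lambda_{k,0} + N_{k,n}}(\bar{\mathbf{x}}_{k,n} - \boldsymbol\mu_{k,0})(\bar{\mathbf{x}}_{k,n} - \boldsymbol\mu_{k,0})^{\mathrm{T}}.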

Wishart scale matrix (one per class)

with

yielding
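
If each subject-level precision \boldsymbol\Lambda_{k,n} is Wishart with scale \mathbf{W}_{k,0} and degrees of freedom \nu_{k,0}, standard conjugacy with the shared Wishart prior suggests a posterior of the form below (a sketch, with m_k^\star and \boldsymbol\Psi_k^\star as assumed notation and \mathbb{E}[\boldsymbol\Lambda_{k,n}] = \nu_{k,n}^\star\mathbf{W}_{k,n}^\star):

q(\mathbf{W}_{k,0}^{-1}) = \mathcal{W}\left(\mathbf{W}_{k,0}^{-1} \mid \boldsymbol\Psi_k^\star, m_k^\star\right), \qquad m_k^\star = m_0 + N\nu_{k,0}, \qquad \boldsymbol\Psi_k^{\star -1} = \boldsymbol\Psi_0^{-1} + \sum_{n=1}^N \nu_{k,n}^\star\mathbf{W}_{k,n}^\star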

Lower bound

The lower bound on the model evidence (or negative free energy) can be written as

with
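
In generic form, with \boldsymbol\theta collecting all variables that receive a variational factor, this bound can be sketched as:

\mathcal{F} = \mathbb{E}_q\left[\ln p(\mathbf{x}, \mathbf{z}, \boldsymbol\theta)\right] - \mathbb{E}_q\left[\ln q(\boldsymbol\theta)\right]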

Hyper-parameter optimisation

Normal mean degrees of freedom (\lambda_{k,0})

Keeping only terms that depend on \lambda_{k,0}, we have:

By differentiating with respect to \lambda_{k,0}, we find:

Setting the derivative to zero yields
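
In terms of the quantities \mathcal{L} and \mathcal{S} defined in the next subsection (with M the dimension of the data), the solution would be, as a sketch:

\lambda_{k,0} = \frac{NM}{\mathcal{L} + \mathcal{S}} = \frac{NM}{\sum_{n=1}^N \left[\frac{M}{\lambda_{k,n}^\star} + \nu_{k,n}^\star(\boldsymbol\mu_{k,n}^\star - \boldsymbol\mu_{k,0})^{\mathrm{T}}\mathbf{W}_{k,n}^\star(\boldsymbol\mu_{k,n}^\star - \boldsymbol\mu_{k,0})\right]}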

Normal mean (\boldsymbol\mu_{k,0})

Keeping only terms that depend on \boldsymbol\mu_{k,0}, and substituting the above result for \lambda_{k,0}, we have:

Let us write \mathcal{S} = \sum_{n=1}^N \nu_{k,n}^\star(\boldsymbol\mu_{k,n}^\star - \boldsymbol\mu_{k,0})^{\mathrm{T}}\mathbf{W}_{k,n}^\star(\boldsymbol\mu_{k,n}^\star - \boldsymbol\mu_{k,0}) and \mathcal{L} = \sum_{n=1}^N \frac{M}{\lambda_{k,n}^\star} so that \mathcal{E} = MN\frac{\mathcal{S}}{\mathcal{L} + \mathcal{S}}, with

Differentiating the complete objective function yields

Consequently
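
Solving the stationarity condition \sum_{n=1}^N \nu_{k,n}^\star\mathbf{W}_{k,n}^\star(\boldsymbol\mu_{k,n}^\star - \boldsymbol\mu_{k,0}) = \mathbf{0} would give, as a sketch, the precision-weighted average of the subject-wise posterior means:

\boldsymbol\mu_{k,0} = \left(\sum_{n=1}^N \nu_{k,n}^\star\mathbf{W}_{k,n}^\star\right)^{-1} \sum_{n=1}^N \nu_{k,n}^\star\mathbf{W}_{k,n}^\star\boldsymbol\mu_{k,n}^\star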

Normal variance degrees of freedom (\nu_{k,0})

Keeping only terms that depend on \nu_{k,0}, we have:

with:

yielding:

Differentiating yields:

There is no closed-form solution, so we will instead use the gradient and Hessian of the objective function within a Gauss-Newton optimisation scheme:
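
For this single scalar parameter, one such iteration takes the generic Newton form:

\nu_{k,0} \leftarrow \nu_{k,0} - \left(\frac{\partial^2 \mathcal{F}}{\partial \nu_{k,0}^2}\right)^{-1} \frac{\partial \mathcal{F}}{\partial \nu_{k,0}}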

Hyper-Wishart inverse scale matrix (\boldsymbol\Psi_0^{-1})

Keeping only terms that depend on \boldsymbol\Psi_0, we have:

Differentiating yields:
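
Setting this derivative to zero would give the usual maximiser of a Wishart likelihood with fixed degrees of freedom, i.e., as a sketch (with m_k^\star and \boldsymbol\Psi_k^\star the assumed posterior parameters of q(\mathbf{W}_{k,0}^{-1}) above):

\boldsymbol\Psi_0 = \frac{1}{K m_0} \sum_{k=1}^K \mathbb{E}\left[\mathbf{W}_{k,0}^{-1}\right] = \frac{1}{K m_0} \sum_{k=1}^K m_k^\star\boldsymbol\Psi_k^\star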

Hyper-Wishart degrees of freedom (m_0)

Keeping only terms that depend on m_0, we have:

Note: In order to find the global optimum at once, I first tried substituting \boldsymbol\Psi_0 with its optimal value (above), which depends on m_0. However, this breaks the convexity of the objective function, resulting in an ill-conditioned optimisation problem. It is thus preferable to alternate between optimising m_0 by Gauss-Newton and updating \boldsymbol\Psi_0 based on this new value.

Once again, we resort to numerical optimisation:
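
As for \nu_{k,0}, a generic Newton step, alternated with the \boldsymbol\Psi_0 update above as recommended in the note:

m_0 \leftarrow m_0 - \left(\frac{\partial^2 \mathcal{F}}{\partial m_0^2}\right)^{-1} \frac{\partial \mathcal{F}}{\partial m_0}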


Created by Yaël Balbastre on 11 April 2018. Last edited on 13 April 2018.