This page is a companion to the article on Hierarchical Gaussian mixtures.
Let n ∈ 1…N index subjects, i ∈ 1…I index voxels and k ∈ 1…K index classes. Let us write the raw images \mathbf{x}_n. Let us assume that the (expected) class membership of each voxel is known and stored in \mathbf{z}_n (i.e., a “segmentation” or “responsibility” image). This is the situation we are in after the “Expectation” step of an EM (or variational EM) algorithm.
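For concreteness, here is a minimal sketch of the assumed data layout in NumPy; the sizes, array names and the choice of M channels per voxel are illustrative only:

```python
import numpy as np

# Illustrative sizes: N subjects, I voxels, K classes and M channels per voxel
# (M is assumed to be the feature dimension that appears later in the derivation).
N, I, K, M = 4, 10000, 3, 2

# Raw images x_n, stacked into a single array of shape (N, I, M).
x = np.random.randn(N, I, M)

# Responsibility ("segmentation") images z_n: for each subject and voxel,
# a vector of K expected class memberships that sums to one.
z = np.random.dirichlet(np.ones(K), size=(N, I))   # shape (N, I, K)
```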
The conditional prior probability for each voxel is:
Each Gaussian is assumed to stem from a Normal-Wishart distribution:
All prior inverse scale matrices stem from a shared Wishart distribution:
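For reference, a standard write-out of this hierarchical model, assuming the usual conjugate parameterisations (the exact conventions of the original article may differ), is:

p(\mathbf{x}_{ni} \mid z_{nik} = 1, \boldsymbol\mu_{k,n}, \boldsymbol\Lambda_{k,n}) = \mathcal{N}\left(\mathbf{x}_{ni} \mid \boldsymbol\mu_{k,n}, \boldsymbol\Lambda_{k,n}^{-1}\right)

p(\boldsymbol\mu_{k,n}, \boldsymbol\Lambda_{k,n}) = \mathcal{N}\left(\boldsymbol\mu_{k,n} \mid \boldsymbol\mu_{k,0}, (\lambda_{k,0}\boldsymbol\Lambda_{k,n})^{-1}\right)\,\mathcal{W}\left(\boldsymbol\Lambda_{k,n} \mid \mathbf{W}_{k,0}, \nu_{k,0}\right)

p(\mathbf{W}_{k,0}^{-1}) = \mathcal{W}\left(\mathbf{W}_{k,0}^{-1} \mid \boldsymbol\Psi_0, m_0\right)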
We make the mean-field approximation that the posterior factorises over (\boldsymbol\mu_{k,n}, \boldsymbol\Lambda_{k,n}), \mathbf{W}_{k,0}^{-1} and \boldsymbol\Psi_0. We’ll search for mode estimates of all the other variables (\boldsymbol\mu_{k,0}, \lambda_{k,0}, \nu_{k,0}, m_0).
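In symbols, this factorisation reads:

q = \left[\prod_{k=1}^{K} q(\mathbf{W}_{k,0}^{-1}) \prod_{n=1}^{N} q(\boldsymbol\mu_{k,n}, \boldsymbol\Lambda_{k,n})\right] q(\boldsymbol\Psi_0),

while \boldsymbol\mu_{k,0}, \lambda_{k,0}, \nu_{k,0} and m_0 are treated as point (mode) estimates.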
with
yielding
with
yielding
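For reference, in a non-hierarchical model the conjugate Normal-Wishart update of q(\boldsymbol\mu_{k,n}, \boldsymbol\Lambda_{k,n}) given fixed responsibilities is standard; the hierarchical model above differs at least in that \mathbf{W}_{k,0}^{-1} is random and enters through its posterior expectation \mathbb{E}_q[\mathbf{W}_{k,0}^{-1}]. Under the usual conventions, with the sufficient statistics

N_{k,n} = \sum_{i=1}^{I} z_{nik}, \qquad \bar{\mathbf{x}}_{k,n} = \frac{1}{N_{k,n}}\sum_{i=1}^{I} z_{nik}\,\mathbf{x}_{ni}, \qquad \mathbf{S}_{k,n} = \frac{1}{N_{k,n}}\sum_{i=1}^{I} z_{nik}\,(\mathbf{x}_{ni}-\bar{\mathbf{x}}_{k,n})(\mathbf{x}_{ni}-\bar{\mathbf{x}}_{k,n})^{\mathrm{T}},

the standard update would read:

\lambda_{k,n}^\star = \lambda_{k,0} + N_{k,n}, \qquad \boldsymbol\mu_{k,n}^\star = \frac{\lambda_{k,0}\boldsymbol\mu_{k,0} + N_{k,n}\bar{\mathbf{x}}_{k,n}}{\lambda_{k,n}^\star}, \qquad \nu_{k,n}^\star = \nu_{k,0} + N_{k,n},

{\mathbf{W}_{k,n}^\star}^{-1} = \mathbf{W}_{k,0}^{-1} + N_{k,n}\mathbf{S}_{k,n} + \frac{\lambda_{k,0} N_{k,n}}{\lambda_{k,n}^\star}(\bar{\mathbf{x}}_{k,n}-\boldsymbol\mu_{k,0})(\bar{\mathbf{x}}_{k,n}-\boldsymbol\mu_{k,0})^{\mathrm{T}}.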
The lower bound on the model evidence (or negative free energy) can be written as
with
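As a generic reminder (not the model-specific expansion), for observations \mathbf{x} and variables \boldsymbol\theta that carry a variational factor:

\mathcal{F} = \mathbb{E}_q\left[\ln p(\mathbf{x}, \boldsymbol\theta)\right] - \mathbb{E}_q\left[\ln q(\boldsymbol\theta)\right] \le \ln p(\mathbf{x}).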
Keeping only terms that depend on \lambda_{k,0}, we have:
By differentiating with respect to \lambda_{k,0}, we find:
Setting the derivative to zero yields
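Using the standard Normal-Wishart expectation \mathbb{E}\left[(\boldsymbol\mu_{k,n}-\boldsymbol\mu_{k,0})^{\mathrm{T}}\boldsymbol\Lambda_{k,n}(\boldsymbol\mu_{k,n}-\boldsymbol\mu_{k,0})\right] = \frac{M}{\lambda_{k,n}^\star} + \nu_{k,n}^\star(\boldsymbol\mu_{k,n}^\star-\boldsymbol\mu_{k,0})^{\mathrm{T}}\mathbf{W}_{k,n}^\star(\boldsymbol\mu_{k,n}^\star-\boldsymbol\mu_{k,0}), the stationary point takes the form (a sketch, assuming the usual parameterisation, written in the notation introduced in the next step):

\lambda_{k,0}^\star = \frac{NM}{\sum_{n=1}^N\left[\frac{M}{\lambda_{k,n}^\star} + \nu_{k,n}^\star(\boldsymbol\mu_{k,n}^\star - \boldsymbol\mu_{k,0})^{\mathrm{T}}\mathbf{W}_{k,n}^\star(\boldsymbol\mu_{k,n}^\star - \boldsymbol\mu_{k,0})\right]} = \frac{NM}{\mathcal{L} + \mathcal{S}}.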
Keeping only terms that depend on \boldsymbol\mu_{k,0}, and substituting the above result for \lambda_{k,0}, we have:
Let us write \mathcal{S} = \sum_{n=1}^N \nu_{k,n}^\star(\boldsymbol\mu_{k,n}^\star - \boldsymbol\mu_{k,0})^{\mathrm{T}}\mathbf{W}_{k,n}^\star(\boldsymbol\mu_{k,n}^\star - \boldsymbol\mu_{k,0}) and \mathcal{L} = \sum_{n=1}^N \frac{M}{\lambda_{k,n}^\star} so that \mathcal{E} = MN\frac{\mathcal{S}}{\mathcal{L} + \mathcal{S}}, with
Differentiating the complete objective function yields
Consequently
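As a minimal numerical sketch of the two closed-form updates in this section (assuming that a null derivative of \mathcal{S} gives the precision-weighted mean \boldsymbol\mu_{k,0}^\star = (\sum_n \nu_{k,n}^\star\mathbf{W}_{k,n}^\star)^{-1}\sum_n \nu_{k,n}^\star\mathbf{W}_{k,n}^\star\boldsymbol\mu_{k,n}^\star; the function name and array layout are illustrative):

```python
import numpy as np

def update_mu0_lambda0(mu_star, lam_star, W_star, nu_star):
    """Closed-form updates of mu_{k,0} and lambda_{k,0} for one class k (sketch).

    mu_star : (N, M)    posterior means                 mu*_{k,n}
    lam_star: (N,)      posterior scales                lambda*_{k,n}
    W_star  : (N, M, M) posterior scale matrices        W*_{k,n}
    nu_star : (N,)      posterior degrees of freedom    nu*_{k,n}
    """
    N, M = mu_star.shape
    # mu_0 is a precision-weighted average of the subject-level means:
    # mu_0 = (sum_n nu* W*)^{-1} (sum_n nu* W* mu*)
    P = np.einsum('n,nij->ij', nu_star, W_star)
    b = np.einsum('n,nij,nj->i', nu_star, W_star, mu_star)
    mu0 = np.linalg.solve(P, b)
    # lambda_0 = N M / (L + S), with L = sum_n M / lambda*_n and
    # S = sum_n nu*_n (mu*_n - mu_0)^T W*_n (mu*_n - mu_0)
    L = np.sum(M / lam_star)
    d = mu_star - mu0
    S = np.einsum('n,ni,nij,nj->', nu_star, d, W_star, d)
    lam0 = N * M / (L + S)
    return mu0, lam0
```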
Keeping only terms that depend on \nu_{k,0}, we have:
with:
yielding:
Differentiating yields:
There is no closed-form solution, so we’ll instead use the gradient and Hessian of the objective function in a Gauss-Newton optimisation scheme:
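A minimal sketch of such a scheme for a scalar parameter like \nu_{k,0}; the gradient and Hessian callables are hypothetical placeholders (the model-specific expressions are the ones derived above), and the lower bound reflects the usual Wishart constraint \nu_{k,0} > M - 1:

```python
def newton_1d(x0, grad, hess, lower_bound, n_iter=50, tol=1e-9):
    """Safeguarded Newton-type iteration for a scalar parameter (sketch).

    x0 must be admissible (x0 > lower_bound). grad and hess are hypothetical
    callables returning the first and second derivatives of the objective.
    """
    x = x0
    for _ in range(n_iter):
        step = grad(x) / hess(x)       # Newton step towards a stationary point
        x_new = x - step
        while x_new <= lower_bound:    # halve the step if it leaves the admissible range
            step *= 0.5
            x_new = x - step
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```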
Keeping only terms that depend on \boldsymbol\Psi_0, we have:
Differentiating yields:
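For reference, assuming no additional prior on \boldsymbol\Psi_0 and the parameterisation \mathcal{W}(\mathbf{W}_{k,0}^{-1} \mid \boldsymbol\Psi_0, m_0), the stationary point would take the form (a sketch; the exact expression in the original article may differ):

\boldsymbol\Psi_0 = \frac{1}{K m_0}\sum_{k=1}^{K}\mathbb{E}_q\left[\mathbf{W}_{k,0}^{-1}\right].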
Keeping only terms that depend on m_0, we have:
Note: In order to find the global optimum at once, I first tried replacing \boldsymbol\Psi_0 with its optimal value (above), which depends on m_0. However, this breaks the objective function’s convexity, resulting in an ill-conditioned optimisation problem. It is thus preferable to alternate between optimising m_0 by Gauss-Newton and updating \boldsymbol\Psi_0 based on this new value.
Once again, we will resort to numerical optimisation:
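A sketch of the resulting alternating scheme, with hypothetical helpers standing in for the two updates (neither is specified here):

```python
def fit_m0_psi0(m0, Psi0, newton_step_m0, update_psi0, n_outer=20):
    """Alternate a Gauss-Newton step on m_0 (Psi_0 held fixed) with the
    update of Psi_0 given the new m_0, as suggested in the note above (sketch)."""
    for _ in range(n_outer):
        m0 = newton_step_m0(m0, Psi0)   # numerical step on m_0 with Psi_0 fixed
        Psi0 = update_psi0(m0)          # update Psi_0 from the new m_0
    return m0, Psi0
```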
Created by Yaël Balbastre on 11 April 2018. Last edited on 13 April 2018.