The parameters of a multivariate Gaussian distribution of dimension $K$ are $\boldsymbol\mu$, its mean, and $\boldsymbol\Sigma$, its covariance matrix. It can also be parameterised by $\boldsymbol\Lambda = \boldsymbol\Sigma^{-1}$, its precision matrix.
Let $(\mathbf{x}_n)_{n=1,\dots,N}$ be a set of $N$ observed realisations from a multivariate Normal distribution. The maximum-likelihood estimators of its parameters are:
| Estimator | Averaged form | Sum form |
| --- | --- | --- |
| $\hat{\boldsymbol\mu} \mid (\mathbf{x}_n)$ | $= \overline{\mathbf{x}}$ | $= \frac{1}{N}\sum_{n=1}^N \mathbf{x}_n$ |
| $\hat{\boldsymbol\Sigma} \mid (\mathbf{x}_n)$ | $= \overline{\mathbf{x}\mathbf{x}^{\mathrm{T}}} - \overline{\mathbf{x}}\,\overline{\mathbf{x}}^{\mathrm{T}}$ | $= \frac{1}{N}\sum_{n=1}^N (\mathbf{x}_n - \overline{\mathbf{x}})(\mathbf{x}_n - \overline{\mathbf{x}})^{\mathrm{T}}$ |
| $\hat{\boldsymbol\Sigma} \mid \boldsymbol\mu, (\mathbf{x}_n)$ | $= \overline{\mathbf{x}\mathbf{x}^{\mathrm{T}}} - \boldsymbol\mu\overline{\mathbf{x}}^{\mathrm{T}} - \overline{\mathbf{x}}\boldsymbol\mu^{\mathrm{T}} + \boldsymbol\mu\boldsymbol\mu^{\mathrm{T}}$ | $= \frac{1}{N}\sum_{n=1}^N (\mathbf{x}_n - \boldsymbol\mu)(\mathbf{x}_n - \boldsymbol\mu)^{\mathrm{T}}$ |
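These estimators can be checked numerically. A minimal sketch, assuming NumPy (the ground-truth values are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative ground truth (values are arbitrary).
N, K = 100_000, 3
mu_true = np.array([1.0, -2.0, 0.5])
A = rng.standard_normal((K, K))
Sigma_true = A @ A.T + K * np.eye(K)          # symmetric positive definite
x = rng.multivariate_normal(mu_true, Sigma_true, size=N)

# Mean estimate: the sample average x_bar.
mu_hat = x.mean(axis=0)

# Covariance estimate, averaged form: mean of x x^T minus x_bar x_bar^T.
Sigma_hat = (x.T @ x) / N - np.outer(mu_hat, mu_hat)

# Covariance estimate, sum-of-outer-products form; both forms agree.
d = x - mu_hat
Sigma_hat_sum = d.T @ d / N

# Covariance estimate when the mean mu is known.
e = x - mu_true
Sigma_hat_known_mu = e.T @ e / N
```

Both covariance forms coincide up to floating-point error, and all three estimates approach the ground truth as $N$ grows.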
We list here the distributions that can be used as conjugate priors for the parameters of a multivariate Normal distribution:
| Parameter | Conjugate prior | Notation |
| --- | --- | --- |
| $\boldsymbol\mu \mid \boldsymbol\Lambda$ | Multivariate Normal | $\mathcal{N}_{\boldsymbol\Lambda}$ |
| $\boldsymbol\Lambda \mid \boldsymbol\mu$ | Wishart | $\mathcal{W}_\mathcal{N}$ |
| $\boldsymbol\Sigma \mid \boldsymbol\mu$ | Inverse-Wishart | $\mathrm{Inv-}\mathcal{W}_\mathcal{N}$ |
| $\boldsymbol\mu, \boldsymbol\Lambda$ | Normal-Wishart | $\mathcal{NW}$ |
| $\boldsymbol\mu, \boldsymbol\Sigma$ | Normal-Inverse-Wishart | $\mathcal{N}\mathrm{Inv-}\mathcal{W}$ |
Update equations can be found in the Conjugate prior article.
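As an illustration of the simplest of these updates, here is a sketch of the conjugate update for $\boldsymbol\mu$ when the likelihood precision $\boldsymbol\Lambda$ is known, assuming NumPy (the function name is hypothetical):

```python
import numpy as np

def posterior_mu_known_precision(mu0, Lam0, Lam, x):
    """Conjugate update for mu when the likelihood precision Lam is known.

    Prior: mu ~ N(mu0, Lam0^{-1}).  The posterior is Gaussian with
    precision Lam_N = Lam0 + N * Lam and mean
    mu_N = Lam_N^{-1} (Lam0 @ mu0 + Lam @ sum_n x_n).
    """
    N = x.shape[0]
    Lam_N = Lam0 + N * Lam
    mu_N = np.linalg.solve(Lam_N, Lam0 @ mu0 + Lam @ x.sum(axis=0))
    return mu_N, Lam_N
```

With a vague prior ($\boldsymbol\Lambda_0 \to \mathbf{0}$), the posterior mean tends to the sample average, as expected.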
The KL-divergence can be written as
$$\mathrm{KL}\left(p \,\middle\|\, q\right) = H(p, q) - H(p),$$
where $H(p, q)$ is the cross-entropy and $H(p)$ the entropy. We have
$$H(p) = \frac{1}{2}\left(K\ln 2\pi + \ln\left|\boldsymbol\Sigma_p\right| + K\right),$$
$$H(p, q) = \frac{1}{2}\left(K\ln 2\pi + \ln\left|\boldsymbol\Sigma_q\right| + \mathrm{tr}\left(\boldsymbol\Sigma_q^{-1}\boldsymbol\Sigma_p\right) + \left(\boldsymbol\mu_q - \boldsymbol\mu_p\right)^{\mathrm{T}}\boldsymbol\Sigma_q^{-1}\left(\boldsymbol\mu_q - \boldsymbol\mu_p\right)\right).$$
Consequently,
$$\mathrm{KL}\left(p \,\middle\|\, q\right) = \frac{1}{2}\left(\mathrm{tr}\left(\boldsymbol\Sigma_q^{-1}\boldsymbol\Sigma_p\right) + \left(\boldsymbol\mu_q - \boldsymbol\mu_p\right)^{\mathrm{T}}\boldsymbol\Sigma_q^{-1}\left(\boldsymbol\mu_q - \boldsymbol\mu_p\right) - K + \ln\frac{\left|\boldsymbol\Sigma_q\right|}{\left|\boldsymbol\Sigma_p\right|}\right).$$
Or, if a parameterisation based on the precision matrix $\boldsymbol\Lambda = \boldsymbol\Sigma^{-1}$ is used,
$$\mathrm{KL}\left(p \,\middle\|\, q\right) = \frac{1}{2}\left(\mathrm{tr}\left(\boldsymbol\Lambda_q\boldsymbol\Lambda_p^{-1}\right) + \left(\boldsymbol\mu_q - \boldsymbol\mu_p\right)^{\mathrm{T}}\boldsymbol\Lambda_q\left(\boldsymbol\mu_q - \boldsymbol\mu_p\right) - K + \ln\frac{\left|\boldsymbol\Lambda_p\right|}{\left|\boldsymbol\Lambda_q\right|}\right).$$
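The covariance and precision parameterisations of the KL-divergence can be cross-checked numerically. A minimal sketch assuming NumPy (function names are illustrative):

```python
import numpy as np

def kl_mvn_cov(mu_p, Sig_p, mu_q, Sig_q):
    """KL(p || q) between two K-dimensional Gaussians, covariance form."""
    K = mu_p.shape[0]
    d = mu_q - mu_p
    Sq_inv = np.linalg.inv(Sig_q)
    return 0.5 * (np.trace(Sq_inv @ Sig_p) + d @ Sq_inv @ d - K
                  + np.log(np.linalg.det(Sig_q) / np.linalg.det(Sig_p)))

def kl_mvn_prec(mu_p, Lam_p, mu_q, Lam_q):
    """Same divergence, precision form (Lam = inverse of Sig)."""
    K = mu_p.shape[0]
    d = mu_q - mu_p
    return 0.5 * (np.trace(Lam_q @ np.linalg.inv(Lam_p)) + d @ Lam_q @ d - K
                  + np.log(np.linalg.det(Lam_p) / np.linalg.det(Lam_q)))
```

Both functions return the same value when fed consistent parameters, the divergence of a distribution from itself is zero, and the result is always non-negative.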
When the Normal distribution is used as a conjugate prior for the mean of another Normal distribution with known precision matrix $\boldsymbol{\Lambda}$, it makes sense to parameterise it in terms of its expected value, $\boldsymbol{\mu}_0$, and degrees of freedom, $n_0$:
$$p(\boldsymbol\mu) = \mathcal{N}\left(\boldsymbol\mu \,\middle|\, \boldsymbol\mu_0, \left(n_0\boldsymbol\Lambda\right)^{-1}\right).$$
The KL-divergence between two such distributions, $p = \mathcal{N}\left(\boldsymbol\mu_p, (n_p\boldsymbol\Lambda)^{-1}\right)$ and $q = \mathcal{N}\left(\boldsymbol\mu_q, (n_q\boldsymbol\Lambda)^{-1}\right)$, can be written as
$$\mathrm{KL}\left(p \,\middle\|\, q\right) = \frac{1}{2}\left(K\frac{n_q}{n_p} + n_q\left(\boldsymbol\mu_q - \boldsymbol\mu_p\right)^{\mathrm{T}}\boldsymbol\Lambda\left(\boldsymbol\mu_q - \boldsymbol\mu_p\right) - K + K\ln\frac{n_p}{n_q}\right).$$
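This degrees-of-freedom special case follows from the general Gaussian KL-divergence by substituting $\boldsymbol\Sigma = (n\boldsymbol\Lambda)^{-1}$, and can be verified numerically. A sketch assuming NumPy (the function name and test values are illustrative):

```python
import numpy as np

def kl_dof(mu_p, n_p, mu_q, n_q, Lam):
    """KL(p || q) for p = N(mu_p, (n_p Lam)^{-1}), q = N(mu_q, (n_q Lam)^{-1}).

    The trace term collapses to K * n_q / n_p and the log-determinant
    ratio to K * log(n_p / n_q), since the two covariances share Lam.
    """
    K = mu_p.shape[0]
    d = mu_q - mu_p
    return 0.5 * (K * n_q / n_p + n_q * (d @ Lam @ d) - K
                  + K * np.log(n_p / n_q))
```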
Created by Yaël Balbastre on 6 April 2018. Last edited on 9 April 2018.