Multivariate Normal

Usual parameterisation

Probability distribution function

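The parameters of a multivariate Gaussian distribution of dimension $K$ are $\boldsymbol\mu$, its mean, and $\boldsymbol\Sigma$, its covariance matrix. It can also be parameterised by $\boldsymbol\Lambda = \boldsymbol\Sigma^{-1}$, its precision matrix. The corresponding density is

$\mathcal{N}(\mathbf{x} \mid \boldsymbol\mu, \boldsymbol\Sigma)$ $= \frac{1}{\sqrt{|2\pi\boldsymbol\Sigma|}} \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol\mu)^{\mathrm{T}}\boldsymbol\Sigma^{-1}(\mathbf{x} - \boldsymbol\mu)\right)$ $= \sqrt{\frac{|\boldsymbol\Lambda|}{(2\pi)^K}} \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol\mu)^{\mathrm{T}}\boldsymbol\Lambda(\mathbf{x} - \boldsymbol\mu)\right)$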
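
As an illustration, the log-density can be evaluated in either parameterisation. The following is a minimal NumPy sketch, not a reference implementation; the function names `mvn_logpdf` and `mvn_logpdf_precision` are hypothetical, and both functions assume a single point `x` of shape `(K,)` and a symmetric positive-definite matrix.

```python
import numpy as np


def mvn_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma) at x (covariance parameterisation)."""
    k = mu.shape[0]
    diff = x - mu
    # Cholesky factor: sigma = L @ L.T, so ln|sigma| = 2 * sum(ln diag(L)).
    chol = np.linalg.cholesky(sigma)
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    # Solve L z = diff, so that z @ z = diff^T sigma^{-1} diff.
    z = np.linalg.solve(chol, diff)
    return -0.5 * (k * np.log(2.0 * np.pi) + logdet + z @ z)


def mvn_logpdf_precision(x, mu, lam):
    """Log-density of N(mu, lam^{-1}) at x (precision parameterisation)."""
    k = mu.shape[0]
    diff = x - mu
    # slogdet returns (sign, ln|lam|); the sign is +1 for a valid precision.
    _, logdet = np.linalg.slogdet(lam)
    return -0.5 * (k * np.log(2.0 * np.pi) - logdet + diff @ lam @ diff)
```

The precision form avoids any matrix inversion or linear solve, which is one reason it is convenient when $\boldsymbol\Lambda$ is the quantity being stored or estimated.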

Maximum likelihood estimators

Let $(\mathbf{x}_n)$, $n = 1, \dots, N$, be a set of independent observed realisations from a multivariate Normal distribution.

$\hat{\boldsymbol\mu} \mid (\mathbf{x}_n)$ $= \overline{\mathbf{x}}$ $= \frac{1}{N}\sum_{n=1}^N \mathbf{x}_n$
$\hat{\boldsymbol\Sigma} \mid (\mathbf{x}_n)$ $= \overline{\mathbf{x}\mathbf{x}^{\mathrm{T}}} - \overline{\mathbf{x}}\overline{\mathbf{x}}^{\mathrm{T}}$ $= \frac{1}{N}\sum_{n=1}^N (\mathbf{x}_n - \overline{\mathbf{x}})(\mathbf{x}_n - \overline{\mathbf{x}})^{\mathrm{T}}$
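$\hat{\boldsymbol\Sigma} \mid \boldsymbol\mu, (\mathbf{x}_n)$ $= \overline{\mathbf{x}\mathbf{x}^{\mathrm{T}}} - \overline{\mathbf{x}}\boldsymbol\mu^{\mathrm{T}} - \boldsymbol\mu\overline{\mathbf{x}}^{\mathrm{T}} + \boldsymbol\mu\boldsymbol\mu^{\mathrm{T}}$ $= \frac{1}{N}\sum_{n=1}^N (\mathbf{x}_n - \boldsymbol\mu)(\mathbf{x}_n - \boldsymbol\mu)^{\mathrm{T}}$

For $\hat{\boldsymbol\Sigma} \mid \boldsymbol\mu, (\mathbf{x}_n)$, expanding the product inside the sum gives the moment form; the two cross terms $\overline{\mathbf{x}}\boldsymbol\mu^{\mathrm{T}}$ and $\boldsymbol\mu\overline{\mathbf{x}}^{\mathrm{T}}$ are transposes of each other, which keeps the estimate symmetric.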
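
The estimators above can be checked numerically. The sketch below is illustrative only: the helper names (`mle_mean_cov`, `mle_cov_known_mean`, `second_moment`) are hypothetical, and the data are assumed to be stored as an `(N, K)` array with one realisation per row.

```python
import numpy as np


def mle_mean_cov(x):
    """ML estimates of the mean and covariance from an (N, K) array."""
    n = x.shape[0]
    mean = x.mean(axis=0)
    centred = x - mean
    # The ML estimate divides by N, not by N - 1.
    cov = centred.T @ centred / n
    return mean, cov


def mle_cov_known_mean(x, mu):
    """ML estimate of the covariance when the mean mu is known."""
    centred = x - mu
    return centred.T @ centred / x.shape[0]


def second_moment(x):
    """Empirical second moment, i.e. the average of x_n x_n^T."""
    return x.T @ x / x.shape[0]


rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
mu = np.array([0.5, -1.0, 2.0])  # arbitrary "known" mean for the check

xbar, cov = mle_mean_cov(x)
xx = second_moment(x)

# Moment form with unknown mean.
print(np.allclose(cov, xx - np.outer(xbar, xbar)))
# Moment form with known mean: both cross terms are needed.
moment_form = xx - np.outer(xbar, mu) - np.outer(mu, xbar) + np.outer(mu, mu)
print(np.allclose(mle_cov_known_mean(x, mu), moment_form))
```

Both checks compare the sum form with the moment form, so they should print `True` for any data set.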

Conjugate prior

We list here the distributions that can be used as conjugate priors for the parameters of a multivariate Normal distribution:

$\boldsymbol\mu \mid \boldsymbol\Lambda$ Multivariate Normal $\mathcal{N}_{\boldsymbol\Lambda}$
$\boldsymbol\Lambda \mid \boldsymbol\mu$ Wishart $\mathcal{W}_\mathcal{N}$
$\boldsymbol\Sigma \mid \boldsymbol\mu$ Inverse-Wishart $\mathrm{Inv-}\mathcal{W}_\mathcal{N}$
$\boldsymbol\mu, \boldsymbol\Lambda$ Normal-Wishart $\mathcal{NW}$
$\boldsymbol\mu, \boldsymbol\Sigma$ Normal-Inverse-Wishart $\mathcal{N}\mathrm{Inv-}\mathcal{W}$

Update equations can be found in the Conjugate prior article.

Kullback-Leibler divergence

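The KL-divergence between two $K$-dimensional Normal distributions $p = \mathcal{N}(\boldsymbol\mu_p, \boldsymbol\Sigma_p)$ and $q = \mathcal{N}(\boldsymbol\mu_q, \boldsymbol\Sigma_q)$ can be written as

$\mathrm{KL}(p \| q) = H(p, q) - H(p, p)$

where $H$ is the cross-entropy, $H(p, q) = -\mathbb{E}_p\left[\ln q(\mathbf{x})\right]$, so that $H(p, p)$ is the entropy of $p$. We have

$H(p, q) = \frac{1}{2}\left[K \ln 2\pi + \ln|\boldsymbol\Sigma_q| + \mathrm{Tr}\left(\boldsymbol\Sigma_q^{-1}\boldsymbol\Sigma_p\right) + (\boldsymbol\mu_p - \boldsymbol\mu_q)^{\mathrm{T}}\boldsymbol\Sigma_q^{-1}(\boldsymbol\mu_p - \boldsymbol\mu_q)\right]$

Consequently

$\mathrm{KL}(p \| q) = \frac{1}{2}\left[\ln\frac{|\boldsymbol\Sigma_q|}{|\boldsymbol\Sigma_p|} - K + \mathrm{Tr}\left(\boldsymbol\Sigma_q^{-1}\boldsymbol\Sigma_p\right) + (\boldsymbol\mu_p - \boldsymbol\mu_q)^{\mathrm{T}}\boldsymbol\Sigma_q^{-1}(\boldsymbol\mu_p - \boldsymbol\mu_q)\right]$

Or, if a parameterisation based on the precision matrix is used,

$\mathrm{KL}(p \| q) = \frac{1}{2}\left[\ln\frac{|\boldsymbol\Lambda_p|}{|\boldsymbol\Lambda_q|} - K + \mathrm{Tr}\left(\boldsymbol\Lambda_q\boldsymbol\Lambda_p^{-1}\right) + (\boldsymbol\mu_p - \boldsymbol\mu_q)^{\mathrm{T}}\boldsymbol\Lambda_q(\boldsymbol\mu_p - \boldsymbol\mu_q)\right]$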
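
For illustration, the KL-divergence between two multivariate Normal distributions can be evaluated with a few lines of NumPy. This is a sketch under stated assumptions: it uses the standard closed form in the covariance parameterisation (written out in the comments), the function name `kl_mvn` and the argument names are hypothetical, and both covariance matrices are assumed to be symmetric positive-definite.

```python
import numpy as np


def kl_mvn(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) for p = N(mu_p, sigma_p) and q = N(mu_q, sigma_q).

    Closed form assumed here:
    0.5 * [ln|sigma_q| - ln|sigma_p| - K
           + Tr(sigma_q^{-1} sigma_p)
           + (mu_p - mu_q)^T sigma_q^{-1} (mu_p - mu_q)]
    """
    k = mu_p.shape[0]
    diff = mu_p - mu_q
    # Log-determinants computed stably; signs are +1 for valid covariances.
    _, logdet_p = np.linalg.slogdet(sigma_p)
    _, logdet_q = np.linalg.slogdet(sigma_q)
    # Linear solves instead of an explicit inverse of sigma_q.
    trace_term = np.trace(np.linalg.solve(sigma_q, sigma_p))
    quad_term = diff @ np.linalg.solve(sigma_q, diff)
    return 0.5 * (logdet_q - logdet_p - k + trace_term + quad_term)
```

A quick sanity check is that `kl_mvn(mu, sigma, mu, sigma)` is zero up to rounding error, and that swapping the two distributions generally changes the value, since the divergence is not symmetric.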

“Normal mean conjugate” parameterisation

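When the Normal distribution is used as a conjugate prior for the mean of another Normal distribution with known precision matrix $\boldsymbol{\Lambda}$, it makes sense to parameterise it in terms of its expected value, $\boldsymbol{\mu}_0$, and degrees of freedom, $n_0$:

$\mathcal{N}_{\boldsymbol\Lambda}(\boldsymbol\mu \mid \boldsymbol\mu_0, n_0) = \mathcal{N}\left(\boldsymbol\mu \mid \boldsymbol\mu_0, (n_0\boldsymbol\Lambda)^{-1}\right)$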

Kullback-Leibler divergence

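For two such distributions $p$ and $q$ with means $\boldsymbol\mu_p$, $\boldsymbol\mu_q$ and covariance matrices $(n_p\boldsymbol\Lambda)^{-1}$, $(n_q\boldsymbol\Lambda)^{-1}$, assuming they share the same $\boldsymbol\Lambda$, the KL-divergence can be written as

$\mathrm{KL}(p \| q) = \frac{1}{2}\left[K\left(\ln\frac{n_p}{n_q} + \frac{n_q}{n_p} - 1\right) + n_q(\boldsymbol\mu_p - \boldsymbol\mu_q)^{\mathrm{T}}\boldsymbol\Lambda(\boldsymbol\mu_p - \boldsymbol\mu_q)\right]$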


Created by Yaël Balbastre on 6 April 2018. Last edited on 9 April 2018.