The Wishart distribution of dimension $K$ is defined over $K \times K$ positive definite matrices. Its parameters are $\mathbf{V}$, its scale matrix, and $\nu > K - 1$, its degrees of freedom.
where $\Gamma_K$ is the multivariate gamma function[wiki], $\psi_K$ is the multivariate digamma function[wiki] and $\psi_1$ is the trigamma function[wiki].
This distribution has a mode only if $\nu \geqslant K + 1$: $\mathrm{mode}(\mathbf{X}) = (\nu - K - 1)\mathbf{V}$.
Let $(\mathbf{A}_n)$ be a set of observed realisations from a Wishart distribution.
$\hat{\mathbf{V}} \mid (\mathbf{A}_n), \nu$ | $= \frac{1}{\nu}\overline{\mathbf{A}}$ |
$\hat{\nu} \mid (\mathbf{A}_n)$ | solution of: $K \ln \hat{\nu} - \psi_K\left(\frac{\hat{\nu}}{2}\right) = K \ln 2 + \ln\det\overline{\mathbf{A}} - \overline{\ln \det \mathbf{A}}$ |
$\hat{\mathbf{V}} \mid (\mathbf{A}_n)$ | $= \hat{\mathbf{V}} \mid (\mathbf{A}_n), \hat{\nu}$ |
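where $\overline{\mathbf{A}} = \frac{1}{N}\sum_{n=1}^{N}\mathbf{A}_n$ and $\overline{\ln \det \mathbf{A}} = \frac{1}{N}\sum_{n=1}^{N}\ln\det\mathbf{A}_n$ are averages over the $N$ observations.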
There is no closed form solution for $\hat{\nu}$, but an approximate solution can be found by numerical optimisation.
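As an illustration, the equation for $\hat{\nu}$ can be solved by one-dimensional root finding. The sketch below is one possible approach, not necessarily the one intended by this article: it uses plain bisection, relying on the fact that $K \ln \nu - \psi_K(\nu/2)$ decreases monotonically from $+\infty$ (at $\nu \to K - 1$) to $K \ln 2$ (at $\nu \to \infty$). The names `digamma`, `multivariate_digamma`, `solve_dof` and `gap` are hypothetical, and $\psi_K(x) = \sum_{i=1}^{K} \psi\left(x + \frac{1 - i}{2}\right)$ is assumed as the definition of the multivariate digamma function.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (recurrence, then asymptotic series)."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    while x < 10.0:  # shift upwards using psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # Truncated asymptotic expansion, accurate to about 1e-10 for x >= 10
    result += math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result


def multivariate_digamma(x, K):
    """psi_K(x) = sum_{i=1..K} psi(x + (1 - i) / 2), for x > (K - 1) / 2."""
    return sum(digamma(x + (1 - i) / 2) for i in range(1, K + 1))


def solve_dof(K, gap, tol=1e-10):
    """Solve K ln(nu) - psi_K(nu / 2) = K ln 2 + gap for nu > K - 1.

    `gap` stands for ln det(mean A) - mean(ln det A); it is strictly
    positive (Jensen's inequality) unless all observations are equal.
    """
    if gap <= 0:
        raise ValueError("gap must be strictly positive")
    target = K * math.log(2) + gap

    def f(nu):
        return K * math.log(nu) - multivariate_digamma(nu / 2, K) - target

    lo = K - 1 + 1e-12  # just inside the domain, where f is very large
    hi = K + 1.0
    while f(hi) > 0:  # f tends to -gap < 0, so the bracket is eventually found
        hi *= 2.0
    while hi - lo > tol * hi:  # bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In practice `gap` would be computed from the observed matrices; $\hat{\mathbf{V}}$ then follows from $\hat{\nu}$ as $\frac{1}{\hat{\nu}}\overline{\mathbf{A}}$. A library root finder or a Newton iteration using the trigamma function would converge faster; bisection is used here only because it needs no derivatives.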
I need to check my math for $\nu$
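One way to check the condition on $\hat{\nu}$ is to re-derive it from the log-likelihood. The sketch below assumes the usual Wishart density, $p(\mathbf{A} \mid \mathbf{V}, \nu) = \det(\mathbf{A})^{\frac{\nu - K - 1}{2}} \exp\left(-\frac{1}{2}\mathrm{tr}(\mathbf{V}^{-1}\mathbf{A})\right) / \left(2^{\frac{\nu K}{2}} \det(\mathbf{V})^{\frac{\nu}{2}} \Gamma_K\left(\frac{\nu}{2}\right)\right)$, and that $\psi_K$ is the derivative of $\ln \Gamma_K$. The average log-likelihood over the observations is

$$
\bar{\ell}(\mathbf{V}, \nu) = \frac{\nu - K - 1}{2}\,\overline{\ln \det \mathbf{A}} - \frac{1}{2}\mathrm{tr}\left(\mathbf{V}^{-1}\overline{\mathbf{A}}\right) - \frac{\nu K}{2}\ln 2 - \frac{\nu}{2}\ln\det\mathbf{V} - \ln\Gamma_K\left(\frac{\nu}{2}\right).
$$

Substituting $\hat{\mathbf{V}} = \frac{1}{\nu}\overline{\mathbf{A}}$ gives $\mathrm{tr}(\hat{\mathbf{V}}^{-1}\overline{\mathbf{A}}) = \nu K$ and $\ln\det\hat{\mathbf{V}} = \ln\det\overline{\mathbf{A}} - K\ln\nu$, so that

$$
\frac{\mathrm{d}\bar{\ell}}{\mathrm{d}\nu} = \frac{1}{2}\,\overline{\ln \det \mathbf{A}} - \frac{K}{2}\ln 2 - \frac{1}{2}\ln\det\overline{\mathbf{A}} + \frac{K}{2}\ln\nu - \frac{1}{2}\psi_K\left(\frac{\nu}{2}\right),
$$

where the $-\frac{K}{2}$ from the trace term cancels the $+\frac{K}{2}$ from differentiating $\frac{\nu K}{2}\ln\nu$. Setting this derivative to zero and multiplying by 2 yields $K \ln \hat{\nu} - \psi_K\left(\frac{\hat{\nu}}{2}\right) = K \ln 2 + \ln\det\overline{\mathbf{A}} - \overline{\ln \det \mathbf{A}}$, which agrees with the equation stated above under these assumptions.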
We list here the distributions that can be used as conjugate priors for the parameters of a Wishart distribution:
$\mathbf{V} \mid \nu$ | Inverse-Wishart | $\mathcal{W}^{-1}$ |
Update equations can be found in the Conjugate prior article.
The KL-divergence can be written as
where $H$ is the cross-entropy. We have
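For reference, here is a sketch of the closed form, assuming the usual Wishart density and writing $p = \mathcal{W}(\mathbf{V}_0, \nu_0)$ and $q = \mathcal{W}(\mathbf{V}_1, \nu_1)$ (this indexing is introduced for illustration and may differ from the article's original notation). Using $\mathbb{E}_p[\mathbf{X}] = \nu_0\mathbf{V}_0$ and $\mathbb{E}_p[\ln\det\mathbf{X}] = \psi_K\left(\frac{\nu_0}{2}\right) + K\ln 2 + \ln\det\mathbf{V}_0$, the divergence works out to

$$
\mathrm{KL}(p \,\|\, q) = \frac{\nu_0}{2}\left(\mathrm{tr}\left(\mathbf{V}_1^{-1}\mathbf{V}_0\right) - K\right) - \frac{\nu_1}{2}\ln\det\left(\mathbf{V}_1^{-1}\mathbf{V}_0\right) + \ln\Gamma_K\left(\frac{\nu_1}{2}\right) - \ln\Gamma_K\left(\frac{\nu_0}{2}\right) + \frac{\nu_0 - \nu_1}{2}\,\psi_K\left(\frac{\nu_0}{2}\right).
$$

As a sanity check, every term vanishes when $\mathbf{V}_0 = \mathbf{V}_1$ and $\nu_0 = \nu_1$.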
Specialisation | |
Exponential | $\mathrm{Exp}(\lambda) = \mathcal{W}\left(\frac{1}{2\lambda}, 2\right)$ |
Chi-squared | $\chi^2(\nu) = \mathcal{W}\left(1, \nu\right)$ |
Gamma | $\mathcal{G}(\alpha, \beta) = \mathcal{W}\left(\frac{1}{2\beta}, 2\alpha\right)$ |
Power | |
Inverse-Wishart | $\mathbf{X} \sim \mathcal{W}(\mathbf{V}, \nu) \Rightarrow \mathbf{X}^{-1} \sim \mathcal{W}^{-1}\left(\mathbf{V}^{-1}, \nu\right)$ |
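The one-dimensional specialisations in the table can be checked numerically by comparing log-densities. The sketch below assumes the usual Wishart density restricted to $K = 1$ and a shape-rate parameterisation of the Gamma distribution (the identity $\mathcal{G}(\alpha, \beta) = \mathcal{W}\left(\frac{1}{2\beta}, 2\alpha\right)$ only holds if $\beta$ is a rate). The function names are hypothetical.

```python
import math


def wishart_logpdf_1d(x, v, nu):
    """Log-density at x > 0 of a 1x1 Wishart with scale v and nu degrees of freedom."""
    return ((nu - 2) / 2) * math.log(x) - x / (2 * v) \
        - (nu / 2) * math.log(2 * v) - math.lgamma(nu / 2)


def gamma_logpdf(x, alpha, beta):
    """Log-density at x > 0 of a Gamma with shape alpha and rate beta."""
    return alpha * math.log(beta) + (alpha - 1) * math.log(x) \
        - beta * x - math.lgamma(alpha)


def max_abs_difference(alpha, beta, points=(0.1, 1.0, 3.2)):
    """Largest gap between Gamma(alpha, beta) and W(1/(2 beta), 2 alpha) log-densities."""
    return max(
        abs(gamma_logpdf(x, alpha, beta)
            - wishart_logpdf_1d(x, 1 / (2 * beta), 2 * alpha))
        for x in points
    )
```

Calling `max_abs_difference(2.5, 1.7)` should return a value at the level of floating-point rounding. The Exponential and Chi-squared rows follow from the same identity with $(\alpha, \beta) = (1, \lambda)$ and $(\alpha, \beta) = \left(\frac{\nu}{2}, \frac{1}{2}\right)$ respectively.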
Another parameterisation, which may feel more natural when using the Wishart distribution as a prior for the precision matrix of a multivariate Gaussian distribution, uses the expected matrix $\boldsymbol\Lambda = \mathbb{E}[\mathbf{X}] = \nu\mathbf{V}$ instead of the scale matrix:
This distribution has a mode only if $\nu \geqslant K + 1$: $\mathrm{mode}(\mathbf{X}) = \frac{\nu - K - 1}{\nu}\boldsymbol\Lambda$.
Let $(\mathbf{A}_n)$ be a set of observed realisations from a Wishart distribution.
$\hat{\boldsymbol\Lambda} \mid (\mathbf{A}_n)$ | $= \overline{\mathbf{A}}$ |
$\hat{\nu} \mid (\mathbf{A}_n)$ | solution of: $K \ln \hat{\nu} - \psi_K\left(\frac{\hat{\nu}}{2}\right) = K \ln 2 + \ln\det\overline{\mathbf{A}} - \overline{\ln \det \mathbf{A}}$ |
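where $\overline{\mathbf{A}} = \frac{1}{N}\sum_{n=1}^{N}\mathbf{A}_n$ and $\overline{\ln \det \mathbf{A}} = \frac{1}{N}\sum_{n=1}^{N}\ln\det\mathbf{A}_n$ are averages over the $N$ observations.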
There is no closed form solution for $\hat{\nu}$, but an approximate solution can be found by numerical optimisation.
I need to check my math for $\nu$
The KL divergence becomes
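As a sketch, assuming the expected matrix is $\boldsymbol\Lambda = \nu\mathbf{V}$ and writing $p = \mathcal{W}(\boldsymbol\Lambda_0, \nu_0)$ and $q = \mathcal{W}(\boldsymbol\Lambda_1, \nu_1)$ in this parameterisation (indexing introduced for illustration), the substitutions $\mathrm{tr}\left(\mathbf{V}_1^{-1}\mathbf{V}_0\right) = \frac{\nu_1}{\nu_0}\mathrm{tr}\left(\boldsymbol\Lambda_1^{-1}\boldsymbol\Lambda_0\right)$ and $\ln\det\left(\mathbf{V}_1^{-1}\mathbf{V}_0\right) = \ln\det\left(\boldsymbol\Lambda_1^{-1}\boldsymbol\Lambda_0\right) + K\ln\frac{\nu_1}{\nu_0}$ in the scale-matrix form of the divergence give

$$
\mathrm{KL}(p \,\|\, q) = \frac{\nu_1}{2}\mathrm{tr}\left(\boldsymbol\Lambda_1^{-1}\boldsymbol\Lambda_0\right) - \frac{\nu_0 K}{2} - \frac{\nu_1}{2}\ln\det\left(\boldsymbol\Lambda_1^{-1}\boldsymbol\Lambda_0\right) - \frac{\nu_1 K}{2}\ln\frac{\nu_1}{\nu_0} + \ln\Gamma_K\left(\frac{\nu_1}{2}\right) - \ln\Gamma_K\left(\frac{\nu_0}{2}\right) + \frac{\nu_0 - \nu_1}{2}\,\psi_K\left(\frac{\nu_0}{2}\right).
$$

The expression vanishes when $\boldsymbol\Lambda_0 = \boldsymbol\Lambda_1$ and $\nu_0 = \nu_1$, as expected.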
Created by Yaël Balbastre on 10 April 2018. Last edited on 10 April 2018.