When working with multivariate distributions, it is customary to represent them as functions of vectors and matrices. The density of the \(p\)-dimensional multivariate normal distribution, for instance, is defined as
\[ p(\boldsymbol{y}\mid \boldsymbol\mu, \boldsymbol\Sigma) = \left(2\pi\right)^{-\frac{p}{2}}\left|\boldsymbol\Sigma\right|^{-\frac{1}{2}}\exp\left\{-\frac{1}{2}\left(\boldsymbol y - \boldsymbol\mu\right)^\prime\boldsymbol\Sigma^{-1}\left(\boldsymbol y - \boldsymbol\mu\right)\right\} \]
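As a quick sanity check, here is a minimal sketch, assuming NumPy and SciPy are available, that evaluates this density directly from the formula and compares the result against `scipy.stats.multivariate_normal` (all values below are arbitrary illustrations):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_density(y, mu, Sigma):
    """Evaluate the p-variate normal density from the formula above."""
    p = len(mu)
    diff = y - mu
    quad = diff @ np.linalg.solve(Sigma, diff)  # (y - mu)' Sigma^{-1} (y - mu)
    norm_const = (2 * np.pi) ** (-p / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm_const * np.exp(-0.5 * quad)

# Hypothetical example values
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
y = np.array([0.3, 0.8])

print(mvn_density(y, mu, Sigma))
print(multivariate_normal(mean=mu, cov=Sigma).pdf(y))  # should agree
```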
where data \(\boldsymbol y\) and mean \(\boldsymbol \mu\) are \(p\times 1\) vectors, and \(\boldsymbol \Sigma\) is a \(p\times p\) covariance matrix. Consider the case in which we have \(N\) observations from a multivariate normal with known covariance, and we wish to find the posterior distribution for the mean vector \(\boldsymbol\mu\). This case is analogous to the univariate case above, and, as there, we will use a multivariate Normal prior on \(\boldsymbol\mu\). Our model is:
\[ \begin{eqnarray}\boldsymbol y_i&\stackrel{iid}{\sim}&\mbox{Multivariate Normal}_p(\boldsymbol\mu,\boldsymbol\Sigma)\\\boldsymbol\mu&\sim&\mbox{Multivariate Normal}_p(\boldsymbol\mu_0,\boldsymbol V)\end{eqnarray} \]
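To make the generative model concrete, it can be simulated in a few lines; this is only a sketch, with hypothetical dimensions and parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
p, N = 2, 50
mu0 = np.zeros(p)                       # prior mean
V = np.eye(p)                           # prior covariance
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])          # known data covariance

mu = rng.multivariate_normal(mu0, V)        # draw the mean vector from its prior
Y = rng.multivariate_normal(mu, Sigma, N)   # N iid observations, one per row
```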
In the previous section, we detailed the preliminary algebra involved in deriving the posterior distribution. Since the algebra is analogous in the multivariate case, there is no reason to reproduce it here. After the preliminary algebra, dropping terms that do not involve \(\boldsymbol\mu\), the terms inside the exponent are
\[ -\frac{1}{2}\left(\boldsymbol\mu^\prime\left(N\boldsymbol\Sigma^{-1} + \boldsymbol V^{-1}\right)\boldsymbol\mu - \boldsymbol\mu^\prime\left(N\boldsymbol\Sigma^{-1}\bar{\boldsymbol y} + \boldsymbol V^{-1}\boldsymbol\mu_0\right) - \left(N\boldsymbol\Sigma^{-1}\bar{\boldsymbol y} + \boldsymbol V^{-1}\boldsymbol\mu_0\right)^\prime\boldsymbol\mu\right) \]
where \(\bar{\boldsymbol y} = \frac{1}{N}\sum_{i=1}^N\boldsymbol y_i\). At this point, we would like to complete the square in a fashion analogous to the way we completed the square in the univariate case. To facilitate this, we introduce analogous notation. Let \(\boldsymbol A = N\boldsymbol\Sigma^{-1} + \boldsymbol V^{-1}\), and let \(\boldsymbol b = N\boldsymbol\Sigma^{-1}\bar{\boldsymbol y} + \boldsymbol V^{-1}\boldsymbol\mu_0\). With this notation, the previous expression is equal to
\[ -\frac{1}{2}\left(\boldsymbol\mu^\prime\boldsymbol A\boldsymbol\mu - \boldsymbol\mu^\prime\boldsymbol b - \boldsymbol b^\prime\boldsymbol\mu\right). \]
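It is easy to check this bookkeeping numerically: the exponent of the full likelihood \(\times\) prior kernel and the compact form above should differ only by a quantity that does not involve \(\boldsymbol\mu\). A quick sketch, with all values arbitrary and assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
p, N = 2, 20
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])     # known data covariance
V = 2.0 * np.eye(p)                # prior covariance
mu0 = rng.normal(size=p)           # prior mean
Y = rng.normal(size=(N, p))        # arbitrary data; only the algebra matters here
ybar = Y.mean(axis=0)

Si, Vi = np.linalg.inv(Sigma), np.linalg.inv(V)
A = N * Si + Vi
b = N * Si @ ybar + Vi @ mu0

def full_exponent(mu):
    """Exponent of likelihood times prior, constants included."""
    lik = sum(-0.5 * (y - mu) @ Si @ (y - mu) for y in Y)
    pri = -0.5 * (mu - mu0) @ Vi @ (mu - mu0)
    return lik + pri

def compact(mu):
    """The expression above, written with A and b."""
    return -0.5 * (mu @ A @ mu - mu @ b - b @ mu)

# The difference should be the same constant for every mu:
for _ in range(3):
    mu = rng.normal(size=p)
    print(full_exponent(mu) - compact(mu))
```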
In the univariate case, our next two steps were to bring \(a\) (analogous to \(\boldsymbol A\) here) outside the parentheses, and then to introduce new terms to complete the square. However, we cannot take the first of these steps here: \(\boldsymbol A\) sits between the two vector terms \(\boldsymbol\mu^\prime\) and \(\boldsymbol\mu\), and because matrix multiplication does not commute, we cannot simply multiply through to move \(\boldsymbol A\) out of the way. We can, though, introduce a term that is not a function of \(\boldsymbol\mu\) to complete the square; because the new term does not involve \(\boldsymbol\mu\), it changes the expression only by a constant that the proportionality will absorb:
\[ -\frac{1}{2}\left(\boldsymbol\mu^\prime\boldsymbol A\boldsymbol\mu - \boldsymbol\mu^\prime\boldsymbol b - \boldsymbol b^\prime\boldsymbol\mu + \boldsymbol b^\prime\boldsymbol A^{-1}\boldsymbol b\right). \]
To deal with the difficulty posed by \(\boldsymbol A\), we note two things: first, \(\boldsymbol A\) is symmetric and invertible, because it is the weighted sum of two symmetric, full-rank precision (inverse covariance) matrices; and second, the identity matrix satisfies \(\boldsymbol I = \boldsymbol A^{-1}\boldsymbol A = \boldsymbol A\boldsymbol A^{-1}\). Since multiplying by \(\boldsymbol I\) does not change a matrix expression, provided \(\boldsymbol I\) has the proper dimensions, we can rewrite our expression above:
\[ -\frac{1}{2}\left(\boldsymbol\mu^\prime\boldsymbol A\boldsymbol\mu - \boldsymbol\mu^\prime\boldsymbol A\boldsymbol A^{-1}\boldsymbol b - \boldsymbol b^\prime\boldsymbol A^{-1}\boldsymbol A\boldsymbol\mu + \boldsymbol b^\prime\boldsymbol A^{-1}\boldsymbol A\boldsymbol A^{-1}\boldsymbol b\right). \]
We can now introduce new notation, analogous to the notation at the end of the univariate completing-the-square section above. Let \(\boldsymbol\Sigma_n = \boldsymbol A^{-1}\) and \(\boldsymbol\mu_n = \boldsymbol A^{-1}\boldsymbol b\). Then the expression above becomes
\[ -\frac{1}{2}\left(\boldsymbol\mu^\prime\boldsymbol \Sigma_n^{-1}\boldsymbol\mu - \boldsymbol\mu^\prime\boldsymbol \Sigma_n^{-1}\boldsymbol \mu_n - \boldsymbol \mu_n^\prime\boldsymbol \Sigma_n^{-1}\boldsymbol\mu + \boldsymbol \mu_n^\prime\boldsymbol \Sigma_n^{-1}\boldsymbol \mu_n\right). \]
It is now obvious that we can factor the above expression into
\[ -\frac{1}{2}\left(\boldsymbol\mu - \boldsymbol\mu_n\right)^\prime\boldsymbol\Sigma_n^{-1}\left(\boldsymbol\mu - \boldsymbol\mu_n\right). \]
If you don’t see why, expanding the factored form term by term recovers exactly the four terms above:
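\[ \left(\boldsymbol\mu - \boldsymbol\mu_n\right)^\prime\boldsymbol\Sigma_n^{-1}\left(\boldsymbol\mu - \boldsymbol\mu_n\right) = \boldsymbol\mu^\prime\boldsymbol \Sigma_n^{-1}\boldsymbol\mu - \boldsymbol\mu^\prime\boldsymbol \Sigma_n^{-1}\boldsymbol \mu_n - \boldsymbol \mu_n^\prime\boldsymbol \Sigma_n^{-1}\boldsymbol\mu + \boldsymbol \mu_n^\prime\boldsymbol \Sigma_n^{-1}\boldsymbol \mu_n. \]
Substituting the factored form back into the exponential function, we see that the posterior distribution is proportional to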
\[ \exp\left\{-\frac{1}{2}\left(\boldsymbol\mu - \boldsymbol\mu_n\right)^\prime\boldsymbol\Sigma_n^{-1}\left(\boldsymbol\mu - \boldsymbol\mu_n\right)\right\}. \]
Because this is proportional to the probability density function for a multivariate Normal distribution with mean vector \(\boldsymbol\mu_n\) and covariance matrix \(\boldsymbol\Sigma_n\), we know the exact form of the posterior distribution for the multivariate Normal mean vector \(\boldsymbol\mu\) when the prior is also multivariate Normal:
\[ \boldsymbol\mu\mid \boldsymbol Y \sim \mbox{Multivariate Normal}_p\left(\boldsymbol\mu_n,\boldsymbol\Sigma_n\right), \]
where
\[ \begin{eqnarray}\boldsymbol\Sigma_n = \boldsymbol A^{-1}&=&\left(N\boldsymbol\Sigma^{-1} + \boldsymbol V^{-1}\right)^{-1},\\\boldsymbol\mu_n = \boldsymbol A^{-1}\boldsymbol b&=&\boldsymbol\Sigma_n\left(N\boldsymbol\Sigma^{-1}\bar{\boldsymbol y} + \boldsymbol V^{-1}\boldsymbol\mu_0\right).\end{eqnarray} \]
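Putting the result to work, here is a self-contained numerical sketch (simulated data; every setting below is hypothetical) that computes \(\boldsymbol\mu_n\) and \(\boldsymbol\Sigma_n\) and then verifies that the claimed posterior matches prior \(\times\) likelihood up to a multiplicative constant:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(42)
p, N = 2, 100
mu0 = np.zeros(p)                          # prior mean
V = 4.0 * np.eye(p)                        # prior covariance
Sigma = np.array([[1.0, 0.4],
                  [0.4, 1.0]])             # known data covariance
mu_true = np.array([1.0, -0.5])
Y = rng.multivariate_normal(mu_true, Sigma, size=N)
ybar = Y.mean(axis=0)

Si, Vi = np.linalg.inv(Sigma), np.linalg.inv(V)
Sigma_n = np.linalg.inv(N * Si + Vi)           # posterior covariance
mu_n = Sigma_n @ (N * Si @ ybar + Vi @ mu0)    # posterior mean

posterior = multivariate_normal(mean=mu_n, cov=Sigma_n)

def log_prior_times_lik(mu):
    lp = multivariate_normal(mean=mu0, cov=V).logpdf(mu)
    ll = multivariate_normal(mean=mu, cov=Sigma).logpdf(Y).sum()
    return lp + ll

# log(prior * likelihood) - log(posterior) should be the same constant everywhere
for _ in range(3):
    mu = rng.normal(size=p)
    print(log_prior_times_lik(mu) - posterior.logpdf(mu))
```

With \(N = 100\) observations the data precision \(N\boldsymbol\Sigma^{-1}\) dominates the prior precision \(\boldsymbol V^{-1}\), so \(\boldsymbol\mu_n\) should land very close to \(\bar{\boldsymbol y}\).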
This is analogous to the posterior distribution in the univariate case, as can be easily seen by comparing the parameters of the respective posteriors.
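For reference, writing \(\sigma^2\) for the known variance and \(v\) for the prior variance in the univariate case (the symbols used in the previous section may differ), the univariate posterior parameters are
\[ \sigma_n^2 = \left(\frac{N}{\sigma^2} + \frac{1}{v}\right)^{-1}, \qquad \mu_n = \sigma_n^2\left(\frac{N}{\sigma^2}\bar{y} + \frac{1}{v}\mu_0\right), \]
which are the scalar versions of \(\boldsymbol\Sigma_n\) and \(\boldsymbol\mu_n\) above, with \(\sigma^{-2}\) playing the role of \(\boldsymbol\Sigma^{-1}\) and \(v^{-1}\) the role of \(\boldsymbol V^{-1}\).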