
Completing the square

Several of the explanations on this website require a bit of algebra to understand. "Completing the square" [Wikipedia] refers to taking an expression of the form \(ax^2 + bx + c\) and, through algebraic manipulation, ending up with an equivalent expression of the form \(a(x + d)^2 + e\). Completing the square is a useful algebra trick to know, since it arises often when multiplying likelihoods by priors, especially in the context of the Normal distribution.
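
As a quick concrete illustration, the made-up quadratic \(2x^2 + 12x + 7\) can be rewritten as \(2(x + 3)^2 - 11\). The short sympy sketch below verifies this one instance; the coefficients are chosen only for the example.

```python
# A minimal sympy check of one made-up instance of completing the square:
# 2x^2 + 12x + 7 is the same polynomial as 2(x + 3)^2 - 11.
import sympy as sp

x = sp.symbols('x')
original = 2*x**2 + 12*x + 7
completed = 2*(x + 3)**2 - 11

# simplify() of the difference is 0 exactly when the two forms agree.
assert sp.simplify(original - completed) == 0
print(sp.expand(completed))  # -> 2*x**2 + 12*x + 7
```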

In this reference tutorial, we will demonstrate how to complete the square in both univariate (scalar) and multivariate (matrix) contexts. Incidentally, this article also contains the derivation of the posterior distribution of the mean of the normal distribution when the variance is known in both univariate and multivariate contexts.

Completing the square: univariate


Consider the case of observing \(N\) independent samples from a \(\mbox{Normal}(\mu,\sigma^2)\) distribution. Suppose we know the true value of \(\sigma^2\), and we are interested in determining the posterior distribution of \(\mu\). It is conventional to place a conjugate normal prior on \(\mu\). Our model is:

\(\begin{eqnarray}y_i &\sim& \mbox{Normal}(\mu, \sigma^2),\quad i = 1,\ldots,N,\\\mu &\sim& \mbox{Normal}(\mu_0, \tau^2).\end{eqnarray}\)
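
To make the setup concrete, here is a minimal simulation sketch of this model; the values chosen for \(\mu_0\), \(\tau^2\), \(\sigma^2\), and \(N\) are arbitrary and only for illustration.

```python
# A minimal simulation sketch of the model above. The hyperparameters and
# the known variance are made-up values used only for illustration.
import numpy as np

rng = np.random.default_rng(0)
mu0, tau2 = 0.0, 4.0   # prior mean and variance of mu (assumed values)
sigma2 = 1.0           # known data variance (assumed value)
N = 20

mu = rng.normal(mu0, np.sqrt(tau2))           # one draw of mu from its prior
y = rng.normal(mu, np.sqrt(sigma2), size=N)   # N independent observations
```
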
In Bayesian statistics, the posterior is proportional to the likelihood times the prior. Because all observations are independent, our likelihood is the product of \(N\) Normal density functions: one for each \(y_i\). The prior then provides another Normal density function term. After simplifying and dropping terms that are not functions of \(\mu\), we end up with a posterior distribution for \(\mu\) proportional to

\(\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N(y_i - \mu)^2\right\}\exp\left\{-\frac{1}{2\tau^2}(\mu - \mu_0)^2\right\}\)

The first term is the likelihood, and the second term is the prior. Because we desire a posterior that is a simple function of \(\mu\), we need to gather all the terms that include \(\mu\) together; as the posterior is written above, \(\mu\) is scattered across \(N+1\) terms. Completing the square is the trick that will allow us to gather all the \(\mu\) terms into one.

The first step is to expand the squares containing \(\mu\). This yields

\(\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N\left(y_i^2 - 2\mu y_i + \mu^2\right)\right\}\exp\left\{-\frac{1}{2\tau^2}\left(\mu^2 - 2\mu\mu_0 + \mu_0^2\right)\right\}\)

We now distribute the sum, yielding
\(\exp\left\{-\frac{1}{2\sigma^2}\left(\color{red}{\sum_{i=1}^Ny_i^2} - 2\mu \sum_{i=1}^Ny_i + N\mu^2\right)\right\}\exp\left\{-\frac{1}{2\tau^2}\left(\mu^2 - 2\mu\mu_0 + \color{red}{\mu_0^2}\right)\right\}\)
There are several terms above that do not involve \(\mu\); these are highlighted in red. When we drop those terms, combine all remaining terms within one exponent, and then focus only on what is in the exponent, what remains is
\(-\frac{1}{2}\left(\frac{N}{\sigma^2}\mu^2 + \frac{1}{\tau^2}\mu^2 - 2\mu \frac{\sum_{i=1}^Ny_i}{\sigma^2} - 2\mu\frac{1}{\tau^2}\mu_0\right) \)
We can combine the \(\mu^2\) terms together, and the \(\mu\) terms together:
\(-\frac{1}{2}\left(\left(\frac{N}{\sigma^2} + \frac{1}{\tau^2}\right)\mu^2 - 2\left(\frac{\sum_{i=1}^Ny_i}{\sigma^2} + \frac{1}{\tau^2}\mu_0\right)\mu\right)\)
Finally, for ease of notation, we can use the fact that \(\sum_{i=1}^Ny_i = N\bar{y}\):
\(-\frac{1}{2}\left(\left(\frac{N}{\sigma^2} + \frac{1}{\tau^2}\right)\mu^2 - 2\left(\frac{N}{\sigma^2}\bar{y} + \frac{1}{\tau^2}\mu_0\right)\mu\right)\)
After all of this algebraic simplification, what we have inside the parentheses looks something like \(ax^2 - 2bx\). We can now "complete the square" to obtain something of the form \((x - c)^2\).
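
Before doing so, the algebra up to this point can be spot-checked symbolically. The sympy sketch below (using \(N = 3\) for concreteness) confirms that the expression above differs from the original exponent only by terms that do not involve \(\mu\).

```python
# A symbolic spot-check (with N = 3) that the collected quadratic in mu
# differs from the original exponent only by mu-free terms.
import sympy as sp

mu, mu0 = sp.symbols('mu mu0', real=True)
sigma, tau = sp.symbols('sigma tau', positive=True)
y = sp.symbols('y1 y2 y3')
N = len(y)
ybar = sum(y) / N

original = -sum((yi - mu)**2 for yi in y) / (2*sigma**2) - (mu - mu0)**2 / (2*tau**2)
collected = -sp.Rational(1, 2) * ((N/sigma**2 + 1/tau**2) * mu**2
                                  - 2 * (N*ybar/sigma**2 + mu0/tau**2) * mu)

# The difference should contain no mu at all.
difference = sp.expand(original - collected)
assert mu not in difference.free_symbols
```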

Completing the square

Our first step is to make the notation easier to follow. Let \(a = \frac{N}{\sigma^2} + \frac{1}{\tau^2}\) and \(b = \frac{N}{\sigma^2}\bar{y} + \frac{1}{\tau^2}\mu_0\). Using the new, simplified notation, we have

\(-\frac{1}{2}\left(a\mu^2 - 2b\mu\right)\)

We can move the coefficient \(a\) on \(\mu\) outside the parentheses:

\(-\frac{a}{2}\left(\mu^2 - 2\frac{b}{a}\mu\right)\)

We now add and subtract the same value inside the parentheses. This doesn't change the value at all, since the terms sum to 0:

\(-\frac{a}{2}\left(\mu^2 - 2\frac{b}{a}\mu + \frac{b^2}{a^2} \color{red}{- \frac{b^2}{a^2}}\right)\)

Neither of the added terms is a function of \(\mu\), so the term colored in red only contributes a constant factor to the posterior; as before, we can simply drop it.

\(-\frac{a}{2}\left(\mu^2 - 2\frac{b}{a}\mu + \frac{b^2}{a^2}\right)\)

The terms within the parentheses are of the form \(x^2 - 2xc + c^2\), which, from the rules learned in algebra, can be simplified to \((x - c)^2\). Applying this to our terms, we obtain:

\(-\frac{a}{2}\left(\mu - \frac{b}{a}\right)^2\)
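
The last two steps can also be verified symbolically. In the sympy sketch below, \(a\) and \(b\) are treated as generic positive symbols; the check confirms that completing the square changes the exponent only by \(b^2/(2a)\), which does not involve \(\mu\) and is therefore harmless under proportionality.

```python
# A symbolic check that the completed square differs from the earlier
# exponent only by b**2/(2*a), a quantity that is constant in mu.
import sympy as sp

mu = sp.symbols('mu', real=True)
a, b = sp.symbols('a b', positive=True)

before = -sp.Rational(1, 2) * a * (mu**2 - 2*(b/a)*mu)
after = -sp.Rational(1, 2) * a * (mu - b/a)**2

assert sp.simplify(before - after - b**2/(2*a)) == 0
```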

We have thus completed the square. We are not done, however: this was only the portion of the posterior distribution that was in the exponent. Placing the completed square back into the exponent yields:

\(\exp\left\{-\frac{a}{2}\left(\mu - \frac{b}{a}\right)^2\right\}\)

By completing the square, we have revealed that the posterior distribution of \(\mu\) has the form of a normal distribution with a mean of \(b/a\) and a variance of \(1/a\), or

\(\mu\mid y \sim \mbox{Normal}\left(\mu_n, \sigma^2_n\right)\)


where

\(\begin{eqnarray}\sigma^2_n = \frac{1}{a}&=&\left(\frac{N}{\sigma^2}+\frac{1}{\tau^2}\right)^{-1},\\\mu_n = \frac{b}{a}&=&\sigma^2_n\left(\frac{N}{\sigma^2}\bar{y} + \frac{1}{\tau^2}\mu_0\right).\end{eqnarray}\)
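
As a final sanity check, the sketch below compares this closed-form posterior to a brute-force posterior obtained by evaluating the likelihood times the prior on a grid of \(\mu\) values and normalizing numerically. The hyperparameter values and the data are made up for the example, and scipy is assumed to be available.

```python
# A numerical check of the closed-form posterior Normal(mu_n, sigma2_n)
# against a brute-force grid evaluation of likelihood * prior.
# All parameter values below are made up for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu0, tau2 = 0.0, 4.0   # prior mean and variance of mu (assumed values)
sigma2 = 1.0           # known data variance (assumed value)
N = 20
y = rng.normal(1.5, np.sqrt(sigma2), size=N)

# Closed-form posterior parameters from the derivation above.
post_var = 1.0 / (N / sigma2 + 1.0 / tau2)
post_mean = post_var * (N * y.mean() / sigma2 + mu0 / tau2)

# Brute-force posterior: likelihood * prior on a grid, normalized numerically.
grid = np.linspace(post_mean - 5.0, post_mean + 5.0, 2001)
log_unnorm = (stats.norm.logpdf(y[:, None], loc=grid, scale=np.sqrt(sigma2)).sum(axis=0)
              + stats.norm.logpdf(grid, loc=mu0, scale=np.sqrt(tau2)))
unnorm = np.exp(log_unnorm - log_unnorm.max())
grid_posterior = unnorm / (unnorm.sum() * (grid[1] - grid[0]))

closed_form = stats.norm.pdf(grid, loc=post_mean, scale=np.sqrt(post_var))
assert np.allclose(grid_posterior, closed_form, atol=1e-4)
```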
