Univariate

Preliminaries

Consider the case of observing \(N\) independent samples from a \(\mbox{Normal}(\mu,\sigma^2)\) distribution. Suppose we know the true value of \(\sigma^2\), and we are interested in determining the posterior distribution of \(\mu\). It is conventional to place a conjugate normal prior on \(\mu\). Our model is:

\[ \begin{eqnarray}y_i&\stackrel{iid}{\sim}&\mbox{Normal}(\mu,\sigma^2)\\\mu&\sim&\mbox{Normal}(\mu_0,\tau^2)\end{eqnarray} \]

In Bayesian statistics, the posterior is proportional to the likelihood times the prior. Because all observations are independent, our likelihood is the product of \(N\) Normal density functions: one for each \(y_i\). The prior then provides another Normal density function term. After simplifying and dropping terms that are not functions of \(\mu\), we end up with a posterior distribution for \(\mu\) proportional to

\[ \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N(y_i - \mu)^2\right\}\exp\left\{-\frac{1}{2\tau^2}(\mu - \mu_0)^2\right\} \]

The first term is the likelihood, and the second term is the prior. Because we desire a posterior that is a simple function of \(\mu\), we need to gather all the terms that include \(\mu\) together; as the posterior is written above, \(\mu\) is scattered across \(N+1\) terms. Completing the square is the trick that will allow us to gather all the \(\mu\) terms into one.
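As a rough illustration, the unnormalized posterior above can be evaluated numerically on the log scale. The sketch below is not part of the original derivation; the function name `log_unnormalized_posterior` and its argument names are assumptions made for this example.

```python
def log_unnormalized_posterior(mu, y, sigma2, mu0, tau2):
    """Log of likelihood times prior, dropping constants that do not involve mu.

    y      : list of observations y_1, ..., y_N
    sigma2 : known sampling variance sigma^2
    mu0    : prior mean
    tau2   : prior variance tau^2
    """
    # Likelihood term: -1/(2 sigma^2) * sum_i (y_i - mu)^2
    log_lik = -sum((yi - mu) ** 2 for yi in y) / (2.0 * sigma2)
    # Prior term: -1/(2 tau^2) * (mu - mu0)^2
    log_prior = -((mu - mu0) ** 2) / (2.0 * tau2)
    return log_lik + log_prior


# Example with made-up data: three observations, sigma^2 = 1, prior Normal(0, 1).
value = log_unnormalized_posterior(2.0, [1.0, 2.0, 3.0], 1.0, 0.0, 1.0)
print(value)
```

Note that \(\mu\) enters this function once per observation and once more through the prior, which is the scattering across \(N+1\) terms described above.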

The first step is to expand the squares containing \(\mu\). This yields

\[ \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N\left(y_i^2 - 2\mu y_i + \mu^2\right)\right\}\exp\left\{-\frac{1}{2\tau^2}(\mu^2 - 2\mu\mu_0 + \mu_0^2)\right\} \]

We now distribute the sum, yielding

\[ \exp\left\{-\frac{1}{2\sigma^2}\left({\color{red} \sum_{i=1}^Ny_i^2} - 2\mu \sum_{i=1}^Ny_i + N\mu^2\right)\right\}\exp\left\{-\frac{1}{2\tau^2}(\mu^2 - 2\mu\mu_0 + \color{red}{\mu_0^2})\right\} \]

There are several terms above that do not involve \(\mu\); these are highlighted in red. When we drop those terms, combine all remaining terms within one exponent, and then focus only on what is in the exponent, what remains is

\[ -\frac{1}{2}\left(\frac{N}{\sigma^2}\mu^2 + \frac{1}{\tau^2}\mu^2 - 2\mu \frac{\sum_{i=1}^Ny_i}{\sigma^2} - 2\mu\frac{1}{\tau^2}\mu_0\right) \]

We can combine the \(\mu^2\) terms together, and the \(\mu\) terms together:

\[ -\frac{1}{2}\left(\left(\frac{N}{\sigma^2} + \frac{1}{\tau^2}\right)\mu^2 - 2\left(\frac{\sum_{i=1}^Ny_i}{\sigma^2} + \frac{1}{\tau^2}\mu_0\right)\mu\right) \]

Finally, for ease of notation, we can use the fact that \(\sum_{i=1}^Ny_i = N\bar{y}\):

\[ -\frac{1}{2}\left(\left(\frac{N}{\sigma^2} + \frac{1}{\tau^2}\right)\mu^2 - 2\left(\frac{N}{\sigma^2}\bar{y} + \frac{1}{\tau^2}\mu_0\right)\mu\right) \]

After all of this algebraic simplification, inside the parentheses what we have looks something like \(ax^2 - 2bx\). We can now “complete the square” to obtain something of the form \((x - c)^2\).

Completing the square

Our first step is to make the notation easier to follow. Let \(a = \frac{N}{\sigma^2} + \frac{1}{\tau^2}\) and \(b = \frac{N}{\sigma^2}\bar{y} + \frac{1}{\tau^2}\mu_0\). Using the new, simplified notation, we have

\[ -\frac{1}{2}\left(a\mu^2 - 2b\mu\right) \]
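One way to check the simplification so far is numerically: the original log posterior and \(-\frac{1}{2}(a\mu^2 - 2b\mu)\) should differ only by a quantity that does not depend on \(\mu\). The following is a minimal sketch with made-up data; the function names and the example values are assumptions for illustration only.

```python
def quadratic_coefficients(y, sigma2, mu0, tau2):
    """Return (a, b) as defined in the text."""
    n = len(y)
    ybar = sum(y) / n
    a = n / sigma2 + 1.0 / tau2
    b = n * ybar / sigma2 + mu0 / tau2
    return a, b


def log_kernel(mu, y, sigma2, mu0, tau2):
    """Log of likelihood times prior, before any terms are expanded or dropped."""
    return (-sum((yi - mu) ** 2 for yi in y) / (2.0 * sigma2)
            - (mu - mu0) ** 2 / (2.0 * tau2))


# Made-up data and hyperparameters.
y = [1.2, 0.7, 2.1, 1.5]
sigma2, mu0, tau2 = 1.0, 0.0, 4.0
a, b = quadratic_coefficients(y, sigma2, mu0, tau2)

# The difference between the two expressions at several values of mu.
# If the algebra is right, every entry is the same constant.
offsets = [
    log_kernel(mu, y, sigma2, mu0, tau2) - (-0.5 * (a * mu**2 - 2.0 * b * mu))
    for mu in (-1.0, 0.0, 0.5, 2.0)
]
print(a, b)
print(offsets)
```

The constant offset collects exactly the dropped terms, \(-\frac{1}{2\sigma^2}\sum_{i=1}^N y_i^2 - \frac{1}{2\tau^2}\mu_0^2\).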

We can factor the coefficient \(a\) of \(\mu^2\) out of the parentheses:

\[ -\frac{a}{2}\left(\mu^2 - 2\frac{b}{a}\mu\right) \]

We now add and subtract the same value inside the parentheses. This doesn’t change the value at all, since the terms sum to 0:

\[ -\frac{a}{2}\left(\mu^2 - 2\frac{b}{a}\mu + \frac{b^2}{a^2} \color{red}{- \frac{b^2}{a^2}}\right) \]

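The term colored in red is not a function of \(\mu\): once multiplied by \(-\frac{a}{2}\), it contributes only the constant \(\frac{b^2}{2a}\) to the exponent, which is absorbed into the normalizing constant of the posterior. We can therefore drop it: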

\[ -\frac{a}{2}\left(\mu^2 - 2\frac{b}{a}\mu + \frac{b^2}{a^2}\right) \]

The terms within the parentheses are of the form \(x^2 - 2xc + c^2\), which, from the rules learned in algebra, can be simplified to \((x - c)^2\). Applying this to our terms, we obtain:

\[ -\frac{a}{2}\left(\mu - \frac{b}{a}\right)^2 \]
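The identity behind this step is \(a\mu^2 - 2b\mu = a\left(\mu - \frac{b}{a}\right)^2 - \frac{b^2}{a}\), where the last term is the constant that was dropped. A small numerical check, using arbitrary illustrative values for \(a\) and \(b\) (any \(a > 0\) would do), might look like this:

```python
def quadratic(mu, a, b):
    """The expression a*mu^2 - 2*b*mu before completing the square."""
    return a * mu**2 - 2.0 * b * mu


def completed_square(mu, a, b):
    """The same expression after completing the square, constant included."""
    return a * (mu - b / a) ** 2 - b**2 / a


# Arbitrary illustrative coefficients; a must be nonzero.
a, b = 4.25, 5.5
differences = [
    abs(quadratic(mu, a, b) - completed_square(mu, a, b))
    for mu in (-3.0, -0.5, 0.0, 1.0, 2.75)
]
# Every difference should be zero up to floating-point rounding.
print(differences)
```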

We have thus completed the square. We are not done, however: this was only the portion of the posterior distribution that was in the exponent. Replacing the terms in the exponent yields:

\[ \exp\left\{-\frac{a}{2}\left(\mu - \frac{b}{a}\right)^2\right\} \]

By completing the square, we have revealed that the posterior distribution of \(\mu\) is proportional to the kernel of a normal density, \(\exp\left\{-\frac{1}{2v}(\mu - m)^2\right\}\), with a mean of \(m = b/a\) and a variance of \(v = 1/a\), or

\[ \mu\mid y \sim \mbox{Normal}\left(\mu_n, \sigma^2_n\right) \] where

\[ \begin{eqnarray}\sigma^2_n = \frac{1}{a}&=&\left(\frac{N}{\sigma^2}+\frac{1}{\tau^2}\right)^{-1},\\ \mu_n = \frac{b}{a}&=&\sigma^2_n\left(\frac{N}{\sigma^2}\bar{y} + \frac{1}{\tau^2}\mu_0\right).\end{eqnarray} \]
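The closed-form result can be sketched in a few lines of code and compared against a brute-force approximation that normalizes the likelihood-times-prior expression on a grid. This is an illustrative check rather than a recommended way to compute posteriors; the function names, the grid limits, and the example data are all assumptions made for this sketch.

```python
import math


def normal_posterior(y, sigma2, mu0, tau2):
    """Return (mean, variance) of mu | y for known sigma2 and a Normal(mu0, tau2) prior."""
    n = len(y)
    ybar = sum(y) / n
    a = n / sigma2 + 1.0 / tau2            # posterior precision
    b = n * ybar / sigma2 + mu0 / tau2
    return b / a, 1.0 / a


def grid_posterior_moments(y, sigma2, mu0, tau2, lo=-10.0, hi=10.0, steps=20000):
    """Approximate the posterior mean and variance on an evenly spaced grid.

    The grid limits are assumed to be wide enough to cover essentially
    all of the posterior mass for the data at hand.
    """
    width = (hi - lo) / steps
    grid = [lo + i * width for i in range(steps + 1)]
    log_post = [
        -sum((yi - mu) ** 2 for yi in y) / (2.0 * sigma2)
        - (mu - mu0) ** 2 / (2.0 * tau2)
        for mu in grid
    ]
    shift = max(log_post)                  # subtract the maximum for numerical stability
    weights = [math.exp(lp - shift) for lp in log_post]
    total = sum(weights)
    mean = sum(w * mu for w, mu in zip(weights, grid)) / total
    var = sum(w * (mu - mean) ** 2 for w, mu in zip(weights, grid)) / total
    return mean, var


# Made-up data and hyperparameters.
y = [1.2, 0.7, 2.1, 1.5]
closed_form = normal_posterior(y, sigma2=1.0, mu0=0.0, tau2=4.0)
brute_force = grid_posterior_moments(y, sigma2=1.0, mu0=0.0, tau2=4.0)
print(closed_form)
print(brute_force)
```

The two pairs of numbers should agree closely. Written this way, the posterior mean is a precision-weighted average of the sample mean \(\bar{y}\) and the prior mean \(\mu_0\), with weights \(N/\sigma^2\) and \(1/\tau^2\).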