- Category: Basics
- Published on 02 January 2012
- Written by Richard D. Morey

At the heart of Bayesian statistics is the concept of *conditioning*; that is, quantifying uncertainty based on what you know. When we observe data, we wish to condition our beliefs based on those data. This article introduces conditional probability and ultimately Bayes' rule.

You are an employee at a coffee stand. Your coffee stand has two sizes of drink: regular and large. Suppose you are interested in how men and women order drinks at your coffee stand. Your population can be divided into four groups:

- Men who order large-sized drinks
- Men who order regular-sized drinks
- Women who order large-sized drinks
- Women who order regular-sized drinks

For the sake of this example, we assume that each person only orders one drink.

Each of these groups makes up some proportion of the population. Assuming that these four groups are the only groups that are possible, the four proportions must sum to exactly 1. Notice that if you add up all the numbers in the four central cells, they add up to 1.0.

Suppose that the four central cells in the table below show the *true* proportions of each of these four groups (ignore the cells on the bottom and right margins for now). For instance, the top-left cell gives the proportion of the population who are men that order large drinks.

These four probabilities are called "joint" probabilities, because they are the probabilities which correspond to jointly knowing two relevant features of a person. Joint probabilities often use the word "and"; for example, we might ask "What is the probability that a patron is a woman *and* orders a large drink?" This probability can be read directly from the corresponding cell of the table.
We can write joint probabilities in mathematical notation like this: \[ Pr(A, B) \] which can be read as "the probability of both events \(A\) and \(B\) jointly occurring."
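
The table of joint probabilities can be sketched in code. The numbers below are hypothetical placeholders, chosen only so they sum to 1; the article's table supplies the true values.

```python
# Hypothetical joint probabilities Pr(sex, size) for the coffee-stand
# example. These numbers are made up for illustration.
joint = {
    ("man", "large"):     0.20,
    ("man", "regular"):   0.30,
    ("woman", "large"):   0.30,
    ("woman", "regular"): 0.20,
}

# The four joint probabilities must sum to exactly 1.
assert abs(sum(joint.values()) - 1.0) < 1e-12

# Pr(woman and large): read off a single cell.
print(joint[("woman", "large")])  # → 0.3
```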

With joint probabilities, we can ask questions about the probabilities of events occurring together. Some examples of questions regarding joint probabilities are

- "What is the probability that a patron is a man and orders a regular drink?"
- "What is the probability that all students will pass the test tomorrow?" Here, we are asking about all students jointly.
- "What is the probability when rolling two dice that both show 1?"
- "What is the probability that a randomly selected person will have HIV and test positive for HIV?"

All of the above are examples of joint probabilities, because they ask about the probability of two or more events occurring jointly.

Occasionally your boss comes to you to ask some questions about the makeup of the stand's patron population. From the central four cells, it is easy to answer questions about joint probabilities. If your boss asked you "What is the probability that the next patron will be a man and will order a regular drink?", you can read the answer immediately from the corresponding cell of the table.

Suppose, on the other hand, that your boss comes to you and asks, "What is the probability that the next patron will be a man?" Contrasting this question with the questions in the previous section reveals a key difference: in this case, your boss is not interested in the size of drink ordered by the patron. The answer to your boss's question can still be easily determined from the information in the table. There are two types of patrons who are male: those that order a large drink, and those that order a regular drink. Since each patron can only be in one category, we can obtain the proportion of men by adding up these two proportions: the proportion of men who order large drinks plus the proportion of men who order regular drinks gives the probability that a patron is a man.

In the table, you'll notice that the sums of the cells in each column are given in the bottom margin, and the sums of the cells in each row are given in the right margin. These probabilities are called *marginal* probabilities.

Marginal probabilities can be thought of as probabilities of one factor, while ignoring (or "marginalizing out") other factors. This simple idea can be translated into mathematical notation, using Sigma notation. \[ Pr(A) = \sum_{k=1}^K Pr(A,B_k) \] where \(K\) is the number of total levels of factor \(B\). In our example, \(A\) is the event that a patron is a man, each \(B_k\) is a size of drink, and \(K=2\) because there are two sizes of drinks. This notation is just a formal way to say "sum the probabilities of being a man over all sizes of drinks."
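
The summation above can be sketched in code. As before, the joint probabilities are hypothetical placeholders, not the article's true values.

```python
# Hypothetical joint probabilities Pr(sex, size); illustration only.
joint = {
    ("man", "large"):     0.20,
    ("man", "regular"):   0.30,
    ("woman", "large"):   0.30,
    ("woman", "regular"): 0.20,
}

def marginal(sex):
    """Pr(sex) = sum over all drink sizes of Pr(sex, size)."""
    return sum(p for (s, _size), p in joint.items() if s == sex)

print(marginal("man"))    # → 0.5
print(marginal("woman"))  # → 0.5
```

Marginalizing out the drink size leaves a probability that depends only on the patron's sex, which is exactly what the Sigma-notation formula expresses.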

One day, you are working at your coffee stand, and a woman walks up to the stand. Before she orders, what size drink should you reach for? To answer this question, we could rephrase the question in several other ways:

- "What is the probability that she will order a large (or regular) drink, given that you already know she is a woman?"
- "What proportion of women patrons order large (or regular) drinks?"

These questions are not exactly about either joint probabilities or marginal probabilities, although, as you'll see, we can use joint and marginal probabilities to answer them. In these questions, we know one piece of information: that the customer is a woman. We would like to work out, given this information, what the probability is that she will choose a particular size of drink.

The first thing to notice is that the probabilities involving men in the table are completely irrelevant to answering this question. We know that the patron is a woman; it seems, then, that we should be able to concentrate only on the cells of the table involving women. That leaves the two cells in the right column, which together comprise the marginal proportion of women in the population.

The main idea behind conditioning is that we restrict our attention to only the cells which are consistent with the information we know: that is, the cells in the right column. If we treat women as if they were the entire population, what proportion of this new, restricted population orders large drinks? In order to answer this question, we need the joint probability that a patron is a woman *and* orders a large drink, and the marginal probability that a patron is a woman. "Women who order large drinks" is the group for which we want to find our conditional probability, and "women" is the population of interest. We can thus find the proportion of women who order large drinks by simple division: the joint probability divided by the marginal probability.
This is the probability that the patron will order a large drink, *conditional* on (or "given") the fact that she is a woman.

We can generalize the preceding example into a formula, which is the definition of conditional probability: \[ Pr(A\mid B) = \frac{Pr(A,B)}{Pr(B)} \] This equation has three parts, which I describe in turn:

- The left hand side of the equation is the notation for "the probability of \(A\) given \(B\)"; the vertical line \(\mid\) means "given".
- The denominator of the right hand side, \(Pr(B)\), is the marginal probability of the event we are conditioning on. The denominator corresponds to restricting our attention to a particular row (or column). In our example, this was the probability that the patron is a woman.
- The numerator of the right hand side, \(Pr(A, B)\), is the joint probability that both of the events in question are true. In our example, this was the probability that a patron is a woman and orders a large drink.
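
The definition can be sketched directly in code, again using hypothetical joint probabilities in place of the article's true table values.

```python
# Hypothetical joint probabilities Pr(sex, size); illustration only.
joint = {
    ("man", "large"):     0.20,
    ("man", "regular"):   0.30,
    ("woman", "large"):   0.30,
    ("woman", "regular"): 0.20,
}

def conditional(size, sex):
    """Pr(size | sex) = Pr(size, sex) / Pr(sex)."""
    pr_sex = sum(p for (s, _size), p in joint.items() if s == sex)
    return joint[(sex, size)] / pr_sex

# Probability of a large drink, given that the patron is a woman.
print(conditional("large", "woman"))  # → 0.6
```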

Using the table, you can compute the conditional probabilities yourself. Conditioning on the columns gives the probabilities of ordering a large (or regular) drink given that the patron is a man (left column) or a woman (right column). Conditioning on the rows gives the probabilities that the patron is a man (or woman) given that the patron ordered a large drink (top row) or a regular drink (bottom row). This should enable you to answer the following question: "You are told that a patron just ordered a regular drink. What is the probability that the patron is a woman?"

Using the definition of conditional probability, we can rewrite our equation for marginal probabilities. Since \(Pr(A\mid B)Pr(B) = Pr(A,B)\), that means that \[ Pr(A) = \sum_{k=1}^K Pr(A\mid B_k)Pr(B_k) \]

This equation is often called the law of total probability. It means that if I want to know the probability of some event \(A\) (say, the probability that a child will have dichromacy), all I need to know is the probability of the event given all possible scenarios (for example, the sex of the child), and the probabilities of those scenarios.

The probability of a male child having dichromacy is about 0.024, and the probability of a female child having dichromacy is 0.0003. If the probability of a male child is 0.52, then the marginal probability of a child having dichromacy is

\( \begin{eqnarray} Pr(A) &=& \sum_{k=1}^K Pr(A\mid B_k)Pr(B_k)\\ &=&Pr(\mbox{dichromacy}\mid \mbox{male})Pr(\mbox{male}) + Pr(\mbox{dichromacy}\mid \mbox{female})Pr(\mbox{female})\\ &=&0.024\times 0.52 + 0.0003\times 0.48\\ &\approx&0.013 \end{eqnarray} \)

That is, the marginal probability that a child will have dichromatic vision is about 1.3%.
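
The calculation above can be checked in code, using the probabilities given in the article.

```python
# Numbers from the article: Pr(dichromacy | male) = 0.024,
# Pr(dichromacy | female) = 0.0003, Pr(male) = 0.52.
pr_dichromacy_given = {"male": 0.024, "female": 0.0003}
pr_sex = {"male": 0.52, "female": 0.48}

# Law of total probability: Pr(A) = sum_k Pr(A | B_k) Pr(B_k)
pr_dichromacy = sum(pr_dichromacy_given[s] * pr_sex[s] for s in pr_sex)
print(round(pr_dichromacy, 3))  # → 0.013
```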

Conditional probability is a critical concept for Bayesian data analysis. All conditional probability statements are statements about the probability of some unknown quantity of interest given something known. In data analysis, our known values are the data we collect. The unknown quantity of interest may be the true effect of a drug, the expected value of a population, or the number of whales in the ocean. Conditional probability gives us a way of stating the problem: "Given the data I observed, what do I believe about the effect of this drug?"; Bayesian statistics gives us a way of getting an answer, and the foundation of Bayesian statistics is Bayes' theorem.
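
As a minimal sketch of where this leads, Bayes' theorem, \( Pr(B\mid A) = Pr(A\mid B)Pr(B)/Pr(A) \), reverses the direction of conditioning. Applied to the dichromacy numbers above, it answers a new question: given that a child has dichromacy, what is the probability that the child is male?

```python
# Bayes' theorem: Pr(B | A) = Pr(A | B) * Pr(B) / Pr(A),
# applied to the dichromacy numbers from the article.
pr_d_given_male, pr_d_given_female = 0.024, 0.0003
pr_male, pr_female = 0.52, 0.48

# Marginal Pr(dichromacy), via the law of total probability.
pr_d = pr_d_given_male * pr_male + pr_d_given_female * pr_female

# Reverse the conditioning: given dichromacy, how probable is "male"?
pr_male_given_d = pr_d_given_male * pr_male / pr_d
print(round(pr_male_given_d, 3))  # → 0.989
```

The answer is nearly certain, because dichromacy is so much more common among males; this reversal of conditioning, from "probability of data given scenario" to "probability of scenario given data", is exactly the move Bayesian data analysis makes.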