In this post we will review some of the most basic concepts of Probability Theory.

**Note:** the goal of this post is not to teach Probability Theory from scratch, but rather to highlight important concepts that help build useful abstractions for more advanced topics, and to serve as a quick reminder for those who need it.

### Classroom Example:

Consider a classroom of 4th grade students that consists of 4 boys and 6 girls. Now assume that the teacher of the class selects one boy and one girl at random. We can now ask various questions about the identity of the boy and girl that were selected. For example:

- What are the chances of picking boy number 3 and girl number 6? (this is a futuristic class where people have no names, only numbers)
- What are the chances of picking boy number 2?
- What are the chances of picking a boy that is a Justin Bieber fan? (it is well known that boys 2 and 4 are great Justin Bieber fans)
- What are the chances of picking a girl that is good at math? (it is also known that girls 1, 4, 5 and 6 are amazing at math)
- What are the chances of picking a boy who is a Justin Bieber fan and picking a girl who is good at math?
- What are the chances of picking either a boy who is a Justin Bieber fan or a girl who is good at math?

These are the types of questions Probability Theory tries to address, and by answering them we will familiarize ourselves with the important concepts and thinking strategies of the field.

When thinking about probability, the **first** and **most important** thing we should be asking ourselves is **“What are the options?”**.

Answering this simple question usually makes answering any subsequent question much easier. The space of all possible options (the *sample space*) is usually denoted by Ω.

So what are the options in our 4th grade classroom case?

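As we said, we have 4 boys and 6 girls in our classroom, and the teacher selects one (boy, girl) pair at random. The list of all possible options is therefore every combination of one boy and one girl, 4 × 6 = 24 options in total (each cell below is a (boy, girl) option):

| | girl 1 | girl 2 | girl 3 | girl 4 | girl 5 | girl 6 |
|---|---|---|---|---|---|---|
| **boy 1** | (1, 1) | (1, 2) | (1, 3) | (1, 4) | (1, 5) | (1, 6) |
| **boy 2** | (2, 1) | (2, 2) | (2, 3) | (2, 4) | (2, 5) | (2, 6) |
| **boy 3** | (3, 1) | (3, 2) | (3, 3) | (3, 4) | (3, 5) | (3, 6) |
| **boy 4** | (4, 1) | (4, 2) | (4, 3) | (4, 4) | (4, 5) | (4, 6) |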

So, let’s ask ourselves the first question *“What are the chances of picking boy number 3 and girl number 6?”*

The option we are asking about is the single pair (boy 3, girl 6). Since all 24 options are equally likely, its probability is:

$$P(\text{boy } 3, \text{girl } 6) = \tfrac{1}{24}$$
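For readers who like to check such counts in code, here is a minimal Python sketch of this sample space. It assumes boys and girls are represented simply by their numbers, and it gives every (boy, girl) pair the same probability because the teacher picks at random.

```python
from fractions import Fraction
from itertools import product

boys = [1, 2, 3, 4]
girls = [1, 2, 3, 4, 5, 6]

# Omega: every (boy, girl) pair the teacher could select
omega = list(product(boys, girls))

# Uniform selection: every option gets the same probability
p = {option: Fraction(1, len(omega)) for option in omega}

print(len(omega))  # 24
print(p[(3, 6)])   # 1/24
```

Using `Fraction` keeps the arithmetic exact, so probabilities print as 1/24 rather than as rounded floats.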

Let’s continue to the second question we asked previously: *“What are the chances of picking boy number 2?”* Here we don’t care about the identity of the girl being picked; she can be girl 1, girl 2 or any other girl.

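The options that satisfy the condition that boy number 2 is picked are the six pairs (boy 2, girl 1), (boy 2, girl 2), (boy 2, girl 3), (boy 2, girl 4), (boy 2, girl 5) and (boy 2, girl 6).

And therefore the probability is:

$$P(\text{boy } 2) = \sum_{g=1}^{6} P(\text{boy } 2, \text{girl } g) = 6 \cdot \tfrac{1}{24} = \tfrac{1}{4}$$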
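The same counting can be sketched in Python (again assuming students are identified by their numbers): keep only the options where the boy is number 2 and add up their probabilities.

```python
from fractions import Fraction
from itertools import product

boys = [1, 2, 3, 4]
girls = [1, 2, 3, 4, 5, 6]
omega = list(product(boys, girls))
p = {option: Fraction(1, len(omega)) for option in omega}

# Options where boy number 2 is picked; the girl can be anyone
satisfying = [(b, g) for (b, g) in omega if b == 2]
p_boy2 = sum(p[option] for option in satisfying)

print(len(satisfying))  # 6
print(p_boy2)           # 1/4
```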

Note that you could approach this question differently: we have 4 boys, one of them is chosen at random, and we ask what the chance is of selecting one particular boy, which gives the same result of 1/4. Why didn’t we do that?

Here I would like to highlight another very important point: if we have a complex multidimensional problem (here the teacher selects one boy and one girl, and each of those selections can be thought of as a separate dimension), a question about only one of the dimensions (in this case “who is the boy?”) can be answered by summing the probabilities over all options of the other dimensions (in this case “who is the girl?”). This is called marginalization and can be summarized by the following equation:

$$P(\text{boy } b) = \sum_{g=1}^{6} P(\text{boy } b, \text{girl } g)$$

This equation deserves special emphasis because it’s important.
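Here is a minimal Python sketch of marginalization, assuming the uniform joint distribution over the 24 pairs: for each boy we add up the joint probabilities over all six girls, which recovers the same 1/4 for every boy.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

boys = [1, 2, 3, 4]
girls = [1, 2, 3, 4, 5, 6]
joint = {(b, g): Fraction(1, len(boys) * len(girls)) for b, g in product(boys, girls)}

# Marginalize out the girl: P(boy b) = sum over g of P(boy b, girl g)
p_boy = defaultdict(Fraction)  # Fraction() == 0, so each boy starts at zero
for (b, _), prob in joint.items():
    p_boy[b] += prob

for b in boys:
    print(f"P(boy {b}) =", p_boy[b])
```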

Let’s continue with our questions: *“What are the chances of picking a boy that is a Justin Bieber fan?”* Recall that boys 2 and 4 are Justin Bieber fans.

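What are the options that satisfy our question’s condition? Every pair whose boy is boy 2 or boy 4, regardless of the girl: 2 × 6 = 12 options.

If we define the group of Justin Bieber fans as JBF = {boy2, boy4}, we can write the probability as:

$$P(\text{boy} \in \text{JBF}) = P(\text{boy } 2) + P(\text{boy } 4) = \tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2}$$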
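A Python sketch of this sum, with `JBF` written as a set of boy numbers (an illustrative encoding, not something from the original setup):

```python
from fractions import Fraction
from itertools import product

omega = list(product([1, 2, 3, 4], [1, 2, 3, 4, 5, 6]))  # (boy, girl) pairs
p = {option: Fraction(1, len(omega)) for option in omega}

JBF = {2, 4}  # boys who are Justin Bieber fans

# The girl does not matter, so every pair with a fan boy counts
p_jbf = sum(p[(b, g)] for (b, g) in omega if b in JBF)
print(p_jbf)  # 1/2
```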

Next question: *“What are the chances of picking a girl that is good at math?”* Recall that girls 1, 4, 5 and 6 are amazing at math.

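What are the options that satisfy our question’s condition? Every pair whose girl is girl 1, 4, 5 or 6, regardless of the boy: 4 × 4 = 16 options.

If we define the group of girls who are good at math as GAM = {girl1, girl4, girl5, girl6}, we can write the probability as:

$$P(\text{girl} \in \text{GAM}) = P(\text{girl } 1) + P(\text{girl } 4) + P(\text{girl } 5) + P(\text{girl } 6) = 4 \cdot \tfrac{1}{6} = \tfrac{2}{3}$$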
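The same pattern works for the girls; here `GAM` is a set of girl numbers, an encoding chosen for illustration:

```python
from fractions import Fraction
from itertools import product

omega = list(product([1, 2, 3, 4], [1, 2, 3, 4, 5, 6]))  # (boy, girl) pairs
p = {option: Fraction(1, len(omega)) for option in omega}

GAM = {1, 4, 5, 6}  # girls who are good at math

# The boy does not matter, so every pair with a math-savvy girl counts
p_gam = sum(p[(b, g)] for (b, g) in omega if g in GAM)
print(p_gam)  # 2/3
```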

Next question *“What are the chances of picking a boy who is a Justin Bieber fan and picking a girl who is good at math?”*

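What are the options that satisfy our question’s condition? Every pair whose boy is in JBF and whose girl is in GAM: 2 × 4 = 8 options.

What is the probability? Summing over these 8 options:

$$P(\text{boy} \in \text{JBF}, \text{girl} \in \text{GAM}) = 8 \cdot \tfrac{1}{24} = \tfrac{1}{3}$$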
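In code, the “and” simply becomes both conditions on the same option. A minimal sketch, reusing the illustrative set encodings of JBF and GAM:

```python
from fractions import Fraction
from itertools import product

omega = list(product([1, 2, 3, 4], [1, 2, 3, 4, 5, 6]))  # (boy, girl) pairs
p = {option: Fraction(1, len(omega)) for option in omega}

JBF = {2, 4}        # boys who are Justin Bieber fans
GAM = {1, 4, 5, 6}  # girls who are good at math

# Both conditions must hold for the same (boy, girl) option
both = [(b, g) for (b, g) in omega if b in JBF and g in GAM]
p_both = sum(p[option] for option in both)

print(len(both), p_both)  # 8 1/3
```

Here the result also equals 1/2 · 2/3, the product of the two previous answers, because the boy and the girl are picked independently of each other.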

Next question *“What are the chances of picking either a boy who is a Justin Bieber fan or a girl who is good at math?”*

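What are the options that satisfy our question’s condition? Every pair whose boy is in JBF, whose girl is in GAM, or both. The only options that fail the condition are the 2 × 2 = 4 pairs of boy 1 or boy 3 with girl 2 or girl 3, which leaves 24 - 4 = 20 options.

What is the probability?

$$P(\text{boy} \in \text{JBF} \text{ or } \text{girl} \in \text{GAM}) = 20 \cdot \tfrac{1}{24} = \tfrac{5}{6}$$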
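The “or” version of the sketch only swaps `and` for `or` in the condition (same illustrative encodings as before):

```python
from fractions import Fraction
from itertools import product

omega = list(product([1, 2, 3, 4], [1, 2, 3, 4, 5, 6]))  # (boy, girl) pairs
p = {option: Fraction(1, len(omega)) for option in omega}

JBF = {2, 4}        # boys who are Justin Bieber fans
GAM = {1, 4, 5, 6}  # girls who are good at math

# At least one of the two conditions must hold
either = [(b, g) for (b, g) in omega if b in JBF or g in GAM]
p_either = sum(p[option] for option in either)

print(len(either), p_either)  # 20 5/6
```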

Let’s summarize the process once more:

- Understand the space of all possible options of our scenario
- Identify all of the options that satisfy the desired question or condition
- Simply sum up the probabilities of all options that satisfy that condition
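The three steps fit into one small helper. The function name `probability` is hypothetical, chosen for this sketch; it takes a dictionary from options to probabilities and a condition written as a Python function:

```python
from fractions import Fraction
from itertools import product

def probability(p, condition):
    """Sum the probabilities of all options that satisfy `condition`."""
    return sum((prob for option, prob in p.items() if condition(option)), Fraction(0))

# Step 1: the space of all options, here the 24 equally likely (boy, girl) pairs
omega = list(product([1, 2, 3, 4], [1, 2, 3, 4, 5, 6]))
p = {option: Fraction(1, len(omega)) for option in omega}

# Steps 2 and 3: state the condition, then sum over the options that meet it
p_boy2 = probability(p, lambda option: option[0] == 2)
p_either = probability(p, lambda option: option[0] in {2, 4} or option[1] in {1, 4, 5, 6})

print(p_boy2, p_either)  # 1/4 5/6
```

The first call is a marginalization: its condition looks only at the boy, so every girl is summed over.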

Note: Marginalization is just a specific condition where we care only about what happens to one variable, and therefore we sum over all options of the other variables.

### Balls In A Bucket Example:

Consider a bucket containing 3 green balls and 5 yellow balls. Imagine a person taking out a first ball and holding it in his left hand, and then reaching with his right hand for a second ball. In the end, he has two balls, one in each hand: the first in his left hand and the second in his right hand. Similar to the previous example, we can ask several questions about this scenario as well:

- What are the chances of the ball in the left hand (1st draw) being yellow?
- What are the chances of the ball in the left hand (1st draw) being yellow **and** the ball in the right hand (2nd draw) being green?
- If the 1st ball was green, what are the chances of the 2nd ball being green as well?
- What are the chances of the 2nd ball being yellow?
- What are the chances of having a green ball in one hand and a yellow ball in the other (i.e. regardless of the order we drew them)?

These questions already feel a lot more challenging. The reason is that in this scenario the first and second draws are dependent: whatever we draw first changes what is left in the bucket for the second draw.

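Despite the pickle we are in, just like in the previous example, the **first** and **most important** thing we should ask ourselves is **“What are the options?”** Writing each option as (1st draw, 2nd draw), there are four: (green, green), (green, yellow), (yellow, green) and (yellow, yellow).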

In this case, unlike the previous example, it’s not immediately clear what the chance of each of these options is. But some things are immediately clear, like the chance of the 1st draw being green (we have 8 balls, 3 of them green, so the chance is 3/8). Other things are easy to calculate by “mentally simulating” the scenario, like the chance of the second draw being green given that the first draw was also green (if the first draw is green, 2 green balls are left in the bucket out of a total of 7, so the chance is 2/7).
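The “mental simulation” can also be run as an actual simulation. The Python sketch below is a Monte Carlo check, not part of the original argument: it repeats the two draws many times with a fixed random seed and estimates both quantities from frequencies, which should land close to 3/8 = 0.375 and 2/7 ≈ 0.286.

```python
import random

def draw_two(rng):
    """Draw two balls without replacement from 3 green and 5 yellow balls."""
    bucket = ["green"] * 3 + ["yellow"] * 5
    left, right = rng.sample(bucket, 2)  # 1st draw (left hand), 2nd draw (right hand)
    return left, right

rng = random.Random(0)  # fixed seed so the run is reproducible
trials = [draw_two(rng) for _ in range(100_000)]

# Estimate P(1st = green) from the fraction of trials with a green first draw
p_first_green = sum(left == "green" for left, _ in trials) / len(trials)

# Estimate P(2nd = green | 1st = green) using only trials whose first draw was green
seconds = [right for left, right in trials if left == "green"]
p_green_given_green = seconds.count("green") / len(seconds)

print(round(p_first_green, 3), round(3 / 8, 3))
print(round(p_green_given_green, 3), round(2 / 7, 3))
```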

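Let’s list all of the things that can be calculated immediately from the question and from the dynamics of the scenario, where $P(A \mid B)$ denotes the probability of $A$ given that $B$ has already occurred:

$$
\begin{aligned}
P(\text{1st} = \text{green}) &= \tfrac{3}{8} \\
P(\text{1st} = \text{yellow}) &= \tfrac{5}{8} \\
P(\text{2nd} = \text{green} \mid \text{1st} = \text{green}) &= \tfrac{2}{7} \\
P(\text{2nd} = \text{yellow} \mid \text{1st} = \text{green}) &= \tfrac{5}{7} \\
P(\text{2nd} = \text{green} \mid \text{1st} = \text{yellow}) &= \tfrac{3}{7} \\
P(\text{2nd} = \text{yellow} \mid \text{1st} = \text{yellow}) &= \tfrac{4}{7}
\end{aligned}
$$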
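These six numbers come from counting balls, which a short Python sketch can make explicit. The dictionary `COUNTS` and both helper functions are illustrative names, not part of the original post:

```python
from fractions import Fraction

COUNTS = {"green": 3, "yellow": 5}  # balls in the bucket before any draw
TOTAL = sum(COUNTS.values())

def p_first(color):
    """P(1st = color): read straight off the bucket contents."""
    return Fraction(COUNTS[color], TOTAL)

def p_second_given_first(second, first):
    """P(2nd = second | 1st = first): one ball of the first color has left the bucket."""
    remaining = dict(COUNTS)
    remaining[first] -= 1
    return Fraction(remaining[second], TOTAL - 1)

for first in COUNTS:
    print(f"P(1st={first}) =", p_first(first))
    for second in COUNTS:
        print(f"  P(2nd={second} | 1st={first}) =", p_second_given_first(second, first))
```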

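But how do we get from these probabilities to the probabilities of all possible options? For this, we will use the product rule of probability (the rule from which Bayes’ Rule is derived):

$$P(A, B) = P(A) \cdot P(B \mid A)$$

Again, this equation deserves special emphasis because it’s important.

We can think of the product rule as a story that is unfolding in time.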

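The **probability of two things** happening is the **probability of the first** of them happening **multiplied** by the **probability of the second**, **provided** that **the first** event has **already occurred** and is known. More specifically, in our case:

$$
\begin{aligned}
P(\text{1st} = \text{green}, \text{2nd} = \text{green}) &= \tfrac{3}{8} \cdot \tfrac{2}{7} = \tfrac{6}{56} \\
P(\text{1st} = \text{green}, \text{2nd} = \text{yellow}) &= \tfrac{3}{8} \cdot \tfrac{5}{7} = \tfrac{15}{56} \\
P(\text{1st} = \text{yellow}, \text{2nd} = \text{green}) &= \tfrac{5}{8} \cdot \tfrac{3}{7} = \tfrac{15}{56} \\
P(\text{1st} = \text{yellow}, \text{2nd} = \text{yellow}) &= \tfrac{5}{8} \cdot \tfrac{4}{7} = \tfrac{20}{56}
\end{aligned}
$$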
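A minimal Python sketch of the product rule applied to all four options, again using illustrative helper names. Each joint probability is the first-draw probability times the conditional probability of the second draw:

```python
from fractions import Fraction

COUNTS = {"green": 3, "yellow": 5}
TOTAL = sum(COUNTS.values())

def p_first(color):
    return Fraction(COUNTS[color], TOTAL)

def p_second_given_first(second, first):
    remaining = dict(COUNTS)
    remaining[first] -= 1  # the first ball is now in the left hand
    return Fraction(remaining[second], TOTAL - 1)

# Product rule: P(1st = a, 2nd = b) = P(1st = a) * P(2nd = b | 1st = a)
joint = {
    (a, b): p_first(a) * p_second_given_first(b, a)
    for a in COUNTS
    for b in COUNTS
}

for (a, b), prob in joint.items():
    print(f"P(1st={a}, 2nd={b}) =", prob)
print("total:", sum(joint.values()))  # total: 1
```

`Fraction` reduces automatically, so 6/56 prints as 3/28 and 20/56 as 5/14; the values are the same as above.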

Now that we have the probability of every option, we can move on to answering the questions:

*“What are the chances of the ball in the left hand (1st draw) being yellow?”*

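What are the options that satisfy our question’s condition? (yellow, green) and (yellow, yellow), where each pair lists the 1st draw and then the 2nd draw.

The probability is just the sum of the probabilities of these two individual cases:

$$P(\text{1st} = \text{yellow}) = \tfrac{15}{56} + \tfrac{20}{56} = \tfrac{35}{56} = \tfrac{5}{8}$$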
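As a sketch, the joint table can be written down directly from the product rule and filtered by the first draw (the dictionary keys are an illustrative encoding):

```python
from fractions import Fraction

# Joint probabilities from the product rule, keyed by (1st draw, 2nd draw)
joint = {
    ("green", "green"): Fraction(3, 8) * Fraction(2, 7),
    ("green", "yellow"): Fraction(3, 8) * Fraction(5, 7),
    ("yellow", "green"): Fraction(5, 8) * Fraction(3, 7),
    ("yellow", "yellow"): Fraction(5, 8) * Fraction(4, 7),
}

# Keep the options whose 1st draw (left hand) is yellow and add them up
p_first_yellow = sum(prob for (first, _), prob in joint.items() if first == "yellow")
print(p_first_yellow)  # 5/8
```

The result matches the 5/8 we could read straight off the bucket, which is a useful sanity check on the joint table.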

Next question *“What are the chances of the ball in the left hand (1st draw) being yellow and the ball in the right hand (2nd draw) being green?”*

What are the options that satisfy our question’s condition? Only (yellow, green).

The probability is just the probability of this particular case:

$$P(\text{1st} = \text{yellow}, \text{2nd} = \text{green}) = \tfrac{15}{56}$$

Next question *“If the 1st ball was green, what are the chances of the 2nd ball being green as well?”*

We have already answered this question by “mentally simulating” the dynamics of the scenario: $P(\text{2nd} = \text{green} \mid \text{1st} = \text{green}) = \tfrac{2}{7}$.

Next question *“What are the chances of the 2nd ball being yellow?”*

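What are the options that satisfy our question’s condition? (green, yellow) and (yellow, yellow).

The probability is just the sum of the probabilities of these two individual cases:

$$P(\text{2nd} = \text{yellow}) = \tfrac{15}{56} + \tfrac{20}{56} = \tfrac{35}{56} = \tfrac{5}{8}$$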
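The same sketch, filtering on the second draw instead of the first (same illustrative encoding of the joint table):

```python
from fractions import Fraction

# Joint probabilities from the product rule, keyed by (1st draw, 2nd draw)
joint = {
    ("green", "green"): Fraction(3, 8) * Fraction(2, 7),
    ("green", "yellow"): Fraction(3, 8) * Fraction(5, 7),
    ("yellow", "green"): Fraction(5, 8) * Fraction(3, 7),
    ("yellow", "yellow"): Fraction(5, 8) * Fraction(4, 7),
}

# Keep the options whose 2nd draw (right hand) is yellow, whatever came first
p_second_yellow = sum(prob for (_, second), prob in joint.items() if second == "yellow")
print(p_second_yellow)  # 5/8
```

Even though the two draws are dependent, the 2nd ball on its own is yellow with the same probability, 5/8, as the 1st ball: summing over the unknown first draw (marginalization again) averages the dependency out.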

Next question *“What are the chances of having one green ball in any of the hands and one yellow ball in the other hand (i.e. regardless of the order we drew them)?”*

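What are the options that satisfy our question’s condition? (green, yellow) and (yellow, green).

The probability is just the sum of the probabilities of these two individual cases:

$$P(\text{one green and one yellow}) = \tfrac{15}{56} + \tfrac{15}{56} = \tfrac{30}{56} = \tfrac{15}{28}$$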
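In the sketch, “one of each color, in any order” is simply the condition that the two draws differ:

```python
from fractions import Fraction

# Joint probabilities from the product rule, keyed by (1st draw, 2nd draw)
joint = {
    ("green", "green"): Fraction(3, 8) * Fraction(2, 7),
    ("green", "yellow"): Fraction(3, 8) * Fraction(5, 7),
    ("yellow", "green"): Fraction(5, 8) * Fraction(3, 7),
    ("yellow", "yellow"): Fraction(5, 8) * Fraction(4, 7),
}

# One green and one yellow, regardless of order: the two colors must differ
p_mixed = sum(prob for (first, second), prob in joint.items() if first != second)
print(p_mixed)  # 15/28
```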

We can see that once we have answered **“What are the options?”** and we know the probability assigned to each of these options, it’s very easy to sum up probabilities and answer almost any question about the problem.
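One way to double-check the whole bucket table, sketched here as an illustration rather than as the method used above: give every ball its own index, so that each of the 8 × 7 = 56 ordered (left, right) draws is equally likely, and then group the draws by color. This turns the problem back into the equally-likely-options setting of the classroom example.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

# Label each ball individually so that every ordered draw is equally likely
bucket = ["green"] * 3 + ["yellow"] * 5
draws = list(permutations(range(len(bucket)), 2))  # (left, right) ball indices

# Group the 56 equally likely draws by their (1st color, 2nd color) option
counts = Counter((bucket[i], bucket[j]) for i, j in draws)
joint = {option: Fraction(n, len(draws)) for option, n in counts.items()}

print(len(draws))  # 56
for option in sorted(joint):
    print(option, joint[option])
```

The grouped counts (6, 15, 15 and 20 out of 56) reproduce the product-rule probabilities exactly.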