## § Elementary probability theory (TODO)

I've never learnt elementary probability theory "correctly". This is me attempting to fix it.

#### § Defn: Sample space

set of all possible outcomes / things that could happen.

#### § Defn: Outcome / Sample point / atomic event

An outcome consists of all the information about the experiment after it has been performed including the values of all random choices.

#### § NOTE: Keeping straight event v/s outcome

It's easy to get confused between 'event' and 'outcome' (linguistically). I personally remember that one of them is the element of the sample space and another the subsets, but I can't remember which is which. Here's how I recall which is which:
every experiment has an outcome . We write an outcome section when we write a lab manual/lab record for a given experiment.
Now, we when perform an expriment, or something random happens, sometimes, the result (ie, the outcome) can be eventful ; it's not linguistically right to say that some events can be outcomeful . So, an event is a predicate over the set of outcomes; event: outcome -> bool. This is the same as being a subset of outcomes (the event is identified with the set of outcomes it considers eventful), so we have event ~= 2^outcomes.

### § Example: Monty hall

An outcome of the monty hall game when the the contestant switches consists of:
• the box with the prize.
• the box chosen by the contestant.
• the box that was revealed.
Once we know the three things, we know everything that happened. For example, the sample point $(2, 1, 3)$:
• the prize is in box 2
• the player first picks box 1
• the assistant, Carol, reveals box 3.
• The contestant wins, because we're assuming the player switches. Hnce, they will switch from their initial choice of (1) to (2).
Note that not all 3-tuples correspond to sample points. For example,
• $(1, 2, 1)$ is not a sample point, because we can't reveal the box with the prize.
• $(2, 1, 1)$ is not a sample point, because we can't reveal the box the player chose.
• $(1, 1, 2), (1, 1, 3)$ is OK. The player chooses the correct box, carol reveals some box, and then the player switches.

### § Constructing the sample space: tree method

We build a decision tree.

#### § where is the prize?

(prize 1)
(prize 2)
(prize 3)


#### § player's choice

(prize 1
(choice 1)
(choice 2)
(choice 3))
(prize 2
(choice 1)
(choice 2)
(choice 3))
(prize 3
(choice 1)
(choice 2)
(choice 3))



#### § Which box is revealed

(prize 1
(choice 1
(reveal 2)
(reveal 3))
(choice 2
(reveal 3))
(choice 3)
(reveal 2))
(prize 2
(choice 1
(reveal 3)
(choice 2
(reveal 1)
(reveal 3))
(choice 3)
(reveal 1))
(prize 3
(choice 1
(reveal 2))
(choice 2
(reveal 1))
(choice 3)
(reveal 1)
(reveal 2))


#### § Win/Loss

(prize 1
(choice 1
loss  (reveal 2)
loss  (reveal 3))
(choice 2
win   (reveal 3))
(choice 3)
win   (reveal 2))
(prize 2
(choice 1
win  (reveal 3)
(choice 2
loss (reveal 1)
loss  (reveal 3))
(choice 3)
win  (reveal 1))
(prize 3
(choice 1
win  (reveal 2))
(choice 2
win  (reveal 1))
(choice 3)
loss (reveal 1)
loss (reveal 2))

This seems like it's 50/50! But what we're missing is the likelihood of an outcome.

#### § Defn: Probability space

A probability space consists of a sample space (space of al outcomes) and a probability function $P$ that maps the sample space to the real numbers, such that:
• For every outcome, the probability is between zero and one.
• The sum of all the probabilities is one.
Interpretation : For every outcome, the $P(outcome)$ is the probability of that outcome happening in an experiment.

#### § Assumptions for monty hall

• Carol put the prize uniformly randomly. Probability 1/3.
• No matter where the prize is, the player picks each box with probability 1/3.
• No matter where the prize is, the box that carol reveals will be picked uniformly randomly. Probability 1/2.

#### § Assigning probabilities to each edge

(prize 1 [1/3]
(choice 1 [1/3]
l  (reveal 2)   [1/2]
l  (reveal 3))  [1/2]
(choice 2 [1/3]
w   (reveal 3)) 
(choice 3) [1/3]
w   (reveal 2)) 
(prize 2 [1/3]
(choice 1
w  (reveal 3)
(choice 2
l (reveal 1)
l  (reveal 3))
(choice 3)
w  (reveal 1))
(prize 3  [1/3]
(choice 1
w  (reveal 2))
(choice 2
w  (reveal 1))
(choice 3)
l (reveal 1)
l (reveal 2))


#### § Assigning probabilities to each outcome

• Probability for a sample point is the product of probabilities leading to the outcome
(prize 1 [1/3]
(choice 1 [1/3]
l  (reveal 2)   [1/2]: 1/18
l  (reveal 3))  [1/2]: 1/18
(choice 2 [1/3]
w   (reveal 3)) : 1/9
(choice 3) [1/3]
w   (reveal 2)) : 1/9
...

So the probability of winning is going to be $6 \times 1/9 = \frac{2}{3}$.

#### § Defn: Event

An event is a subset of the sample space.
• For example, $E_l$ is the event that the person loses in Monty Hall.

#### § Probability of an event

The probability that an event $E$ occurs is the sum of the probabilities of the sample points of the event: $P(E) \equiv \sum_{e \in E} P(e)$.

I win $2/3$rds of the time when I switch . If I don't switch, I must have lost. So if I choose to stay, then I lose $2/3$rds of the time. We're using that
• $P(\texttt{win with switch}) = P(\texttt{lose with stick})$.

#### § Gambing game

• Dice $A$: $\{2, 6, 7\}$.
\ 2  /
\  /
6 \/ 7
||

it's the same on the reverse side. It's a fair dice. So the probability of getting $2$ is a third. Similarly for $6, 7$.
• Dice $B$: $\{1, 5, 9 \}$.
• Dice $C$: $\{3, 4, 8 \}$.
• We both dice. The higher dice wins. Loser pays the winner a dollar.

#### § Analysis: Dice A v/s Dice C

• Dice $A$ followed by dice $C$:
(2
(3
4
8))
(6
(3
4
8))
(7
(3
4
8))

• Assign winning
(2
(3    C
4    C
8))  C
(6
(3    A
4    A
8))  C
(7
(3    A
4    A
8))  C

Each of the outcomes has a probability $1/9$, so dice $C$ wins.

#### § Lecture 19: Conditional probability

P(A|B) where both A and B are events, read as probability of A given B.
$P(A|B) \equiv \frac{P(A \cap B)}{P(B)}$
We know $B$ happens so we normalize by $B$. We then intersect $A$ with $B$ because we want both $A$ and $B$ to have happened, so we consider all outcomes that both $A$ and $B$ consider eventful, and then reweigh the probability such that our definition of "all possible outcomes" is simply "outcomes in $B$".
• A quick calculation shows us that $P(B|B) = P(B \cap B)/Pr(B) =1$.

#### § Product Rule

$P(A \cap B) = P(B) P(A|B)$
follows from the definition by rearranging.

#### § General Product Rule

$P(A_1 \cap A_2 \dots A_n) = P(A_1) P(A_2 | A_1) P(A_3 | A_2 \cap A_1) P(A_4 | A_3 \cap A_2 \cap A_1) \dots P(A_n | A_1 \cap \dots \cap A_{n-1})$

#### § Example 1:

In a best two out of three series, the probability of winning the first game is $1/2$. The probability of winning a game immediately after a victory is $2/3$. Probability of winning after a loss is $1/3$. What is the probability of winning given that we win in the first game? Tree method:
(W1
(W2)
(L2
(W3
L3)))
(L1
(W2)
(L2
(W3
L3)))
(L1)

The product rule sneakily uses conditional probability! $P(W_1W_2) = P(W_1) P(W_2|W_1)$. Etc, solve the problem.

#### § Definition: Independence

events $A$, $B$ are independent if $P(A|B) = P(A)$ or $P(B) = 0$.

#### § Disjointness and independence

Disjoint events are never independent, because $P(A|B) = 0$ while $P(A)$ need not be zero.

#### § What do indepdent events look like?

We know that we need $P(A|B) = P(A)$. We know that $P(A|B)$ is how much of $A$ is within $B$. So we will have $P(A|B) = P(A)$ if the space that $A$ occupies in the sample space is the same proprtion of $A$ that occupies $B$. Euphimistically, $A/S = (A \cap B)/B$.

#### § Independence and intersection

If $A$ is independent of $B$ then $P(A\cap B) = P(A) P(B)$.
\begin{aligned} P(A) = P(A|B) \text{(given)} \\ P(A) = P(A \cap B) / P(B) \text{(defn of computing P(A|B))} \\ P(A) P(B) = P(A \cap B) \text{(rearrange)} \\ \end{aligned}

#### § Are these two independent?

• A = event coins match
• B = event that the first coin is heads.
Intuitively it seems that these should be dependent because knowing something about the first coin should tells us if the coins match. $P(A|B)$ is the probability that (second coin is heads) which is $1/2$. $P(A) = 1/2$. But our intuition tells us that these should be different!

#### § Be suspect! Try general coins

Let prob. of heads is $p$ and tails is $(1-p)$ for both coins. $P(A|B) = p$, while $P(A) = p^2 + (1-p)^2$.

#### § Mutual independence

Events $A_1, A_2, \dots A_n$ are mutually independent if any knowledge about any of the rest of the events tells us anything about the $i$th event.

#### § Random variables

A random variable $R$ is a function from the sample space $S$ to $\mathbb R$. We can create equivalence classes of the fibers of $R$. Each of this is an event, since it's a subset of the sample space. Thus, $P(R = x)$ = $P(R^{-1}(x)) = \sum_{w: R(w) = x} P(w)$

#### § Independence of random variables

$\forall x_1, x_2 \in \mathbb R, P(R_1 = x_1 | R_2 = x_2) = P(R_1 = x_1)$
Slogan: No value of $R_2$ can influence any value of $R_1$.

#### § Equivalent definition of independence:

$P(R_1 = x_1 \land R_2 = x_2) = P(R_1 = x_1) P(R_2 = x_2)$