§ Elementary probability theory (TODO)

I've never learnt elementary probability theory "correctly". This is me attempting to fix it.

§ Defn: Sample space

set of all possible outcomes / things that could happen.

§ Defn: Outcome / Sample point / atomic event

An outcome consists of all the information about the experiment after it has been performed including the values of all random choices.

§ NOTE: Keeping straight event v/s outcome

It's easy to get confused between 'event' and 'outcome' (linguistically). I personally remember that one of them is the element of the sample space and another the subsets, but I can't remember which is which. Here's how I recall which is which:
every experiment has an outcome . We write an outcome section when we write a lab manual/lab record for a given experiment.
Now, we when perform an expriment, or something random happens, sometimes, the result (ie, the outcome) can be eventful ; it's not linguistically right to say that some events can be outcomeful . So, an event is a predicate over the set of outcomes; event: outcome -> bool. This is the same as being a subset of outcomes (the event is identified with the set of outcomes it considers eventful), so we have event ~= 2^outcomes.

§ Example: Monty hall

An outcome of the monty hall game when the the contestant switches consists of:
  • the box with the prize.
  • the box chosen by the contestant.
  • the box that was revealed.
Once we know the three things, we know everything that happened. For example, the sample point (2,1,3)(2, 1, 3):
  • the prize is in box 2
  • the player first picks box 1
  • the assistant, Carol, reveals box 3.
  • The contestant wins, because we're assuming the player switches. Hnce, they will switch from their initial choice of (1) to (2).
Note that not all 3-tuples correspond to sample points. For example,
  • (1,2,1)(1, 2, 1) is not a sample point, because we can't reveal the box with the prize.
  • (2,1,1)(2, 1, 1) is not a sample point, because we can't reveal the box the player chose.
  • (1,1,2),(1,1,3)(1, 1, 2), (1, 1, 3) is OK. The player chooses the correct box, carol reveals some box, and then the player switches.

§ Constructing the sample space: tree method

We build a decision tree.

§ where is the prize?

(prize 1)
(prize 2)
(prize 3)

§ player's choice

(prize 1
   (choice 1)
   (choice 2)
   (choice 3))
(prize 2
   (choice 1)
   (choice 2)
   (choice 3))
(prize 3
   (choice 1)
   (choice 2)
   (choice 3))

§ Which box is revealed

(prize 1
   (choice 1
      (reveal 2)
      (reveal 3))
   (choice 2
      (reveal 3))
   (choice 3)
      (reveal 2))
(prize 2
   (choice 1
     (reveal 3)
   (choice 2
     (reveal 1)
      (reveal 3))
   (choice 3)
     (reveal 1))
(prize 3
   (choice 1
     (reveal 2))
   (choice 2
     (reveal 1))
   (choice 3)
     (reveal 1)
     (reveal 2))

§ Win/Loss

(prize 1
   (choice 1
loss  (reveal 2)
loss  (reveal 3))
   (choice 2
win   (reveal 3))
   (choice 3)
win   (reveal 2))
(prize 2
   (choice 1
win  (reveal 3)
   (choice 2
loss (reveal 1)
loss  (reveal 3))
   (choice 3)
win  (reveal 1))
(prize 3
   (choice 1
win  (reveal 2))
   (choice 2
win  (reveal 1))
   (choice 3)
loss (reveal 1)
loss (reveal 2))
This seems like it's 50/50! But what we're missing is the likelihood of an outcome.

§ Defn: Probability space

A probability space consists of a sample space (space of al outcomes) and a probability function PP that maps the sample space to the real numbers, such that:
  • For every outcome, the probability is between zero and one.
  • The sum of all the probabilities is one.
Interpretation : For every outcome, the P(outcome)P(outcome) is the probability of that outcome happening in an experiment.

§ Assumptions for monty hall

  • Carol put the prize uniformly randomly. Probability 1/3.
  • No matter where the prize is, the player picks each box with probability 1/3.
  • No matter where the prize is, the box that carol reveals will be picked uniformly randomly. Probability 1/2.

§ Assigning probabilities to each edge

(prize 1 [1/3]
   (choice 1 [1/3]
l  (reveal 2)   [1/2]
l  (reveal 3))  [1/2]
   (choice 2 [1/3]
w   (reveal 3)) [1]
   (choice 3) [1/3]
w   (reveal 2)) [1]
(prize 2 [1/3]
   (choice 1
w  (reveal 3)
   (choice 2
l (reveal 1)
l  (reveal 3))
   (choice 3)
w  (reveal 1))
(prize 3  [1/3]
   (choice 1
w  (reveal 2))
   (choice 2
w  (reveal 1))
   (choice 3)
l (reveal 1)
l (reveal 2))

§ Assigning probabilities to each outcome

  • Probability for a sample point is the product of probabilities leading to the outcome
(prize 1 [1/3]
   (choice 1 [1/3]
l  (reveal 2)   [1/2]: 1/18
l  (reveal 3))  [1/2]: 1/18
   (choice 2 [1/3]
w   (reveal 3)) [1]: 1/9
   (choice 3) [1/3]
w   (reveal 2)) [1]: 1/9
...
So the probability of winning is going to be 6×1/9=236 \times 1/9 = \frac{2}{3}.

§ Defn: Event

An event is a subset of the sample space.
  • For example, ElE_l is the event that the person loses in Monty Hall.

§ Probability of an event

The probability that an event EE occurs is the sum of the probabilities of the sample points of the event: P(E)eEP(e)P(E) \equiv \sum_{e \in E} P(e).

§ What about staying?

I win 2/32/3rds of the time when I switch . If I don't switch, I must have lost. So if I choose to stay, then I lose 2/32/3rds of the time. We're using that
  • P(win with switch)=P(lose with stick)P(\texttt{win with switch}) = P(\texttt{lose with stick}).

§ Gambing game

  • Dice AA: {2,6,7}\{2, 6, 7\}.
\ 2  /
 \  /
6 \/ 7
  ||
it's the same on the reverse side. It's a fair dice. So the probability of getting 22 is a third. Similarly for 6,76, 7.
  • Dice BB: {1,5,9}\{1, 5, 9 \}.
  • Dice CC: {3,4,8}\{3, 4, 8 \}.
  • We both dice. The higher dice wins. Loser pays the winner a dollar.

§ Analysis: Dice A v/s Dice C

  • Dice AA followed by dice CC:
(2
  (3
   4
   8))
(6
  (3
   4
   8))
(7
  (3
   4
   8))
  • Assign winning
(2
  (3    C
   4    C
   8))  C
(6
  (3    A
   4    A
   8))  C
(7
  (3    A
   4    A
   8))  C
Each of the outcomes has a probability 1/91/9, so dice CC wins.

§ Lecture 19: Conditional probability

P(A|B) where both A and B are events, read as probability of A given B.
P(AB)P(AB)P(B) P(A|B) \equiv \frac{P(A \cap B)}{P(B)}
We know BB happens so we normalize by BB. We then intersect AA with BB because we want both AA and BB to have happened, so we consider all outcomes that both AA and BB consider eventful, and then reweigh the probability such that our definition of "all possible outcomes" is simply "outcomes in BB".
  • A quick calculation shows us that P(BB)=P(BB)/Pr(B)=1P(B|B) = P(B \cap B)/Pr(B) =1.

§ Product Rule

P(AB)=P(B)P(AB) P(A \cap B) = P(B) P(A|B)
follows from the definition by rearranging.

§ General Product Rule

P(A1A2An)=P(A1)P(A2A1)P(A3A2A1)P(A4A3A2A1)P(AnA1An1) P(A_1 \cap A_2 \dots A_n) = P(A_1) P(A_2 | A_1) P(A_3 | A_2 \cap A_1) P(A_4 | A_3 \cap A_2 \cap A_1) \dots P(A_n | A_1 \cap \dots \cap A_{n-1})

§ Example 1:

In a best two out of three series, the probability of winning the first game is 1/21/2. The probability of winning a game immediately after a victory is 2/32/3. Probability of winning after a loss is 1/31/3. What is the probability of winning given that we win in the first game? Tree method:
(W1
  (W2)
   (L2
      (W3
      L3)))
(L1
  (W2)
   (L2
      (W3
      L3)))
(L1)
The product rule sneakily uses conditional probability! P(W1W2)=P(W1)P(W2W1)P(W_1W_2) = P(W_1) P(W_2|W_1). Etc, solve the problem.

§ Definition: Independence

events AA, BB are independent if P(AB)=P(A)P(A|B) = P(A) or P(B)=0P(B) = 0.

§ Disjointness and independence

Disjoint events are never independent, because P(AB)=0P(A|B) = 0 while P(A)P(A) need not be zero.

§ What do indepdent events look like?

We know that we need P(AB)=P(A)P(A|B) = P(A). We know that P(AB)P(A|B) is how much of AA is within BB. So we will have P(AB)=P(A)P(A|B) = P(A) if the space that AA occupies in the sample space is the same proprtion of AA that occupies BB. Euphimistically, A/S=(AB)/BA/S = (A \cap B)/B.

§ Independence and intersection

If AA is independent of BB then P(AB)=P(A)P(B)P(A\cap B) = P(A) P(B).
P(A)=P(AB)(given)P(A)=P(AB)/P(B)(defn of computing P(AB))P(A)P(B)=P(AB)(rearrange) \begin{aligned} P(A) = P(A|B) \text{(given)} \\ P(A) = P(A \cap B) / P(B) \text{(defn of computing $P(A|B)$)} \\ P(A) P(B) = P(A \cap B) \text{(rearrange)} \\ \end{aligned}

§ Are these two independent?

  • A = event coins match
  • B = event that the first coin is heads.
Intuitively it seems that these should be dependent because knowing something about the first coin should tells us if the coins match. P(AB)P(A|B) is the probability that (second coin is heads) which is 1/21/2. P(A)=1/2P(A) = 1/2. But our intuition tells us that these should be different!

§ Be suspect! Try general coins

Let prob. of heads is pp and tails is (1p)(1-p) for both coins. P(AB)=pP(A|B) = p, while P(A)=p2+(1p)2P(A) = p^2 + (1-p)^2.

§ Mutual independence

Events A1,A2,AnA_1, A_2, \dots A_n are mutually independent if any knowledge about any of the rest of the events tells us anything about the iith event.

§ Random variables

A random variable RR is a function from the sample space SS to R\mathbb R. We can create equivalence classes of the fibers of RR. Each of this is an event, since it's a subset of the sample space. Thus, P(R=x)P(R = x) = P(R1(x))=w:R(w)=xP(w)P(R^{-1}(x)) = \sum_{w: R(w) = x} P(w)

§ Independence of random variables

x1,x2R,P(R1=x1R2=x2)=P(R1=x1) \forall x_1, x_2 \in \mathbb R, P(R_1 = x_1 | R_2 = x_2) = P(R_1 = x_1)
Slogan: No value of R2R_2 can influence any value of R1R_1.

§ Equivalent definition of independence:

P(R1=x1R2=x2)=P(R1=x1)P(R2=x2) P(R_1 = x_1 \land R_2 = x_2) = P(R_1 = x_1) P(R_2 = x_2)

§ References