A Universe of Sorts

§ Elementary probability theory (TODO)

I've never learnt elementary probability theory "correctly". This is me attempting to fix it.

§ Defn: Sample space

The set of all possible outcomes, i.e. everything that could happen.

§ Defn: Outcome / Sample point / atomic event

An outcome consists of all the information about the experiment after it has been performed including the values of all random choices.

§ NOTE: Keeping straight event v/s outcome

It's easy to get confused between 'event' and 'outcome' (linguistically). I personally remember that one of them is the element of the sample space and another the subsets, but I can't remember which is which. Here's how I recall which is which:

Every experiment has an outcome. We write an outcome section when we write a lab manual/lab record for a given experiment.

Now, when we perform an experiment, or something random happens, sometimes the result (i.e., the outcome) can be eventful; it's not linguistically right to say that some events can be outcomeful.
So, an event is a predicate over the set of outcomes; event: outcome -> bool. This is the same as being a subset of outcomes (the event is identified with the set of outcomes it considers eventful), so we have event ~= 2^outcomes.
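
To make the predicate/subset identification concrete, here is a minimal sketch. The experiment (one roll of a six-sided die) and the names `is_even`, `even_event`, and `in_even_event` are my own assumptions for illustration.

```python
# Hypothetical toy experiment: one roll of a six-sided die.
outcomes = {1, 2, 3, 4, 5, 6}


# An event as a predicate: outcome -> bool.
def is_even(outcome: int) -> bool:
    return outcome % 2 == 0


# The same event as a subset: the outcomes the predicate considers eventful.
even_event = {o for o in outcomes if is_even(o)}


# Going back from the subset to a predicate.
def in_even_event(outcome: int) -> bool:
    return outcome in even_event
```

Both views agree on every outcome, which is all the identification claims.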

§ Example: Monty hall

An outcome of the Monty Hall game when the contestant switches consists of:

the box with the prize.
the box chosen by the contestant.
the box that was revealed.

Once we know the three things, we know everything that happened.
For example, the sample point $(2, 1, 3)$ means:

the prize is in box 2
the player first picks box 1
the assistant, Carol, reveals box 3.

The contestant wins, because we're assuming the player switches. Hence, they will switch from their initial choice (box 1) to box 2.

Note that not all 3-tuples correspond to sample points. For example,

$(1, 2, 1)$ is not a sample point, because we can't reveal the box with the prize.
$(2, 1, 1)$ is not a sample point, because we can't reveal the box the player chose.
$(1, 1, 2), (1, 1, 3)$ are OK. The player chooses the correct box, Carol reveals some other box, and then the player switches.
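
The two rules above (Carol never reveals the prize box, and never reveals the player's box) can be turned into a filter over all 3-tuples. This is a rough sketch; `is_sample_point` and `sample_space` are names I made up.

```python
from itertools import product

boxes = (1, 2, 3)


def is_sample_point(prize, choice, reveal):
    # Carol may reveal neither the prize box nor the player's box.
    return reveal != prize and reveal != choice


# Tuples are (prize, choice, reveal).
sample_space = [t for t in product(boxes, repeat=3) if is_sample_point(*t)]
```

Out of the 27 possible tuples, only 12 survive the filter.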

§ Constructing the sample space: tree method

We build a decision tree.

§ where is the prize?

(prize 1)
(prize 2)
(prize 3)

§ player's choice

(prize 1
   (choice 1)
   (choice 2)
   (choice 3))
(prize 2
   (choice 1)
   (choice 2)
   (choice 3))
(prize 3
   (choice 1)
   (choice 2)
   (choice 3))

§ Which box is revealed

(prize 1
   (choice 1
      (reveal 2)
      (reveal 3))
   (choice 2
      (reveal 3))
   (choice 3
      (reveal 2)))
(prize 2
   (choice 1
      (reveal 3))
   (choice 2
      (reveal 1)
      (reveal 3))
   (choice 3
      (reveal 1)))
(prize 3
   (choice 1
      (reveal 2))
   (choice 2
      (reveal 1))
   (choice 3
      (reveal 1)
      (reveal 2)))

§ Win/Loss

(prize 1
   (choice 1
loss  (reveal 2)
loss  (reveal 3))
   (choice 2
win   (reveal 3))
   (choice 3
win   (reveal 2)))
(prize 2
   (choice 1
win   (reveal 3))
   (choice 2
loss  (reveal 1)
loss  (reveal 3))
   (choice 3
win   (reveal 1)))
(prize 3
   (choice 1
win   (reveal 2))
   (choice 2
win   (reveal 1))
   (choice 3
loss  (reveal 1)
loss  (reveal 2)))

This seems like it's 50/50! But what we're missing is the likelihood of an outcome.
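
The naive count that produces the 50/50 impression can be reproduced by just counting leaves. A small sketch, assuming the player always switches; `switch_wins` is a hypothetical helper name.

```python
from itertools import product

boxes = (1, 2, 3)
# Leaves of the tree: (prize, choice, reveal) with a legal reveal.
outcomes = [(p, c, r) for p, c, r in product(boxes, repeat=3) if r != p and r != c]


def switch_wins(prize, choice, reveal):
    # Switching moves the player to the one box that is neither chosen nor revealed.
    final = next(b for b in boxes if b != choice and b != reveal)
    return final == prize


wins = sum(switch_wins(*o) for o in outcomes)
losses = len(outcomes) - wins
```

Counting leaves gives as many wins as losses, but that count treats every leaf as equally likely, which is exactly what has not been justified yet.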

§ Defn: Probability space

A probability space consists of a sample space (the space of all outcomes) and a probability function $P$ that maps the sample space to the real numbers, such that:

For every outcome, the probability is between zero and one.
The sum of all the probabilities is one.

Interpretation: for every outcome, $P(\text{outcome})$ is the probability of that outcome happening in an experiment.
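
The two conditions are easy to check mechanically for a finite sample space. A minimal sketch, assuming we represent $P$ as a dict from outcomes to exact fractions; `is_probability_function`, `fair_coin`, and `broken` are names I made up.

```python
from fractions import Fraction


def is_probability_function(P):
    # Condition 1: every outcome has a probability between zero and one.
    in_range = all(0 <= p <= 1 for p in P.values())
    # Condition 2: the probabilities of all outcomes sum to one.
    sums_to_one = sum(P.values()) == 1
    return in_range and sums_to_one


fair_coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
broken = {"H": Fraction(1, 2), "T": Fraction(1, 3)}  # sums to 5/6, not 1
```

Using `Fraction` instead of floats keeps the "sums to one" check exact.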

§ Assumptions for monty hall

Carol places the prize uniformly at random: each box has probability 1/3.
No matter where the prize is, the player picks each box with probability 1/3.
No matter where the prize is, Carol picks the box to reveal uniformly at random among the boxes she is allowed to open: probability 1/2 each when she has two options, and probability 1 when she has only one.

§ Assigning probabilities to each edge

(prize 1 [1/3]
   (choice 1 [1/3]
l  (reveal 2)   [1/2]
l  (reveal 3))  [1/2]
   (choice 2 [1/3]
w  (reveal 3))  [1]
   (choice 3 [1/3]
w  (reveal 2))) [1]
(prize 2 [1/3]
   (choice 1 [1/3]
w  (reveal 3))  [1]
   (choice 2 [1/3]
l  (reveal 1)   [1/2]
l  (reveal 3))  [1/2]
   (choice 3 [1/3]
w  (reveal 1))) [1]
(prize 3 [1/3]
   (choice 1 [1/3]
w  (reveal 2))  [1]
   (choice 2 [1/3]
w  (reveal 1))  [1]
   (choice 3 [1/3]
l  (reveal 1)   [1/2]
l  (reveal 2))) [1/2]

§ Assigning probabilities to each outcome

The probability of a sample point is the product of the probabilities on the edges leading to it:

(prize 1 [1/3]
   (choice 1 [1/3]
l  (reveal 2)   [1/2]: 1/18
l  (reveal 3))  [1/2]: 1/18
   (choice 2 [1/3]
w  (reveal 3))  [1]: 1/9
   (choice 3 [1/3]
w  (reveal 2))) [1]: 1/9
...

There are six winning outcomes, each with probability $1/9$. So the probability of winning is going to be

$$6 \times \frac{1}{9} = \frac{2}{3}$$
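
The whole tree computation can be sketched in a few lines, under the assumptions stated earlier (uniform prize, uniform choice, Carol uniform among her legal reveals). The names `P` and `p_win_switch` are mine.

```python
from fractions import Fraction
from itertools import product

boxes = (1, 2, 3)
P = {}
for prize, choice in product(boxes, repeat=2):
    # Boxes Carol is allowed to reveal on this branch.
    allowed = [b for b in boxes if b != prize and b != choice]
    for reveal in allowed:
        # Product of the edge probabilities along the path to this leaf.
        P[(prize, choice, reveal)] = (
            Fraction(1, 3) * Fraction(1, 3) * Fraction(1, len(allowed))
        )

# With switching, the player wins exactly when the first pick was wrong.
p_win_switch = sum(p for (prize, choice, _), p in P.items() if choice != prize)
```

The losing leaves each get $1/18$ and the winning leaves each get $1/9$, which is why counting leaves was misleading.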

§ Defn: Event

An event is a subset of the sample space.

For example, $E_l$ is the event that the person loses in Monty Hall.

§ Probability of an event

The probability that an event $E$ occurs is the sum of the probabilities of the sample points of the event:

$$P(E) \equiv \sum_{e \in E} P(e)$$
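
As a sanity check, here is a sketch that evaluates this sum for the losing event $E_l$ in Monty Hall, assuming the player switches and using the outcome probabilities from the tree ($1/18$ when the first pick is the prize box, $1/9$ otherwise). `prob` and `E_l` are names I chose.

```python
from fractions import Fraction

boxes = (1, 2, 3)
# Outcome probabilities for (prize, choice, reveal), as read off the tree.
P = {
    (p, c, r): Fraction(1, 18) if p == c else Fraction(1, 9)
    for p in boxes for c in boxes for r in boxes
    if r != p and r != c
}


def prob(event):
    # P(E) is the sum of P(e) over the sample points e in E.
    return sum(P[e] for e in event)


# A switching player loses exactly when the first pick was the prize box.
E_l = {o for o in P if o[0] == o[1]}
```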

§ What about staying?

I win $2/3$ of the time when I switch. Every outcome where switching wins is an outcome where staying loses. So if I choose to stay, then I lose $2/3$ of the time, and win only $1/3$ of the time. We're using that $P(\texttt{win with switch}) = P(\texttt{lose with stick})$.

§ Gambling game

Dice $A$ : $\{2, 6, 7\}$ .

\ 2  /
 \  /
6 \/ 7
  ||

It's the same on the reverse side. It's a fair dice, so the probability of getting $2$ is a third. Similarly for $6$ and $7$.

Dice $B$ : $\{1, 5, 9 \}$ .
Dice $C$ : $\{3, 4, 8 \}$ .

We each roll a dice. The higher roll wins. The loser pays the winner a dollar.

§ Analysis: Dice A v/s Dice C

Dice $A$ followed by dice $C$ :

(2
  (3
   4
   8))
(6
  (3
   4
   8))
(7
  (3
   4
   8))

Assign winning

(2
  (3    C
   4    C
   8))  C
(6
  (3    A
   4    A
   8))  C
(7
  (3    A
   4    A
   8))  C

Each of the outcomes has probability $1/9$. Dice $C$ wins in $5$ of the $9$ outcomes, so dice $C$ beats dice $A$ with probability $5/9$.
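
The same tree can be enumerated directly. A small sketch, assuming each face of a dice comes up with probability $1/3$ and the two rolls don't affect each other; the variable names are mine.

```python
from fractions import Fraction
from itertools import product

die_a = (2, 6, 7)
die_c = (3, 4, 8)

# Leaves of the tree: (roll of A, roll of C), each with probability 1/3 * 1/3.
outcomes = list(product(die_a, die_c))
p_each = Fraction(1, len(outcomes))

p_a_wins = sum(p_each for a, c in outcomes if a > c)
p_c_wins = sum(p_each for a, c in outcomes if c > a)
```

There are no ties, since the two dice share no faces, so the two probabilities add up to one.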

§ Lecture 19: Conditional probability

$P(A|B)$, where both $A$ and $B$ are events, is read as the probability of $A$ given $B$. It is defined when $P(B) > 0$:

$$P(A|B) \equiv \frac{P(A \cap B)}{P(B)}$$

We know $B$ happens, so we normalize by $P(B)$. We then intersect $A$ with $B$ because we want both $A$ and $B$ to have happened, so we consider all outcomes that both $A$ and $B$ consider eventful, and then reweigh the probability such that our definition of "all possible outcomes" is simply "outcomes in $B$".

A quick calculation shows us that $P(B|B) = P(B \cap B)/P(B) = 1$.
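
Here is the definition as a tiny sketch on a finite space. The fair six-sided die and the events `A` (even roll) and `B` (roll of at least four) are my own illustrative assumptions, as are the helper names `prob` and `cond_prob`.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair six-sided die.
P = {o: Fraction(1, 6) for o in range(1, 7)}


def prob(event):
    return sum(P[o] for o in event)


def cond_prob(a, b):
    # P(A|B) = P(A ∩ B) / P(B); only defined when P(B) > 0.
    return prob(a & b) / prob(b)


A = {2, 4, 6}  # roll is even
B = {4, 5, 6}  # roll is at least four
```

Within $B$, two of the three outcomes are even, so `cond_prob(A, B)` is $2/3$ even though $P(A) = 1/2$.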

§ Product Rule

$$P(A \cap B) = P(B) P(A|B)$$

This follows from the definition by rearranging.

§ General Product Rule

$$P(A_1 \cap A_2 \cap \dots \cap A_n) = P(A_1) P(A_2 | A_1) P(A_3 | A_2 \cap A_1) P(A_4 | A_3 \cap A_2 \cap A_1) \dots P(A_n | A_1 \cap \dots \cap A_{n-1})$$
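
To convince myself of the general product rule, here is a sketch that checks it by brute force for $n = 3$. The experiment is my own assumption: draw three balls without replacement from an urn with two red and three blue balls, with $A_i$ the event that the $i$th draw is blue.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical urn: two red and three blue balls, three draws without replacement.
balls = ["r1", "r2", "b1", "b2", "b3"]
outcomes = list(permutations(balls, 3))  # all ordered draws are equally likely


def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def cond(a, b):
    # P(A|B) = P(A ∩ B) / P(B)
    return prob(lambda o: a(o) and b(o)) / prob(b)


def A1(o): return o[0].startswith("b")
def A2(o): return o[1].startswith("b")
def A3(o): return o[2].startswith("b")
def A1_and_A2(o): return A1(o) and A2(o)


lhs = prob(lambda o: A1(o) and A2(o) and A3(o))
rhs = prob(A1) * cond(A2, A1) * cond(A3, A1_and_A2)
```

The right-hand side is the familiar $3/5 \cdot 2/4 \cdot 1/3$, which matches the direct count on the left.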

§ Example 1:

In a best two out of three series, the probability of winning the first game is $1/2$. The probability of winning a game immediately after a victory is $2/3$. The probability of winning a game immediately after a loss is $1/3$. What is the probability of winning the series given that we win the first game?
Tree method:

(W1
  (W2)
  (L2
    (W3)
    (L3)))
(L1
  (W2
    (W3)
    (L3))
  (L2))

Multiplying probabilities along a path of the tree sneakily uses conditional probability, via the product rule:

$$P(W_1 W_2) = P(W_1) P(W_2|W_1)$$

And so on for the other paths, which solves the problem.
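
Here is one way to finish the calculation, as a sketch. I'm assuming the series stops as soon as one side has two wins, so the possible paths are `WW`, `WLW`, `WLL`, `LWW`, `LWL`, `LL`; the helper names `p_next` and `p_path` are made up.

```python
from fractions import Fraction

half, third = Fraction(1, 2), Fraction(1, 3)


def p_next(result, previous):
    # Win with probability 2/3 right after a win, 1/3 right after a loss.
    p_win = 2 * third if previous == "W" else third
    return p_win if result == "W" else 1 - p_win


def p_path(path):
    p = half  # the first game is won or lost with probability 1/2
    for prev, cur in zip(path, path[1:]):
        p *= p_next(cur, prev)  # product rule along the path
    return p


paths = ["WW", "WLW", "WLL", "LWW", "LWL", "LL"]
series_won = {"WW", "WLW", "LWW"}

p_w1 = sum(p_path(p) for p in paths if p[0] == "W")
p_win_and_w1 = sum(p_path(p) for p in series_won if p[0] == "W")
p_win_given_w1 = p_win_and_w1 / p_w1
```

Under these assumptions the answer comes out to $(1/3 + 1/18) / (1/2) = 7/9$.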

§ Definition: Independence

Events $A$ and $B$ are independent if $P(A|B) = P(A)$ or $P(B) = 0$.

§ Disjointness and independence

Disjoint events with nonzero probabilities are never independent, because $P(A|B) = 0$ while $P(A)$ is not zero.

§ What do independent events look like?

We know that we need $P(A|B) = P(A)$. We know that $P(A|B)$ is how much of $B$ is taken up by $A$. So we will have $P(A|B) = P(A)$ if the proportion of the sample space $S$ that $A$ occupies is the same as the proportion of $B$ that $A$ occupies. Informally, writing each set for its probability mass,

$$A/S = (A \cap B)/B$$

§ Independence and intersection

If $A$ is independent of $B$, then $P(A \cap B) = P(A) P(B)$:

$$
\begin{aligned}
P(A) &= P(A|B) && \text{(given)} \\
P(A) &= P(A \cap B) / P(B) && \text{(definition of $P(A|B)$)} \\
P(A) P(B) &= P(A \cap B) && \text{(rearrange)}
\end{aligned}
$$
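
The product form gives a convenient test for independence that never divides by $P(B)$. A sketch on a fair six-sided die; the die and the events `even`, `at_most_four`, `one`, and `two` are my own examples.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair six-sided die.
P = {o: Fraction(1, 6) for o in range(1, 7)}


def prob(event):
    return sum(P[o] for o in event)


def independent(a, b):
    # Product form: P(A ∩ B) = P(A) P(B).
    return prob(a & b) == prob(a) * prob(b)


even = {2, 4, 6}
at_most_four = {1, 2, 3, 4}
one, two = {1}, {2}  # disjoint events, each with nonzero probability
```

Here `even` occupies half of the whole space and also half of `at_most_four`, so the pair is independent, while the disjoint pair `one`, `two` is not.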

§ Are these two independent?

A = event that the coins match
B = event that the first coin is heads.

Intuitively it seems that these should be dependent, because knowing something about the first coin should tell us something about whether the coins match.

$P(A|B)$ is the probability that the second coin is heads, which is $1/2$, and $P(A) = 1/2$ as well. So the events are independent.
But our intuition tells us that these should be different!

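§ Be suspicious! Try general coins

Let the probability of heads be $p$ and of tails be $(1-p)$ for both coins. Then $P(A|B) = p$, while $P(A) = p^2 + (1-p)^2$. These agree only when $p = 1/2$ or $p = 1$, so for a biased coin with $0 < p < 1$ and $p \neq 1/2$ the events are dependent.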
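
A quick sketch to compare the two quantities for a given bias, assuming the two flips don't affect each other and $p > 0$ (so that $P(B) > 0$). The function name `match_probabilities` is made up.

```python
from fractions import Fraction


def match_probabilities(p):
    """Return (P(A), P(A|B)) for two flips of a coin with P(heads) = p > 0."""
    q = 1 - p
    P = {("H", "H"): p * p, ("H", "T"): p * q,
         ("T", "H"): q * p, ("T", "T"): q * q}
    A = {o for o in P if o[0] == o[1]}   # the coins match
    B = {o for o in P if o[0] == "H"}    # the first coin is heads
    p_a = sum(P[o] for o in A)
    p_a_given_b = sum(P[o] for o in A & B) / sum(P[o] for o in B)
    return p_a, p_a_given_b


fair = match_probabilities(Fraction(1, 2))
biased = match_probabilities(Fraction(1, 3))
```

For the fair coin both numbers are $1/2$; for $p = 1/3$ they are $5/9$ and $1/3$, so the independence really was an accident of fairness.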

§ Mutual independence

Events $A_1, A_2, \dots, A_n$ are mutually independent if, for every $i$, no knowledge about any of the rest of the events tells us anything about the $i$th event.
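
Mutual independence is stronger than pairwise independence. The sketch below uses the standard product formulation as its working definition (every sub-collection of two or more events must satisfy the product rule), which is my assumption about how to formalize the statement above. The example with two fair coins and the name `mutually_independent` are also mine.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

# Two fair coin flips; each of the four outcomes has probability 1/4.
P = {(a, b): Fraction(1, 4) for a in "HT" for b in "HT"}


def prob(event):
    return sum(P[o] for o in event)


def mutually_independent(events):
    # Every sub-collection of size >= 2 must satisfy the product rule.
    for k in range(2, len(events) + 1):
        for group in combinations(events, k):
            if prob(set.intersection(*group)) != prod(prob(e) for e in group):
                return False
    return True


A = {o for o in P if o[0] == "H"}   # first coin is heads
B = {o for o in P if o[1] == "H"}   # second coin is heads
C = {o for o in P if o[0] == o[1]}  # the coins match
```

Each pair among `A`, `B`, `C` is independent, but knowing both `A` and `B` determines `C`, so the three together are not mutually independent.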

§ Random variables

A random variable $R$ is a function from the sample space $S$ to $\mathbb R$. The fibers of $R$ (the sets $R^{-1}(x)$) partition the sample space into equivalence classes. Each of these is an event, since it's a subset of the sample space. Thus,

$$P(R = x) \equiv P(R^{-1}(x)) = \sum_{w: R(w) = x} P(w)$$
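
Summing over fibers is just a group-by. A minimal sketch, assuming two fair coin flips as the sample space and the number of heads as the random variable; `num_heads` and `distribution` are hypothetical names.

```python
from collections import defaultdict
from fractions import Fraction

# Two fair coin flips; each of the four outcomes has probability 1/4.
P = {(a, b): Fraction(1, 4) for a in "HT" for b in "HT"}


def num_heads(w):
    # A random variable: a function from outcomes to real numbers.
    return sum(1 for c in w if c == "H")


def distribution(R):
    # P(R = x) is the total probability of the fiber {w : R(w) = x}.
    dist = defaultdict(Fraction)
    for w, p in P.items():
        dist[R(w)] += p
    return dict(dist)


heads_dist = distribution(num_heads)
```

The fiber over $1$ contains both `HT` and `TH`, which is why $P(R = 1)$ is $1/2$.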

§ Independence of random variables

Random variables $R_1$ and $R_2$ are independent if

$$\forall x_1, x_2 \in \mathbb R, P(R_1 = x_1 | R_2 = x_2) = P(R_1 = x_1)$$

Slogan: No value of $R_2$ can influence any value of $R_1$.

§ Equivalent definition of independence:

$$P(R_1 = x_1 \land R_2 = x_2) = P(R_1 = x_1) P(R_2 = x_2)$$
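
The product form can be checked by brute force over the values each random variable actually takes (any other value has probability zero on both sides). A sketch on two fair coin flips; the random variables `first_heads`, `second_heads`, and `num_heads` are my own examples.

```python
from fractions import Fraction

# Two fair coin flips; each of the four outcomes has probability 1/4.
P = {(a, b): Fraction(1, 4) for a in "HT" for b in "HT"}


def prob(pred):
    return sum(p for w, p in P.items() if pred(w))


def independent(R1, R2):
    values1 = {R1(w) for w in P}
    values2 = {R2(w) for w in P}
    # Check P(R1 = x1 and R2 = x2) = P(R1 = x1) P(R2 = x2) for all value pairs.
    return all(
        prob(lambda w: R1(w) == x1 and R2(w) == x2)
        == prob(lambda w: R1(w) == x1) * prob(lambda w: R2(w) == x2)
        for x1 in values1
        for x2 in values2
    )


def first_heads(w): return int(w[0] == "H")
def second_heads(w): return int(w[1] == "H")
def num_heads(w): return first_heads(w) + second_heads(w)
```

The two indicator variables are independent, while `first_heads` and `num_heads` are not: if there are zero heads in total, the first coin cannot be heads.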