§ Elementary probability theory (TODO)
I've never learnt elementary probability theory "correctly". This is me
attempting to fix it.
§ Defn: Sample space
set of all possible outcomes / things that could happen.
§ Defn: Outcome / Sample point / atomic event
An outcome consists of all the information about the experiment
after it has been performed including the values of all random choices.
§ NOTE: Keeping straight event v/s outcome
It's easy to get confused between 'event' and 'outcome' (linguistically). I
personally remember that one of them is the element of the sample space and
another the subsets, but I can't remember which is which. Here's how I
recall which is which:
every experiment has an outcome . We write an outcome section when we
write a lab manual/lab record for a given experiment.
Now, we when perform an expriment, or something random happens, sometimes,
the result (ie, the outcome) can be eventful ; it's not linguistically
right to say that some events can be outcomeful .
So, an event is a predicate over the set of outcomes; event: outcome -> bool
.
This is the same as being a subset of outcomes (the event is identified
with the set of outcomes it considers eventful), so we have event ~= 2^outcomes
.
§ Example: Monty hall
An outcome of the monty hall game when the the contestant switches consists of:
- the box with the prize.
- the box chosen by the contestant.
- the box that was revealed.
Once we know the three things, we know everything that happened.
For example, the sample point (2,1,3):
- the prize is in box 2
- the player first picks box 1
- the assistant, Carol, reveals box 3.
- The contestant wins, because we're assuming the player switches. Hnce, they will switch from their initial choice of (1) to (2).
Note that not all 3-tuples correspond to sample points. For example,
- (1,2,1) is not a sample point, because we can't reveal the box with the prize.
- (2,1,1) is not a sample point, because we can't reveal the box the player chose.
- (1,1,2),(1,1,3) is OK. The player chooses the correct box, carol reveals some box, and then the player switches.
§ Constructing the sample space: tree method
We build a decision tree.
§ where is the prize?
(prize 1)
(prize 2)
(prize 3)
§ player's choice
(prize 1
(choice 1)
(choice 2)
(choice 3))
(prize 2
(choice 1)
(choice 2)
(choice 3))
(prize 3
(choice 1)
(choice 2)
(choice 3))
§ Which box is revealed
(prize 1
(choice 1
(reveal 2)
(reveal 3))
(choice 2
(reveal 3))
(choice 3)
(reveal 2))
(prize 2
(choice 1
(reveal 3)
(choice 2
(reveal 1)
(reveal 3))
(choice 3)
(reveal 1))
(prize 3
(choice 1
(reveal 2))
(choice 2
(reveal 1))
(choice 3)
(reveal 1)
(reveal 2))
§ Win/Loss
(prize 1
(choice 1
loss (reveal 2)
loss (reveal 3))
(choice 2
win (reveal 3))
(choice 3)
win (reveal 2))
(prize 2
(choice 1
win (reveal 3)
(choice 2
loss (reveal 1)
loss (reveal 3))
(choice 3)
win (reveal 1))
(prize 3
(choice 1
win (reveal 2))
(choice 2
win (reveal 1))
(choice 3)
loss (reveal 1)
loss (reveal 2))
This seems like it's 50/50! But what we're missing is the likelihood of an
outcome.
§ Defn: Probability space
A probability space consists of a sample space (space of al outcomes) and a
probability function P that maps the sample space to the real numbers, such that:
- For every outcome, the probability is between zero and one.
- The sum of all the probabilities is one.
Interpretation : For every outcome, the P(outcome) is the probability of
that outcome happening in an experiment.
§ Assumptions for monty hall
- Carol put the prize uniformly randomly. Probability 1/3.
- No matter where the prize is, the player picks each box with probability 1/3.
- No matter where the prize is, the box that carol reveals will be picked uniformly randomly. Probability 1/2.
§ Assigning probabilities to each edge
(prize 1 [1/3]
(choice 1 [1/3]
l (reveal 2) [1/2]
l (reveal 3)) [1/2]
(choice 2 [1/3]
w (reveal 3)) [1]
(choice 3) [1/3]
w (reveal 2)) [1]
(prize 2 [1/3]
(choice 1
w (reveal 3)
(choice 2
l (reveal 1)
l (reveal 3))
(choice 3)
w (reveal 1))
(prize 3 [1/3]
(choice 1
w (reveal 2))
(choice 2
w (reveal 1))
(choice 3)
l (reveal 1)
l (reveal 2))
§ Assigning probabilities to each outcome
- Probability for a sample point is the product of probabilities leading to the outcome
(prize 1 [1/3]
(choice 1 [1/3]
l (reveal 2) [1/2]: 1/18
l (reveal 3)) [1/2]: 1/18
(choice 2 [1/3]
w (reveal 3)) [1]: 1/9
(choice 3) [1/3]
w (reveal 2)) [1]: 1/9
...
So the probability of winning is going to be 6×1/9=32.
§ Defn: Event
An event is a subset of the sample space.
- For example, El is the event that the person loses in Monty Hall.
§ Probability of an event
The probability that an event E occurs is the sum of the probabilities of the
sample points of the event: P(E)≡∑e∈EP(e).
§ What about staying?
I win 2/3rds of the time when I switch . If I don't switch, I must have lost.
So if I choose to stay, then I lose 2/3rds of the time. We're using that
- P(win with switch)=P(lose with stick).
§ Gambing game
- Dice A: {2,6,7}.
\ 2 /
\ /
6 \/ 7
||
it's the same on the reverse side. It's a fair dice. So the probability of
getting 2 is a third. Similarly for 6,7.
- Dice B: {1,5,9}.
- Dice C: {3,4,8}.
- We both dice. The higher dice wins. Loser pays the winner a dollar.
§ Analysis: Dice A v/s Dice C
- Dice A followed by dice C:
(2
(3
4
8))
(6
(3
4
8))
(7
(3
4
8))
(2
(3 C
4 C
8)) C
(6
(3 A
4 A
8)) C
(7
(3 A
4 A
8)) C
Each of the outcomes has a probability 1/9, so dice C wins.
§ Lecture 19: Conditional probability
P(A|B)
where both A
and B
are events, read as probability of A
given B
.
P(A∣B)≡P(B)P(A∩B)
We know B happens so we normalize by B. We then intersect A with B
because we want both A and B to have happened, so we consider all outcomes
that both A and B consider eventful, and then reweigh the probability such
that our definition of "all possible outcomes" is simply "outcomes in B".
- A quick calculation shows us that P(B∣B)=P(B∩B)/Pr(B)=1.
§ Product Rule
P(A∩B)=P(B)P(A∣B)
follows from the definition by rearranging.
§ General Product Rule
P(A1∩A2…An)=P(A1)P(A2∣A1)P(A3∣A2∩A1)P(A4∣A3∩A2∩A1)…P(An∣A1∩⋯∩An−1)
§ Example 1:
In a best two out of three series, the probability of winning the first
game is 1/2. The probability of winning a game immediately after a victory is 2/3.
Probability of winning after a loss is 1/3. What is the probability of winning
given that we win in the first game?
Tree method:
(W1
(W2)
(L2
(W3
L3)))
(L1
(W2)
(L2
(W3
L3)))
(L1)
The product rule sneakily uses conditional probability! P(W1W2)=P(W1)P(W2∣W1).
Etc, solve the problem.
§ Definition: Independence
events A, B are independent if P(A∣B)=P(A) or P(B)=0.
§ Disjointness and independence
Disjoint events are never independent, because P(A∣B)=0 while P(A) need not
be zero.
§ What do indepdent events look like?
We know that we need P(A∣B)=P(A). We know that P(A∣B) is how much of A
is within B. So we will have P(A∣B)=P(A) if the space that A occupies
in the sample space is the same proprtion of A that occupies B. Euphimistically,
A/S=(A∩B)/B.
§ Independence and intersection
If A is independent of B then P(A∩B)=P(A)P(B).
P(A)=P(A∣B)(given)P(A)=P(A∩B)/P(B)(defn of computing P(A∣B))P(A)P(B)=P(A∩B)(rearrange)
§ Are these two independent?
- A = event coins match
- B = event that the first coin is heads.
Intuitively it seems that these should be dependent because knowing something
about the first coin should tells us if the coins match.
P(A∣B) is the probability that (second coin is heads) which is 1/2.
P(A)=1/2.
But our intuition tells us that these should be different!
§ Be suspect! Try general coins
Let prob. of heads is p and tails is (1−p) for both coins.
P(A∣B)=p, while P(A)=p2+(1−p)2.
§ Mutual independence
Events A1,A2,…An are mutually independent if any knowledge
about any of the rest of the events tells us anything about the ith event.
§ Random variables
A random variable R is a function from the sample space S to R. We can
create equivalence classes of the fibers of R. Each of this is an event,
since it's a subset of the sample space. Thus, P(R=x) = P(R−1(x))=∑w:R(w)=xP(w)
§ Independence of random variables
∀x1,x2∈R,P(R1=x1∣R2=x2)=P(R1=x1)
Slogan: No value of R2 can influence any value of R1.
§ Equivalent definition of independence:
P(R1=x1∧R2=x2)=P(R1=x1)P(R2=x2)
§ References