§ Hypothesis Testing
§ Mnemonic for type I versus type II errors
- Once something becomes "truth", challenging the status quo and making it "false" is very hard. (see: disinformation).
- Thus, science must have a high barrier for accepting a hypothesis as true.
- That is, we must have a high barrier against incorrectly rejecting the null (that nothing happened).
- This error is called a type I error, and its probability is denoted by α (the more important error).
- The other type of error (type II), where something is true but we conclude it is false, is less important. Some grad student can run the experiment again with a better experimental design and prove it's true later if need be.
- Our goal is to protect science from entrenching/enshrining "wrong" facts as true. Thus, we control type I errors.
- Our goal is to "reject" current theories (the null) and create "new theories" (the alternative). Thus, in statistics, we set up our tests with the goal of enabling us to "reject the null".
§ Mnemonic for remembering the procedure
- H0 is the null hypothesis (null for zero). They are presumed innocent until proven guilty.
- If H0 is judged guilty, we reject them (from society) and send them to the gulag.
- If H0 is judged not guilty, we retain them (in society).
- We are the prosecution, who are trying to reject H0 (from society) to send them to the gulag.
- The scientific/statistical process is the judiciary, which is attempting to keep the structure of "innocent until proven guilty" for H0.
- We run experiments, and we find out how surprising our evidence would be if H0 were innocent.
- We control an error rate α, which is the probability that we screw up the fundamental truth of the court: we must not send an innocent man to the gulag. Thus, α is the probability that we reject H0 (to the gulag) when it is in fact innocent (i.e., true).
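To make α concrete, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration: H0 is "the coin is fair", one experiment is 100 flips, and the court's rule is "reject H0 if the number of heads is at least 11 away from 50". Since H0 is innocent in every simulated trial, the fraction of trials that end in the gulag estimates α for that rule.

```python
import random

def type_one_error_rate(n_flips=100, cutoff=11, n_trials=10_000, seed=0):
    """Fraction of trials in which an innocent H0 (a fair coin) gets rejected."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(n_trials):
        # H0 is true in every trial: each flip is heads with probability 0.5.
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        # Hypothetical verdict rule: reject H0 if heads is far from n_flips / 2.
        if abs(heads - n_flips / 2) >= cutoff:
            rejections += 1
    return rejections / n_trials

print(type_one_error_rate())  # an estimate of alpha for this rule
```

Raising `cutoff` makes the court harder to convince, which lowers α.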
§ P value, Neyman interpretation
- Now, suppose we wish to send H0 to the gulag, because we're Soviet Union like that. What is the smallest type I error rate α (the risk of condemning an innocent H0 to a life in the gulag) that we would have to tolerate in order to do so on this evidence? That's the p value. We compute it from our experiment, of course.
- Remember, we can never speak of the "probability of H0 being true/false", because H0 either is true or is false [frequentist]. There is no probability.
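One way to read the Neyman view is as a search over courts of different strictness: among all the tests that would reject H0 on our data, which one has the smallest α? The sketch below assumes a made-up setup (H0 is "the coin is fair", 100 flips, and a family of rules "reject if heads is at least c away from 50"); the function names are hypothetical.

```python
from math import comb

def alpha_for_cutoff(c, n=100):
    """Type I error rate of the rule 'reject H0 if |heads - n/2| >= c' for a fair coin."""
    return sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= c) / 2 ** n

def smallest_alpha_that_rejects(observed_heads, n=100):
    """p value in the Neyman sense: the smallest alpha whose test rejects on our data."""
    rejecting = [
        alpha_for_cutoff(c, n)
        for c in range(n // 2 + 1)
        if abs(observed_heads - n / 2) >= c  # this cutoff would reject on our data
    ]
    return min(rejecting)

print(smallest_alpha_that_rejects(62))
```

The strictest cutoff that still rejects is the one sitting exactly at the observed data, so this gives the same number as the tail probability of results at least as extreme as the observed one.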
§ P value, Fisher interpretation
- The critical region of the test corresponds to those values of the test statistic that would lead us to reject the null hypothesis (and send it to the gulag).
- Thus, the critical region is also sometimes called the "rejection region", since we reject H0 from society if the test statistic lies in this region.
- The rejection region usually corresponds to the tails of the sampling distribution.
- The reason is that a good critical region almost always corresponds to those values of the test statistic that are least likely to be observed if the null hypothesis is true. For a good test, these are the "tails" of the sampling distribution, far from its central tendency.
- In this situation, we define the p value to be the probability that, if H0 is true, we would have observed a test statistic at least as extreme as the one we did get.
P(new test stat >= cur test stat | H0 is true)
- ??? I don't get it.
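A small worked case may help here. The sketch below assumes a made-up experiment: H0 is "the coin is fair", we flip 10 times, the test statistic is the number of heads, and "extreme" means "more heads". The p value is then the chance, computed as if H0 were true, that a fresh run of the experiment gives at least as many heads as we saw.

```python
from math import comb

def p_value_upper_tail(observed_heads, n_flips, p_null=0.5):
    """P(new heads count >= observed_heads), computed assuming H0 (heads prob = p_null)."""
    return sum(
        comb(n_flips, k) * p_null ** k * (1 - p_null) ** (n_flips - k)
        for k in range(observed_heads, n_flips + 1)
    )

# 8 heads in 10 flips: outcomes 8, 9, 10 have 45 + 10 + 1 = 56 of the 1024 sequences.
print(p_value_upper_tail(8, 10))  # 56/1024 = 0.0546875
```

Nothing in this calculation asks whether H0 is actually true; H0 is only used to say how the coin would behave if it were.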
§ P value, completely wrong edition
- "Probability that the null hypothesis is true" --- WRONG
- compare to "probability, computed assuming the null hypothesis is true, that we would wrongly reject it on evidence this extreme" -- CORRECT. The probability is about US being wrong, and has NOTHING to do with the probability that the null hypothesis itself is true or false.
§ Power of the test
- The value β is the probability of a type II error: H0 was guilty (i.e., false), but we chose to retain them in society instead.
- The less we do this (i.e., the larger 1−β is), the more "power" our test has.
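Power only makes sense once we say how guilty H0 is, since β depends on what the truth actually looks like. The sketch below assumes a made-up case: H0 says the coin is fair, the coin really lands heads with probability 0.6, we flip 100 times, and we reject if heads is at least 11 away from 50. The function name and all the numbers are hypothetical.

```python
from math import comb

def rejection_probability(p_true, n=100, cutoff=11):
    """P(we reject H0) when the coin's real heads probability is p_true."""
    return sum(
        comb(n, k) * p_true ** k * (1 - p_true) ** (n - k)
        for k in range(n + 1)
        if abs(k - n / 2) >= cutoff  # k falls in the rejection region
    )

alpha = rejection_probability(0.5)  # H0 innocent, yet rejected: type I error rate
power = rejection_probability(0.6)  # H0 guilty and rejected: 1 - beta
beta = 1 - power                    # H0 guilty but retained: type II error rate

print(alpha, beta, power)
```

The same function gives α when fed the null's own value and power when fed an alternative, which shows that both are statements about how often the rejection rule fires under different truths.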