§ VC dimension
Consider a ground set X. Let the space of all possible binary classifications
be the function space C≡{f∣f:X→±1}.
Now, a hypothesis class H is a subset of C. For example, some model
such as "return +1 if a point is inside a region, −1 otherwise" is a subset
of the full class C.
The VC dimension of H measures how good H is generating different classification.
We first need the notion of shattering to define this.
A subset S⊆X of the ground set shatters a hypothesis class H
if the function actS has full range, where actS is defined as:
actS:H→∣S∣{0,1}actS(h)=(h(s0),h(s1),h(s2),…,h(sn))
That is, the hypothesis class H can classify all the subsets of S.
Now the VC dimension of the hypothesis class H of a ground set X is
the size of largest possible S⊆X such that S is shattered
by H.
§ Correct interpretation
- We need just one set S of size n to be shattered by H. We get to pick the set S.
§ Subtletly 1:
- We do not need all sets of size n to be shattered by H.
We can have the case where:
- All sets of size 3 are shattered by H
- Only one set of size 4 is shattered by H. All other sets of size 4 are not.
- Only one size of size 5 is shattered by H. All other sets of size 5 are not.
- No set of size 6 is shattered by H.
In this case, the VC dimension of H is 5, not 3 .
§ Subtletly 2:
We cannot have the case where:
- All sets of size 3 are shattered by H
- No set of size 4 is shattered by H
- Some set of size 5 is shattered by H
For contradiction, let S be the set of size 5 that is shattered by H.
Let T⊊S, ∣T∣=4. Now, H shatters T since H shatters S.
Hence, Some set of size 4 has been shattered. Contradiction, since we assumed
that no set of size 4 is shattered by H.
So, to prove that sets of size (≥n) cannot be shattered, it suffices
to prove that sets of size equal to n cannot be shattered.
§ Growth of number of sets shattered in ∣S∣ for S⊆X for a fixed H.
If we fix a hypothesis class H for X, and we want to understand how H
varies over subsets of X, the idea is this:
Let S be a set that is the maximum sized set that is shattered by X. ie,
∣S∣=Vcdim(H) and H shatters S.
Now, the idea is this:
- For subsets T⊆S, ∣actT(H)∣=2∣T∣ -- exponential.
- For subpersets S⊊Sup, ∣actSup(H)=Comb(∣Sup∣,∣S) -- polynomial.
We can show that this exponential/polynomial behaviour happens in general
for S⊆X.