§ VC dimension


Consider a ground set XX. Let the space of all possible binary classifications be the function space C{ff:X±1}C \equiv \{ f \mid f : X \rightarrow \pm 1 \}. Now, a hypothesis class HH is a subset of CC. For example, some model such as "return +1+1 if a point is inside a region, 1-1 otherwise" is a subset of the full class CC.
The VC dimension of HH measures how good HH is generating different classification. We first need the notion of shattering to define this.
A subset SXS \subseteq X of the ground set shatters a hypothesis class HH if the function actSact_S has full range, where actSact_S is defined as:
actS:HS{0,1}actS(h)=(h(s0),h(s1),h(s2),,h(sn)) act_S: H \rightarrow |S|^{\{0, 1\}} act_S(h) = (h(s_0), h(s_1), h(s_2), \dots, h(s_n))

That is, the hypothesis class HH can classify all the subsets of SS.
Now the VC dimension of the hypothesis class HH of a ground set XX is the size of largest possible SXS \subseteq X such that SS is shattered by HH.

§ Correct interpretation


§ Subtletly 1:


We can have the case where:

In this case, the VC dimension of HH is 5, not 3 .

§ Subtletly 2:


We cannot have the case where:
For contradiction, let SS be the set of size 55 that is shattered by HH. Let TST \subsetneq S, T=4|T| = 4. Now, HH shatters TT since HH shatters SS. Hence, Some set of size 4 has been shattered. Contradiction, since we assumed that no set of size 4 is shattered by HH.
So, to prove that sets of size (n)(\geq n) cannot be shattered, it suffices to prove that sets of size equal to nn cannot be shattered.

§ Growth of number of sets shattered in S|S| for SXS \subseteq X for a fixed HH.


If we fix a hypothesis class HH for XX, and we want to understand how HH varies over subsets of XX, the idea is this:
Let SS be a set that is the maximum sized set that is shattered by XX. ie, S=Vcdim(H)|S| = Vcdim(H) and HH shatters SS.
Now, the idea is this:

We can show that this exponential/polynomial behaviour happens in general for SXS \subseteq X.