A Universe of Sorts

§ VC dimension

Consider a ground set

X

. Let the space of all possible binary classifications be the function space

C \equiv \{ f \mid f : X \rightarrow \pm 1 \}

. Now, a hypothesis class

H

is a subset of

C

. For example, some model such as "return

+1

if a point is inside a region,

-1

otherwise" is a subset of the full class

C

. The VC dimension of

H

measures how good

H

is generating different classification. We first need the notion of shattering to define this. A subset

S \subseteq X

of the ground set shatters a hypothesis class

H

if the function

act_S

has full range, where

act_S

is defined as:

act_S: H \rightarrow |S|^{\{0, 1\}} act_S(h) = (h(s_0), h(s_1), h(s_2), \dots, h(s_n))

That is, the hypothesis class

H

can classify all the subsets of

S

. Now the VC dimension of the hypothesis class $H$ of a ground set $X$ is the size of largest possible $S \subseteq X$ such that

S

is shattered by

H

§ Correct interpretation

We need just one set $S$ of size $n$ to be shattered by $H$ . We get to pick the set $S$ .

§ Subtletly 1:

We do not need all sets of size $n$ to be shattered by $H$ .

We can have the case where:

All sets of size 3 are shattered by H
Only one set of size 4 is shattered by H. All other sets of size 4 are not.
Only one size of size 5 is shattered by H. All other sets of size 5 are not.
No set of size 6 is shattered by H.

In this case, the VC dimension of

H

is 5, not 3 .

§ Subtletly 2:

We cannot have the case where:

All sets of size 3 are shattered by H
No set of size 4 is shattered by H
Some set of size 5 is shattered by H

For contradiction, let

S

be the set of size

5

that is shattered by

H

. Let

T \subsetneq S

|T| = 4

. Now,

H

shatters

T

since

H

shatters

S

. Hence, Some set of size 4 has been shattered. Contradiction, since we assumed that no set of size 4 is shattered by

H

. So, to prove that sets of size

(\geq n)

cannot be shattered, it suffices to prove that sets of size equal to

n

cannot be shattered.

§ Growth of number of sets shattered in $|S|$ for $S \subseteq X$ for a fixed $H$ .

If we fix a hypothesis class

H

for

X

, and we want to understand how

H

varies over subsets of

X

, the idea is this: Let

S

be a set that is the maximum sized set that is shattered by

X

. ie,

|S| = Vcdim(H)

and

H

shatters

S

. Now, the idea is this:

For subsets $T \subseteq S$ , $|act_T(H)| = 2^{|T|}$ -- exponential.
For subpersets $S \subsetneq Sup$ , $|act_{Sup}(H) = Comb(|Sup|, |S)$ -- polynomial.

We can show that this exponential/polynomial behaviour happens in general for

S \subseteq X

§ VC dimension

§ Correct interpretation

§ Subtletly 1:

§ Subtletly 2:

§ Growth of number of sets shattered in ∣S∣|S|∣S∣ for S⊆XS \subseteq XS⊆X for a fixed HHH.

§ Growth of number of sets shattered in $|S|$ for $S \subseteq X$ for a fixed $H$ .