A Universe of Sorts

§ Myhill Nerode Theorem

Take a language $L$ over an alphabet $A$ .
Define $x \in A^\star$ to have a disginguishing extension from $y \in A^\star$ iff there exists a $s \in A^\star$ such that either (a) $xs \in L \land ys \not \in L$ , or $xs \not \in L \land ys \in L$
Said differently, given $x$ and $y$ , there suffix $s$ which can distinguish $x$ and $y$ using the membership oracle for $L$ .
Now define $x \sim_L y$ ( $x$ is indistinguishable from $y$ ) iff there is no distinguishing extension between $x$ and $y$ with respect to $L$ .
See that this is an equivalence relation:
(1) Reflexivity : $x$ cannot be distinguished from $x$ (using $L$ ), because any suffix $s$ cannot produce different outputs for $x$ and $x$ .
(2) Symmetry : If $x$ cannot be disguished from $y$ (using $L$ ), then $y$ cannot be distinguished from $x$ (using $L$ ).
(3) Transitivity : If $x$ cannot be distinguished from $y$ (using $L$ ) and $y$ cannot be distinguished from $z$ (using $L$ ), then $x$ cannot be distinguished from $z$ (using $L$ ). Intuition: What about $y$ in this situation? It can't be indistinguishable from both $x$ and $z$ .
Proof of Transitivity: Suppose for contradiction $x$ can be distinguished from $z$ . There there is a suffix $s$ such that $xs \in L$ while $zs \not \in L$ (WLOG). Now what about $ys$ ? if $ys \in L$ then we can distinguish $y$ and $z$ , contradicting assumption that $y$ is indistinguishable from $z$ . If $y \not \in L$ then we can distinguish $x$ from $y$ contradicting assumption that $x$ is indistinguishable from $y$ .
Hence, being indistinguishable is an equivalence relation, denoted by $x \sim y$ .
Myhill nerode says that the minimal DFA for a language $L$ has as many states as there are equivalence classes for $\sim_L$ .

§ Given DFA $D$ of language $L$ over $A$ : $\sim_D$ implies $\sim_L$

Let $L$ be a regular langugage over alphabet $A$ and let $D$ be a DFA (with finite number of states $|D|$ in the DFA).
Partition the set of all strings $A*\star$ via the relation $\sim_D$ : $x \sim_D y$ iff $x$ and $y$ end at the same state when fed to DFA.
Thus will have $|D|$ equivalence classes for $\sim_D$ , one for each state of the DFA.
For any two strings such that $x \sim_D y$ , given any suffix $s$ , we have that $xs \sim_D ys$ since we start at the same state at $x$ (or $y$ ) and continue when we feed the suffix $s$ .
Thus, strings such that $x \sim_D y$ are indistinguishable for the DFA.
So we have $x \sim_L y$ , since on any extension, both $xs$ and $ys$ either belong or don't belong to $L$ .

§ given language $L$ over $A$ : show that $\sim_L$ implies $\sim_D$ .

Let $L$ be a langugage over alphabet $A$ such that $\sim_L$ has finitely many equivalence classes.
We will design DFA (called $D$ )for $L$ with as many states as equivalence classes.
The start state of $D$ is the equivalence class of the empty string with respect to $\sim_L$ .
At an equivalence class/state $T$ , given character $c \in A$ , we move to $T \diamond c$ (extend $T$ by $c$ ).
Formally, $\delta(T, c) \equiv T \diamond c$ , where $T \diamond c \equiv \{ tc : t \in T \}$
This is well defined, as if $t \sim_L t'$ are in the equivalence class $T$ , then we must have $tc \sim_L t'c$ .
Suppose $tc$ is distinguishable from $t'c$ by a suffix $s$ . Then we can distinguish between $t$ and $t'$ via suffix $cs$ . This contradicts $t \sim_L t'$ . Thus we have that $t \sim_L t$ implies $tc \sim_L t'c$ .
Thus our transition function $\delta$ is well defined over equivalence classes $A/\sim_L$ .
A state $T$ in $D$ is accepting if the state contains \emph{any} string $l \in L$ .
That is, $T$ is accepting iff there exists a $l \in L$ such that $l \in T$ .
In this case, we infer that $T \subseteq L$ , or any string in $T$ is accepted by $L$ .
Suppose for contradiction that $l \in L$ such that $l \in T$ while also having a string $z \in T$ , $z \not in L$ .
the empty string would distinguish the two strings $l, z$ which contradicts $z \sim_L l$ since they are both in the equivalence class $T$ .
Thus for a regular language $L$ there is a DFA $D$ which accepts strings from $L$ and has number of states $|D|$ equal to number of equivalence clases $|A/\sim_L|$ .

§ DFA needs at least $|A/\sim_L|$ states

Let $n \equiv |A/\sim_L|$ be the number of equivalence classes of $\sim_L$ .
Suppose a DFA $D$ recongizes $L$ and has fewer than $|A/\sim_L|$ states.
Let $x_1, x_2, \dots x_n$ be strings from different equivalence classes of $L$ .
Push these strings through $D$ . Some two strings $x_i, x_j$ must land on the state state $d \in D$ of $D$ (by pigeonhole).
We must have $x_i$ and $x_j$ distinguishable, since they come from different equivalence classes. So the DFA must accept one and reject the other.
But the DFA can't tell the difference between $x_i$ and $x_j$ since they landed on the same state! So the DFA will accept or reject both.
Thus, we have a contradiction from the assumptions (a) $D$ has fewer states than $n$ and (b) $D$ recognizes $L$ .
Thus the DFA needs at least $|A/\sim_L|$ states.

§ The two imply DFA minimization

We have seen that every DFA for $L$ needs at least $|A/\sim L|$ states.
Now starting from $L$ , we can build an automata $D^\star$ such that $|D^\star|$ is exactly $|A/\sim L|$ .
Thus the automata $D^\star$ is a (the) minimal automata for $L$ .

§ Myhill Nerode Theorem

§ Given DFA DDD of language LLL over AAA: ∼D\sim_D∼D​ implies ∼L\sim_L∼L​

§ given language LLL over AAA: show that ∼L\sim_L∼L​ implies ∼D\sim_D∼D​.

§ DFA needs at least ∣A/∼L∣|A/\sim_L|∣A/∼L​∣ states

§ The two imply DFA minimization

§ Given DFA $D$ of language $L$ over $A$ : $\sim_D$ implies $\sim_L$

§ given language $L$ over $A$ : show that $\sim_L$ implies $\sim_D$ .

§ DFA needs at least $|A/\sim_L|$ states