§ Myhill Nerode Theorem
- Take a language $L$ over an alphabet $A$.
- Define $x \in A^\star$ to have a disginguishing extension from $y \in A^\star$iff there exists a $s \in A^\star$ such that either (a) $xs \in L \land ys \not \in L$, or $xs \not \in L \land ys \in L$
- Said differently, given $x$ and $y$, there suffix $s$ which can distinguish $x$ and $y$ using the membership oracle for $L$.
- Now define $x \sim_L y$ ( $x$ is indistinguishable from $y$) iff there is no distinguishing extension between $x$ and $y$ with respect to $L$.
- See that this is an equivalence relation:
- (1) Reflexivity : $x$ cannot be distinguished from $x$ (using $L$), because any suffix $s$ cannot produce different outputs for $x$ and $x$.
- (2) Symmetry : If $x$ cannot be disguished from $y$ (using $L$), then $y$ cannot be distinguished from $x$ (using $L$).
- (3) Transitivity : If $x$ cannot be distinguished from $y$ (using $L$) and $y$ cannot be distinguished from $z$ (using $L$), then $x$ cannot be distinguished from $z$ (using $L$). Intuition: What about $y$ in this situation? It can't be indistinguishable from both $x$ and $z$.
- Proof of Transitivity: Suppose for contradiction $x$ can be distinguished from $z$. There there is a suffix $s$ such that $xs \in L$ while $zs \not \in L$ (WLOG). Now what about $ys$? if $ys \in L$ then we can distinguish $y$and $z$, contradicting assumption that $y$ is indistinguishable from $z$. If $y \not \in L$ then we can distinguish $x$ from $y$ contradicting assumption that $x$ is indistinguishable from $y$.
- Hence, being indistinguishable is an equivalence relation, denoted by $x \sim y$.
- Myhill nerode says that the minimal DFA for a language $L$ has as many states as there are equivalence classes for $\sim_L$.
§ Given DFA $D$ of language $L$ over $A$: $\sim_D$ implies $\sim_L$
- Let $L$ be a regular langugage over alphabet $A$ and let $D$ be a DFA (with finite number of states $|D|$ in the DFA).
- Partition the set of all strings $A*\star$ via the relation $\sim_D$: $x \sim_D y$ iff $x$ and $y$ end at the same state when fed to DFA.
- Thus will have $|D|$ equivalence classes for $\sim_D$, one for each state of the DFA.
- For any two strings such that $x \sim_D y$, given any suffix $s$, we have that $xs \sim_D ys$since we start at the same state at $x$ (or $y$) and continue when we feed the suffix $s$.
- Thus, strings such that $x \sim_D y$ are indistinguishable for the DFA.
- So we have $x \sim_L y$, since on any extension, both $xs$ and $ys$ either belong or don't belong to $L$.
§ given language $L$ over $A$: show that $\sim_L$ implies $\sim_D$.
- Let $L$ be a langugage over alphabet $A$ such that $\sim_L$ has finitely many equivalence classes.
- We will design DFA (called $D$)for $L$ with as many states as equivalence classes.
- The start state of $D$ is the equivalence class of the empty string with respect to $\sim_L$.
- At an equivalence class/state $T$, given character $c \in A$, we move to $T \diamond c$ (extend $T$by $c$).
- Formally, $\delta(T, c) \equiv T \diamond c$, where $T \diamond c \equiv \{ tc : t \in T \}$
- This is well defined, as if $t \sim_L t'$ are in the equivalence class $T$, then we must have $tc \sim_L t'c$.
- Suppose $tc$ is distinguishable from $t'c$ by a suffix $s$. Then we can distinguish between $t$and $t'$ via suffix $cs$. This contradicts $t \sim_L t'$. Thus we have that $t \sim_L t$ implies $tc \sim_L t'c$.
- Thus our transition function $\delta$ is well defined over equivalence classes $A/\sim_L$.
- A state $T$ in $D$ is accepting if the state contains \emph{any} string $l \in L$.
- That is, $T$ is accepting iff there exists a $l \in L$ such that $l \in T$.
- In this case, we infer that $T \subseteq L$, or any string in $T$ is accepted by $L$.
- Suppose for contradiction that $l \in L$ such that $l \in T$ while also having a string $z \in T$, $z \not in L$.
- the empty string would distinguish the two strings $l, z$ which contradicts $z \sim_L l$since they are both in the equivalence class $T$.
- Thus for a regular language $L$ there is a DFA $D$ which accepts strings from $L$ and has number of states $|D|$ equal to number of equivalence clases $|A/\sim_L|$.
§ DFA needs at least $|A/\sim_L|$ states
- Let $n \equiv |A/\sim_L|$ be the number of equivalence classes of $\sim_L$.
- Suppose a DFA $D$ recongizes $L$ and has fewer than $|A/\sim_L|$ states.
- Let $x_1, x_2, \dots x_n$ be strings from different equivalence classes of $L$.
- Push these strings through $D$. Some two strings $x_i, x_j$ must land on the state state $d \in D$of $D$ (by pigeonhole).
- We must have $x_i$ and $x_j$ distinguishable, since they come from different equivalence classes. So the DFA must accept one and reject the other.
- But the DFA can't tell the difference between $x_i$ and $x_j$ since they landed on the same state! So the DFA will accept or reject both.
- Thus, we have a contradiction from the assumptions (a) $D$ has fewer states than $n$ and (b) $D$ recognizes $L$.
- Thus the DFA needs at least $|A/\sim_L|$ states.
§ The two imply DFA minimization
- We have seen that every DFA for $L$ needs at least $|A/\sim L|$ states.
- Now starting from $L$, we can build an automata $D^\star$ such that $|D^\star|$ is exactly $|A/\sim L|$.
- Thus the automata $D^\star$ is a (the) minimal automata for $L$.