## § Z algorithm

- The
`Z`

algorithm, for a given string $s$, computes a function $Z: [len(s)] \rightarrow [len(s)]$. - $Z[i]$ is the length of the longest common prefix between $S$ and $S[i:]$.
- So, $S[0] = S[i]$, $S[1] = S[i+1]$, $S[2] = S[i+2]$, and so on till $S[Z[i]] = S[i + Z[i]]$, and then $S[Z[i]+1] \neq S[i + Z[i] + 1]$.

- If we can compute the
`Z`

function for a string, we can then check if pattern `P`

is a substring of text `T`

by constructing the string `P#T$`

. Then, if we have an index such that `Z[i] = len(P)`

, we know that at that index, we have the string `P`

as a substring.

- Note that the
`Z`

-algorithm computes the `Z`

function in *linear time *.

- The key idea of the Z algorithm is to see that if we are at an index
`i`

, but we have an index `l < i`

such that `i < l + z[l]`

, then we have that `s[0:z[l]] = s[l:l+z[l]]`

. Thus, we are "in the shade" of the `l`

.

- In this situation, we can reuse
`z[i-l]`

as a seed for `z[i]`

. There are two cases: `i + z[i-l] > l + z[l]`

and the converse. - If
`i + z[i-l] < l + z[l]`

, then we are still "in the shade" of `l`

, so we can safely set `z[i] = z[i-l]`

. - If not, we set
`z[i] = l + z[l]`

, since we know that we match *at least * this much of the beginning of the string.

#### § Specification

```
vector<int> calcz(std::string s) {
const int n = s.size();
vector<int> z(n);
z[0] = 0;
for(int i = 1; i < s.size(); ++i) {
z[i] = 0;
while(i + z[i] < n && s[i+z[i]] == s[z[i]]) {
z[i]++;
}
}
return z;
}
```

#### § Implementation

```
vector<int> myz(std::string s) {
const int n = s.size();
vector<int> z(n);
for(int i = 0; i < n; ++i) { z[i] = 0; }
int l = 0;
for(int i = 1; i < n; ++i) {
z[i] = max(0, min(l + z[l] - i, z[i-l]));
while (i + z[i] < n && s[z[i]] == s[i + z[i]]) { z[i]++; }
if (i + z[i] >= l + z[l]) { l = i; }
}
return z;
}
```

- Reference: Algorithms on strings, trees, and sequences.