§ Z algorithm
- The
Z
algorithm, for a given string s, computes a function Z:[len(s)]→[len(s)]. - Z[i] is the length of the longest common prefix between S and S[i:].
- So, S[0]=S[i], S[1]=S[i+1], S[2]=S[i+2], and so on till S[Z[i]]=S[i+Z[i]], and then S[Z[i]+1]=S[i+Z[i]+1].
- If we can compute the
Z
function for a string, we can then check if pattern P
is a substring of text T
by constructing the string P#T$
. Then, if we have an index such that Z[i] = len(P)
, we know that at that index, we have the string P
as a substring.
- Note that the
Z
-algorithm computes the Z
function in linear time .
- The key idea of the Z algorithm is to see that if we are at an index
i
, but we have an index l < i
such that i < l + z[l]
, then we have that s[0:z[l]] = s[l:l+z[l]]
. Thus, we are "in the shade" of the l
.
- In this situation, we can reuse
z[i-l]
as a seed for z[i]
. There are two cases: i + z[i-l] > l + z[l]
and the converse. - If
i + z[i-l] < l + z[l]
, then we are still "in the shade" of l
, so we can safely set z[i] = z[i-l]
. - If not, we set
z[i] = l + z[l]
, since we know that we match at least this much of the beginning of the string.
§ Specification
vector<int> calcz(std::string s) {
const int n = s.size();
vector<int> z(n);
z[0] = 0;
for(int i = 1; i < s.size(); ++i) {
z[i] = 0;
while(i + z[i] < n && s[i+z[i]] == s[z[i]]) {
z[i]++;
}
}
return z;
}
§ Implementation
vector<int> myz(std::string s) {
const int n = s.size();
vector<int> z(n);
for(int i = 0; i < n; ++i) { z[i] = 0; }
int l = 0;
for(int i = 1; i < n; ++i) {
z[i] = max(0, min(l + z[l] - i, z[i-l]));
while (i + z[i] < n && s[z[i]] == s[i + z[i]]) { z[i]++; }
if (i + z[i] >= l + z[l]) { l = i; }
}
return z;
}
- Reference: Algorithms on strings, trees, and sequences.