§ Z algorithm
- The
Z algorithm, for a given string s, computes a function Z:[len(s)]→[len(s)]. - Z[i] is the length of the longest common prefix between S and S[i:].
- So, S[0]=S[i], S[1]=S[i+1], S[2]=S[i+2], and so on till S[Z[i]]=S[i+Z[i]], and then S[Z[i]+1]=S[i+Z[i]+1].
- If we can compute the
Z function for a string, we can then check if pattern Pis a substring of text T by constructing the string P#T$. Then, if we have an index such that Z[i] = len(P), we know that at that index, we have the string P as a substring.
- Note that the
Z-algorithm computes the Z function in linear time .
- The key idea of the Z algorithm is to see that if we are at an index
i, but we have an index l < isuch that i < l + z[l], then we have that s[0:z[l]] = s[l:l+z[l]]. Thus, we are "in the shade" of the l.
- In this situation, we can reuse
z[i-l] as a seed for z[i]. There are two cases: i + z[i-l] > l + z[l] and the converse. - If
i + z[i-l] < l + z[l], then we are still "in the shade" of l, so we can safely set z[i] = z[i-l]. - If not, we set
z[i] = l + z[l], since we know that we match at least this much of the beginning of the string.
§ Specification
vector<int> calcz(std::string s) {
const int n = s.size();
vector<int> z(n);
z[0] = 0;
for(int i = 1; i < s.size(); ++i) {
z[i] = 0;
while(i + z[i] < n && s[i+z[i]] == s[z[i]]) {
z[i]++;
}
}
return z;
}
§ Implementation
vector<int> myz(std::string s) {
const int n = s.size();
vector<int> z(n);
for(int i = 0; i < n; ++i) { z[i] = 0; }
int l = 0;
for(int i = 1; i < n; ++i) {
z[i] = max(0, min(l + z[l] - i, z[i-l]));
while (i + z[i] < n && s[z[i]] == s[i + z[i]]) { z[i]++; }
if (i + z[i] >= l + z[l]) { l = i; }
}
return z;
}
- Reference: Algorithms on strings, trees, and sequences.