§ Dynamic Programming: Erik Demaine's lectures
I realized I'd never formally learned dynamic programming,
so I'm watching Erik Demaine's lectures and taking notes here.
§ DP 1: Fibonacci, shortest paths
- DP ~= careful brute force.
- DP ~= subproblems + "recurse"
§ Fibonacci
F(1) = F(2) = 1; F(n) = F(n-1) + F(n-2)
§ Naive:
fib(n):
  if n <= 2: f = 1
  else: f = fib(n-1) + fib(n-2)
  return f
EXPONENTIAL time! T(n) = T(n-1) + T(n-2) + O(1). Since this is the Fibonacci
recurrence, the solution is roughly ϕ^n where ϕ is the golden
ratio. Alternatively, T(n) >= 2T(n-2) ~ 2^(n/2).
§ Memoized DP:
memo = {}
fib(n):
  if n in memo: return memo[n]
  if n <= 2: f = 1
  else: f = fib(n-1) + fib(n-2)
  memo[n] = f
  return f
- We can think about it in terms of the recursion tree: memoization lets us avoid recomputing whole subtrees.
- The alternative way of thinking about it: there are two kinds of calls to fib. The first call for a given n is non-memoized and recurses. Every later call for that n is a memoized, constant-time lookup.
- The number of non-memoized calls is n. These are the calls we have to pay for, and the non-recursive work per call is constant.
- Therefore, the running time is linear! Linear because there are n non-memoized calls, and each of them costs constant time.
- In general, in DP we memoize (remember) solutions to subproblems that help us solve the actual problem. So, DP = recursion + memoization.
- Running time = number of different subproblems x time per subproblem. When we measure time per subproblem, we ignore recursive calls! (don't count recursions).
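To see the "n non-memoized calls" claim concretely, here's a runnable sketch that uses functools.lru_cache as the memo and counts the calls that actually compute something:

```python
from functools import lru_cache

calls = 0  # counts non-memoized (actually computed) calls

@lru_cache(maxsize=None)
def fib(n):
    # The body runs exactly once per distinct n;
    # repeated calls are O(1) cache hits.
    global calls
    calls += 1
    return 1 if n <= 2 else fib(n - 1) + fib(n - 2)

fib(30)  # calls == 30: linear number of computations, not exponential
```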
§ Bottom up DP algorithm
fib = {}
# fill the table in increasing order of k
for k in range(1, n+1):
  if k <= 2: f = 1
  else: f = fib[k-1] + fib[k-2]
  fib[k] = f
Order based on any topo sort of dependency DAG. From the bottom up perspective,
we can decide how much we need to store based on how much state we need.
*------------*
| v
f(n-2) f(n-1) f(n)
| ^
*----*
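Since fib[k] depends only on the previous two entries (the dependency DAG above), the table can be shrunk to two variables. A minimal sketch:

```python
def fib(n):
    # Keep only the last two values instead of the whole table.
    a, b = 1, 1  # F(k-1), F(k), starting at k = 2
    for _ in range(3, n + 1):
        a, b = b, a + b
    return b
```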
§ Single source shortest paths (s-v path)
- Tool to find answers: guessing. Suppose you don't know the answer. How do you find it? Guess! Don't try one guess, try all guesses! (then take the best one).
- DP = recursion + memoization + guessing.
There is some hypothetical path from s to v. We don't know what the first
edge of this hypothetical path is, so we guess. We could try all edges
s -> s'. But that changes s, and we really care about single source shortest
paths. So rather, we choose to guess at v: we guess the last edge u -> v.
Recursively compute the shortest path from s to u, and then add the edge to v.
\delta(s, v) = \min_{(u,v) \in E} [\delta(s, u) + w(u, v)]
Subpaths of shortest paths are shortest paths! Memoize to make it fast? Why
would memoization make it fast here?
*---a----*
v ^ v
s | w
| | |
*-->b<---*
δ(s, w)
  δ(s, a)
  δ(s, b)
    δ(s, s)
    δ(s, w) <- INFINITE: the recursion re-enters δ(s, w) through the cycle
- Infinite time on graphs with cycles.
- For a DAG, it runs in O(V+E). Number of subproblems = V. The time we spend per subproblem at a vertex is its number of incoming edges. We can't take the product, because the time per subproblem can vary wildly. So we restate our "time formula" as
total time = sum of times over all subproblems (modulo recursion)
This gives us:
total time = Σ_v (indeg(v) + O(1)) = O(E + V)
- LESSON LEARNT: subproblem dependencies should be acyclic!
- Claim: we can use the same approach for general graphs! Explode a cycle over time; this makes any graph acyclic.
Sid question: can we derive Dijkstra's using the same "cycle explosion"
trick?
We define δ_k(s, v) to be the weight of the shortest path from s to v that uses at most k
edges.
\delta_k(s, v) = \min_{(u, v) \in E} [\delta_{k-1}(s, u) + w(u, v)]
We've increased the number of subproblems. We know that the longest possible
(simple) path has |V| - 1 edges. So the k parameter ranges over [0..|V|-1]
while the vertex v can be any vertex. Per vertex v we spend indeg(v) time.
So we get the total time as:
\sum_{k=1}^{|V|-1} \sum_{v \in V} indeg(v) = \sum_{k=1}^{|V|-1} E = O(VE)
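This δ_k recurrence is exactly Bellman-Ford. A minimal bottom-up sketch, with the graph given as an edge list over vertices 0..n-1:

```python
import math

def shortest_paths(n, edges, s):
    # prev[v] = weight of shortest s->v path using at most k-1 edges.
    # As with fib, we only keep the previous row of the table.
    prev = [math.inf] * n
    prev[s] = 0
    for _ in range(n - 1):              # k = 1 .. |V|-1
        cur = prev[:]                   # "at most k" includes "at most k-1"
        for (u, v, w) in edges:
            cur[v] = min(cur[v], prev[u] + w)
        prev = cur
    return prev

# edges are (u, v, weight) triples
d = shortest_paths(4, [(0, 1, 1), (1, 2, 2), (0, 2, 5), (2, 3, 1)], 0)
# d == [0, 1, 3, 4]
```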
§ DP 2: Text Justification, Blackjack
§ 5 "easy" steps to a DP
- Define subproblems; analysis - number of subproblems
- Guess (part of solution); analysis - number of choices for the guess
- Relate subproblem solutions [with a recurrence ]; analysis - time per subproblem (ignoring recursion)
- Build an algorithm: [recursion/memo, or tabling ]; check recurrence is acyclic
- solve original problem; total time: total time across all subproblems (ignoring recursion). In simple cases, total time = number of subproblems x time per subproblem.
- Check that the original problem actually gets solved!
§ Recap: Fibonacci
- subproblems: F(1)...F(n); n of them.
- guess: nothing.
- relate: F(n) = F(n-1) + F(n-2); O(1) time per subproblem.
- original problem: F(n); constant time to look up once the table is filled.
§ Recap: Shortest path
- subproblems: δ_k(s, v); O(V^2) of them.
- guess: the last edge, an edge into v.
- relate: δ_k(s, v) = min_{(u,v) ∈ E} [δ_{k-1}(s, u) + w(u, v)]; indeg(v) time.
- original problem: δ_{|V|-1}(s, v) for all v. This takes Θ(V) lookups.
§ Text Justification
Split text into "good lines". We can only cut between word boundaries; the text
is a list of words. badness(i, j): how bad is it to use words[i:j] as a line?
The words may fit, or they may not. If they don't fit, the badness is ∞.
Otherwise, it's (page width - total width)^3.
We want to minimize the sum of the badnesses of the lines.
- Subproblems: the hard part! Exponential idea: for every word, guess whether a line begins there. What is the natural thing to guess? Guess how long the first line is, i.e. where the second line begins. Once we guess where the second line begins, the remaining words form a smaller justification problem. So the subproblems are suffixes of the array: words[i:]. If we have n words, we have n suffixes. We only remember one line [forget the past!], not all the lines [that would be exponential!].
- Guess: where the second line begins. At location i there are n - i choices, which we think of as O(n).
- Recurrence: dp[i] = min_{i+1 <= j <= n} [badness(i, j) + dp[j]]. Time per subproblem is constant × |[i+1..n]|, which is O(n).
- Check the recurrence is acyclic / find a topological order: n, n-1, ..., 1.
- Total time: number of subproblems × time per subproblem = O(n^2).
- Original problem: dp[0].
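A memoized sketch of this recurrence. The badness definition (single spaces between words, cubic penalty on slack) is an assumption filled in for illustration:

```python
import math
from functools import lru_cache

def justify(words, width):
    n = len(words)

    def badness(i, j):
        # Cost of putting words[i:j] on one line of the given width.
        total = sum(len(w) for w in words[i:j]) + (j - i - 1)  # words + spaces
        return math.inf if total > width else (width - total) ** 3

    @lru_cache(maxsize=None)
    def dp(i):
        # Minimal total badness of justifying words[i:].
        if i == n:
            return 0
        return min(badness(i, j) + dp(j) for j in range(i + 1, n + 1))

    return dp(0)
```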
§ Parent pointers
Remember which guess was best. This lets us recover the actual solution, not
just the cost of the solution.
n suffixes | exponentially many subsets. We don't need to know the history!
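To make parent pointers concrete: a bottom-up sketch of the justification DP (same assumed badness as before) that records the best break j for each i, then follows the pointers to reconstruct the lines rather than just the cost:

```python
import math

def justify_lines(words, width):
    n = len(words)

    def badness(i, j):
        total = sum(len(w) for w in words[i:j]) + (j - i - 1)
        return math.inf if total > width else (width - total) ** 3

    dp = [0] * (n + 1)
    parent = [n] * (n + 1)
    for i in reversed(range(n)):
        # Remember which guess (line break j) was best for suffix i.
        dp[i], parent[i] = min(
            (badness(i, j) + dp[j], j) for j in range(i + 1, n + 1)
        )

    # Follow parent pointers to list each line's words.
    lines, i = [], 0
    while i < n:
        lines.append(words[i:parent[i]])
        i = parent[i]
    return lines
```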
§ Blackjack
whatever, I don't particularly care about the game
§ DP 3: Parenthesization, Edit distance, Knapsack
§ Sequences
Good choices of objects to perform DP on:
- Suffixes: x[i:] for all i. O(n) of them.
- Prefixes: x[:j] for all j. O(n) of them.
- Substrings: x[i:j] for all i and j. O(n^2) of them.
§ Parenthesization
Optimal order of evaluating an associative expression: A[0] . A[1] ... A[n-1].
Order matters for matmul!
- What should we guess? There are exponentially many parenthesizations! Guess the outermost/last multiplication. I.e., we want to know the k in (A[0] ... A[k-1]) * (A[k] ... A[n-1]).
- We can't just use prefixes and suffixes, because when we recurse into A[0]...A[k-1], we're going to get splits of the form A[0]...A[k'-1] and A[k']...A[k-1]. In general, if we feel we need both prefixes AND suffixes, we likely need the full power of substrings.
- So our choice of subproblem is: dp[i][j] is the optimal cost for A[i]...A[j-1]. The number of choices for the split is O(j-i+1) = O(n).
dp[i][j] = min{
  for k in range(i+1, j):
    dp[i][k] + dp[k][j] + cost of (A[i:k] * A[k:j])
}
- Time is polynomial: O(n) per subproblem ignoring recursion. We have O(n^2) subproblems (substrings). So the running time is O(n^3).
- Topological order: in general, with prefixes we go left to right; with suffixes, right to left. With substrings, we evaluate in order of increasing substring length, since splitting a substring yields shorter substrings.
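For matrix multiplication specifically, the "cost of (A[i:k] * A[k:j])" term becomes dims[i]·dims[k]·dims[j] when A[t] has shape dims[t] × dims[t+1]. A memoized sketch:

```python
from functools import lru_cache

def matrix_chain(dims):
    # dims has length n+1; matrix A[t] has shape dims[t] x dims[t+1].
    n = len(dims) - 1

    @lru_cache(maxsize=None)
    def dp(i, j):
        # Minimal scalar-multiplication cost of computing A[i] ... A[j-1].
        if j - i <= 1:
            return 0
        return min(
            dp(i, k) + dp(k, j) + dims[i] * dims[k] * dims[j]
            for k in range(i + 1, j)
        )

    return dp(0, n)
```

For example, with shapes 10x100, 100x5, 5x50, multiplying the first pair first costs 10·100·5 + 10·5·50 = 7500, which beats the alternative order.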
§ Edit distance
Given two strings x and y, find the cheapest way to convert x into y.
We allow character edits to turn x into y: we can (1) insert a character
anywhere in x, (2) delete a character anywhere in x, (3) replace any character
in x. We have custom costs for each insert, delete, and replace.
- This can also solve longest common subsequence: e.g. HIEROGLYPHOLOGY vs MICHAELANGELO. Drop any set of letters from x and y so that what remains is equal. Model it as edit distance: cost of insert/delete is 1; cost of replacement is 0 if the characters are equal, ∞ otherwise.
- We will look at suffixes of x and y for the subproblem. The subproblem is edit distance on x[i:] and y[j:]. The number of subproblems is O(|x||y|).
- We need to guess! Not so obvious. Look at the first characters. What can I do with the first character of x? Three things:
- Replace x[i] with y[j].
- Insert y[j].
- Delete x[i].
dp[i][j] = min{
  cost of replacing x[i] with y[j] + dp[i+1][j+1],
  cost of inserting y[j] + dp[i][j+1],
  cost of deleting x[i] + dp[i+1][j],
}
The topological order goes from smaller to larger suffixes.
What I found really interesting is the offhand remark that longest common
subsequence is just edit distance where we are only allowed to delete
or keep characters.
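A memoized sketch of edit distance with the LCS costs above (insert/delete cost 1, replace cost 0 iff the characters match); it computes |x| + |y| - 2·LCS(x, y):

```python
import math
from functools import lru_cache

def lcs_edit_distance(x, y):
    # Edit distance with insert/delete cost 1 and replace cost
    # 0 (equal chars) / infinity (unequal): LCS in disguise.
    @lru_cache(maxsize=None)
    def dp(i, j):
        if i == len(x):
            return len(y) - j            # insert the rest of y
        if j == len(y):
            return len(x) - i            # delete the rest of x
        replace = 0 if x[i] == y[j] else math.inf
        return min(replace + dp(i + 1, j + 1),
                   1 + dp(i, j + 1),     # insert y[j]
                   1 + dp(i + 1, j))     # delete x[i]

    return dp(0, 0)
```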
§ Knapsack
A list of items, each with a size s[i] and a desire/value v[i]. The sizes
are integers. We have a backpack of total size S. We want to choose a subset
of the items that maximizes the total value and also fits into the backpack: Σ s[i] ≤ S.
- Even though it seems like we don't have a sequence, we can lay the items out in some order and look at suffixes. At item i, we guess whether item i is included or not:
dp[i] = max(dp[i+1], dp[i+1] + v[i])
But this doesn't keep track of the sizes! Rather, we choose our subproblem to be the suffix
AND the remaining capacity x ≤ S. We have O(nS) subproblems:
dp[i][x] = max(dp[i+1][x], dp[i+1][x - s[i]] + v[i])
To be polynomial in the input, the running time would have to be Θ(n log S), because
S is given as a number: nS is exponential in the input encoding of size log S. So
this is only pseudo-polynomial.
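A memoized sketch of the dp[i][x] recurrence:

```python
from functools import lru_cache

def knapsack(sizes, values, S):
    n = len(sizes)

    @lru_cache(maxsize=None)
    def dp(i, x):
        # Best value achievable from items i.. with remaining capacity x.
        if i == n:
            return 0
        best = dp(i + 1, x)                                  # skip item i
        if sizes[i] <= x:
            best = max(best, dp(i + 1, x - sizes[i]) + values[i])  # take it
        return best

    return dp(0, S)
```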
§ R21: DP: Knapsack
We have n decisions: d[i] is "do I take item i?". What do we need to keep
track of? How much capacity I have left, which is equivalent to knowing the
total size of the items taken so far. To phrase the problem as a shortest
path: each edge is an arbitrary item, with weight -v[i] (negated because
shortest path minimizes). The state at a node is the item I'm looking at
together with the total size of the items I've taken so far.
§ DP4: Guitar fingering, tetris, super mario bros
A second kind of guessing. Usually we guess which subproblem to use to solve
a bigger subproblem. Another way of guessing is to add more subproblems to
guess from, i.e. to remember more features of the solution.
§ Mapping to knapsack
The obvious subproblem in knapsack was a suffix. We then needed to know how
many units of the knapsack we'd used up: we remember something about the prefix
(but not the full prefix itself). Viewed in the forward direction, we are
solving more types of subproblems, one for each remaining knapsack size.
§ Piano and guitar fingering: Take 1
We are given some musical piece to play: a sequence of n notes. We want to
find a fingering for each note. We have fingers 1 up to F. We also have a
difficulty measure d(p, f, p', f'): how hard is it to transition from note p
(p for pitch) with finger f to note p' with finger f'?
- Subproblems: prefixes? suffixes? substrings? Suffixes are kind of fine: how to play notes[i:].
- Guess: which finger to put on note i?
- Recurrence:
dp[i] = min({
  for f in fingers:
    dp[i+1] + d(notes[i], f, notes[i+1], ?)
})
The ? is the problem: dp[i+1] doesn't know which finger plays note i+1.
- Fix: add more subproblems! dp[i][f]: how to play notes[i:] when using finger f for notes[i].
- What to guess? The finger g for note i+1.
dp[i][f] = min({
  for g in fingers:
    dp[i+1][g] + d(notes[i], f, notes[i+1], g)
})
for i in reversed(range(n)):
  for f in range(F): ...
- Original problem: we don't know which finger to use for note 0, so take a min over all fingers: min([dp[0][f] for f in range(F)])
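A runnable sketch of the dp[i][f] recurrence. The difficulty function here is made up purely for illustration (penalizing when the finger spread doesn't match the pitch interval); the lecture only fixes the four-argument interface d(p, f, p', f'):

```python
from functools import lru_cache

def best_fingering(notes, F, d):
    # d(p, f, p2, f2): difficulty of moving from note p played with
    # finger f to note p2 played with finger f2.
    n = len(notes)

    @lru_cache(maxsize=None)
    def dp(i, f):
        # Minimal difficulty of playing notes[i:], given finger f on notes[i].
        if i == n - 1:
            return 0
        return min(d(notes[i], f, notes[i + 1], g) + dp(i + 1, g)
                   for g in range(F))

    return min(dp(0, f) for f in range(F))

# Hypothetical difficulty: mismatch between pitch jump and finger jump.
cost = best_fingering([60, 62, 64], F=5,
                      d=lambda p, f, p2, f2: abs((p2 - p) - (f2 - f)))
```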
§ guitar chords:
Generalize the notion of "finger" to "finger (F) + string (S)". This gives
us O(n(FS)^2) = O(nF^2S^2). Multiple simultaneous notes: notes[i] = a list of
up to F notes for a piano.
- State: we need to know the assignment of fingers to notes (or to no note). That's (N+1)^F states, where N is the number of notes sounding at once. Generalize the rest accordingly.
§ Tetris
We know the entire sequence of pieces that's going to fall. Each piece must
be dropped from the top. Also, full rows don't clear.
The width w of the board is small. The board is initially empty.
Can you survive?
The subproblems are how to play the suffix pieces[i:]. We also need to know
what the board looks like. Since rows don't clear and we always drop from
the top, all we need to know is the skyline.
1| ###
2| #
3|####
4|####
5|####
- So we also store the board skyline. There are h + 1 possible heights per column and w columns, so we have (h+1)^w possible skylines. The total number of subproblems is n·(h+1)^w.
- Guess: what do I do with piece i? I can rotate it 0, 1, 2, or 3 times, and then choose where to drop it. This is 4w choices: 4 for the rotation, w for where we drop.
- Here the answer is a boolean: survive (1) or not (0). We want to survive, so we can just use max on the booleans.
§ Super Mario Bros
Recall that in the original Super Mario Bros, anything that scrolls off the
screen is lost forever; we can't move back. We're given the level and a small
w×h screen. The configuration of the game is everything on the screen: the
total information is c^{wh} for some constant c. We also need Mario's velocity,
the score, and the time; the score can be S big and the time T big. The number
of configurations is the product of all of these. We also need to know how far
right we have scrolled, which is another factor of W. Draw the graph of all
configurations, and then use DP.
§ Lecture 10: Advanced DP by Srinivas
§ Longest palindromic subsequence
Given a string x[1..n], find the longest palindrome that is a subsequence.
character -> carac
The answer is at least 1 in length, because a single letter is a palindrome.
turboventilator -> rotator
- L(i, j): length of the longest palindromic subsequence of x[i..j], where i <= j.
def L(i, j):
  if i > j: return 0
  if i == j: return 1
  if x[i] == x[j]:
    return 2 + L(i+1, j-1)
  return max(L(i+1, j), L(i, j-1))
- Number of subproblems: O(n^2). Time per subproblem, assuming recursion is free: O(1). Hence, the total time is O(n^2).
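The same recurrence, memoized and runnable:

```python
from functools import lru_cache

def longest_palindromic_subsequence(x):
    @lru_cache(maxsize=None)
    def L(i, j):
        # Length of the longest palindromic subsequence of x[i..j].
        if i > j:
            return 0
        if i == j:
            return 1
        if x[i] == x[j]:
            return 2 + L(i + 1, j - 1)
        return max(L(i + 1, j), L(i, j - 1))

    return L(0, len(x) - 1)
```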
§ Optimal binary search trees
Find the best BST for a set of keys. We have weights for the keys, which
are search probabilities. Find a BST T (there are exponentially many BSTs)
that minimizes Σ_i w_i · (depth_T(k_i) + 1). The depth of the root is 0; the
depth of a node is its distance from the root. This minimizes the expected
search cost.
§ Enumeration
We have exponentially many trees.
§ Greedy solution / Why doesn't greedy work?
Assume the keys are sorted.
Pick the root k[r] in some greedy fashion (e.g. max weight).
This immediately splits the set of keys into a left and a right part.
Define e(i, j) to be the cost of the optimal BST on keys k[i], ..., k[j].
greedy:
-------
2|w=10
1|w=1 4|w=9
3|w=8
optimal:
-------
3|w=8
2|w=10 4|w=9
1|w=1
§ DP
- Guess all possible root nodes. The greedy algorithm doesn't try to guess the root node, that's the only difference.
# e(i, j): cost of tree with keys k[i] ... k[j]
def e(i, j):
  if i > j: return 0
  if i == j: return w[i]
  return min([e(i, r-1) + e(r+1, j) + w[r]
              for r in range(i, j+1)])
But the weights change when we increase the depth: hanging the two subtrees
under the root pushes every key in them one level deeper, so we actually need
to add all the weights from i to j. So we write:
# e(i, j): cost of tree with keys k[i] ... k[j],
# where weight(i, j) = w[i] + ... + w[j]
def e(i, j):
  if i > j: return 0
  if i == j: return w[i]
  return min([e(i, r-1) + e(r+1, j) + weight(i, j)
              for r in range(i, j+1)])
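A memoized, runnable version of the corrected recurrence, with weight(i, j) computed via prefix sums. On the example weights above (1, 10, 8, 9) it returns 49, the cost of the optimal tree rooted at the key with weight 8, beating the greedy tree's cost of 54:

```python
from functools import lru_cache
from itertools import accumulate

def optimal_bst_cost(w):
    # w[i] = search weight of the i-th key (keys assumed sorted).
    n = len(w)
    prefix = [0] + list(accumulate(w))

    def weight(i, j):
        # Sum of w[i..j] in O(1).
        return prefix[j + 1] - prefix[i]

    @lru_cache(maxsize=None)
    def e(i, j):
        # Cost of the optimal BST on keys i..j; guess the root r.
        if i > j:
            return 0
        return min(e(i, r - 1) + e(r + 1, j)
                   for r in range(i, j + 1)) + weight(i, j)

    return e(0, n - 1)
```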
§ Alternating coins
We have a list of an even number of coins.
We can only pick coins from the two ends.
The first player can always avoid losing. What the first player does
is compute v[1] + v[3] + ... + v[n-1] versus v[2] + v[4] + ... + v[n].
Say the odd positions have the larger total; then P1 picks v[1]. P2 can only
pick v[2] or v[n], both even positions. P1 then picks v[3] or v[n-1],
depending on whether P2 picked v[2] or v[n]. Continuing this way, P1 collects
exactly the odd-position coins.
- We now want to maximize the amount of money.
v(i, j) = max([ v[i] + (range (i+1, j) with P2 to move),
                v[j] + (range (i, j-1) with P2 to move) ])
In the subproblem v(i+1, j) with the opponent to move, the opponent leaves us
with min(v(i+1, j-1), v(i+2, j)). Unfolding this gives the full DP:
v(i, j) = max([ min(v(i+1, j-1), v(i+2, j)) + v[i],
                min(v(i+1, j-1), v(i, j-2)) + v[j] ])
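A memoized sketch of this game recurrence, with the base cases for empty and single-coin ranges assumed:

```python
from functools import lru_cache

def max_money(v):
    n = len(v)

    @lru_cache(maxsize=None)
    def best(i, j):
        # Max money we can collect from coins v[i..j] when it's our move.
        if i > j:
            return 0
        if i == j:
            return v[i]
        # After our pick, the opponent moves to minimize what we get later.
        take_left = v[i] + min(best(i + 2, j), best(i + 1, j - 1))
        take_right = v[j] + min(best(i + 1, j - 1), best(i, j - 2))
        return max(take_left, take_right)

    return best(0, n - 1)
```

For [1, 2, 3, 4]: we take 4, the opponent takes 3, we take 2, they take 1, so we collect 6.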
§ DP: All pairs shortest paths.