§ Amortized analysis
§ Table doubling
- Expected cost of hash lookup: O(1+n/m) where n items in table of size m( n/m is the load factor)
- Total cost for n insertions: 20+21+⋯+2logn≃O(n).
- Amortized cost per operation is O(1).
§ Aggregate method
We do some sequence of k operations. Measure the total cost of the k
operations and divide by k. This defines the amortized cost (Weak definition).
§ Generalized definition (amortized cost)
- assign some cost for each operation, called the amortized cost, such that it "preserves the sum" of the cost for all operation sequences
op[i]
.
∑amortized-cost(op[i])≥∑real-cost(op[i])
- The cost that obeys this inequality is called as the amortized cost .
§ 2/3 trees:
-
O(1)
to create -
O(log n)
to insert (amortized) -
O(0)
to delete? (amortized) We can bound the deletion cost by the insertion cost, because we can't delete more items than we have inserted! We can bound the delete cost by the insert cost .
-
c
creation time, i
insertions, d
deletions. - Let n⋆ be the largest size of the tree we've encountered in this sequence of operations. This way, we are really bounding the worst case.
- The real cost is (c+ilogn⋆+dlogn⋆). Let's try to show an amortized bound with
O(0)
to delete!.
O(c+ilogn⋆+dlogn⋆)=O(c+(i+d)logn⋆)[d≤i since we can delete at most number of items]≤O(c+2ilogn⋆)≤O(c+ilogn⋆)≤O(c+ilogn⋆+0d)
§ Accounting method
Define a time bank account. An operation can store time credit in that bank account.
Bank account must always have non-negative time balance. Performing operations
costs time. So we can either directly pay for operations, or we can pull money
from the time bank account. We can pay for time using the stored credit in the
bank.
§ Sid note
On the whole I always find this confusing; Why would I have a bank account
if I can also simultaneously pay with an infinite amount of money? So I prefer
to think of it as me having (1) an infinitely large, slow, reservoir of gold, (2)
a quick cache of gold [which replaces the "bank" ]. To pay for an operation,
I can only access money from my quick cache, because pulling money from my slow
reservoir is slow. I can transfer money from my infinitely large reservoir
to my quick cache of gold. The amortized method calculates the total amount of
gold that was pulled from the reservoir and stored into the cache.
We might have leftover gold in the quick cache (upper bound).
We can't have "negative gold" in the cache.
We define the amortized cost to be the total number of gold coins
pulled out of infinite reservoir by an operation.
§ Accounting example: 2/3 trees
When we insert, I pull 2log(n⋆) from my reservoir. I use 1log(n⋆)
to pay for the insertion, and keep a reserve of log(n⋆) in the bank.
Then when we delete an element, we use the log(n⋆) extra we had kept
in the cache from the insert.
§ Better bounds: removing the star
We want to say that we pay log(n) for insert and delete
where n is the size of the tree when we perform the insert or delete.
Per insert, we pull in two gold coins worth log(n) from the reservoir
into the cache. When we delete, we use the log(n) in the cache from
the money that was withdrawn when we created that element. See that the n
changes as the size of the data structure changes.
§ Table doubling via accounting
- When we insert a item into a table, withdraw c+O(1) value from the reservoir. Keep gold coin worth c that was in the cache on the item that was inserted . So imagine a real gold coin worth c floating above the item.
- When we double the size of the table, use up as many old coins as possible, and then withdraw the rest of the cost from the infinite gold reservoir. Also withdraw to keep coins on the newly inserted items.
- So by the time we double, half of the elements have coins, other half don't. At the end, I'm going to have n/2 coins. The amortized cost of doubling is going to be O(n)−cn/2 which is going to be zero if c is large. since I will have cn/2 coins in my cache of gold.
- Insert costs O(1+c)=O(c) since we need to pull out those many coins from the infinite gold reservoir.
§ Charging method
Time travel/blame the past for your mistakes. We allow operations to charge
their cost retroactively to their past (not to their future!). An operation
can withdraw more value that it immediately needs from the infinite reservoir
into the gold cache, and it keeps it "for itself" for future-itself.
- amortized cost = total gold pulled out of infinite reservoir.
§ Charging example: Table doubling
After we double the table, the table is half-full, we need to perform n/2
insertions to get the table to be full. When we double the array next time,
we charge the doubling to the insertions since the last doubling!
There are n/2 items since the last doubling (I had 2
items, I doubled to 4
items,
so there are n/2
new items. Now I want to double and add 4
more items).
I have a cost of O(n)
for the doubling. So I can charge O(1)
cost to
all the new items since the last doubling. Note that I only charge once; after
I've charged items, I've doubled the array, so they are now "old".
§ Potential method (Most powerful) / Defining karma:
We define a potential function ϕ mapping a data-structure-configuration
to an natural number pile of gold coins. It tries to measure how bad the datastructure
is right now. We can pull money out of the pile of money in the potential.
Amortized cost is the actual cost plus the change in the amount of money in the
data structure configuration after and before:
amortized(op)=real(op)+ϕ(after(op))−ϕ(before(op))
Adding up all the amortized costs, the sum telescopes, giving us
amortized(total)=real(total)+ϕ(end)−ϕ(begin)
- We really like to have ϕ(begin)=0.
§ Potential example: binary counter
flipping a bit is O(1). How much will increment cost? An increment is going
to cost 1 + the number of trailing ones. We want to make this constant .
- What is making this bad? the number of trailing ones? Consider
11110
which has zero trailing ones. If I increment it I get 11111
, which has 5 trailing ones. So I need to pull on 5 gold coins, which is O(n)
. No good, I want O(1)
. Though this is the natural thing to try, it doesn't quite work.
- Rather, we can define ϕ to be the total number of 1 bits. Whenever I increment, at maximum, I can add one
1
. If a number has t
trailing bits, then on incrementing, I destroy t
one bits, and add a single one-bit. eg: 0110 + 1 = 1000
.
- The amortized cost is:
1 + t
(actual cost). The change in potential is: t - 1
[lost t
1s, gained 1
one ]. So the amortized cost is total cost - change in potential, which is 1 + t - (t - 1) = 2
, a constant.