A Universe of Sorts

§ Amortized analysis

Erik Demaine: amortization --- amortized analysis

§ Table doubling

Expected cost of hash lookup: $O(1 + n/m)$ where $n$ items in table of size $m$ ( $n/m$ is the load factor)
Total cost for $n$ insertions: $2^0 + 2^1 + \dots + 2^{\log n} \simeq O(n)$ .
Amortized cost per operation is $O(1)$ .

§ Aggregate method

We do some sequence of

k

operations. Measure the total cost of the

k

operations and divide by

k

. This defines the amortized cost (Weak definition).

§ Generalized definition (amortized cost)

assign some cost for each operation, called the amortized cost, such that it "preserves the sum" of the cost for all operation sequences op[i].

\begin{aligned} \sum \texttt{amortized-cost}(op[i]) \geq \sum \texttt{real-cost}(op[i]) \end{aligned}

The cost that obeys this inequality is called as the amortized cost .

§ 2/3 trees:

O(1) to create
O(log n) to insert (amortized)
O(0) to delete? (amortized) We can bound the deletion cost by the insertion cost, because we can't delete more items than we have inserted! We can bound the delete cost by the insert cost .

c creation time, i insertions, d deletions.
Let $n^\star$ be the largest size of the tree we've encountered in this sequence of operations. This way, we are really bounding the worst case.
The real cost is $(c + i \log n^\star + d \log n^\star)$ . Let's try to show an amortized bound with O(0) to delete!.

\begin{aligned} &O(c + i \log n^\star + d \log n^\star) \\ &= O(c + (i + d) \log n^\star) \\ &[\text{$d \leq i$ since we can delete at most number of items}] \\ &\leq O(c + 2i \log n^\star) \\ &\leq O(c + i \log n^\star) &\leq O(c + i \log n^\star + 0d) \end{aligned}

§ Accounting method

Define a time bank account. An operation can store time credit in that bank account. Bank account must always have non-negative time balance. Performing operations costs time. So we can either directly pay for operations, or we can pull money from the time bank account. We can pay for time using the stored credit in the bank.

§ Sid note

On the whole I always find this confusing; Why would I have a bank account if I can also simultaneously pay with an infinite amount of money? So I prefer to think of it as me having (1) an infinitely large, slow, reservoir of gold, (2) a quick cache of gold [which replaces the "bank" ]. To pay for an operation, I can only access money from my quick cache, because pulling money from my slow reservoir is slow. I can transfer money from my infinitely large reservoir to my quick cache of gold. The amortized method calculates the total amount of gold that was pulled from the reservoir and stored into the cache. We might have leftover gold in the quick cache (upper bound). We can't have "negative gold" in the cache. We define the amortized cost to be the total number of gold coins pulled out of infinite reservoir by an operation.

§ Accounting example: 2/3 trees

When we insert, I pull

2 \log(n^\star)

from my reservoir. I use

1 \log(n^\star)

to pay for the insertion, and keep a reserve of

\log(n^\star)

in the bank. Then when we delete an element, we use the

\log(n^\star)

extra we had kept in the cache from the insert.

§ Better bounds: removing the star

We want to say that we pay

\log(n)

for insert and delete where

n

is the size of the tree when we perform the insert or delete. Per insert, we pull in two gold coins worth

\log(n)

from the reservoir into the cache. When we delete, we use the

\log(n)

in the cache from the money that was withdrawn when we created that element. See that the

n

changes as the size of the data structure changes.

§ Table doubling via accounting

When we insert a item into a table, withdraw $c + O(1)$ value from the reservoir. Keep gold coin worth $c$ that was in the cache on the item that was inserted . So imagine a real gold coin worth $c$ floating above the item.

When we double the size of the table, use up as many old coins as possible, and then withdraw the rest of the cost from the infinite gold reservoir. Also withdraw to keep coins on the newly inserted items.

So by the time we double, half of the elements have coins, other half don't. At the end, I'm going to have $n/2$ coins. The amortized cost of doubling is going to be $O(n) - cn/2$ which is going to be zero if $c$ is large. since I will have $cn/2$ coins in my cache of gold.

Insert costs $O(1 + c) = O(c)$ since we need to pull out those many coins from the infinite gold reservoir.

§ Charging method

Time travel/blame the past for your mistakes. We allow operations to charge their cost retroactively to their past (not to their future!). An operation can withdraw more value that it immediately needs from the infinite reservoir into the gold cache, and it keeps it "for itself" for future-itself.

amortized cost = total gold pulled out of infinite reservoir.

§ Charging example: Table doubling

After we double the table, the table is half-full, we need to perform

n/2

insertions to get the table to be full. When we double the array next time, we charge the doubling to the insertions since the last doubling! There are

n/2

items since the last doubling (I had 2 items, I doubled to 4 items, so there are n/2 new items. Now I want to double and add 4 more items). I have a cost of O(n) for the doubling. So I can charge O(1) cost to all the new items since the last doubling. Note that I only charge once; after I've charged items, I've doubled the array, so they are now "old".

§ Potential method (Most powerful) / Defining karma:

We define a potential function

\phi

mapping a data-structure-configuration to an natural number pile of gold coins. It tries to measure how bad the datastructure is right now. We can pull money out of the pile of money in the potential. Amortized cost is the actual cost plus the change in the amount of money in the data structure configuration after and before:

\texttt{amortized(op)} = \texttt{real(op)} + \phi(\texttt{after(op)}) - \phi(\texttt{before(op)})

Adding up all the amortized costs, the sum telescopes, giving us

\texttt{amortized(total)} = \texttt{real(total)} + \phi(\texttt{end}) - \phi(\texttt{begin})

We really like to have $\phi(\texttt{begin}) = 0$ .

§ Potential example: binary counter

flipping a bit is

O(1)

. How much will increment cost? An increment is going to cost 1 + the number of trailing ones. We want to make this constant .

What is making this bad? the number of trailing ones? Consider 11110 which has zero trailing ones. If I increment it I get 11111, which has 5 trailing ones. So I need to pull on 5 gold coins, which is O(n). No good, I want O(1). Though this is the natural thing to try, it doesn't quite work.

Rather, we can define $\phi$ to be the total number of 1 bits. Whenever I increment, at maximum, I can add one 1. If a number has t trailing bits, then on incrementing, I destroy t one bits, and add a single one-bit. eg: 0110 + 1 = 1000.

The amortized cost is: 1 + t (actual cost). The change in potential is: t - 1 [lost t 1s, gained 1 one ]. So the amortized cost is total cost - change in potential, which is 1 + t - (t - 1) = 2, a constant.