Central idea: assume a memory model where computation is free; the only cost is pulling data from main memory into cache. The cache has total size M and holds blocks of size B, so it can hold M/B blocks of main memory. Main memory has infinite size. Cost is the number of block transfers. We assume that the algorithm does not know M or B. We assume that the cache replacement strategy is optimal (kick out the block that will be used farthest in the future). This is an OK assumption to make, since an LRU cache with twice the memory of the "oracular" cache incurs at most twice as many misses (Sleator–Tarjan 1985, "Amortized efficiency of list update and paging rules"). These data structures are cool since they essentially "adapt" to varying cache parameters, and even to multi-level cache hierarchies, since the analysis applies to every level at once. We study how to build cache-oblivious B-trees.
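For intuition, here are back-of-envelope transfer costs of some classic operations in this model (a sketch of my own using the standard external-memory bounds, not from the source):

    import math

    # Rough transfer counts in the external-memory model (standard bounds).
    def scan_cost(N, B):
        # Reading N contiguous items pulls ceil(N/B) blocks (+1 for alignment).
        return math.ceil(N / B) + 1

    def binary_search_cost(N, B):
        # Binary search on a sorted array: probes are far apart until the
        # search interval shrinks to ~B, so ~log2(N/B) transfers.
        return max(1, math.ceil(math.log2(N / B)))

    def btree_cost(N, B):
        # A B-tree with fanout ~B: one transfer per level, log_B(N) levels.
        return math.ceil(math.log(N, B))

    N, B = 1 << 20, 1 << 7
    print(scan_cost(N, B), binary_search_cost(N, B), btree_cost(N, B))
    # -> 8193 13 3: the goal below is the B-tree bound without knowing B.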
§ Building optimal cache-oblivious B-trees to solve search
We use a balanced BST. We want to find an order in which to store the nodes in memory such that when we search for an element, we minimize the number of blocks we need to pull in.
All standard orders, such as level order, pre-order, and post-order, fail.
The correct order is the "vEB (van Emde Boas) order": carve the tree at the middle level of its edges, yielding a top "triangle" (small subtree) and a collection of bottom triangles. Lay out the top triangle contiguously in memory, then recursively lay out each bottom triangle, one after another.
If the number of nodes is N, the top triangle has roughly √N nodes, and there are roughly √N bottom triangles, each with roughly √N nodes.
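A minimal sketch of the layout recursion (my own illustration, assuming a complete binary tree stored with heap indices: root 1, children 2i and 2i+1):

    def veb_order(h, root=1):
        # vEB order of a complete binary tree of height h rooted at heap
        # index `root` (a tree of height h has 2^h - 1 nodes).
        if h == 1:
            return [root]
        top = h // 2                        # carve at the middle level
        order = veb_order(top, root)        # lay out the top triangle first
        # Roots of the bottom triangles: the 2^top nodes `top` levels below.
        for r in range(root << top, (root + 1) << top):
            order += veb_order(h - top, r)  # then each bottom triangle
        return order

For example, veb_order(4) gives [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]: the top triangle {1, 2, 3} first, then each of the four bottom triangles stored contiguously.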
§ Analysis. Claim: we need to pull O(log_B N) blocks, for any B, for any search query
N is the number of nodes in the BST. Note that in the analysis we know what B is, even though the algorithm does not.
We look at a particular level of recursion: the "level of detail" straddling B.
We will have large triangles of size ≥ B, inside which there are smaller triangles of size ≤ B (reminds me of Sierpinski).
We know that the algorithm lays the tree out recursively, and each triangle stores everything "inside" it in a contiguous region of memory. So we stop at the level of detail where the little triangles first fit within a block, i.e. have at most B nodes.
A little triangle of size at most B lives in at most two memory blocks, by straddling a block boundary: e.g. having (B−1) nodes in one block and a single node in the next.
1 2 3 4 5 6 7 8   <- index
|       |       | <- block boundaries (B = 4)
  x x x x         <- data: a triangle with B−1 = 3 nodes in one block, 1 in the next
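The two-block claim is easy to sanity-check exhaustively (my own snippet, not from the source):

    B = 8
    for start in range(4 * B):              # try many starting offsets
        for length in range(1, B + 1):      # contiguous runs of 1..B slots
            blocks = {p // B for p in range(start, start + length)}
            # A run of <= B slots can cross at most one block boundary.
            assert len(blocks) <= 2
    print("ok: any run of <= B positions spans at most 2 blocks")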
The question is: on a root-to-leaf path, how many such little triangles do we need to visit? Since we repeatedly halve the height until a little triangle has at most B nodes, each little triangle has height Θ(log B): at most log B, since it is a binary tree with ≤ B nodes, and at least (1/2) log B, since one level of recursion earlier the triangle still had more than B nodes.
The total height of the tree is O(log N). So the height of the "chunked tree", where we view each little triangle as a single node, is O(log N / log B) = O(log_B N). More precisely, a root-to-leaf path crosses at most log N / ((1/2) log B) = 2 log_B N little triangles, each costing at most 2 block transfers, for at most 4 log_B N = O(log_B N) transfers total, proving the claim.
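To see the bound concretely, here is a small simulation (mine, reusing the veb_order sketch from above): lay out a complete tree of height 20 in vEB order and count the distinct size-B blocks a random root-to-leaf search touches:

    import math, random

    def veb_order(h, root=1):               # as in the sketch above
        if h == 1:
            return [root]
        top = h // 2
        order = veb_order(top, root)
        for r in range(root << top, (root + 1) << top):
            order += veb_order(h - top, r)
        return order

    h, B = 20, 64
    pos = {node: i for i, node in enumerate(veb_order(h))}  # node -> memory slot

    node, path = 1, []
    for _ in range(h):                      # random root-to-leaf walk
        path.append(pos[node])
        node = 2 * node + random.randint(0, 1)

    touched = len({p // B for p in path})   # distinct blocks on the path
    print(f"touched {touched} blocks; 4 log_B N ~= {4 * h / math.log2(B):.1f}")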
Insight: our data structure's construction in some sense permits us to "binary search on B", since the recursion divides the data structure into levels of detail for every scale at once, and the analysis just picks the level that straddles B. If B = N, the full data structure fits into a single block and we're good.
Note that when we perform a post-order traversal inside a triangle that contains 3 little triangles of size ≤ B (a top triangle and two bottom triangles), we need to alternate between the parent (top) triangle and a child (bottom) triangle. Since the parent triangle has size ≤ B, it occupies at most 2 blocks of memory; similarly, each child occupies at most 2 blocks.
So if our cache can hold 4 blocks of memory, we're done: we won't need to kick anything out when performing the post-order traversal.
For levels that are above the bottom 2 levels, we're still OK: there are not many triangles / not many nodes! (1:16:00 in the video)