§ Efficient tree transformations on GPUs (TODO)

All material lifted straight from Aaron Hsu's PhD thesis. I'll be converting APL notation to C++-like notation.

§ Tree representation as multi-dimensional ragged nested arrays

We're interested in this tree:
      ∘
┌──┬──┴────┐
a  b       c
│ ┌┴┐  ┌───┼───┐
p q r  s   t   u
  │    │   |
  │   ┌┴┐ ┌┴┐
  v   w x y z
I'll be writing APL commands after a $ to mimic bash, and I'll write some arrays across multiple lines; to run them, collapse them into a single line. The ast object is represented in memory as:
$ ast ← ('∘'
           ('a' ('p'))
           ('b'
             ('q' ('v'))
             ('r'))
           ('c'
             ('s' ('w' 'x'))
             ('t' ('y' 'z'))
             ('u')))
$ ]disp ast
┌→┬──┬────────┬───────────────────┐
│∘│ap│┌→┬──┬─┐│┌→┬──────┬──────┬─┐│
│ │  ││b│qv│r│││c│┌→┬──┐│┌→┬──┐│u││
│ │  │└─┴─→┴─┘││ ││s│wx│││t│yz││ ││
│ │  │        ││ │└─┴─→┘│└─┴─→┘│ ││
│ │  │        │└─┴─────→┴─────→┴─┘│
└─┴─→┴───────→┴──────────────────→┘
Here's how to read the array representation. Look at the top level of the tree. We have a root node with three children:
      ∘
┌──┬──┴────┐
a  b       c

┌→┬──┬────────┬─────────────┐
│∘│  │        │             │
│ │ a│   b    │     c       │
│ │  │        │             │
└─┴─→┴───────→┴────────────→┘
With the first cell being the root node, and the three adjacent cells being the children. Next, we look at how a is represented. This is predictably recursive. Let's see the subtree under a:
      ∘
┌──┬──┴────┐
a  b       c
│
p

┌→┬──┬────────┬─────────────┐
│∘│ap│        │             │
│ │  │  b     │   c         │
│ │  │        │             │
└─┴─→┴───────→┴────────────→┘

Similarly for b:
      ∘
┌──┬──┴────┐
a  b       c
│ ┌┴┐
p q r

┌→┬──┬────────┬─────────────┐
│∘│ap│┌→┬──┬─┐│             │
│ │  ││b│q │r││   c         │
│ │  │└─┴─→┴─┘│             │
└─┴─→┴───────→┴────────────→┘
And so on, leading to the final representation:
      ∘
┌──┬──┴────┐
a  b       c
│ ┌┴┐  ┌───┼───┐
p q r  s   t   u
  │    │   |
  │   ┌┴┐ ┌┴┐
  v   w x y z
┌→┬──┬────────┬───────────────────┐
│∘│ap│┌→┬──┬─┐│┌→┬──────┬──────┬─┐│
│ │  ││b│qv│r│││c│┌→┬──┐│┌→┬──┐│u││
│ │  │└─┴─→┴─┘││ ││s│wx│││t│yz││ ││
│ │  │        ││ │└─┴─→┘│└─┴─→┘│ ││
│ │  │        │└─┴─────→┴─────→┴─┘│
└─┴─→┴───────→┴──────────────────→┘
Note that for this representation to work, we need to be able to:
  • nest arrays inside arrays.
  • have subarrays of different sizes (ragged arrays)
  • of different nesting depths --- so it's really not even an array?
I don't understand the memory layout of this, to be honest. I feel like to represent this in memory would still rely on pointer-chasing, since we need to box all the arrays. This is possibly optimised by APL to not be too bad.

§ The depth vector representation

      ∘             0
┌──┬──┴────┐
a  b       c        1
│ ┌┴┐  ┌───┼───┐
p q r  s   t   u    2
  │    │   |
  │   ┌┴┐ ┌┴┐
  v   w x y z       3
If we visit this tree and record depths in pre-order (node left right), we arrive at the list:
(∘:0
  (a:1 (p:2)) (b:1 (q:2 (v:3)) (r:2))
  (c:1 (s:2 (w:3 x:3)) (t:2 (y:3 z:3)) (u:2)))
formatted as:
(∘:0
  (a:1
    (p:2))
  (b:1
    (q:2 (v:3))
    (r:2)
  )
  (c:1 (s:2 (w:3 x:3))
       (t:2 (y:3 z:3))
       (u:2))
)
Linearized, this is the list:
    (∘ a p b q v r c s w x t y z u)
d ← (0 1 2 1 2 3 2 1 2 3 3 2 3 3 2)

      ∘             0
┌──┬──┴────┐
a  b       c        1
│ ┌┴┐  ┌───┼───┐
p q r  s   t   u    2
  │    │   |
  │   ┌┴┐ ┌┴┐
  v   w x y z       3
To convert the ast object into a depth vector representation, we can use the following call:
$ ast ← ('∘' ('a' ('p')) ('b' ('q' ('v')) ('r')) ('c' ('s' ('w' 'x')) ('t' ('y' 'z')) ('u')))
$ d ← ∊0{(⊢,(⍺+1)∇⊣)/⌽⍺,1↓⍵}ast
0 1 2 1 2 3 2 1 2 3 3 2 3 3 2
Let's break this down: TODO
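Until the APL breakdown lands, here is a Python sketch (the nested-tuple encoding and names are mine) of what the incantation computes: a pre-order walk that emits each node's depth.

```python
# Pre-order walk over a nested-tuple AST, emitting each node's depth.
# Mirrors what d ← ∊0{(⊢,(⍺+1)∇⊣)/⌽⍺,1↓⍵}ast computes.
def depth_vector(node, depth=0):
    # node = (label, child, child, ...): children start at index 1
    return [depth] + [x for c in node[1:] for x in depth_vector(c, depth + 1)]

ast = ('∘',
       ('a', ('p',)),
       ('b', ('q', ('v',)), ('r',)),
       ('c', ('s', ('w',), ('x',)),
             ('t', ('y',), ('z',)),
             ('u',)))
print(depth_vector(ast))  # [0, 1, 2, 1, 2, 3, 2, 1, 2, 3, 3, 2, 3, 3, 2]
```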

§ Inverted tables

We represent data associated with our nodes as follows:
$ data ← ⍪ ¨d(15⍴'T')(↑15⍴⊂'n.')
$ ]disp data
┌→┬─┬──┐
│0│T│n.│
│1│T│n.│
│2│T│n.│
│1│T│n.│
│2│T│n.│
│3│T│n.│
│2│T│n.│
│1│T│n.│
│2│T│n.│
│3│T│n.│
│3│T│n.│
│2│T│n.│
│3│T│n.│
│3│T│n.│
│2↓T↓n.↓
└→┴→┴─→┘
This is the same thing as a structure of arrays (SoA) representation, where each column of information (eg, the depth column, the T column) is a flat array which can be accessed well with SIMD instructions.
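In C-family terms, the inverted table is just parallel flat columns, with index i selecting node i in every column. A tiny Python sketch (field names are mine):

```python
# Structure of arrays: one flat, homogeneous column per field.
# Row i of the inverted table is recovered by indexing every column at i.
depths = [0, 1, 2, 1, 2, 3, 2, 1, 2, 3, 3, 2, 3, 3, 2]
types  = ['T'] * 15
names  = ['n.'] * 15

def row(i):
    return (depths[i], types[i], names[i])

print(row(5))  # (3, 'T', 'n.')
```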

§ AST representation

TODO

§ Path matrices

We want to know how to go up and down the tree in, ideally, constant time. We store this information in what is known as a path matrix. For our recurring example, the path matrix is:
∘ a p b q v r c s w x t y z u | preorder traversal
──────────────────────────────────────────────────
∘ ∘ ∘ ∘ ∘ ∘ ∘ ∘ ∘ ∘ ∘ ∘ ∘ ∘ ∘ | depth=0
- a a b b b b c c c c c c c c | depth=1
- - p - q q r - s s s t t t u | depth=2
- - - - - v - - - w x - y z - | depth=3

      ∘             0
┌──┬──┴────┐
a  b       c        1
│ ┌┴┐  ┌───┼───┐
p q r  s   t   u    2
  │    │   |
  │   ┌┴┐ ┌┴┐
  v   w x y z       3
To efficiently compute this, we first replace every value in our tree with its preorder traversal visit time. This changes the tree to:
              ∘:0                               0
┌──────────┬──┴─────────────────┐
a:1        b:3                 c:7              1
│      ┌───┴───┐     ┌──────────┼───────┐
p:2    q:4     r:6   s:8        t:11    u:14    2
       │             │          │
       │          ┌──┴──┐     ┌─┴───┐
       v:5        w:9   x:10  y:12  z:13        3
The values we store in the tree are the integers. The old labels are retained only for clarity. The path matrix for this tree is:
0  1  2  3  4  5  6  7  8  9 10 11 12 13 14  | preorder traversal
────────────────────────────────────────────────────────────
0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  | depth=0
-  1  1  3  3  3  3  7  7  7  7  7  7  7  7  | depth=1
-  -  2  -  4  4  6  -  8  8  8 11 11 11 14  | depth=2
-  -  -  -  -  5  -  -  -  9 10  - 12 13  -  | depth=3

              ∘:0                               0
┌──────────┬──┴─────────────────┐
a:1        b:3                 c:7              1
│      ┌───┴───┐     ┌──────────┼───────┐
p:2    q:4     r:6   s:8        t:11    u:14    2
       │             │          │
       │          ┌──┴──┐     ┌─┴───┐
       v:5        w:9   x:10  y:12  z:13        3

§ Rendering the depth information in 2D

We use the incantation:
$ ⎕IO ← 0 ⍝ use 0-indexing
$ d ← (0 1 2 1 2 3 2 1 2 3 3 2 3 3 2)
$ ((⍳≢d)@(d,¨⍳≢d)) ((⌈/d+1) (≢d))⍴'-'
0 - - - - - - - - -  -  -  -  -  -
- 1 - 3 - - - 7 - -  -  -  -  -  -
- - 2 - 4 - 6 - 8 -  - 11  -  - 14
- - - - - 5 - - - 9 10  - 12 13  -
Let's break this down (the ⍝ symbol means a lamp, for commenting/illumination):
$ ⍳ 3 ⍝ iota: make a list of n elements.
0 1 2
$ d
0 1 2 1 2 3 2 1 2 3 3 2 3 3 2

$ ≢d ⍝ tally: count no. of elements in d
15
$ ⍳≢d  ⍝ indexes [0..len(d)-1], one per element of d.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
$ ]disp (1 2 3),(4 5 6) ⍝ ,:concatenate
┌→────┬─────┐
│1 2 3│4 5 6│
└~───→┴~───→┘
$ ]disp (1 2 3) ,¨ (4 5 6)
┌→──┬───┬───┐
│1 4│2 5│3 6│
└~─→┴~─→┴~─→┘
The use of ¨ needs some explanation. ¨ is a higher-order function which takes a function and makes a mapped version of it. So, ,¨ is a function which maps the concatenation operator. Now, given two arrays (1 2 3) and (4 5 6), (1 2 3) ,¨ (4 5 6) runs , on each pair: 1 and 4, 2 and 5, 3 and 6. This gives us the tuples ((1 4) (2 5) (3 6)). So, for our purposes, zip ← ,¨.
$ ]disp (d,¨⍳≢d) ⍝ zip d with [0..len(d)-1].
┌→──┬───┬───┬───┬───┬───┬───┬───┬───┬───┬────┬────┬────┬────┬────┐
│0 0│1 1│2 2│1 3│2 4│3 5│2 6│1 7│2 8│3 9│3 10│2 11│3 12│3 13│2 14│
└~─→┴~─→┴~─→┴~─→┴~─→┴~─→┴~─→┴~─→┴~─→┴~─→┴~──→┴~──→┴~──→┴~──→┴~──→┘
$ ((⌈/d+1) (≢d))⍴'-' ⍝ array of dim (1 + max val in d) x (no. of elem in d)
---------------
---------------
---------------
---------------
  • ⌈ is the maximum operator and / is the fold operator, so ⌈/d finds the maximum in d, and ⌈/d+1 is one more than that (one row per depth). Recall that (≢d) finds the no. of elements in d. ⍴ reshapes an array to the desired size. We pass it the scalar -, which gets reshaped into a (⌈/d+1) x (≢d) sized array of - symbols.
TODO: explain @ and its use
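A Python sketch of the same scatter (assuming 0-indexing): allocate a grid of '-' with one row per depth, then write each node's preorder index i at row d[i], column i.

```python
d = [0, 1, 2, 1, 2, 3, 2, 1, 2, 3, 3, 2, 3, 3, 2]

# ((⌈/d)+1) x (≢d) grid of '-' ...
grid = [['-'] * len(d) for _ in range(max(d) + 1)]

# ... then the scatter ((⍳≢d)@(d,¨⍳≢d)): write i at row d[i], column i.
for i, dep in enumerate(d):
    grid[dep][i] = str(i)

for r in grid:
    print(' '.join(f'{c:>2}' for c in r))
```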

§ Creating the path matrix

$ ⎕IO ← 0 ⍝ (inform APL that we wish to use 0-indexing.)
$ d ← (0 1 2 1 2 3 2 1 2 3 3 2 3 3 2)
$ PM ← ⌈\((⍳≢d)@(d,¨⍳≢d))(((⌈/d+1)(≢d))⍴0)

0 0 0 0 0 0 0 0 0 0  0  0  0  0  0
0 1 1 3 3 3 3 7 7 7  7  7  7  7  7
0 0 2 2 4 4 6 6 8 8  8 11 11 11 14
0 0 0 0 0 5 5 5 5 9 10 10 12 13 13

      0               0
┌──┬──┴───────┐
1  3          7       1
│ ┌┴┐  ┌──────┼───┐
2 4 6  8     11   14  2
  │    │      |
  │   ┌┴─┐   ┌┴──┐
  5   9  10  12  13   3
The incantation can be broken down into:
  • (((⌈/d+1)(≢d))⍴0) is used to create a (max(d)+1) x |d| dimension array of zeros. Here, the rows correspond to depths, and the columns correspond to tree nodes, which for us are their preorder indexes.
$ grid←(⌈/d+1) (≢d) ⍴ 0
$ grid
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  • ((d ,¨ ⍳≢d)) creates an array of pairs (depth, preindex). We will use this to fill index (d, pi) with the value pi.
$ writeixs ← (d,¨⍳≢d)
$ ]disp writeixs
┌→──┬───┬───┬───┬───┬───┬───┬───┬───┬───┬────┬────┬────┬────┬────┐
│0 0│1 1│2 2│1 3│2 4│3 5│2 6│1 7│2 8│3 9│3 10│2 11│3 12│3 13│2 14│
└~─→┴~─→┴~─→┴~─→┴~─→┴~─→┴~─→┴~─→┴~─→┴~─→┴~──→┴~──→┴~──→┴~──→┴~──→┘
  • ixgrid ← ((⍳≢d)@writeixs) grid writes the value (⍳≢d)[i] = i at index writeixs[i].
$ ixgrid ← ((⍳≢d)@writeixs) grid
$ ixgrid
0 0 0 0 0 0 0 0 0 0  0  0  0  0  0
0 1 0 3 0 0 0 7 0 0  0  0  0  0  0
0 0 2 0 4 0 6 0 8 0  0 11  0  0 14
0 0 0 0 0 5 0 0 0 9 10  0 12 13  0
  • Finally, ⌈ is the maximum operator, and \ is the prefix-scan operator, so ⌈\ixgrid performs a row-wise prefix maximum over the above grid to give us our final path matrix:
$ PM ← ⌈\ixgrid
$ PM
0 0 0 0 0 0 0 0 0 0  0  0  0  0  0
0 1 1 3 3 3 3 7 7 7  7  7  7  7  7
0 0 2 2 4 4 6 6 8 8  8 11 11 11 14
0 0 0 0 0 5 5 5 5 9 10 10 12 13 13
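The same pipeline in Python: scatter preorder indexes into a zero grid, then take a running maximum along each row (the ⌈\ step).

```python
d = [0, 1, 2, 1, 2, 3, 2, 1, 2, 3, 3, 2, 3, 3, 2]

# ((⍳≢d)@writeixs) grid : write i at row d[i], column i of a zero grid.
ixgrid = [[0] * len(d) for _ in range(max(d) + 1)]
for i, dep in enumerate(d):
    ixgrid[dep][i] = i

# ⌈\ : row-wise prefix maximum smears each ancestor to the right.
PM = []
for row in ixgrid:
    acc = 0
    PM.append([acc := max(acc, v) for v in row])

print(PM[1])  # [0, 1, 1, 3, 3, 3, 3, 7, 7, 7, 7, 7, 7, 7, 7]
```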

§ Using the path matrix: distance of a node from every other node.

Note that the maximum distance between two nodes is achieved by climbing all the way to the root node, and then climbing back down:
dmax ← depth(a) + depth(b)
If we know the lowest common ancestor of the two nodes, then the distance of one node to another is:
dcorrect ← dist(a, lca(a, b)) + dist(lca(a, b), b)
Since the lca lies directly above both nodes, dist(a, lca(a, b)) = depth(a) - depth(lca(a, b)), and so:
dcorrect ← dist(a, lca(a, b)) + dist(lca(a, b), b)
 = (depth(a) - depth(lca(a, b))) +
   (depth(b) - depth(lca(a, b)))
 = depth(a) +
   depth(b) +
   -2 * depth(lca(a, b))
[TODO: picture ] [TODO: finish writing this ]
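One way the pieces might fit together (a sketch of my own, not lifted from the thesis excerpt): with the path matrix PM, the lca of a and b is found at the deepest row k ≤ min(d[a], d[b]) where the two columns agree (for k ≤ d[i], PM[k][i] is exactly i's depth-k ancestor), and the distance follows from the formula above.

```python
d = [0, 1, 2, 1, 2, 3, 2, 1, 2, 3, 3, 2, 3, 3, 2]

# Build the path matrix as before: scatter, then row-wise prefix max.
ixgrid = [[0] * len(d) for _ in range(max(d) + 1)]
for i, dep in enumerate(d):
    ixgrid[dep][i] = i
PM = []
for row in ixgrid:
    acc = 0
    PM.append([acc := max(acc, v) for v in row])

def lca_depth(a, b):
    # Deepest row (bounded by both depths, to avoid smeared junk) where
    # the two columns hold the same ancestor.
    for k in range(min(d[a], d[b]), -1, -1):
        if PM[k][a] == PM[k][b]:
            return k

def dist(a, b):
    # depth(a) + depth(b) - 2 * depth(lca(a, b))
    return d[a] + d[b] - 2 * lca_depth(a, b)

print(dist(5, 6))  # v -> q -> b -> r : 3
```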

§ Parent vector representation

A parent vector is a vector of length n where Parent[i] is an index into Parent, pointing at the parent of node i. For our given example, here is the parent vector:
d ← (0  1  2  1  2  3  2  1  2  3  3  2  3  3  2) │ depths
    (∘  a  p  b  q  v  r  c  s  w  x  t  y  z  u) │ values
p ← (∘  ∘  a  ∘  b  q  b  ∘  c  s  s  c  t  t  c) │ parents
    (0  1  2  3  4  5  6  7  8  9 10 11 12 13 14) | indexes
P ← (0  0  1  0  3  4  3  0  7  8  8  7 11 11  7) │ parent indices



              ∘:0                               0
┌──────────┬──┴─────────────────┐
a:1        b:3                 c:7              1
│      ┌───┴───┐     ┌──────────┼───────┐
p:2    q:4     r:6   s:8        t:11    u:14    2
       │             │          │
       │          ┌──┴──┐     ┌─┴───┐
       v:5        w:9   x:10  y:12  z:13        3
The condition a parent vector must satisfy is:
∧/V ∊(⍳≢V) ⍝ [All elements of V belong in the list [0..len(V)-1] ]
  • V ∊ (⍳≢V) will be a list of whether each element in V belongs (∊) to the list (⍳≢V) = [0..len(V)-1]
  • Recall that / is for reduction, and ∧/ is a boolean AND reduction. Hence, we compute whether every element of the vector V is in the range [0..len(V)-1].
  • We add the constraint that root nodes that don't have a parent simply point to themselves. This allows us to free ourselves from requiring some kind of nullptr check.
The root node (ancestor of all elements) can be found using the fixpoint operator (⍣≡):
I←{(⊂⍵)⌷⍺} ⍝ index into the left hand side param using right hand side param
I⍣≡⍨p ⍝ compute the fixpoint of the I operator using ⍨ and apply it to p
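A Python sketch of the fixpoint: repeatedly index the parent vector by itself until nothing changes; every entry then holds the root's index (the self-loop at the root makes this converge without a nullptr check).

```python
P = [0, 0, 1, 0, 3, 4, 3, 0, 7, 8, 8, 7, 11, 11, 7]

# ∧/P∊(⍳≢P) : the parent-vector validity condition.
assert all(v in range(len(P)) for v in P)

q = P
while True:
    nq = [P[i] for i in q]   # one step of I: hop each entry up to its parent
    if nq == q:              # ⍣≡ : iterate until fixpoint
        break
    q = nq

print(q[3], q[10])  # 0 0 — every node resolves to the root
```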

§ Converting from depth vector to parent vector, Take 1

As usual, let's consider our example:
d ← (0  1  2  1  2  3  2  1  2  3  3  2  3  3  2) │ depths
    (∘  a  p  b  q  v  r  c  s  w  x  t  y  z  u) │ values
p ← (∘  ∘  a  ∘  b  q  b  ∘  c  s  s  c  t  t  c) │ parents
    (0  1  2  3  4  5  6  7  8  9 10 11 12 13 14) | indexes
P ← (0  0  1  0  3  4  3  0  7  8  8  7 11 11  7) │ parent indices



              ∘:0                               0
┌──────────┬──┴─────────────────┐
a:1        b:3                 c:7              1
│      ┌───┴───┐     ┌──────────┼───────┐
p:2    q:4     r:6   s:8        t:11    u:14    2
       │             │          │
       │          ┌──┴──┐     ┌─┴───┐
       v:5        w:9   x:10  y:12  z:13        3
Note that the depth vector already encodes parent-child information.
  • The parent of node i is a node j such that d[j] = d[i] - 1 and j is the closest index to the left of i such that this happens.
For example, to compute the parent of t:11, notice that it's at depth 2, so its parent sits at depth d[t]-1 = 1. We should find all the nodes to the left of t (ie, in d[0..10]) which have depth equal to 1, and then pick the rightmost one. This translates to the expression:
$ d ← (0  1  2  1  2  3  2  1  2  3  3  2  3  3  2)
$ t ← 11   ⍝ target node
$ ixs ← ⍳t   ⍝ array indexes upto this node
  0 1 2 3 4 5 6 7 8 9 10
$ d[ixs]   ⍝ depths of nodes to the left of the given node t
  0 1 2 1 2 3 2 1 2 3 3
$ d[ixs]  = d[t]-1  ⍝ boolean vector of nodes whose depth is that of t's parent
  0 1 0 1 0 0 0 1 0 0 0
$ eqds ← ⍸ (d[ixs] = d[t]-1)  ⍝ array indexes of nodes whose depth is that of t's parent
  1 3 7
$ ⌽ eqds ⍝ reverse of array indexes to extract `7`
  7 3 1
$ ⊃ ⌽ eqds ⍝ first of the reverse of the array indexes to extract `7`
  7
$ (⌽⍸(d[⍳t] = d[t]-1))[0]  ⍝ APL style one-liner of the above
  7
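The same steps, sketched as a scalar Python version:

```python
d = [0, 1, 2, 1, 2, 3, 2, 1, 2, 3, 3, 2, 3, 3, 2]
t = 11

# ⍸(d[⍳t] = d[t]-1) : indexes to the left of t at the parent's depth
eqds = [j for j in range(t) if d[j] == d[t] - 1]

# ⊃⌽ : the rightmost such index is the parent
parent = eqds[-1]
print(parent)  # 7
```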
While this is intuitive, it does not scale: it does not permit us to find the parents of all the nodes at once --- ie, it is not parallelisable over choices of t.

§ Converting from depth vector to parent vector, Take 2 (Or scan idiom)

Imagine we have a list of 0s and 1s, and we want to find the index of the rightmost 1 value. For example, given:
       0 1 2 3 4 5 6 7 8 9 10 11 12
$ a ← (0 0 1 0 0 0 1 0 1 0  0  0  0)
we want the answer to be f a = 8. We saw an implementation in terms of f←{(⌽⍸⍵)[0]} in Take 1 (recall that ⍵ is the symbol for the right-hand-side argument of a function). We're going to perform the same operation slightly differently. Let's consider the series of transformations:
⍝      0 1 2 3 4 5 6 7 8 9 10 11 12
$ a ← (0 0 1 0 0 0 1 0 1 0  0  0  0) ⍝ original array

$ ⌽a  ⍝ reverse of a
  0 0 0 0 1 0 1 0 0 0 1 0 0

$ ∨\ ⌽a ⍝ prefix scan(\) using the OR(∨) operator. Turn all
        ⍝ entries after the first 1 into a 1
  0 0 0 0 1 1 1 1 1 1 1 1 1

$ +/ (∨\ ⌽a)  ⍝ sum over the previous list, counting number of 1s
  9

$ ¯1 +  (+/ (∨\ ⌽a))  ⍝ subtract 1 from the previous number
  8
Why the hell does this work? Well, here's the proof (write n for the last index of a, ie n = len(a)-1):
  • On running ⌽a, we reverse a. The last 1 of a, at index i, becomes the first 1 of ⌽a, at index i' = n-i.
  • On running ∨\ ⌽a, entries at and after the first 1 become 1. That is, all indexes j ≥ i' hold a 1.
  • On running +/ (∨\ ⌽a), we sum up all the 1s. This gives us n-i'+1 ones. That is, n-i'+1 = n-(n-i)+1 = i+1.
  • We subtract 1 to correctly recover i from i+1.
This technique also works row-by-row on a matrix. This is paramount, since we can now build one such boolean row per node of the depth vector, and thereby compute every parent index at once!
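In Python, the reverse / or-scan / sum pipeline looks like:

```python
from itertools import accumulate

def last_one_index(a):
    # ¯1 + +/ ∨\ ⌽a : reverse, or-scan, sum, subtract one.
    scan = accumulate(reversed(a), lambda x, y: x | y)  # ∨\ ⌽a
    return sum(scan) - 1                                # ¯1 + +/

a = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0]
print(last_one_index(a))  # 8
```

Note that an all-zero row yields -1 — exactly the sentinel the root produces in Take 3 below.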

§ Converting from depth vector to parent vector, Take 3 (full matrix)

We want to extend the previous method we hit upon to compute the parents of all nodes in parallel. To perform this, we need to run the moral equivalent of the following:
$ ⎕IO ← 0 ⍝ 0 indexing
$ d ← (0  1  2  1  2  3  2  1  2  3  3  2  3  3  2) ⍝ depth vector
$ t ← 11 ⍝ node we are interested in
$ a←d[⍳t]=d[t]-1  ⍝ boolean vector of nodes whose depth is that of t's parent
  0 1 0 1 0 0 0 1 0 0 0
$ ¯1 +  (+/ (∨\ ⌽a)) ⍝ index of last 1 of boolean vector
7
for every single choice of t. To perform this, we can build a 2D matrix of d[⍳t]=d[t]-1 where t ranges over [0..len(d)-1] (ie, over all the nodes in the tree). We begin by using:
$ ⎕IO ← 0 ⋄ d ← (0  1  2  1  2  3  2  1  2  3  3  2  3  3  2) ⍝ depths
$ ]display ltdepth ← d ∘.> d ⍝ find `d[i] > d[j]` for all i, j.
┌→────────────────────────────┐
↓0 0 0 0 0 0 0 0 0 0 0 0 0 0 0│
│1 0 0 0 0 0 0 0 0 0 0 0 0 0 0│
│1 1 0 1 0 0 0 1 0 0 0 0 0 0 0│
│1 0 0 0 0 0 0 0 0 0 0 0 0 0 0│
│1 1 0 1 0 0 0 1 0 0 0 0 0 0 0│
│1 1 1 1 1 0 1 1 1 0 0 1 0 0 1│
│1 1 0 1 0 0 0 1 0 0 0 0 0 0 0│
│1 0 0 0 0 0 0 0 0 0 0 0 0 0 0│
│1 1 0 1 0 0 0 1 0 0 0 0 0 0 0│
│1 1 1 1 1 0 1 1 1 0 0 1 0 0 1│
│1 1 1 1 1 0 1 1 1 0 0 1 0 0 1│
│1 1 0 1 0 0 0 1 0 0 0 0 0 0 0│
│1 1 1 1 1 0 1 1 1 0 0 1 0 0 1│
│1 1 1 1 1 0 1 1 1 0 0 1 0 0 1│
│1 1 0 1 0 0 0 1 0 0 0 0 0 0 0│
└~────────────────────────────┘
  • Note that ltdepth[i][j] = 1 iff d[j] < d[i]. So, for a given row (i fixed), the 1s mark nodes that are at lower depth (ie, potential ancestors).
  • If we mask this to keep only those indices where j < i, then the last 1 in each row will be such that d[last 1] = d[i] - 1. Why? Because the closest node to our left with a depth less than ours must be our parent, in the preorder traversal.
$ ⎕IO ← 0 ⋄ d ← (0  1  2  1  2  3  2  1  2  3  3  2  3  3  2) ⍝ depths
$ ]display left ←  (⍳3) ∘.> (⍳3) ⍝ find `i > j` for all i, j.
┌→────┐
↓0 0 0│
│1 0 0│
│1 1 0│
└~────┘
Combining the three techniques, we can arrive at:
$ ⎕IO ← 0 ⋄ d ← (0  1  2  1  2  3  2  1  2  3  3  2  3  3  2) ⍝ depths
$ ltdepth ← d ∘.> d ⍝ find `d[i] > d[j]` for all i, j.
$ preds ←  (⍳≢d) ∘.> (⍳≢d) ⍝ predecessors: find `i > j` for all i, j.
$ pred_higher ←  ltdepth ∧ preds   ⍝ predecessors that are higher in the tree
$ parents_take_3 ← ¯1 +  +/∨\⌽pred_higher  ⍝ previous idiom for finding the last 1.
¯1 0 1 0 3 4 3 0 7 8 8 7 11 11 7
For comparison, the actual value is:
    (0   1  2  3  4  5  6  7  8  9 10 11 12 13 14)  | indexes
d ← (0   1  2  1  2  3  2  1  2  3  3  2  3  3  2)  │ depths
P ← (0   0  1  0  3  4  3  0  7  8  8  7 11 11  7)  │ parent indices
    (¯1  0  1  0  3  4  3  0  7  8  8  7 11 11  7) | parents, take 3

              ∘:0                               0
┌──────────┬──┴─────────────────┐
a:1        b:3                 c:7              1
│      ┌───┴───┐     ┌──────────┼───────┐
p:2    q:4     r:6   s:8        t:11    u:14    2
       │             │          │
       │          ┌──┴──┐     ┌─┴───┐
       v:5        w:9   x:10  y:12  z:13        3
We have a wrinkle at the root node 0: it has no predecessor at lower depth, so the idiom returns ¯1. That's easily fixed, we simply take a maximum with 0 to move ¯1 -> 0:
$  parents_take_3 ← 0⌈  ¯1 +  +/∨\⌽pred_higher
0 0 1 0 3 4 3 0 7 8 8 7 11 11 7
So, that's our function:
parents_take_3 ← 0⌈  ¯1 +  +/∨\⌽ ((d∘.>d) ∧ (⍳≢d)∘.>(⍳≢d))
0 0 1 0 3 4 3 0 7 8 8 7 11 11 7
Note that the time complexity for this is dominated by having to calculate the outer products, which, even given infinite parallelism, take O(n) time. We will slowly chip away at this, to do far better.
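Take 3, spelled out in Python: build each row of the ANDed outer products, apply the last-1 idiom to it, and clamp the root's -1 to 0.

```python
from itertools import accumulate

d = [0, 1, 2, 1, 2, 3, 2, 1, 2, 3, 3, 2, 3, 3, 2]
n = len(d)

parents = []
for i in range(n):
    # Row i of (d∘.>d) ∧ (⍳≢d)∘.>(⍳≢d) : 1 where d[i] > d[j] and i > j.
    row = [1 if d[i] > d[j] and i > j else 0 for j in range(n)]
    # ¯1 + +/∨\⌽row : index of the last 1 (or -1 for the root's empty row).
    last1 = sum(accumulate(reversed(row), lambda x, y: x | y)) - 1
    parents.append(max(0, last1))  # 0⌈ : clamp the root's ¯1 to 0

print(parents)  # [0, 0, 1, 0, 3, 4, 3, 0, 7, 8, 8, 7, 11, 11, 7]
```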

§ Converting from depth vector to parent vector, Take 4 (log critical depth)

We will use the Key (⌸) operator, which allows us to create key-value pairs.
$ d ← 0 1 2 1 2 3 2 1 2 3 3 2 3 3 2
$ ]disp (⍳≢d) ,¨ d ⍝ zip d with indexes
┌→──┬───┬───┬───┬───┬───┬───┬───┬───┬───┬────┬────┬────┬────┬────┐
│0 0│1 1│2 2│3 1│4 2│5 3│6 2│7 1│8 2│9 3│10 3│11 2│12 3│13 3│14 2│
└~─→┴~─→┴~─→┴~─→┴~─→┴~─→┴~─→┴~─→┴~─→┴~─→┴~──→┴~──→┴~──→┴~──→┴~──→┘
$ d ← 0 1 2 1 2 3 2 1 2 3 3 2 3 3 2
$ ]display b ← {⍺ ⍵}⌸d  ⍝ each row i has tuple (i, js): d[js] = i
┌→──────────────────┐
↓   ┌→┐             │
│ 0 │0│             │
│   └~┘             │
│   ┌→────┐         │
│ 1 │1 3 7│         │
│   └~────┘         │
│   ┌→────────────┐ │
│ 2 │2 4 6 8 11 14│ │
│   └~────────────┘ │
│   ┌→───────────┐  │
│ 3 │5 9 10 12 13│  │
│   └~───────────┘  │
└∊──────────────────┘
In fact, it allows us to apply an arbitrary function to combine keys and values. We will use a function that simply returns all the values for each key.
$ d ← 0 1 2 1 2 3 2 1 2 3 3 2 3 3 2
$ ]display b ← {⍵}⌸d ⍝ each row i contains values j such that d[j] = i.
┌→──────────────┐
↓0 0  0  0  0  0│
│1 3  7  0  0  0│
│2 4  6  8 11 14│
│5 9 10 12 13  0│
└~──────────────┘
Our first try doesn't quite work: it winds up trying to create a numeric matrix, which means that we can't have rows of different sizes (short rows get padded with 0s). So, the information that only index 0 satisfies d[0] = 0 is lost. What we can do is to enclose (⊂) the values, to arrive at:
$ d ← 0 1 2 1 2 3 2 1 2 3 3 2 3 3 2
$ ]display b ← {⊂⍵}⌸d ⍝ d[b[i]] = i
┌→───────────────────────────────────────────┐
│ ┌→┐ ┌→────┐ ┌→────────────┐ ┌→───────────┐ │
│ │0│ │1 3 7│ │2 4 6 8 11 14│ │5 9 10 12 13│ │
│ └~┘ └~────┘ └~────────────┘ └~───────────┘ │
└∊───────────────────────────────────────────┘
Consider the groups b[2] = (2 4 6 8 11 14) and b[3] = (5 9 10 12 13). All the parents of the depth-3 nodes are present in b[2]: every element in b[3] fits at some location in b[2]. Here is what the fit would look like:
b[2]  2 4 _ 6 8 _  _ 11 __ __ 14   (nodes of depth 2)
b[3]      5     9  10   12 13      (leaf nodes)
          4     8   8   11 11      (parents: predecessor of b[3] in b[2])

              ∘:0                               0
┌──────────┬──┴─────────────────┐
a:1        b:3                 c:7              1
│      ┌───┴───┐     ┌──────────┼───────┐
p:2    q:4     r:6   s:8        t:11    u:14    2
       │             │          │
       │          ┌──┴──┐     ┌─┴───┐
       v:5        w:9   x:10  y:12  z:13        3
We use the Interval Index (⍸) operator to solve the problem of finding the parent / where we should squeeze a node from b[3] into b[2] (this is formally known as the predecessor problem).
⍝ left[a[i]] is the closest number ≤ right[i]
⍝ ie, left[a[i]] is the predecessor of right[i] in left.
$ a ← (1 10 100 1000) ⍸ (1 2000 300 50 2 )
0 3 2 1 0
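Interval Index is exactly a predecessor search; in Python, the bisect module gives the same answers (assuming left is sorted ascending):

```python
import bisect

def interval_index(left, right):
    # a[i] such that left[a[i]] is the closest element <= right[i]
    # (-1 when right[i] falls before every element of left).
    return [bisect.bisect_right(left, r) - 1 for r in right]

print(interval_index([1, 10, 100, 1000], [1, 2000, 300, 50, 2]))
# [0, 3, 2, 1, 0]
```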
Now, we can use the technology of predecessor to find parents of depth 3 nodes among the depth 2 nodes:
$ depth2 ← 2 4 6 8 11 14
$ depth3 ← 5 9 10 12 13 ⍝ parents (from chart): 4 8 8 11 11
$ depth3parentixs ← depth2 ⍸ depth3
$ depth3parents  ← depth2[depth3parentixs]
4 8 8 11 11

              ∘:0                               0
┌──────────┬──┴─────────────────┐
a:1        b:3                 c:7              1
│      ┌───┴───┐     ┌──────────┼───────┐
p:2    q:4     r:6   s:8        t:11    u:14    2
       │             │          │
       │          ┌──┴──┐     ┌─┴───┐
       v:5        w:9   x:10  y:12  z:13        3
We need to know one more APL-ism: the 2-wise reduction. When we write a usual reduction, we have:
$ ⍳5
0 1 2 3 4
$ +/⍳5 ⍝ reduce
10
$ 2+/⍳5 ⍝ apply + to _pairs_ (2 = pairs)
1 3 5 7 ⍝ (0+1) (1+2) (2+3) (3+4)
$ 3+/⍳5 ⍝  apply + to 3-tuples
3 6 9 ⍝ (0+1+2) (1+2+3) (2+3+4)
We begin by assuming the parent of i is i by using p←⍳≢d.
$ d ← (0 1 2 1 2 3 2 1 2 3 3 2 3 3 2)
$ d2nodes ← {⊂⍵}⌸d
┌→┬─────┬─────────────┬────────────┐
│0│1 3 7│2 4 6 8 11 14│5 9 10 12 13│
└→┴~───→┴~───────────→┴~──────────→┘
$ p←⍳≢d
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Now comes the biggie:
$ findparent ← {parentixs ← ⍺⍸⍵ ⋄ p[⍵]←⍺[parentixs]}
  • ⍺ is the list of parent nodes.
  • ⍵ is the list of current child nodes.
  • We first find the indexes of our parent nodes by using the parentixs ← parent ⍸ child idiom.
  • Then, we find the actual parents by indexing into the parent list: ⍺[parentixs].
  • We write these into the parents of the children using: p[children] ← parent[parent ⍸ child]
This finally culminates in:
$ d←0 1 2 1 2 3 2 1 2 3 3 2 3 3 2
$ p←⍳≢d ⋄ d2nodes←{⊂⍵}⌸d ⋄ findp←{pix ← ⍺⍸⍵ ⋄ p[⍵]←⍺[pix]} ⋄ 2findp/d2nodes ⋄ p
0 0 1 0 3 4 3 0 7 8 8 7 11 11 7
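The whole Take 4 pipeline, sketched in Python: group node indexes by depth, then for each adjacent pair of depth levels, wire every child to its predecessor in the parent level.

```python
import bisect

d = [0, 1, 2, 1, 2, 3, 2, 1, 2, 3, 3, 2, 3, 3, 2]

p = list(range(len(d)))                   # p ← ⍳≢d : everyone their own parent
levels = [[] for _ in range(max(d) + 1)]  # {⊂⍵}⌸d : group indexes by depth
for i, dep in enumerate(d):
    levels[dep].append(i)

# 2findp/ : apply findp to each adjacent (parent level, child level) pair.
for parents_lvl, children in zip(levels, levels[1:]):
    for c in children:
        # ⍺⍸⍵ : predecessor of c in the parent level (levels are sorted,
        # since we filled them in preorder).
        p[c] = parents_lvl[bisect.bisect_right(parents_lvl, c) - 1]

print(p)  # [0, 0, 1, 0, 3, 4, 3, 0, 7, 8, 8, 7, 11, 11, 7]
```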


    (0   1  2  3  4  5  6  7  8  9 10 11 12 13 14)  | indexes
d ← (0   1  2  1  2  3  2  1  2  3  3  2  3  3  2)  │ depths
P ← (0   0  1  0  3  4  3  0  7  8  8  7 11 11  7)  │ parent indices
              ∘:0                               0
┌──────────┬──┴─────────────────┐
a:1        b:3                 c:7              1
│      ┌───┴───┐     ┌──────────┼───────┐
p:2    q:4     r:6   s:8        t:11    u:14    2
       │             │          │
       │          ┌──┴──┐     ┌─┴───┐
       v:5        w:9   x:10  y:12  z:13        3
Which can be further golfed to:
$ p⊣2{p[⍵]←⍺[⍺⍸⍵]}⌿⊢∘⊂⌸d⊣p←⍳≢d
0 0 1 0 3 4 3 0 7 8 8 7 11 11 7
The total time complexity of this method assuming infinite parallelism is as follows:
$ p←⍳≢d ⋄ d2nodes←{⊂⍵}⌸d ⋄ findp←{pix ← ⍺⍸⍵ ⋄ p[⍵]←⍺[pix]} ⋄ 2findp/d2nodes ⋄ p
  • (p←⍳≢d) can be filled in O(1) time.
  • (d2nodes←{⊂⍵}⌸d) is grouping keys from a small integer domain, so this is O(#nodes) work using radix sort as far as I know. However, the thesis mentions that this can be done in O(log(#nodes)) parallel time. I'm not sure how, I need to learn this.
  • For each call of findp, the call (pix ← ⍺⍸⍵) can be implemented using binary search, leading to a logarithmic complexity in the size of ⍺ (since we are looking up predecessors of ⍵ in ⍺).
  • The steps of the fold 2findp/d2nodes can be performed entirely in parallel, since all the writes into the p vector are independent: we only write the parent of the current node we are looking at.

§ 3.4: Computing nearest Parent by predicate

I'm going to simplify the original presentation by quite a bit.
     a b c d e f g h i  | names
     0 1 2 3 4 5 6 7 8  | indexes
P ← (0 0 1 2 0 4 5 6 7) | parents
X ← (0 1 0 0 1 1 0 0 0) | marked nodes

     a:0
┌────┴───┐
b:1(X)   e:4(X)
|        |
c:2      f:5(X)
|        |
d:3      g:6
         │
         h:7
         |
         i:8
We want to find, for each node, the closest ancestor marked X. The X vector is a boolean vector that has a 1 at the index of each X node (b, e, f). So, the indexes (1, 4, 5) are 1 in the X vector. The output we want is the vector:
      0 1 2 3 4 5 6 7 8  | indexes
      a b c d e f g h i  | names
PX ← (0 0 1 1 0 4 5 5 5) | closest X parent index
      a a b b a e f f f  | closest X parent name

    a:0
┌────┴───┐
b:1(X)   e:4(X)
|        |
c:2      f:5(X)
|        |
d:3      g:6
         │
         h:7
         |
         i:8
The incantation is:
$ I←{(⊂⍵)⌷⍺} ⍝ index LHS by RHS | (100 101 102 103)[(3 1 2)] := 103 101 102
$ PX ← P I@{X[⍵]≠1} ⍣ ≡ P
0 0 1 1 0 4 5 5 5
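A Python reading of the incantation (my own unpacking of it): starting from P, every entry that currently points at a non-X node gets re-pointed one hop up; iterate to fixpoint. Entries stall either on an X node or on the self-looping root.

```python
P = [0, 0, 1, 2, 0, 4, 5, 6, 7]
X = [0, 1, 0, 0, 1, 1, 0, 0, 0]

PX = P[:]
while True:
    # I@{X[⍵]≠1} : where the current target is not an X node, hop to its parent.
    nxt = [P[v] if X[v] != 1 else v for v in PX]
    if nxt == PX:  # ⍣≡ : fixpoint
        break
    PX = nxt

print(PX)  # [0, 0, 1, 1, 0, 4, 5, 5, 5]
```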
TODO. At any rate, since this is purely reads with no writes, and needs no synchronization, it is fairly straightforward to implement on the GPU.

§ 3.5: Lifting subtrees to the root

Once we have marked our X nodes, we now wish to lift entire subtrees of X up to the root.
  • This pass displays how to lift subtrees and add new nodes to replace the subtree's original nodes.
  • Luckily, there are no sibling relationships that need to be maintained since we are uprooting an entire subtree.
  • There are no ordering constraints on how the subtrees should be arranged at the top.
  • Hence, we can simply add new nodes to the end of the tree (in terms of the preorder traversal). Adding to the middle of the tree will be discussed later.
There is some good advice in the thesis:
When using APL primitives this way, it may be good to map their names and definitions to the domain of trees. For example, the primitive ⍸Predicate is read as "the nodes where Predicate holds" and not as "the indexes where Predicate is 1".
For example, given the tree:
      0 1 2 3 4 5  | indexes
      a b c d e f  | names
P  ← (0 0 1 0 3 4) | parents
X  ← (0 1 0 1 1 0) | X nodes
PX ← (0 0 1 0 3 4) | closest X parent index

    a:0
┌────┴───┐
b:1(X)   d:3(X)
|        |
c:2      e:4(X)
         |
         f:5
we want the transformed tree to be:
    a:0
┌────┴───┐
bp:1(X)   ep:4(X)
---------
b:1(X)
|
c:2
---------
e:4
|
fp:5
---------
f:5(X)
|
g:6
We first look for the nodes that need to be lifted. These are:
  • Non-root nodes (ie, nodes whose parents are not themselves: p≠(⍳≢p))
  • Which have the property X.
nodes←⍸(X ∧ p≠(⍳≢p))  ⍝ ⍸:pick indexes.
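In Python, that node selection is a one-line filter:

```python
P = [0, 0, 1, 0, 3, 4]
X = [0, 1, 0, 1, 1, 0]

# ⍸(X ∧ p≠⍳≢p) : X nodes that are not their own parent (ie, not roots).
nodes = [i for i in range(len(P)) if X[i] and P[i] != i]
print(nodes)  # [1, 3, 4]
```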

§ 3.6: Wrapping Expressions

§ 3.7: Lifting Guard Test Expressions

§ 3.8: Counting rank of index operators

§ 3.9: Flattening Expressions

§ 3.10: Associating Frame slots and variables

§ 3.11: Placing frames into a lexical stack

§ 3.12: Recording Exported names

§ 3.13: Lexical Resolution

§ 5.2.1 Traversal Idioms

§ 5.2.2 Edge Mutation Idioms

§ 5.2.3 Node Mutation Idioms