ยง Lazy GPU programming
- All laziness is a program analysis problem, where we need to strictify.
- Lazy vectorization is a program analysis problem where we need to find "strict blocks/ strict chains". Something like "largest block of values that can be forced at once". Seems coinductive?
- Efficiency of lazy program is that of clairvoyant call by value, so we need to know how to force.
- In the PRAM model, efficiency of parallel lazy program is that of clairvoyanl call by parallel or something. We need to know how to run in parallel, such that if one diverges, then all diverges. This means it's safe to run together!
- What is parallel STG?
- PRAM: try to parallelize writes as much as possible
- PSTG: try to parallelize forcing as much as possible
-
Reads (free) ~ forces (conflicts in ||)
-
Write (conflict in ||) ~ create new data (free)
- What are equivalents of common pram models?
- Force is Boolean: either returns or does not return
- We need a finer version. Something like returns in k cycles or does not return?
- Old forced values: forced values with T; wait = Infinity; Old unobserved value = forced values with Twait = -1
- Think of call by push value, but can allocate forces and call "tick" which ticks the clock. The Twait is clocked wrt this tick.
- Tick controls live ranges. Maybe obviates GC.
- Tick 1 is expected to be known/forced.
- Optimize in space-time? Looking up a recurrence versus computing a recurrence. One is zero space infinite time, other is zero time infinite space.
- Another way to think about it: application says how many people need to ask for thunk to get value. Unused values say infinity, used values say zero
- Maybe think of these as deadlines for the compiler to meet? So it's telling the compiler to guarantee access in a certain number of ticks. This gives control over (abstract) time, like imperative gives control over abstract space?
- TARDIS autodiff is key example. As is fib list. Maybe frac.
- Thinking about design in the data constructor side:
- Twait = 0 in data structure means is present at compile time. Twait = 1 is strict. Twait = infty is function pointer. What is Twait = 2? Mu
- Can fuse kernels for all computations in the same parallel force. If one of them gets stuck, all of them get stuck. So parallel force is a syntactic way to ask for kernel fusion.
- Can we use UB to express things like "this list will be finite, thus map can be safely parallelised" or something?
- Have quantitative:
0,1,fin,inf
?