§ Training a custom model for Lean4
- BERT-base: 110M (0.1B)
- Gato: 1.2B
- GPT-2: 1.5B
- DALL-E: 12B
- Codex: 12B
- GPT-3: 175B
- Leela Zero: ~0.5M params (2 GB)
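Rough memory arithmetic for these sizes (a sketch, using the standard mixed-precision-Adam figure of ~16 bytes of model state per parameter: fp16 weights + fp16 grads + fp32 master weights + two fp32 Adam moments; activations excluded):

```python
# Back-of-envelope training-memory estimate (model states only; activations
# often dominate on top of this). 16 bytes/param is the usual breakdown for
# mixed-precision Adam:
#   fp16 weights (2) + fp16 grads (2) + fp32 master weights (4)
#   + fp32 Adam momentum (4) + fp32 Adam variance (4) = 16 bytes.
BYTES_PER_PARAM = 16

for name, params in [
    ("BERT-base", 110e6),
    ("GPT-2",     1.5e9),
    ("GPT-3",     175e9),
]:
    gib = params * BYTES_PER_PARAM / 2**30
    print(f"{name}: ~{gib:,.0f} GiB of model state")

# GPT-3 lands around 2.5 TiB of model state alone, which is why plain data
# parallelism (a full replica per device) is hopeless at that scale and why
# partitioning schemes like ZeRO (below) exist.
```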
- JAX
- GPT-3 on the Cerebras CS-2
- ZeRO algorithm (DeepSpeed's scheme for partitioning optimizer states, gradients, and parameters across data-parallel ranks; toy sketch after this list)
- Composer vs. DeepSpeed vs. FairScale (competing large-model training libraries)
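A toy sketch of the ZeRO stage-1 idea referenced above (my own hypothetical helper; real DeepSpeed all-reduces gradients and all-gathers parameters over NCCL, all omitted here). The point is just the partitioning: each data-parallel rank owns a 1/N shard of the optimizer state instead of a full replica.

```python
def zero1_step(params, grads, shard_momentum, rank, world_size,
               lr=1e-3, beta=0.9):
    """Toy illustration of ZeRO stage 1 on a flat parameter list: each
    data-parallel rank stores only a 1/world_size shard of the optimizer
    state (here, momentum) and applies updates only to its own slice.
    Communication collectives are omitted; this shows the partitioning only."""
    n = len(params)
    lo, hi = rank * n // world_size, (rank + 1) * n // world_size
    for i in range(lo, hi):
        j = i - lo                                   # index into the local shard
        shard_momentum[j] = beta * shard_momentum[j] + grads[i]
        params[i] -= lr * shard_momentum[j]          # update only the owned slice
    return params                                    # all-gather would sync ranks
```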
- Linformer for Lean4? Could it attend to an entire file? Linformer's linear attention could push the context from ~10^3 tokens to ~10^6, i.e. a million, which is roughly the number of lines in mathlib. But how do we generate correct proofs of weird-looking compiler statements? What do we start with?
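A minimal single-head sketch of the Linformer projection trick (variable names are my own; the actual paper shares the E/F projections across heads and layers), just to pin down why cost drops from O(n^2) to O(n * k_proj):

```python
import jax
import jax.numpy as jnp
from jax import random

def linformer_attention(q, k, v, E, F):
    """Single-head Linformer attention: learned projections E, F compress
    the length-n keys/values down to k_proj rows, so the score matrix is
    (n, k_proj) instead of (n, n) -- linear, not quadratic, in sequence
    length. Shapes: q, k, v are (n, d); E, F are (k_proj, n)."""
    k_low, v_low = E @ k, F @ v                        # (k_proj, d) each
    scores = q @ k_low.T / jnp.sqrt(q.shape[-1])       # (n, k_proj), not (n, n)
    return jax.nn.softmax(scores, axis=-1) @ v_low     # (n, d)

# Toy usage: a 4096-token "file" attended through a 256-row projection.
kq, kk, kv, kE, kF = random.split(random.PRNGKey(0), 5)
n, d, k_proj = 4096, 64, 256
q, k, v = (random.normal(s, (n, d)) for s in (kq, kk, kv))
E, F = (random.normal(s, (k_proj, n)) for s in (kE, kF))
out = linformer_attention(q, k, v, E, F)               # memory ~ n * k_proj
```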