§ Training a custom model for Lean4
- BERT-base: 110M (0.1B)
- Gato: 1.2B
- GPT-2: 1.5B
- DALL-E: 12B
- Codex: 12B
- GPT-3: 175B
- Leela Zero: ~0.5M params (2 GB)
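Rough memory arithmetic for these sizes (a sketch, using the standard mixed-precision-Adam figure of ~16 bytes of model state per parameter: fp16 weights + fp16 grads + fp32 master weights + two fp32 Adam moments; activations excluded):

```python
# Back-of-envelope training-memory estimate (model states only; activations
# often dominate on top of this). 16 bytes/param is the usual breakdown for
# mixed-precision Adam:
#   fp16 weights (2) + fp16 grads (2) + fp32 master weights (4)
#   + fp32 Adam momentum (4) + fp32 Adam variance (4) = 16 bytes.
BYTES_PER_PARAM = 16

for name, params in [
    ("BERT-base", 110e6),
    ("GPT-2",     1.5e9),
    ("GPT-3",     175e9),
]:
    gib = params * BYTES_PER_PARAM / 2**30
    print(f"{name}: ~{gib:,.0f} GiB of model state")

# GPT-3 lands around 2.5 TiB of model state alone, which is why plain data
# parallelism (a full replica per device) is hopeless at that scale and why
# partitioning schemes like ZeRO (below) exist.
```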
- JAX
- GPT-3 on the Cerebras CS-2
- ZeRO algorithm (DeepSpeed's scheme for partitioning optimizer states, gradients, and parameters across data-parallel ranks; toy sketch after this list)
- Composer vs. DeepSpeed vs. FairScale (competing large-model training libraries)
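A toy sketch of the ZeRO stage-1 idea referenced above (my own hypothetical helper; real DeepSpeed all-reduces gradients and all-gathers parameters over NCCL, all omitted here). The point is just the partitioning: each data-parallel rank owns a 1/N shard of the optimizer state instead of a full replica.

```python
def zero1_step(params, grads, shard_momentum, rank, world_size,
               lr=1e-3, beta=0.9):
    """Toy illustration of ZeRO stage 1 on a flat parameter list: each
    data-parallel rank stores only a 1/world_size shard of the optimizer
    state (here, momentum) and applies updates only to its own slice.
    Communication collectives are omitted; this shows the partitioning only."""
    n = len(params)
    lo, hi = rank * n // world_size, (rank + 1) * n // world_size
    for i in range(lo, hi):
        j = i - lo                                   # index into the local shard
        shard_momentum[j] = beta * shard_momentum[j] + grads[i]
        params[i] -= lr * shard_momentum[j]          # update only the owned slice
    return params                                    # all-gather would sync ranks
```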
- Linformer for Lean4? Could it attend to an entire file? Linformer's linear attention could push the context from ~10^3 tokens to ~10^6, i.e. a million, which is roughly the number of lines in mathlib. But how do we generate correct proofs of weird-looking compiler statements? What do we start with?
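A minimal single-head sketch of the Linformer projection trick (variable names are my own; the actual paper shares the E/F projections across heads and layers), just to pin down why cost drops from O(n^2) to O(n * k_proj):

```python
import jax
import jax.numpy as jnp
from jax import random

def linformer_attention(q, k, v, E, F):
    """Single-head Linformer attention: learned projections E, F compress
    the length-n keys/values down to k_proj rows, so the score matrix is
    (n, k_proj) instead of (n, n) -- linear, not quadratic, in sequence
    length. Shapes: q, k, v are (n, d); E, F are (k_proj, n)."""
    k_low, v_low = E @ k, F @ v                        # (k_proj, d) each
    scores = q @ k_low.T / jnp.sqrt(q.shape[-1])       # (n, k_proj), not (n, n)
    return jax.nn.softmax(scores, axis=-1) @ v_low     # (n, d)

# Toy usage: a 4096-token "file" attended through a 256-row projection.
kq, kk, kv, kE, kF = random.split(random.PRNGKey(0), 5)
n, d, k_proj = 4096, 64, 256
q, k, v = (random.normal(s, (n, d)) for s in (kq, kk, kv))
E, F = (random.normal(s, (k_proj, n)) for s in (kE, kF))
out = linformer_attention(q, k, v, E, F)               # memory ~ n * k_proj
```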