The compiler that
compiles itself.
souc is written in Sounio. Bootstrap starts with a C-compiled seed binary, runs through three stages, and terminates when Stage 2 and Stage 3 produce SHA-256-identical outputs. The fixed point is the proof of correctness.
Three-stage bootstrap.
No trust in the seed binary is required. Reproducibility is the guarantee:
if gen2.elf and gen3.elf differ, the compiler has a bug.
Stage 0
C-compiled seed
A minimal C implementation of the Sounio parser and emitter. Its sole
job is to compile lean_single.sio into a working binary.
Not trusted beyond Stage 1.
Stage 1
First self-hosted
The seed binary compiles lean_single.sio into
gen1.elf. This is the first Sounio binary compiled by Sounio.
Stage 2 ≡ Stage 3
Fixed-point verified
gen1.elf → gen2.elf, then
gen2.elf → gen3.elf. The SHA-256
of both must be identical. This is the compiler's self-consistency proof.
Run it yourself
# Stage 0 → 1: C seed compiles the Sounio compiler
./artifacts/self-hosted/souc-self-hosted-x86_64 \
self-hosted/compiler/lean_single.sio gen1.elf
# Stage 1 → 2: first self-hosted compile
./gen1.elf self-hosted/compiler/lean_single.sio gen2.elf
# Stage 2 → 3: fixed-point check
./gen2.elf self-hosted/compiler/lean_single.sio gen3.elf
# These two SHA-256 digests must be identical.
sha256sum gen2.elf gen3.elf Four compiler phases.
Each phase is a separate module in self-hosted/compiler/.
Epistemic types are tracked through every phase; the confidence lattice
is never erased between HLIR and emit.
Parse
parser/items.sio
Recursive-descent parser. Produces an unresolved AST including
effect annotations, where clauses, and algebra declarations.
No implicit token recovery — ill-formed input is a hard error.
Check
check/mod.sio
Type checker with bidirectional inference. Resolves Knowledge<T>
widening, effect-row unification, and confidence gate constraints.
All epistemic obligations are discharged here or rejected with a typed error.
HLIR
hlir/mod.sio
High-level intermediate representation. E-graph rewriting with 1000+ rules runs here. GUM uncertainty propagation is lowered into explicit variance-carrying nodes before emit.
Emit
emit/mod.sio
Backend dispatcher. Routes to x86_64 native (System V ABI), ARM64 (verified under QEMU), or K-AXI (Kretikos assembler → PTX → CUDA). Linear types enforce that every resource is consumed exactly once.
Three emit targets.
All three targets share the same HLIR and the same type checker. The epistemic layer is not a runtime library — it is lowered into the same machine code that carries the computation.
x86_64
Shipped
System V ABI. The primary development and production target. souc itself runs here. Knowledge<T> confidence is laid out as a struct field; GUM propagation compiles to scalar arithmetic.
ARM64
Verified
AAPCS64 ABI. Cross-compiled from x86_64 and verified under QEMU user-static. Branch encodings and calling conventions audited against the ARM Architecture Reference Manual.
PTX / K-AXI
Active
GPU path via Kretikos: Sounio HLIR → K-AXI IR → PTX assembly → ptxas → CUDA. Epistemic variance propagation runs inside GPU kernels. Validated on RTX A5000 (24 GB). Effect annotations gate kernel dispatch.
Lean 4 for the hard proofs.
Where the type checker enforces, Lean verifies. The two key properties
of the epistemic type system — GUM conservativity and confidence
monotonicity — are stated as theorems and proved in
formal/EpistemicEffects.lean.
gum_conservativity
Proved
GUM uncertainty propagation through arithmetic never increases confidence. A composition of epistemic operations cannot manufacture certainty that wasn't present in the inputs.
confidence_monotonicity
Proved
Widening a Knowledge<T> value (increasing ε) cannot
produce a narrower confidence interval. The ordering on the confidence
lattice is monotone through all type operations.
substitution_lemma
Open obligation
Substitution of epistemic values in effect-typed terms preserves the effect row. Proof structure is in place; one indexed-induction obligation remains.
Source:
formal/lean4/EpistemicEffects.lean
(589 lines, 0 sorry).