Compiler

The compiler that
compiles itself.

souc is written in Sounio. Bootstrap starts with a C-compiled seed binary, runs through three stages, and terminates when Stage 2 and Stage 3 produce SHA-256-identical outputs. The fixed point is the proof of correctness.

A

Three-stage bootstrap.

No trust in the seed binary is required. Reproducibility is the guarantee: if gen2.elf and gen3.elf differ, the compiler has a bug.

Stage 0

C-compiled seed

A minimal C implementation of the Sounio parser and emitter. Its sole job is to compile lean_single.sio into a working binary. Not trusted beyond Stage 1.

Stage 1

First self-hosted

The seed binary compiles lean_single.sio into gen1.elf. This is the first Sounio binary compiled by Sounio.

Stage 2 ≡ Stage 3

Fixed-point verified

gen1.elf → gen2.elf, then gen2.elf → gen3.elf. The SHA-256 of both must be identical. This is the compiler's self-consistency proof.

Run it yourself

# Stage 0 → 1: C seed compiles the Sounio compiler
./artifacts/self-hosted/souc-self-hosted-x86_64 \
    self-hosted/compiler/lean_single.sio gen1.elf

# Stage 1 → 2: first self-hosted compile
./gen1.elf self-hosted/compiler/lean_single.sio gen2.elf

# Stage 2 → 3: fixed-point check
./gen2.elf self-hosted/compiler/lean_single.sio gen3.elf

# These two SHA-256 digests must be identical.
sha256sum gen2.elf gen3.elf

B

Four compiler phases.

Each phase is a separate module in self-hosted/compiler/. Epistemic types are tracked through every phase; the confidence lattice is never erased between HLIR and emit.

Parse

parser/items.sio

Recursive-descent parser. Produces an unresolved AST including effect annotations, where clauses, and algebra declarations. No implicit token recovery — ill-formed input is a hard error.

Check

check/mod.sio

Type checker with bidirectional inference. Resolves Knowledge<T> widening, effect-row unification, and confidence gate constraints. All epistemic obligations are discharged here or rejected with a typed error.

HLIR

hlir/mod.sio

High-level intermediate representation. E-graph rewriting with 1000+ rules runs here. GUM uncertainty propagation is lowered into explicit variance-carrying nodes before emit.

Emit

emit/mod.sio

Backend dispatcher. Routes to x86_64 native (System V ABI), ARM64 (verified under QEMU), or K-AXI (Kretikos assembler → PTX → CUDA). Linear types enforce that every resource is consumed exactly once.

C

Three emit targets.

All three targets share the same HLIR and the same type checker. The epistemic layer is not a runtime library — it is lowered into the same machine code that carries the computation.

x86_64

Shipped

System V ABI. The primary development and production target. souc itself runs here. Knowledge<T> confidence is laid out as a struct field; GUM propagation compiles to scalar arithmetic.

ARM64

Verified

AAPCS64 ABI. Cross-compiled from x86_64 and verified under QEMU user-static. Branch encodings and calling conventions audited against the ARM Architecture Reference Manual.

PTX / K-AXI

Active

GPU path via Kretikos: Sounio HLIR → K-AXI IR → PTX assembly → ptxas → CUDA. Epistemic variance propagation runs inside GPU kernels. Validated on RTX A5000 (24 GB). Effect annotations gate kernel dispatch.

D

Lean 4 for the hard proofs.

Where the type checker enforces, Lean verifies. The two key properties of the epistemic type system — GUM conservativity and confidence monotonicity — are stated as theorems and proved in formal/EpistemicEffects.lean.

gum_conservativity

Proved

GUM uncertainty propagation through arithmetic never increases confidence. A composition of epistemic operations cannot manufacture certainty that wasn't present in the inputs.

confidence_monotonicity

Proved

Widening a Knowledge<T> value (increasing ε) cannot produce a narrower confidence interval. The ordering on the confidence lattice is monotone through all type operations.

substitution_lemma

Open obligation

Substitution of epistemic values in effect-typed terms preserves the effect row. Proof structure is in place; one indexed-induction obligation remains.

Source: formal/lean4/EpistemicEffects.lean (589 lines, 0 sorry).

The compiler thatcompiles itself.

Three-stage bootstrap.

Four compiler phases.

Three emit targets.

Lean 4 for the hard proofs.

The compiler that
compiles itself.