GPU Programming

The checked GPU artifact, verified GPU syntax, and attested backend status.

Sounio has two GPU compiler paths in this repo snapshot:

  • souc-linux-x86_64 (the default self-hosted artifact): compiles GPU syntax, runs a serial CPU fallback, and lets kretikos build predefined PTX/CUBIN artifact templates.
  • souc-linux-x86_64-gpu (the GPU-facing artifact): separate binary with broader PTX emission via build --backend gpu.

Self-hosted GPU path (CPU fallback + artifact templates)

The default compiler (bin/souc or bin/souc-linux-x86_64) accepts GPU kernel syntax and runs a deterministic serial CPU fallback. The Kretikos CLI uses that compiler to build tiny in-tree emitter drivers, then writes predefined GPU artifact templates:

# Emit predefined PTX templates (self-hosted, no GPU required)
kretikos emit-ptx vec_add     -o /tmp/kernel.ptx
kretikos emit-ptx vec_sub     -o /tmp/kernel.ptx
kretikos emit-ptx vec_mul     -o /tmp/kernel.ptx
kretikos emit-ptx vec_div     -o /tmp/kernel.ptx
kretikos emit-ptx vec_add_f64 -o /tmp/kernel.ptx
kretikos emit-ptx fma         -o /tmp/kernel.ptx
kretikos emit-ptx fma_f64     -o /tmp/kernel.ptx
kretikos emit-ptx store_u32_const -o /tmp/kernel.ptx

# Emit predefined Metal/MSL templates (self-hosted, no macOS required)
kretikos emit-metal vec_add          -o /tmp/kernel.metal
kretikos emit-metal ossm_oct_step    -o /tmp/kernel.metal
kretikos emit-metal sedenion_cd_step -o /tmp/kernel.metal

# Emit predefined CUBIN templates
kretikos emit-cubin exit_only -o /tmp/kernel.cubin
kretikos emit-cubin store_u32_const -o /tmp/kernel.cubin
kretikos emit-cubin vec_add_f32 -o /tmp/kernel.cubin
kretikos emit-cubin epistemic_dual_f32 -o /tmp/kernel.cubin

# Emit a structural artifact bundle: PTX, CUBIN, hashes, boundaries
kretikos bundle -o /tmp/kretikos-bundle

# Optionally record assembler/disassembler and CUDA Driver API evidence
kretikos bundle -o /tmp/kretikos-validated-bundle --validate-toolchain --validate-runtime
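
Every emit-ptx line above writes to the same /tmp/kernel.ptx, so running them in sequence keeps only the last template. Here is a minimal Python sketch that gives each template its own file. It assumes kretikos is on PATH and accepts exactly the emit-ptx NAME -o PATH form shown above. The helper names and the output directory are illustrative and not part of the CLI.

```python
import subprocess
from pathlib import Path

# Template names copied from the emit-ptx commands above.
PTX_TEMPLATES = [
    "vec_add", "vec_sub", "vec_mul", "vec_div",
    "vec_add_f64", "fma", "fma_f64", "store_u32_const",
]

def ptx_emit_commands(out_dir, templates=PTX_TEMPLATES, cli="kretikos"):
    """Build one argv list per template, each with a distinct output path."""
    out = Path(out_dir)
    return [
        [cli, "emit-ptx", name, "-o", str(out / f"{name}.ptx")]
        for name in templates
    ]

def emit_all(out_dir):
    """Run every command; raises CalledProcessError if kretikos fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for argv in ptx_emit_commands(out_dir):
        subprocess.run(argv, check=True)
```

Calling emit_all("/tmp/kretikos-ptx") would then leave one .ptx file per template in that directory. ptx_emit_commands can be inspected on its own without invoking the CLI.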

What the self-hosted path proves today, and what it does not:

  • The self-hosted compiler compiles four GPU constructs: kernel fn, with GPU, GPU.launch, and GPU.sync.
  • The CPU fallback runs kernels serially with deterministic thread/block IDs.
  • The kretikos CLI wrapper emits predefined PTX text and NVIDIA CUDA ELF/CUBIN byte templates from in-tree Sounio code.
  • The kretikos bundle command writes a machine-readable sounio.kretikos.bundle.v1 manifest with artifact hashes, structural checks, optional ptxas/nvdisasm and CUDA Driver API validation, and explicit non-claims.
  • This path does not yet lower arbitrary user kernel source into PTX/CUBIN.
  • Toolchain and runtime validation apply only to the predefined PTX/CUBIN templates selected by kretikos; they do not validate arbitrary user-written kernels.
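
The serial CPU fallback described above can be pictured as two nested loops that visit blocks and threads in a fixed order. The Python sketch below illustrates that idea only and is not the compiler's implementation. The function names, the one-dimensional grid, and the block-major visiting order are all assumptions.

```python
def launch_serial(kernel, grid_dim, block_dim, *args):
    """Run `kernel` once per (block, thread) pair, always in the same order."""
    for block_id in range(grid_dim):
        for thread_id in range(block_dim):
            kernel(block_id, thread_id, block_dim, *args)

def vec_add(block_id, thread_id, block_dim, a, b, out):
    """Element-wise add; each simulated thread handles one index."""
    i = block_id * block_dim + thread_id
    if i < len(out):  # guard threads that fall past the end of the data
        out[i] = a[i] + b[i]

a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0] * 5
out = [0.0] * 5
launch_serial(vec_add, 2, 4, a, b, out)  # 2 blocks of 4 threads cover 5 elements
```

Because the loops always visit (block, thread) pairs in the same order, repeated runs produce identical results. This is the property the deterministic fallback relies on.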

GPU artifact path (broader PTX emission)

The checked GPU artifact is a separate binary with broader pattern support:

export SOUC_GPU_BIN="$(pwd)/artifacts/omega/souc-bin/souc-linux-x86_64-gpu"
export SOUNIO_STDLIB_PATH="$(pwd)/stdlib"

"$SOUC_GPU_BIN" info
"$SOUC_GPU_BIN" check examples/gpu.sio
"$SOUC_GPU_BIN" check tests/run-pass/gpu_launch_surface.sio
"$SOUC_GPU_BIN" build examples/kernel_matmul.sio --backend gpu -o /tmp/kernel_matmul.ptx
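
After a build like the one above, a quick look at the module directives can confirm that the output is recognisable PTX. The sketch below assumes the emitted file is standard PTX text with .version, .target, and .entry directives. The function name is hypothetical, and nothing here is part of the Sounio toolchain.

```python
import re

def ptx_summary(text: str) -> dict:
    """Extract the .version, .target list, and .entry names from PTX text."""
    version = re.search(r"^\s*\.version\s+(\S+)", text, re.M)
    target = re.search(r"^\s*\.target\s+([^\n]+)", text, re.M)
    entries = re.findall(r"\.entry\s+([A-Za-z_$][\w$]*)", text)
    return {
        "version": version.group(1) if version else None,
        "targets": [t.strip() for t in target.group(1).split(",")] if target else [],
        "entries": entries,
    }
```

For example, ptx_summary(open("/tmp/kernel_matmul.ptx").read()) would return the declared PTX version, the target architecture list, and the kernel entry names, provided the file exists and follows that layout.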

Public surface versus source-tree surface

The checked public GPU artifact currently accepts:

  • kernel fn
  • with GPU
  • perform GPU.launch(...)
  • perform GPU.sync()
  • PTX emission through build --backend gpu

The self-hosted compiler (bin/souc) accepts the same syntax and additionally routes:

  • predefined PTX templates through kretikos emit-ptx
  • predefined CUBIN templates through kretikos emit-cubin
  • structural PTX/CUBIN artifact bundles through kretikos bundle
  • optional fail-closed toolchain/runtime checks through kretikos bundle --require-toolchain and kretikos bundle --require-runtime
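
The checks that kretikos bundle actually performs are not listed here. As one illustration of what a structural check on a CUBIN template could inspect, the sketch below reads the ELF header and compares the machine field with the value assigned to NVIDIA CUDA (190). The function name and the restriction to 64-bit little-endian files are assumptions.

```python
import struct

EM_CUDA = 190  # ELF e_machine value assigned to NVIDIA CUDA

def looks_like_cubin(data: bytes) -> bool:
    """Return True if `data` starts with a 64-bit little-endian CUDA ELF header."""
    if len(data) < 20 or data[:4] != b"\x7fELF":
        return False
    ei_class, ei_data = data[4], data[5]
    if ei_class != 2 or ei_data != 1:  # ELFCLASS64, ELFDATA2LSB
        return False
    # e_machine is a 16-bit field at byte offset 18 of the ELF header.
    (e_machine,) = struct.unpack_from("<H", data, 18)
    return e_machine == EM_CUDA
```

A header-only test like this says nothing about whether the kernel is loadable or correct. That is the job of the optional toolchain and runtime validation.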

The checked public GPU artifact does not yet resolve the older gpu.* intrinsic namespace from historical sketches:

  • gpu.thread_id.*
  • gpu.block_id.*
  • gpu.block_dim.*
  • gpu.alloc<T>(...)

Those names still matter to the implementation story, but they are not yet the recommended public syntax.
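
When porting an older sketch, it can help to locate uses of those legacy names first. The following is a rough text scan, not a parser: it does not skip comments or string literals, and the function name is hypothetical.

```python
import re

# Legacy names listed above; the type argument of gpu.alloc<T> may vary.
LEGACY = re.compile(r"\bgpu\.(thread_id|block_id|block_dim)\.|\bgpu\.alloc\s*<")

def find_legacy_intrinsics(source: str):
    """Return (line_number, matched_text) pairs for legacy gpu.* uses."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in LEGACY.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

The match is case-sensitive, so the supported GPU.launch and GPU.sync forms are not reported.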

Backend evidence

The strongest GPU evidence in the repo is under artifacts/omega/:

  • gpu_codegen_parity.v1.json
  • gpu_binary_attestation.v1.json
  • gpu_runtime_attest_gate.v1.json
  • gpu_public_contract.v1.json

Current attested compute lanes:

  • CUDA: cuda-sm80
  • ROCm: rocm-gfx942

Where the bigger GPU implementation lives

  • self-hosted/gpu/ contains PTX, SPIR-V, Metal, runtime, tensor, and tuning work.
  • docs/features/GPU_RUNTIME.md is the repo-native explanation of the current contract.
  • The self-hosted tree still contains an internal gpu-emit path, but the checked public CLI path is build --backend gpu.