GPU Programming
The checked GPU artifact, verified GPU syntax, and attested backend status.
Sounio has two GPU compiler paths in this repo snapshot:
- souc-linux-x86_64 (the default self-hosted artifact): compiles GPU syntax, runs a serial CPU fallback, and lets kretikos build predefined PTX/CUBIN artifact templates.
- souc-linux-x86_64-gpu (the GPU-facing artifact): a separate binary with broader PTX emission via build --backend gpu.
Self-hosted GPU path (CPU fallback + artifact templates)
The default compiler (bin/souc or bin/souc-linux-x86_64) accepts GPU kernel syntax and runs a deterministic serial CPU fallback. The Kretikos CLI uses that compiler to build tiny in-tree emitter drivers, then writes predefined GPU artifact templates:
# Emit predefined PTX templates (self-hosted, no GPU required)
kretikos emit-ptx vec_add -o /tmp/kernel.ptx
kretikos emit-ptx vec_sub -o /tmp/kernel.ptx
kretikos emit-ptx vec_mul -o /tmp/kernel.ptx
kretikos emit-ptx vec_div -o /tmp/kernel.ptx
kretikos emit-ptx vec_add_f64 -o /tmp/kernel.ptx
kretikos emit-ptx fma -o /tmp/kernel.ptx
kretikos emit-ptx fma_f64 -o /tmp/kernel.ptx
kretikos emit-ptx store_u32_const -o /tmp/kernel.ptx
# Emit predefined Metal/MSL templates (self-hosted, no macOS required)
kretikos emit-metal vec_add -o /tmp/kernel.metal
kretikos emit-metal ossm_oct_step -o /tmp/kernel.metal
kretikos emit-metal sedenion_cd_step -o /tmp/kernel.metal
# Emit predefined CUBIN templates
kretikos emit-cubin exit_only -o /tmp/kernel.cubin
kretikos emit-cubin store_u32_const -o /tmp/kernel.cubin
kretikos emit-cubin vec_add_f32 -o /tmp/kernel.cubin
kretikos emit-cubin epistemic_dual_f32 -o /tmp/kernel.cubin
# Emit a structural artifact bundle: PTX, CUBIN, hashes, boundaries
kretikos bundle -o /tmp/kretikos-bundle
# Optionally record assembler/disassembler and CUDA Driver API evidence
kretikos bundle -o /tmp/kretikos-validated-bundle --validate-toolchain --validate-runtime
What that proves today:
- The self-hosted compiler compiles kernel fn, with GPU, GPU.launch, and GPU.sync.
- The CPU fallback runs kernels serially with deterministic thread/block IDs.
- The kretikos CLI wrapper emits predefined PTX text and NVIDIA CUDA ELF/CUBIN byte templates from in-tree Sounio code.
- The kretikos bundle command writes a machine-readable sounio.kretikos.bundle.v1 manifest with artifact hashes, structural checks, optional ptxas/nvdisasm and CUDA Driver API validation, and explicit non-claims.
- This path does not yet lower arbitrary user kernel source into PTX/CUBIN.
- Toolchain and runtime validation apply only to the predefined PTX/CUBIN templates selected by kretikos; they do not validate arbitrary user-written kernels.
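The serial CPU fallback mentioned above can be pictured with a small Python model. This is only an illustrative sketch, not Sounio code and not the compiler's implementation: the names launch_serial and vec_add, the one-dimensional grid, and the ascending iteration order are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical model of a serial CPU fallback: every (block, thread) pair is
# visited exactly once, in a fixed ascending order, on a single CPU thread.
def launch_serial(kernel: Callable, grid_dim: int, block_dim: int, *args) -> List[Tuple[int, int]]:
    trace = []  # records the order in which IDs were issued
    for block_id in range(grid_dim):
        for thread_id in range(block_dim):
            kernel(block_id, thread_id, block_dim, *args)
            trace.append((block_id, thread_id))
    return trace

# Hypothetical element-wise add kernel, written against the IDs above.
def vec_add(block_id: int, thread_id: int, block_dim: int, a, b, out) -> None:
    i = block_id * block_dim + thread_id  # global index from block/thread IDs
    if i < len(out):                      # guard threads past the end of the data
        out[i] = a[i] + b[i]

a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * len(a)

# Two blocks of four threads cover five elements; the last three threads do nothing.
trace = launch_serial(vec_add, 2, 4, a, b, out)
print(out)
print(trace[:5])
```

Because the loop order is fixed, two runs with the same inputs visit the same IDs in the same sequence, which is the sense in which such a fallback is deterministic.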
GPU artifact path (broader PTX emission)
The checked GPU artifact is a separate binary with broader pattern support:
export SOUC_GPU_BIN="$(pwd)/artifacts/omega/souc-bin/souc-linux-x86_64-gpu"
export SOUNIO_STDLIB_PATH="$(pwd)/stdlib"
"$SOUC_GPU_BIN" info
"$SOUC_GPU_BIN" check examples/gpu.sio
"$SOUC_GPU_BIN" check tests/run-pass/gpu_launch_surface.sio
"$SOUC_GPU_BIN" build examples/kernel_matmul.sio --backend gpu -o /tmp/kernel_matmul.ptx
Public surface versus source-tree surface
The checked public GPU artifact currently accepts:
- kernel fn
- with GPU
- perform GPU.launch(...)
- perform GPU.sync()
- PTX emission through build --backend gpu
The self-hosted compiler (bin/souc) also accepts the same syntax and additionally routes:
- predefined PTX templates through kretikos emit-ptx
- predefined CUBIN templates through kretikos emit-cubin
- structural PTX/CUBIN artifact bundles through kretikos bundle
- optional fail-closed toolchain/runtime checks through kretikos bundle --require-toolchain and kretikos bundle --require-runtime
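One way to read the difference between an optional check and a fail-closed check is sketched below in Python. This is an interpretation for illustration, not how kretikos is implemented: the check_tool helper and the fields of the record it returns are hypothetical, and the only real API used is shutil.which from the Python standard library.

```python
import shutil

def check_tool(name: str, required: bool) -> dict:
    """Look up an external tool on PATH.

    Optional mode records that the tool was missing and carries on.
    Fail-closed mode (required=True) treats a missing tool as an error,
    so absence can never be mistaken for a passed check.
    """
    path = shutil.which(name)
    if path is None:
        if required:
            raise RuntimeError(f"required tool not found on PATH: {name}")
        return {"tool": name, "status": "skipped", "path": None}
    return {"tool": name, "status": "found", "path": path}

# Optional: a missing assembler is recorded as skipped, not as a failure.
print(check_tool("ptxas", required=False))

# Fail-closed: a missing tool stops the run instead of passing silently.
try:
    check_tool("hypothetical-missing-tool-for-demo", required=True)
except RuntimeError as err:
    print("failed closed:", err)
```

The design point is that a skipped check and a passed check stay distinguishable in the recorded evidence.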
The checked public GPU artifact does not yet resolve the older gpu.* intrinsic namespace from historical sketches:
- gpu.thread_id.*
- gpu.block_id.*
- gpu.block_dim.*
- gpu.alloc<T>(...)
Those names still matter to the implementation story, but they are not yet the recommended public syntax.
Backend evidence
The strongest GPU evidence in the repo is under artifacts/omega/:
- gpu_codegen_parity.v1.json
- gpu_binary_attestation.v1.json
- gpu_runtime_attest_gate.v1.json
- gpu_public_contract.v1.json
Current attested compute lanes:
- CUDA: cuda-sm80
- ROCm: rocm-gfx942
Where the bigger GPU implementation lives
- self-hosted/gpu/ contains PTX, SPIR-V, Metal, runtime, tensor, and tuning work.
- docs/features/GPU_RUNTIME.md is the repo-native explanation of the current contract.
- The self-hosted tree still contains an internal gpu-emit path, but the checked public CLI path is build --backend gpu.