What this route is arguing
Kretikos is Sounio's named GPU compiler lane: checked source-level thread builtins, deterministic serial CPU fallback, in-tree PTX/CUBIN templates, and artifact bundles with hashes.
A skeptical reader should ask
A skeptical reader should be able to run the public commands, inspect emitted artifacts, verify the bundle manifest, and see where CPU fallback stops short of GPU execution.
Boundary
The claim is not total GPU completeness or parallel CPU simulation. The claim is a bounded GPU compiler lane whose public story stops where the artifact trail stops.
Scope note: Kretikos is an active research lane. Clinical pharmacology (vancomycin ε gates, PBPK dissertation demo) is the primary public demo surface on this website — not GPU storefront features.
Kretikos (Κρητικός) = Cretan. Like Ariadne’s thread through the Labyrinth of Knossos, this is the named path for Sounio’s GPU compiler work: source-level thread builtins, serial CPU fallback, in-tree PTX/CUBIN templates, and artifact bundles.
Most languages wave at GPU ambition and hide the current boundary. Kretikos does the opposite: it exposes a bounded GPU surface as a first-class language feature, then ties each claim to an inspectable artifact or command.
gpu_thread_id_x(), gpu_block_id_x(), gpu_block_dim_x(), gpu_sync_threads()
kretikos emit-ptx using the in-tree self-hosted/gpu/ptx.sio emitter
kretikos emit-cubin using the in-tree self-hosted/gpu/nvidia_bare.sio emitter
kretikos bundle, with PTX/CUBIN hashes and explicit non-claims
bin/souc/souc-linux-x86_64 compiler to build its small emitter drivers
self-hosted/gpu/
A language serious about scientific software must not delegate GPU honesty to a black-box runtime. Kretikos demonstrates that Sounio can:
kernel fn vec_add(n: i64) with GPU {
let tid = gpu_thread_id_x()
let bid = gpu_block_id_x()
let bdim = gpu_block_dim_x()
let i = bid * bdim + tid
if i >= n { return }
// This thread owns element i.
}
fn main() with GPU, IO {
let grid = (16, 1, 1)
let block = (64, 1, 1)
perform GPU.launch(vec_add, grid, block)(1024)
perform GPU.sync()
}
This is the checked public shape today: 1D indexing plus deterministic CPU fallback. Broader y/z lowering and richer memory surfaces are separate promotion lanes.
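The 1D indexing and the serial fallback can be modeled outside Sounio. The following is a minimal Python sketch, not the Sounio runtime: it assumes the fallback visits blocks and threads in a fixed nested order, and the names global_index and serial_launch are hypothetical helpers introduced only for illustration. It ignores gpu_sync_threads() entirely.

```python
# Python model of the 1D launch shape above; an illustration under stated
# assumptions, not Sounio's actual CPU fallback implementation.


def global_index(block_id: int, block_dim: int, thread_id: int) -> int:
    """Mirror of `bid * bdim + tid` from the vec_add kernel."""
    return block_id * block_dim + thread_id


def serial_launch(grid_x: int, block_x: int, n: int) -> list[int]:
    """Visit every (block, thread) pair in a fixed order; return owned indices."""
    owned = []
    for bid in range(grid_x):        # blocks run one after another
        for tid in range(block_x):   # threads in a block run one after another
            i = global_index(bid, block_x, tid)
            if i >= n:               # same bounds guard as the kernel
                continue
            owned.append(i)
    return owned


if __name__ == "__main__":
    # Same launch shape as main(): grid (16, 1, 1), block (64, 1, 1), n = 1024.
    owned = serial_launch(grid_x=16, block_x=64, n=1024)
    print(len(owned), owned[0], owned[-1])  # prints: 1024 0 1023
```

With 16 blocks of 64 threads the launch covers exactly 1024 indices, so the guard only matters when n is smaller than grid times block. Because the visiting order is fixed, repeated runs produce the same sequence, which is the property the word "deterministic" refers to in this model.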
# Compile and run on serial CPU fallback (no GPU hardware required)
souc examples/kernel_source_level.sio /tmp/kretikos_demo.elf
/tmp/kretikos_demo.elf
# Emit predefined PTX templates (self-hosted, no GPU required)
kretikos emit-ptx vec_add -o /tmp/kretikos.ptx
kretikos emit-ptx vec_sub -o /tmp/kretikos.ptx
kretikos emit-ptx vec_mul -o /tmp/kretikos.ptx
kretikos emit-ptx vec_div -o /tmp/kretikos.ptx
kretikos emit-ptx vec_add_f64 -o /tmp/kretikos.ptx
kretikos emit-ptx fma -o /tmp/kretikos.ptx
kretikos emit-ptx fma_f64 -o /tmp/kretikos.ptx
kretikos emit-ptx store_u32_const -o /tmp/kretikos.ptx
# Emit predefined Metal/MSL templates (self-hosted, no macOS required)
kretikos emit-metal vec_add -o /tmp/kretikos.metal
kretikos emit-metal ossm_oct_step -o /tmp/kretikos.metal
kretikos emit-metal sedenion_cd_step -o /tmp/kretikos.metal
# Emit the predefined vec_add_f32 CUBIN template
kretikos emit-cubin vec_add_f32 -o /tmp/kretikos.cubin
# Emit a structural artifact bundle with hashes and boundaries
kretikos bundle -o /tmp/kretikos-bundle
# Add optional toolchain/runtime validation when the host exposes those tools
kretikos bundle -o /tmp/kretikos-validated-bundle --validate-toolchain --validate-runtime
# Build PTX with the checked GPU artifact (broader pattern support)
export SOUC_GPU_BIN="$(pwd)/artifacts/omega/souc-bin/souc-linux-x86_64-gpu"
"$SOUC_GPU_BIN" build examples/kernel_source_level.sio --backend gpu -o /tmp/kretikos.ptx
GPU work is where language marketing goes to die. It is easy to sketch intrinsics or promise tensor-core support. It is much harder to keep a self-hosted compiler, a bounded GPU surface, and a public-facing narrative aligned while real bugs surface in stack alignment, launch parameters, and PTX module loading.
Kretikos is valuable not only because it accelerates compute, but because it forces the language to reveal whether its honesty survives contact with backend artifacts.
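One way to test that alignment is to recompute the hashes of the files that kretikos bundle writes and compare them with the recorded values. The sketch below is in Python and rests on assumptions: the manifest layout is not documented here, so the expected mapping (file name to hex digest) is a hypothetical stand-in, and SHA-256 is assumed as the hash function. The helper names sha256_file and verify_artifacts are likewise illustrative.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(bundle_dir: Path, expected: dict[str, str]) -> dict[str, str]:
    """Compare recorded digests against the files on disk.

    `expected` maps a relative file name to a hex digest. It is a
    hypothetical stand-in for whatever the real bundle manifest records.
    """
    results = {}
    for name, want in expected.items():
        path = bundle_dir / name
        if not path.is_file():
            results[name] = "missing"
        elif sha256_file(path) == want.lower():
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

In use, the mapping would be filled from the manifest inside a bundle directory such as /tmp/kretikos-bundle, and any entry that is not "ok" marks a point where the artifact trail and the claim diverge.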
gpu_thread_id_x, gpu_block_id_x, gpu_block_dim_x, gpu_sync_threads (self-hosted compiler)
0; y/z block dimensions return 1
kretikos emit-ptx with 6 patterns (vec_add, vec_sub, vec_mul, vec_div, vec_add_f64, fma); broader PTX emission through build --backend gpu (checked GPU artifact)
kretikos emit-metal with 3 patterns (vec_add, ossm_oct_step, sedenion_cd_step) from in-tree self-hosted emitter
kretikos emit-cubin (in-tree self-hosted emitter)
kretikos bundle emits PTX+CUBIN plus sounio.kretikos.bundle.v1
--validate-toolchain records ptxas/nvdisasm evidence, and --validate-runtime attempts the CUDA Driver API rung with exact not_run reasons on non-GPU hosts
self-hosted/gpu/ptx.sio, self-hosted/gpu/ptx_advanced.sio, self-hosted/gpu/nvidia_bare.sio, self-hosted/gpu/spirv_lower.sio
sm_80, ROCm gfx942
Kretikos does not claim:
gpu.alloc<T>() or shared-memory abstractions are checked public surface
The honest claim is narrower and therefore stronger: there is a GPU compiler lane here, and the current Kretikos CLI artifact bundle is a predefined-template surface, not arbitrary user-kernel lowering. Toolchain and runtime validation apply only to those selected templates; they do not validate arbitrary user-written GPU kernels.
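The idea of recording an explicit not_run reason, rather than silently skipping validation, can be sketched in a few lines of Python. This is an illustration only and says nothing about how kretikos implements --validate-toolchain: the probe_tools helper and the "status", "reason", and "path" field names are hypothetical, and the only facts taken from the text above are the tool names ptxas and nvdisasm and the not_run label.

```python
import shutil


def probe_tools(tools=("ptxas", "nvdisasm"), which=shutil.which) -> dict[str, dict[str, str]]:
    """Record, per tool, whether validation could run and, if not, why.

    `which` is injectable so the probe can be exercised on hosts
    with or without the NVIDIA toolchain installed.
    """
    report = {}
    for tool in tools:
        path = which(tool)
        if path is None:
            # Keep the reason explicit instead of dropping the entry.
            report[tool] = {"status": "not_run", "reason": f"{tool} not found on PATH"}
        else:
            report[tool] = {"status": "available", "path": path}
    return report


if __name__ == "__main__":
    for tool, entry in probe_tools().items():
        print(tool, entry)
```

On a host without the CUDA toolkit, every entry carries a reason string, so a reader of the report can tell an unvalidated template apart from a validated one.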