GPU Computing with Sounio
Executive reading
This route is not here to imply that Sounio already owns every GPU backend or every runtime lane. It is here to make a much tighter argument:
- there is a real, checked GPU surface in the public repo
- that surface emits PTX today
- the artifact trail is inspectable by a skeptical reader
- the language is trying to be honest about where acceleration is real and where the boundary still is
That matters because most language websites wave at GPU ambition. They rarely show the exact commands, the exact artifact, and the exact boundary in the same place.
What this route is trying to prove
Sounio should be able to accelerate a lane without turning into mythology.
The claim is not “all GPU programming is solved.” The claim is that a language serious about scientific software must be able to expose a bounded acceleration surface and keep its story coherent under scrutiny:
- source-level kernels should be legible
- emitted PTX should exist as an artifact, not as a promise
- runtime pressure should reveal bugs early instead of being hidden behind aspirational syntax
- public claims should stop exactly where the current contract stops
This is why the GPU route belongs on the storefront. It is one of the clearest places where Sounio either proves it can survive contact with hard systems work, or fails publicly.
A skeptical reader should ask
- Does the public surface compile today, or is it just syntax?
- Is there emitted PTX I can inspect, or only backend source code in a branch?
- Are the runtime claims backed by attested lanes and real artifacts?
- Does the language admit what is still experimental?
If the answer to those questions is weak, this route should not be on the homepage at all.
Checked public syntax
kernel fn vector_add(n: i64) with GPU {
}
kernel fn scale_vector(factor: f64, n: i64) with GPU, Div {
}
fn main() with GPU, IO {
let grid = (16, 1, 1)
let block = (64, 1, 1)
perform GPU.launch(vector_add, grid, block)(1024)
perform GPU.launch(scale_vector, grid, block)(2.0, 1024)
perform GPU.sync()
}
This surface is intentionally modest. The point is to show a lane that can be named, checked, lowered, and inspected, not to pretend the language has already collapsed every GPU concern into one elegant abstraction.
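For readers who want the launch arithmetic spelled out: the snippet above launches grid (16, 1, 1) by block (64, 1, 1), which covers exactly n = 1024 elements under the usual one-thread-per-element convention. A quick sketch (Python, purely illustrative; the `global_index` helper is hypothetical, since the checked surface above does not show kernel bodies):

```python
# Launch-geometry arithmetic for the snippet above.
# CUDA-style convention: total threads = product of grid dims * product of block dims.
grid = (16, 1, 1)
block = (64, 1, 1)
n = 1024

total_threads = (grid[0] * grid[1] * grid[2]) * (block[0] * block[1] * block[2])
assert total_threads == n  # 16 * 64 = 1024: one thread per element

# Global index a kernel like vector_add would typically compute per thread
# (hypothetical helper; not part of the checked public surface):
def global_index(block_idx: int, thread_idx: int, block_dim: int) -> int:
    return block_idx * block_dim + thread_idx

# The last thread of the last block maps to element n - 1, so no thread
# falls outside the buffer and no element is left uncovered.
assert global_index(grid[0] - 1, block[0] - 1, block[0]) == n - 1
```

If n were not an exact multiple of the block size, the kernel body would need the standard `if index < n` guard; the numbers here are chosen so the coverage is exact.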
What is genuinely verified today
export SOUC_GPU_BIN="$(pwd)/artifacts/omega/souc-bin/souc-linux-x86_64-gpu"
export SOUNIO_STDLIB_PATH="$(pwd)/stdlib"
"$SOUC_GPU_BIN" check examples/gpu.sio
"$SOUC_GPU_BIN" build examples/kernel_matmul.sio --backend gpu -o /tmp/kernel_matmul.ptx
That is the standard of proof this route should live under: commands you can run, artifacts you can read, and a bounded story about what the result means.
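What does "artifacts you can read" mean in practice? Emitted PTX is plain text, and its header directives (`.version`, `.target`, `.address_size`) plus its `.visible .entry` declarations are easy to check mechanically. A minimal sketch of that inspection (Python; the sample PTX here is an illustrative stand-in, not actual souc output):

```python
import re

# Illustrative PTX module. Real output from `build --backend gpu` will
# differ, but every PTX module opens with these directives.
ptx = """\
//
// Illustrative sample, not actual souc output
//
.version 8.0
.target sm_80
.address_size 64

.visible .entry vector_add(
    .param .u64 vector_add_param_0
)
{
    ret;
}
"""

# Pull out the directives a skeptical reader would check first:
# ISA version, target architecture, and the exported kernel entry points.
version = re.search(r"^\.version\s+(\S+)", ptx, re.M).group(1)
target = re.search(r"^\.target\s+(\S+)", ptx, re.M).group(1)
entries = re.findall(r"^\.visible \.entry\s+(\w+)", ptx, re.M)

print(version, target, entries)  # prints: 8.0 sm_80 ['vector_add']
```

A check like this is deliberately shallow: it does not prove the kernel is correct, only that the artifact exists, targets the claimed architecture (sm_80 matches the attested cuda-sm80 lane), and exports the entry point the source named. That is exactly the bounded kind of claim this route makes.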
Why this matters to the language
GPU work is where a lot of language marketing goes to die.
It is easy to sketch intrinsics, mention tensor cores, or promise multiple backends. It is much harder to keep a self-hosted compiler, a runtime path, a public command surface, and a storefront narrative aligned while real bugs are being found in stack alignment, launch parameter layout, PTX module loading, or runtime dispatch.
That is exactly why this route is useful. It demonstrates pressure. The GPU surface is valuable not only because it accelerates compute, but because it forces the language to reveal whether its honesty survives code generation and runtime contact.
Support tier
- attested compute evidence today: CUDA (cuda-sm80), ROCm (rocm-gfx942)
- public emission path today: build --backend gpu
- larger source-tree GPU work exists under self-hosted/gpu/, including Metal, SPIR-V, tensor, and runtime layers
- older gpu.* intrinsic sketches are still implementation work, not the checked public surface
Where the boundary still is
This route does not claim:
- that every GPU backend is production-complete
- that every runtime path is equally mature
- that the public kernel surface is broader than the currently checked contract
- that “GPU support” should be read as a blanket maturity claim for the whole backend tree
The honest claim is narrower and therefore stronger: there is a real acceleration surface here, and it is bounded by artifacts instead of wishful copy.