Lexer & Parser

How `.sio` source becomes an AST: Logos tokens, spans, and the recursive descent parser.

The lexer and parser should no longer be explained primarily through the old Rust crate paths. The current implementation lives in the self-hosted tree: many small files split the syntax work into focused modules, and thin compiler-level wrappers sit on top of them.

Current source map

  • self-hosted/lexer/ contains the tokenization stack: cursor.sio, reader.sio, tables.sio, token.sio, numparse.sio, and related support files.
  • self-hosted/parser/ contains syntax-layer modules such as ast.sio, exprs.sio, stmts.sio, patterns.sio, items.sio, types.sio, and recovery support.
  • self-hosted/compiler/lexer.sio and self-hosted/compiler/parser.sio provide higher-level, compiler-facing entry points into those lower-level modules.
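
To make the layering concrete, here is a minimal sketch in Python of the general cursor-and-token pattern that a split like cursor.sio / token.sio suggests. It is not the Sounio implementation: the Token fields, the token kinds, and the Cursor methods are all hypothetical names chosen for illustration, and the real modules may be organized differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str   # "ident", "int", or "punct" (illustrative kinds, not Sounio's)
    text: str
    start: int  # character offset of the first character
    end: int    # offset one past the last character


class Cursor:
    """Tracks the current position in the source text."""

    def __init__(self, source: str) -> None:
        self.source = source
        self.pos = 0

    def peek(self) -> str:
        # Returns "" at end of input so callers can test truthiness.
        return self.source[self.pos] if self.pos < len(self.source) else ""

    def take_while(self, pred) -> str:
        # Consume characters while pred holds and return the consumed slice.
        start = self.pos
        while self.pos < len(self.source) and pred(self.source[self.pos]):
            self.pos += 1
        return self.source[start:self.pos]


def tokenize(source: str) -> list[Token]:
    cur = Cursor(source)
    tokens: list[Token] = []
    while cur.peek():
        ch = cur.peek()
        start = cur.pos
        if ch.isspace():
            cur.take_while(str.isspace)  # whitespace produces no token
            continue
        if ch.isalpha() or ch == "_":
            text = cur.take_while(lambda c: c.isalnum() or c == "_")
            kind = "ident"
        elif ch.isdigit():
            text = cur.take_while(str.isdigit)
            kind = "int"
        else:
            cur.pos += 1  # any other character becomes a one-character token
            text = ch
            kind = "punct"
        tokens.append(Token(kind, text, start, cur.pos))
    return tokens
```

Every token carries its own span, so later stages can report positions without consulting the cursor again.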

What to document as stable

  • The repo actively maintains a working lexer and parser written in Sounio itself.
  • The tree structure shows deliberate separation between tokenization, AST building, statement parsing, and pattern parsing.
  • The safest syntax claims are still the ones backed by current fixtures and validated directly with `souc check`.
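
For readers unfamiliar with the technique named in the page summary, the following Python sketch shows a generic recursive descent parser with one method per grammar rule. The grammar (integer arithmetic with parentheses), the Num and BinOp node types, and the Parser class are all hypothetical; none of this is taken from the Sounio grammar or from the self-hosted parser modules.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class BinOp:
    op: str
    left: Num | BinOp
    right: Num | BinOp


class Parser:
    def __init__(self, source: str) -> None:
        # Toy tokenizer: integers, + - * /, and parentheses.
        # Any other character is silently skipped in this sketch.
        self.tokens = re.findall(r"\d+|[-+*/()]", source)
        self.pos = 0

    def peek(self) -> str | None:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def parse(self) -> Num | BinOp:
        node = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return node

    # expr := term (("+" | "-") term)*
    def expr(self) -> Num | BinOp:
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = BinOp(op, node, self.term())
        return node

    # term := atom (("*" | "/") atom)*
    def term(self) -> Num | BinOp:
        node = self.atom()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = BinOp(op, node, self.atom())
        return node

    # atom := INT | "(" expr ")"
    def atom(self) -> Num | BinOp:
        tok = self.advance()
        if tok.isdigit():
            return Num(int(tok))
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")
```

Operator precedence falls out of which rule calls which: expr calls term, so multiplication binds tighter than addition without any precedence table.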

Useful implementation landmarks

self-hosted/lexer/token.sio
self-hosted/parser/ast.sio
self-hosted/parser/exprs.sio
self-hosted/parser/items.sio
self-hosted/compiler/parser.sio

Documentation guidance

  • Do not anchor public compiler docs to crates/souc/src/lexer or parser as the primary explanation of the current implementation; those paths no longer describe the active tree accurately.
  • When explaining grammar, use small checked examples and then point curious contributors to the self-hosted parser modules.
  • Treat recovery behavior and edge-case parsing claims conservatively unless you have current tests for them.
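
One way to keep documentation examples checked is a small helper that writes each snippet to a temporary .sio file and runs the checker on it. The Python sketch below rests on an assumption this page does not confirm: that the checker is invoked as `souc check <file>` and signals failure through a non-zero exit code. The function name and the stand-in checker are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# Assumption: the checker is invoked as `souc check <file>` and exits
# non-zero on failure. Adjust the command if the real CLI differs.
DEFAULT_CHECKER = ("souc", "check")


def snippet_checks(source: str, checker=DEFAULT_CHECKER) -> bool:
    """Write a doc snippet to a temporary .sio file and run the checker on it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "snippet.sio")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(source)
        result = subprocess.run([*checker, path], capture_output=True, text=True)
        return result.returncode == 0


if __name__ == "__main__":
    # Stand-in checker so the sketch runs without the real tool installed.
    stand_in = (sys.executable, "-c", "import sys; sys.exit(0)")
    print(snippet_checks("placeholder snippet text", stand_in))
```

Passing the checker command as a parameter keeps the helper usable in environments where the compiler is not on the PATH, and makes the CLI assumption easy to change in one place.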