Lexer & Parser
How `.sio` source becomes an AST: Logos tokens, spans, and the recursive descent parser.
The lexer and parser should no longer be explained primarily through the old Rust crate paths. The current implementation lives in the self-hosted tree: thin compiler-level wrappers sit on top of many small files, each handling one focused part of the syntax work.
Current source map
- `self-hosted/lexer/` contains the tokenization stack: `cursor.sio`, `reader.sio`, `tables.sio`, `token.sio`, `numparse.sio`, and related support files.
- `self-hosted/parser/` contains syntax-layer modules such as `ast.sio`, `exprs.sio`, `stmts.sio`, `patterns.sio`, `items.sio`, `types.sio`, and recovery support.
- `self-hosted/compiler/lexer.sio` and `self-hosted/compiler/parser.sio` provide a higher-level, compiler-facing entry point into those lower-level modules.
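Splitting a hand-written lexer into a cursor, a token type, and a driver loop is a common structure. The Rust sketch below illustrates that general technique: a cursor walks the source and emits tokens tagged with byte-offset spans. It is not a transcription of the self-hosted `.sio` modules. `Cursor`, `Token`, `TokenKind`, `Span`, and the tiny token set are hypothetical, chosen only to show the shape.

```rust
// Illustrative sketch only: these types are hypothetical, not the self-hosted Sounio lexer API.

/// Half-open byte range `[start, end)` into the source text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Deliberately tiny token set.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TokenKind {
    Ident(String),
    /// Raw digit text; converting it to a numeric value is left to a later step.
    Int(String),
    Punct(char),
    Unknown(char),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Token {
    pub kind: TokenKind,
    pub span: Span,
}

/// Cursor over the source that tracks the current byte offset.
struct Cursor<'a> {
    src: &'a str,
    pos: usize,
}

impl<'a> Cursor<'a> {
    fn new(src: &'a str) -> Self {
        Cursor { src, pos: 0 }
    }

    /// Look at the next character without consuming it.
    fn peek(&self) -> Option<char> {
        self.src[self.pos..].chars().next()
    }

    /// Consume one character, advancing by its UTF-8 length so `pos` stays on a char boundary.
    fn bump(&mut self) -> Option<char> {
        let c = self.peek()?;
        self.pos += c.len_utf8();
        Some(c)
    }

    /// Consume characters while `pred` holds.
    fn eat_while(&mut self, pred: impl Fn(char) -> bool) {
        while matches!(self.peek(), Some(c) if pred(c)) {
            self.bump();
        }
    }
}

/// Scan `src` into tokens, each tagged with the byte span it came from.
pub fn tokenize(src: &str) -> Vec<Token> {
    let mut cur = Cursor::new(src);
    let mut out = Vec::new();
    loop {
        cur.eat_while(char::is_whitespace);
        let start = cur.pos;
        let Some(c) = cur.bump() else { break };
        let kind = if c.is_ascii_alphabetic() || c == '_' {
            cur.eat_while(|c| c.is_ascii_alphanumeric() || c == '_');
            TokenKind::Ident(src[start..cur.pos].to_string())
        } else if c.is_ascii_digit() {
            cur.eat_while(|c| c.is_ascii_digit());
            TokenKind::Int(src[start..cur.pos].to_string())
        } else if "+-*/=(){};:,".contains(c) {
            TokenKind::Punct(c)
        } else {
            // Keep scanning past unrecognized input so a later stage can report it.
            TokenKind::Unknown(c)
        };
        out.push(Token { kind, span: Span { start, end: cur.pos } });
    }
    out
}
```

For example, `tokenize("let x = 42;")` produces five tokens, the first being `Ident("let")` with span `0..3`. Because spans are byte offsets into the original text, a diagnostic can recover the exact source slice with `&src[span.start..span.end]`.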
What to document as stable
- The repo actively maintains a real lexer and parser in Sounio itself.
- The tree structure shows deliberate separation between tokenization, AST building, statement parsing, and pattern parsing.
- The safest syntax claims are still the ones backed by current fixtures and by direct `souc check` validation.
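The separation between tokenization and AST building can be illustrated with a minimal recursive descent sketch in Rust. It is a generic example of the technique, not the self-hosted parser: the `Tok` and `Expr` types, the `Parser` methods, and the grammar (integers, `+ - * /`, parentheses) are all hypothetical. Lexing finishes before parsing starts, and each precedence level gets its own method.

```rust
// Illustrative sketch only: hypothetical names, not the self-hosted Sounio parser API.

/// Tokens for a tiny expression language.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Int(i64),
    Op(char),
}

/// Stage 1: tokenization, kept separate from AST building.
fn lex(src: &str) -> Result<Vec<Tok>, String> {
    let chars: Vec<char> = src.chars().collect();
    let (mut out, mut i) = (Vec::new(), 0);
    while i < chars.len() {
        let (c, start) = (chars[i], i);
        if c.is_whitespace() {
            i += 1;
        } else if c.is_ascii_digit() {
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            out.push(Tok::Int(text.parse::<i64>().map_err(|e| format!("bad integer {text}: {e}"))?));
        } else if "+-*/()".contains(c) {
            out.push(Tok::Op(c));
            i += 1;
        } else {
            return Err(format!("unexpected character {c:?}"));
        }
    }
    Ok(out)
}

/// Stage 2 output: the AST.
#[derive(Debug, PartialEq)]
enum Expr {
    Int(i64),
    Binary(Box<Expr>, char, Box<Expr>),
}

/// Recursive descent parser over an already-lexed token vector.
struct Parser {
    toks: Vec<Tok>,
    pos: usize,
}

impl Parser {
    /// Consume the next token only if it is an operator listed in `ops`.
    fn eat_op(&mut self, ops: &str) -> Option<char> {
        let op = match self.toks.get(self.pos) {
            Some(Tok::Op(c)) if ops.contains(*c) => *c,
            _ => return None,
        };
        self.pos += 1;
        Some(op)
    }

    /// Left-associative chain: operand (op operand)*.
    fn chain(&mut self, ops: &str, operand: fn(&mut Parser) -> Result<Expr, String>) -> Result<Expr, String> {
        let mut lhs = operand(self)?;
        while let Some(op) = self.eat_op(ops) {
            let rhs = operand(self)?;
            lhs = Expr::Binary(Box::new(lhs), op, Box::new(rhs));
        }
        Ok(lhs)
    }

    /// expr := term (('+' | '-') term)*
    fn expr(&mut self) -> Result<Expr, String> {
        self.chain("+-", Parser::term)
    }

    /// term := atom (('*' | '/') atom)*
    fn term(&mut self) -> Result<Expr, String> {
        self.chain("*/", Parser::atom)
    }

    /// atom := INT | '(' expr ')'
    fn atom(&mut self) -> Result<Expr, String> {
        let tok = self.toks.get(self.pos).cloned();
        self.pos += 1;
        match tok {
            Some(Tok::Int(n)) => Ok(Expr::Int(n)),
            Some(Tok::Op('(')) => {
                let inner = self.expr()?;
                self.eat_op(")").map(|_| inner).ok_or_else(|| "expected ')'".to_string())
            }
            other => Err(format!("expected an expression, found {other:?}")),
        }
    }
}

/// Entry point: lex, parse one expression, and reject leftover tokens.
fn parse(src: &str) -> Result<Expr, String> {
    let mut p = Parser { toks: lex(src)?, pos: 0 };
    let expr = p.expr()?;
    match p.toks.get(p.pos) {
        None => Ok(expr),
        Some(t) => Err(format!("unexpected trailing token {t:?}")),
    }
}
```

With this sketch, `parse("1 + 2 * 3")` yields `Binary(Int(1), '+', Binary(Int(2), '*', Int(3)))`: because `term` sits one level below `expr`, `*` binds tighter than `+`, and the `while let` loop in `chain` makes both levels left-associative.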
Useful implementation landmarks
- `self-hosted/lexer/token.sio`
- `self-hosted/parser/ast.sio`
- `self-hosted/parser/exprs.sio`
- `self-hosted/parser/items.sio`
- `self-hosted/compiler/parser.sio`
Documentation guidance
- Do not anchor public compiler docs to `crates/souc/src/lexer` or `parser` as the primary current explanation; those paths no longer describe the active tree accurately.
- When explaining grammar, use small checked examples and then point curious contributors to the self-hosted parser modules.
- Treat recovery behavior and edge-case parsing claims conservatively unless you have current tests for them.