Tagged 2026-05-15 (commit 655aebfd). 37 commits since v0.3.0. The release that closes the tokenizer-primitive stack and lands the first slice of sum types.
Shipped
byte_at(text, index) native
- What — reads one byte at zero-based offset, returns a number 0–256, fail-closed bounds.
- Why — a scanner reads character-by-character to find token boundaries. Without this, no in-language tokenizer.
- Choice — native ELF emit, not interpreter-only. The interpreter path proves the semantic; the native path means real binaries can tokenize without a runtime.
fold_bytes(text, init, acc, byte, idx => body) native
- What — byte-level iteration primitive. The body sees three bindings:
acc(accumulator),byte(0–256),idx(zero-based position). Both interpreter and native backends. - Why — variable-length scanning (find first digit, skip whitespace, classify byte) is the workhorse pattern in a tokenizer. Needed a real iteration primitive, not unrolled patterns.
- Choice — fold-style instead of explicit loops. The shape composes with the verifier’s existing termination guarantee — every fold terminates in
length(text)steps, statically bounded.
Short-circuit and / or native
- What —
length(s) > N and byte_at(s, N) == 45no longer evaluates both sides eagerly. Left first,test rax, rax,jz/jnz .skip_right, then right only if needed. - Why — guard-then-probe is the canonical scanning pattern. With eager
and, the guard couldn’t actually protect the probe — abort on too-short input even when the guard would have caught it. - Choice — emit at the AST → native level, not as a peephole later. Normalized 0/1 bools mean the skip path is already correct without extra masking.
Phase A slice 1 — sum-type concept declarations
-
What — concepts can now declare
variants:instead offields:. Payloaded variants (Int of (value : number)) or unit (Eof). Mutually exclusive withfields:at parse time.concept TokenKind @intention: "discriminated union of token types" variants: Ident of (name : text) Int of (value : number) Op of (sym : text) Eof -
Why — a tokenizer’s output type is a discriminated union by construction. Without sum types, you encode them as flat records with a
kindstring field — the type system stops helping you. -
Choice — declarations only in v0.4.0 (parse + verify). Construction (A.2) and pattern match (A.3) come post-tag, because the type-system surface is bigger than the runtime surface and ships separately.
Phase 2G — inline calls in text primitives
- What — every text-consuming primitive accepts a rule
Calldirectly:length(Call),starts_with(Call, needle),contains(Call, needle),json_escape(Call),parse_int(Call). - Why — without inline, every composed call needed a
letbinding. Verbose programs end up writing the same plumbing repeatedly. - Choice — inline at the call site, no stack frame, no let workaround. Zero-alloc in every case.
Maintenance
Other landings without a primary surface change:
Result(Record, text)native — Ok arm emits a JSON record, Err arm writes to stderr + exit 1;if/elseinside Ok arm supported- Multiline
if/then/elseparser — NEWLINE legal between structural keywords, INDENT/DEDENT depth-balanced inside branch bodies - Concat-literal optimizer fold — all-literal
concat(...)collapses to a single text at compile time, cascades throughlength(concat(...))andjson_escape(concat(...)) - Transitive resource reads through
match_resultchains — verifier + native both extended - cidx CI/CD adopted — four phases (security + code + test + build), container-based
rust:alpine - Generator: Claude Sonnet at 10/10 first-try on hold-out (was 9/10 at v0.3.0)
- Two composite demos:
order_intake.verbose(143 lines, 2674B ELF) andpolicy_proxy.verbose(80 lines, 2519B ELF)
Tests: 244 → 310 (+66 in 37 commits).
What we learned
Two lessons from the tokenizer work.
The primitive stack is verified by composition examples, not by adding more primitives. iso_date.verbose (static-position ISO 8601 parser, ~2083B ELF) and byte_classifier.verbose (three fold_bytes rules classifying every byte, each < 800B) are the integration proofs that substring + byte_at + fold_bytes + short-circuit and/or compose into a real tokenizer. Without those examples, the primitive set looked complete on paper and could still have missed a gap — they didn’t.
Short-circuit semantics aren’t a “nice-to-have” — they’re load-bearing for the iteration pattern. Before short-circuit landed, every guard-then-probe fold_bytes rule had to be rewritten with explicit bounds checks inside the body, defeating the point of having a guard. The fix isn’t ergonomic; it’s correctness. Without it, the entire scanning pattern is unsafe.
Open
Phase A slices 2 and 3 already landed post-tag (PRs #27 + #28): variant construction and pattern match are now fully operational at the interpreter layer. Native emit for sum types (A.4) is the active frontier — slices 4.1 (scalar) and 4.2 (text) shipped 2026-05-17.
The recursive-types design doc (docs/recursive-types-design.md, also in v0.4.0) lays out the rest: Phases A–C for language work (9–12 months), Phase D for the self-hosting campaign (6–12 months more). v0.4.0 is the first concrete commit on that path.
How to try
git clone https://github.com/verbose-org/verbose
cd verbose
cargo run -- examples/iso_date.verbose --native /tmp/iso_date --run iso_date --stream
echo "2026-05-15" | /tmp/iso_date
Or download the prebuilt binary from the v0.4.0 release page.