Tagged 2026-05-15 (commit 655aebfd). 37 commits since v0.3.0. The release that closes the tokenizer-primitive stack and lands the first slice of sum types.

Shipped

byte_at(text, index) native

  • What — reads one byte at a zero-based offset and returns a number 0–255, with fail-closed bounds checking.
  • Why — a scanner reads character-by-character to find token boundaries. Without this, no in-language tokenizer.
  • Choice — native ELF emit, not interpreter-only. The interpreter path proves the semantics; the native path means real binaries can tokenize without a runtime.
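
As a rough illustration of the contract described above, here is a minimal Rust model of byte_at. It is a sketch, not the compiler's implementation: the Option return type is an assumption standing in for "fail-closed", and the native binary may abort instead of returning a value.

```rust
/// Hypothetical model of `byte_at`: the byte at a zero-based offset.
/// Assumption: an out-of-range index yields `None` rather than wrapping
/// around or returning a default value (this models "fail-closed").
fn byte_at(text: &str, index: usize) -> Option<u8> {
    // `get` performs the bounds check; `copied` turns Option<&u8> into Option<u8>.
    text.as_bytes().get(index).copied()
}
```

Every value that comes back fits in a u8, so the result is always in the range 0 to 255.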

fold_bytes(text, init, acc, byte, idx => body) native

  • What — byte-level iteration primitive. The body sees three bindings: acc (accumulator), byte (0–255), idx (zero-based position). Implemented in both the interpreter and native backends.
  • Why — variable-length scanning (find first digit, skip whitespace, classify byte) is the workhorse pattern in a tokenizer. Needed a real iteration primitive, not unrolled patterns.
  • Choice — fold-style instead of explicit loops. The shape composes with the verifier’s existing termination guarantee — every fold terminates in length(text) steps, statically bounded.
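
The fold shape can be sketched in Rust as follows. This is a model under assumptions, not the Verbose implementation: the generic signature and the two helper rules (first_digit, count_whitespace) are hypothetical names chosen to mirror the scanning tasks mentioned above.

```rust
/// Hypothetical model of `fold_bytes`: visit each byte exactly once, left to
/// right, threading an accumulator through the body. The loop runs
/// text.len() times, which is the bounded-termination property.
fn fold_bytes<A>(text: &str, init: A, body: impl Fn(A, u8, usize) -> A) -> A {
    let mut acc = init;
    for (idx, &byte) in text.as_bytes().iter().enumerate() {
        acc = body(acc, byte, idx);
    }
    acc
}

/// Position of the first ASCII digit, if any.
fn first_digit(text: &str) -> Option<usize> {
    fold_bytes(text, None, |acc: Option<usize>, byte, idx| {
        // Keep the first hit; later digits leave the accumulator unchanged.
        if acc.is_none() && byte.is_ascii_digit() {
            Some(idx)
        } else {
            acc
        }
    })
}

/// Number of ASCII whitespace bytes.
fn count_whitespace(text: &str) -> usize {
    fold_bytes(text, 0usize, |acc, byte, _idx| {
        if byte.is_ascii_whitespace() {
            acc + 1
        } else {
            acc
        }
    })
}
```

Note that a fold cannot exit early: first_digit still visits every byte after the first match, which is the price of the static step bound.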

Short-circuit and / or native

  • What — length(s) > N and byte_at(s, N) == 45 no longer evaluates both sides eagerly. The left side runs first, followed by test rax, rax and jz/jnz .skip_right; the right side runs only if needed.
  • Why — guard-then-probe is the canonical scanning pattern. With eager and, the guard couldn’t actually protect the probe: the program aborted on too-short input even though the guard would have caught it.
  • Choice — emit at the AST → native level, not as a peephole later. Normalized 0/1 bools mean the skip path is already correct without extra masking.
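
The difference between the two evaluation orders can be modelled in Rust. This is an illustrative sketch only: the Err value stands in for the abort of an out-of-bounds byte_at, and the function names are hypothetical.

```rust
/// Hypothetical stand-in for an aborting `byte_at`: `Err` models the abort.
fn byte_at(text: &str, index: usize) -> Result<u8, String> {
    text.as_bytes()
        .get(index)
        .copied()
        .ok_or_else(|| format!("index {} out of bounds", index))
}

/// Eager `and`: both operands are evaluated before they are combined,
/// so the probe runs even when the guard is false.
fn is_dash_eager(s: &str, n: usize) -> Result<bool, String> {
    let guard = s.len() > n;
    let probe = byte_at(s, n)? == 45; // fails on too-short input
    Ok(guard && probe)
}

/// Short-circuit `and`: the probe runs only when the guard holds.
fn is_dash_short_circuit(s: &str, n: usize) -> Result<bool, String> {
    if s.len() <= n {
        return Ok(false); // guard failed: skip the right operand entirely
    }
    Ok(byte_at(s, n)? == 45)
}
```

With the input "ab" and N = 5, the eager version fails while the short-circuit version returns false, which is the behaviour the guard was written to provide.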

Phase A slice 1 — sum-type concept declarations

  • What — concepts can now declare variants: instead of fields:. Payloaded variants (Int of (value : number)) or unit (Eof). Mutually exclusive with fields: at parse time.

    concept TokenKind
      @intention: "discriminated union of token types"
      variants:
        Ident of (name : text)
        Int of (value : number)
        Op of (sym : text)
        Eof

  • Why — a tokenizer’s output type is a discriminated union by construction. Without sum types, you encode them as flat records with a kind string field — the type system stops helping you.
  • Choice — declarations only in v0.4.0 (parse + verify). Construction (A.2) and pattern match (A.3) come post-tag, because the type-system surface is bigger than the runtime surface and ships separately.
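
For readers more familiar with Rust, the TokenKind concept above corresponds roughly to an enum. This is an analogy rather than generated code: mapping number to i64 and text to String is an assumption, and describe is a hypothetical helper. Construction and matching are shown only to illustrate what the type buys you; in Verbose they belong to slices A.2 and A.3, not to v0.4.0.

```rust
/// Rust analogue of the `TokenKind` concept (field types are assumptions).
#[derive(Debug, Clone, PartialEq)]
enum TokenKind {
    Ident { name: String },
    Int { value: i64 },
    Op { sym: String },
    Eof, // unit variant: no payload
}

/// Exhaustive match: adding a variant without handling it here is a
/// compile error, which a flat record with a `kind` string cannot offer.
fn describe(token: &TokenKind) -> String {
    match token {
        TokenKind::Ident { name } => format!("ident {}", name),
        TokenKind::Int { value } => format!("int {}", value),
        TokenKind::Op { sym } => format!("op {}", sym),
        TokenKind::Eof => String::from("eof"),
    }
}
```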

Phase 2G — inline calls in text primitives

  • What — every text-consuming primitive accepts a rule Call directly: length(Call), starts_with(Call, needle), contains(Call, needle), json_escape(Call), parse_int(Call).
  • Why — without inlining, every composed call needed a let binding, so Verbose programs ended up repeating the same plumbing.
  • Choice — inline at the call site, no stack frame, no let workaround. Zero-alloc in every case.

Maintenance

Other landings without a primary surface change:

  • Result(Record, text) native — Ok arm emits a JSON record, Err arm writes to stderr + exit 1; if/else inside Ok arm supported
  • Multiline if/then/else parser — NEWLINE legal between structural keywords, INDENT/DEDENT depth-balanced inside branch bodies
  • Concat-literal optimizer fold — all-literal concat(...) collapses to a single text at compile time, cascades through length(concat(...)) and json_escape(concat(...))
  • Transitive resource reads through match_result chains — verifier + native both extended
  • cidx CI/CD adopted — four phases (security + code + test + build), container-based rust:alpine
  • Generator: Claude Sonnet at 10/10 first-try on hold-out (was 9/10 at v0.3.0)
  • Two composite demos: order_intake.verbose (143 lines, 2674B ELF) and policy_proxy.verbose (80 lines, 2519B ELF)

Tests: 244 → 310 (+66 in 37 commits).
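
The concat-literal fold listed above can be pictured with a small constant-folding pass. The sketch below uses a hypothetical five-node expression type, not the compiler's real AST, and assumes that length counts bytes.

```rust
/// Hypothetical miniature AST; the real compiler's node types are not shown
/// in these notes.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Text(String),
    Number(i64),
    Var(String),
    Concat(Vec<Expr>),
    Length(Box<Expr>),
}

/// Bottom-up fold: an all-literal concat becomes one Text node, and a
/// length over a literal then becomes a Number (the cascade).
fn fold(expr: Expr) -> Expr {
    match expr {
        Expr::Concat(parts) => {
            let parts: Vec<Expr> = parts.into_iter().map(fold).collect();
            if parts.iter().all(|p| matches!(p, Expr::Text(_))) {
                let mut joined = String::new();
                for p in &parts {
                    if let Expr::Text(s) = p {
                        joined.push_str(s);
                    }
                }
                Expr::Text(joined)
            } else {
                // At least one part is only known at run time: keep the call.
                Expr::Concat(parts)
            }
        }
        Expr::Length(inner) => match fold(*inner) {
            Expr::Text(s) => Expr::Number(s.len() as i64), // byte length (assumption)
            other => Expr::Length(Box::new(other)),
        },
        other => other,
    }
}
```

Folding children before parents is what lets length(concat("ab", "cde")) collapse all the way to a number in a single pass.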

What we learned

Two lessons from the tokenizer work.

The primitive stack is verified by composition examples, not by adding more primitives. iso_date.verbose (static-position ISO 8601 parser, ~2083B ELF) and byte_classifier.verbose (three fold_bytes rules classifying every byte, each < 800B) are the integration proofs that substring + byte_at + fold_bytes + short-circuit and/or compose into a real tokenizer. On paper the primitive set looked complete, but it could still have hidden a gap; building these examples showed that it did not.
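
To give a feel for what static-position parsing means, here is a Rust sketch of a fixed-layout YYYY-MM-DD parser built from a byte_at-style probe and short-circuit guards. It is not a translation of iso_date.verbose, whose source is not reproduced here; the helper names are hypothetical, and it checks only the layout, not calendar validity.

```rust
/// Hypothetical bounds-checked probe (models `byte_at`).
fn byte_at(text: &str, index: usize) -> Option<u8> {
    text.as_bytes().get(index).copied()
}

/// Read `len` ASCII digits starting at `start` as a decimal number.
fn digits(text: &str, start: usize, len: usize) -> Option<u32> {
    let mut value = 0u32;
    for i in start..start + len {
        let b = byte_at(text, i)?;
        if !b.is_ascii_digit() {
            return None;
        }
        value = value * 10 + u32::from(b - b'0');
    }
    Some(value)
}

/// Parse the fixed layout YYYY-MM-DD into (year, month, day).
fn parse_iso_date(text: &str) -> Option<(u32, u32, u32)> {
    // Guard first, then probe: the length check short-circuits before any
    // byte is read. 45 is the byte value of '-'.
    if text.len() != 10 || byte_at(text, 4)? != 45 || byte_at(text, 7)? != 45 {
        return None;
    }
    let year = digits(text, 0, 4)?;
    let month = digits(text, 5, 2)?;
    let day = digits(text, 8, 2)?;
    Some((year, month, day))
}
```

Because every field sits at a known offset, no scanning loop over the separators is needed; the guards alone decide whether the probes are safe.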

Short-circuit semantics aren’t a “nice-to-have” — they’re load-bearing for the iteration pattern. Before short-circuit landed, every guard-then-probe fold_bytes rule had to be rewritten with explicit bounds checks inside the body, defeating the point of having a guard. The fix isn’t ergonomic; it’s correctness. Without it, the entire scanning pattern is unsafe.

Open

Phase A slices 2 and 3 already landed post-tag (PRs #27 + #28): variant construction and pattern match are now fully operational at the interpreter layer. Native emit for sum types (A.4) is the active frontier — slices 4.1 (scalar) and 4.2 (text) shipped 2026-05-17.

The recursive-types design doc (docs/recursive-types-design.md, also in v0.4.0) lays out the rest: Phases A–C for language work (9–12 months), Phase D for the self-hosting campaign (6–12 months more). v0.4.0 is the first concrete commit on that path.

How to try

git clone https://github.com/verbose-org/verbose
cd verbose
cargo run -- examples/iso_date.verbose --native /tmp/iso_date --run iso_date --stream
echo "2026-05-15" | /tmp/iso_date

Or download the prebuilt binary from the v0.4.0 release page.