Protocol

Transformations Vocabulary Conventions

Authoring conventions for the kanonak.org/transformations vocabulary. Complements the foundational rules in kanonak.org/kanonak-protocol@2.0.0 by capturing transformation-specific guidance: when to use SetTransformation vs InstanceTransformation, comparison and boolean primitive idioms, context-aware iteration patterns, content-shaping primitives, and when to opt out of the CLI's default HTML wrapper. These conventions evolve at the cadence of the transformations vocabulary itself; bumps here track new transformation primitives rather than changes to the foundational protocol.

Conventions

transformation-modeling

How to author Transformation instances that turn Kanonak data into other formats

Has Recommended Rule
	Text	Rationale
#	Use SetTransformation when the desired output reads from MORE than one input instance, even if you currently have only one input. Use InstanceTransformation only when the output is genuinely independent per input.	The cardinality of the input binding is the substantive difference between the two subclasses, and authors who pick InstanceTransformation for a list-like task end up emitting one tiny artifact per input with no aggregate view. SetTransformation aggregates by default and falls back to per-key fan-out via partitionBy when the output cardinality should match a discriminator. Picking the right cardinality up-front avoids needing to rewrite the rule tree later.
#	When iteration order matters (time-series, indexes, anything a reader will scan top-to-bottom), declare InputPattern.sortBy explicitly with one or more SortKey entries.	Without sortBy, the runner binds inputs in the SDK's discovery order — deterministic but not semantically meaningful. Authors who depend on implicit order produce artifacts whose sequence shifts when the workspace's filesystem layout changes. Declaring sortBy makes the contract explicit and validatable; the byProperty reference ensures the sort key is a real Property the validator can check.
#	When a SetTransformation should produce one artifact per distinct value of a property (e.g. one page per author, one chart per region), declare SetTransformation.partitionBy referencing that property.	Partitioning at the transformation level is cheaper and clearer than authoring N nearly-identical InstanceTransformations or using PartitionBy expressions to emit M sections inside one artifact. The structural partitionBy keeps each output focused on one group's data while sharing one rule definition.
#	Inside an Expression rule tree, every embedded Expression node SHOULD declare its concrete subclass via type — type tx.Concat, type tx.PropertyRead, type tx.BuildAstNode, etc. Keep AstFieldBinding and other range-matching embeddeds implicit (no type) per the embedding convention's prefer-implicit-embedded-type-rule.	The Expression hierarchy is a wide tagged-by-type tree where every node could be one of many concrete subclasses. Declaring the type at each embedded position makes the rule tree self-documenting and lets the validator catch mistakes (a mistyped property name on a wrong subclass becomes a domain violation). For embeddeds whose parent property has a concrete range (AstFieldBinding under BuildAstNode.set, FormatOverride under formatOverrides), explicit type is redundant — the range supplies it.
#	Use tx.IsSet over a tx.PropertyRead to gate rendering on whether an optional property is populated, regardless of whether the value is a primitive scalar, a reference, an embedded subject, or a non-empty list. IsSet reports presence; it doesn't constrain what kind of value qualifies.	The "render this block iff the optional sub-tree is populated" pattern is common when an ObjectProperty is optional and the artifact should adapt to whether authors supplied a value. Restricting IsSet to primitive-only would force authors into workarounds (always-render-with-empty-content, list-length shims) that produce worse output than honest presence checks. The runtime contract: IsSet returns false for undefined/null, empty string, or empty list, and true otherwise.
#	Inside a SetTransformation iteration, use the URI-segment primitives — tx.SubjectUri (full URI), tx.UriPublisher, tx.UriPackage, tx.UriVersion, tx.UriName (local name) — to derive per-input deep links, source-package paths, cross-package indexes, or per-segment display chips from the matched subjects. The five together cover every segment of the canonical Kanonak URI shape.	Aggregate transformations are valuable when each output entry can correlate back to its source input. Without primitives for URI-segment access from inside the rule tree, authors either parse the rendered output client-side or duplicate identity into ad-hoc instance properties that drift from the URI. The five URI primitives put the canonical attribution inside the typed transformation, where it can be validated alongside everything else and stays in sync with whatever version / publisher / package the matched subject is actually authored under.
#	When a SetTransformation should aggregate matched instances across multiple packages — index pages, sitemaps, federation reports, time-series views over versioned snapshot packages — invoke the runner with one --scope flag per package. kanonak transform run accepts the flag repeatedly; matched subject sets are unioned across all scopes (deduped by URI) before binding to the rule's `inputs`.	Single-scope semantics restrict candidates to subjects defined in the scoped package, which blocks the canonical aggregate use case (one artifact summarizing instances spread across many packages). Repeatable --scope keeps the pattern explicit and validatable per-scope without changing the meaning of "scope" itself — each scope still resolves through the workspace + cache + HTTP tiers exactly as a single scope does, and a single --scope invocation behaves unchanged.
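
The IsSet runtime contract stated in the table above can be modeled in a few lines. This is a minimal Python sketch, not the engine's implementation: the function name `is_set` and the sample values are hypothetical, and treating `0` and `False` as present is a reading of "true otherwise" rather than something the source states explicitly.

```python
from typing import Any


def is_set(value: Any) -> bool:
    """Model of the stated contract: False for None, "", or []; True otherwise."""
    if value is None:
        return False
    if isinstance(value, (str, list)) and len(value) == 0:
        return False
    return True


# Hypothetical values an optional property read might yield.
samples = {
    "missing": None,
    "empty string": "",
    "empty list": [],
    "scalar zero": 0,  # assumption: zero counts as present
    "reference": "worldview.genval.ai/snapshot@1.0.0/view",
    "embedded subject": {"type": "wv.Thesis"},
    "non-empty list": ["a"],
}
for label, value in samples.items():
    print(f"{label}: {is_set(value)}")
```

The point of the sketch is that presence is independent of value kind: a reference, an embedded subject, and a non-empty list all gate rendering the same way.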

Has Forbidden Rule
	Text	Rationale
#	An InstanceTransformation with zero matching inputs MUST NOT be treated as an authoring error by the runner; it simply emits zero artifacts. If the desired behavior is "emit something even when there are no inputs," the transformation MUST be a SetTransformation.	The two subclasses differ on this exact contract. Emitting zero artifacts from an InstanceTransformation when zero inputs match is what the author asked for (one artifact per input, none present). Emitting one artifact from a SetTransformation when zero inputs match is also what the author asked for (one artifact summarizing the set, even when empty). Conflating the two means a SetTransformation case ("we want an index page even on day zero") accidentally lands as an InstanceTransformation and never produces the index. Pick the subclass that matches the cardinality semantics.
#	Transformation.outputs MUST reference named OutputFormat instances (tx.html, tx.markdown-with-frontmatter, tx.json, etc.), not string identifiers.	OutputFormat references are validatable — the runner resolves the URI and looks up the registered backend by the format's backendUri. Authoring outputs as plain strings would silently accept typos, drift from the actual backend registry, and require every consumer to maintain its own string-to-backend mapping. The OutputFormat indirection is the contract the runner depends on.
#	Sum, Min, Max, and Average MUST receive a list source whose elements resolve to xsd:integer or xsd:decimal values. Source expressions that yield strings, references, or embedded objects are an authoring error.	The runner enforces strict numeric typing at evaluation time — non-numeric elements raise an error rather than coerce silently. This catches mistakes where an author intended to sum a numeric property but wrote a property whose range is xsd.string, or forgot to wrap a list of subjects in a list-map that extracts the numeric field. Min/Max/Average on an empty list also error, since the contract is undefined; guard with IsSet when an empty input is possible.
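
The strict numeric typing described for Sum, Min, Max, and Average can be illustrated with a small Python model. This is a sketch under stated assumptions: the helper names are hypothetical, `int` and `Decimal` stand in for xsd:integer and xsd:decimal, and returning 0 for Sum over an empty list is an assumption, since the source only says that Min, Max, and Average error on empty input.

```python
from decimal import Decimal
from typing import List, Sequence, Union

Number = Union[int, Decimal]


def require_numeric(values: Sequence[object]) -> List[Number]:
    """Return the values if all are int or Decimal; raise instead of coercing."""
    checked: List[Number] = []
    for value in values:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, Decimal)):
            raise TypeError(f"non-numeric element: {value!r}")
        checked.append(value)
    return checked


def strict_sum(values: Sequence[object]) -> Number:
    # Assumption: Sum over an empty list yields 0.
    return sum(require_numeric(values), 0)


def strict_min(values: Sequence[object]) -> Number:
    nums = require_numeric(values)
    if not nums:
        raise ValueError("Min of an empty list is undefined")
    return min(nums)


def strict_max(values: Sequence[object]) -> Number:
    nums = require_numeric(values)
    if not nums:
        raise ValueError("Max of an empty list is undefined")
    return max(nums)


def strict_average(values: Sequence[object]) -> Decimal:
    nums = require_numeric(values)
    if not nums:
        raise ValueError("Average of an empty list is undefined")
    return Decimal(sum(nums)) / Decimal(len(nums))


counts = [3, Decimal("1.5"), 6]
print(strict_sum(counts))      # 10.5
print(strict_average(counts))  # 3.5
```

Passing `["3", 4]` to any of these raises `TypeError` rather than silently parsing the string, which mirrors the "raise an error rather than coerce" behavior the rationale describes.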

per-instance-transformation

Per-instance transformation: one Skill input produces one SKILL.md artifact. The rule sees input (singular) bound to the matching Skill subject; artifactName carries the per-instance filename stem. The full rule body would continue populating the Document via tx.set bindings; the example shows the structural skeleton.

Value:

    skill-to-skill-md-transformation:
      type: tx.InstanceTransformation
      tx.inputPattern:
        tx.matchesClass: skills.Skill
        tx.requires:
          - skills.name
          - skills.description
      tx.artifactName:
        type: tx.PropertyRead
        tx.readSource:
          type: tx.VarRef
          tx.varName: input
        tx.readProp: skills.name
      tx.outputs:
        - tx.markdown-with-frontmatter
      tx.rule:
        type: tx.BuildAstNode
        tx.astClass: docast.Document

aggregate-set-transformation-with-sort

Set transformation with sort — aggregates every WorldviewSnapshot into a single index page. inputs is bound to the sorted list (newest first); the rule typically wraps it in a ForEach to render one row per snapshot. The full rule body would continue with tx.set bindings; the example shows the input-pattern and sort declaration.

Value:

    snapshot-index:
      type: tx.SetTransformation
      tx.inputPattern:
        tx.matchesClass: wv.WorldviewSnapshot
        tx.sortBy:
          - tx.byProperty: wv.observedAt
            tx.order: tx.descending
      tx.artifactName:
        type: tx.StringLiteral
        tx.stringLiteral: index
      tx.outputs: [tx.html]
      tx.rule:
        type: tx.BuildAstNode
        tx.astClass: docast.Document
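
To make the effect of one or more SortKey entries concrete, here is a hedged Python model of multi-key ordering. It is not the runner's algorithm: the function `sort_inputs`, the dictionary-shaped subjects, and the `"ascending"` order name are all assumptions for illustration (the source only shows tx.descending).

```python
from typing import Dict, List, Tuple

Subject = Dict[str, str]


def sort_inputs(inputs: List[Subject], sort_keys: List[Tuple[str, str]]) -> List[Subject]:
    """Order subjects by (property, order) pairs; the first pair is most significant."""
    ordered = list(inputs)
    # Python's sort is stable, so applying keys from least to most
    # significant yields a correct multi-key ordering.
    for prop, order in reversed(sort_keys):
        ordered.sort(key=lambda subject: subject[prop], reverse=(order == "descending"))
    return ordered


snapshots = [
    {"id": "a", "observedAt": "2024-01-15", "thesisName": "capex"},
    {"id": "b", "observedAt": "2024-03-05", "thesisName": "macro"},
    {"id": "c", "observedAt": "2024-03-05", "thesisName": "capex"},
]
newest_first = sort_inputs(snapshots, [("observedAt", "descending"), ("thesisName", "ascending")])
print([s["id"] for s in newest_first])  # ['c', 'b', 'a']
```

Declaring the keys explicitly is what makes the sequence independent of discovery order: shuffling `snapshots` before the call produces the same result.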

partitioned-set-transformation

Set transformation with fan-out — one artifact per distinct value of partitionBy. inputs is bound to that partition's members; key is bound to the partition value (the thesis name). artifactName uses tx.VarRef with varName=key to interpolate the partition value into the filename.

Value:

    per-thesis-trajectory:
      type: tx.SetTransformation
      tx.inputPattern:
        tx.matchesClass: wv.WorldviewSnapshot
      tx.partitionBy: wv.thesisName
      tx.artifactName:
        type: tx.Concat
        tx.parts:
          - type: tx.StringLiteral
            tx.stringLiteral: "thesis-"
          - type: tx.VarRef
            tx.varName: key
      tx.outputs: [tx.html]
      tx.rule:
        type: tx.BuildAstNode
        tx.astClass: docast.Document
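
The fan-out behavior can be pictured as a group-by followed by one artifact per group. The Python below is an illustrative model only: `partition` and `artifact_names` are hypothetical helpers, and subjects are represented as plain dictionaries rather than Kanonak instances.

```python
from collections import defaultdict
from typing import Dict, List

Subject = Dict[str, str]


def partition(inputs: List[Subject], partition_by: str) -> Dict[str, List[Subject]]:
    """Group subjects by the value of the partitioning property."""
    groups: Dict[str, List[Subject]] = defaultdict(list)
    for subject in inputs:
        groups[subject[partition_by]].append(subject)
    return dict(groups)


def artifact_names(inputs: List[Subject], partition_by: str) -> Dict[str, List[Subject]]:
    """Map each artifact name ("thesis-" + key) to that partition's members."""
    return {
        "thesis-" + key: members
        for key, members in partition(inputs, partition_by).items()
    }


snapshots = [
    {"id": "s1", "thesisName": "capex"},
    {"id": "s2", "thesisName": "macro"},
    {"id": "s3", "thesisName": "capex"},
]
for name, members in artifact_names(snapshots, "thesisName").items():
    print(name, [m["id"] for m in members])
```

Each dictionary entry corresponds to one rule evaluation in which `key` is the partition value and `inputs` is that partition's member list.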

cross-package-aggregation-invocation

Cross-package aggregation invocation. Runs the aggregate-set-transformation-with-sort transformation from above against two scoped packages; the matched-subject set in inputs includes WorldviewSnapshot instances from both scopes (deduped by URI). Authors typically pair this invocation shape with tx.SubjectUri inside the rule to derive per-input deep links back to each snapshot's source package.

Value:

    kanonak transform run snapshot-index \
      --scope worldview.genval.ai/example-ai-capex \
      --scope worldview.genval.ai/example-macro \
      --format html \
      --out _site/

version-bump-as-time-axis-aggregation

Cross-version aggregation. The version-bump-as-time-axis pattern: a publisher bumps the same package's version each time the underlying observation changes, so each version is one immutable point on a timeline. Three URIs (worldview.genval.ai/snapshot@1.0.0/view, @1.0.1/view, @1.0.2/view) are three distinct subjects because the version is part of identity per the URI structure convention. The runner's multi-scope dedup keys on the full URI including version, so the timeline transformation sees one entry per version. tx.SubjectUri inside the rule emits version-qualified URIs for stable deep links.

Value:

    kanonak transform run snapshot-timeline \
      --scope worldview.genval.ai/snapshot@1.0.0 \
      --scope worldview.genval.ai/snapshot@1.0.1 \
      --scope worldview.genval.ai/snapshot@1.0.2 \
      --format html --out _site/timeline/
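
The union-and-dedup step behind repeated --scope flags can be sketched as follows. This is a Python model under assumptions, not the runner's code: `union_scopes` is a hypothetical name, each scope is reduced to a list of subject URI strings, and preserving first-seen order is an assumption the source does not state.

```python
from typing import List


def union_scopes(scopes: List[List[str]]) -> List[str]:
    """Union subject URIs across scopes, deduplicating on the full URI string."""
    seen = set()
    merged: List[str] = []
    for subjects in scopes:
        for uri in subjects:
            if uri not in seen:  # the version is part of the URI, so it is part of the key
                seen.add(uri)
                merged.append(uri)
    return merged


scopes = [
    ["worldview.genval.ai/snapshot@1.0.0/view"],
    ["worldview.genval.ai/snapshot@1.0.1/view"],
    ["worldview.genval.ai/snapshot@1.0.2/view"],
    ["worldview.genval.ai/snapshot@1.0.1/view"],  # same scope listed twice
]
for uri in union_scopes(scopes):
    print(uri)
```

Because the key is the full URI including version, the three versions survive as three timeline entries while the accidentally repeated scope collapses to one.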

per-version-chip-via-uri-version

Per-segment URI access via tx.UriVersion. Renders each timeline card with its package version as a display chip, extracting the version segment directly from the matched subject's URI — no need to author the version as a separate property on every snapshot, no need to parse the rendered HTML client-side. Companion primitives — tx.UriPublisher, tx.UriPackage, tx.UriName — cover the other URI segments and follow the same shape. Example shown is the snippet inside a ForEach over inputs that binds snapshot to each input.

Value:

    - type: tx.StringLiteral
      tx.stringLiteral: '<span class="snapshot-version">v'
    - type: tx.UriVersion
      tx.uriVersionOf:
        type: tx.VarRef
        tx.varName: snapshot
    - type: tx.StringLiteral
      tx.stringLiteral: '</span>'
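
What the URI-segment primitives extract can be approximated with a small parser. This Python sketch assumes the URI shape `publisher/package@version/name`, inferred from the example URIs in this document (such as worldview.genval.ai/snapshot@1.0.0/view); the class, regular expression, and function names are hypothetical and do not describe the engine's parser.

```python
import re
from typing import NamedTuple


class KanonakUri(NamedTuple):
    publisher: str
    package: str
    version: str
    name: str


# Assumed shape: publisher/package@version/name
_URI_PATTERN = re.compile(
    r"^(?P<publisher>[^/@]+)/(?P<package>[^/@]+)@(?P<version>[^/@]+)/(?P<name>[^/@]+)$"
)


def parse_uri(uri: str) -> KanonakUri:
    match = _URI_PATTERN.match(uri)
    if match is None:
        raise ValueError(f"not in the assumed URI shape: {uri!r}")
    return KanonakUri(**match.groupdict())


def version_chip(uri: str) -> str:
    """Concatenate the same three parts as the snippet above."""
    return '<span class="snapshot-version">v' + parse_uri(uri).version + "</span>"


print(parse_uri("worldview.genval.ai/snapshot@1.0.0/view"))
print(version_chip("worldview.genval.ai/snapshot@1.0.0/view"))
```

The chip is derived from identity the subject already carries, so it cannot drift from the version the snapshot was actually authored under.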

instance-transformation-when-set-was-meant

Wrong subclass — emits one Document per snapshot rather than one combined index. The author wanted aggregation but chose the per-input subclass; the type should be tx.SetTransformation with the rule iterating inputs rather than reading input.

Value:

    broken-index:
      type: tx.InstanceTransformation
      tx.inputPattern:
        tx.matchesClass: wv.WorldviewSnapshot
      tx.rule:
        type: tx.BuildAstNode
        tx.astClass: docast.Document
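
The cardinality difference that makes this the wrong subclass can be shown with a toy model. The Python below is only a sketch: `run_instance`, `run_set`, and the render callbacks are hypothetical stand-ins for the runner and the rule tree.

```python
from typing import Callable, List


def run_instance(inputs: List[str], render: Callable[[str], str]) -> List[str]:
    """Per-input subclass: one artifact per matched input, so zero inputs yield zero artifacts."""
    return [render(subject) for subject in inputs]


def run_set(inputs: List[str], render: Callable[[List[str]], str]) -> List[str]:
    """Aggregate subclass: one artifact for the whole set, even when the set is empty."""
    return [render(list(inputs))]


def render_page(subject: str) -> str:
    return f"page for {subject}"


def render_index(subjects: List[str]) -> str:
    return f"index of {len(subjects)} snapshot(s)"


snapshots = ["snap-1", "snap-2", "snap-3"]
print(run_instance(snapshots, render_page))  # three separate pages, no index
print(run_set(snapshots, render_index))      # one combined index
print(run_instance([], render_page))         # [] on day zero
print(run_set([], render_index))             # still one (empty) index
```

An author who wants the last line's behavior, an index that exists even before any snapshot does, needs the aggregate subclass.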

magic-string-output

Wrong — outputs must be a reference to a named OutputFormat instance, not a plain string. The validator rejects the string form because it cannot verify a backend exists for it. Use tx.markdown-with-frontmatter (the named instance) instead.

Value:

    broken-output:
      type: tx.InstanceTransformation
      tx.outputs:
        - markdown-with-frontmatter

cli-default-wrapper-opt-out

Transformations that build a complete output document themselves (their own DOCTYPE, their own CSS, their own page chrome) can declare tx.omitWrapper: true on the relevant tx.FormatOverride to suppress the CLI's default chrome. The HTML backend then emits ONLY the rendered children. JSON and plain-Markdown backends ignore the flag because they have no chrome.

Has Recommended Rule
	Text	Rationale
#	When a transformation produces a complete HTML document (publisher layout transformations that inline their own CSS, nav, header, footer), declare tx.formatOverrides with a tx.FormatOverride that targets tx.html and sets tx.omitWrapper to true so the CLI's default chrome doesn't double-wrap the output.	The CLI's default HTML wrapper was designed for the "transformation produces a fragment" case. For publisher layouts that produce the complete page, it was actively hostile — authors had to override its CSS with !important and hide its dl.metadata block with display:none. omitWrapper makes the opt-out explicit and idiomatic.

content-shaping-primitives

transformations@3.5.0 ships expression primitives for the content-shaping work that previously forced inline JS into rendered HTML: DateFormat (ISO datetime → human-readable), Add/Subtract/Multiply/Divide (arithmetic for SVG coordinates, evidence counts, sized text positions), and Reverse (newest-first orderings of source-chronological lists).

Has Recommended Rule
	Text	Rationale
#	When a rendered artifact needs computed content (formatted dates, computed positions, reversed orderings), prefer the matching tx 3.5.0 expression primitive over emitting inline JS to do the work in the consumer's browser.	Inline JS is consumer-side work that locks every consumer of the artifact into a JS-capable runtime, and lives in a place no other tool can discover or reason about. Engine primitives produce the same value at transformation time and ship JS-free artifacts consumable by anything that can read HTML or Markdown. The transformation rule becomes the single source of truth for the computation.
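
As a rough picture of "compute at transformation time instead of in the browser," the Python below models the three kinds of content shaping named above. Everything here is assumed for illustration: the function names, the output date style, and the coordinate formula are hypothetical, and the real DateFormat pattern syntax is not specified in this document.

```python
from datetime import datetime
from typing import List, Tuple

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def date_format(iso_datetime: str) -> str:
    """ISO datetime to a human-readable date (the style is an assumption)."""
    moment = datetime.fromisoformat(iso_datetime)
    return f"{moment.day} {MONTHS[moment.month - 1]} {moment.year}"


def x_position(index: int, step: int, margin: int) -> int:
    """Models Add(margin, Multiply(index, step)) for an SVG x coordinate."""
    return margin + index * step


observed = ["2024-01-15T09:00:00", "2024-02-01T09:00:00", "2024-03-05T14:30:00"]
newest_first = list(reversed(observed))  # models Reverse over a chronological list

cards: List[Tuple[str, int]] = [
    (date_format(stamp), x_position(i, step=120, margin=20))
    for i, stamp in enumerate(newest_first)
]
print(cards)  # [('5 Mar 2024', 20), ('1 Feb 2024', 140), ('15 Jan 2024', 260)]
```

The resulting values are plain strings and numbers, so the artifact that embeds them needs no script to be readable.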

context-aware-iteration

transformations@3.6.0 adds the iterator family that gives bodies access to neighboring elements (PairwiseMap, WindowedMap) or a running accumulator (Scan), with ListItemAt as the indexing helper. Closes the structural gap that previously forced inline JS for any analytical view that needs cross-element context — pairwise diffs, running aggregations, sliding statistics, gap rendering between consecutive events, state-change detection over an audit trail.

Has Recommended Rule
	Text	Rationale
#	For "render a result for each consecutive pair in a sorted sequence" patterns (changelogs, diff views, day-over-day comparisons, state-transition rendering), use tx.PairwiseMap with named firstVar / secondVar binders rather than emitting raw per-element data and computing pair-context in inline JS. The body composes with the existing iterator family: tx.Filter for set-difference, tx.Subtract for delta computation, tx.Concat for diff-card rendering.	The per-consecutive-pair shape is the dominant context-aware iteration case across analytical Kanonak content. Named binders read cleaner than positional indexing (prev and curr vs. window[0] and window[1]), and the pair-body composition with the existing iterator family covers most realistic analytical needs. Inline JS for this pattern moves analytical logic out of the protocol layer where any consumer can inspect it and into a presentation layer that no other tool can reason about.
#	For windows of size 3 or larger (rolling-3 averages, running statistics, k-element sliding aggregates), use tx.WindowedMap with explicit windowSize plus tx.ListItemAt for positional access into the bound window list.	Rolling-window statistics (smoothing, anomaly detection, multi-step trend analysis) genuinely need access to k elements at once. WindowedMap covers any k; ListItemAt provides the positional access. Stride is fixed at 1 — the dedicated tumbling and anchored window modes are deferred (would ship as tx.windowMode: tumbling or tx.windowMode: anchored in a future minor bump).
#	For cumulative aggregations whose value at each position depends on all prior elements (running sum, cumulative confidence shift, rolling state machine, running min/max), use tx.Scan with explicit tx.initialState, tx.stateVar, tx.elementVar, and tx.accumulate body. Distinct from windowing — Scan sees a running aggregation, WindowedMap sees a window of elements.	Running aggregations can't be expressed via windowing because the value at position N depends on EVERY prior element, not a fixed window. Scan threads the accumulator through; the body returns both the next accumulator AND the per-position emission. Distinct shape; distinct primitive.
#	All three primitives emit an empty result when the source is shorter than required (PairwiseMap on length 0 or 1; WindowedMap on length less than windowSize; Scan on length 0). This matches tx.ForEach over an empty source — silent empty, not an error. Authors who need "no windows" to branch should guard with tx.IsSet over the result.	Aggregate views over publishers with one snapshot (pre-Pairwise era) would error otherwise; silent empty is the more useful default. The no-mocks/no-fallbacks rule still applies — what's NOT silent is that the engine produces ZERO outputs, which is visible to the consumer.
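
The three iterator shapes and their shared empty-source behavior can be modeled directly. The following Python is a sketch of the semantics described in the table, not the engine's implementation; the function names and the convention that the Scan body returns a `(next_state, emission)` pair are assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
S = TypeVar("S")
R = TypeVar("R")


def pairwise_map(source: Sequence[T], body: Callable[[T, T], R]) -> List[R]:
    """One result per consecutive pair; empty for sources of length 0 or 1."""
    return [body(source[i], source[i + 1]) for i in range(len(source) - 1)]


def windowed_map(source: Sequence[T], window_size: int, body: Callable[[List[T]], R]) -> List[R]:
    """One result per stride-1 window; empty when the source is shorter than the window."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    return [
        body(list(source[i:i + window_size]))
        for i in range(len(source) - window_size + 1)
    ]


def scan(source: Sequence[T], initial_state: S, accumulate: Callable[[S, T], Tuple[S, R]]) -> List[R]:
    """Thread an accumulator through the source, emitting one value per position."""
    state = initial_state
    emitted: List[R] = []
    for element in source:
        state, value = accumulate(state, element)
        emitted.append(value)
    return emitted


values = [10, 12, 9, 15]
print(pairwise_map(values, lambda prev, curr: curr - prev))        # [2, -3, 6]
print(windowed_map(values, 3, lambda window: sum(window)))         # [31, 36]
print(scan(values, 0, lambda total, v: (total + v, total + v)))    # [10, 22, 31, 46]
print(pairwise_map([10], lambda prev, curr: curr - prev))          # []
```

Inside the window body, `window[0]` and `window[2]` play the role that tx.ListItemAt plays in a rule tree. The last line shows the silent-empty case: a single-snapshot source produces no pairs rather than an error.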

comparison-and-boolean-primitives

transformations@3.7.0 ships the eleven primitives that unblock comparison-based rendering on top of windowed iteration: Equals, four ordering comparisons, Not, And/Or, Contains (list membership), and Abs/Negate. Previously, the only boolean-producing surfaces for tx.Filter.predicate and tx.When.condition were tx.IsSet and tx.BooleanLiteral, which couldn't express any of the comparisons a diff / threshold / lookup workflow needs.

Has Recommended Rule
	Text	Rationale
#	Use tx.Equals (with compareLeft / compareRight) for any equality test. URI equality applies to ReferenceKanonak and SubjectKanonak operands (compares publisher + package + name; version ignored, so equality is identity-by-name across versions). Value equality applies to string / number / boolean literals. Cross-kind operands return false; either operand undefined returns false.	URI equality is what authors expect when comparing references: two refs to "thesis foo" are equal even if they came from different snapshot versions. Cross-kind false (rather than throwing) keeps Equals usable inside Filter.predicate without needing type guards; authors can chain Equals with Not/And/Or without worrying about runtime exceptions on heterogeneous data. Undefined → false is the RDF spec semantic: missing values aren't equal to anything, even themselves.
#	For "is this needle in this haystack list" lookups, use tx.Contains (with haystack + needle) rather than the equivalent tx.Filter with an Equals predicate followed by tx.Count > 0. Contains is named exactly because it's the right tool for that shape; the longer composition obscures intent.	Authors reach for membership constantly in diff / lookup workflows ("is this thesis in the previous snapshot's hasThesis list?"). Filter+Count works but wastes three nodes on a one-line idea. Contains is ergonomic sugar over the same semantic.
#	tx.And and tx.Or short-circuit on the first non-true / first true operand respectively. Place expensive operands (Traverse, ResolveRef, Filter over a large list) AFTER cheap ones (IsSet, BooleanLiteral) so the short-circuit can skip them when not needed.	Short-circuiting matters when later operands trigger cross-package reference resolution or large list filtering. The order of operands is the author's optimization handle.
#	tx.And over an empty operand list is vacuously true; tx.Or over an empty operand list is vacuously false. Mathematical convention. Authors who iterate to build an operand list (rare) should guard with tx.IsSet on that operand list if "no operands" should branch.	The vacuous identities make And/Or compose cleanly with iterators that may or may not produce operands. Erroring on empty would break those compositions without offering authors a clear path forward.
#	For magnitude-vs-threshold patterns (e.g. abs(delta) >= 0.005), use tx.Abs rather than the tx.Multiply(x, x) >= threshold * threshold workaround. The intent reads at the call site.	Multiply(x, x) >= threshold * threshold is correct but obscures intent: the reader has to spot the squared-magnitude trick. Abs makes the comparison shape obvious.
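
The comparison and boolean semantics in the table above can be summarized in one Python model. This is an illustrative sketch, not the engine's code: the `Ref` class, the kind classification, and representing And/Or operands as zero-argument callables (so that short-circuiting is observable) are all assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for a reference or subject operand."""
    publisher: str
    package: str
    version: str
    name: str


def _kind(value: Any) -> str:
    if isinstance(value, Ref):
        return "ref"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return "other"


def equals(left: Any, right: Any) -> bool:
    if left is None or right is None:
        return False  # undefined is never equal, even to itself
    if _kind(left) != _kind(right):
        return False  # cross-kind comparisons are false, not errors
    if isinstance(left, Ref):
        # URI equality ignores the version segment.
        return (left.publisher, left.package, left.name) == (
            right.publisher, right.package, right.name)
    return left == right


def contains(haystack: List[Any], needle: Any) -> bool:
    return any(equals(item, needle) for item in haystack)


def all_of(operands: Iterable[Callable[[], bool]]) -> bool:
    """And: stops at the first non-true operand; true when there are no operands."""
    for operand in operands:
        if operand() is not True:
            return False
    return True


def any_of(operands: Iterable[Callable[[], bool]]) -> bool:
    """Or: stops at the first true operand; false when there are no operands."""
    for operand in operands:
        if operand() is True:
            return True
    return False


old = Ref("worldview.genval.ai", "snapshot", "1.0.0", "view")
new = Ref("worldview.genval.ai", "snapshot", "1.0.1", "view")
print(equals(old, new))          # True: same name across versions
print(equals("1", 1))            # False: cross-kind
print(contains([old], new))      # True
print(all_of([]), any_of([]))    # True False
print(abs(-0.007) >= 0.005)      # True: the Abs threshold shape
```

Placing a cheap callable before an expensive one in `all_of` means the expensive one is never invoked when the cheap one is false, which is the ordering advice given for tx.And and tx.Or.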