Kanonak Protocol

The Kanonak Protocol - an open protocol for defining, versioning, and sharing semantic ontologies across distributed publishers

Conventions

uri-structure

Kanonak URIs uniquely identify entities using the format publisher/package@version/name

Has Required Rule

Text: Kanonak URIs MUST follow the format publisher/package@version/name where all components are required
Rationale: Standardized URI format ensures global uniqueness and enables automatic resolution across namespaces and package registries

Text: Each Kanonak URI MUST uniquely identify exactly one entity across the entire Kanonak ecosystem
Rationale: Guarantees unambiguous entity references and prevents naming conflicts through publisher namespacing

Has Valid Example

Value: kanonak.org/core-rdf@1.0.0/Class
Description: Valid Kanonak URI referencing the Class entity from the core-rdf package

Has Invalid Example

Value: mypackage/Entity
Description: Invalid - missing publisher domain and version components
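
The URI format above can be checked mechanically. The following is a minimal sketch, not the reference implementation: the function name `parse_uri`, the `KanonakUri` tuple, and the exact character classes (in particular the one used for the resource name) are assumptions made for illustration, pieced together from the publisher, package, and versioning conventions on this page.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: publisher/package@version/name, all four parts required.
KANONAK_URI = re.compile(
    r"(?P<publisher>[^/@\s]+\.[^/@\s]+)"      # domain with at least one dot
    r"/(?P<package>[a-z][a-z0-9-]*)"          # lowercase-hyphen package name
    r"@(?P<version>[0-9]+\.[0-9]+\.[0-9]+)"   # major.minor.patch
    r"/(?P<name>[A-Za-z][A-Za-z0-9_-]*)"      # resource name (assumed character set)
)


class KanonakUri(NamedTuple):
    publisher: str
    package: str
    version: str
    name: str


def parse_uri(uri: str) -> Optional[KanonakUri]:
    """Split a Kanonak URI into its four components, or return None if malformed."""
    match = KANONAK_URI.fullmatch(uri)
    return KanonakUri(**match.groupdict()) if match else None
```

With this sketch, the valid example parses into four components, while `mypackage/Entity` is rejected because it has neither a dotted publisher nor a version.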

publisher-naming

Publishers must be domain-based identifiers to establish ownership and enable registry discovery

Has Required Rule

Text: Publisher identifiers MUST be valid domain names containing at least one dot character
Rationale: Domain-based publishers establish clear organizational ownership, enable automatic registry discovery via .well-known endpoints, and prevent naming conflicts through DNS

Text: Publishers MUST control the domain name used in their publisher identifier
Rationale: Ensures authenticity and prevents namespace squatting by requiring verifiable domain ownership

Has Valid Example

Value: kanonak.org
Description: Official Kanonak Protocol publisher using a .org domain

Value: acme.com
Description: Company publisher using a .com domain

Has Invalid Example

Value: myproject
Description: Invalid - not a domain name. Must use domain-based identifier like myproject.org or myproject.dev

package-naming

Package names are lowercase-hyphen identifiers that describe the domain of entities they contain, with plural forms reserved for category packages so the singular form is available for instances inside them

Has Required Rule

Text: Package names MUST start with a lowercase letter and contain only lowercase letters, numbers, and hyphens (no periods)
Rationale: Lowercase with hyphens ensures compatibility with file systems, URLs, and OCI registries while maintaining readability. Periods are reserved for alias.resource reference syntax.

Has Recommended Rule

Text: Use hyphen notation to create descriptive package names for related ontologies
Rationale: Hyphen notation enables logical grouping of related packages while avoiding conflicts with alias.resource reference syntax

Text: Packages that define a class or are expected to contain multiple related instances SHOULD use a plural kebab-case noun (protocols, agent-skills, agents, capabilities, github-skills), so the singular form stays available for the instances inside them
Rationale: A category package is a namespace for a family of things; naming it after the family rather than a single member leaves the singular name free to identify one member without a YAML duplicate key collision with the package declaration. This is how the package "protocols" can contain an instance named "mcp" without either the document having a duplicate top-level key or forcing the instance into a non-conforming PascalCase name.

Text: A package that exists specifically to describe one concrete entity MAY use a singular name that matches the entity it describes (e.g., mcp, a2a, kanonak-protocol), provided the singleton instance inside uses a different kebab-case name so the package declaration and the instance do not collide on the same top-level YAML key
Rationale: Instance packages are named after the thing they describe, which makes the obvious instance name collide with the package name. Resolving the collision by giving the instance a longer, descriptive kebab-case name (model-context-protocol, agent-to-agent-protocol, kanonak-protocol-spec) keeps the package name short and recognizable while still following the kebab-case-for-instances rule.

Has Forbidden Rule

Text: Category packages MUST NOT use a singular name that shadows the class or instance they host - for example, a package named "protocol" that defines the Protocol class, or a package named "skill" that holds a single Skill instance
Rationale: A singular category package name forces every consuming instance to either rename itself or fight the YAML duplicate-key error. Pluralizing the package eliminates the conflict at the source and makes it obvious from the name that the package is a namespace, not a single thing.
Has Valid Example

Value: protocols, agent-skills, agents, capabilities, skill-capabilities, agent-capabilities, github-skills, github-agents
Description: Category packages using plural kebab-case names, leaving the singular form free for the classes and instances they host

Value: mcp (package) + model-context-protocol (instance)
Description: A specific-instance package named after its abbreviation, with a longer descriptive kebab-case instance name to avoid the duplicate-key collision

Value: core-rdf
Description: Core RDF vocabulary using hyphen notation

Has Invalid Example

Value: MyPackage
Description: Invalid - contains uppercase letters. Must use lowercase only

Value: protocol (package that defines the Protocol class)
Description: Invalid - singular category package shadows the class and forces every consuming instance to work around a name collision

Value: mcp package + mcp instance
Description: Invalid - the package declaration and the instance both parse as the top-level YAML key "mcp", which is a duplicate key error
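
The character rule and the package/instance key collision described above lend themselves to a small lint. This is a sketch under assumptions: `check_package` and its message wording are hypothetical, and the plural-versus-singular recommendation is left out because it cannot be decided from the characters alone.

```python
import re
from typing import List

# Lowercase letter first, then lowercase letters, digits, or hyphens (no periods).
PACKAGE_NAME = re.compile(r"[a-z][a-z0-9-]*")


def check_package(package: str, instance_names: List[str]) -> List[str]:
    """Return a list of problems found for a package name and the instances it hosts."""
    problems = []
    if not PACKAGE_NAME.fullmatch(package):
        problems.append(
            f"{package!r}: must start with a lowercase letter and use only a-z, 0-9, '-'"
        )
    # The package declaration and an instance of the same name would both be
    # the same top-level YAML key, which is a duplicate-key error.
    if package in instance_names:
        problems.append(f"{package!r}: an instance has the same top-level key as the package")
    return problems
```

For example, `check_package("mcp", ["model-context-protocol"])` reports nothing, while `check_package("mcp", ["mcp"])` reports the collision.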

versioning

Versions follow semantic versioning to communicate compatibility and breaking changes

Has Required Rule

Text: Versions MUST follow semantic versioning format major.minor.patch where each component is a non-negative integer
Rationale: Semantic versioning provides a standard way to communicate backward compatibility and breaking changes to package consumers

Text: Increment major version when making backward-incompatible changes to the package
Rationale: Major version increments signal to consumers that manual migration may be required due to breaking changes

Text: Increment minor version when adding backward-compatible functionality and patch version for backward-compatible bug fixes
Rationale: Allows consumers to safely update within the same major version while preventing unexpected breaking changes

Has Valid Example

Value: 2.1.3
Description: Valid semantic version with major 2, minor 1, patch 3

Has Invalid Example

Value: v1.0
Description: Invalid - missing patch component and has v prefix. Must be 1.0.0
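
As a rough illustration of the versioning rules, the sketch below parses a major.minor.patch string and computes the next version for each kind of change. The names `parse_version` and `bump`, and the change labels "breaking", "feature", and "fix", are hypothetical. Resetting the lower components to zero on a bump follows the Semantic Versioning specification rather than anything stated on this page.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int, int]
SEMVER = re.compile(r"([0-9]+)\.([0-9]+)\.([0-9]+)")


def parse_version(text: str) -> Optional[Version]:
    """Return (major, minor, patch), or None if text is not major.minor.patch."""
    match = SEMVER.fullmatch(text)
    if match is None:
        return None
    major, minor, patch = (int(group) for group in match.groups())
    return (major, minor, patch)


def bump(version: Version, change: str) -> Version:
    """Compute the next version for a breaking change, a feature, or a fix."""
    major, minor, patch = version
    if change == "breaking":
        return (major + 1, 0, 0)
    if change == "feature":
        return (major, minor + 1, 0)
    if change == "fix":
        return (major, minor, patch + 1)
    raise ValueError(f"unknown change kind: {change!r}")
```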

import-operators

Import operators, existence, and cycle constraints for package dependencies

Has Required Rule

Text: Use exact version operator (=) to lock imports to a specific version for reproducible builds
Rationale: Exact version matching guarantees reproducible builds and prevents unexpected breaking changes from dependency updates

Text: All declared imports MUST resolve to existing packages in the repository or package cache
Rationale: Missing imports break transitive resolution and prevent entity lookup. All dependencies must be available either in local workspace or installed package cache.

Text: Package imports MUST NOT create circular dependency chains between published packages
Rationale: Circular package dependencies (A imports B, B imports A) create unresolvable import cycles that prevent package managers from building dependency graphs. Kanonak handles cycles gracefully at runtime but published packages cannot have circular dependencies.

Has Recommended Rule

Text: Use compatible version operator (~) to allow patch updates within the same minor version
Rationale: Enables bug fixes and security patches while preventing breaking changes from minor version updates

Text: Use major version operator (^) to allow minor and patch updates within the same major version
Rationale: Follows semantic versioning conventions where breaking changes only occur in major version increments

Has Forbidden Rule

Text: Avoid any version operator (*) in production packages as it allows all future versions including breaking changes
Rationale: Creates unpredictable behavior and can introduce breaking changes without warning or control
Has Valid Example

Value: core-rdf = 1.0.0
Description: Locks to exactly version 1.0.0 of core-rdf

Value: core-xsd ~ 1.2.3
Description: Allows versions 1.2.3 through 1.2.x (patch updates only)

Value: core-owl ^ 2.0.0
Description: Allows versions 2.0.0 through 2.x.x (minor and patch updates within major version 2)

Value:
    # Package A has no imports (base package)
    # Package B imports A - one-way dependency
Description: One-way dependency chain with no cycles

Has Invalid Example

Value:
    # Package A imports B, Package B imports A
    # Creates an unresolvable circular dependency
Description: A imports B and B imports A - circular dependency
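
The no-cycles rule amounts to checking that the import graph is acyclic. Below is a minimal sketch using a depth-first search. The function name `find_import_cycle` and the dictionary shape (package name mapped to the names it imports) are assumptions for illustration, not how any Kanonak tool represents imports.

```python
from typing import Dict, List, Optional


def find_import_cycle(imports: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one import cycle as a list of package names, or None if there is none."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {package: UNSEEN for package in imports}
    path: List[str] = []

    def visit(package: str) -> Optional[List[str]]:
        state[package] = IN_PROGRESS
        path.append(package)
        for dependency in imports.get(package, []):
            status = state.get(dependency, UNSEEN)
            if status == IN_PROGRESS:
                # The dependency is already on the current path: a cycle.
                return path[path.index(dependency):] + [dependency]
            if status == UNSEEN:
                cycle = visit(dependency)
                if cycle:
                    return cycle
        path.pop()
        state[package] = DONE
        return None

    for package in list(imports):
        if state[package] == UNSEEN:
            cycle = visit(package)
            if cycle:
                return cycle
    return None
```

Applied to the two examples above, `{"a": [], "b": ["a"]}` yields no cycle, while `{"a": ["b"], "b": ["a"]}` yields the path a, b, a.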

version-resolution

Version resolution selects the highest compatible version from available packages

Has Required Rule

Text: When resolving an import, Kanonak MUST select the highest version that satisfies the version operator constraints
Rationale: Automatic version resolution ensures packages get the latest compatible updates while respecting semantic versioning constraints. This enables bug fixes and patches without manual intervention while preventing breaking changes.

Text: For major version 0 (pre-1.0), the caret operator (^) MUST behave like tilde (~), allowing only patch updates
Rationale: Semantic versioning treats 0.x.y as unstable where breaking changes can occur in minor versions. Caret operator for ^0.x.y allows only patches (0.x.z) to prevent unexpected breaking changes during pre-release development.

Has Valid Example

Value:
    Import: "core-xsd ~ 1.2.3"
    Available: 1.2.1, 1.2.5, 1.3.0, 2.0.0
    Constraints: MinVersion=1.2.3, MaxVersion=1.2.999
    Selected: 1.2.5 (highest in compatible range)
Description: Version resolution selects 1.2.5 as highest compatible patch version
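
The resolution rules can be sketched as a filter followed by a maximum. This is an illustrative sketch, not the resolver Kanonak ships: `resolve` and `satisfies` are hypothetical names, the import string is assumed to have the "package operator version" shape used in the examples on this page, and only the =, ~, and ^ operators are handled.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, int, int]


def parse(text: str) -> Version:
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def satisfies(candidate: Version, op: str, base: Version) -> bool:
    """Check one candidate version against an operator and its base version."""
    if op == "=":
        return candidate == base
    if op not in ("~", "^"):
        raise ValueError(f"unsupported operator: {op!r}")
    if candidate < base:
        return False
    # Tilde allows patch updates only; caret on 0.x.y behaves the same way.
    if op == "~" or base[0] == 0:
        return candidate[:2] == base[:2]
    # Caret on 1.x.y and above allows minor and patch updates.
    return candidate[0] == base[0]


def resolve(import_spec: str, available: List[str]) -> Optional[str]:
    """Pick the highest available version that satisfies the import, if any."""
    _package, op, base_text = import_spec.split()
    base = parse(base_text)
    matches = [v for v in available if satisfies(parse(v), op, base)]
    return max(matches, key=parse) if matches else None
```

For the valid example above, `resolve("core-xsd ~ 1.2.3", ["1.2.1", "1.2.5", "1.3.0", "2.0.0"])` returns `"1.2.5"`. Versions are compared as integer tuples so that 1.2.10 sorts above 1.2.9.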

file-naming

Kanonak documents follow a standard file naming pattern for discoverability

Has Required Rule

Text: Kanonak document files MUST be named using the pattern package@version.kan.yml
Rationale: Standardized file naming enables automatic discovery, prevents conflicts, and makes the namespace structure visible in the file system. The .kan.yml extension distinguishes Kanonak files from other YAML files.

Has Recommended Rule

Text: Kanonak documents SHOULD be organized in directories matching the publisher name
Rationale: Directory structure mirrors namespace organization making it easy to locate packages and understand ownership. Publisher directories prevent naming conflicts and enable per-publisher configuration.

Has Valid Example

Value: kanonak.org/core-rdf@1.0.0.kan.yml
Description: Correct file naming with publisher directory and version

Value: mycompany.com/products@2.1.3.kan.yml
Description: Company package with semantic version

yaml-parsing

Kanonak YAML parsing preserves types and detects structural errors

Has Required Rule

Text: Kanonak YAML parsers MUST preserve primitive types (integer, boolean, string) from YAML into the Kanonak object model
Rationale: Type preservation ensures that integer values remain integers, booleans remain booleans, and strings remain strings through the parse-serialize round-trip. This maintains semantic meaning and enables accurate code generation.

Text: Kanonak YAML parsers MUST detect and report duplicate keys with line and column information
Rationale: Duplicate keys in YAML are ambiguous and indicate authoring errors. Early detection with precise location information helps users quickly fix structural problems before semantic validation.

Text: Kanonak YAML parsers MUST strip UTF-8 BOM (Byte Order Mark) if present for cross-platform compatibility
Rationale: Windows editors may add UTF-8 BOM at file start. Stripping BOM ensures files created on Windows parse correctly on Linux/Mac and vice versa, enabling seamless cross-platform collaboration.

Has Valid Example

Value:
    age: 30          # Parsed as integer (not string "30")
    isActive: true   # Parsed as boolean (not string "true")
    name: "Alice"    # Parsed as string
Description: YAML primitive types preserved through parsing

Has Invalid Example

Value:
    Person:
      name: "Alice"
      name: "Bob"    # ERROR: Duplicate key 'name'
Description: Parser detects duplicate keys and reports error with line number
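
To make the BOM and duplicate-key rules concrete, here is a deliberately naive sketch that scans indentation instead of using a real YAML parser. The function `find_duplicate_keys` is hypothetical. It assumes space-indented block mappings only, and ignores list items, flow style, and multi-line scalars, so it illustrates the idea and is not a conforming parser.

```python
import re
from typing import List, Set, Tuple

# Leading spaces, then a key that does not start with '#', ':' or '-', then ':'.
KEY_LINE = re.compile(r"^( *)([^\s#:-][^:]*?):(?: |$)")


def find_duplicate_keys(text: str) -> List[Tuple[str, int, int]]:
    """Return (key, line, column) for each repeated key within one mapping."""
    text = text.lstrip("\ufeff")  # drop a leading byte order mark, if any
    stack: List[Tuple[int, Set[str]]] = []  # (indent, keys seen at that indent)
    duplicates = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        match = KEY_LINE.match(line)
        if not match:
            continue
        indent, key = len(match.group(1)), match.group(2)
        # Leaving deeper mappings closes them.
        while stack and stack[-1][0] > indent:
            stack.pop()
        if stack and stack[-1][0] == indent:
            seen = stack[-1][1]
        else:
            seen = set()
            stack.append((indent, seen))
        if key in seen:
            duplicates.append((key, line_no, indent + 1))
        seen.add(key)
    return duplicates
```

On the invalid example above, the sketch reports the second `name` at line 3, column 3. The same key under two different parents is not reported, because each nested mapping gets its own set of seen keys.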

resource-naming

Distinct casing conventions per entity role make role immediately obvious and align with code generation

Has Recommended Rule

Text: Kanonak uses three naming conventions based on entity role: classes use PascalCase (Person, OrderStatus), instances use kebab-case (romeo-montague, key-2026-03), and properties use camelCase (subClassOf, hasAddress). This casing rule applies to every instance, including inline dict-keyed embedded instances such as the Convention, Rule, and Example entries inside a Protocol.
Rationale: Distinct casing per role makes it immediately obvious whether an entity is a type definition (PascalCase), a data instance (kebab-case), or a property (camelCase). This improves readability, prevents confusion, and aligns with code generation conventions in target languages. Applying the rule to embedded keys too keeps the rule absolute rather than contextual, so the same entity type always looks the same in YAML no matter how it is authored.

Text: Inline dict-keyed embedded instances MUST follow the same kebab-case rule as top-level instances. A Convention key inside hasConvention, a Rule key inside hasRequiredRule, an Example key inside hasValidExample, a CapabilityCommand key inside hasCommand, and so on are all instances of their respective classes - they just happen to be authored inline rather than as top-level SubjectKanonaks - and the casing rule applies identically.
Rationale: Embedded instances have the same semantic status as top-level instances - the SDK parses them into EmbeddedKanonak nodes that carry statements just like SubjectKanonaks do. Treating them as a stylistic label rather than an instance muddies the casing rule and leads to inconsistent documents where top-level and embedded instances of the same class look different. Holding the line on kebab-case for all instance keys keeps the authoring surface predictable.

Text: Resource names SHOULD start with a letter and contain only letters, numbers, hyphens, and underscores
Rationale: Following naming conventions ensures compatibility with code generation targets and URI construction. Names that violate conventions generate warnings to encourage consistency.

Has Forbidden Rule

Text: Resource names MUST NOT use reserved RDF/OWL prefixes like 'rdfs:', 'xsd:', 'owl:' - use imports and qualified references instead
Rationale: Prefixed names (rdfs:Class) are RDF/Turtle syntax, not Kanonak YAML syntax. Kanonak uses imports with aliases for namespace qualification. Prefixes in resource names cause parser errors and violate Kanonak conventions.
Has Valid Example

Value: Person, OrderStatus, SigningKey, BlogPost
Description: PascalCase - used for class (type) definitions

Value: romeo-montague, key-2026-03, commercial-use, alice-johnson
Description: kebab-case - used for instances (data entities), including inline dict-keyed embedded instances

Value: subClassOf, hasAddress, signingKeyId, characterName
Description: camelCase - used for property definitions

Value:
    hasConvention:
      uri-structure:
        summary: ...
Description: Inline dict-keyed Convention instance uses kebab-case exactly like a top-level instance would

Has Invalid Example

Value: rdfs-colon-Class
Description: Invalid - uses RDF colon prefix syntax (as in rdfs followed by colon followed by Class). Use plain Class with imports, then reference it as rdfs.Class when disambiguation is needed.

Value:
    hasConvention:
      UriStructure:
        summary: ...
Description: Invalid - embedded Convention instance keys must be kebab-case. Rename to uri-structure.
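
The three casing styles can be expressed as one pattern per entity role. The patterns below are assumptions: the convention names the styles and gives examples but no formal grammar, and `casing_ok` is a hypothetical helper. A single lowercase word such as `name` satisfies both the kebab-case and camelCase patterns, which is why the check takes the role as input instead of guessing it from the name.

```python
import re

# One assumed pattern per entity role.
CASING = {
    "class": re.compile(r"[A-Z][A-Za-z0-9]*"),                  # PascalCase
    "instance": re.compile(r"[a-z][a-z0-9]*(?:-[a-z0-9]+)*"),   # kebab-case
    "property": re.compile(r"[a-z][A-Za-z0-9]*"),               # camelCase
}


def casing_ok(name: str, role: str) -> bool:
    """True if name uses the casing expected for its role."""
    return CASING[role].fullmatch(name) is not None
```

Because embedded instance keys follow the same rule as top-level instances, the same call covers both: `casing_ok("uri-structure", "instance")` passes and `casing_ok("UriStructure", "instance")` does not.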

package-structure

Every Kanonak document declares exactly one package with required metadata

Has Required Rule

Text: Every Kanonak document MUST contain exactly one Package declaration (a resource with type Package)
Rationale: The Package declaration provides essential metadata (publisher, version) needed for namespace resolution and dependency management. Without it, the document cannot participate in the Kanonak ecosystem.

Text: Package declarations MUST include 'publisher' and 'version' properties, and import entries MUST have valid publisher, package, match, and version fields
Rationale: Publisher and version are required to construct the namespace URI. Import entries must be well-formed to enable dependency resolution.

Has Valid Example

Value:
    core-rdf:
      type: Package
      publisher: kanonak.org
      version: 1.0.0
Description: Correct package declaration with type, publisher, and version

Has Invalid Example

Value:
    # A document with only entities but no Package declaration
    Person:
      type: Class
Description: Invalid - no top-level entity with type Package

embedding

When to model data as an embedded object versus a named top-level entity

Has Recommended Rule

Text: Use embedded objects for data that is tightly coupled to its parent, has no need for independent identity, and is not referenced from anywhere else
Rationale: Embedding reduces naming overhead, makes the ownership relationship explicit in the structure, and accurately maps to RDF blank nodes - which represent resources without global identity. Promoting such data to a named top-level entity creates URIs that will never be referenced, cluttering the namespace.

Text: Use named top-level entities for data that needs a stable identity, is referenced from multiple places, or represents a significant concept with an independent lifecycle
Rationale: Named entities receive a globally unique URI (publisher/package@version/name) that enables cross-document references, stable citation, and independent versioning. Any data that tooling or other entities need to point to must be named so that the pointer is well-defined.

Text: An embedded object SHOULD NOT declare an explicit 'type' property when the type would equal the parent property's range - let the range supply the type implicitly
Rationale: When the embedded type matches what the parent property's range already specifies, the explicit type adds nothing beyond visual noise. Omitting it keeps documents shorter, makes the common case the default, and signals that explicit type declarations on embeddeds carry meaning - they only appear when narrowing the range to a more specific subclass. A reader who sees 'type:' on an embedded immediately knows it's communicating a non-default subtype.

Has Forbidden Rule

Text: When an embedded object declares an explicit 'type' property, that type MUST be a subclass of (or equal to) the parent property's range
Rationale: A declared type that is not a subclass of the parent's range is a contradiction - the parent property says "this slot holds Conditions" and the declared type says "this is not a Condition." Permitting it would silently break range-based reasoning, type checking, and code generation. Restricting declared types to range subclasses preserves the inference model while still allowing tagged-union / discriminated-union authoring with proper OWL subclass semantics (for example, declaring 'type' as 'OrCondition' on an embedded under a property whose range is 'Condition').
Has Valid Example

Value:
    alice:
      type: Person
      hasAddress:
        street: "123 Main St"
        city: "Springfield"
Description: Address is embedded because it is owned by alice and never referenced elsewhere - becomes an RDF blank node. The embedded inherits its type from the parent property's range, so no 'type:' is needed.

Value:
    wrapper:
      type: Wrapper
      condition:
        type: OrCondition
        operands:
          - type: ThresholdCondition
            observable: temperature
            threshold: 90
          - type: EventCondition
            event: sensor-fault
Description: The 'condition' property's range is 'Condition'. Each embedded declares a type that is a strict subclass of 'Condition' (OrCondition, ThresholdCondition, EventCondition all subClassOf Condition), which narrows the range to a specific variant. This is the canonical tagged-union authoring form - explicit types appear precisely where the variant matters.

Value:
    acme-headquarters:
      type: Address
      street: "1 Acme Plaza"
      city: "Metropolis"
    alice:
      type: Employee
      worksAt: acme-headquarters
    bob:
      type: Employee
      worksAt: acme-headquarters
Description: Headquarters is named because multiple employees reference the same address and the entity has identity beyond any single parent

Has Invalid Example

Value:
    wrapper:
      type: Wrapper
      condition:
        type: Person   # FORBIDDEN - Person is not a subclass of Condition
        name: Alice
Description: The 'condition' property's range is 'Condition', so the embedded's declared type must be Condition or a subclass thereof. 'Person' is unrelated to the Condition hierarchy, so this is a range violation.

Value:
    alice:
      type: Person
      hasAddress:
        type: Address   # DISCOURAGED - Address equals the parent property's range
        street: "123 Main St"
Description: The 'hasAddress' property's range is already 'Address', so declaring 'type' as 'Address' on the embedded is redundant. Not a hard error - the document is still semantically valid - but a reader is left wondering what the type was meant to communicate. Drop the line so the type is inferred implicitly.
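
The embedded-type rules reduce to a subclass test against the parent property's range. The sketch below shows one way a validator might classify the four cases. The names `is_subclass_of` and `check_embedded_type`, the result strings, and the `parents` mapping (class name to its direct subClassOf targets) are all hypothetical.

```python
from typing import Dict, List, Optional


def is_subclass_of(cls: str, ancestor: str, parents: Dict[str, List[str]]) -> bool:
    """True if cls equals ancestor or reaches it by following subClassOf links."""
    seen = set()
    pending = [cls]
    while pending:
        current = pending.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        pending.extend(parents.get(current, []))
    return False


def check_embedded_type(
    declared: Optional[str], prop_range: str, parents: Dict[str, List[str]]
) -> str:
    """Classify an embedded object's declared type against the property's range."""
    if declared is None:
        return "ok: type inferred from range"
    if not is_subclass_of(declared, prop_range, parents):
        return "error: declared type is outside the property's range"
    if declared == prop_range:
        return "warning: redundant type, equals the range"
    return "ok: narrows the range"
```

With `parents = {"OrCondition": ["Condition"]}`, declaring `OrCondition` under a `Condition`-ranged property narrows the range, declaring `Person` there is an error, and declaring `Address` under an `Address`-ranged property yields only the redundancy warning.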

type-system

Rules governing how entities declare their type and how properties declare their range

Has Required Rule

Text: Top-level entities MUST have a 'type' property declaring what kind of entity they are
Rationale: Top-level entities need explicit type declaration to enable type checking, validation, and code generation. Unlike embedded objects, top-level entities have no parent property whose range can supply a type.

Text: ObjectProperty and DatatypeProperty declarations MUST declare a 'range' specifying the type of values
Rationale: The range defines what values are valid for a property. Without range, type checking is impossible and the property's purpose is unclear. Range is required for both datatype properties (e.g., string, integer) and object properties (e.g., Person, Address).

Text: When using XSD datatypes (string, integer, boolean, etc.) as property ranges, the core-xsd package MUST be imported
Rationale: XSD datatypes are defined in the core-xsd package. Without importing it, the type resolver cannot verify that range references point to valid datatypes.

Text: All classes referenced in 'type' properties MUST be defined in the current namespace or imported packages
Rationale: Using an undefined class as a type creates entities with unknown structure. The validator ensures every type reference resolves to an actual Class definition.

Has Recommended Rule

Text: Properties SHOULD use specific types (DatatypeProperty or ObjectProperty) instead of generic 'Property'
Rationale: Generic Property type provides no semantic information about whether the property holds literal values or entity references. Using specific types enables better validation, code generation, and tooling support.
Has Valid Example

Value:
    romeo-montague:
      type: Character
      characterName: "Romeo"
Description: Top-level entity with explicit type declaration

Value:
    name:
      type: DatatypeProperty
      domain: Person
      range: string
Description: Property with specific DatatypeProperty type and declared range

Has Invalid Example

Value:
    romeo-montague:
      characterName: "Romeo"
Description: Invalid - top-level entity has no type property

Value:
    name:
      type: DatatypeProperty
      domain: Person
Description: Invalid - property has no range declaration

references

Rules for resolving entity references across namespaces, including shadowing and disambiguation

Has Required Rule

Text: ObjectProperty values MUST reference entities that exist in the current namespace or imports (including transitive imports)
Rationale: Broken references create semantic gaps where data points to non-existent entities. This prevents typos, missing imports, and data integrity issues. All references must resolve to actual entities.

Text: When the same name exists in both the local namespace and an imported namespace, unqualified references MUST resolve to the local definition (local shadowing)
Rationale: Local shadowing allows intentional overriding, enables namespace-specific customizations, and prevents breaking changes when dependencies add new entities with colliding names.

Text: When multiple imported packages define entities with the same name, references MUST use qualified syntax (alias.name) to disambiguate
Rationale: Ambiguous references make it impossible to determine which entity is intended. Qualified references (pkg.Entity) explicitly specify the source package, preventing confusion and ensuring correct type resolution.

Has Valid Example

Value:
    verona-city:
      type: City
    romeo:
      type: Character
      livesIn: verona-city
Description: Reference to a locally defined entity

Value:
    myEntity:
      type: pkgA.Resource
Description: Qualified reference disambiguates between packages that both define Resource

Has Invalid Example

Value:
    romeo:
      type: Character
      livesIn: verona-city   # verona-city is not defined or imported
Description: Invalid - reference to an entity that does not exist
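
The three reference rules (existence, local shadowing, and mandatory qualification on ambiguity) can be sketched as a single lookup. `resolve_reference` is a hypothetical helper. It models each import as an alias mapped to the set of names that package defines, and it leaves out transitive imports to stay short.

```python
from typing import Dict, Set


def resolve_reference(ref: str, local: Set[str], imports: Dict[str, Set[str]]) -> str:
    """Return 'local' or the alias of the import that supplies ref; raise if unresolved."""
    if "." in ref:
        # Qualified form alias.name: look only in the named import.
        alias, name = ref.split(".", 1)
        if name in imports.get(alias, set()):
            return alias
        raise LookupError(f"{ref}: not defined in import {alias!r}")
    if ref in local:
        # Local shadowing: the local definition wins over any import.
        return "local"
    owners = sorted(alias for alias, names in imports.items() if ref in names)
    if len(owners) == 1:
        return owners[0]
    if not owners:
        raise LookupError(f"{ref}: not defined locally or in any import")
    options = ", ".join(f"{alias}.{ref}" for alias in owners)
    raise LookupError(f"{ref}: ambiguous, qualify it as one of {options}")
```

If both `pkgA` and `pkgB` define `Resource`, the unqualified reference raises an error that lists `pkgA.Resource` and `pkgB.Resource`, while the same reference resolves to `"local"` as soon as the current namespace defines `Resource` itself.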

hierarchy

Class and property hierarchies must be acyclic and reference their own kind

Has Required Rule

Text: Class inheritance (subClassOf) MUST NOT create circular chains
Rationale: Circular inheritance (A subClassOf B subClassOf A) is logically impossible and breaks reasoning systems. Class hierarchies must form directed acyclic graphs where every class has a finite path to the root Resource class.

Text: The subClassOf property MUST reference an entity that is itself a Class
Rationale: subClassOf establishes inheritance between classes. Referencing a non-class entity (like a property or instance) is semantically invalid and breaks type system integrity.

Text: Property inheritance (subPropertyOf) MUST NOT create circular chains
Rationale: Circular property inheritance (A subPropertyOf B subPropertyOf A) is logically impossible. Property hierarchies must form directed acyclic graphs.

Text: The subPropertyOf property MUST reference an entity that is itself a Property
Rationale: subPropertyOf establishes inheritance between properties. Referencing a non-property entity (like a Class or instance) is semantically invalid.

Has Valid Example

Value:
    Person:
      type: Class
    Character:
      type: Class
      subClassOf: Person
Description: Correct linear class hierarchy with no cycles

Has Invalid Example

Value:
    Character:
      type: Class
      subClassOf: Hero
    Hero:
      type: Class
      subClassOf: Character
Description: Invalid - Character and Hero reference each other creating a cycle

Value:
    name:
      type: DatatypeProperty
    Character:
      type: Class
      subClassOf: name
Description: Invalid - subClassOf must reference a Class, not a Property

transformation-modeling

How to author Transformation instances that turn Kanonak data into other formats

Has Recommended Rule

Text: Use SetTransformation when the desired output reads from MORE than one input instance, even if you currently have only one input. Use InstanceTransformation only when the output is genuinely independent per input.
Rationale: The cardinality of the input binding is the substantive difference between the two subclasses, and authors who pick InstanceTransformation for a list-like task end up emitting one tiny artifact per input with no aggregate view. SetTransformation aggregates by default and falls back to per-key fan-out via partitionBy when the output cardinality should match a discriminator. Picking the right cardinality up-front avoids needing to rewrite the rule tree later.

Text: When iteration order matters (time-series, indexes, anything a reader will scan top-to-bottom), declare InputPattern.sortBy explicitly with one or more SortKey entries.
Rationale: Without sortBy, the runner binds inputs in the SDK's discovery order - deterministic but not semantically meaningful. Authors who depend on implicit order produce artifacts whose sequence shifts when the workspace's filesystem layout changes. Declaring sortBy makes the contract explicit and validatable; the byProperty reference ensures the sort key is a real Property the validator can check.

Text: When a SetTransformation should produce one artifact per distinct value of a property (e.g. one page per author, one chart per region), declare SetTransformation.partitionBy referencing that property.
Rationale: Partitioning at the transformation level is cheaper and clearer than authoring N nearly-identical InstanceTransformations or using PartitionBy expressions to emit M sections inside one artifact. The structural partitionBy keeps each output focused on one group's data while sharing one rule definition.

Text: Inside an Expression rule tree, every embedded Expression node SHOULD declare its concrete subclass via type - type tx.Concat, type tx.PropertyRead, type tx.BuildAstNode, etc. Keep AstFieldBinding and other range-matching embeddeds implicit (no type) per the embedding convention's prefer-implicit-embedded-type-rule.
Rationale: The Expression hierarchy is a wide tagged-by-type tree where every node could be one of many concrete subclasses. Declaring the type at each embedded position makes the rule tree self-documenting and lets the validator catch mistakes (a mistyped property name on a wrong subclass becomes a domain violation). For embeddeds whose parent property has a concrete range (AstFieldBinding under BuildAstNode.set, FormatOverride under formatOverrides), explicit type is redundant - the range supplies it.

Text: Use tx.IsSet over a tx.PropertyRead to gate rendering on whether an optional property is populated, regardless of whether the value is a primitive scalar, a reference, an embedded subject, or a non-empty list. IsSet reports presence; it doesn't constrain what kind of value qualifies.
Rationale: The "render this block iff the optional sub-tree is populated" pattern is common when an ObjectProperty is optional and the artifact should adapt to whether authors supplied a value. Restricting IsSet to primitive-only would force authors into workarounds (always-render-with-empty-content, list-length shims) that produce worse output than honest presence checks. The runtime contract: IsSet returns false for undefined/null, empty string, or empty list, and true otherwise.

Text: Inside a SetTransformation iteration, use the URI-segment primitives - tx.SubjectUri (full URI), tx.UriPublisher, tx.UriPackage, tx.UriVersion, tx.UriName (local name) - to derive per-input deep links, source-package paths, cross-package indexes, or per-segment display chips from the matched subjects. The five together cover every segment of the canonical Kanonak URI shape.
Rationale: Aggregate transformations are valuable when each output entry can correlate back to its source input. Without primitives for URI-segment access from inside the rule tree, authors either parse the rendered output client-side or duplicate identity into ad-hoc instance properties that drift from the URI. The five URI primitives put the canonical attribution inside the typed transformation, where it can be validated alongside everything else and stays in sync with whatever version / publisher / package the matched subject is actually authored under.

Text: When a SetTransformation should aggregate matched instances across multiple packages - index pages, sitemaps, federation reports, time-series views over versioned snapshot packages - invoke the runner with one --scope flag per package. kanonak transform run accepts the flag repeatedly; matched subject sets are unioned across all scopes (deduped by URI) before binding to the rule's `inputs`.
Rationale: Single-scope semantics restrict candidates to subjects defined in the scoped package, which blocks the canonical aggregate use case (one artifact summarizing instances spread across many packages). Repeatable --scope keeps the pattern explicit and validatable per-scope without changing the meaning of "scope" itself - each scope still resolves through the workspace + cache + HTTP tiers exactly as a single scope does, and a single --scope invocation behaves unchanged.
Has Forbidden Rule
Text | Rationale

Text: An InstanceTransformation with zero matching inputs MUST be treated as an authoring error by the runner. If the desired behavior is "emit something even when there are no inputs," the transformation MUST be a SetTransformation.
Rationale: The two subclasses differ on this exact contract. Emitting zero artifacts from an InstanceTransformation when zero inputs match is what the author asked for (one artifact per input, none present). Emitting one artifact from a SetTransformation when zero inputs match is also what the author asked for (one artifact summarizing the set, even when empty). Conflating the two means a SetTransformation case ("we want an index page even on day zero") accidentally lands as an InstanceTransformation and never produces the index. Pick the subclass that matches the cardinality semantics.

Text: Transformation.outputs MUST reference named OutputFormat instances (tx.html, tx.markdown-with-frontmatter, tx.json, etc.), not string identifiers.
Rationale: OutputFormat references are validatable: the runner resolves the URI and looks up the registered backend by the format's backendUri. Authoring outputs as plain strings would silently accept typos, drift from the actual backend registry, and require every consumer to maintain its own string-to-backend mapping. The OutputFormat indirection is the contract the runner depends on.

Text: Sum, Min, Max, and Average MUST receive a list source whose elements resolve to xsd:integer or xsd:decimal values. Source expressions that yield strings, references, or embedded objects are an authoring error.
Rationale: The runner enforces strict numeric typing at evaluation time: non-numeric elements raise an error rather than coerce silently. This catches mistakes where an author intended to sum a numeric property but wrote a property whose range is xsd.string, or forgot to wrap a list of subjects in a list-map that extracts the numeric field. Min/Max/Average on an empty list also error, since the contract is undefined; guard with IsSet when an empty input is possible.
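The strict-typing contract for Sum, Min, Max, and Average can be sketched in Python as follows. This is a model of the rule, not the runner's code: the function names are hypothetical, Python `int` and `Decimal` stand in for xsd:integer and xsd:decimal, and returning 0 for the sum of an empty list is an assumption (the rule only says Min/Max/Average error on empty input).

```python
from decimal import Decimal
from typing import List, Sequence


def _check_numeric(values: Sequence[object]) -> List[object]:
    """Reject any element that is not an integer or decimal; never coerce."""
    for v in values:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(v, bool) or not isinstance(v, (int, Decimal)):
            raise TypeError(f"non-numeric element: {v!r}")
    return list(values)


def strict_sum(values: Sequence[object]):
    # Assumption: summing an empty list yields 0 rather than an error.
    return sum(_check_numeric(values), 0)


def strict_min(values: Sequence[object]):
    nums = _check_numeric(values)
    if not nums:
        raise ValueError("Min of an empty list is undefined")
    return min(nums)


def strict_max(values: Sequence[object]):
    nums = _check_numeric(values)
    if not nums:
        raise ValueError("Max of an empty list is undefined")
    return max(nums)


def strict_average(values: Sequence[object]) -> Decimal:
    nums = _check_numeric(values)
    if not nums:
        raise ValueError("Average of an empty list is undefined")
    return Decimal(sum(nums)) / Decimal(len(nums))
```

A string such as `"3"` raises instead of being parsed, which mirrors the "error rather than coerce silently" behavior the rationale describes.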
Has Valid Example
Value | Description
Value:
    skill-to-skill-md-transformation:
      type: tx.InstanceTransformation
      tx.inputPattern:
        tx.matchesClass: skills.Skill
        tx.requires:
          - skills.name
          - skills.description
      tx.artifactName:
        type: tx.PropertyRead
        tx.readSource:
          type: tx.VarRef
          tx.varName: input
        tx.readProp: skills.name
      tx.outputs:
        - tx.markdown-with-frontmatter
      tx.rule:
        type: tx.BuildAstNode
        tx.astClass: docast.Document
        # ... rest of the document-construction rule
Description: Per-instance transformation - one Skill input produces one SKILL.md artifact. The rule sees `input` (singular) bound to the matching Skill subject; artifactName carries the per-instance filename stem.

Value:
    snapshot-index:
      type: tx.SetTransformation
      tx.inputPattern:
        tx.matchesClass: wv.WorldviewSnapshot
      tx.sortBy:
        - tx.byProperty: wv.observedAt
          tx.order: tx.descending
      tx.artifactName:
        type: tx.StringLiteral
        tx.stringLiteral: index
      tx.outputs: [tx.html]
      tx.rule:
        type: tx.BuildAstNode
        tx.astClass: docast.Document
        # ... rule binds `inputs` (plural) to the sorted set
Description: Set transformation with sort - aggregates every WorldviewSnapshot into a single index page. inputs is bound to the sorted list (newest first); the rule typically wraps it in a ForEach to render one row per snapshot.

Value:
    per-thesis-trajectory:
      type: tx.SetTransformation
      tx.inputPattern:
        tx.matchesClass: wv.WorldviewSnapshot
      tx.partitionBy: wv.thesisName
      tx.artifactName:
        type: tx.Concat
        tx.parts:
          - type: tx.StringLiteral
            tx.stringLiteral: "thesis-"
          - type: tx.VarRef
            tx.varName: key
      tx.outputs: [tx.html]
      tx.rule:
        type: tx.BuildAstNode
        tx.astClass: docast.Document
        # ... rule sees `inputs` (this partition's snapshots) + `key` (thesis name)
Description: Set transformation with fan-out - one artifact per distinct value of partitionBy. inputs is bound to that partition's members; key is bound to the partition value.
Value:
    # Same SetTransformation as aggregate-set-transformation-with-sort,
    # invoked across multiple packages. Each --scope is loaded
    # separately and the matched subject sets are unioned (deduped
    # by URI) before binding to the rule's `inputs`.
    #
    # $ kanonak transform run snapshot-index \
    #     --scope worldview.genval.ai/example-ai-capex \
    #     --scope worldview.genval.ai/example-macro \
    #     --format html \
    #     --out _site/
Description: Cross-package aggregation - the matched-subject set in `inputs` includes WorldviewSnapshot instances from both scoped packages. Authors typically pair this invocation shape with tx.SubjectUri inside the rule to derive per-input deep links back to each snapshot's source package.

Value:
    # The version-bump-as-time-axis pattern: a publisher
    # bumps the same package's version each time the
    # underlying observation changes, so each version is one
    # immutable point on a timeline. The local name stays
    # constant ("view"); the version segment carries the
    # temporal information.
    #
    #   worldview.genval.ai/snapshot@1.0.0/view
    #   worldview.genval.ai/snapshot@1.0.1/view
    #   worldview.genval.ai/snapshot@1.0.2/view
    #
    # Three URIs, three subjects (the version is part of
    # identity per the URI structure convention). Aggregate
    # across all versions with one --scope per version:
    #
    # $ kanonak transform run snapshot-timeline \
    #     --scope worldview.genval.ai/snapshot@1.0.0 \
    #     --scope worldview.genval.ai/snapshot@1.0.1 \
    #     --scope worldview.genval.ai/snapshot@1.0.2 \
    #     --format html --out _site/timeline/
Description: Cross-version aggregation - subjects in different versions of the same package with the same local name are different instances. The runner's multi-scope dedup keys on the full URI including version, so the timeline transformation sees one entry per version. tx.SubjectUri inside the rule emits version-qualified URIs for stable deep links.

Value:
    # Render each timeline card with its package version as a
    # display chip. tx.UriVersion extracts the version segment
    # from the matched subject's URI directly, with no need to
    # author the version as a separate property on every
    # snapshot and no need to parse the rendered HTML
    # client-side.
    #
    # Inside a ForEach over `inputs`:
    #
    #   - type: tx.StringLiteral
    #     tx.stringLiteral: '<span class="snapshot-version">v'
    #   - type: tx.UriVersion
    #     tx.uriVersionOf:
    #       type: tx.VarRef
    #       tx.varName: snapshot
    #   - type: tx.StringLiteral
    #     tx.stringLiteral: '</span>'
    #
    # Companion primitives (tx.UriPublisher and tx.UriPackage)
    # cover the other URI segments and follow the same shape.
Description: Per-segment URI access via tx.UriPublisher / tx.UriPackage / tx.UriVersion / tx.UriName. Useful for display chips, grouping headers, and any rendering that wants identity data without round-tripping through SubjectUri + string parsing.
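The sort and fan-out behaviors shown in the examples above can be modelled together in a few lines of Python. This is a sketch under stated assumptions, not the runner's algorithm: `fan_out` is a hypothetical helper, snapshots are plain dicts, and combining sortBy with partitionBy in one transformation is assumed here for illustration (the examples show them separately). The "thesis-" prefix mirrors the Concat in the per-thesis-trajectory example.

```python
from collections import defaultdict
from typing import Dict, List


def fan_out(snapshots: List[dict], partition_by: str, sort_by: str,
            descending: bool = True) -> Dict[str, List[dict]]:
    """Group inputs by the partition property, then sort each partition.

    Returns a mapping of artifact name -> that partition's `inputs`.
    """
    groups = defaultdict(list)
    for snapshot in snapshots:
        groups[snapshot[partition_by]].append(snapshot)

    artifacts = {}
    for key, members in groups.items():
        # ISO-8601 timestamps sort correctly as plain strings.
        members.sort(key=lambda s: s[sort_by], reverse=descending)
        artifacts["thesis-" + key] = members
    return artifacts


snapshots = [
    {"thesisName": "ai-capex", "observedAt": "2024-01-01T00:00:00Z"},
    {"thesisName": "macro", "observedAt": "2024-02-01T00:00:00Z"},
    {"thesisName": "ai-capex", "observedAt": "2024-03-01T00:00:00Z"},
]
artifacts = fan_out(snapshots, "thesisName", "observedAt")
```

Each key of `artifacts` corresponds to one emitted artifact; the list under it plays the role of `inputs` for that partition, newest first.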
Has Invalid Example
Value | Description

Value:
    broken-index:
      type: tx.InstanceTransformation
      tx.inputPattern:
        tx.matchesClass: wv.WorldviewSnapshot
      tx.rule:
        type: tx.BuildAstNode
        tx.astClass: docast.Document
        # ... author wrote a per-input rule but expected one combined index
Description: Wrong subclass - emits one Document per snapshot rather than one combined index. The author wanted aggregation; the type should be tx.SetTransformation with the rule iterating `inputs` rather than reading `input`.

Value:
    broken-output:
      type: tx.InstanceTransformation
      tx.outputs:
        - markdown-with-frontmatter
Description: Wrong - outputs must be a reference to a named OutputFormat instance, not a plain string. The validator rejects the string form because it cannot verify a backend exists for it.

open-world-augmentation

Any Kanonak package may assert statements about any entity defined in any other package, by referencing the entity through an aliased name in the augmenting document's body. The parser merges all statements about the same canonical URI into one Subject, just as RDF specifies for IRI assertions across documents.

Has Required Rule
Text | Rationale

Text: When two or more packages declare entities whose canonical URI (publisher + package + version + local name) is the same, the parser MUST merge their statements into a single SubjectKanonak. Conflicting statements (different objects for the same predicate) MUST coexist; consumers handle precedence by load order or explicit declaration.
Rationale: RDF semantics are open-world by design. Forcing all statements about a class to live in its defining package would make publisher augmentation impossible and force every cross-cutting concern (universal renderers, peer observations, federated annotations) into a fork-and-bump of the upstream package. Open-world merge unblocks the polymorphic-derivation pattern where universal defaults live in their own package without touching core.

Has Recommended Rule
Text | Rationale

Text: When you need to add statements to an upstream class (e.g. attaching a default renderer, a peer annotation, a commercial extension), prefer declaring those statements in your own package by referencing the class via an aliased name (rdfs.Resource, skill.Skill) rather than forking the upstream package and editing it.
Rationale: Augmentation keeps your concerns in your namespace, preserves upstream version pinning, and composes with other publishers' augmentations. Forking decouples you from upstream updates and forces coordination on every release.
Has Valid Example
Value | Description

Value:
    # Augmenting `rdfs.Resource` (defined in core-rdf) with
    # a `derivations` property from a separate package.
    #
    #   universal-derivations: { type: Package, ..., imports:
    #     [{ publisher: kanonak.org, packages: [{ package: core-rdf,
    #     version: 1.0.0, alias: rdfs }] }] }
    #
    # Then in the same document:
    #
    #   rdfs.Resource:
    #     derivation.derivations: [...]
    #
    # No `type:` is asserted on rdfs.Resource here: we are
    # NOT redefining it, we are augmenting it. The parser
    # merges these statements with the original definition
    # in core-rdf because both resolve to the same canonical
    # URI: kanonak.org/core-rdf@1.0.0/Resource.
Description: Valid - the alias prefix `rdfs.` resolves through the document's imports; the entity name `rdfs.Resource` canonicalizes to `kanonak.org/core-rdf@1.0.0/Resource`, and the merge happens at parse time.
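The merge rule for this convention can be pictured with a short Python sketch. It is a model only, not the Kanonak parser: `merge_statements` is a hypothetical function, statements are simplified to (canonical URI, predicate, object) tuples, the `label` predicate and its values are invented for illustration, and dropping exact duplicate statements is an assumption borrowed from RDF's set semantics.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Statement = Tuple[str, str, str]  # (canonical URI, predicate, object)


def merge_statements(
    documents: Iterable[Iterable[Statement]],
) -> Dict[str, Dict[str, List[str]]]:
    """Merge statements from all documents into one subject per canonical URI.

    Conflicting objects for the same predicate coexist in load order;
    nothing is overwritten.
    """
    subjects = defaultdict(lambda: defaultdict(list))
    for statements in documents:
        for uri, predicate, obj in statements:
            objects = subjects[uri][predicate]
            if obj not in objects:  # assumption: exact duplicates collapse
                objects.append(obj)
    return {uri: dict(preds) for uri, preds in subjects.items()}


RESOURCE = "kanonak.org/core-rdf@1.0.0/Resource"

core_rdf = [(RESOURCE, "type", "Class"), (RESOURCE, "label", "Resource")]
augmenting = [
    (RESOURCE, "derivation.derivations", "html-default"),
    (RESOURCE, "label", "Thing"),  # conflicts with core_rdf's label
]
merged = merge_statements([core_rdf, augmenting])
```

The augmenting document never restates `type`; it only contributes new statements, and the conflicting `label` values sit side by side for the consumer to rank.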

derivation-bindings

A class declares the transformations that produce its derived artifacts via a derivations list. Each Derivation entry binds a (format, variant) pair to a TransformationReference. Discovery walks the class hierarchy bottom-up to find the first matching binding.

Has Recommended Rule
Text | Rationale

Text: Declare `derivation.derivations` on the class whose instances should be derivable into a given format. Use named instances of `formats.Format` and `derivation.Variant` for the format and variant fields, never magic strings.
Rationale: Magic strings drift (`js` vs `javascript`, `csharp` vs `c-sharp`). Named instances are URI-comparable, validate in the type system, and let publishers extend the vocabulary by defining their own format / variant instances without coordinating with anyone.

Text: Prefer declaring derivations on the class. Only declare them on a specific instance when that instance needs a shape no other instance of the class wants (a one-off override).
Rationale: Class-level bindings inherit through subclasses and apply to every instance for free. Per-instance bindings are full replacements (not merges with the class), so an instance that overrides loses access to all the class's other bindings unless it re-declares them.

derivation-override-semantics

Override semantics differ by level. Class-hierarchy override merges by (format, variant) key — a subclass overriding html/default keeps its parent's markdown/summary. Per-instance override replaces entirely — an instance that declares its own derivations sees no class-level bindings.

Has Required Rule
Text | Rationale

Text: When discovery walks the class hierarchy, statements at DIFFERENT classes for the SAME (format, variant) pair are resolved by closest-class-wins (subclass takes precedence over superclass). Statements at DIFFERENT classes for DIFFERENT (format, variant) pairs ALL apply: they merge.
Rationale: Authors expect inheritance to behave like CSS specificity or method override in OO: a subclass override replaces the specific binding while inheriting the others. The merge-by-key behavior is what makes universal defaults useful: you override what you want to customize, inherit the rest.

Text: When an instance declares its own `derivations`, those REPLACE all class-level bindings for that instance, with no merge. Instances that override take full ownership of all (format, variant) pairs they want available.
Rationale: Instance-level overrides are deliberate "this resource is special" statements. Merging them with class bindings would re-introduce silent fallback behavior and mask which fields the author intended to override at the leaf. Replace semantics align with the no-mocks/no-fallbacks rule: failure to declare a binding is visible, not silently filled.
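The two override levels can be contrasted in a short Python sketch. It is an illustrative model, not the discovery implementation: `resolve_bindings` is a hypothetical function, bindings are plain dicts keyed by (format, variant), the transformation names are invented, and the class chain is assumed to be supplied most-specific class first.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]          # (format, variant)
Bindings = Dict[Key, str]      # key -> transformation reference


def resolve_bindings(instance_bindings: Optional[Bindings],
                     class_chain: List[Bindings]) -> Bindings:
    """Return the bindings visible to one instance.

    instance_bindings is None when the instance declares no derivations.
    class_chain lists class-level bindings, closest class first.
    """
    if instance_bindings is not None:
        # Per-instance override: full replacement, no merge with classes.
        return dict(instance_bindings)

    merged: Bindings = {}
    for bindings in class_chain:
        for key, transformation in bindings.items():
            # Closest class wins per key; other keys are inherited.
            merged.setdefault(key, transformation)
    return merged


resource = {("html", "default"): "universal-html",
            ("markdown", "summary"): "universal-markdown"}
skill = {("html", "default"): "skill-html"}

inherited = resolve_bindings(None, [skill, resource])
special = resolve_bindings({("json", "default"): "special-json"},
                           [skill, resource])
```

In `inherited` the subclass replaces html/default but keeps the parent's markdown/summary; in `special` only the instance's own binding survives, so a missing format is visible rather than silently filled.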

universal-default-derivations

The kanonak.org/universal-derivations@1.0.0 package augments rdfs.Resource with default derivations so EVERY Kanonak resource has at least HTML / Markdown / JSON artifacts available out of the box, without per-publisher action. Publisher classes override per (format, variant) only where they have something domain-specific to say.

Has Recommended Rule
Text | Rationale

Text: Authors of new classes SHOULD NOT declare derivations for formats covered by universal-derivations (html, markdown, json) unless they have a publisher-specific shape to render. Letting the universal default apply is the path of least surprise: consumers get consistent rendering across publishers.
Rationale: The Kanonak Browser model is the consumer story: point it at any URI, walk the class hierarchy, find the right derivation, render. Per-class redeclaration of already-universal bindings is noise and divergence; override only when the publisher has earned the right to differ.

Text: New publisher vocabulary packages MAY ship without any `derivations` declarations. Their instances will still render via the universal defaults inherited from rdfs.Resource. Add publisher-specific derivations later as the rendering needs of those instances become clear.
Rationale: Removes the adoption tax. A publisher who declares a new class today gets HTML / Markdown / JSON renderings for free; they can add domain-specific transformations as a later iteration without blocking initial usability.

site-curation

Publisher-curated content (StaticPage, Asset, AggregateView) is authored as Kanonak resources and materialized via kanonak site build. StaticPage holds about-page-style content; Asset holds shared CSS / JS / nav HTML referenced by layout transformations; AggregateView wraps a SetTransformation + scope so cross-resource pages (timelines, indexes, changes) are declarative instead of CI-script logic.

Has Recommended Rule
Text | Rationale

Text: Pages that are NOT derived from another resource (about pages, privacy notices, documentation landings, glossaries) SHOULD be authored as `site.StaticPage` instances rather than embedded in CI heredocs or hand-written HTML files. The page's content lives in the Kanonak graph; the publisher's HTML layout transformation renders it.
Rationale: The about-page-as-heredoc pattern was the canonical symptom of "this content has nowhere to live in the ontology." StaticPage gives it a home; universal defaults render it for free; publisher overrides give it the site-styled chrome. The content stops being divorced from the rest of the publisher's data graph.

Text: CSS / JS / shared HTML fragments used by multiple layout transformations SHOULD be authored as `site.Asset` instances (mediaType + content) and pulled into transformations via `tx.PropertyRead` against `site.content` (typically wrapped in `tx.ResolveRef + tx.UriLiteral`). This replaces copy-paste of style/script blocks across N transformations.
Rationale: Per-transformation duplication of style/script blocks was the worst class of drift in the worldview pain points (theme variables had to be edited in 5 places for one CSS variable change). Centralizing assets as Kanonak resources lets a single edit propagate everywhere without coordination.

Text: Pages that aggregate over multiple resources of the same class (timelines, indexes, changes pages, registries) SHOULD be authored as `site.AggregateView` instances (scopeClass + scopeSources + transformation + outputPath). `kanonak site build` materializes them; consumers stop authoring per-aggregate workflow steps.
Rationale: The "scope enumeration in CI shell" anti-pattern was the second-worst pain point in worldview's workflow. Each aggregate page required an explicit bash loop to list packages and pass them as `--scope` arguments. AggregateView captures the same intent declaratively; the `pkg@*` wildcard handles the time-axis case (every version of a snapshot package).

cli-default-wrapper-opt-out

Transformations that build a complete output document themselves (their own <!DOCTYPE html>, their own CSS, their own page chrome) can declare tx.omitWrapper: true on the relevant tx.FormatOverride to suppress the CLI's default chrome. The HTML backend then emits ONLY the rendered children. JSON / plain-Markdown backends ignore the flag because they have no chrome.

Has Recommended Rule
Text | Rationale

Text: When a transformation produces a complete HTML document (publisher layout transformations that inline their own CSS, nav, header, footer), declare `tx.formatOverrides: [{ tx.formatTarget: tx.html, tx.omitWrapper: true }]` so the CLI's default chrome doesn't double-wrap the output.
Rationale: The CLI's default HTML wrapper was designed for the "transformation produces a fragment" case. For publisher layouts that produce the complete page, it was actively hostile: authors had to override its CSS with `!important` and hide its `<dl class="metadata">` with display:none. omitWrapper makes the opt-out explicit and idiomatic.

content-shaping-primitives

transformations@3.5.0 ships expression primitives for the content-shaping work that previously forced inline JS into rendered HTML: DateFormat (ISO datetime → human-readable), Add/Subtract/Multiply/Divide (arithmetic for SVG coordinates, evidence counts, sized text positions), and Reverse (newest-first orderings of source-chronological lists).

Has Recommended Rule
Text | Rationale

Text: When a rendered artifact needs computed content (formatted dates, computed positions, reversed orderings), prefer the matching tx 3.5.0 expression primitive over emitting inline JS to do the work in the consumer's browser.
Rationale: Inline JS is consumer-side work that locks every consumer of the artifact into a JS-capable runtime, and lives in a place no other tool can discover or reason about. Engine primitives produce the same value at transformation time and ship JS-free artifacts consumable by anything that can read HTML / Markdown. The transformation rule becomes the single source of truth for the computation.