Protocol

Kanonak Protocol

The Kanonak Protocol is an open protocol for defining, versioning, and sharing semantic ontologies across distributed publishers.

2.0.0 separates foundational protocol concerns from vocabulary-specific guidance: the latter now lives in sibling *-conventions packages alongside each vocabulary (kanonak.org/transformations-conventions, kanonak.org/site-conventions). This package documents only the foundational rules every Kanonak document obeys regardless of which vocabulary it uses.

2.1.0 documents the canonical URL form (a URI ↔ URL bijection that lets the same content serve a web browser AND a Kanonak Browser from one source), URI fragments for addressing embedded resources (dot-bracket path notation through the parent's parsed graph), and URI query strings for UI / consumer state (Browser tab selectors, derivation variants, view filters). Together these finalize the addressing model: identity in the path, sub-address in the fragment, view state in the query. All three are deterministic and round-trippable across protocols (a kan: scheme was considered and explicitly dropped because https URLs work everywhere).

2.2.0 adds the canonical structural hash, a representation-independent SHA-256 over the package's parsed object form. Today's kanonak.lock hashes the raw YAML bytes, which couples package identity to authoring choices that don't affect meaning (resource ordering, quoting style, comments). The canonical hash decouples identity from byte representation: the same logical package hashes identically whether it's read from a `.kan.yml` file, reconstructed from a DynamoDB row, or re-serialized through any alternative format. Implementation lives in the SDK (`canonicalForm`, `canonicalHash`) and the CLI (`kanonak hash`); the protocol rules below normatively define the canonical form so any future implementation in another language matches byte-for-byte.
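The idea behind a representation-independent hash can be illustrated with a small sketch. This is NOT the normative Kanonak canonical form and will not reproduce the output of `canonicalHash` or `kanonak hash`; it assumes, purely for illustration, that the parsed package is a plain dict, that sorted-key compact JSON is an acceptable stand-in for a canonical serialization, and that list order is significant. The function name `structural_hash` is hypothetical.

```python
import hashlib
import json


def structural_hash(parsed: dict) -> str:
    """Hash the parsed object form rather than the raw source bytes.

    Assumption: sorted-key, whitespace-free JSON stands in for a canonical
    form. The real Kanonak canonical form is defined by the protocol rules.
    """
    canonical = json.dumps(
        parsed, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two parses of the same logical content, authored in different key orders.
first = {"core-rdf": {"type": "Package", "version": "1.0.0"}}
second = {"core-rdf": {"version": "1.0.0", "type": "Package"}}
same_identity = structural_hash(first) == structural_hash(second)
```

Because the hash is computed after parsing, comments and quoting style never reach it, and key ordering is neutralized by the sort.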

Conventions

uri-structure

Kanonak URIs uniquely identify entities using the format publisher/package@version/name

Has Required Rule#
	Text	Rationale
#	Kanonak URIs MUST follow the format publisher/package@version/name where all components are required	Standardized URI format ensures global uniqueness and enables automatic resolution across namespaces and package registries
#	Each Kanonak URI MUST uniquely identify exactly one entity across the entire Kanonak ecosystem	Guarantees unambiguous entity references and prevents naming conflicts through publisher namespacing

Has Valid Example#
	Value	Description
#	kanonak.org/core-rdf@1.0.0/Class	Valid Kanonak URI referencing the Class entity from the core-rdf package

Has Invalid Example#
	Value	Description
#	mypackage/Entity	Invalid - missing publisher domain and version components
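A minimal sketch of how a tool might check the four-component format, assuming a regular expression is sufficient and that versions are three dot-separated integers (see versioning). The names `KanonakUri` and `parse_kanonak_uri` are hypothetical and not part of any Kanonak SDK.

```python
import re
from typing import NamedTuple


class KanonakUri(NamedTuple):
    publisher: str
    package: str
    version: str
    name: str


# publisher/package@version/name, with all four components required.
_URI_RE = re.compile(
    r"(?P<publisher>[^/@#?\s]+)/(?P<package>[^/@#?\s]+)"
    r"@(?P<version>\d+\.\d+\.\d+)/(?P<name>[^/@#?\s]+)"
)


def parse_kanonak_uri(text: str) -> KanonakUri:
    """Split a Kanonak URI into its components or raise ValueError."""
    match = _URI_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a Kanonak URI: {text!r}")
    return KanonakUri(**match.groupdict())
```

With this sketch, `parse_kanonak_uri("kanonak.org/core-rdf@1.0.0/Class")` yields the four components, while `mypackage/Entity` is rejected because it has no version.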

canonical-url-form

Every Kanonak URI deterministically maps to a canonical https URL of the form https://publisher/package/version/name (no trailing slash). The URL is the public-web representation; the URI is the protocol-native form. Both forms round-trip via structural pattern matching; tools interconvert without consulting publisher infrastructure or any per-publisher routing tables.

Has Required Rule#
	Text	Rationale
#	The canonical URL MUST be derivable from any Kanonak URI by structural substitution: the URI `publisher/package@version/name` maps to the URL `https://publisher/package/version/name`. Inversely, any URL matching this structural pattern MUST parse back to its source URI without consulting any publisher infrastructure.	Bijective URI/URL mapping is what lets the same content serve a traditional web browser AND a Kanonak Browser from one source: a link emitted as `https://...` works in a web browser by following the URL to the published static rendering, AND in a Kanonak Browser by parsing the URL back to a URI and navigating internally. No per-tool link form, no scheme registration, no publisher-side routing config — just the protocol's structural guarantee.
#	The canonical URL MUST NOT include a trailing slash. Fragments (see uri-fragments) append directly: `https://publisher/package/version/name#path`.	Trailing slash + fragment composes awkwardly (`.../name/#path` reads as if the fragment lives inside a directory). Without the slash, `.../name#path` reads as the embedded thing inside `name` — which is the actual semantic. Static-site hosting handles the no-slash form via extension stripping, extensionless routing, or server-side rewrites; that's a deployment choice, not a protocol decision.

Has Forbidden Rule#
	Text	Rationale
#	Publishers MUST NOT introduce non-canonical URL paths that require per-publisher routing tables to map back to URIs. Pretty paths come from THOUGHTFUL PACKAGE NAMING, not from custom routing — a publisher who wants `/timeline` names their package `timeline` and addresses the canonical URL `/timeline/<version>/<name>`.	Non-canonical paths break URL → URI deterministic mapping. Any tool that would navigate via such a URL has to either guess (wrong) or fetch and interpret the publisher's routing config (slow and brittle). The constraint is generative — it forces publishers to model their site shape into their package names, which is exactly the modeling discipline the protocol exists to enable. Custom-routing escape hatches were considered and explicitly deferred during the original URL-convention design.

Has Valid Example#
	Value	Description
#	URI: worldview.genval.ai/snapshot@1.0.6/view URL: https://worldview.genval.ai/snapshot/1.0.6/view	Direct structural mapping. The URL is web-portable (works in any browser); the URI is compact for in-protocol references.
#	URL: https://worldview.genval.ai/snapshot/1.0.6/view#header.title	With no trailing slash, the fragment appends directly. See uri-fragments for the fragment grammar.

Has Invalid Example#
	Value	Description
#	https://worldview.genval.ai/snapshot/1.0.6/view/	Wrong - trailing slash conflicts with fragment composition and introduces directory semantics not native to Kanonak's resource model.
#	https://worldview.genval.ai/snapshots/latest/	Wrong - non-canonical paths break URL → URI deterministic mapping. Pretty paths come from thoughtful package naming, not custom routing tables.
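The structural substitution can be sketched as a pair of pure functions. This is an illustrative sketch, not the SDK's implementation; `uri_to_url` and `url_to_uri` are hypothetical names, and fragments and query strings are deliberately left out.

```python
import re

_URI_RE = re.compile(r"([^/@]+)/([^/@]+)@(\d+\.\d+\.\d+)/([^/@#?]+)")
_URL_RE = re.compile(r"https://([^/]+)/([^/]+)/(\d+\.\d+\.\d+)/([^/#?]+)")


def uri_to_url(uri: str) -> str:
    """publisher/package@version/name -> https://publisher/package/version/name"""
    match = _URI_RE.fullmatch(uri)
    if match is None:
        raise ValueError(f"not a Kanonak URI: {uri!r}")
    publisher, package, version, name = match.groups()
    return f"https://{publisher}/{package}/{version}/{name}"


def url_to_uri(url: str) -> str:
    """Inverse mapping; needs no publisher infrastructure, only the pattern."""
    match = _URL_RE.fullmatch(url)
    if match is None:
        raise ValueError(f"not a canonical Kanonak URL: {url!r}")
    publisher, package, version, name = match.groups()
    return f"{publisher}/{package}@{version}/{name}"
```

Both invalid examples above fail `url_to_uri` in this sketch: the first because of the trailing slash, the second because `latest` is not a version.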

uri-fragments

URI fragments (after #) address embedded resources within a parent. The fragment payload is a path expressed in dot-bracket notation: dot for dict-keyed (named) embedded properties, [N] for positional list items by 0-based index. Tools resolve fragments by traversing the parent's parsed graph (SubjectKanonak.statement[]).

Has Required Rule#
	Text	Rationale
#	Fragment payloads MUST follow dot-bracket path notation, which has two traversal mechanisms and one separator: (1) `name` traverses by dict-key name (for dict-keyed embeddeds) or by property name (for properties on a subject or embedded); (2) `[N]` indexes into a list value by 0-based position and works for ANY list value, whether items are dict-keyed or positional; `.` separates path segments. The two mechanisms compose freely. For lists where items have dict-key names, BOTH `name` and `[N]` resolve to the same item; the name form is preferred for stability (an index shifts when items reorder; a dict-key name does not).	Mirrors JSONPath / jq-style addressing, which is familiar to anyone who's worked with structured-data tools. Supporting both traversal mechanisms gives consumers the right tool for the situation: name-based addressing for stability when sharing and bookmarking, position-based addressing when the user knows "the first item" but not its name (or when items have no names at all).
#	Tools resolving a fragment MUST traverse the parent resource's parsed graph (`SubjectKanonak.statement[]`) following the fragment's path segments to locate the target EmbeddedKanonak. UI tools that highlight or scroll-to embedded content SHOULD support prefix-based parent-highlighting — a parent container with fragment path `address` highlights when the user navigates to `address.city`.	Fragment navigation is meaningful only if it surfaces visually: scrolling to the right place, highlighting it, expanding any collapsed parents. Prefix-based parent matching makes the highlight follow the path naturally as users drill in.

Has Forbidden Rule#
	Text	Rationale
#	Fragments MUST NOT carry UI / consumer state (view selectors, filter state, pagination, locale). All such state belongs in the query string (see uri-query-strings). The fragment IS the protocol-level resource address — nothing else.	Mixing state into the fragment would break the addressing invariant: stripping the fragment must always leave the URL pointing at the parent resource, never at a different resource or a different sub-address. Keeping state in `?...` and addressing in `#...` makes both round-trip cleanly through sharing, copying, and bookmarking.

Has Valid Example#
	Value	Description
#	https://acme.com/employees/1.0.0/alice#address	Fragment "address" addresses the embedded address resource within the alice top-level resource.
#	https://acme.com/employees/1.0.0/alice#address.city	Two-level traversal: alice's address embedded, then its city property within.
#	https://acme.com/teams/1.0.0/engineering#members[0].name	First member positionally, then their name property. Mixed positional + named composition.
#	https://portfolio.genval.ai/strategies/1.0.0/managed-futures-trend#hasFitnessScore.fit-quiet.scoreValue	Three-level dict-keyed traversal — typical of property-bag ontologies with nested named embeddeds.
#	https://portfolio.genval.ai/strategies/1.0.0/managed-futures-trend#hasFitnessScore.fit-quiet.scoreValue https://portfolio.genval.ai/strategies/1.0.0/managed-futures-trend#hasFitnessScore[0].scoreValue	Both URLs resolve to the same scoreValue. The first form traverses by dict-key name (`fit-quiet`), the second by positional index (`[0]`). Either is valid; the name form survives item reordering, the index form is useful when the consumer knows position but not name.

Has Invalid Example#
	Value	Description
#	https://worldview.genval.ai/snapshot/1.0.6/view#view=html	Wrong - "view=html" is UI state, not a resource address. Belongs in the query string (`?view=html`), not the fragment.
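A sketch of fragment parsing and resolution follows. It assumes the parent's parsed graph can be modelled as nested Python dicts and lists, with dict-keyed embeddeds as insertion-ordered dicts; the real `SubjectKanonak.statement[]` structure is not shown in this document, and the sample data and score values are invented. The function names are hypothetical.

```python
import re
from typing import Any, List, Union

Step = Union[str, int]
_SEGMENT_RE = re.compile(r"([^.\[\]]*)((?:\[\d+\])*)")


def parse_fragment(fragment: str) -> List[Step]:
    """Turn 'members[0].name' into ['members', 0, 'name']."""
    steps: List[Step] = []
    for segment in fragment.split("."):
        match = _SEGMENT_RE.fullmatch(segment)
        if segment == "" or match is None:
            raise ValueError(f"bad fragment segment: {segment!r}")
        name, indexes = match.groups()
        if name:
            steps.append(name)
        steps.extend(int(n) for n in re.findall(r"\[(\d+)\]", indexes))
    return steps


def resolve(node: Any, steps: List[Step]) -> Any:
    """Walk the (assumed) nested structure one step at a time."""
    for step in steps:
        if isinstance(step, int):
            # [N] works on any list value; dict-keyed items are taken
            # in their authored (insertion) order.
            items = list(node.values()) if isinstance(node, dict) else node
            node = items[step]
        else:
            node = node[step]
    return node


# Hypothetical parsed resource with two dict-keyed embedded scores.
strategy = {
    "hasFitnessScore": {
        "fit-quiet": {"scoreValue": 0.8},
        "fit-volatile": {"scoreValue": 0.3},
    }
}
by_name = resolve(strategy, parse_fragment("hasFitnessScore.fit-quiet.scoreValue"))
by_index = resolve(strategy, parse_fragment("hasFitnessScore[0].scoreValue"))
```

Here `by_name` and `by_index` land on the same value, matching the rule that both forms address the same item. Note that this sketch does not reject state-like payloads such as `view=html`; that check would need to be layered on top.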

uri-query-strings

Query strings (after ?) carry UI and consumer state — Browser tab selector, derivation variant, pagination, filter state, locale, theme. They MUST NOT carry resource addressing; the URI plus its fragment is the complete resource address.

Has Required Rule#
	Text	Rationale
#	Query string keys MUST express UI / consumer state, never resource addressing. Stripping the query string MUST leave the URL pointing at the same resource at the same level of detail.	Resource identity belongs in the path + fragment so URIs and URLs round-trip cleanly and so deep links survive when consumers strip `?...` state for canonical sharing. Mixing addressing into the query (e.g. `?id=...`) would re-introduce per-publisher routing tables that canonical-url-form's forbidden rule explicitly rules out.

Has Valid Example#
	Value	Description
#	https://worldview.genval.ai/snapshot/1.0.6/view?view=markdown	`?view=markdown` selects which Browser tab to open (HTML, Markdown, Source, etc.). Same resource, different consumer rendering.
#	https://worldview.genval.ai/snapshot/1.0.6/view?variant=compact	`?variant=compact` selects a different derivation variant for the same (resource, format) pair. Resource identity unchanged.
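#	https://acme.com/employees/1.0.0/alice?view=html#address.city	Query (UI state) and fragment (resource sub-address) compose cleanly. In standard URL syntax the query precedes the fragment; written the other way round, `?view=html` would be read as part of the fragment. Stripping the query leaves a valid resource URL; stripping the fragment leaves a valid parent URL.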

Has Invalid Example#
	Value	Description
#	https://worldview.genval.ai/?resource=snapshot@1.0.6/view	Wrong - resource addressing in the query string. Everything that identifies WHICH resource MUST live in the URL path + fragment. The query string is for state ABOUT the consumer's view, not WHICH resource is being viewed.
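The separation between address and state can be sketched with Python's standard `urllib.parse`. The helper name `split_address_and_state` is hypothetical. The example URL places the query before the fragment, as generic URL syntax requires.

```python
from typing import Dict, List, Tuple
from urllib.parse import parse_qs, urlsplit, urlunsplit


def split_address_and_state(url: str) -> Tuple[str, Dict[str, List[str]]]:
    """Return (resource address without query, parsed UI / consumer state)."""
    parts = urlsplit(url)
    # Rebuild the URL with an empty query: path + fragment is the address.
    address = urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.fragment))
    return address, parse_qs(parts.query)


address, state = split_address_and_state(
    "https://acme.com/employees/1.0.0/alice?view=html#address.city"
)
```

In this sketch `address` keeps the path and the `#address.city` fragment, while `state` holds only the `view` selector, so dropping the state never changes which resource is addressed.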

publisher-naming

Publishers must be domain-based identifiers to establish ownership and enable registry discovery

Has Required Rule#
	Text	Rationale
#	Publisher identifiers MUST be valid domain names containing at least one dot character	Domain-based publishers establish clear organizational ownership, enable automatic registry discovery via .well-known endpoints, and prevent naming conflicts through DNS
#	Publishers MUST control the domain name used in their publisher identifier	Ensures authenticity and prevents namespace squatting by requiring verifiable domain ownership

Has Valid Example#
	Value	Description
#	kanonak.org	Official Kanonak Protocol publisher using a .org domain
#	acme.com	Company publisher using a .com domain

Has Invalid Example#
	Value	Description
#	myproject	Invalid - not a domain name. Must use domain-based identifier like myproject.org or myproject.dev
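A syntactic check for the first rule might look like the sketch below. It assumes lowercase ASCII hostname labels and only tests shape; it cannot establish that the publisher controls the domain, which the second rule requires. `is_valid_publisher` is a hypothetical name.

```python
import re

# One DNS-style label: alphanumeric at both ends, hyphens allowed inside.
_LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
# At least two labels, so the identifier contains at least one dot.
_PUBLISHER_RE = re.compile(_LABEL + r"(?:\." + _LABEL + r")+")


def is_valid_publisher(publisher: str) -> bool:
    """True when the identifier has the shape of a dotted domain name."""
    return _PUBLISHER_RE.fullmatch(publisher) is not None
```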

package-naming

Package names are lowercase-hyphen identifiers that describe the domain of entities they contain, with plural forms reserved for category packages so the singular form is available for instances inside them

Has Required Rule#
	Text	Rationale
#	Package names MUST start with a lowercase letter and contain only lowercase letters, numbers, and hyphens (no periods)	Lowercase with hyphens ensures compatibility with file systems, URLs, and OCI registries while maintaining readability. Periods are reserved for alias.resource reference syntax.

Has Recommended Rule#
	Text	Rationale
#	Use hyphen notation to create descriptive package names for related ontologies	Hyphen notation enables logical grouping of related packages while avoiding conflicts with alias.resource reference syntax
#	Packages that define a class or are expected to contain multiple related instances SHOULD use a plural kebab-case noun (protocols, agent-skills, agents, capabilities, github-skills), so the singular form stays available for the instances inside them	A category package is a namespace for a family of things; naming it after the family rather than a single member leaves the singular name free to identify one member without a YAML duplicate key collision with the package declaration. This is how the package "protocols" can contain an instance named "mcp" without either the document having a duplicate top-level key or forcing the instance into a non-conforming PascalCase name.
#	A package that exists specifically to describe one concrete entity MAY use a singular name that matches the entity it describes (e.g., mcp, a2a, kanonak-protocol), provided the singleton instance inside uses a different kebab-case name so the package declaration and the instance do not collide on the same top-level YAML key	Instance packages are named after the thing they describe, which makes the obvious instance name collide with the package name. Resolving the collision by giving the instance a longer, descriptive kebab-case name (model-context-protocol, agent-to-agent-protocol, kanonak-protocol-spec) keeps the package name short and recognizable while still following the kebab-case-for-instances rule.

Has Forbidden Rule#
	Text	Rationale
#	Category packages MUST NOT use a singular name that shadows the class or instance they host - for example, a package named "protocol" that defines the Protocol class, or a package named "skill" that holds a single Skill instance	A singular category package name forces every consuming instance to either rename itself or fight the YAML duplicate-key error. Pluralizing the package eliminates the conflict at the source and makes it obvious from the name that the package is a namespace, not a single thing.

Has Valid Example#
	Value	Description
#	protocols, agent-skills, agents, capabilities, skill-capabilities, agent-capabilities, github-skills, github-agents	Category packages using plural kebab-case names, leaving the singular form free for the classes and instances they host
#	mcp (package) + model-context-protocol (instance)	A specific-instance package named after its abbreviation, with a longer descriptive kebab-case instance name to avoid the duplicate-key collision
#	core-rdf	Core RDF vocabulary using hyphen notation

Has Invalid Example#
	Value	Description
#	MyPackage	Invalid - contains uppercase letters. Must use lowercase only
#	protocol (package that defines the Protocol class)	Invalid - singular category package shadows the class and forces every consuming instance to work around a name collision
#	mcp package + mcp instance	Invalid - the package declaration and the instance both parse as the top-level YAML key "mcp", which is a duplicate key error
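The required rule and the duplicate-key concern can be sketched together. Both helper names are hypothetical, and the collision check assumes that the package declaration and its resources share one top-level YAML mapping, as the examples above describe.

```python
import re
from typing import Iterable

# Starts with a lowercase letter; then lowercase letters, digits, hyphens.
_PACKAGE_RE = re.compile(r"[a-z][a-z0-9-]*")


def is_valid_package_name(name: str) -> bool:
    return _PACKAGE_RE.fullmatch(name) is not None


def collides_with_package(package: str, resource_names: Iterable[str]) -> bool:
    """True when a resource would reuse the package's top-level YAML key."""
    return package in set(resource_names)
```

For instance, `collides_with_package("mcp", ["mcp"])` flags the invalid pairing above, while `collides_with_package("mcp", ["model-context-protocol"])` does not.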

resource-naming

Distinct casing conventions per entity role make role immediately obvious and align with code generation

Has Recommended Rule#
	Text	Rationale
#	Kanonak uses three naming conventions based on entity role: classes use PascalCase (Person, OrderStatus), instances use kebab-case (romeo-montague, key-2026-03), and properties use camelCase (subClassOf, hasAddress). This casing rule applies to every instance, including inline dict-keyed embedded instances such as the Convention, Rule, and Example entries inside a Protocol.	Distinct casing per role makes it immediately obvious whether an entity is a type definition (PascalCase), a data instance (kebab-case), or a property (camelCase). This improves readability, prevents confusion, and aligns with code generation conventions in target languages. Applying the rule to embedded keys too keeps the rule absolute rather than contextual, so the same entity type always looks the same in YAML no matter how it is authored.
#	Inline dict-keyed embedded instances MUST follow the same kebab-case rule as top-level instances. A Convention key inside hasConvention, a Rule key inside hasRequiredRule, an Example key inside hasValidExample, a CapabilityCommand key inside hasCommand, and so on are all instances of their respective classes - they just happen to be authored inline rather than as top-level SubjectKanonaks - and the casing rule applies identically.	Embedded instances have the same semantic status as top-level instances - the SDK parses them into EmbeddedKanonak nodes that carry statements just like SubjectKanonaks do. Treating them as a stylistic label rather than an instance muddies the casing rule and leads to inconsistent documents where top-level and embedded instances of the same class look different. Holding the line on kebab-case for all instance keys keeps the authoring surface predictable.
#	Resource names SHOULD start with a letter and contain only letters, numbers, hyphens, and underscores	Following naming conventions ensures compatibility with code generation targets and URI construction. Names that violate conventions generate warnings to encourage consistency.

Has Forbidden Rule#
	Text	Rationale
#	Resource names MUST NOT use reserved RDF/OWL prefixes like 'rdfs:', 'xsd:', 'owl:' - use imports and qualified references instead	Prefixed names (rdfs:Class) are RDF/Turtle syntax, not Kanonak YAML syntax. Kanonak uses imports with aliases for namespace qualification. Prefixes in resource names cause parser errors and violate Kanonak conventions.

Has Valid Example#
	Value	Description
#	Person, OrderStatus, SigningKey, BlogPost	PascalCase - used for class (type) definitions
#	romeo-montague, key-2026-03, commercial-use, alice-johnson	kebab-case - used for instances (data entities), including inline dict-keyed embedded instances
#	subClassOf, hasAddress, signingKeyId, characterName	camelCase - used for property definitions
#	hasConvention: uri-structure: summary: ...	Inline dict-keyed Convention instance uses kebab-case exactly like a top-level instance would

Has Invalid Example#
	Value	Description
#	rdfs-colon-Class	Invalid - uses RDF colon prefix syntax (as in rdfs followed by colon followed by Class). Use plain Class with imports, then reference it as rdfs.Class when disambiguation is needed.
#	hasConvention: UriStructure: summary: ...	Invalid - embedded Convention instance keys must be kebab-case. Rename to uri-structure.
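A linter-style sketch of the three casing conventions, keyed by entity role. The role labels and the function name `matches_role` are assumptions made for illustration. A single lowercase word such as `name` satisfies both the kebab-case and camelCase patterns, so the check takes the expected role as input rather than guessing it from the name.

```python
import re

_PATTERNS = {
    # PascalCase: classes
    "class": re.compile(r"[A-Z][A-Za-z0-9]*"),
    # kebab-case: instances, including inline dict-keyed embedded instances
    "instance": re.compile(r"[a-z][a-z0-9]*(?:-[a-z0-9]+)*"),
    # camelCase: properties
    "property": re.compile(r"[a-z][a-z0-9]*(?:[A-Z][A-Za-z0-9]*)*"),
}


def matches_role(name: str, role: str) -> bool:
    """Check a resource name against the casing convention for its role."""
    return _PATTERNS[role].fullmatch(name) is not None
```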

versioning

Versions follow semantic versioning to communicate compatibility and breaking changes

Has Required Rule#
	Text	Rationale
#	Versions MUST follow semantic versioning format major.minor.patch where each component is a non-negative integer	Semantic versioning provides a standard way to communicate backward compatibility and breaking changes to package consumers
#	Increment major version when making backward-incompatible changes to the package	Major version increments signal to consumers that manual migration may be required due to breaking changes
#	Increment minor version when adding backward-compatible functionality and patch version for backward-compatible bug fixes	Allows consumers to safely update within the same major version while preventing unexpected breaking changes

Has Valid Example#
	Value	Description
#	2.1.3	Valid semantic version with major 2, minor 1, patch 3

Has Invalid Example#
	Value	Description
#	v1.0	Invalid - missing patch component and has v prefix. Must be 1.0.0
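A sketch of strict version parsing under the required format. `parse_version` is a hypothetical name. Returning a tuple of integers means Python's ordinary tuple comparison orders versions correctly.

```python
import re
from typing import Tuple

_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse 'major.minor.patch' into a tuple of non-negative integers."""
    match = _VERSION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a major.minor.patch version: {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)
```

Comparing as integers rather than strings matters: `parse_version("1.10.0") > parse_version("1.9.0")` holds, whereas the string comparison would not.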

file-naming

Kanonak documents follow a standard file naming pattern for discoverability

Has Required Rule#
	Text	Rationale
#	Kanonak document files MUST be named using the pattern package@version.kan.yml	Standardized file naming enables automatic discovery, prevents conflicts, and makes the namespace structure visible in the file system. The .kan.yml extension distinguishes Kanonak files from other YAML files.

Has Recommended Rule#
	Text	Rationale
#	Kanonak documents SHOULD be organized in directories matching the publisher name	Directory structure mirrors namespace organization making it easy to locate packages and understand ownership. Publisher directories prevent naming conflicts and enable per-publisher configuration.

Has Valid Example#
	Value	Description
#	kanonak.org/core-rdf@1.0.0.kan.yml	Correct file naming with publisher directory and version
#	mycompany.com/products@2.1.3.kan.yml	Company package with semantic version
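The file naming pattern combined with the recommended publisher directory can be sketched as a pair of helpers. Both function names are hypothetical, and forward-slash paths are assumed.

```python
import re
from pathlib import PurePosixPath
from typing import Tuple

_FILE_RE = re.compile(
    r"(?P<package>[a-z][a-z0-9-]*)@(?P<version>\d+\.\d+\.\d+)\.kan\.yml"
)


def document_path(publisher: str, package: str, version: str) -> str:
    """Build publisher/package@version.kan.yml."""
    return f"{publisher}/{package}@{version}.kan.yml"


def parse_document_path(path: str) -> Tuple[str, str, str]:
    """Recover (publisher, package, version) from a document path."""
    parsed = PurePosixPath(path)
    match = _FILE_RE.fullmatch(parsed.name)
    if match is None:
        raise ValueError(f"not a Kanonak document file name: {parsed.name!r}")
    return parsed.parent.name, match["package"], match["version"]
```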

import-operators

Import operators, existence, and cycle constraints for package dependencies

Has Required Rule#
	Text	Rationale
#	Use exact version operator (=) to lock imports to a specific version for reproducible builds	Exact version matching guarantees reproducible builds and prevents unexpected breaking changes from dependency updates
#	All declared imports MUST resolve to existing packages in the repository or package cache	Missing imports break transitive resolution and prevent entity lookup. All dependencies must be available either in local workspace or installed package cache.
#	Package imports MUST NOT create circular dependency chains between published packages	Circular package dependencies (A imports B, B imports A) create unresolvable import cycles that prevent package managers from building dependency graphs. Kanonak handles cycles gracefully at runtime but published packages cannot have circular dependencies.

Has Recommended Rule#
	Text	Rationale
#	Use compatible version operator (~) to allow patch updates within the same minor version	Enables bug fixes and security patches while preventing breaking changes from minor version updates
#	Use major version operator (^) to allow minor and patch updates within the same major version	Follows semantic versioning conventions where breaking changes only occur in major version increments

Has Forbidden Rule#
	Text	Rationale
#	Avoid any version operator (*) in production packages as it allows all future versions including breaking changes	Creates unpredictable behavior and can introduce breaking changes without warning or control

Has Valid Example#
	Value	Description
#	core-rdf = 1.0.0	Locks to exactly version 1.0.0 of core-rdf
#	core-xsd ~ 1.2.3	Allows versions 1.2.3 through 1.2.x (patch updates only)
#	core-owl ^ 2.0.0	Allows versions 2.0.0 through 2.x.x (minor and patch updates within major version 2)
#	package-a: type: Package publisher: example.com version: 1.0.0 package-b: type: Package publisher: example.com version: 1.0.0 imports: - publisher: example.com packages: - package: package-a match: ^ version: 1.0.0	One-way dependency chain - package-b imports package-a; package-a has no imports referencing package-b. No cycle exists.

Has Invalid Example#
	Value	Description
#	package-a: type: Package publisher: example.com version: 1.0.0 imports: - publisher: example.com packages: - package: package-b match: ^ version: 1.0.0 package-b: type: Package publisher: example.com version: 1.0.0 imports: - publisher: example.com packages: - package: package-a match: ^ version: 1.0.0	Invalid - package-a imports package-b and package-b imports package-a. The import graph contains a cycle the resolver cannot break.
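The no-cycles rule can be checked with a depth-first search over the import graph. The sketch below assumes the graph has already been reduced to a mapping from package name to the package names it imports; `find_import_cycle` is a hypothetical name, and publishers and versions are ignored for brevity.

```python
from typing import Dict, List, Optional


def find_import_cycle(imports: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of package names, or None if acyclic."""
    in_progress, done = set(), set()
    stack: List[str] = []

    def visit(package: str) -> Optional[List[str]]:
        in_progress.add(package)
        stack.append(package)
        for dependency in imports.get(package, []):
            if dependency in in_progress:
                # Back edge: the dependency is already on the current path.
                return stack[stack.index(dependency):] + [dependency]
            if dependency not in done:
                cycle = visit(dependency)
                if cycle is not None:
                    return cycle
        stack.pop()
        in_progress.discard(package)
        done.add(package)
        return None

    for package in imports:
        if package not in done:
            cycle = visit(package)
            if cycle is not None:
                return cycle
    return None
```

Applied to the examples above, the one-way chain yields `None`, and the mutual import yields the path `package-a`, `package-b`, `package-a`.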

version-resolution

Version resolution selects the highest compatible version from available packages

Has Required Rule#
	Text	Rationale
#	When resolving an import, Kanonak MUST select the highest version that satisfies the version operator constraints	Automatic version resolution ensures packages get the latest compatible updates while respecting semantic versioning constraints. This enables bug fixes and patches without manual intervention while preventing breaking changes.
#	For major version 0 (pre-1.0), the caret operator (^) MUST behave like tilde (~), allowing only patch updates	Semantic versioning treats 0.x.y as unstable where breaking changes can occur in minor versions. Caret operator for ^0.x.y allows only patches (0.x.z) to prevent unexpected breaking changes during pre-release development.

Has Valid Example#
	Value	Description
#	Import: "core-xsd ~ 1.2.3"; Available: 1.2.1, 1.2.5, 1.3.0, 2.0.0; Constraints: MinVersion=1.2.3, MaxVersion=1.2.999; Selected: 1.2.5 (highest in compatible range)	Version resolution selects 1.2.5 as the highest version satisfying the tilde constraint - patch updates only within minor version 1.2
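A sketch of resolution for the `=`, `~`, and `^` operators, including the rule that `^` on a 0.x.y base behaves like `~`. The function names are hypothetical, the `*` operator is deliberately omitted, and the available versions are assumed to be well-formed major.minor.patch strings.

```python
from typing import Iterable, Optional, Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def satisfies(candidate: Version, operator: str, base: Version) -> bool:
    if operator not in ("=", "~", "^"):
        raise ValueError(f"unsupported operator in this sketch: {operator!r}")
    if operator == "=":
        return candidate == base
    if candidate < base:
        return False
    if operator == "~" or base[0] == 0:
        # Patch updates only; ^ on a 0.x.y base falls through to here.
        return candidate[:2] == base[:2]
    # ^ on a 1.x or later base: minor and patch updates within the major.
    return candidate[0] == base[0]


def resolve(operator: str, base: str, available: Iterable[str]) -> Optional[str]:
    """Pick the highest available version that satisfies the constraint."""
    wanted = parse(base)
    matches = [v for v in available if satisfies(parse(v), operator, wanted)]
    return max(matches, key=parse, default=None)
```

For the example above, `resolve("~", "1.2.3", ["1.2.1", "1.2.5", "1.3.0", "2.0.0"])` selects `1.2.5`; with `^` the same inputs select `1.3.0`.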

yaml-parsing

Kanonak YAML parsing preserves types and detects structural errors

Has Required Rule#
	Text	Rationale
#	Kanonak YAML parsers MUST preserve primitive types (integer, boolean, string) from YAML into the Kanonak object model	Type preservation ensures that integer values remain integers, booleans remain booleans, and strings remain strings through the parse-serialize round-trip. This maintains semantic meaning and enables accurate code generation.
#	Kanonak YAML parsers MUST detect and report duplicate keys with line and column information	Duplicate keys in YAML are ambiguous and indicate authoring errors. Early detection with precise location information helps users quickly fix structural problems before semantic validation.
#	Kanonak YAML parsers MUST strip UTF-8 BOM (Byte Order Mark) if present for cross-platform compatibility	Windows editors may add UTF-8 BOM at file start. Stripping BOM ensures files created on Windows parse correctly on Linux/Mac and vice versa, enabling seamless cross-platform collaboration.

Has Valid Example#
	Value	Description
#	age: 30 isActive: true name: "Alice"	After parsing, age remains an integer (not the string "30"), isActive remains a boolean (not the string "true"), and name is a string. Primitive types round-trip through the Kanonak object model without coercion.

Has Invalid Example#
	Value	Description
#	Person: name: "Alice" name: "Bob"	Invalid - the key 'name' is declared twice on Person. The parser reports a duplicate-key error with the line and column of the second declaration.
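The BOM rule can be met at the decoding step, before any YAML parsing happens. The sketch below relies on Python's built-in `utf-8-sig` codec, which drops a leading UTF-8 BOM when present and otherwise decodes as plain UTF-8. `decode_document` is a hypothetical name. Duplicate-key detection and type preservation depend on the YAML parser in use and are not shown.

```python
import codecs


def decode_document(raw: bytes) -> str:
    """Decode document bytes, stripping a leading UTF-8 BOM if there is one."""
    return raw.decode("utf-8-sig")


with_bom = decode_document(codecs.BOM_UTF8 + b"age: 30\n")
without_bom = decode_document(b"age: 30\n")
```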

package-structure

Every Kanonak document declares exactly one package with required metadata

Has Required Rule#
	Text	Rationale
#	Every Kanonak document MUST contain exactly one Package declaration (a resource with type Package)	The Package declaration provides essential metadata (publisher, version) needed for namespace resolution and dependency management. Without it, the document cannot participate in the Kanonak ecosystem.
#	Package declarations MUST include 'publisher' and 'version' properties, and import entries MUST have valid publisher, package, match, and version fields	Publisher and version are required to construct the namespace URI. Import entries must be well-formed to enable dependency resolution.

Has Valid Example#
	Value	Description
#	core-rdf: type: Package publisher: kanonak.org version: 1.0.0	Correct package declaration with type, publisher, and version

Has Invalid Example#
	Value	Description
#	Person: type: Class	Invalid - the document declares a Person Class but no Package resource. Without a Package declaration the document has no namespace URI and cannot participate in the Kanonak ecosystem.
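A validator for the Package declaration rules might be sketched as follows, assuming the document has already been parsed into a dict of top-level resources. `find_package` is a hypothetical name, and import entries are not checked here.

```python
from typing import Tuple


def find_package(document: dict) -> Tuple[str, dict]:
    """Return (name, body) of the single Package declaration or raise."""
    packages = [
        (name, body)
        for name, body in document.items()
        if isinstance(body, dict) and body.get("type") == "Package"
    ]
    if len(packages) != 1:
        raise ValueError(
            f"expected exactly one Package declaration, found {len(packages)}"
        )
    name, body = packages[0]
    missing = [key for key in ("publisher", "version") if key not in body]
    if missing:
        raise ValueError(f"package {name!r} is missing: {', '.join(missing)}")
    return name, body
```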

type-system

Rules governing how entities declare their type and how properties declare their range

Has Required Rule#
	Text	Rationale
#	Top-level entities MUST have a 'type' property declaring what kind of entity they are	Top-level entities need explicit type declaration to enable type checking, validation, and code generation. Unlike embedded objects, top-level entities have no parent property whose range can supply a type.
#	ObjectProperty and DatatypeProperty declarations MUST declare a 'range' specifying the type of values	The range defines what values are valid for a property. Without range, type checking is impossible and the property's purpose is unclear. Range is required for both datatype properties (e.g., string, integer) and object properties (e.g., Person, Address).
#	When using XSD datatypes (string, integer, boolean, etc.) as property ranges, the core-xsd package MUST be imported	XSD datatypes are defined in the core-xsd package. Without importing it, the type resolver cannot verify that range references point to valid datatypes.
#	All classes referenced in 'type' properties MUST be defined in the current namespace or imported packages	Using an undefined class as a type creates entities with unknown structure. The validator ensures every type reference resolves to an actual Class definition.

Has Recommended Rule#
	Text	Rationale
#	Properties SHOULD use specific types (DatatypeProperty or ObjectProperty) instead of generic 'Property'	Generic Property type provides no semantic information about whether the property holds literal values or entity references. Using specific types enables better validation, code generation, and tooling support.

Has Valid Example#
	Value	Description
#	romeo-montague: type: Character characterName: "Romeo"	Top-level entity with explicit type declaration
#	name: type: DatatypeProperty domain: Person range: string	Property with specific DatatypeProperty type and declared range

Has Invalid Example#
	Value	Description
#	romeo-montague: characterName: "Romeo"	Invalid - top-level entity has no type property
#	name: type: DatatypeProperty domain: Person	Invalid - property has no range declaration
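The first two required rules lend themselves to a simple structural check. The sketch assumes a parsed document represented as a dict of top-level resources; `type_system_errors` is a hypothetical name. It does not resolve imports or verify that referenced classes exist.

```python
from typing import List

_RANGED_TYPES = ("DatatypeProperty", "ObjectProperty")


def type_system_errors(document: dict) -> List[str]:
    """Collect missing 'type' and missing 'range' problems."""
    errors: List[str] = []
    for name, body in document.items():
        entity_type = body.get("type")
        if entity_type is None:
            errors.append(f"{name}: top-level entity has no 'type'")
        elif entity_type in _RANGED_TYPES and "range" not in body:
            errors.append(f"{name}: {entity_type} has no 'range'")
    return errors
```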

embedding

When to model data as an embedded object versus a named top-level entity

Has Recommended Rule#
	Text	Rationale
#	Use embedded objects for data that is tightly coupled to its parent, has no need for independent identity, and is not referenced from anywhere else	Embedding reduces naming overhead, makes the ownership relationship explicit in the structure, and accurately maps to RDF blank nodes - which represent resources without global identity. Promoting such data to a named top-level entity creates URIs that will never be referenced, cluttering the namespace.
#	Use named top-level entities for data that needs a stable identity, is referenced from multiple places, or represents a significant concept with an independent lifecycle	Named entities receive a globally unique URI (publisher/package@version/name) that enables cross-document references, stable citation, and independent versioning. Any data that tooling or other entities need to point to must be named so that the pointer is well-defined.
#	An embedded object SHOULD NOT declare an explicit 'type' property when the type would equal the parent property's range — let the range supply the type implicitly	When the embedded type matches what the parent property's range already specifies, the explicit type adds nothing beyond visual noise. Omitting it keeps documents shorter, makes the common case the default, and signals that explicit type declarations on embeddeds carry meaning - they only appear when narrowing the range to a more specific subclass. A reader who sees 'type:' on an embedded immediately knows it's communicating a non-default subtype.

Has Forbidden Rule#
	Text	Rationale
#	When an embedded object declares an explicit 'type' property, that type MUST be a subclass of (or equal to) the parent property's range	A declared type that is not a subclass of the parent's range is a contradiction - the parent property says "this slot holds Conditions" and the declared type says "this is not a Condition." Permitting it would silently break range-based reasoning, type checking, and code generation. Restricting declared types to range subclasses preserves the inference model while still allowing tagged-union / discriminated-union authoring with proper OWL subclass semantics (for example, declaring 'type' as 'OrCondition' on an embedded under a property whose range is 'Condition').

Has Valid Example#
	Value	Description
#	alice: type: Person hasAddress: street: "123 Main St" city: "Springfield"	Address is embedded because it is owned by alice and never referenced elsewhere - becomes an RDF blank node. The embedded inherits its type from the parent property's range, so no 'type:' is needed.
#	wrapper: type: Wrapper condition: type: OrCondition operands: - type: ThresholdCondition observable: temperature threshold: 90 - type: EventCondition event: sensor-fault	The 'condition' property's range is 'Condition'. Each embedded declares a type that is a strict subclass of 'Condition' (OrCondition, ThresholdCondition, EventCondition all subClassOf Condition), which narrows the range to a specific variant. This is the canonical tagged-union authoring form - explicit types appear precisely where the variant matters.
#	acme-headquarters: type: Address street: "1 Acme Plaza" city: "Metropolis" alice: type: Employee worksAt: acme-headquarters bob: type: Employee worksAt: acme-headquarters	Headquarters is named because multiple employees reference the same address and the entity has identity beyond any single parent

Has Invalid Example#
	Value	Description
#	wrapper: type: Wrapper condition: type: Person name: Alice	Forbidden - the 'condition' property's range is 'Condition', so the embedded's declared type must be Condition or a subclass thereof. Person is unrelated to the Condition hierarchy, so this is a range violation that breaks type checking.
#	alice: type: Person hasAddress: type: Address street: "123 Main St"	Discouraged - the 'hasAddress' property's range is already 'Address', so declaring 'type' as 'Address' on the embedded is redundant. Not a hard error - the document is still semantically valid - but a reader is left wondering what the type was meant to communicate. Drop the line so the type is inferred implicitly.
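
The range rule for embedded objects can be sketched as a small check. This is an illustrative sketch only, not the validator's implementation: the `SUBCLASS_OF` table, the function names, and the returned message strings are all hypothetical, and single inheritance is assumed for brevity.

```python
from typing import Optional

# Hypothetical subClassOf table: class name -> direct superclass (None at a root).
SUBCLASS_OF: dict[str, Optional[str]] = {
    "Condition": None,
    "OrCondition": "Condition",
    "ThresholdCondition": "Condition",
    "EventCondition": "Condition",
    "Person": None,
    "Address": None,
}


def is_subclass_or_equal(cls: str, ancestor: str) -> bool:
    """Walk the subClassOf chain upward from cls, looking for ancestor."""
    current: Optional[str] = cls
    seen: set[str] = set()
    while current is not None and current not in seen:
        if current == ancestor:
            return True
        seen.add(current)
        current = SUBCLASS_OF.get(current)
    return False


def check_embedded_type(declared: Optional[str], prop_range: str) -> str:
    """Classify an embedded object's 'type' against the parent property's range."""
    if declared is None:
        return "ok: type inferred from range"
    if not is_subclass_or_equal(declared, prop_range):
        return "error: declared type is outside the range"
    if declared == prop_range:
        return "warning: redundant type, same as range"
    return "ok: narrows range to subclass"
```

Under these assumptions, `check_embedded_type("Person", "Condition")` reports the forbidden case and `check_embedded_type("Address", "Address")` reports the discouraged one, mirroring the two invalid examples above.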

references

Rules for resolving entity references across namespaces, including shadowing and disambiguation

Has Required Rule#
	Text	Rationale
#	ObjectProperty values MUST reference entities that exist in the current namespace or imports (including transitive imports)	Broken references create semantic gaps where data points to non-existent entities. Requiring every reference to resolve to an actual entity catches typos, missing imports, and data integrity issues at validation time.
#	When the same name exists in both the local namespace and an imported namespace, unqualified references MUST resolve to the local definition (local shadowing)	Local shadowing allows intentional overriding, enables namespace-specific customizations, and prevents breaking changes when dependencies add new entities with colliding names.
#	When multiple imported packages define entities with the same name, references MUST use qualified syntax (alias.name) to disambiguate	Ambiguous references make it impossible to determine which entity is intended. Qualified references (pkg.Entity) explicitly specify the source package, preventing confusion and ensuring correct type resolution.

Has Valid Example#
	Value	Description
#	verona-city: type: City romeo: type: Character livesIn: verona-city	Reference to a locally defined entity
#	myEntity: type: pkgA.Resource	Qualified reference disambiguates between packages that both define Resource

Has Invalid Example#
	Value	Description
#	romeo: type: Character livesIn: verona-city	Invalid - romeo.livesIn references verona-city, but no entity named verona-city is defined in this document or any imported package. The validator reports an unresolved-reference error.
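
The three resolution rules (existence, local shadowing, qualified disambiguation) can be sketched as follows. This is a hedged illustration, not the parser's algorithm: the `resolve` function and its return format are hypothetical, entity names are assumed to contain no dots, and transitive imports are assumed to be already flattened into the `imports` map.

```python
def resolve(ref: str, local: set[str], imports: dict[str, set[str]]) -> str:
    """Resolve a reference to 'local/<name>' or '<alias>/<name>'.

    local:   names defined in the current document.
    imports: alias -> names exported by that imported package.
    Raises ValueError for unresolved or ambiguous references.
    """
    if "." in ref:
        # Qualified reference: alias.name picks the source package explicitly.
        alias, name = ref.split(".", 1)
        if alias in imports and name in imports[alias]:
            return f"{alias}/{name}"
        raise ValueError(f"unresolved reference: {ref}")
    if ref in local:
        # Local shadowing: the local definition wins over any import.
        return f"local/{ref}"
    owners = sorted(alias for alias, names in imports.items() if ref in names)
    if len(owners) == 1:
        return f"{owners[0]}/{ref}"
    if not owners:
        raise ValueError(f"unresolved reference: {ref}")
    raise ValueError(f"ambiguous reference: {ref} is defined in {owners}")
```

With `imports = {"pkgA": {"Resource"}, "pkgB": {"Resource"}}`, the unqualified `Resource` is rejected as ambiguous while `pkgA.Resource` resolves, matching the qualified example above.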

hierarchy

Class and property hierarchies must be acyclic and reference their own kind

Has Required Rule#
	Text	Rationale
#	Class inheritance (subClassOf) MUST NOT create circular chains	Circular inheritance (A subClassOf B subClassOf A) is logically impossible and breaks reasoning systems. Class hierarchies must form directed acyclic graphs where every class has a finite path to the root Resource class.
#	The subClassOf property MUST reference an entity that is itself a Class	subClassOf establishes inheritance between classes. Referencing a non-class entity (like a property or instance) is semantically invalid and breaks type system integrity.
#	Property inheritance (subPropertyOf) MUST NOT create circular chains	Circular property inheritance (A subPropertyOf B subPropertyOf A) is logically impossible. Property hierarchies must form directed acyclic graphs.
#	The subPropertyOf property MUST reference an entity that is itself a Property	subPropertyOf establishes inheritance between properties. Referencing a non-property entity (like a Class or instance) is semantically invalid.

Has Valid Example#
	Value	Description
#	Person: type: Class Character: type: Class subClassOf: Person	Correct linear class hierarchy with no cycles

Has Invalid Example#
	Value	Description
#	Character: type: Class subClassOf: Hero Hero: type: Class subClassOf: Character	Invalid - Character and Hero reference each other creating a cycle
#	name: type: DatatypeProperty Character: type: Class subClassOf: name	Invalid - subClassOf must reference a Class, not a Property
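
One way to check the acyclicity rules is a depth-first search over the subClassOf (or subPropertyOf) edges. The sketch below is an assumption about how a validator might do it, not a description of the actual one; `find_cycle` and its input shape (name -> list of direct parents) are hypothetical.

```python
from typing import Optional


def find_cycle(parents: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one cycle as a list of names (first == last), or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {name: WHITE for name in parents}
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        state[node] = GREY
        path.append(node)
        for parent in parents.get(node, []):
            parent_state = state.get(parent, WHITE)
            if parent_state == GREY:
                # The parent is already on the current path: a cycle.
                return path[path.index(parent):] + [parent]
            if parent_state == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        state[node] = BLACK
        return None

    for name in parents:
        if state[name] == WHITE:
            found = visit(name)
            if found:
                return found
    return None
```

For the invalid example above, `find_cycle({"Character": ["Hero"], "Hero": ["Character"]})` returns `["Character", "Hero", "Character"]`; the valid Person/Character hierarchy returns `None`.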

open-world-augmentation

Any Kanonak package may assert statements about any entity defined in any other package, by referencing the entity through an aliased name in the augmenting document's body. The parser merges all statements about the same canonical URI into one Subject, just as RDF specifies for IRI assertions across documents.

Has Required Rule#
	Text	Rationale
#	When two or more packages declare entities whose canonical URI (publisher + package + version + local name) is the same, the parser MUST merge their statements into a single SubjectKanonak. Conflicting statements (different objects for the same predicate) MUST coexist; consumers handle precedence by load order or explicit declaration.	RDF semantics are open-world by design. Forcing all statements about a class to live in its defining package would make publisher augmentation impossible and force every cross-cutting concern (universal renderers, peer observations, federated annotations) into a fork-and-bump of the upstream package. Open-world merge unblocks the polymorphic-derivation pattern where universal defaults live in their own package without touching core.

Has Recommended Rule#
	Text	Rationale
#	When you need to add statements to an upstream class (e.g. attaching a default renderer, a peer annotation, a commercial extension), prefer declaring those statements in your own package by referencing the class via an aliased name (rdfs.Resource, skill.Skill) rather than forking the upstream package and editing it.	Augmentation keeps your concerns in your namespace, preserves upstream version pinning, and composes with other publishers' augmentations. Forking decouples you from upstream updates and forces coordination on every release.

Has Valid Example#
	Value	Description
#	universal-derivations: type: Package publisher: kanonak.org version: 1.0.0 imports: - publisher: kanonak.org packages: - package: core-rdf match: ^ version: 1.0.0 alias: rdfs - package: derivation match: ^ version: 1.0.0 alias: derivation rdfs.Resource: derivation.derivations: - format: html transformation: any-resource-to-html	Augmenting rdfs.Resource (defined in core-rdf) with a derivations property from a separate package. No 'type:' is asserted on rdfs.Resource here — this is augmentation, not redefinition. The parser merges these statements with the original definition in core-rdf because both resolve to the same canonical URI: kanonak.org/core-rdf@1.0.0/Resource. The alias prefix rdfs. resolves through the document's imports; the entity name rdfs.Resource canonicalizes to the same URI as core-rdf's own declaration; merge happens at parse time.
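
The merge rule can be illustrated with a minimal sketch that groups statements by canonical subject URI. Everything here is an assumption for illustration: statements are modeled as plain (subject, predicate, object) string triples, predicates are shortened to local names, and `merge_subjects` is a hypothetical helper rather than an SDK function.

```python
from collections import defaultdict

# (subject canonical URI, predicate, object), all simplified to strings.
Statement = tuple[str, str, str]


def merge_subjects(*documents: list[Statement]) -> dict[str, dict[str, list[str]]]:
    """Merge statements from several documents, keyed by canonical subject URI.

    Conflicting objects for the same predicate coexist, in load order.
    """
    merged: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for document in documents:
        for subject, predicate, obj in document:
            values = merged[subject][predicate]
            if obj not in values:  # identical statements collapse into one
                values.append(obj)
    return {subject: dict(preds) for subject, preds in merged.items()}
```

Loading core-rdf's own declaration of `kanonak.org/core-rdf@1.0.0/Resource` and then the augmenting package yields a single entry carrying both the original statements and the added derivations statement.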

package-partitioning

Large datasets SHOULD be partitioned across multiple packages along a stable, immutable axis (alphabetic bucket of an immutable identifier, year, geographic region of origin) rather than published as a single oversized package or accessed through any runtime paging / query primitive. A dedicated schema package holds class definitions; separate partition packages hold instance data. The open-world parser merges them into one logical graph at load time, so consumers load only the partitions they need without losing the schema contract.

Has Recommended Rule#
	Text	Rationale
#	Class definitions (the ontology — Class, ObjectProperty, DatatypeProperty declarations) SHOULD live in a dedicated schema package. Instance data SHOULD live in separate partition packages that import the schema and assert instances of its classes. The two have very different lifecycles and SHOULD version independently.	A class definition changes when the model evolves (rare); instance data churns continuously as the world it describes changes. Bundling them forces every data update to ship a new schema version too, which is noisy and breaks consumer pinning. Separating them lets each evolve at its own pace, lets multiple partition packages coexist against the same schema, and aligns with the open-world model — the schema is the contract; partitions are participants. The pattern composes with derivation bindings: derivations declared on the schema's classes apply automatically to every instance in every partition. A package may legitimately mix both (e.g. a vocabulary that defines classes plus a small set of canonical instances), but at scale the separation pays off.
#	When a dataset would exceed roughly a few thousand instances or a few megabytes as a single package, partition it across multiple packages named after the partition value (employees-a, employees-b, ...; stocks-2024, stocks-2025; customers-us, customers-eu). Consumers load only the partitions they need; partitioning at the package layer replaces any need for runtime pagination, filtering, or query primitives.	Static hosting (GitHub Pages, S3, IPFS) cannot execute runtime queries; any pagination scheme requiring server-side execution would foreclose those publishers from the protocol. Partitioning at the package level is the protocol's pagination — each partition is addressable by URL via the canonical convention, cacheable independently, and version-bumpable on its own cadence. Consumers traverse the graph by loading partitions as the navigation crosses them, never by issuing queries against publisher infrastructure.
#	The partition axis SHOULD be an immutable attribute of the data (alphabetic bucket of an immutable identifier, year of creation, geographic region of origin, hash bucket of a stable ID). Do NOT partition on attributes that change over the data's lifetime (current employer, current owner, current status, current team).	URI stability is the protocol's foundational guarantee (see uri-structure). A resource's URI includes its package name; moving an instance between partitions changes its URI; downstream references break. Stable axes prevent this. For datasets without any naturally stable axis, partition by an immutable surrogate — hash bucket of a stable identifier, year of creation, immutable region of origin. The cost of choosing the wrong axis is paid forever in broken cross-package references.
#	A publisher MAY ship a small index package (e.g. employees-index@1.0.0) that serves as a manifest enumerating the partition packages comprising a logical dataset. Consumers and the Kanonak Browser can use the manifest for partition discovery and faceted navigation. The manifest is purely additive: partitions remain self-contained and usable without it.	Partitions work standalone (a consumer who knows which partition contains the data they want loads it directly). But cross-partition discovery ("what partitions of the employees dataset exist?") requires either a directory listing convention (publisher cooperation, not always available) or an explicit manifest. Manifests are the cleaner answer and stay an opt-in publisher curation choice rather than a protocol requirement.

Has Forbidden Rule#
	Text	Rationale
#	The Kanonak protocol MUST NOT introduce a runtime query, paging, or filtering primitive — no Collection class with pageSize, no $filter URL convention, no built-in SPARQL endpoint contract. Cross-resource access at scale is solved by package partitioning, which works on dumb static hosting.	A runtime query primitive would couple the protocol to publisher-side execution, foreclose static hosting, and introduce every server-side complexity (rate limiting, federation, indexing, search) the protocol exists to avoid. Static partitioning solves the same access pattern with zero server requirements. Individual publishers MAY offer query layers ON TOP of Kanonak as opt-in services (their own SPARQL or GraphQL endpoint over their data, for example), but those are publisher infrastructure decisions, not protocol contracts. Tools across the ecosystem can rely on packages being addressable as static files forever.

Has Valid Example#
	Value	Description
#	employees: type: Package publisher: acme.com version: 1.0.0 imports: - publisher: kanonak.org packages: - package: core-rdf match: ^ version: 1.0.0 alias: rdfs - package: core-xsd match: ^ version: 1.0.0 alias: xsd - package: core-owl match: ^ version: 2.0.0 alias: owl Employee: type: rdfs.Class name: type: owl.DatatypeProperty domain: Employee range: xsd.string	Schema package — defines the Employee class and its properties. Bumped only when the model evolves (rarely). A reader sees that the body declares a Class and DatatypeProperty with no instances, so the package's role as the schema/ontology layer is self-evident.
#	employees-a: type: Package publisher: acme.com version: 1.0.0 imports: - publisher: acme.com packages: - package: employees match: ^ version: 1.0.0 alias: e alice-anderson: type: e.Employee e.name: "Alice Anderson" adrian-aoki: type: e.Employee e.name: "Adrian Aoki"	Partition package: instance data only, for surnames starting with A. Imports the schema package and asserts instances of e.Employee. A reader who sees the import plus the instance assertions recognizes the package's role as a partition without further labeling. Bumped as employees in this bucket join, leave, or get edited; independent of the schema's lifecycle and of other partitions (employees-b, employees-c, ...).
#	stocks: type: Package publisher: acme.com version: 1.0.0 stocks-2024: type: Package publisher: acme.com version: 1.0.0 imports: - publisher: acme.com packages: - package: stocks match: ^ version: 1.0.0 alias: s stocks-2025: type: Package publisher: acme.com version: 1.0.0 imports: - publisher: acme.com packages: - package: stocks match: ^ version: 1.0.0 alias: s	Time-axis partitioning along year-of-occurrence. Year is immutable, so URIs are stable forever. Older partitions stop bumping once their year closes; only the current partition churns. A consumer viewing 2025 data loads stocks@1.0.0 + stocks-2025@1.0.0; nothing else needs to be present. Body of each partition (omitted) holds the year's stock instance data.

Has Invalid Example#
	Value	Description
#	employees: type: Package publisher: acme.com version: 1.0.0 Employee: type: rdfs.Class alice-anderson: { type: Employee, name: "Alice Anderson" } bob-anderson: { type: Employee, name: "Bob Anderson" } carlos-cruz: { type: Employee, name: "Carlos Cruz" }	Wrong at scale - one package mixing the schema (the Employee class) and tens or hundreds of thousands of instances. Forces every consumer to download all instances to read any one. Browsers struggle to load; parsers slow; every change requires republishing the whole graph. Lift the Employee class into a separate schema package and partition the instances into employees-a, employees-b, etc. (Three instances are shown for brevity; the smell appears as N grows.)
#	employees-engineering: type: Package publisher: acme.com version: 1.0.0 employees-marketing: type: Package publisher: acme.com version: 1.0.0 employees-sales: type: Package publisher: acme.com version: 1.0.0	Wrong - department membership changes when employees move between teams. Re-partitioning means changing canonical URIs (alice moves from employees-engineering to employees-marketing → her URI changes → every reference to her elsewhere in the graph breaks). Partition by an immutable attribute (surname-bucket, hire-year, employee-id-hash-bucket) instead; treat current-department as a property on each Employee, not as a partition axis.
#	Employee: type: rdfs.Class collection: pageSize: 100 queryEndpoint: https://acme.com/employees/query	Wrong at the protocol level - this would couple every consumer of the Employee class to acme.com's runtime infrastructure. Partition the data into static packages instead. If acme.com wants to offer a faster faceted-query experience over their employees, they can run that as opt-in publisher infrastructure layered ABOVE the partitioned packages, not as a property of the protocol.
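
Choosing a partition from an immutable attribute can be sketched as a pure function of that attribute. The helpers below are hypothetical and only illustrate the idea: the `other` bucket for non-alphabetic surnames, the two-digit bucket suffix, and the default of 16 hash buckets are assumptions, not protocol rules.

```python
import hashlib


def surname_partition(dataset: str, surname: str) -> str:
    """Alphabetic bucket of an immutable identifier, e.g. 'employees-a'."""
    first = surname.strip()[:1].lower()
    bucket = first if first.isascii() and first.isalpha() else "other"
    return f"{dataset}-{bucket}"


def hash_partition(dataset: str, stable_id: str, buckets: int = 16) -> str:
    """Hash bucket of a stable ID, for data with no naturally stable axis."""
    digest = hashlib.sha256(stable_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % buckets
    return f"{dataset}-{index:02d}"
```

Because both functions depend only on attributes that never change, an instance always lands in the same package and its URI stays stable. Note that changing `buckets` later would move instances between packages, so the bucket count has to be treated as immutable too.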

documentation-in-the-graph

YAML comments are invisible to every consumer downstream of the parser — the reasoner, the SDK, transformations, the Kanonak Browser, AI agents querying the graph, derivations. If a Kanonak document needs a comment to explain something, the explanation belongs in the graph instead. Treat the urge to comment as a discipline signal: structure first, then existing annotation properties, then new vocabulary as a last resort.

Has Recommended Rule#
	Text	Rationale
#	Avoid YAML comments (lines starting with #) in Kanonak documents. When you reach for one, work down this list and stop at the first answer that fits: (1) Trust the structure. Is the information already visible in what you authored — type relationships, naming conventions, imports, statement structure? If yes, delete the comment; the reader can see it. (2) Use existing annotation properties. rdfs.comment, rdfs.label, rdfs.seeAlso, the entity's description field if it has one. The vocabulary for in-graph documentation already exists. (3) Propose new vocabulary — only when (1) and (2) genuinely don't fit. New ontology is the rarest answer, not the first reflex.	Comments leak knowledge into a layer no consumer can see. Every Kanonak tool — validator, reasoner, browser, derivation engine, AI agent, SDK — operates on the parsed graph, not the source YAML. Information left as a comment cannot be validated, reasoned over, derived from, indexed, surfaced in any rendered artifact, or queried by any consumer. The recurring urge to comment is one of the most reliable ontology-gap signals authors have access to; treat it as a prompt to extend the model or trust the structure, not a prompt to write prose the parser will discard.

Has Valid Example#
	Value	Description
#	Person: type: rdfs.Class rdfs.comment: A human being. rdfs.label: Person	The intended meaning lives on the entity itself via rdfs.comment and rdfs.label. Every consumer of the graph can read these annotations; the SDK exposes them; the Kanonak Browser surfaces them in renderings; AI agents can reason over them.
#	employees: type: Package publisher: acme.com version: 1.0.0 Employee: type: rdfs.Class	A package that declares only a Class and no instances is evidently the schema layer for a dataset. No comment is needed to label it as such — the structure already says it. A reader who knows the partitioning convention recognizes the role on sight.

Has Invalid Example#
	Value	Description
#	Person: type: rdfs.Class	The author wrote a YAML comment above this declaration saying "Person represents a human being." That comment is invisible after parsing — no Kanonak tool can see it. The same information belongs on the entity as rdfs.comment so it lives in the graph.
#	employees-a: type: Package publisher: acme.com version: 1.0.0	The author preceded this with a comment saying "PARTITION package - alphabetic bucket A." That label is exactly what the structure already conveys (the package name is employees-a; the body would import a schema and assert instances). The urge to label here signals redundancy, not missing vocabulary; trust the structure.

derivation-bindings

A class declares the transformations that produce its derived artifacts via a derivations list. Each Derivation entry binds a (format, variant) pair to a TransformationReference. Discovery walks the class hierarchy bottom-up to find the first matching binding.

Has Recommended Rule#
	Text	Rationale
#	Declare derivation.derivations on the class whose instances should be derivable into a given format. Use named instances of formats.Format and derivation.Variant for the format and variant fields — never magic strings.	Magic strings drift (js vs javascript, csharp vs c-sharp). Named instances are URI-comparable, validate in the type system, and let publishers extend the vocabulary by defining their own format / variant instances without coordinating with anyone.
#	Prefer declaring derivations on the class. Only declare them on a specific instance when that instance needs a shape no other instance of the class wants — a one-off override.	Class-level bindings inherit through subclasses and apply to every instance for free. Per-instance bindings are full replacements (not merges with the class), so an instance that overrides loses access to all the class's other bindings unless it re-declares them.

derivation-override-semantics

Override semantics differ by level. Class-hierarchy override merges by (format, variant) key — a subclass overriding html/default keeps its parent's markdown/summary. Per-instance override replaces entirely — an instance that declares its own derivations sees no class-level bindings.

Has Required Rule#
	Text	Rationale
#	When discovery walks the class hierarchy, statements at DIFFERENT classes for the SAME (format, variant) pair are resolved by closest-class-wins (subclass takes precedence over superclass). Statements at DIFFERENT classes for DIFFERENT (format, variant) pairs ALL apply — they merge.	Authors expect inheritance to behave like CSS specificity or method override in OO: a subclass override replaces the specific binding while inheriting the others. The merge-by-key behavior is what makes universal defaults useful — you override what you want to customize, inherit the rest.
#	When an instance declares its own derivations, those REPLACE all class-level bindings for that instance — no merge. Instances that override take full ownership of all (format, variant) pairs they want available.	Instance-level overrides are deliberate "this resource is special" statements. Merging them with class bindings would re-introduce silent fallback behavior and mask which fields the author intended to override at the leaf. Replace semantics align with the no-mocks/no-fallbacks rule — failure to declare a binding is visible, not silently filled.
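
The two override levels can be sketched together. This is a minimal illustration under stated assumptions: bindings are modeled as plain dictionaries keyed by (format, variant), the class chain is passed most-specific class first, and `effective_bindings` plus every transformation name other than any-resource-to-html are hypothetical.

```python
from typing import Optional

Key = tuple[str, str]  # (format, variant)


def effective_bindings(
    class_chain: list[dict[Key, str]],
    instance_bindings: Optional[dict[Key, str]] = None,
) -> dict[Key, str]:
    """Compute the bindings visible to one instance.

    class_chain: per-class bindings, most specific class first, root last.
    instance_bindings: the instance's own derivations, or None if it declares none.
    """
    if instance_bindings is not None:
        # Per-instance override replaces entirely: no class-level fallback.
        return dict(instance_bindings)
    merged: dict[Key, str] = {}
    for bindings in class_chain:  # walk the hierarchy bottom-up
        for key, transformation in bindings.items():
            # Closest class wins: keep the first binding seen for each key.
            merged.setdefault(key, transformation)
    return merged
```

A subclass that binds only html/default therefore still sees its parent's markdown/summary binding, while an instance that declares any derivations of its own sees only those.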

universal-default-derivations

The kanonak.org/universal-derivations@1.0.0 package augments rdfs.Resource with default derivations so EVERY Kanonak resource has at least HTML / Markdown / JSON artifacts available out of the box, without per-publisher action. Publisher classes override per (format, variant) only where they have something domain-specific to say.

Has Recommended Rule#
	Text	Rationale
#	Authors of new classes SHOULD NOT declare derivations for formats covered by universal-derivations (html, markdown, json) unless they have a publisher-specific shape to render. Letting the universal default apply is the path of least surprise — consumers get consistent rendering across publishers.	The Kanonak Browser model is the consumer story: point it at any URI, walk the class hierarchy, find the right derivation, render. Per-class redeclaration of already-universal bindings is noise and divergence; override only when the publisher has earned the right to differ.
#	New publisher vocabulary packages MAY ship without any derivations declarations. Their instances will still render via the universal defaults inherited from rdfs.Resource. Add publisher-specific derivations later as the rendering needs of those instances become clear.	Removes the adoption tax. A publisher who declares a new class today gets HTML / Markdown / JSON renderings for free; they can add domain-specific transformations as a later iteration without blocking initial usability.

canonical-structural-hash

A representation-independent SHA-256 hash that identifies a Kanonak package by its semantic content, invariant under YAML styling, resource ordering, comment additions, and any round-trip through alternative serializations. The hash is computed over a deterministic JSON canonicalization of the package's parsed object form — subjects[] sorted by canonical URI, statements[] within each subject sorted by predicate canonical URI, lists preserved in source order, scalars rendered in a canonical decimal form.

Has Required Rule#
	Text	Rationale
#	The canonical form MUST emit a package's subjects in alphabetical order of their canonical URI `publisher/package@major.minor.patch/name`. Authored order in the source file is not part of the hash input.	Two YAMLs that author the same logical package in different declaration order must hash identically. Stable sort by canonical URI is the simplest representation-independent ordering that round-trips across YAML / JSON / DynamoDB without per-format knowledge.
#	Within each subject, the canonical form MUST emit statements in alphabetical order of the predicate's full canonical URI. The local-name portion alone is NOT sufficient — predicates sharing a local name across packages would collide.	Statement order in a Kanonak document carries no semantic weight (statements are unordered triples), so the hash must be invariant under reordering. Sorting by full URI preserves identity uniqueness across the package boundary.
#	List-valued statements (`ListStatement.object`) MUST be emitted in source order. Lists are the ONLY position in the canonical form where authored order is part of the hash input.	Lists in Kanonak DO carry semantic order: import lists, ordered timeline entries, derivation cascades, authored sequences. Reordering a list IS a semantic change and must change the hash. Sorting list members would conflate semantically-different packages.
#	Numbers MUST emit in canonical decimal form (no leading zeros except `0` itself, no trailing zeros, no negative zero, no scientific notation for magnitudes in the band [10^-6, 10^21)). Booleans MUST emit as `true` / `false`. Strings MUST emit as UTF-8 with standard JSON escaping. References MUST emit as the full canonical URI form `publisher/package@major.minor.patch/name`.	Scalar representations are where cross-implementation divergence is easiest to slip in (locale-dependent number formatting, alternative JSON escape sequences, abbreviated URI forms). Pinning each scalar to one form makes the hash byte-portable across implementations.
#	YAML `#` comments are not part of the canonical form and therefore not part of the hash. Documentation that needs to survive the hash MUST live in `rdfs.comment` (or another statement on the relevant resource).	Comments are invisible to every consumer downstream of the parser. A hash that changed when a comment was added would be unstable under the existing documentation-in-the-graph convention (which already asks authors to lift in-file comments into `rdfs.comment`). The two conventions reinforce each other: comments don't survive tooling and don't affect identity.

Has Forbidden Rule#
	Text	Rationale
#	Implementations MUST NOT use a raw-byte hash of the source YAML as a substitute for the canonical structural hash in any context that crosses representation boundaries (e.g. `kanonak.lock` integrity for a package reconstructed from DynamoDB, OCI-style content addressing, cross-publisher signature verification).	Raw-byte hashing ties identity to authoring choices that don't affect meaning, which silently breaks identity the moment a package is round-tripped through a different serialization. The 2.2.0 hash exists to decouple identity from byte representation; reverting to byte hashing reintroduces the problem.

Has Valid Example#
	Value	Description
#	A package authored with resources in alphabetical order and a package authored with the same resources in reverse order produce the same `sha256:…` value. Tested in the SDK's canonical-hash test suite ("canonical hash is stable across resource order, statement order, quoting choices, and YAML comments").	Demonstrates the representation-independence guarantee.
#	A package with `items: [alpha, beta, gamma]` and a package with `items: [gamma, beta, alpha]` produce DIFFERENT hashes — list reordering is a semantic change.	Demonstrates that list order is preserved (not normalized).

Has Invalid Example#
	Value	Description
#	`sha256(readFileSync('contacts.kan.yml'))` — the raw byte digest of the source. Diverges immediately when the package is re-emitted by any tool: a YAML reformat, a DynamoDB seed, even reordering imports for readability.	The pre-2.2.0 hashing approach in kanonak.lock. Migrated on the next `kanonak install` after consumers adopt 2.2.0.
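
The ordering rules can be demonstrated with a deliberately simplified sketch. It is NOT the SDK's `canonicalForm` / `canonicalHash` and will not match their output byte-for-byte: the input shape (subject URI -> predicate URI -> string or list of strings), the JSON envelope, and the function names are all assumptions, Python's code-point string ordering stands in for "alphabetical", and number canonicalization and embedded objects are left out.

```python
import hashlib
import json


def sketch_canonical_form(package: dict[str, dict[str, object]]) -> str:
    """Serialize a parsed package deterministically (illustrative only).

    Subjects are sorted by canonical URI, statements by full predicate URI;
    list values keep their source order.
    """
    subjects = []
    for subject_uri in sorted(package):
        statements = package[subject_uri]
        subjects.append({
            "subject": subject_uri,
            "statements": [
                {"predicate": predicate, "object": statements[predicate]}
                for predicate in sorted(statements)
            ],
        })
    # Compact separators and no ASCII escaping keep the byte form stable.
    return json.dumps({"subjects": subjects}, separators=(",", ":"), ensure_ascii=False)


def sketch_canonical_hash(package: dict[str, dict[str, object]]) -> str:
    """SHA-256 over the UTF-8 bytes of the sketch canonical form."""
    digest = hashlib.sha256(sketch_canonical_form(package).encode("utf-8")).hexdigest()
    return f"sha256:{digest}"
```

Two dictionaries that hold the same subjects and statements in different insertion order hash identically under this sketch, while swapping the members of a list value changes the hash, which is the behavior the valid examples above describe.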

protocol-evolution

Substantive design rationale for the kanonak-protocol@2.0.0 major version. Captures the decisions worth preserving in the graph rather than trapping them in commit messages or external changelogs that no Kanonak tool can read.

Has Recommended Rule#
	Text	Rationale
#	For idiomatic guidance about a specific vocabulary, consult the sibling *-conventions package alongside that vocabulary (e.g. kanonak.org/transformations-conventions for the transformations vocabulary, kanonak.org/derivation-conventions for derivation authoring patterns including DRY stylesheet styling and override cascade, kanonak.org/site-conventions for the site vocabulary). The kanonak-protocol package documents only foundational protocol concerns that apply regardless of which vocabulary a document uses.	Vocabulary-specific guidance evolves at the cadence of the vocabulary itself. Bundling it with the protocol in 1.x meant kanonak-protocol had to bump every time a new transformation primitive shipped, blurring the distinction between fundamental protocol changes and vocabulary additions. 2.0.0 separated them: the protocol bumps when the protocol itself learns something new; vocabulary conventions bump independently with their respective vocabularies.
#	For datasets too large to publish as one package, partition along a stable axis into multiple packages (see package-partitioning) rather than expecting any runtime paging, filtering, or query primitive in the protocol. The protocol contract is "every package is addressable as a static file"; partitioning is the pagination.	A runtime query primitive would couple every consumer to publisher-side execution, foreclose static hosting (GitHub Pages, S3, IPFS), and introduce every server-side complexity (rate limiting, federation, indexing, search) the protocol exists to avoid. The partition-as-pagination decision was made deliberately in 2.0.0 and is preserved as a forbidden rule in package-partitioning to keep future extensions from silently re-introducing the coupling.
#	The recurring urge to add a YAML comment to a Kanonak document is a discipline signal — not a comfort to indulge. See documentation-in-the-graph for the ordered response (trust structure → existing annotation → new vocabulary as last resort).	Comments are invisible to every consumer downstream of the parser. The 1.x protocol package itself accumulated many examples whose meaning was carried in YAML comments; the 2.0.0 sweep moved that meaning into example description fields where it lives in the graph. The convention exists because authors (including the protocol's own authors) reach for comments under pressure; recognizing the urge as an ontology-gap signal redirects the energy productively.
#	Removing or renaming a public Convention, Rule, or Example from a published kanonak-protocol version is a breaking change and REQUIRES a major version bump. Earlier versions remain on disk archivally and continue to satisfy consumers pinned to them.	The 1.x → 2.0.0 transition exemplifies this. Six sections were extracted to vocabulary-conventions packages; that removal is breaking even if no programmatic consumer existed (consumers reading the human guidance lose surface area; consumers walking hasConvention via the SDK see fewer entries). Editing an existing version's content in place would have violated both the protocol's own semver convention and the standing operational rule that published packages are immutable. The protocol's authority comes from visibly holding itself to its own rules, including under self-imposed pressure to "just clean it up."