
The claim model

One table holds every fact donto believes: donto_statement. One row = one claim — subject, predicate, object — scoped to a context, bitemporal, paraconsistent. Everything else (evidence, confidence, arguments, identity) hangs off it in sparse satellite tables. This page documents it column by column, from the live database and the migrations that built it.

live statements 41.76M · ever retracted 282 · contexts 68,325 · predicates 1,009,054 · of which freely minted 99.8% · audit rows 11.3M

donto_statement — the atom

Created in migration 0001_core.sql. The row is deliberately narrow. The substrate never widens the 41.76M-row table; per-claim metadata lives in keyed side tables instead:

create table donto_statement (
    statement_id  uuid primary key default gen_random_uuid(),
    subject       text not null,
    predicate     text not null,
    object_iri    text,            -- reference …
    object_lit    jsonb,           -- … XOR value
    context       text not null references donto_context(iri),
    tx_time       tstzrange not null default tstzrange(now(), null, '[)'),
    valid_time    daterange not null default daterange(null, null, '[)'),
    flags         smallint not null default 0,
    content_hash  bytea generated always as ( /* sha256, see below */ ) stored,
    constraint donto_statement_object_one_of
        check ((object_iri is not null) <> (object_lit is not null))
);

donto_statement — every column

column	type	meaning
statement_id	uuid PK	The claim's stable identity. 20+ foreign keys point at it. A correction produces a new id (via `donto_correct`) — ids are never reused or rewritten.
subject	text	The subject term — usually a compact IRI (`ex:weaviate-client`), but permissive contexts accept free text (a real live subject: `"Data Types"`). No foreign key: entity identity is a hypothesis resolved later, never a write-time constraint.
predicate	text	The relation IRI — freely invented by whoever asserts. Asserting an unknown predicate in a permissive context auto-registers it as `implicit` in `donto_predicate`. That's how the registry reached ~1M predicates.
object_iri	text	The object as a reference to another entity — these rows are the graph's edges. Indexed by SPO/POS/OSP btrees plus a trigram index for fuzzy lookup.
object_lit	jsonb	The object as a typed value: `{"v": <value>, "dt": "<datatype IRI>", "lang"?}` — e.g. `{"v":"cipher","dt":"xsd:string"}`. A check constraint enforces exactly one of object_iri / object_lit.
context	text FK	The only foreign key on the row — every claim lives in exactly one primary context (the unit of scoping, provenance and trust mode). Writers that don't care land in `donto:anonymous`.
tx_time	tstzrange	Transaction time: when the substrate believed this. Open upper bound = currently believed. Closing it is the only way a claim "goes away" (invariant I3) — see below.
valid_time	daterange	Valid time: when the claim is true in the world, at date granularity. Default unbounded ("no validity bounds claimed"). Memory chunks carry their session date here, which is what powers time-sliced recall and the benchmark wins on temporal questions.
flags	smallint	Packed polarity + maturity bitmask, decoded below.
content_hash	bytea generated	A stored SHA-256 over the claim's identity fields — the idempotency fingerprint. Re-ingesting the same source is a no-op. Details below.
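
The exactly-one-of rule on object_iri / object_lit maps naturally onto a sum type on the client side. The sketch below is an illustration only: the `Object` enum and `object_from_columns` are hypothetical names, not part of the donto client, and the literal value is kept as a plain string although the real column is jsonb.

```rust
/// The object of a claim: a reference to another entity or a typed literal.
/// An enum makes "both" and "neither" unrepresentable, mirroring the XOR check.
#[derive(Debug, Clone, PartialEq)]
pub enum Object {
    /// Corresponds to a non-null `object_iri`.
    Iri(String),
    /// Corresponds to a non-null `object_lit` of the shape {"v", "dt", "lang"?}.
    Literal { v: String, dt: String, lang: Option<String> },
}

/// Convert the two nullable columns into an `Object`, rejecting combinations
/// that the `donto_statement_object_one_of` constraint would also reject.
pub fn object_from_columns(
    object_iri: Option<String>,
    object_lit: Option<(String, String, Option<String>)>,
) -> Result<Object, &'static str> {
    match (object_iri, object_lit) {
        (Some(iri), None) => Ok(Object::Iri(iri)),
        (None, Some((v, dt, lang))) => Ok(Object::Literal { v, dt, lang }),
        (Some(_), Some(_)) => Err("both object_iri and object_lit are set"),
        (None, None) => Err("neither object_iri nor object_lit is set"),
    }
}
```

Modelling the pair as an enum moves the check from the database to the type system for code that builds claims, while the constraint remains the final guard.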

The bitemporal pair — two independent time axes

tx_time answers "when did the system believe this?"; valid_time answers "when was this true in the world?". They are independent: a claim asserted today can be about 1873, and retracting it tomorrow closes its tx_time without touching the period it addressed. donto_retract() runs a single UPDATE … SET tx_time = [lower, now()), never a DELETE. Queries default to current belief (upper(tx_time) IS NULL) but can time-travel with p_as_of_tx: the substrate can reconstruct what it believed at any prior moment. Live: just 282 of 41.76M rows have ever been retracted.
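
The retract-by-closing behaviour can be modelled in memory. This is a minimal sketch under stated assumptions: timestamps are plain integers, valid_time is left out, and `Row`, `TxTime`, `retract` and `believed` are illustrative names rather than the substrate's API. It also assumes an as-of query means "tx_time contains the instant".

```rust
/// A half-open transaction-time range [lo, hi); `hi == None` means "still believed".
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TxTime {
    pub lo: u64,
    pub hi: Option<u64>,
}

#[derive(Debug, Clone)]
pub struct Row {
    pub id: u32,
    pub tx_time: TxTime,
}

/// Retraction closes the upper bound and leaves the row in place.
/// Returns false if the row was already closed.
pub fn retract(row: &mut Row, now: u64) -> bool {
    if row.tx_time.hi.is_some() {
        return false;
    }
    row.tx_time.hi = Some(now);
    true
}

/// Ids of rows believed at `as_of`; `None` selects current belief (open upper bound).
pub fn believed(rows: &[Row], as_of: Option<u64>) -> Vec<u32> {
    rows.iter()
        .filter(|r| match as_of {
            None => r.tx_time.hi.is_none(),
            Some(t) => r.tx_time.lo <= t && r.tx_time.hi.map_or(true, |hi| t < hi),
        })
        .map(|r| r.id)
        .collect()
}
```

Because nothing is removed, a query for an instant before the retraction still returns the row, which is the time-travel property described above.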

Invariant I3 — no destructive overwrite

DELETE FROM donto_statement is not part of any code path. Corrections chain a replacement row; even regulatory "true deletion" (GDPR, Indigenous cultural protocols) is implemented as blob tombstoning (migration 0142) — the source bytes are destroyed, while the fact that a deletion happened stays queryable forever.

flags — polarity and maturity in one bitmask

From migration 0002_flags.sql:

bits	field	meaning
bits 0–1	polarity	`0 asserted` · `1 negated` · `2 absent` · `3 unknown`. Polarity is first-class because absence and negation are claims too — "this source denies X" and "this source is silent about X" are different, storable facts.
bits 2–4	maturity	The stored evidence-maturity integer (3 bits, so 0–7 is representable; the ladder uses 0–5), decoded by the E-ladder below.
bits 5–15	reserved	—

Live distribution (0.5% sample): 55% asserted E0, 36% asserted E1, ~2.4% E2, ~5% E5, with a long tail including genuinely negated and unknown-polarity claims.
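
The bit layout can be decoded with two masks. The helpers below are a sketch of that arithmetic only; `Polarity`, `polarity`, `stored_maturity` and `pack` are hypothetical names, and the result of `stored_maturity` is the raw stored integer, not yet an E-level.

```rust
/// Polarity values carried in bits 0-1 of `flags`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Polarity {
    Asserted = 0,
    Negated = 1,
    Absent = 2,
    Unknown = 3,
}

/// Extract polarity from the two lowest bits.
pub fn polarity(flags: i16) -> Polarity {
    match flags & 0b11 {
        0 => Polarity::Asserted,
        1 => Polarity::Negated,
        2 => Polarity::Absent,
        _ => Polarity::Unknown,
    }
}

/// Extract the stored maturity integer from bits 2-4.
pub fn stored_maturity(flags: i16) -> u8 {
    ((flags >> 2) & 0b111) as u8
}

/// Pack polarity and a stored maturity integer; reserved bits 5-15 stay zero.
pub fn pack(p: Polarity, stored: u8) -> i16 {
    (p as i16) | (((stored & 0b111) as i16) << 2)
}
```

For example, `flags = 8` is binary `01000`: polarity bits `00` (asserted) and maturity bits `010` (stored 2).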

Maturity — the E0…E5 evidence ladder

stored int → E-level (note the storage quirk)

stored → level	name	meaning
0 → E0	Raw	Source or extraction artefact exists; not trusted as a claim.
1 → E1	Candidate	A model, rule, or human proposed a claim.
2 → E2	Evidence-supported	The claim is grounded in a source span / row / timecode.
3 → E3	Reviewed	A domain reviewer accepted, rejected, or qualified it.
5 → E4	Corroborated	Cross-source support, or survives contradiction review.
4 → E5	Certified	Passes formal or highly structured validation.

Storage quirk: stored 4 = E5, stored 5 = E4

The ladder was originally L0–L4; renaming to E0–E5 inserted "E4 Corroborated" without migrating rows, so semantic order ≠ storage order. Always translate via donto_maturity_label() / donto_e_level(flags), never compare raw ints.
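
To make the quirk concrete, here is a small translation table written as code. It is a sketch that mirrors the ladder on this page, not the body of donto_maturity_label() or donto_e_level(); the function names `e_level`, `label` and `at_least` are illustrative, and they take the already-extracted stored integer rather than the whole flags value.

```rust
/// Translate the stored maturity integer into its E-level (0..=5).
pub fn e_level(stored: u8) -> Option<u8> {
    match stored {
        0..=3 => Some(stored),
        5 => Some(4), // stored 5 is E4 Corroborated
        4 => Some(5), // stored 4 is E5 Certified
        _ => None,    // 6 and 7 fit in the bit field but are not on the ladder
    }
}

/// Human-readable name for a stored maturity integer.
pub fn label(stored: u8) -> Option<&'static str> {
    match stored {
        0 => Some("Raw"),
        1 => Some("Candidate"),
        2 => Some("Evidence-supported"),
        3 => Some("Reviewed"),
        5 => Some("Corroborated"),
        4 => Some("Certified"),
        _ => None,
    }
}

/// Compare in semantic order: is this stored value at or above the given E-level?
pub fn at_least(stored: u8, min_e_level: u8) -> bool {
    e_level(stored).map_or(false, |e| e >= min_e_level)
}
```

A raw comparison such as `stored >= 5` would accept E4 and reject E5, which is exactly the mistake the translation step avoids.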

Governance is baked into the write path: machine confidence is not maturity (invariant I5) — extractors map confidence to maturity but are hard-capped at E2 (machine_maturity().min(2) in every ingest path). E3+ requires a human review action, and every maturity change fires an audit trigger recording who promoted what from which level to which.
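
The cap itself is a one-line clamp. In the sketch below only the `.min(2)` step comes from this page; the signature of `machine_maturity`, the confidence thresholds, and the `ingest_maturity` wrapper are assumptions made up for illustration.

```rust
/// Highest maturity an extractor may assign on its own (E2).
pub const MACHINE_CAP: u8 = 2;

/// Hypothetical mapping from a confidence score to a proposed maturity.
/// The thresholds are illustrative and do not come from the donto codebase.
pub fn machine_maturity(confidence: f64) -> u8 {
    if confidence >= 0.9 {
        3
    } else if confidence >= 0.5 {
        2
    } else if confidence >= 0.2 {
        1
    } else {
        0
    }
}

/// Apply the cap: however confident the model is, the result never exceeds E2.
pub fn ingest_maturity(confidence: f64) -> u8 {
    machine_maturity(confidence).min(MACHINE_CAP)
}
```

Levels 0 to 3 have the same stored integer and E-level, so clamping at 2 is safe here without going through the ladder translation.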

content_hash — idempotent ingestion

A stored, generated SHA-256 over subject ␟ predicate ␟ object ␟ context ␟ polarity-bits ␟ valid_time. Two details matter: only the polarity bits of flags participate (re-asserting the same claim at a different maturity dedups onto the existing row instead of forking it), and the unique index is partial — WHERE upper(tx_time) IS NULL — so the same content can exist many times historically but only once as current belief. donto_assert does ON CONFLICT … DO NOTHING and returns the existing row's id: re-ingesting a source is free.
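
The interplay between the fingerprint and the partial unique index can be sketched with a map that only indexes open rows. Assumptions: the sketch keys on the identity fields directly instead of computing SHA-256, fields are simplified to strings, and `Identity` and `Store` are hypothetical names.

```rust
use std::collections::HashMap;

/// Identity fields that participate in the fingerprint. Maturity is deliberately absent.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Identity {
    pub subject: String,
    pub predicate: String,
    pub object: String,
    pub context: String,
    pub polarity_bits: u8,
    pub valid_time: String,
}

#[derive(Default)]
pub struct Store {
    next_id: u64,
    /// Stands in for the partial unique index: only open rows are indexed.
    open: HashMap<Identity, u64>,
    /// Closed rows: the same identity may appear here any number of times.
    history: Vec<(u64, Identity)>,
}

impl Store {
    /// Insert unless an open row with the same identity exists; either way return its id.
    pub fn assert(&mut self, identity: Identity) -> u64 {
        if let Some(&id) = self.open.get(&identity) {
            return id; // the "ON CONFLICT ... DO NOTHING" branch
        }
        self.next_id += 1;
        let id = self.next_id;
        self.open.insert(identity, id);
        id
    }

    /// Close the open row for this identity, moving it into history.
    pub fn retract(&mut self, identity: &Identity) -> bool {
        match self.open.remove(identity) {
            Some(id) => {
                self.history.push((id, identity.clone()));
                true
            }
            None => false,
        }
    }

    pub fn history_len(&self) -> usize {
        self.history.len()
    }
}
```

Asserting the same identity twice returns the same id; asserting it again after a retraction creates a fresh row, because the closed one no longer occupies the index.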

Real rows from the live database

-- a graph edge with a freely-minted predicate (BEAM-10M extraction)
subject  ex:weaviate-client   predicate  performsAction   object_iri  ex:vector-encryption
context  ctx:claims/beam/cbcc52f9-…   flags 8 (asserted, E2)

-- a typed literal
subject  ex:cipher   predicate  rdfs:label   object_lit {"v":"cipher","dt":"xsd:string"}

-- a memory chunk carrying its session date as valid_time
subject  ctx:memory/episodic/000351d1-…   predicate  mem:episodic/chunk
object_lit {"v":"[Session date: 2023/05/22] User: I'm looking to improve…","dt":"xsd:string"}
valid_time [2023-05-22,)

-- a retracted row — closed tx_time, nothing deleted (I3)
subject ex:carol  predicate ex:loves  object_iri ex:wrong
tx_time ["2026-04-17 11:09:05.66+00","2026-04-17 11:09:05.66+00")

How a claim is born

The single write surface is donto_assert(…) (SQL) / DontoClient::assert(&StatementInput) (Rust). Inside the function: require a context → auto-create it if new → permissive mode auto-registers unknown predicates / curated mode enforces the registry + minting approval → insert with content-hash conflict handling → audit. Idempotent end to end.

select donto_assert(
    p_subject    => 'ex:weaviate-client',
    p_predicate  => 'performsAction',      -- invented freely; auto-registered
    p_object_iri => 'ex:vector-encryption',
    p_object_lit => null,
    p_context    => 'ctx:claims/beam/cbcc52f9-…',
    p_polarity   => 'asserted',
    p_maturity   => 2,                     -- E2: the machine-extractor cap
    p_valid_lo   => null, p_valid_hi => null,
    p_actor      => 'agent:donto-agent'
);  -- returns uuid; no-op if the same open claim already exists
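
The context and predicate steps of that write path can be sketched as a small gate. This is an illustration under assumptions: `Registry`, `Mode`, `PredicateStatus` and `admit` are hypothetical names, new contexts are assumed to start in the default permissive mode, and the minting-approval check, the insert, and the audit step are left out.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Permissive,
    Curated,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PredicateStatus {
    Implicit,
    Active,
}

#[derive(Default)]
pub struct Registry {
    pub contexts: HashMap<String, Mode>,
    pub predicates: HashMap<String, PredicateStatus>,
}

impl Registry {
    /// Decide whether a claim may be written, creating the context and
    /// registering the predicate where the mode allows it.
    pub fn admit(&mut self, context: &str, predicate: &str) -> Result<(), String> {
        // Unknown contexts are created on first use.
        let mode = *self
            .contexts
            .entry(context.to_string())
            .or_insert(Mode::Permissive);
        match mode {
            Mode::Permissive => {
                // Unknown predicates are auto-registered as implicit.
                self.predicates
                    .entry(predicate.to_string())
                    .or_insert(PredicateStatus::Implicit);
                Ok(())
            }
            Mode::Curated => match self.predicates.get(predicate) {
                Some(PredicateStatus::Active) => Ok(()),
                _ => Err(format!("predicate {} is not registered-active", predicate)),
            },
        }
    }
}
```

Under this model a predicate minted in a permissive context stays `Implicit`, so it is still rejected in a curated context until it has been activated.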

donto_context — the unit of scope

A context is where a claim lives: a named scope carrying provenance grouping, a trust mode, and a position in a hierarchy. Contexts are cheap — donto_ensure_context auto-creates them on first assert — which is why there are 68,325.

donto_context

column	type	meaning
iri	text PK	Path-shaped by convention: `ctx:claims/beam/<chunk>`, `ctx:memory/episodic/session/…`, `ctx:genes/trove-cooktown/reynolds`.
kind	text	`source · snapshot · hypothesis · user · pipeline · trust · derivation · quarantine · custom · system · candidate`. `candidate` holds material below E0 — things an extractor merely suspects are claims; `donto_promote_candidate()` lifts one out (history kept).
parent	text FK→self	Primary parent; the hierarchy drives scope resolution — `donto_resolve_scope()` expands include/exclude sets recursively over the tree. Extra parents live in `donto_context_parent`.
mode	text	permissive (default; any predicate, auto-registered) vs curated (registered-active predicates only; maturity ≥ E2 needs approved minting). Live: 68,185 permissive / 140 curated.
label / metadata	text / jsonb	Human label and free-form metadata.
created_at / closed_at / created_by	timestamptz / text	Lifecycle + attribution.
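
Recursive scope expansion over the parent tree can be sketched as follows. The exact semantics of donto_resolve_scope() are not spelled out on this page, so the sketch assumes the simplest reading: take every included context with its descendants, then remove every excluded context with its descendants. Only primary parents are modelled, and all names are illustrative.

```rust
use std::collections::{BTreeSet, HashMap};

/// Collect `root` and every context beneath it, given child -> primary parent links.
fn subtree(parent_of: &HashMap<&str, &str>, root: &str) -> BTreeSet<String> {
    let mut out = BTreeSet::new();
    out.insert(root.to_string());
    // Repeat until no new descendants appear; adequate for a sketch.
    loop {
        let mut grew = false;
        for (child, parent) in parent_of {
            if out.contains(*parent) && !out.contains(*child) {
                out.insert(child.to_string());
                grew = true;
            }
        }
        if !grew {
            break;
        }
    }
    out
}

/// Included subtrees minus excluded subtrees (assumed semantics).
pub fn resolve_scope(
    parent_of: &HashMap<&str, &str>,
    include: &[&str],
    exclude: &[&str],
) -> BTreeSet<String> {
    let mut scope = BTreeSet::new();
    for root in include {
        scope.extend(subtree(parent_of, root));
    }
    for root in exclude {
        for ctx in subtree(parent_of, root) {
            scope.remove(&ctx);
        }
    }
    scope
}
```

A production version would also need to follow the extra parents held in `donto_context_parent`, which this sketch ignores.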

The big residents: ctx:genealogy/research-db holds 21.8M statements; ctx:claims/* (21,170 contexts) holds the extraction firehose, one context per source chunk; ctx:memory/* (20,767) is the agent-memory overlay.

donto_predicate — the open-world registry

Any predicate IRI may appear on a statement; the registry exists to record, describe and align predicates, not to gate writes. 99.8% of its 1,009,054 rows are status='implicit' — minted by models on first use (performsAction, dependsOn, returns-on-true, …, some seconds old at any given moment). The registry also carries alias structure (canonical_of), shape hints (domain, range_iri), algebraic properties (is_symmetric/transitive/functional), and the minting workflow status for curated contexts. Folding the 1M predicates into usable families is the alignment engine's job — at query time, never at write time.
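
Query-time folding through alias links might look like the sketch below. It assumes `canonical_of` points from an alias to the predicate it is an alias of; the page does not state the direction, and the function names and the `depends_on` / `ex:dependsOn` predicates are invented for the example.

```rust
use std::collections::HashMap;

/// Follow alias links from a predicate to its canonical form.
/// A predicate with no link is its own canonical form.
pub fn canonical<'a>(canonical_of: &HashMap<&'a str, &'a str>, predicate: &'a str) -> &'a str {
    let mut current = predicate;
    // Bound the walk so an accidental cycle cannot loop forever.
    for _ in 0..canonical_of.len() {
        match canonical_of.get(current) {
            Some(&next) if next != current => current = next,
            _ => break,
        }
    }
    current
}

/// Two predicates belong to the same family when they share a canonical form.
pub fn same_family<'a>(canonical_of: &HashMap<&'a str, &'a str>, a: &'a str, b: &'a str) -> bool {
    canonical(canonical_of, a) == canonical(canonical_of, b)
}
```

Because the stored statements keep their original predicate text, changing an alias link changes how queries group rows without rewriting any of them.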

The satellite tables — metadata without widening the atom

sparse per-statement overlays (all FK → statement_id)

table	rows	meaning
donto_stmt_extraction_level	~22.5M rows	How the claim was obtained: `quoted, table_read, example_observed, source_generalization, cross_source_inference, model_hypothesis, human_hypothesis, manual_entry, registry_import, adapter_import`. Governs maturity caps.
donto_stmt_modality	~12.6M rows	Epistemic modality: `descriptive, prescriptive, reconstructed, inferred, elicited, corpus_observed, oral_history, legal_holding, model_output, …`
donto_evidence_link	~2.57M rows	The anchor: statement → document / revision / span. The full chain is documented on the evidence chain page.
donto_stmt_confidence	587 rows	Machine/human/calibrated confidence ∈ [0,1] with source + lens — an overlay, never a column on the atom (invariant I5).
donto_stmt_hypothesis_only	1,286 rows	The I1 marker: explicitly-hypothesis claims, with rationale. Blocks maturity promotion to E2+.
donto_argument	2,424 rows	Typed claim-to-claim edges: `supports, rebuts, undercuts, endorses, supersedes, qualifies, potentially_same, same_referent, same_event` — the contradiction machinery.
donto_stmt_lineage	972 rows	Derived-claim provenance (which statements a derived claim came from).
donto_audit	~11.3M rows	The append-only action trail: `(at, actor, action, statement_id, detail)` for assert / retract / correct / mature.

The invariants this model encodes

invariant	principle	meaning
I1	evidence or honesty	No claim without evidence or explicit hypothesis status — the hypothesis_only overlay caps maturity below E2.
I3	no destructive overwrite	Retract/supersede by closing tx_time; corrections chain replacements; deletes don't exist.
I4	contradictions preserved	Incompatible claims are legal coexisting rows; conflict produces argument edges and review work-items, never failed writes.
I5	confidence ≠ maturity	Machine scores live in an overlay; extractor auto-promotion is capped at E2 in every ingest path.
