The claim model

One table holds every fact donto believes: donto_statement. One row = one claim — subject, predicate, object — scoped to a context, bitemporal, paraconsistent. Everything else (evidence, confidence, arguments, identity) hangs off it in sparse satellite tables. This page documents it column by column, from the live database and the migrations that built it.

live statements: 41.76M · ever retracted: 282 · contexts: 68,325 · predicates: 1,009,054 (of which freely minted: 99.8%) · audit rows: 11.3M

donto_statement — the atom

Created in migration 0001_core.sql. The row is deliberately narrow — the substrate never widens the 41.7M-row table; per-claim metadata lives in keyed side tables instead:

create table donto_statement (
    statement_id  uuid primary key default gen_random_uuid(),
    subject       text not null,
    predicate     text not null,
    object_iri    text,            -- reference …
    object_lit    jsonb,           -- … XOR value
    context       text not null references donto_context(iri),
    tx_time       tstzrange not null default tstzrange(now(), null, '[)'),
    valid_time    daterange not null default daterange(null, null, '[)'),
    flags         smallint not null default 0,
    content_hash  bytea generated always as ( /* sha256, see below */ ) stored,
    constraint donto_statement_object_one_of
        check ((object_iri is not null) <> (object_lit is not null))
);
donto_statement — every column
column | type | meaning
statement_id | uuid PK | The claim's stable identity. 20+ foreign keys point at it. A correction produces a new id (via donto_correct); ids are never reused or rewritten.
subject | text | The subject term: usually a compact IRI (ex:weaviate-client), but permissive contexts accept free text (a real live subject: "Data Types"). No foreign key: entity identity is a hypothesis resolved later, never a write-time constraint.
predicate | text | The relation IRI, freely invented by whoever asserts. Asserting an unknown predicate in a permissive context auto-registers it as implicit in donto_predicate. That's how the registry reached ~1M predicates.
object_iri | text | The object as a reference to another entity; these rows are the graph's edges. Indexed by SPO/POS/OSP btrees plus a trigram index for fuzzy lookup.
object_lit | jsonb | The object as a typed value: {"v": <value>, "dt": "<datatype IRI>", "lang"?}, e.g. {"v":"cipher","dt":"xsd:string"}. A check constraint enforces exactly one of object_iri / object_lit.
context | text FK | The only foreign key on the row. Every claim lives in exactly one primary context (the unit of scoping, provenance and trust mode). Writers that don't care land in donto:anonymous.
tx_time | tstzrange | Transaction time: when the substrate believed this. Open upper bound = currently believed. Closing it is the only way a claim "goes away" (invariant I3); see below.
valid_time | daterange | Valid time: when the claim is true in the world, at date granularity. Default unbounded ("no validity bounds claimed"). Memory chunks carry their session date here, which is what powers time-sliced recall and the benchmark wins on temporal questions.
flags | smallint | Packed polarity + maturity bitmask, decoded below.
content_hash | bytea generated | A stored SHA-256 over the claim's identity fields: the idempotency fingerprint. Re-ingesting the same source is a no-op. Details below.
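
The one-of rule on the object columns can be mirrored client-side before a write. The sketch below is illustrative only: `make_literal` and `check_object_one_of` are hypothetical helper names, not part of donto. Only the `{"v", "dt", "lang"?}` literal shape and the exactly-one-of rule come from the table above.

```python
from typing import Any, Optional


def make_literal(value: Any, datatype: str, lang: Optional[str] = None) -> dict:
    """Build an object_lit payload in the {"v", "dt", "lang"?} shape (hypothetical helper)."""
    lit = {"v": value, "dt": datatype}
    if lang is not None:
        lit["lang"] = lang
    return lit


def check_object_one_of(object_iri: Optional[str], object_lit: Optional[dict]) -> bool:
    """Mirror of the check constraint: exactly one of the two must be set."""
    return (object_iri is not None) != (object_lit is not None)
```

An edge passes with only `object_iri` set, a typed value passes with only `object_lit` set, and a row with both or neither is rejected.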

The bitemporal pair — two independent time axes

tx_time answers "when did the system believe this?"; valid_time answers "when was this true in the world?". They are independent: a claim asserted today can be about 1873, and retracting it tomorrow closes its tx_time without touching the period it addressed. donto_retract() runs a single UPDATE … SET tx_time = [lower, now()) — never a DELETE. Queries default to current belief (upper(tx_time) IS NULL) but can time-travel with p_as_of_tx: the substrate can reconstruct what it believed at any prior moment. Live: just 282 of 41.7M rows have ever been retracted.
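
To make the two behaviours concrete, here is a minimal in-memory sketch of "retract closes tx_time" and "query as of a transaction time". It is a model of the semantics described above, not donto's implementation: the `Row` class and the `retract` / `believed_at` functions are hypothetical names, and valid_time is left out for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Row:
    claim: str
    tx_lo: datetime                    # lower(tx_time), inclusive
    tx_hi: Optional[datetime] = None   # upper(tx_time), exclusive; None = still believed


def retract(row: Row, now: datetime) -> None:
    """Close the transaction-time range; the row itself is kept."""
    if row.tx_hi is None:
        row.tx_hi = now


def believed_at(rows: List[Row], as_of: Optional[datetime] = None) -> List[str]:
    """Current belief when as_of is None, otherwise belief as of that instant."""
    if as_of is None:
        return [r.claim for r in rows if r.tx_hi is None]
    return [r.claim for r in rows
            if r.tx_lo <= as_of and (r.tx_hi is None or as_of < r.tx_hi)]


t0 = datetime(2026, 4, 17, tzinfo=timezone.utc)
t1 = datetime(2026, 4, 18, tzinfo=timezone.utc)
rows = [Row("ex:carol ex:loves ex:wrong", tx_lo=t0)]
retract(rows[0], now=t1)
```

After the retraction the row is still in `rows`; it simply stops matching the current-belief filter while remaining visible to any query with `as_of` inside its closed range.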

Invariant I3 — no destructive overwrite
DELETE FROM donto_statement is not part of any code path. Corrections chain a replacement row; even regulatory "true deletion" (GDPR, Indigenous cultural protocols) is implemented as blob tombstoning (migration 0142) — the source bytes are destroyed, while the fact that a deletion happened stays queryable forever.

flags — polarity and maturity in one bitmask

From migration 0002_flags.sql:

bits | field | meaning
bits 0–1 | polarity | 0 asserted · 1 negated · 2 absent · 3 unknown. Polarity is first-class because absence and negation are claims too: "this source denies X" and "this source is silent about X" are different, storable facts.
bits 2–4 | maturity | The stored evidence-maturity integer (0–7), decoded by the E-ladder below.
bits 5–15 | reserved |

Live distribution (0.5% sample): 55% asserted E0, 36% asserted E1, ~2.4% E2, ~5% E5, with a long tail including genuinely negated and unknown-polarity claims.
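
The bit layout above can be decoded with two masks. This is a small sketch under the stated layout (polarity in bits 0-1, stored maturity in bits 2-4); `decode_flags` and `encode_flags` are hypothetical names, and the maturity value here is the raw stored integer, not an E-level.

```python
POLARITY = ("asserted", "negated", "absent", "unknown")  # values 0..3 in bits 0-1


def decode_flags(flags: int) -> tuple:
    """Split the smallint into (polarity label, stored maturity integer)."""
    polarity = POLARITY[flags & 0b11]     # bits 0-1
    maturity = (flags >> 2) & 0b111       # bits 2-4
    return polarity, maturity


def encode_flags(polarity: str, maturity: int) -> int:
    """Inverse of decode_flags; reserved bits 5-15 stay zero."""
    if not 0 <= maturity <= 7:
        raise ValueError("maturity must fit in three bits")
    return POLARITY.index(polarity) | (maturity << 2)
```

For example, `decode_flags(8)` yields `("asserted", 2)`, which matches a row shown as "flags 8 (asserted, E2)".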

Maturity — the E0…E5 evidence ladder

stored int → E-level (note the storage quirk)
stored int → E-level | name | meaning
0 → E0 | Raw | Source or extraction artefact exists; not trusted as a claim.
1 → E1 | Candidate | A model, rule, or human proposed a claim.
2 → E2 | Evidence-supported | The claim is grounded in a source span / row / timecode.
3 → E3 | Reviewed | A domain reviewer accepted, rejected, or qualified it.
5 → E4 | Corroborated | Cross-source support, or survives contradiction review.
4 → E5 | Certified | Passes formal or highly structured validation.
Storage quirk: stored 4 = E5, stored 5 = E4
The ladder was originally L0–L4; renaming to E0–E5 inserted "E4 Corroborated" without migrating rows, so semantic order ≠ storage order. Always translate via donto_maturity_label() / donto_e_level(flags), never compare raw ints.
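
Why raw comparison goes wrong is easiest to see in code. The sketch below is a Python stand-in for the translation step, built only from the ladder table above; it is not the body of donto_e_level, and `at_least` is a hypothetical helper.

```python
# stored maturity integer -> E-level, per the ladder table (4 and 5 are swapped)
STORED_TO_E = {0: 0, 1: 1, 2: 2, 3: 3, 5: 4, 4: 5}


def e_level(flags: int) -> int:
    """Translate the maturity bits of flags into the semantic E-level."""
    stored = (flags >> 2) & 0b111
    return STORED_TO_E[stored]   # stored 6 and 7 are not on the ladder: KeyError


def at_least(flags: int, min_e: int) -> bool:
    """Compare on E-levels, never on the raw stored integers."""
    return e_level(flags) >= min_e
```

A Corroborated row (stored 5) has a larger raw integer than a Certified row (stored 4), so a filter like "stored >= 5" would select E4 and miss E5; `at_least(flags, 5)` selects only Certified.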

Governance is baked into the write path: machine confidence is not maturity (invariant I5) — extractors map confidence to maturity but are hard-capped at E2 (machine_maturity().min(2) in every ingest path). E3+ requires a human review action, and every maturity change fires an audit trigger recording who promoted what from which level to which.
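
The cap and the audit rule can be sketched in a few lines. This is an illustration under assumptions: levels are expressed as E-levels, `promote` and its tuple-shaped audit entries are hypothetical, and the extractor's confidence-to-maturity mapping is deliberately left out because the page does not specify it.

```python
from typing import List, Tuple

MACHINE_CAP_E = 2  # extractors may not go above E2


def machine_maturity(proposed_e: int) -> int:
    """Clamp whatever an extractor proposes to the machine cap."""
    return min(proposed_e, MACHINE_CAP_E)


def promote(audit: List[Tuple[str, str, int, int]], actor: str, statement_id: str,
            from_e: int, to_e: int, human_review: bool) -> int:
    """Apply a maturity change and record who moved what from which level to which."""
    if to_e > MACHINE_CAP_E and not human_review:
        raise PermissionError("E3+ requires a human review action")
    audit.append((actor, statement_id, from_e, to_e))
    return to_e
```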

content_hash — idempotent ingestion

A stored, generated SHA-256 over subject ␟ predicate ␟ object ␟ context ␟ polarity-bits ␟ valid_time. Two details matter: only the polarity bits of flags participate (re-asserting the same claim at a different maturity dedups onto the existing row instead of forking it), and the unique index is partial (WHERE upper(tx_time) IS NULL), so the same content can exist many times historically but only once as current belief. donto_assert does ON CONFLICT … DO NOTHING and returns the existing row's id: re-ingesting a source is free.
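
The dedup behaviour can be modelled with hashlib. Treat this as a sketch: the exact byte serialization inside the generated column is not shown on this page, so the string join below is an assumption, and `OpenClaims` is a hypothetical stand-in for the partial unique index over currently believed rows.

```python
import hashlib

SEP = "\u241f"  # U+241F, the unit-separator symbol used in the field list above


def content_hash(subject: str, predicate: str, obj: str, context: str,
                 flags: int, valid_time: str) -> bytes:
    """SHA-256 over the identity fields; only the polarity bits of flags take part."""
    polarity_bits = flags & 0b11
    fields = [subject, predicate, obj, context, str(polarity_bits), valid_time]
    return hashlib.sha256(SEP.join(fields).encode("utf-8")).digest()


class OpenClaims:
    """One open row per content hash; asserting again returns the existing id."""

    def __init__(self) -> None:
        self._by_hash = {}   # digest -> statement id
        self._next_id = 1

    def assert_claim(self, *fields) -> int:
        digest = content_hash(*fields)
        if digest not in self._by_hash:      # otherwise: conflict, do nothing
            self._by_hash[digest] = self._next_id
            self._next_id += 1
        return self._by_hash[digest]
```

Asserting the same claim with flags 4 and then flags 8 (same polarity, different stored maturity) returns one id, while flags 9 (negated) produces a separate row.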

Real rows from the live database

-- a graph edge with a freely-minted predicate (BEAM-10M extraction)
subject  ex:weaviate-client   predicate  performsAction   object_iri  ex:vector-encryption
context  ctx:claims/beam/cbcc52f9-…   flags 8 (asserted, E2)

-- a typed literal
subject  ex:cipher   predicate  rdfs:label   object_lit {"v":"cipher","dt":"xsd:string"}

-- a memory chunk carrying its session date as valid_time
subject  ctx:memory/episodic/000351d1-…   predicate  mem:episodic/chunk
object_lit {"v":"[Session date: 2023/05/22] User: I'm looking to improve…","dt":"xsd:string"}
valid_time [2023-05-22,)

-- a retracted row — closed tx_time, nothing deleted (I3)
subject ex:carol  predicate ex:loves  object_iri ex:wrong
tx_time ["2026-04-17 11:09:05.66+00","2026-04-17 11:09:05.66+00")

How a claim is born

The single write surface is donto_assert(…) (SQL) / DontoClient::assert(&StatementInput) (Rust). Inside the function: require a context → auto-create it if new → permissive mode auto-registers unknown predicates / curated mode enforces the registry + minting approval → insert with content-hash conflict handling → audit. Idempotent end to end.

select donto_assert(
    p_subject    => 'ex:weaviate-client',
    p_predicate  => 'performsAction',      -- invented freely; auto-registered
    p_object_iri => 'ex:vector-encryption',
    p_object_lit => null,
    p_context    => 'ctx:claims/beam/cbcc52f9-…',
    p_polarity   => 'asserted',
    p_maturity   => 2,                     -- E2: the machine-extractor cap
    p_valid_lo   => null, p_valid_hi => null,
    p_actor      => 'agent:donto-agent'
);  -- returns uuid; no-op if the same open claim already exists
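
The order of steps inside the write path can also be read as a small state machine. The Python model below follows the sequence described above (require context, ensure context, predicate handling by mode, conflict handling, audit). It is a simplified sketch: the `Substrate` class is hypothetical, ids are plain integers, and the minting-approval rule for curated contexts is omitted.

```python
class CuratedPredicateError(Exception):
    """Raised when a curated context sees a predicate that is not registered-active."""


class Substrate:
    def __init__(self) -> None:
        self.contexts = {}      # iri -> mode ("permissive" | "curated")
        self.predicates = {}    # iri -> status ("implicit" | "active")
        self.open_claims = {}   # identity key -> statement id
        self.audit = []         # (actor, action, statement id)

    def ensure_context(self, iri: str, mode: str = "permissive") -> None:
        self.contexts.setdefault(iri, mode)

    def assert_claim(self, subject, predicate, obj, context, actor) -> int:
        if not context:
            raise ValueError("a context is required")
        self.ensure_context(context)                           # auto-create if new
        if self.contexts[context] == "permissive":
            self.predicates.setdefault(predicate, "implicit")  # auto-register
        elif self.predicates.get(predicate) != "active":
            raise CuratedPredicateError(predicate)             # registry enforced
        key = (subject, predicate, obj, context)
        if key in self.open_claims:                            # conflict: no-op
            return self.open_claims[key]
        sid = len(self.open_claims) + 1
        self.open_claims[key] = sid
        self.audit.append((actor, "assert", sid))
        return sid
```

Calling `assert_claim` twice with the same arguments returns the same id and writes one audit entry, which is the idempotency property the paragraph describes.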

donto_context — the unit of scope

A context is where a claim lives: a named scope carrying provenance grouping, a trust mode, and a position in a hierarchy. Contexts are cheap — donto_ensure_context auto-creates them on first assert — which is why there are 68,325.

donto_context
column | type | meaning
iri | text PK | Path-shaped by convention: ctx:claims/beam/<chunk>, ctx:memory/episodic/session/…, ctx:genes/trove-cooktown/reynolds.
kind | text | source · snapshot · hypothesis · user · pipeline · trust · derivation · quarantine · custom · system · candidate. candidate holds material below E0: things an extractor merely suspects are claims; donto_promote_candidate() lifts one out (history kept).
parent | text FK→self | Primary parent; the hierarchy drives scope resolution: donto_resolve_scope() expands include/exclude sets recursively over the tree. Extra parents live in donto_context_parent.
mode | text | permissive (default; any predicate, auto-registered) vs curated (registered-active predicates only; maturity ≥ E2 needs approved minting). Live: 68,185 permissive / 140 curated.
label / metadata | text / jsonb | Human label and free-form metadata.
created_at / closed_at / created_by | timestamptz / text | Lifecycle + attribution.
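
Recursive scope expansion over the parent column can be pictured as follows. This sketch makes two assumptions that the page does not confirm: exclusions apply to whole subtrees and win over inclusions, and only the primary parent is followed (donto_context_parent is ignored). The function names and the sample IRIs are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def descendants(parent_of: Dict[str, Optional[str]], root: str) -> Set[str]:
    """root plus every context whose parent chain passes through root."""
    found = {root}
    changed = True
    while changed:
        changed = False
        for iri, parent in parent_of.items():
            if parent in found and iri not in found:
                found.add(iri)
                changed = True
    return found


def resolve_scope(parent_of: Dict[str, Optional[str]],
                  include: Iterable[str],
                  exclude: Iterable[str] = ()) -> Set[str]:
    """Expand include and exclude roots over the tree, then subtract."""
    included = set().union(*(descendants(parent_of, r) for r in include))
    excluded = set().union(*(descendants(parent_of, r) for r in exclude))
    return included - excluded
```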

The big residents: ctx:genealogy/research-db holds 21.8M statements; ctx:claims/* (21,170 contexts) holds the extraction firehose, one context per source chunk; ctx:memory/* (20,767) is the agent-memory overlay.

donto_predicate — the open-world registry

Any predicate IRI may appear on a statement; the registry exists to record, describe and align predicates, not to gate writes. 99.8% of its 1,009,054 rows are status='implicit' — minted by models on first use (performsAction, dependsOn, returns-on-true, …, some seconds old at any given moment). The registry also carries alias structure (canonical_of), shape hints (domain, range_iri), algebraic properties (is_symmetric/transitive/functional), and the minting workflow status for curated contexts. Folding the 1M predicates into usable families is the alignment engine's job — at query time, never at write time.
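
Query-time folding through canonical_of might look like the sketch below. It assumes canonical_of maps an alias to its canonical predicate and that chains may be more than one hop long; the alias pairs in the usage note are invented for illustration, and the alignment engine's real matching is not described on this page.

```python
from typing import Dict, List, Tuple


def canonical(canonical_of: Dict[str, str], predicate: str) -> str:
    """Follow canonical_of links until a predicate with no alias target is reached."""
    seen = set()
    while predicate in canonical_of and predicate not in seen:
        seen.add(predicate)              # guards against an accidental cycle
        predicate = canonical_of[predicate]
    return predicate


def match(rows: List[Tuple[str, str, str]], wanted: str,
          canonical_of: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (subject, predicate, object) rows whose predicate folds to the same canonical form."""
    target = canonical(canonical_of, wanted)
    return [r for r in rows if canonical(canonical_of, r[1]) == target]
```

With hypothetical aliases `{"depends_on": "dependsOn", "requires": "depends_on"}`, a query for `dependsOn` also returns rows stored under `requires`, while the stored rows keep whatever predicate their writer used.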

The satellite tables — metadata without widening the atom

sparse per-statement overlays (all FK → statement_id)
table | rows | meaning
donto_stmt_extraction_level | ~22.5M rows | How the claim was obtained: quoted, table_read, example_observed, source_generalization, cross_source_inference, model_hypothesis, human_hypothesis, manual_entry, registry_import, adapter_import. Governs maturity caps.
donto_stmt_modality | ~12.6M rows | Epistemic modality: descriptive, prescriptive, reconstructed, inferred, elicited, corpus_observed, oral_history, legal_holding, model_output, …
donto_evidence_link | ~2.57M rows | The anchor: statement → document / revision / span. The whole chain is documented in the evidence chain.
donto_stmt_confidence | 587 rows | Machine/human/calibrated confidence ∈ [0,1] with source + lens; an overlay, never a column on the atom (invariant I5).
donto_stmt_hypothesis_only | 1,286 rows | The I1 marker: explicitly-hypothesis claims, with rationale. Blocks maturity promotion to E2+.
donto_argument | 2,424 rows | Typed claim-to-claim edges: supports, rebuts, undercuts, endorses, supersedes, qualifies, potentially_same, same_referent, same_event. This is the contradiction machinery.
donto_stmt_lineage | 972 rows | Derived-claim provenance (which statements a derived claim came from).
donto_audit | ~11.3M rows | The append-only action trail: (at, actor, action, statement_id, detail) for assert / retract / correct / mature.

The invariants this model encodes

invariant | name | meaning
I1 | evidence or honesty | No claim without evidence or explicit hypothesis status; the hypothesis_only overlay caps maturity below E2.
I3 | no destructive overwrite | Retract/supersede by closing tx_time; corrections chain replacements; deletes don't exist.
I4 | contradictions preserved | Incompatible claims are legal coexisting rows; conflict produces argument edges and review work-items, never failed writes.
I5 | confidence ≠ maturity | Machine scores live in an overlay; extractor auto-promotion is capped at E2 in every ingest path.