The claim model
One table holds every fact donto believes: donto_statement. One row = one claim — subject, predicate, object — scoped to a context, bitemporal, paraconsistent. Everything else (evidence, confidence, arguments, identity) hangs off it in sparse satellite tables. This page documents it column by column, from the live database and the migrations that built it.
donto_statement — the atom
Created in migration 0001_core.sql. The row is deliberately narrow — the substrate never widens the 41.7M-row table; per-claim metadata lives in keyed side tables instead:
create table donto_statement (
  statement_id uuid primary key default gen_random_uuid(),
  subject      text not null,
  predicate    text not null,
  object_iri   text,  -- reference …
  object_lit   jsonb, -- … XOR value
  context      text not null references donto_context(iri),
  tx_time      tstzrange not null default tstzrange(now(), null, '[)'),
  valid_time   daterange not null default daterange(null, null, '[)'),
  flags        smallint not null default 0,
  content_hash bytea generated always as ( /* sha256, see below */ ) stored,
  constraint donto_statement_object_one_of
    check ((object_iri is not null) <> (object_lit is not null))
);

| column | type | meaning |
|---|---|---|
| statement_id | uuid PK | The claim's stable identity. 20+ foreign keys point at it. A correction produces a new id (via donto_correct) — ids are never reused or rewritten. |
| subject | text | The subject term — usually a compact IRI (ex:weaviate-client), but permissive contexts accept free text (a real live subject: "Data Types"). No foreign key: entity identity is a hypothesis resolved later, never a write-time constraint. |
| predicate | text | The relation IRI — freely invented by whoever asserts. Asserting an unknown predicate in a permissive context auto-registers it as implicit in donto_predicate. That's how the registry reached ~1M predicates. |
| object_iri | text | The object as a reference to another entity — these rows are the graph's edges. Indexed by SPO/POS/OSP btrees plus a trigram index for fuzzy lookup. |
| object_lit | jsonb | The object as a typed value: {"v": <value>, "dt": "<datatype IRI>", "lang"?} — e.g. {"v":"cipher","dt":"xsd:string"}. A check constraint enforces exactly one of object_iri / object_lit. |
| context | text FK | The only foreign key on the row — every claim lives in exactly one primary context (the unit of scoping, provenance and trust mode). Writers that don't care land in donto:anonymous. |
| tx_time | tstzrange | Transaction time: when the substrate believed this. Open upper bound = currently believed. Closing it is the only way a claim "goes away" (invariant I3) — see below. |
| valid_time | daterange | Valid time: when the claim is true in the world, at date granularity. Default unbounded ("no validity bounds claimed"). Memory chunks carry their session date here — what powers time-sliced recall and the benchmark wins on temporal questions. |
| flags | smallint | Packed polarity + maturity bitmask, decoded below. |
| content_hash | bytea generated | A stored SHA-256 over the claim's identity fields — the idempotency fingerprint. Re-ingesting the same source is a no-op. Details below. |
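
As a rough illustration of how the two object columns might be read back, the queries below use only the columns documented above. The subjects are taken from the example rows on this page; the query shapes are a sketch, not a donto API.

```sql
-- Edges: rows whose object is a reference to another entity.
select subject, predicate, object_iri
from donto_statement
where subject = 'ex:weaviate-client'
  and object_iri is not null
  and upper(tx_time) is null;           -- current belief only

-- Literals: unpack the {"v", "dt", "lang"?} envelope.
select subject,
       predicate,
       object_lit ->> 'v'    as value,
       object_lit ->> 'dt'   as datatype,
       object_lit ->> 'lang' as lang     -- null when the key is absent
from donto_statement
where subject = 'ex:cipher'
  and object_lit is not null
  and upper(tx_time) is null;
```

Because the check constraint guarantees exactly one of the two columns is set, the two queries never return the same row.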
The bitemporal pair — two independent time axes
tx_time answers "when did the system believe this?"; valid_time answers "when was this true in the world?". They are independent: a claim asserted today can be about 1873, and retracting it tomorrow closes its tx_time without touching the period it addressed. donto_retract() runs a single UPDATE … SET tx_time = [lower, now()) — never a DELETE. Queries default to current belief (upper(tx_time) IS NULL) but can time-travel with p_as_of_tx: the substrate can reconstruct what it believed at any prior moment. Live: just 282 of 41.7M rows have ever been retracted.
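The two axes can be sketched with plain range operators. The queries below assume nothing beyond the documented columns; the timestamps and the scratch table are made up for illustration, and the final UPDATE only mirrors the shape described for donto_retract() (which uses now() rather than a literal).

```sql
-- Current belief about a subject.
select statement_id, predicate, object_iri, object_lit
from donto_statement
where subject = 'ex:carol'
  and upper(tx_time) is null;

-- Transaction-time travel: what was believed at a past instant
-- (illustrative timestamp).
select statement_id, predicate, object_iri, object_lit
from donto_statement
where subject = 'ex:carol'
  and tx_time @> timestamptz '2026-04-01 00:00:00+00';

-- Valid-time slice: current beliefs that cover a given day.
-- Rows with the default unbounded valid_time match every day.
select statement_id, subject, predicate
from donto_statement
where upper(tx_time) is null
  and valid_time @> date '2023-06-01';

-- Retraction shape on a scratch table: close the range, never delete.
create temp table scratch_claim (
  id      int primary key,
  tx_time tstzrange not null
);
insert into scratch_claim values
  (1, tstzrange(timestamptz '2026-01-01 00:00:00+00', null, '[)'));

update scratch_claim
set tx_time = tstzrange(lower(tx_time), timestamptz '2026-02-01 00:00:00+00', '[)')
where id = 1
  and upper(tx_time) is null;           -- only open rows can be closed
```

After the UPDATE the row still exists; it simply stops matching the `upper(tx_time) is null` filter.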
DELETE FROM donto_statement is not part of any code path. Corrections chain a replacement row; even regulatory "true deletion" (GDPR, Indigenous cultural protocols) is implemented as blob tombstoning (migration 0142): the source bytes are destroyed, while the fact that a deletion happened stays queryable forever.
flags — polarity and maturity in one bitmask
From migration 0002_flags.sql:
| bits | field | meaning |
|---|---|---|
| bits 0–1 | polarity | 0 asserted · 1 negated · 2 absent · 3 unknown. Polarity is first-class because absence and negation are claims too — "this source denies X" and "this source is silent about X" are different, storable facts. |
| bits 2–4 | maturity | The stored evidence-maturity integer (0–7), decoded by the E-ladder below. |
| bits 5–15 | reserved | — |
Live distribution (0.5% sample): 55% asserted E0, 36% asserted E1, ~2.4% E2, ~5% E5, with a long tail including genuinely negated and unknown-polarity claims.
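The bit layout above can be unpacked with ordinary integer operators. This is a self-contained sketch on literal values, not the donto decoding functions.

```sql
select flags,
       flags & 3        as polarity,   -- bits 0-1
       (flags >> 2) & 7 as maturity    -- bits 2-4 (stored integer, not yet an E-level)
from (values (8), (9), (14)) as t(flags);
-- flags  8 -> polarity 0 (asserted), maturity 2
-- flags  9 -> polarity 1 (negated),  maturity 2
-- flags 14 -> polarity 2 (absent),   maturity 3
```

Going the other way, `(maturity << 2) | polarity` rebuilds the value, so an asserted claim at stored maturity 2 packs to 8, which matches the "flags 8 (asserted, E2)" example row further down this page.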
Maturity — the E0…E5 evidence ladder
| stored value → level | name | meaning |
|---|---|---|
| 0 → E0 | Raw | Source or extraction artefact exists; not trusted as a claim. |
| 1 → E1 | Candidate | A model, rule, or human proposed a claim. |
| 2 → E2 | Evidence-supported | The claim is grounded in a source span / row / timecode. |
| 3 → E3 | Reviewed | A domain reviewer accepted, rejected, or qualified it. |
| 5 → E4 | Corroborated | Cross-source support, or survives contradiction review. |
| 4 → E5 | Certified | Passes formal or highly structured validation. |
Always decode with donto_maturity_label() / donto_e_level(flags); never compare raw ints, because the stored integers are not in ladder order (5 decodes to E4, 4 to E5).
Governance is baked into the write path: machine confidence is not maturity (invariant I5). Extractors map confidence to maturity but are hard-capped at E2 (machine_maturity().min(2) in every ingest path). E3+ requires a human review action, and every maturity change fires an audit trigger recording who promoted what from which level to which.
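To make the decoding rule concrete, here is a minimal sketch that takes the ladder table above at face value. pg_temp.sketch_e_level is a hypothetical stand-in written for this page; it is not the real donto_e_level(), whose body is not shown here. The last query is a SQL analogue of the machine_maturity().min(2) cap.

```sql
-- Hypothetical helper (session-local): stored maturity integer -> ladder position.
create function pg_temp.sketch_e_level(p_flags int) returns int
language sql immutable as $$
  select case (p_flags >> 2) & 7
           when 0 then 0
           when 1 then 1
           when 2 then 2
           when 3 then 3
           when 5 then 4   -- stored 5 is E4 (Corroborated)
           when 4 then 5   -- stored 4 is E5 (Certified)
         end               -- stored 6 and 7 are not in the table: null
$$;

-- Raw comparison would rank these two the wrong way round.
select pg_temp.sketch_e_level(16) as stored_4,   -- 16 = 4 << 2, decodes to 5
       pg_temp.sketch_e_level(20) as stored_5;   -- 20 = 5 << 2, decodes to 4

-- Extractor cap: whatever level a machine proposes, at most E2 is written.
select proposed, least(proposed, 2) as written
from (values (0), (1), (3), (5)) as t(proposed);
-- written: 0, 1, 2, 2
```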
content_hash — idempotent ingestion
A stored, generated SHA-256 over subject ␟ predicate ␟ object ␟ context ␟ polarity-bits ␟ valid_time. Two details matter: only the polarity bits of flags participate (re-asserting the same claim at a different maturity dedups onto the existing row instead of forking it), and the unique index is partial — WHERE upper(tx_time) IS NULL — so the same content can exist many times historically but only once as current belief. donto_assert does ON CONFLICT … DO NOTHING and returns the existing row's id: re-ingesting a source is free.
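The interplay of the hash and the partial unique index can be tried on a scratch table. Everything below is an assumption made for illustration: the real generated expression is not reproduced on this page, so the field serialization in pg_temp.sketch_hash, the scratch table, and the index name are all hypothetical.

```sql
-- Hypothetical fingerprint: fields joined by U+241F, polarity bits only.
create function pg_temp.sketch_hash(
  s text, p text, o text, c text, f int, vt daterange
) returns bytea language sql as $$
  select sha256(convert_to(
    concat_ws('␟', s, p, o, c, (f & 3)::text, vt::text), 'UTF8'))
$$;

create temp table scratch_stmt (
  subject      text not null,
  predicate    text not null,
  object_iri   text not null,
  context      text not null,
  flags        smallint not null,
  tx_time      tstzrange not null default tstzrange(now(), null, '[)'),
  content_hash bytea not null
);

-- Unique only among open rows: history may repeat, current belief may not.
create unique index scratch_stmt_open_hash
  on scratch_stmt (content_hash)
  where upper(tx_time) is null;

-- First assert at stored maturity 2 (flags 8).
insert into scratch_stmt (subject, predicate, object_iri, context, flags, content_hash)
values ('ex:weaviate-client', 'performsAction', 'ex:vector-encryption', 'donto:anonymous', 8,
        pg_temp.sketch_hash('ex:weaviate-client', 'performsAction', 'ex:vector-encryption',
                            'donto:anonymous', 8, daterange(null, null, '[)')))
on conflict do nothing;

-- Same claim at stored maturity 1 (flags 4): same polarity bits, same hash, skipped.
insert into scratch_stmt (subject, predicate, object_iri, context, flags, content_hash)
values ('ex:weaviate-client', 'performsAction', 'ex:vector-encryption', 'donto:anonymous', 4,
        pg_temp.sketch_hash('ex:weaviate-client', 'performsAction', 'ex:vector-encryption',
                            'donto:anonymous', 4, daterange(null, null, '[)')))
on conflict do nothing;

select count(*) from scratch_stmt;   -- 1
```

If the open row were closed first (its tx_time given an upper bound), the second insert would succeed, since the closed row no longer participates in the partial index.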
Real rows from the live database
-- a graph edge with a freely-minted predicate (BEAM-10M extraction)
subject ex:weaviate-client predicate performsAction object_iri ex:vector-encryption
context ctx:claims/beam/cbcc52f9-… flags 8 (asserted, E2)
-- a typed literal
subject ex:cipher predicate rdfs:label object_lit {"v":"cipher","dt":"xsd:string"}
-- a memory chunk carrying its session date as valid_time
subject ctx:memory/episodic/000351d1-… predicate mem:episodic/chunk
object_lit {"v":"[Session date: 2023/05/22] User: I'm looking to improve…","dt":"xsd:string"}
valid_time [2023-05-22,)
-- a retracted row — closed tx_time, nothing deleted (I3)
subject ex:carol predicate ex:loves object_iri ex:wrong
tx_time ["2026-04-17 11:09:05.66+00","2026-04-17 11:09:05.66+00")
How a claim is born
The single write surface is donto_assert(…) (SQL) / DontoClient::assert(&StatementInput) (Rust). Inside the function: require a context → auto-create it if new → permissive mode auto-registers unknown predicates / curated mode enforces the registry + minting approval → insert with content-hash conflict handling → audit. Idempotent end to end.
select donto_assert(
  p_subject    => 'ex:weaviate-client',
  p_predicate  => 'performsAction',       -- invented freely; auto-registered
  p_object_iri => 'ex:vector-encryption',
  p_object_lit => null,
  p_context    => 'ctx:claims/beam/cbcc52f9-…',
  p_polarity   => 'asserted',
  p_maturity   => 2,                      -- E2: the machine-extractor cap
  p_valid_lo   => null, p_valid_hi => null,
  p_actor      => 'agent:donto-agent'
); -- returns uuid; no-op if the same open claim already exists
donto_context — the unit of scope
A context is where a claim lives: a named scope carrying provenance grouping, a trust mode, and a position in a hierarchy. Contexts are cheap — donto_ensure_context auto-creates them on first assert — which is why there are 68,325.
| column | type | meaning |
|---|---|---|
| iri | text PK | Path-shaped by convention: ctx:claims/beam/<chunk>, ctx:memory/episodic/session/…, ctx:genes/trove-cooktown/reynolds. |
| kind | text | source · snapshot · hypothesis · user · pipeline · trust · derivation · quarantine · custom · system · candidate. candidate holds material below E0 — things an extractor merely suspects are claims; donto_promote_candidate() lifts one out (history kept). |
| parent | text FK→self | Primary parent; the hierarchy drives scope resolution — donto_resolve_scope() expands include/exclude sets recursively over the tree. Extra parents live in donto_context_parent. |
| mode | text | permissive (default; any predicate, auto-registered) vs curated (registered-active predicates only; maturity ≥ E2 needs approved minting). Live: 68,185 permissive / 140 curated. |
| label / metadata | text / jsonb | Human label and free-form metadata. |
| created_at / closed_at / created_by | timestamptz / text | Lifecycle + attribution. |
The big residents: ctx:genealogy/research-db holds 21.8M statements; ctx:claims/* (21,170 contexts) holds the extraction firehose, one context per source chunk; ctx:memory/* (20,767) is the agent-memory overlay.
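The parent column described in the table above lends itself to a recursive walk. The sketch below is not donto_resolve_scope(): it follows only the primary parent, ignores donto_context_parent, assumes the tree is acyclic, and starts from a hypothetical root IRI.

```sql
-- All contexts at or below a starting context, via the primary parent only.
with recursive subtree as (
  select iri, parent, 0 as depth
  from donto_context
  where iri = 'ctx:memory'             -- hypothetical root IRI
  union all
  select c.iri, c.parent, s.depth + 1
  from donto_context c
  join subtree s on c.parent = s.iri
)
select iri, depth
from subtree
order by depth, iri;

-- Permissive vs curated split, as quoted in the mode row.
select mode, count(*) as contexts
from donto_context
group by mode;
```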
donto_predicate — the open-world registry
Any predicate IRI may appear on a statement; the registry exists to record, describe and align predicates, not to gate writes. 99.8% of its 1,009,054 rows are status='implicit' — minted by models on first use (performsAction, dependsOn, returns-on-true, …, some seconds old at any given moment). The registry also carries alias structure (canonical_of), shape hints (domain, range_iri), algebraic properties (is_symmetric/transitive/functional), and the minting workflow status for curated contexts. Folding the 1M predicates into usable families is the alignment engine's job — at query time, never at write time.
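A quick way to inspect the registry, assuming the column names quoted above (status, canonical_of) are literal and that canonical_of is null for predicates that are not aliases:

```sql
-- How the registry splits by workflow status.
select status, count(*) as predicates
from donto_predicate
group by status
order by predicates desc;

-- How many rows are aliases pointing at a canonical predicate (assumed semantics).
select count(*) as aliased
from donto_predicate
where canonical_of is not null;
```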
The satellite tables — metadata without widening the atom
| table | live rows | what it records |
|---|---|---|
| donto_stmt_extraction_level | ~22.5M rows | How the claim was obtained: quoted, table_read, example_observed, source_generalization, cross_source_inference, model_hypothesis, human_hypothesis, manual_entry, registry_import, adapter_import. Governs maturity caps. |
| donto_stmt_modality | ~12.6M rows | Epistemic modality: descriptive, prescriptive, reconstructed, inferred, elicited, corpus_observed, oral_history, legal_holding, model_output, … |
| donto_evidence_link | ~2.57M rows | The anchor: statement → document / revision / span. The whole chain is documented in the evidence chain. |
| donto_stmt_confidence | 587 rows | Machine/human/calibrated confidence ∈ [0,1] with source + lens — an overlay, never a column on the atom (invariant I5). |
| donto_stmt_hypothesis_only | 1,286 rows | The I1 marker: explicitly-hypothesis claims, with rationale. Blocks maturity promotion to E2+. |
| donto_argument | 2,424 rows | Typed claim-to-claim edges: supports, rebuts, undercuts, endorses, supersedes, qualifies, potentially_same, same_referent, same_event — the contradiction machinery. |
| donto_stmt_lineage | 972 rows | Derived-claim provenance (which statements a derived claim came from). |
| donto_audit | ~11.3M rows | The append-only action trail: (at, actor, action, statement_id, detail) for assert / retract / correct / mature. |
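
The audit columns listed in the last row are enough to reconstruct one claim's history. A hedged sketch, with a placeholder statement id:

```sql
-- Every recorded action on one statement, oldest first.
select "at", actor, action, detail
from donto_audit
where statement_id = '00000000-0000-0000-0000-000000000000'  -- placeholder id
order by "at";

-- Volume per action type (assert / retract / correct / mature).
select action, count(*) as events
from donto_audit
group by action
order by events desc;
```

The column name at is quoted here purely as a precaution, since AT is also a SQL keyword.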
The invariants this model encodes
| invariant | rule | meaning |
|---|---|---|
| I1 | evidence or honesty | No claim without evidence or explicit hypothesis status — the hypothesis_only overlay caps maturity below E2. |
| I3 | no destructive overwrite | Retract/supersede by closing tx_time; corrections chain replacements; deletes don't exist. |
| I4 | contradictions preserved | Incompatible claims are legal coexisting rows; conflict produces argument edges and review work-items, never failed writes. |
| I5 | confidence ≠ maturity | Machine scores live in an overlay; extractor auto-promotion is capped at E2 in every ingest path. |
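
Because I4 lets incompatible claims coexist, they can be found with an ordinary self-join. The query below is a sketch over the documented columns only; it pairs currently believed asserted and negated rows about the same subject, predicate and object, in any context. The subject filter is illustrative and keeps the join small.

```sql
select a.statement_id as asserted_id,
       n.statement_id as negated_id,
       a.predicate,
       a.context      as asserted_in,
       n.context      as negated_in
from donto_statement a
join donto_statement n
  on  n.subject   = a.subject
  and n.predicate = a.predicate
  and n.object_iri is not distinct from a.object_iri
  and n.object_lit is not distinct from a.object_lit
where a.subject = 'ex:carol'            -- illustrative filter
  and (a.flags & 3) = 0                 -- asserted
  and (n.flags & 3) = 1                 -- negated
  and upper(a.tx_time) is null          -- both still believed
  and upper(n.tx_time) is null
limit 20;
```

Both rows in each pair remain valid writes; resolving them is the job of the argument edges and review work-items, not of the insert path.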