
How donto works

These pages document the live system — the real tables, the real pipelines, the real queries — not an idealized design. donto is a contradiction-preserving, evidence-first claim substrate: it stores every claim anyone (human or model) has ever asserted, keeps the disagreements, anchors everything to its source, and works out what is true at query time instead of at write time.

live statements: 41.5M · contexts: 65K+ · freely-minted predicates: ~1M · evidence links: 2.5M+ · vectors (predicates + entities + chunks): 1.4M+ · tables: 127

The mental model in four moves

Everything in these docs reduces to four design moves. If you hold these, every table and pipeline below will make sense:

Everything is a claim, and claims are cheap

A claim is subject — predicate → object, where the predicate is invented freely by whoever asserts it. No schema gatekeeps writes. A frontier model can emit hundreds of claims about anything for ~$0.0001 each — donto is built for that firehose, not against it.
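
To make the shape concrete, here is a minimal sketch of a claim as a plain record, with writes that accept any predicate string. The class name, field names, and the `assert_claims` helper are illustrative assumptions, not donto's actual API or column names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    subject: str
    predicate: str  # free-form: invented by whoever asserts it
    obj: str
    context: str    # hypothetical field naming who asserted the claim


def assert_claims(store: list, claims: list) -> int:
    """Append every claim as-is; no schema or predicate registry is consulted."""
    store.extend(claims)
    return len(claims)


store = []
written = assert_claims(store, [
    # Two sources coin different predicates for the same relation.
    Claim("ex:ada", "bornIn", "ex:london", "ctx:source-a"),
    Claim("ex:ada", "place_of_birth", "ex:london", "ctx:source-b"),
])
print(written, [c.predicate for c in store])
```

Both predicates are stored verbatim; nothing at write time decides that they mean the same thing.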

Contradictions are kept, not resolved

When two sources disagree, both claims stay live forever (paraconsistency). Deleting a statement is forbidden by invariant I3 — you retract or supersede by closing its transaction-time interval, never by DELETE. Disagreement is data.
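
A minimal sketch of what closing a transaction-time interval can look like, assuming each statement carries an open-ended `[tx_lo, tx_hi)` interval. The field names and helper functions are hypothetical and are not taken from the donto schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Statement:
    subject: str
    predicate: str
    obj: str
    tx_lo: datetime                   # when the row became visible
    tx_hi: Optional[datetime] = None  # None means the interval is still open


def retract(stmt: Statement, at: datetime) -> None:
    """Close the interval instead of deleting the row."""
    if stmt.tx_hi is None:
        stmt.tx_hi = at


def live_as_of(statements: list, at: datetime) -> list:
    """Return the statements whose transaction-time interval contains `at`."""
    return [s for s in statements
            if s.tx_lo <= at and (s.tx_hi is None or at < s.tx_hi)]


t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 6, 1, tzinfo=timezone.utc)
t3 = datetime(2025, 1, 1, tzinfo=timezone.utc)

a = Statement("ex:ada", "bornIn", "ex:london", tx_lo=t1)
b = Statement("ex:ada", "bornIn", "ex:paris", tx_lo=t1)  # contradicts a
statements = [a, b]

both_live = live_as_of(statements, t1)  # disagreement alone closes nothing
retract(b, t2)                          # an explicit retraction, not a DELETE
after = live_as_of(statements, t3)
```

The disagreement between `a` and `b` never closes an interval by itself; only the explicit `retract` call does. Even then the row remains in `statements`, so a query as of `t1` still sees both claims.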

Every claim is anchored — or honestly marked

Each claim either links to an evidence span (a character range in a stored, content-addressed source document) or is explicitly flagged as hypothesis. You can always walk from a fact back to the exact sentence that produced it.
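
The walk from a fact back to its sentence can be sketched as follows, assuming a span is a half-open character range into a document keyed by its SHA-256 digest. The in-memory `blobs` dictionary and the names `EvidenceSpan`, `store_blob`, and `quote` are stand-ins invented for this example.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvidenceSpan:
    blob_sha256: str  # content address of the source document
    start: int        # first character of the span
    end: int          # one past the last character


blobs = {}  # stand-in for content-addressed storage


def store_blob(text: str) -> str:
    """Store the text under the SHA-256 of its UTF-8 bytes and return the digest."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    blobs[digest] = text
    return digest


def quote(span: Optional[EvidenceSpan]) -> Optional[str]:
    """Return the cited text, or None when the claim is flagged as hypothesis."""
    if span is None:
        return None
    return blobs[span.blob_sha256][span.start:span.end]


source = "Ada Lovelace was born in London in 1815. She died in 1852."
digest = store_blob(source)
sentence = "Ada Lovelace was born in London in 1815."
start = source.index(sentence)
span = EvidenceSpan(digest, start, start + len(sentence))
print(quote(span))
```

Because the address is derived from the content, the same document always resolves to the same digest, and a span stays valid for as long as that exact revision is stored.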

Truth is computed at query time

Synonymous predicates fold together by embedding similarity, identity between records is a hypothesis you resolve on demand, and confidence re-ranks as new evidence lands. Nothing is normalized away at write time, so nothing is ever lost.
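
One way to picture predicate folding is to group predicates whose embedding vectors have a cosine similarity above a threshold. The sketch below uses toy 3-dimensional vectors, a made-up threshold of 0.95, and a simple union-find grouping; all of these are assumptions for illustration, not the alignment daemon's actual procedure.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def fold(vectors: dict, threshold: float) -> list:
    """Group predicates whose pairwise similarity reaches the threshold."""
    parent = {p: p for p in vectors}

    def find(p: str) -> str:
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    names = sorted(vectors)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for p in names:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())


# Toy embeddings; the values are invented for the example.
vectors = {
    "bornIn": [0.90, 0.10, 0.00],
    "place_of_birth": [0.88, 0.15, 0.02],
    "diedIn": [0.10, 0.90, 0.00],
}
groups = fold(vectors, threshold=0.95)
print(groups)
```

The grouping is computed from the vectors on demand. The stored predicates are untouched, so a different threshold or a better embedding model can produce a different fold later without any loss.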

The life of one fact

Here is the whole system in one trace — what happens when a source document enters donto and a question later finds it:

1. STORE     the source → donto_blob (SHA-256, GCS) → donto_document_revision
2. EXTRACT   an LLM emits free-form claims about it     (donto-extract / donto-agent)
3. CITE      each claim gets an evidence span — or is flagged hypothesis (the citer)
4. ASSERT    claims land as rows in donto_statement     (bitemporal, paraconsistent)
5. EMBED     its predicates / entities / text become 384-d vectors  (the fabric)
6. ALIGN     the daemon proposes predicate folds + identity hypotheses from vectors
7. QUERY     recall = FTS + vectors + alignment closure, fused; evidence attached
8. RE-RANK   contradictions surface side by side; maturity/confidence move over time
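
Step 7 says only that the three recall signals are fused; it does not say how. As an illustration, the sketch below uses reciprocal rank fusion, a common technique swapped in here as an assumption rather than a description of donto's method. The statement ids, the three ranked lists, and the constant `k=60` are all invented for the example.

```python
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse ranked id lists: each list contributes 1 / (k + rank) per id."""
    scores = {}
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken by id so the order is deterministic.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


fts_hits = ["s1", "s2", "s3"]    # hypothetical full-text search ranking
vector_hits = ["s2", "s4", "s1"] # hypothetical vector-similarity ranking
closure_hits = ["s2", "s5"]      # hypothetical alignment-closure ranking

fused = reciprocal_rank_fusion([fts_hits, vector_hits, closure_hits])
print(fused)
```

A statement that ranks well in several signals, such as `s2` here, rises above one that appears in a single list. Rank-based fusion also avoids having to put FTS scores and vector distances on a common scale.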

Where to go deeper

Step 4 is the claim model; steps 1–3 are the evidence chain; step 5 is embeddings; step 6 is alignment & identity; steps 7–8 are how it solves things.

What physically exists

One Postgres 16 instance (donto-pg) holds the entire substrate — 127 tables in twelve families: the claim core, contexts, evidence + documents, extraction bookkeeping, embeddings + alignment, identity + coreference, the memory overlay (donto_x_*), frames + events, annotations, policy + access, inference, and ops telemetry. The schema reference documents every one. Around it run a small set of services: the Rust substrate API, the extraction engine, the alignment daemon, the embedding coordinator (plus volunteer machines via donto.org/help), and the consumer APIs (memory, genealogy).

The documentation

The claim model

donto_statement, column by column

The evidence chain

fact → span → revision → blob

Embeddings

what gets embedded, how, and why

Alignment & identity

truth at query time

How it solves things

question → answer, end to end