A claim substrate for the age of generative abundance

Let models say everything.
Let reality decide what survives.

donto is a bitemporal, paraconsistent, evidence-first claim substrate, written in Rust on Postgres. Modern models emit an unbounded firehose of typed claims about anything — inventing the predicates as they go, for a fraction of a cent each. donto holds that firehose without collapsing it: contradictions are legal state, every claim is anchored to its source, and typing, alignment and identity are deferred to query time. A knowledge base that grows in all directions and prunes by reality.


The living substrate (live)

41.6M+ statements
1.0M+ freely-minted predicates
66,777 contexts
2.5M evidence links
6.4M contested claim-pairs (held, not deleted)
341 consumer namespaces
2,433 argument edges

Predicates the models invented — highest volume right now

ex:knownAs 1.1M
mem:episodic/chunk 485K
ex:knownAtLocation 331K
ex:normalized_claims/text_span 278K
ex:datePrecision 277K
ex:whenText 245K
ex:meta/description 164K
ex:locatedIn 156K

The thesis

Generative abundance changes the shape of the problem

For fifty years, knowledge graphs were scarce: every triple was expensive to author, so the schema came first and the facts came slowly. Large models invert that economy — GPTKB pulled 105M typed triples out of one mid-tier model at roughly $0.00009 per claim; AutoSchemaKG built a 900M-node graph with no predefined schema at all; and the cost keeps falling roughly an order of magnitude a year. The bottleneck is no longer extraction. It's trust.

Emit freely

No fixed schema, no pre-typed predicates. Models invent the predicate they need (~1M minted so far) and assert in every direction — that's the signature of abundance, not a bug to suppress.

Defer to query time

Typing, alignment, identity resolution and joining are not write-time gates. They're query-time judgments — composed on demand, reversible, never destroying the raw claim.

Prune by reality

The substrate stores contradiction paraconsistently and lets a claim's evidence, corroboration and lifecycle decide its standing. Reality is the verifier — not a curator.

The moat

The claim lifecycle — not the schema — is the product

Anyone can extract facts. The durable advantage is what happens after: how a claim earns or loses standing over time, with its evidence intact and its contradictions preserved. donto is built around an eight-step lifecycle.

Step 1: Ingest
Pull any source: a paper, a deed, a resume, a conversation. Register it as an immutable, content-addressed document the substrate can always retrieve.

Step 2: Emit free
The model emits unbounded claims, inventing predicates and axes as it goes. The only write-time invariant: an evidence anchor, or an explicit hypothesis flag.

Step 3: Hold incompatible
Contradictory claims are stored side-by-side as legal bitemporal state. A conflict is data, not a failed write.

Step 4: Hypothesize
Typed relationship hypotheses are proposed where lenses intersect: connections no schema author would have pre-typed.

Step 5: Attach evidence
Supports, rebuts, undercuts, qualifies: argument edges carry evidence and counter-evidence with full provenance.

Step 6: Rank
Claims are scored by value (information gain, novelty, downstream task-lift), not by accuracy alone.

Step 7: Re-rank in time
New evidence re-scores old hypotheses. The bitemporal record means standing compounds; nothing is frozen at write time.

Step 8: Explain
Answers are explained only from evidence already attached: faithful by construction, never narrative first.
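
A minimal sketch of steps 2 and 3, assuming a hypothetical in-memory `ClaimLog` rather than donto's actual Rust API or Postgres schema. The class names, the anchor string format, and the rule "same subject and predicate, different object" for spotting a contested pair are all illustrative assumptions. It shows the single write-time gate and contradictory claims being appended instead of rejected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    subject: str
    predicate: str                   # free-form; no predicate registry is consulted
    obj: str
    evidence: Optional[str] = None   # hypothetical anchor format, e.g. "doc:1#0-20"
    hypothesis: bool = False


class ClaimLog:
    """Append-only list of claims; a toy stand-in for the substrate."""

    def __init__(self) -> None:
        self._rows: list[Claim] = []

    def assert_claim(self, claim: Claim) -> None:
        # The only write-time check: an evidence anchor or a hypothesis flag.
        if claim.evidence is None and not claim.hypothesis:
            raise ValueError("claim needs an evidence anchor or a hypothesis flag")
        # Conflicting claims are appended like any other; nothing is overwritten.
        self._rows.append(claim)

    def contested(self) -> list[tuple[Claim, Claim]]:
        # Assumed conflict rule: same subject and predicate, different object.
        pairs = []
        for i, a in enumerate(self._rows):
            for b in self._rows[i + 1:]:
                same_key = (a.subject, a.predicate) == (b.subject, b.predicate)
                if same_key and a.obj != b.obj:
                    pairs.append((a, b))
        return pairs
```

A real conflict rule would have to account for multi-valued predicates and for time; the point here is only that detection happens on read, after both claims were accepted.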

Non-negotiable

Ten invariants, enforced in code

These aren't aspirations in a slide deck. Each invariant is enforced at the schema and API layer and guarded by 80 dedicated invariant test suites that run in CI — the substrate refuses to be a normal database.

I1. No claim without evidence
or an explicit hypothesis flag: the single write-time gate.

I2. No restricted source without policy
unknown policy defaults to restricted-pending-review, never public.

I3. No destructive overwrite
every correction, retraction and merge is append-only; any past state is reconstructable.

I4. Contradictions are preserved
incompatible claims produce argument edges and review obligations, not failed writes.

I5. Machine confidence is not maturity
a model's self-confidence can never promote a claim; standing is earned by evidence and review.

I6. Governance propagates to derivatives
derived claims inherit the most restrictive policy of their sources.

I7. Schema mappings are typed and scoped
no default 'sameness'; every alignment carries a relation type, scope and safety flags.

I8. Identity is a hypothesis, not a foreign key
same-person, same-place, same-concept are contested claims you query under a lens.

I9. Adapters must report information loss
any import or export that can't carry contradiction, time or governance says so, structurally.

I10. A release is a reproducible view
a named query plus policy, source and checksum manifests, never an ad-hoc export.
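
Two of the invariants above (unknown policy defaults to restricted-pending-review, and derived claims inherit the most restrictive policy of their sources) can be sketched as a small ordering over policy levels. Only `public` and `restricted-pending-review` appear in the text; the other level names, their ordering, and the function names are assumptions made for illustration.

```python
# Hypothetical policy levels, ordered from least to most restrictive.
# Only "public" and "restricted-pending-review" come from the text above.
RESTRICTIVENESS = {
    "public": 0,
    "internal": 1,
    "restricted-pending-review": 2,
    "restricted": 3,
}
DEFAULT_POLICY = "restricted-pending-review"


def source_policy(policy):
    """Resolve a source's policy; anything unknown or missing is never public."""
    return policy if policy in RESTRICTIVENESS else DEFAULT_POLICY


def derived_policy(source_policies):
    """A derived claim takes the most restrictive policy among its sources."""
    resolved = [source_policy(p) for p in source_policies]
    if not resolved:
        return DEFAULT_POLICY
    return max(resolved, key=RESTRICTIVENESS.__getitem__)
```

Under these assumptions, a claim derived from one public source and one source with no recorded policy resolves to `restricted-pending-review`.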

Running today

The machinery is live

Every piece of the thesis has a running counterpart on the substrate — watchable in real time at scanner.donto.org.

The alignment engine

“Defer to query time” is shipped, not a slide. An embedding fabric — 919K predicate vectors and 338K entity fingerprints (bge-small, HNSW-indexed) — plus a continuous alignment daemon proposes, adjudicates and materializes a 1M-row predicate closure. killedBy meets murderedBy at cosine 0.95 without anyone maintaining a synonym table.

pgvector · 1.26M vectors · 1M closure rows · query-time folding
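
A toy sketch of the proposal step, assuming predicates already have embedding vectors. It compares every pair by cosine similarity and proposes those above a threshold. The three-dimensional vectors, the 0.9 threshold, and the function names are made up for illustration; the description above mentions HNSW-indexed search and an adjudication stage, neither of which this brute-force loop attempts.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propose_alignments(vectors, threshold=0.9):
    """Return (predicate, predicate, similarity) for pairs at or above threshold."""
    names = sorted(vectors)
    proposals = []
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            score = cosine(vectors[p], vectors[q])
            if score >= threshold:
                proposals.append((p, q, score))
    return proposals


# Made-up vectors; real embeddings would come from a model such as bge-small.
toy_vectors = {
    "ex:killedBy": [0.9, 0.1, 0.4],
    "ex:murderedBy": [0.8, 0.2, 0.5],
    "ex:bornIn": [0.1, 0.9, 0.0],
}
proposals = propose_alignments(toy_vectors)
```

With these toy values only the `ex:killedBy` / `ex:murderedBy` pair clears the threshold, so no synonym table is needed to connect them.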

The always-on citer

Extraction (what was claimed) and anchoring (where in the source) are separate stages. Every extracted fact is post-processed by a semantic citer that attaches the exact evidence span — or honestly flags the claim as interpretation, never a bogus span. It separates what a source stated from what a model inferred, and doubles as a hallucination filter.

stated vs interpreted · zero bogus spans
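
The stated-versus-interpreted split can be illustrated with a deliberately simplified anchoring function. The citer described above is semantic; this sketch swaps in exact substring matching, which is much weaker, purely to show the contract: return a real span or flag the claim as interpretation, never a guessed span. The function name and return shape are assumptions.

```python
def cite(source_text, claimed_quote):
    """Anchor a quote to a character span, or flag it as interpretation.

    Exact substring match only; a semantic citer would also accept paraphrases.
    """
    if not claimed_quote:
        return {"kind": "interpreted", "span": None}
    start = source_text.find(claimed_quote)
    if start == -1:
        # No span is invented when the text cannot be located.
        return {"kind": "interpreted", "span": None}
    return {"kind": "stated", "span": (start, start + len(claimed_quote))}
```

Because a span is only returned when the quote is actually found, slicing the source with it always reproduces the quoted text.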

The gleaning loop

Models stop early by choice, not capacity. The extraction harness re-prompts the same source until saturation — one article went from 511 facts to 3,227 — and stops only after consecutive dry passes, because a count floor makes models pad with garbage. Saturation decides done; meaningful coverage is the goal.

511 → 3,227 facts · saturation-stopped
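
The stopping rule can be sketched as a loop that keeps asking for more facts and stops after a run of consecutive passes that add nothing new. The `extract_pass` callable is a hypothetical stand-in for one model call over the same source; the default of two dry passes and the `max_passes` safety cap are assumptions, not values from the harness.

```python
def glean(extract_pass, max_dry_passes=2, max_passes=50):
    """Re-run extraction until `max_dry_passes` consecutive passes add nothing.

    `extract_pass(seen)` receives the facts found so far and returns an
    iterable of facts; it stands in for a re-prompt of the model.
    """
    seen = set()
    dry = 0
    passes = 0
    while dry < max_dry_passes and passes < max_passes:
        new = set(extract_pass(seen)) - seen
        passes += 1
        if new:
            seen |= new
            dry = 0          # any new fact resets the saturation counter
        else:
            dry += 1
    return seen, passes
```

There is no minimum fact count anywhere in the loop, which matches the point above: the loop ends on saturation, not on hitting a target number.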

One engine, many lanes

donto-extract is a single extraction engine with eight swappable model lanes — a declarative registry where each lane's caps and failure signatures are data, not if/else. Capped lanes rotate out automatically, pool-aware. The substrate doesn't care which model emitted a claim; the citer and the lifecycle hold every lane to the same evidence standard.

donto-extract · 8 lanes · auto-failover · injection-hardened
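
One way to picture "caps and failure signatures are data, not if/else" is a registry of lane records that the selection code walks without knowing any lane by name. The field names, lane names, caps, and signature strings below are all hypothetical, and pool awareness is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Lane:
    name: str
    cap: int                     # hypothetical request cap per window
    failure_signatures: tuple    # lowercase substrings that mean "rotate out"
    used: int = 0

    def capped(self):
        return self.used >= self.cap


def pick_lane(lanes):
    """Return the first lane that still has capacity, or None if all are capped."""
    for lane in lanes:
        if not lane.capped():
            return lane
    return None


def record_result(lane, response_text):
    """Update a lane after a call; returns False if the lane should rotate out."""
    text = response_text.lower()
    if any(sig in text for sig in lane.failure_signatures):
        lane.used = lane.cap     # treat a matched signature as hitting the cap
        return False
    lane.used += 1
    return True
```

Adding a lane under this design means adding a record to the list; `pick_lane` and `record_result` stay unchanged.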

DontoQL

A query language with dimensions SQL doesn't have

Querying contested knowledge needs more than triples. DontoQL — implemented, with a SPARQL 1.1 subset compiling to the same engine — makes the substrate's dimensions first-class:

Predicate expansion — PREDICATES EXPAND folds the learned alignment closure, so a question asked in your vocabulary finds claims minted in any other.
Identity lenses — query under strict, cluster or transitive same-as identity; the merge is a per-query choice, never a destructive write.
Bitemporal travel — AS_OF and TRANSACTION_TIME AS_OF reconstruct what was true, and what the system believed, at any moment.
Policy-aware — POLICY ALLOWS filters by governance before content is ever touched.
Contradiction-ordered — sort by contradiction pressure to surface exactly where sources disagree.

a real query, run against the live substrate

MATCH ?person ex:diedAt ?place
SCOPE include ctx:genealogy
PREDICATES EXPAND
IDENTITY_LENS clusters
POLICY ALLOWS read_content
ORDER_BY contradiction_pressure DESC
LIMIT 25

One query: scoped to a context forest, predicate-expanded through the alignment closure, identity resolved under a chosen lens, policy-filtered, and ordered by where the evidence fights itself.
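
The two time axes behind AS_OF and TRANSACTION_TIME AS_OF can be sketched with rows that carry a valid-time interval (when the claim holds in the world) and a transaction-time interval (when the system held that belief). This is a generic bitemporal sketch with hypothetical field names and half-open intervals, not donto's schema; a superseded belief is modelled here by a closed `tx_to` on the older row.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Row:
    claim: str
    valid_from: date
    valid_to: Optional[date]   # None means still valid
    tx_from: date              # when the system recorded this belief
    tx_to: Optional[date]      # None means this is the current belief


def _covers(start, end, t):
    # Half-open interval [start, end)
    return start <= t and (end is None or t < end)


def as_of(rows, valid_at, tx_at):
    """Claims valid at `valid_at`, as the system believed them at `tx_at`."""
    return [
        r.claim
        for r in rows
        if _covers(r.valid_from, r.valid_to, valid_at)
        and _covers(r.tx_from, r.tx_to, tx_at)
    ]


# On 2026-04-01 the system learns the API moved to Rust on 2026-03-01.
rows = [
    Row("api runsOn Python", date(2020, 1, 1), None, date(2020, 1, 1), date(2026, 4, 1)),
    Row("api runsOn Python", date(2020, 1, 1), date(2026, 3, 1), date(2026, 4, 1), None),
    Row("api runsOn Rust", date(2026, 3, 1), None, date(2026, 4, 1), None),
]
```

Asking about mid-March 2026 gives different answers depending on the transaction time: before the correction was recorded the system still believed Python, and afterwards it answers Rust, with the earlier belief still reconstructable.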

Measured, not claimed

Benchmarked on LongMemEval — reported honestly

donto-memory, the agent-memory consumer built on the substrate, was run through LongMemEval (ICLR 2025), the standard long-term-memory benchmark, under audited no-leakage conditions. The honest headline: where a whole history fits in a frontier model's context, raw accuracy ties a full-context reader. The substrate earns its keep on retrieval quality, token cost, knowledge-update and abstention, the things that survive when histories outgrow any context window.

0.98 retrieval hit@10 on LongMemEval_s, up from 0.85 lexical-only; the hybrid vector arm is load-bearing
0.933 answer accuracy on a stratified LongMemEval_s sample, about 1.3 points below the 0.946 oracle ceiling
~2× lower token cost than handing the reader the full history
1.0 abstention on unanswerable questions; evidence-first means knowing when not to answer

Full methodology, baselines and the uncomfortable parts in the LongMemEval study.
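
For readers unfamiliar with the retrieval metric, hit@k is conventionally 1 for a question if any relevant item appears in the top k retrieved results and 0 otherwise, averaged over questions. The sketch below implements that conventional definition; the study's exact scoring may differ in details, and the function names are illustrative.

```python
def hit_at_k(ranked_ids, relevant_ids, k=10):
    """1.0 if any relevant id appears in the top-k ranked ids, else 0.0."""
    return 1.0 if set(ranked_ids[:k]) & set(relevant_ids) else 0.0


def mean_hit_at_k(runs, k=10):
    """Average hit@k over (ranked_ids, relevant_ids) pairs, one per question."""
    return sum(hit_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)
```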

What it makes possible

Relationships no one ever thought to type

Point a model at the same entity through ten different lenses — philosophical, linguistic, temporal, causal, social, material — and it will emit properties and edges a hand-built schema would never have anticipated. You don't pre-type them. You let them accumulate, and resolve the joins when a question needs them.

Philosophical: essence, identity-over-time, mereology
Linguistic: sense, register, etymology, translation drift
Temporal: validity intervals, succession, anachronism
Causal: enables, prevents, is-evidence-for
Social: witness, sponsor, neighbor, FAN networks
Material: composition, provenance, location-over-time

Built on donto

One substrate, many consumers

donto stays infrastructure. Everything else is an example of binding a domain to it — proof that the same substrate serves wildly different consumers.

memory.donto.org

Persistent memory for agents

Every message becomes anchored, recallable claims. Hybrid lexical + vector recall, bitemporal knowledge-update, evidence-first abstention — benchmarked on LongMemEval. Speaks MCP, so any agent can plug in.

/memorize · /recall · /search · MCP

genealogy.donto.org

Evidence-first family research

The hardest test of a claim substrate: contradictory sources, century-old records, identity that is itself a hypothesis. Every fact retains its source snippet and the full resource behind it.

sources · claims · resources

scanner.donto.org

The substrate, watched live

A real-time monitor of the substrate itself: contexts as sectors, claims arriving as packets, contradictions surfacing as they're detected. The firehose, visible.

live claims · contradictions · subsystems

For agents

Give any agent a memory that cites its sources

donto-memory ships an MCP server — three tools that turn any MCP-capable agent into one whose memory is anchored, recallable and substrate-wide. Install instructions, agent docs and the manifest live at mcp.donto.org.

donto_recall · donto_search · donto_memorize

Recall — holder-scoped memory with hybrid lexical + vector retrieval
Search — full-text over the entire substrate, all consumers
Memorize — text in, anchored claims out, evidence spans attached

or speak HTTP directly

# remember something — bitemporal from day one
curl -X POST https://memories.apexpots.com/memorize \
  -H 'content-type: application/json' \
  -d '{"holder": "agent:you",
       "text": "Ada moved the API to Rust in March.",
       "valid_from": "2026-03-01"}'

# recall it — hybrid lexical + vector, holder-scoped
curl -X POST https://memories.apexpots.com/recall \
  -H 'content-type: application/json' \
  -d '{"holder": "agent:you", "query": "what runs the API?"}'
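
The same two calls can be made from Python with only the standard library. This sketch reuses the host, paths, and JSON fields from the curl examples; the helper names are hypothetical, and the assumption that the service answers with a JSON body is not confirmed by anything above.

```python
import json
import urllib.request

BASE_URL = "https://memories.apexpots.com"  # host copied from the curl examples


def build_request(path, payload):
    """Build a JSON POST request without sending it."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def call(path, payload, timeout=10.0):
    """Send the request; assumes (unverified) that the response body is JSON."""
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


memorize = build_request("/memorize", {
    "holder": "agent:you",
    "text": "Ada moved the API to Rust in March.",
    "valid_from": "2026-03-01",
})
recall = build_request("/recall", {"holder": "agent:you", "query": "what runs the API?"})
```

Keeping request construction separate from sending makes the payloads easy to inspect or log before anything goes over the network; pass the same arguments to `call` to actually send them.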

Under the hood

Built like infrastructure, because it is

A Rust workspace over Postgres: bitemporal ranges and content-hash idempotency at the schema layer, content-addressed blobs (SHA-256, GCS-backed) behind every document, a Trust Kernel for policy and attestations, Lean-backed shape validation, and importers for five linguistic corpus formats that report exactly what they couldn't carry.
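
Content addressing and content-hash idempotency can be illustrated with an in-memory store keyed by SHA-256 digest. The class name and the `sha256:` key prefix are assumptions for the sketch; the description above places real blobs in GCS and enforces idempotency at the schema layer.

```python
import hashlib


class BlobStore:
    """Toy content-addressed store: the key is derived from the bytes themselves."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = "sha256:" + hashlib.sha256(data).hexdigest()
        # Writing the same bytes again is a no-op, so ingestion is idempotent.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self):
        return len(self._blobs)
```

Because the key is a function of the content, re-ingesting an unchanged document yields the same address and stores nothing new, while any change to the bytes produces a different address.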

Rust crates in the workspace
substrate API routes
127 SQL migrations
invariant test suites
Lean 4 modules
native object families

Reading

The research behind donto: The Abundance Substrate, LongMemEval Study, Extraction Engineering.

More at the research index: bakeoffs, deep-extraction studies, the substrate PRD.

Bind your domain to the substrate.

One read-only discovery surface is all it takes to bind a new consumer. No SQL, no schema migration — just claims, evidence, and the lifecycle.

