How it solves things
The substrate holds contradictory, evidence-anchored, bitemporal claims — answering a question is always a consumer-side pipeline over it. These are the five real paths, end to end, with live requests and live data.
1 · Memory recall — POST /recall
The agentic-memory consumer answers "what do I remember about X?" for a holder (an agent identity like agent:omega-bot). The substrate has no notion of "whose memory" — that's the donto_x_memory_record overlay, one row per memorized item, anchored into the substrate by its root statement and scoped by holder_iri / session_iri. Recall never widens visibility.
The pipeline
| stage | mechanism | what happens |
|---|---|---|
| 1 · module fan-out | concurrent | Three default modules retrieve in parallel: episodic (verbatim chunks), semantic-claim (extracted typed claims), preference (append-only; superseded via argument edges, never overwritten). A failing module contributes zero rows — one slow arm never breaks recall. |
| 2 · two arms per module | lexical + vector | The FTS arm runs over a denormalized overlay (donto_x_memory_chunk_fts, holder+tsv combined GIN — the fix that took 41K-chunk holders from 10s timeouts to 0.9–1.8s). The vector arm embeds the query (bge-small, 2s timeout, graceful fallback to FTS-only) and does an exact cosine sort over the holder's vectors. |
| 3 · RRF fusion | k=60 | Arms fuse by reciprocal-rank (a statement found by both rises); then modules fuse the same way. A purely-semantic hit — "where does he work?" → ex:occupation "engineer", zero shared substring — still surfaces. |
| 4 · closure expansion | aligned predicates | Predicate-pinned recalls route through donto_match_aligned, so claims stored under an aligned name still answer — post-filtered against the holder's own statements. |
| 5 · use is recorded | memory strengthens | Every returned row logs an access event and bumps an activation score — recall strengthens what it touches. |
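Stages 1 to 3 can be sketched in a few lines. This is a minimal illustration, not donto's implementation: it assumes each arm is a callable returning a ranked list of statement ids, and the names `rrf_fuse`, `run_module` and `recall` are hypothetical. Only the k=60 constant and the "failure contributes zero rows" behaviour come from the table above.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

RRF_K = 60  # the k=60 constant from the pipeline table


def rrf_fuse(rankings, k=RRF_K):
    """Fuse ranked lists of statement ids by reciprocal rank."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, stmt_id in enumerate(ranking, start=1):
            scores[stmt_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda s: (-scores[s], s))


def run_module(arms):
    """Run every arm of one module; an arm that raises contributes zero rows."""
    results = []
    for arm in arms:
        try:
            results.append(arm())
        except Exception:
            results.append([])
    return rrf_fuse(results)


def recall(modules, limit=10):
    """Fan out over modules concurrently, then fuse the module rankings."""
    with ThreadPoolExecutor() as pool:
        per_module = list(pool.map(run_module, modules))
    return rrf_fuse(per_module)[:limit]


# Hypothetical arms: "s2" is found by both the lexical and the vector arm.
def fts_arm():
    return ["s1", "s2", "s3"]


def vector_arm():
    return ["s4", "s2"]


def broken_arm():
    raise TimeoutError("embedding service did not answer")


top = recall([[fts_arm, vector_arm], [broken_arm]], limit=3)
# top == ["s2", "s1", "s4"]: the statement found by both arms ranks first.
```

The second module fails outright and simply adds nothing to the fusion, which is the degradation behaviour the table describes.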
A live recall, with the fold visible
curl -s -X POST localhost:7900/recall -d '{
"holder":"agent:omega-bot", "predicate":"hasKnowledgeOf",
"module_iris":["mem:module/semantic-claim"], "limit":3}'
→ { "rows": [ { "subject": "ex:xenonfun",
"predicate": "ex:hasKnowledgeOf", ← stored under the PREFIXED name;
"object_iri": "ex:ThreeDimensionalGame" }, … ] }
-- the closure holds hasKnowledgeOf → ex:hasKnowledgeOf (close_match, 0.83):
-- the bare-name query found the prefixed claims. No synonym table anywhere.
2 · Substrate-wide search — POST /search
Where /recall is holder-scoped, /search ranks across every context of the 41.7M-statement table. It stays sub-second three ways: one expression GIN index over a humanized projection of subject+object (the query must render the byte-identical expression or Postgres seq-scans 41M rows); a bounded candidate CTE (the index filter stops at 2,000 rows, then ts_rank scores only that set — worst-case latency is a function of the cap, not the corpus); and a 9s timeout that degrades to a clean {partial: true} instead of a 500.
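The bounded-candidate idea is independent of Postgres and can be shown in plain Python. The sketch below is an assumption-laden illustration: `bounded_search`, `index_hits` and `score` are hypothetical names, and only the 2,000-row cap comes from the description above. It does not model the timeout or the `{partial: true}` response.

```python
from itertools import islice

CANDIDATE_CAP = 2000  # the cap described above


def bounded_search(index_hits, score, limit, cap=CANDIDATE_CAP):
    """Score at most `cap` index hits, then return the top `limit` of them."""
    # Stop pulling from the index once the cap is reached, however many
    # further rows would have matched.
    candidates = list(islice(index_hits, cap))
    # Ranking cost now depends on `cap`, not on the size of the corpus.
    ranked = sorted(candidates, key=score, reverse=True)
    return ranked[:limit]
```

Because the scoring function only ever sees the capped set, a query that matches millions of rows costs about the same to rank as one that matches exactly 2,000.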
curl -s -X POST localhost:7900/search -d '{"query":"caroline rose davis","limit":3}'
→ 571ms: { "subject": "ctx:genes/…/caroline-rose-davis", "predicate": "ex:knownAs",
"object_lit": {"v":"Caroline Rose Davis, Caroline Rose Molloy,
Caroline Rose Brown, Caroline Rose Roberts, Kitchay"}, … }
3 · Genealogy — contested history, held whole
The hardest consumer: every source — a Federal Court determination, a thesis, a BDM register, an online tree — is an interpretive witness, not ground truth. A real cross-source entity, live: ex:joseph-collinson carries 648 facts across 69 distinct contexts, one per source. The corpus disagrees, and donto keeps all of it as legal state:
| claim | attested by | note |
|---|---|---|
| birthPlace → ex:port-louis | one source | The Mauritius reading (maturity E4). |
| birthCountry → "Australia" | another source | A Geni false match — retained, not deleted. |
| occupation → "locksmith" / "Serrurier" | three sources | English and French variants of the same fact — the alignment engine's job at query time, not a normalization table's. |
| ruledOutFrom / notSameAs / proneToFalseMatches | the rule-out itself | The rejection of the false match is held as claims in the same graph — the research trail, including why a match was rejected, survives and is queryable. |
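Folding variants at query time rather than normalizing them at write time can be sketched as follows. Everything here is illustrative: `CLOSURE`, `aligned_terms` and `match_at_read_time` are hypothetical names, the context IRIs are made up, and the sketch expands one hop only. The first closure row mirrors the recall example earlier in this document; the confidence on the second row is invented.

```python
# Hypothetical closure rows: (term, aligned_term, relation, confidence).
CLOSURE = [
    ("hasKnowledgeOf", "ex:hasKnowledgeOf", "close_match", 0.83),
    ("locksmith", "Serrurier", "close_match", 0.90),  # invented confidence
]


def aligned_terms(term, min_confidence=0.5):
    """Return the term plus every term the closure aligns it with (one hop)."""
    terms = {term}
    for left, right, _relation, confidence in CLOSURE:
        if confidence < min_confidence:
            continue
        if left == term:
            terms.add(right)
        elif right == term:
            terms.add(left)
    return terms


def match_at_read_time(statements, predicate, value):
    """Filter (subject, predicate, value, context) rows through the closure."""
    predicates = aligned_terms(predicate)
    values = aligned_terms(value)
    return [s for s in statements if s[1] in predicates and s[2] in values]


# Stored exactly as each source wrote it; nothing is rewritten on ingest.
statements = [
    ("ex:joseph-collinson", "occupation", "locksmith", "ctx:source-a"),
    ("ex:joseph-collinson", "occupation", "Serrurier", "ctx:source-b"),
    ("ex:joseph-collinson", "birthCountry", "Australia", "ctx:source-c"),
]
rows = match_at_read_time(statements, "occupation", "locksmith")
# rows holds both occupation statements, each still in its source's wording.
```

The stored rows are never modified, so raising `min_confidence` or retraining the closure changes what a query folds together without touching the data.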
Contradiction as a first-class edge
-- two age-derivations for Otto Davis, from different documents:
ex:birthYear = "~1887 (age 35 in October 1922)"
ex:birthYear = "c.1887 (age 21 at marriage 1908; age 32 at exemption 1920)"
donto_argument: rebuts, strength 0.80, both directions, review_state 'unreviewed'
-- and the polar form, held simultaneously (a real published dispute):
genes:jessie-buchanan | genes:motherOf | genes:caroline-rose-davis
genes:jessie-buchanan | genes:notMotherOf | genes:caroline-rose-davis
A classical knowledge graph must pick a winner; donto holds the dispute whole, with both sides live, the edge recording that they compete, and adjudication deferred to a non-destructive review lifecycle.
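A minimal in-memory model shows what "holding the dispute whole" means structurally. This is a sketch under stated assumptions, not the donto schema: the `Claim`, `Argument` and `Graph` classes are hypothetical, and the subject IRI `ex:otto-davis` is invented for the example. The `rebuts` relation, the 0.80 strength and the `unreviewed` state are taken from the listing above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Argument:
    source: Claim
    target: Claim
    relation: str  # e.g. "rebuts"
    strength: float
    review_state: str = "unreviewed"


class Graph:
    def __init__(self):
        self.claims = []
        self.arguments = []

    def assert_claim(self, claim):
        # Append only: a new claim never replaces a contradictory one.
        self.claims.append(claim)
        return claim

    def rebut_both_ways(self, a, b, strength):
        # One edge per direction, so each claim records that it is contested.
        self.arguments.append(Argument(a, b, "rebuts", strength))
        self.arguments.append(Argument(b, a, "rebuts", strength))

    def disputes(self, claim):
        """Claims that the given claim argues against."""
        return [arg.target for arg in self.arguments if arg.source == claim]


g = Graph()
# "ex:otto-davis" is a hypothetical IRI; the two values are from the listing.
a = g.assert_claim(Claim("ex:otto-davis", "ex:birthYear",
                         "~1887 (age 35 in October 1922)"))
b = g.assert_claim(Claim("ex:otto-davis", "ex:birthYear",
                         "c.1887 (age 21 at marriage 1908; age 32 at exemption 1920)"))
g.rebut_both_ways(a, b, strength=0.80)
```

Both claims remain in `g.claims` after the edges are added; a later review would change `review_state` on the edges rather than delete either side.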
4 · The Omega bot — investigation lenses
The Discord bot's agent loop gets a read-only investigative interface (dontoQuery): twelve entity lenses + a document lens, each returning concrete follow-up calls so the model chains lenses into an investigation instead of one-shotting — search → entity_facts → cross_source → contradictions → identity.
| lens | purpose | what it returns |
|---|---|---|
| search / entity_facts / entity_reach | find & read | Name → candidate IRIs → every live claim with maturity and source context. |
| contradictions | where sources disagree | Self-join on (subject, predicate) where values differ — every competing pair with its source. |
| cross_source / bitemporal | provenance & history | Which sources speak about this entity; full history including retracted rows with believed-since/retracted-at. |
| semantic_neighbors / identity / predicate_alignment | the learned layers | pgvector k-NN over entity fingerprints; identity hypotheses; what each minted predicate folds into. |
| document lens | 12 panels per source | A job/document's full anchored-claim set: evidence spans, contradictions vs the rest of the substrate, argument graph, co-occurring entities. |
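The contradictions lens is described as a self-join on (subject, predicate) where values differ. The same logic in Python, as a rough sketch: the function name, the tuple layout and the context IRIs are assumptions made for the example, and the real lens runs in SQL against the substrate.

```python
from collections import defaultdict
from itertools import combinations


def contradictions(statements):
    """Pair statements that share (subject, predicate) but differ in value.

    Each statement is a (subject, predicate, value, context) tuple.
    """
    by_key = defaultdict(list)
    for subject, predicate, value, context in statements:
        by_key[(subject, predicate)].append((value, context))

    pairs = []
    for (subject, predicate), rows in by_key.items():
        # Every unordered pair within the group is the "self-join".
        for (v1, c1), (v2, c2) in combinations(rows, 2):
            if v1 != v2:
                pairs.append((subject, predicate, (v1, c1), (v2, c2)))
    return pairs


# Hypothetical contexts; the values come from the genealogy example.
statements = [
    ("ex:joseph-collinson", "occupation", "locksmith", "ctx:source-a"),
    ("ex:joseph-collinson", "occupation", "Serrurier", "ctx:source-b"),
    ("ex:joseph-collinson", "occupation", "locksmith", "ctx:source-c"),
]
pairs = contradictions(statements)
# Two pairs: each "locksmith" row against the "Serrurier" row.
```

In this sketch a plain value comparison also flags spelling and language variants, so each pair carries its source context for the reader to judge.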
5 · Proof points — what the benchmarks demonstrate
LongMemEval (500 questions, retrieval-stressed variant)
The decisive ablation: on identical questions, moving from FTS-only to hybrid retrieval lifted overall hit@10 from 0.85 to 0.98, preference questions from 0.38 to 0.88, and assistant-knowledge questions from 0.62 to 1.00. Lexical recall is blind exactly where meaning and wording diverge; the vector arm plus RRF rescues those categories without a reranker. (Honest context: a strong reader with full context scores 95.7% with no memory system at all, so _s accuracy mostly proves that retrieval doesn't lose anything. donto's differentiated value is token-efficiency, bitemporal knowledge-update, evidence-first abstention, and corpora that don't fit in context, which is what BEAM tests.)
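For readers unfamiliar with the metric, hit@10 is commonly computed as below. This assumes the usual definition (a question counts as a hit if any relevant item appears in the top k results); the benchmark harness may define relevance differently, and the function names are illustrative.

```python
def hit_at_k(ranked_ids, relevant_ids, k=10):
    """Return 1.0 if any relevant id is in the top k results, else 0.0."""
    return 1.0 if set(ranked_ids[:k]) & set(relevant_ids) else 0.0


def mean_hit_at_k(runs, k=10):
    """Average hit@k over a list of (ranked_ids, relevant_ids) pairs."""
    return sum(hit_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)
```

Under this definition, an overall hit@10 of 0.98 on 500 questions would mean roughly 490 questions had at least one relevant item in their top ten.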
BEAM-10M (10 × 1.3M-token conversations — full context is physically impossible)
Run on the benchmark author's own harness, unmodified. The 0.684 score was earned in a partial state (~28% of chunk embeddings present; claims wired from a fraction of the corpus), with the standing caveat that the reader and judge models differ from the leaderboard's. The per-category shape is the architecture speaking: event-ordering 0.95 and abstention 0.85 (bitemporal valid-time plus evidence-first honesty), with multi-session reasoning (0.18) the known structural gap. k=10 × 1,200-char chunks is too little context to stitch reasoning across many sessions; that's the claims layer's job as coverage grows.
The common architectural thread
| principle | qualifier | in practice |
|---|---|---|
| overlay scoping | not data partitioning | Holders, sessions and modules are overlay rows pointing at shared substrate statements — genealogy, memory and BEAM share one 41.7M-statement instance without seeing each other. |
| hybrid + rank fusion | everywhere | Lexical and vector arms, and modules themselves, fuse by RRF; any arm may fail and the answer degrades instead of breaking. |
| query-time alignment | not write-time normalization | Serrurier and locksmith coexist at write time; the learned closure folds them at read time. |
| paraconsistent holding | typed argument edges | Contested claims stay live; rebuts/supersedes edges and rule-out claims carry the dispute. |
| bitemporal, evidence-first | end to end | Every answer row carries its source context, maturity, time bounds, and a resolvable span→revision→document→blob chain — 'why do you believe this?' is always one join away. |