World models predict. PRISM verifies.
A verifiable world-model agent architecture where prediction errors drive constraint-checked state updates and trace-consolidated memory. The policy never writes state directly — a verified update cycle does.
why current agents silently drift
Policy should not write state.
// existing LLM agents — ReAct, Reflexion, Voyager:
s_{t+1} ← π_LLM(s_t, o_t)  // policy mutates state directly
// PRISM — the policy never writes state:
// prediction error is the only update signal
// every Δs passes admissibility before commit
// the chain is hash-verified before replay
“A model that predicts well
is not yet a model that commits well.”
- I. ⨯ Policy network, via token sampling
- II. ⨯ Structured parser, via extraction output
- III. ⨯ Retriever / RAG, via top-k context
- IV. ⨯ World-model latent, via raw rollout
- V. ⨯ Tool callback, via function return
- VI. ⨯ Operator, via manual override
State writes only via the verified update cycle.
∎
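The write boundary above can be sketched in a few lines. This is a minimal illustration, not the reference implementation: the names `State`, `Delta`, `admissible`, `verified_update`, and `policy` are assumptions made for the example, and the admissibility rule is a toy placeholder.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class State:
    """Immutable belief state; fields cannot be reassigned after creation."""
    step: int
    claims: Mapping[str, str]


@dataclass(frozen=True)
class Delta:
    """A proposed change to one claim, tagged with the component that proposed it."""
    key: str
    value: str
    source: str  # e.g. "policy", "parser", "retriever" (hypothetical labels)


def admissible(state: State, delta: Delta) -> bool:
    """Toy admissibility check (assumption): reject empty keys and empty values."""
    return bool(delta.key) and bool(delta.value)


def verified_update(state: State, delta: Delta) -> State:
    """The only function in this sketch that produces a new State."""
    if not admissible(state, delta):
        return state  # rejected deltas leave the state untouched
    claims = dict(state.claims)
    claims[delta.key] = delta.value
    return State(step=state.step + 1, claims=MappingProxyType(claims))


def policy(state: State, observation: str) -> Delta:
    """Stand-in policy: it may only propose a Delta, it never returns a State."""
    return Delta(key="last_observation", value=observation, source="policy")


s0 = State(step=0, claims=MappingProxyType({}))
proposal = policy(s0, "door is open")
s1 = verified_update(s0, proposal)
rejected = verified_update(s1, Delta(key="", value="x", source="parser"))
```

Because `State` is frozen and its claims are wrapped in a read-only mapping, any of the six sources listed above can only hand a `Delta` to `verified_update`; a direct assignment such as `s1.step = 5` raises an error.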
The update cycle, eight steps.
Every observation passes through the full inference cycle. Enforced as a durable workflow.
The agent's current latent belief. A typed structure with entities, claims, and a posterior over the world.
Six architectural guarantees.
Each one is checked at compile time and verified at runtime. Remove any one and the architecture degrades to a vanilla retrieval-augmented loop.
State at any t is bit-exactly reconstructible from genesis.
Memory is a hash-chained replay log of typed transitions. Vector store is retrieval — not memory. Graph store is projection — not belief.
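Reconstruction from genesis amounts to folding a pure transition function over the log. The sketch below assumes a toy state (a sorted tuple of key/value pairs) and a hypothetical `Trace` type; it illustrates the idea and does not reproduce PRISM's actual state schema.

```python
from functools import reduce
from typing import NamedTuple


class Trace(NamedTuple):
    """One logged transition (hypothetical shape): set `key` to `value`."""
    key: str
    value: str


def commit(state: tuple, trace: Trace) -> tuple:
    """Pure transition: returns a new sorted tuple of (key, value) pairs."""
    merged = dict(state)
    merged[trace.key] = trace.value
    return tuple(sorted(merged.items()))


def replay(genesis: tuple, traces: list) -> tuple:
    """Rebuild the state at any point by folding commit over the trace prefix."""
    return reduce(commit, traces, genesis)


genesis = ()
log = [Trace("door", "open"), Trace("light", "off"), Trace("door", "closed")]

state_now = replay(genesis, log)       # state after all three transitions
state_at_2 = replay(genesis, log[:2])  # state after the first two only
```

Since `commit` has no side effects and depends only on its arguments, replaying the same prefix always yields the same state, which is the property the guarantee relies on.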
Prediction error becomes the update signal.
The latent predictor forecasts ẑ_{t+1} before observation. Mismatch enters as one typed error channel, never as a direct state write.
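As a rough illustration of a typed error channel, the sketch below compares a forecast latent with the observed latent and packages the mismatch as an immutable record. The `PredictionError` type, the plain-tuple latents, and the Euclidean norm are assumptions for the example; the source does not specify the error metric.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictionError:
    """Typed error record (hypothetical): the only thing the update cycle consumes."""
    step: int
    residual: tuple
    magnitude: float


def prediction_error(step: int, z_hat: tuple, z_obs: tuple) -> PredictionError:
    """Compare the forecast latent z_hat with the observed latent z_obs."""
    if len(z_hat) != len(z_obs):
        raise ValueError("forecast and observation must have the same dimension")
    residual = tuple(o - p for p, o in zip(z_hat, z_obs))
    magnitude = math.sqrt(sum(r * r for r in residual))  # Euclidean norm (assumption)
    return PredictionError(step=step, residual=residual, magnitude=magnitude)


z_hat = (1.0, 2.0, 2.0)  # forecast made before the observation arrives
z_obs = (1.0, 5.0, 6.0)  # latent encoding of what was actually observed
err = prediction_error(1, z_hat, z_obs)
```

The record carries the mismatch forward without touching any state itself; what happens to it next is decided by the update cycle.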
Four gates. Then the world changes.
Every action emits a post-execution observation. The agent always closes the loop.
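One way to picture gated action is a chain of predicates that must all pass before execution, with execution always returning an observation. The four gates below are hypothetical placeholders chosen for the example; the source does not name the actual gates.

```python
from typing import Callable, Optional

Action = dict
Gate = Callable[[Action], bool]

ALLOWED_TOOLS = {"move", "read"}  # hypothetical allow-list


def gate_well_formed(action: Action) -> bool:
    return isinstance(action.get("tool"), str)


def gate_allowed(action: Action) -> bool:
    return action.get("tool") in ALLOWED_TOOLS


def gate_budget(action: Action) -> bool:
    return action.get("cost", 0) <= 10


def gate_reversible(action: Action) -> bool:
    return not action.get("irreversible", False)


GATES: tuple = (gate_well_formed, gate_allowed, gate_budget, gate_reversible)


def act(action: Action, execute: Callable[[Action], str]) -> Optional[str]:
    """Run the action only if every gate passes; return the observation or None."""
    for gate in GATES:
        if not gate(action):
            return None  # blocked: nothing is executed
    return execute(action)  # the post-execution observation closes the loop


executed = []


def execute(action: Action) -> str:
    executed.append(action["tool"])
    return f"observed result of {action['tool']}"


ok = act({"tool": "move", "cost": 3}, execute)
blocked = act({"tool": "delete", "cost": 1}, execute)
```

In this sketch a blocked action never reaches `execute`, and an executed action always yields an observation string that can feed the next update cycle.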
No external API in the update path.
Deterministic audit. Reproducible provenance. Sovereign data.
durable workflow
No skips.
Watch the model commit, step by step.
Live simulation of PRISM's telemetry. Every transition is hash-chained, every metric streams in real time. Connect your own predictor — same surface.
The console is a live simulation. In production it streams from a PostgreSQL ledger via SSE.
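For readers unfamiliar with SSE, the sketch below shows how ledger rows could be framed as Server-Sent Events. The row shape (`seq`, `hash`) and the event name `transition` are assumptions for illustration; fetching rows from PostgreSQL and serving them over HTTP are out of scope here.

```python
import json
from typing import Iterable, Iterator


def sse_events(rows: Iterable[dict]) -> Iterator[str]:
    """Frame each ledger row as one Server-Sent Event.

    An SSE message is a set of `field: value` lines ended by a blank line.
    """
    for row in rows:
        payload = json.dumps(row, sort_keys=True)
        yield f"id: {row['seq']}\nevent: transition\ndata: {payload}\n\n"


# Hypothetical rows, standing in for the result of a ledger query.
rows = [{"seq": 1, "hash": "ab"}, {"seq": 2, "hash": "cd"}]
events = list(sse_events(rows))
```

Using the sequence number as the event `id` lets a reconnecting client resume from the last transition it saw via the standard `Last-Event-ID` header.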
Replay-Exact.
Let A be a PRISM-compliant agent. Under assumptions 1–3, for any time t:
- I. State sₜ is bit-exactly determined by the genesis state s₀ and the trace sequence (τ₀, τ₁, …, τ_{t−1}).
- II. Every state update has a witness: each transition Δs corresponds to exactly one logged τᵢ, so there is no silent drift.
- III. Tampering is detectable: any modification to τᵢ breaks the chain h_τ = H(h_prev ∥ serialize(τ)) with high probability.
“Any agent satisfying replay-exact verifiability must satisfy the PRISM separation. The architecture is implied, not chosen.”
- A1. commit: S × Δs → S is deterministic and pure
- A2. Hash H is collision-resistant
- A3. Replay log {τᵢ} is append-only, tamper-evident
T1 guarantees auditability and reconstructibility — not improved task success, not predictor accuracy. PRISM bounds error commit, not error occurrence.
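The chain rule h_τ = H(h_prev ∥ serialize(τ)) can be sketched with the standard library. This is an illustration under stated assumptions: SHA-256 stands in for H, canonical JSON stands in for `serialize`, and the all-zero genesis hash and the log entry layout are invented for the example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed anchor for the first entry


def serialize(trace: dict) -> bytes:
    """Canonical JSON so that equal traces always produce equal bytes."""
    return json.dumps(trace, sort_keys=True, separators=(",", ":")).encode("utf-8")


def chain_hash(h_prev: str, trace: dict) -> str:
    """h = H(h_prev || serialize(trace)), with SHA-256 as H."""
    return hashlib.sha256(bytes.fromhex(h_prev) + serialize(trace)).hexdigest()


def append(log: list, trace: dict) -> None:
    """Append a trace together with the hash that links it to its predecessor."""
    h_prev = log[-1]["hash"] if log else GENESIS_HASH
    log.append({"trace": trace, "hash": chain_hash(h_prev, trace)})


def verify(log: list) -> bool:
    """Recompute every link; any edited trace breaks its stored hash."""
    h_prev = GENESIS_HASH
    for entry in log:
        if chain_hash(h_prev, entry["trace"]) != entry["hash"]:
            return False
        h_prev = entry["hash"]
    return True


log: list = []
for t in ({"key": "door", "value": "open"},
          {"key": "light", "value": "off"},
          {"key": "door", "value": "closed"}):
    append(log, t)

intact = verify(log)
log[1]["trace"]["value"] = "on"  # simulate tampering with a logged trace
tampered_detected = not verify(log)
```

Note that a chain like this only detects edits relative to a trusted head hash; an adversary able to rewrite every subsequent hash would have to be stopped by anchoring the head elsewhere, which is where the append-only assumption A3 comes in.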
Five axes.
Nine architectures.
One row closes all five.
ReAct & descendants ignore the write boundary entirely. Active Inference and JEPA address prediction but not commit authority. Cognitive architectures satisfy parts. No prior LLM-agent design combines all five.
Axes (columns):
- Write boundary: separated
- Prediction: required
- Typed delta: structurally
- Gated action: 4× gates
- Replay: bit-exact

Architectures (rows):
- ReAct (Yao et al. 2023)
- Reflexion (Shinn et al. 2023)
- Voyager (Wang et al. 2023)
- Toolformer (Schick et al. 2023)
- Constitutional AI (Bai et al. 2022)
- Active Inference (Friston 2010), write boundary: N/A
- JEPA / V-JEPA (LeCun 2022), write boundary: N/A
- Soar / ACT-R (Laird 2019)
- PRISM (this work)
Table 1 · §2 of preprint
An order of magnitude less silent drift.
On suites measuring what PRISM was designed for — error-driven update fidelity, silent commit of unsupported claims, replay traceability — the architecture dominates baselines by 10–30×.
Latency is the price.
We pay it openly.
PRISM pays a per-cycle cost. On the B1/B2 suites that cost is approximately 2.4× ReAct p95. For chat-only workloads this is prohibitive. For agents that must replay their own decisions, this is the price of verifiability.
Standard tools.
Verifiable wiring.
PRISM uses production ML components — V-JEPA, GLiNER-Relex, DeBERTa-MNLI, vLLM — as encoders and serving layers, never as state-writing actors. All inference runs locally.
- Python 3.12: runtime
- PyTorch: core
- vLLM: LLM serving
- SGLang: structured gen
- llama.cpp: edge
- sentence-transformers: text emb
- V-JEPA: visual latent
- GLiNER-Relex: entity / relation
- DeBERTa-v3-MNLI: claim NLI
- TimeMoE: temporal
- PostgreSQL: hash-chained log
- Qdrant: retrieval index
- Neo4j: graph projection
- Redis: STM cache
- Apache WAL: durability
- Temporal: durable workflow
- FastAPI: control plane
- Pydantic v2: contracts
- OpenTelemetry: tracing
- structlog: audit
Read. Replay. Refute.
PRISM v1.0.0 reference implementation, the full B1/B2 benchmark suites, baseline configs, and replay logs from all benchmark runs — open-sourced for independent verification. PRISM is a discipline applied to existing models, not a new model. Plug in your own predictor. Audit the result.
@article{prism2026wmav,
title = {PRISM: Predictive Recurrent Inference
State Machine — A Verifiable World-Model
Agent Architecture},
author = {[Anonymized for Review]},
journal = {Preprint},
year = {2026},
note = {V{\Delta}LK Research,
Institute for Complex Cognitive Systems}
}
- long-horizon agents where memory must replay
- multi-turn reasoning where drift is a known failure mode
- regulated domains: finance, healthcare, law, audit
- embodied / robotic systems with safety constraints
- research benchmarks that score traceability