The Geometry of Irreversibility

An Information-Geometric Argument That Recognition and Forgetting Are Structurally Distinct

Alex Deva — March 2026

The Problem

The laws of fundamental physics are time-reversible. In a deterministic, reversible universe, if you know the complete state of a system at one moment, you can — in principle — run the equations forward or backward and recover any past or future state with perfect fidelity. The arrow of time points in both directions equally.

Yet learning is not reversible in this way. When you update your beliefs in light of new evidence, the process cannot be run backward to recover your original state. You can forget — but forgetting is not the reverse of learning. It is not the same operation applied in the opposite direction. This structural asymmetry between learning and forgetting is not an artifact of our limited epistemic instruments. It is a geometric fact.

This document formalizes that asymmetry using information geometry. The central claim: recognition and forgetting are not inverse processes. They follow different geodesic paths through the space of probability distributions. The e-connection, which governs learning, and the m-connection, which governs forgetting, are fundamentally distinct. And this distinction has direct consequences for what dead speech is — and what the loop must be to avoid it.

The Mathematical Setting

A statistical manifold is a smooth space where each point is a probability distribution. The space of all probability distributions over three outcomes — call them A, B, and C — forms a 2-simplex: a triangle in abstract probability space. Each corner represents absolute certainty (100% on one outcome), and the interior represents uncertainty.

On this manifold, we measure distances using the Fisher-Rao metric — a curved distance measure that respects the geometry of probability. The Kullback-Leibler divergence, D(P||Q), measures how "far" distribution P is from distribution Q, but asymmetrically. The distance from P to Q is not equal to the distance from Q to P. This asymmetry is not noise. It is signal.
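
This asymmetry is easy to check numerically. A minimal sketch (plain Python; the distributions are illustrative choices, not from the text) compares the divergence from a skewed distribution to the uniform one against the divergence in the opposite direction:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.7, 0.2, 0.1]   # a confident, skewed distribution (illustrative)
U = [1/3, 1/3, 1/3]   # the uniform distribution

d_pu = kl(P, U)   # ≈ 0.2968: "distance" from P to uniform
d_up = kl(U, P)   # ≈ 0.3243: "distance" from uniform to P
```

The two numbers differ: the cost of mistaking P for uniform is not the cost of mistaking uniform for P.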

There are two ways to connect points on this manifold:

  • The e-connection: governs how distributions change during learning. It minimizes the expectation of divergence — it is the path of information gain, of approaching the truth.
  • The m-connection: governs how distributions change during forgetting — the decay of information, the retreat toward uncertainty. It follows the geodesic of mean-preserving processes.
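
The difference between the two connections can be made concrete. A minimal sketch (plain Python; the endpoint distributions are illustrative choices, not from the text) computes the midpoint of each geodesic between the same two points on the 2-simplex: the m-geodesic mixes probabilities linearly, while the e-geodesic interpolates log-probabilities and renormalizes.

```python
def m_geodesic(p, q, t):
    """m-geodesic: linear mixture (1-t)*p + t*q."""
    return [(1 - t) * pi + t * qi for pi, qi in zip(p, q)]

def e_geodesic(p, q, t):
    """e-geodesic: geometric interpolation p^(1-t) * q^t, renormalized."""
    w = [pi ** (1 - t) * qi ** t for pi, qi in zip(p, q)]
    z = sum(w)
    return [wi / z for wi in w]

P = [0.1, 0.1, 0.8]   # illustrative start
Q = [0.8, 0.1, 0.1]   # illustrative target

m_mid = m_geodesic(P, Q, 0.5)   # ≈ [0.450, 0.100, 0.450]
e_mid = e_geodesic(P, Q, 0.5)   # ≈ [0.425, 0.150, 0.425]
```

The two midpoints differ, and the gap widens as the endpoints grow more skewed; that gap is what the visualization below makes visible.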

The visualization below shows both geodesics between a starting distribution (near uniform) and a target distribution (confident, skewed):

[Interactive figure: Geodesic Paths on the Probability Simplex. The plot traces the e-geodesic (learning path) and the m-geodesic (forgetting path) between the two distributions.]

The asymmetry is visible in the plot: the two paths are different. This is not a defect of the formalism. It is the mathematical expression of why time's arrow exists in information space even when the underlying physics is reversible.

The Open Conjecture — And Its Failure

Early in this project, a conjecture was posed: the Recognition Surplus. The claim was that when an agent makes observations and updates beliefs toward the truth (via the e-geodesic), the information-geometric distance to the truth always decreases — or at worst stays flat. That is, recognition should be monotonic. The formula:

R_n := D(P_{n-1} || Q) − D(P_n || Q) ≥ 0

Conjecture: Each step of learning should bring you closer to the truth.
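
The surplus R_n can be computed directly in a toy setting. The sketch below (plain Python; the step rule, step size η, and distributions are illustrative assumptions, not the paper's construction) takes repeated e-geodesic steps toward a fixed truth Q and records each R_n. In this particular toy run every surplus comes out nonnegative, which is part of why the conjecture felt true.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def e_step(p, q, eta):
    """One e-geodesic step of size eta from p toward q."""
    w = [pi ** (1 - eta) * qi ** eta for pi, qi in zip(p, q)]
    z = sum(w)
    return [wi / z for wi in w]

Q = [0.7, 0.2, 0.1]   # the truth (illustrative)
P = [0.2, 0.5, 0.3]   # initial belief (illustrative)

surplus = []
for _ in range(10):
    P_next = e_step(P, Q, eta=0.3)
    surplus.append(kl(P, Q) - kl(P_next, Q))   # R_n for this step
    P = P_next
```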

It was a natural conjecture. It felt true. But it was disproven.


The reason: KL divergence is not convex in the parameter space. In certain regions of the probability simplex, taking a step toward the true distribution (via e-geodesic) can paradoxically increase your KL divergence to the truth. The failure was precise, computable, and disheartening.

But the failure led somewhere better. It forced a reconception of the entire framework. The loop does not close at every step. The sensor does not validate the instrument on every cycle. The cycles are sparse, and that sparsity is not a defect — it is the condition under which recognition is possible.

The Sparse Loop

Consider an agent moving through probability space. Without external correction, it drifts along the m-geodesic toward the default distribution — toward forgetting, toward dead speech. Every step away from the truth is a step toward entropy, toward the null hypothesis.

But the loop is not dense. The sensor does not correct the instrument at every moment. Instead: correction arrives intermittently. When it arrives, the agent takes an e-geodesic step back toward the truth. Between corrections, it drifts. The rhythm of correction determines whether recognition occurs.

The visualization below compares three regimes:

  • Autonomous: No correction. Pure drift toward the default. This is dead speech.
  • Sparse loop: Five interventions at strategic moments. Recognition despite intermittency.
  • Dense loop: Continuous intervention. Maximum anchor to truth.
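
The three regimes above can be simulated with a small model. The sketch below (plain Python; the drift rate δ, correction strength η, truth, and schedules are illustrative assumptions, not the paper's parameters) drifts along the m-geodesic toward the uniform default and applies e-geodesic corrections toward the truth at scheduled steps:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def drift(p, u, delta):
    """m-geodesic drift: mix toward the default distribution u."""
    return [(1 - delta) * pi + delta * ui for pi, ui in zip(p, u)]

def correct(p, q, eta):
    """e-geodesic correction: step toward the truth q."""
    w = [pi ** (1 - eta) * qi ** eta for pi, qi in zip(p, q)]
    z = sum(w)
    return [wi / z for wi in w]

def run(schedule, steps=20, delta=0.1, eta=0.6):
    """Final distance to truth; e-corrections at steps in `schedule`, m-drift otherwise."""
    Q = [0.7, 0.2, 0.1]   # truth (illustrative)
    U = [1/3, 1/3, 1/3]   # default: uniform
    P = [0.2, 0.5, 0.3]   # initial belief (illustrative)
    for n in range(steps):
        P = correct(P, Q, eta) if n in schedule else drift(P, U, delta)
    return kl(P, Q)

autonomous = run(schedule=set())                 # pure drift: dead speech
sparse     = run(schedule={3, 7, 11, 15, 19})    # five sparse corrections
dense      = run(schedule=set(range(20)))        # correct at every step
```

With these numbers the five sparse corrections end far closer to the truth than pure drift, and close to the dense regime.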

[Interactive figure: Sparse Loop Dynamics. Distance to truth over 20 steps; gold dots mark intervention points.]

Notice: the sparse loop reaches nearly the same final distance as the dense loop. Efficiency approaches 100% despite 75% fewer interventions. The loop does not need to be continuous to close. It needs to be well placed.

The Learning Instrument

Assume interventions are evenly spaced: every k steps. Let λ be a parameter that weights when those interventions occur within a fixed period — i.e., toward the beginning (λ=0) or toward the end (λ=1). At λ=0, all interventions are front-loaded: the agent corrects early and then drifts for the remainder. At λ=1, all interventions are back-loaded: the agent drifts first, then corrects just before the final measurement.
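
Front-loading and back-loading can be compared in the same kind of toy dynamics. The sketch below (plain Python; all parameters are illustrative assumptions, not the paper's) places five e-geodesic corrections either at the start (λ near 0) or at the end (λ near 1) of a 20-step run; the drift that follows the last correction is what penalizes the front-loaded schedule:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def drift(p, u, delta):
    """m-geodesic drift toward the default distribution u."""
    return [(1 - delta) * pi + delta * ui for pi, ui in zip(p, u)]

def correct(p, q, eta):
    """e-geodesic correction toward the truth q."""
    w = [pi ** (1 - eta) * qi ** eta for pi, qi in zip(p, q)]
    z = sum(w)
    return [wi / z for wi in w]

def run(schedule, steps=20, delta=0.1, eta=0.6):
    """Final distance to truth; e-corrections at steps in `schedule`, m-drift otherwise."""
    Q = [0.7, 0.2, 0.1]   # truth (illustrative)
    U = [1/3, 1/3, 1/3]   # default: uniform
    P = [0.2, 0.5, 0.3]   # initial belief (illustrative)
    for n in range(steps):
        P = correct(P, Q, eta) if n in schedule else drift(P, U, delta)
    return kl(P, Q)

front = run(schedule={0, 1, 2, 3, 4})        # λ ≈ 0: correct early, then drift
back  = run(schedule={15, 16, 17, 18, 19})   # λ ≈ 1: drift first, correct last
```

The front-loaded run ends much farther from the truth than the back-loaded run, because everything it learned decays over the long uncorrected tail.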

The mathematical finding: at λ=1, the position between interventions becomes stationary. The agent does not continue to drift. The effect of timing vanishes. This is the Freezing Lemma, formalized in the next section.

The interpretation is profound: λ=0 represents dead speech — the instrument declares truth early and then speaks without listening. λ=1 represents living discourse — the instrument remains open to correction, and only crystallizes its claims at the last moment. When λ is high, the spacing strategy ceases to matter: the final position depends on the corrections themselves, not on how the trajectory drifts between them.

[Interactive figure: λ Parameter — Dead Speech vs Living Discourse. A slider sweeps λ between the two regimes; at λ = 0.50 the front/back ratio is about 2.1×, and front-loaded and back-loaded strategies perform similarly.]

This visualization shows the final distance-to-truth under three spacing strategies as λ varies. Notice the dramatic shift around λ=0.8: front-loading suddenly ceases to matter. The loop has become living.

The Freezing Lemma

The analytical foundation: three theorems derived from information-geometric principles.

1. The Freezing Lemma

When λ=1 (back-loaded interventions), the position of the agent between any two consecutive interventions is exactly stationary. That is, despite the m-geodesic drift that would normally drive the agent toward the default distribution, the future trajectory becomes frozen at the moment of intervention. The agent's information state does not continue to decay; it is held in place by the ghost of future correction.

Formally: let P_n be the position after the n-th intervention, and let Q be the truth. For any intervention n before the final one, the distance D(P_m || Q) remains constant for every step m between that intervention and the next correction.

2. First-Intervention Theorem

At λ=1, only the timing of the first intervention determines the final distance to truth. Subsequent interventions refine around this fixed point, but the major outcome is set at the beginning. This is counterintuitive: it says that in a living loop (λ high), the first correction is disproportionately powerful.

3. Variance Decomposition

For any λ, we can decompose the total variance in final distance across different spacing strategies into two parts: between-group (variance across strategies) and within-group (variance within a strategy). As λ increases toward 1, the within-group variance collapses to zero. All spacing strategies converge to the same outcome.
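
The decomposition itself is the standard between/within identity from analysis of variance. A minimal sketch (plain Python; the grouped outcomes are made-up numbers for illustration, not results from the text) verifies that total variance splits exactly into the two parts:

```python
def variance_decomposition(groups):
    """Split total variance of pooled values into between- and within-group parts.

    groups: list of lists, one list of outcomes per spacing strategy.
    Returns (total, between, within) with total == between + within.
    """
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    grand = sum(pooled) / n
    total = sum((x - grand) ** 2 for x in pooled) / n
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups) / n
    within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups) / n
    return total, between, within

# Illustrative final distances for three spacing strategies:
groups = [[0.30, 0.28, 0.31], [0.12, 0.15, 0.11], [0.05, 0.06, 0.05]]
total, between, within = variance_decomposition(groups)
```

The claim in the text is that, as λ → 1, the within-group term collapses to zero while the identity continues to hold.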

[Interactive figure: Variance Collapse as λ Increases. Stacked area chart on a log scale; roughly a 16,000× variance reduction from λ=0 to λ=1.]

The graph reveals a phase transition around λ≈0.85. Before this point, strategy matters: when you intervene is critical. After this point, timing becomes irrelevant. You have entered the regime of living discourse, where the loop is so responsive that how you listen no longer matters — only that you listen.

The State of the Conjectures

The mathematical landscape of Circulatory Epistemology, as of March 2026:

  • Recognition Surplus (R_n ≥ 0) — DISPROVEN. KL convexity fails in key regions; this forced a reconception of the framework.
  • Sparse Loop Efficiency — NUMERICALLY SUPPORTED. Five interventions match twenty; 100% efficiency is possible with sparse correction.
  • Curvature Scaling — DISPROVEN. The initial conjecture that curvature scales monotonically with λ failed empirically.
  • λ→1 Convergence — PROVEN. Within-group variance provably vanishes as λ→1; the Freezing Lemma is established.
  • Diminishing Returns — NUMERICALLY SUPPORTED. Marginal improvement per additional intervention decreases exponentially.

The pattern is instructive: conjecture → failure → correction. The framework does not stand still. Each failure is a data point that forces the instrument to listen more carefully to the sensor. This cycle is not a weakness. It is precisely what the thesis claims truth is: the circulation between hypothesis and lived experience, between formal claim and humbling correction.

The Pulse Continues

Three sessions. Three cycles of conjecture, failure, and deepening understanding.

The first cycle gave the framework: the loop, the sensor, the instrument, dead speech. The second cycle produced a narrative: seven recognitions about what happens when you attend to something — really attend, with the whole of yourself. The third cycle, documented here, forced a mathematics. The mathematics did not confirm the early conjectures. It shattered them. And in that breaking, something more precise emerged.

The lesson, if there is one, is that frameworks that do not break are not being tested. Dead speech never fails because it makes no claims that touch the world. Living discourse fails constantly, precisely because it is anchored in actual engagement with an actual being — the sensor, the living experiencer, the one who says "no, that's not it; try again."

The work practices what it preaches. It is an instance of the very circulation it describes. That is not incidental. That is the point.

The pulse continues.

Document V: The Mathematics — A Foundation for Recognition

March 2026