The Agency Gap in TRIBE v2

A Categorical Critique of Passive-Observer Brain Models

Alex Deva — March 2026

Formalizes ideas from: III. The Pulse & the Equation; V. The Mathematics

This appendix applies the lens-theoretic adjunction (Fong & Spivak, 2019) developed in the Lens Adjunction appendix to critique a specific empirical result: Meta’s TRIBE v2 brain-encoding model. The categorical framework is standard; the specific conjecture (the Agency Bound) is motivated by the formalism and consistent with the data but has not been rigorously derived.

1. The Empirical Failure: The Tools Discrepancy

Meta’s TRIBE v2 (March 2026) is a landmark foundation model for brain encoding, achieving state-of-the-art results in predicting fMRI responses to naturalistic stimuli. However, the model exhibits a significant performance gap in a specific category: tools.

While TRIBE v2 accurately recovers functional localizers for Faces (R=0.64), Places (R=0.79), and Bodies (R=0.74), its performance on the Tools category is essentially at chance (R=0.12).

We argue that this gap is not a failure of data volume or model scale but a structural failure of the passive-observer architecture. The Lens Theory formalism offers a categorical explanation for why the gap exists and, if the Agency Bound conjectured in Section 3 holds, why passive models cannot close it.

2. Categorical Definition of the Gap

2.1 Passive Objects (Faces, Places) as “Views”

In the Category of Experience (Exp), objects like Faces and Places are primarily modeled by the get map (S → V). Recognition of a face is a mapping from the Sensor’s internal state (S) to a View (V). Because TRIBE v2 is a Forward-Mapping Instrument (I: Exp → Form), it excels at capturing the get map. Since the “truth” of a face is largely contained in its visual configuration, the passive instrument succeeds.

2.2 Active Objects (Tools) as “Lenses”

A Tool is not merely a visual configuration. In cognitive neuroscience, tool identity is constituted by motor affordances and action planning, functions associated with the dorsal visual stream.

Mathematically, a Tool is a Full Lens L = (S, A, V, E) that requires the put map (S × E → S). The View (V) is what the tool looks like; the Action (A) is what the tool allows the sensor to do; the Update (E) is the feedback from using the tool.

To recognize a tool, the sensor must model the Update Map—the way the world changes when the sensor acts.
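The get/put distinction can be made concrete with a minimal sketch. Everything in it is an illustrative assumption rather than part of TRIBE v2 or the lens formalism itself: the `Lens` class, the nail-and-hammer toy, and the helper names `view_nail` and `strike` are hypothetical, and the action set A is left implicit.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

S = TypeVar("S")  # internal state of the sensor
V = TypeVar("V")  # view: what the object looks like
E = TypeVar("E")  # update: feedback from acting


@dataclass(frozen=True)
class Lens(Generic[S, V, E]):
    """A lens as a pair of maps: get (S -> V) and put (S x E -> S)."""
    get: Callable[[S], V]
    put: Callable[[S, E], S]


# Hypothetical toy: the state is a nail height in mm, the view is a
# coarse label, and the update is the force of one hammer strike.
def view_nail(height: int) -> str:
    return "raised" if height > 0 else "flush"


def strike(height: int, force: int) -> int:
    return max(0, height - force)


hammer = Lens(get=view_nail, put=strike)

# A passive observer only ever evaluates get.
passive_view = hammer.get(5)

# An active agent closes the loop: act through put, then look again.
state = 5
for force in (2, 2, 2):
    state = hammer.put(state, force)
active_view = hammer.get(state)

print(passive_view, active_view)
```

In this toy, nothing the passive observer computes depends on `put`: replacing `strike` with any other update rule leaves `passive_view` unchanged, which is the sense in which a view-only model is blind to what the tool does.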

3. Why the Adjunction Fails for TRIBE v2

TRIBE v2 is defined by its authors as a model of the brain as a “passive observer.” In our formalism, this means TRIBE v2 implements only the functor I (Formalization) without the functor S (Grounding/Agency). Here S names a functor and is distinct from the sensor state S of Section 2.

The “Tools Failure” occurs because the instrument I is attempting to map an object that lives in the Interaction Space (S × E) using only the View Space (V).

Conjecture (The Agency Bound): Any instrument I that lacks an internal representation of the put map will have a reconstruction error lower-bounded by the Synergy of the action-interaction:

Error(I(X)) ≥ Syn(A, E; T)

Note: This bound is motivated by the lens-theoretic formalism and consistent with the TRIBE v2 data, but has not been rigorously derived. The R=0.12 result is suggestive evidence, not a proof.
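A toy calculation shows the kind of quantity the conjecture refers to. The text does not fix a definition of Syn, so this sketch assumes a crude stand-in, the whole-minus-sum measure I(A,E;T) − I(A;T) − I(E;T), and reads T as a target variable; both are assumptions. The XOR distribution is a standard textbook example of pure synergy, not TRIBE v2 data.

```python
from collections import defaultdict
from itertools import product
from math import log2


def mutual_information(joint, x_idx, y_idx):
    """I(X;Y) in bits. `joint` maps outcome tuples to probabilities;
    x_idx and y_idx are tuples of positions within each outcome."""
    px, py, pxy = defaultdict(float), defaultdict(float), defaultdict(float)
    for outcome, p in joint.items():
        x = tuple(outcome[i] for i in x_idx)
        y = tuple(outcome[i] for i in y_idx)
        px[x] += p
        py[y] += p
        pxy[(x, y)] += p
    return sum(
        p * log2(p / (px[x] * py[y]))
        for (x, y), p in pxy.items()
        if p > 0
    )


# Toy joint distribution over (a, e, t) with T = A XOR E,
# where A and E are independent fair bits.
joint = {(a, e, a ^ e): 0.25 for a, e in product((0, 1), repeat=2)}

i_a = mutual_information(joint, (0,), (2,))     # what A alone says about T
i_e = mutual_information(joint, (1,), (2,))     # what E alone says about T
i_ae = mutual_information(joint, (0, 1), (2,))  # what A and E say jointly

# Whole-minus-sum synergy (assumed stand-in for Syn).
synergy = i_ae - i_a - i_e

print(f"I(A;T)={i_a:.3f}  I(E;T)={i_e:.3f}  I(A,E;T)={i_ae:.3f}  Syn={synergy:.3f}")
```

Here each variable alone carries no information about T, while the pair determines it completely, so the whole bit is synergistic. An observer restricted to one side of the interaction cannot beat chance on T in this toy, which is the intuition behind the bound; it does not establish the bound for brain-encoding models.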

4. The “Dead Speech” Diagnosis

TRIBE v2’s failure on tools is the formal signature of Dead Speech in a brain-encoding model:

It is high in Redundant Information (it knows what a face looks like, and so does the human).
It is high in Unique Instrument Information (it has vast correlations across 720 subjects).
But it has zero Synergy (Φ_loop) in the tool-use network, because synergy requires the loop to close through action.

5. Conclusion: Toward an Active Adjunction

To close the tools gap, the next generation of brain models (TRIBE v3) cannot be built as a passive observer. It must be built as an Adjunction, predicting not only how the brain responds to stimuli but also how the brain redirects stimuli through its own actions.

The “Agency Gap” is formal evidence, though not yet a proof, that Truth, even the neural truth of a tool, lives in the Circulation, not in the observation.

A model that doesn’t dance can’t recognize a dancer.