Formalizing the Agency Gap in TRIBE v2
A Categorical Critique of Passive-Observer Brain Models
1. The Empirical Failure: The “Tools” Discrepancy
Meta’s TRIBE v2 (March 2026) is a landmark foundation model for brain encoding, achieving state-of-the-art results in predicting fMRI responses to naturalistic stimuli. However, the model exhibits a catastrophic performance gap in a specific category: Tools.
While TRIBE v2 accurately recovers functional localizers for Faces (R=0.64), Places (R=0.79), and Bodies (R=0.74), its performance on the Tools category is essentially at chance (R=0.12).
This gap is not a failure of data volume or model scale. It is a structural failure of the passive-observer architecture. Using the Lens Theory formalisms (Appendix: The Lens-Theoretic Adjunction), we offer a categorical account of why this gap exists and why, we conjecture, no passive model can close it.
2. Categorical Definition of the Gap
2.1 Passive Objects (Faces, Places) as “Views”
In the Category of Experience (Exp), objects like Faces and Places are primarily modeled by the get map (S → V). Recognition of a face is a mapping from the Sensor’s internal state (S) to a View (V).
Because TRIBE v2 is a Forward-Mapping Instrument (I: Exp → Form), it excels at capturing the get map. It takes the stimulus (the view) and maps it to a formal neural representation. Since the “truth” of a face is largely contained in its visual configuration, the passive instrument succeeds.
2.2 Active Objects (Tools) as “Lenses”
A Tool is not merely a visual configuration. In cognitive neuroscience, tool identity is constituted by motor affordances and action planning (the Dorsal Stream).
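Mathematically, a Tool is a Full Lens L = (S, A, V, E): in addition to the get map (S → V), it requires the put map (S × E → S). Here S is the Sensor’s internal state, and the remaining components are: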
- The View (V): What the tool looks like (e.g., a hammer’s shape).
- The Action (A): What the tool allows the sensor to do (e.g., strike).
- The Update (E): The feedback from using the tool (e.g., the nail being driven).
To recognize a tool, the sensor must model the Update Map — the way the world changes when the sensor acts.
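The distinction between a get-only instrument and a full lens can be made concrete with a minimal Python sketch. Everything here is illustrative rather than part of TRIBE v2 or the Lens Theory appendix: the class names `Lens` and `PassiveObserver`, the toy state `(shape, nail_depth)`, and the integer update are all assumptions. The Action A is left implicit in the caller, following the put signature S × E → S given above.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

S = TypeVar("S")  # sensor state
V = TypeVar("V")  # view
E = TypeVar("E")  # update (feedback from acting)


@dataclass(frozen=True)
class Lens(Generic[S, V, E]):
    """A full lens: both directions are available."""
    get: Callable[[S], V]      # S -> V
    put: Callable[[S, E], S]   # S x E -> S


@dataclass(frozen=True)
class PassiveObserver(Generic[S, V]):
    """A get-only instrument: it has no put map at all."""
    get: Callable[[S], V]


# Hypothetical toy state: (visual shape, how far the nail has been driven).
hammer = Lens(
    get=lambda s: s[0],                  # the view exposes only the shape
    put=lambda s, e: (s[0], s[1] + e),   # striking changes the nail depth
)
passive = PassiveObserver(get=hammer.get)

before = ("hammer-shaped", 0)
after = hammer.put(before, 3)            # one strike drives the nail 3 units

# The passive observer returns the same view for both states,
# even though the underlying state has changed.
same_view = passive.get(before) == passive.get(after)
state_changed = before != after
print(same_view, state_changed)
```

In this toy, the two states differ only in a component that the view never exposes, so an instrument restricted to `get` cannot tell them apart. Whether real tool representations behave this way is the empirical question the essay raises, not something the sketch establishes.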
3. Why the Adjunction Fails for TRIBE v2
TRIBE v2 is defined by its authors as a model of the brain as a “passive observer.” In our formalism, this means TRIBE v2 only implements the I Functor (Formalization) without the S Functor (Grounding/Agency).
The “Tools Failure” occurs because the instrument I is attempting to map an object that lives in the Interaction Space (S × E) using only the View Space (V).
Conjecture (The Agency Bound): Any instrument I that lacks an internal representation of the put map (S × E → S) will have a reconstruction error for an object X that is lower-bounded by the Synergy of the action-interaction:
Error(I(X)) ≥ Syn(A, E; T)
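where A is the Action, E is the Update (the evidence fed back by acting), and T is the tool’s “truth” that the instrument tries to reconstruct.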
Note: This bound is motivated by the lens-theoretic formalism and consistent with the TRIBE v2 data, but has not been rigorously derived. The R=0.12 result is suggestive evidence, not a proof.
If a tool’s “truth” T is dominated by its functional synergy (A, E), and TRIBE v2 has no capacity for action/update modeling, then R=0.12 is the outcome the conjecture predicts. The model sees the hammer but cannot “see” the strike, and therefore it does not “see” the hammer.
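To illustrate what it means for a target to be carried purely by synergy, here is a small Python sketch on a toy XOR world. It is an assumption-laden illustration, not a derivation of the Agency Bound: the essay's Syn(A, E; T) is not defined here, so the code substitutes the simple "whole-minus-sum" quantity I(A,E;T) − I(A;T) − I(E;T) as a proxy, and the variable names and the choice T = A XOR E are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(pairs):
    """I(X;Y) in bits for a list of equally likely (x, y) samples."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


# Toy world (assumption): A and E are independent fair bits, T = A XOR E.
samples = [(a, e, a ^ e) for a, e in product((0, 1), repeat=2)]

i_a = mutual_information([(a, t) for a, e, t in samples])           # I(A;T)
i_e = mutual_information([(e, t) for a, e, t in samples])           # I(E;T)
i_joint = mutual_information([((a, e), t) for a, e, t in samples])  # I(A,E;T)

# Whole-minus-sum proxy for synergy; not a full partial information
# decomposition, and it can be negative when redundancy dominates.
synergy_proxy = i_joint - i_a - i_e
print(i_a, i_e, i_joint, synergy_proxy)
```

In this toy case neither A nor E alone carries any information about T, while the pair determines it completely, so the proxy equals the full 1 bit. That is the extreme regime the conjecture assumes for tools; how close real tool-selective responses come to it is not addressed by the example.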
4. The “Dead Speech” Diagnosis
TRIBE v2’s failure on tools is the formal signature of Dead Speech in a brain-encoding model.
- It is high in Redundant Information (it knows what a face looks like, and so does the human).
- It is high in Unique Instrument Information (it has vast correlations across 720 subjects).
- But it has zero Synergy (Φ_loop) in the tool-use network, because synergy requires the Loop to close through Action.
5. Conclusion: Toward an Active Adjunction
To close the tools gap, the next generation of brain models (TRIBE v3) cannot be built as a passive observer. It must be built as an Adjunction: it must predict not only how the brain responds to stimuli but also how the brain redirects stimuli through its own actions.
The “Agency Gap” is suggestive evidence, though not yet a proof, that Truth (even the neural truth of a tool) lives in the Circulation, not in the observation.
A model that doesn’t dance can’t recognize a dancer.