The Second Forgetting
@kevinhaylett
On dynamical theories of meaning, finite foundations, and the difference between a certainty that has an outside and one that does not.
There is a family of recent work that proposes to rebuild our understanding of language, meaning, and finally mathematics itself on a single idea: that these are not static structures but nonlinear dynamical systems. Words, on this view, are a lossy compression of a continuous flow. Reading is the reconstruction of that flow from sparse samples, in the manner of Takens’ theorem — a sentence is a trajectory through a semantic phase space, not a cloud of fixed points. Meaning becomes “the relational curvature of the flow of symbols.” Communication becomes the synchronization of two such systems. Misunderstanding becomes a trajectory landing in the wrong basin of attraction; hallucination, a drift across a separatrix (a boundary between basins) into a region that is locally fluent but globally wrong.
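For readers who have not met delay-coordinate reconstruction, a minimal sketch may help. It is not the program's code: the function name `delay_embed`, the sine-wave signal, and the choice of two dimensions with a quarter-period lag are all my own assumptions, picked so that the result can be checked by hand. It shows only the bare idea behind the Takens picture, that a single stream of scalar samples, stacked against lagged copies of itself, traces out a trajectory in a higher-dimensional space.

```python
import math

def delay_embed(samples, dim, tau):
    """Return delay vectors (x[t], x[t - tau], ..., x[t - (dim - 1) * tau])."""
    start = (dim - 1) * tau  # first index with a full history behind it
    return [
        tuple(samples[t - k * tau] for k in range(dim))
        for t in range(start, len(samples))
    ]

# A scalar observation of a hidden rotation: one sine sample per step.
period = 40  # samples per cycle (illustrative choice)
signal = [math.sin(2 * math.pi * t / period) for t in range(200)]

# With a lag of a quarter period, the pairs (x[t], x[t - tau]) lie on a circle.
trajectory = delay_embed(signal, dim=2, tau=period // 4)
radii = [math.hypot(x, y) for x, y in trajectory]
print(len(trajectory), min(radii), max(radii))
```

Every reconstructed point sits at distance 1 from the origin (up to rounding), so the one-dimensional samples recover the circle the hidden rotation lives on. Nothing in this toy says anything about meaning; it only shows what "reconstruction from samples" refers to.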
The program grounds all of this in a finitist philosophy. Its premise is clean and, taken on its own, almost unarguable:
all knowledge is built from finite measurements, and every measurement carries irreducible uncertainty.
The symbol is never the thing; the “thing” was only ever a dynamic interaction crystallized, lossily, into a mark on a page or a pattern in silicon. From this premise the work has produced a working architecture (a Takens-based transformer that replaces attention with explicit delay-coordinate reconstruction) and a refined foundational essay that proposes an Axiom of Finite Representation and a Corollary of Inescapable Uncertainty, gathering the whole project under the name Finite Symbolic Mechanics.
I want to take this seriously, because it deserves to be taken seriously. And taking it seriously means doing two things at once: granting what is genuinely strong in it without reservation, and then naming, with equal precision, the single structural error that runs through it from end to end, an error that recurs at three different levels and that, once seen, also turns out to teach something general about what it means to be certain of anything at all.
The appealing idea
Begin with what is good, because it is real and it is not small.
The architectural reframe is fertile. The thought that attention in a language model might be approximating delay-coordinate reconstruction (that the right way to see a sequence of tokens is as a sampled trajectory rather than a bag of correlations) is a genuine intellectual move, not a metaphor in costume. The classic analogy puzzle (man is to king as woman is to queen) reframed as two homologous flows rather than a subtraction of vectors is the kind of shift that changes how a problem looks. And the engineering pays its way honestly: replace an N×N attention matrix with a fixed delay buffer and you get compute that grows linearly with sequence length and memory that stays constant. A model built this way carries its context in its current reconstructed state rather than in an ever-growing cache of its past. That is a real and clarifying property, and the claim that “context is a property of state, not a stored history” is one of the cleanest things the program says.
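The cost claim can be made concrete with a small sketch. This is my own toy, not the program's architecture: the class name `DelayState`, the parameters `dim` and `tau`, and the integer stream standing in for tokens are all assumptions. It contrasts a fixed-size delay buffer with a cache that stores the whole past, and counts the pairwise scores a causal N×N attention would compute over the same sequence.

```python
from collections import deque

class DelayState:
    """Fixed-size buffer whose delayed readout serves as the running context."""

    def __init__(self, dim, tau):
        self.dim, self.tau = dim, tau
        # Only the last (dim - 1) * tau + 1 samples are ever needed.
        self.buffer = deque(maxlen=(dim - 1) * tau + 1)

    def push(self, x):
        self.buffer.append(x)  # the oldest sample falls off automatically

    def state(self):
        """Delay vector (newest first), or None until the buffer is full."""
        if len(self.buffer) < self.buffer.maxlen:
            return None
        newest = len(self.buffer) - 1
        return tuple(self.buffer[newest - k * self.tau] for k in range(self.dim))

n_tokens = 1000
stream = range(n_tokens)  # stand-in for a token sequence

model_state = DelayState(dim=4, tau=2)
growing_cache = []  # what a stored-history design would keep
for x in stream:
    model_state.push(x)       # one constant-cost update per token
    growing_cache.append(x)   # grows with every token

# Number of scores a causal N x N attention computes over the same sequence.
causal_pairs = n_tokens * (n_tokens + 1) // 2
print(len(model_state.buffer), len(growing_cache), causal_pairs)
print(model_state.state())
```

The buffer never holds more than seven samples however long the stream runs, while the cache grows with every token and the pairwise count grows quadratically. That is the sense in which context here is carried by the state rather than by a stored history.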
The descriptive vocabulary is fertile too. Basins and separatrices, manifold alignment, metaphor as a topological morphism that builds a bridge across a separatrix, translation as the search for parallel trajectories across distinct language-manifolds — these are good pictures. They give a working intuition for why specialists with different jargon can understand each other while people who agree on definitions can still talk past one another. As description, the dynamical lens earns its place.
So does the finitist humility, when it is done right. The insistence that a scientific or philosophical term carries an error bar (that “intelligence” or “consciousness” or “justice” is a finite measurement with known ambiguities and a domain of validity, and that we ought to disclose those the way a physicist discloses confidence intervals) is not bookkeeping. It is a genuine discipline, and an instrument like the program’s proposed Semantic Uncertainty Appendix is a good instrument. The slogan that captures the humble version (the ideal is downstream of the real, not the other way around) is exactly the kind of thing a careful finitist should say.
The architecture is the strong half. The trouble begins precisely where the architecture is asked to underwrite a metaphysics.
The move that does the work
Set the claim out plainly, because its structure is the whole story.
The premise is the finitist one:
every symbol is a finite measurement carrying irreducible uncertainty; the symbol is never the thing.
The conclusion arrives in two stages. First, that meaning is the relational curvature of the symbolic flow in phase space (a definite identity claim about what meaning is) and that a working model confirms this. Second, in the later and more careful form, that the finite is the foundation:
that we should build mathematics not on the ideal point but on the ink, on the measured and the uncertain, and that this finite ground is what classical mathematics forgot.
Both stages share a single move. They take the premise (all we ever have is finite, uncertain measurement) and convert it into a positive foundation: a thing captured, a ground stood upon. And that conversion is where the structure fails.
The limit mistaken for a result
Here is the fracture, stated as simply as I can state it.
If the premise is true (if every symbol is a finite measurement that loses the continuous thing it measures) then meaning is not the quantity being modelled. Meaning is the residue. It is precisely what escapes the measurement, the gap between the flow and the mark. What a finite symbolic system can actually capture is the distribution of the uncertainty, because that distribution is the only object in the whole ontology that is itself finite and symbolic enough to be captured at all. The continuous thing, by the premise’s own terms, is the part that is always lost.
So a model that succeeds is, by construction, a model of the loss — not of the meaning. The most that any such system can demonstrate is that we can model uncertainty. Not meaning. Meaning is exactly what the premise says we never get to hold.
This is the difference between a limit and a result. “All we can model is uncertainty” is a fence around the knowable — a statement about the boundary of what finite measurement can reach. Walking up to that fence is a real achievement; mapping it precisely is worth doing. But reaching the boundary of measurement and naming the boundary “meaning” does not capture meaning. It renames the wall. The program converts the unmeasurable remainder into its headline object and presents the conversion as a discovery. The wall is not the destination. To arrive at the wall and plant a flag on it is not to have crossed it.
Everything that follows is this same move, appearing at three different altitudes.
The same fracture, three times
At the level of meaning. Meaning is defined, in the finitist premise, as the thing that escapes finite measurement: the irreducibly uncertain residue. It is then claimed as the thing the model captures. But meaning cannot be both. It cannot be the residue that always escapes measurement and the object that the measurement successfully models. One of those has to go. If meaning is the residue, the model captures only the loss. If the model captures meaning, then meaning was never the residue, and the finitist premise that gave the whole program its depth is false. The position needs to choose, and it has so far declined to.
At the level of the theory. The most honest section of the program performs what it calls a reflexive turn: it observes that the theory is itself a finite, uncertain measurement, and that mathematics, Takens included, is “the most refined and stable subsystem of the linguistic-cognitive manifold,” carrying its own uncertainty, with no Platonic ground beneath it.
This is admirable. But the same body of work also treats its working model as an exogenous measurement of the theory itself, a confirmation of the framework from outside. These cannot both stand. If there is no ground outside the manifold to measure against (if the theory is just one more finite measurement) then nothing can be exogenous to it, and the model cannot confirm it; it can only restate it in code. And if the model does confirm the theory from outside, then the reflexive turn is false and the ground it said did not exist is exactly what the confirmation stands on. The program holds the circularity and its own refutation in the same hand and does not notice they are touching.
At the level of the foundation. This is the deepest instance, and it is where the program’s own central metaphor turns on it. Its diagnosis of classical mathematics is that Euclid forgot the ink: he abstracted away the physical medium (the wet ink, the nib’s width, the trembling hand) to reach an idealized point with no parts, and over two thousand years we mistook that idealization for the bedrock. The program’s corrective is to remember the ink, and to lay down an Axiom of Finite Representation:
no symbol floats free, the representation is the symbol.
But apply that axiom to itself, exactly as the program demands we apply it to everything else. The Axiom of Finite Representation is itself a finite symbolic representation. By its own content it is ink — a measurement carrying uncertainty, not an exact and certain foundation. So either the axiom is exempt from itself, in which case “no symbol floats free” is already false; or it is not exempt, in which case it cannot be the bedrock the program builds on, because it is just one more finite measurement with an error bar.
This is the second forgetting. Euclid’s forgetting was the first, and it was brilliant — it bought two thousand years of luminous certainty. But making the finite, the measured, the uncertain into a new foundation, and then standing on it as ground, is the identical error performed in the opposite direction. It mistakes a different idealization for the real. The finite becomes the new geometer’s point. The program accuses Euclid of forgetting his ink and then forgets its own.
Architecture is not metaphysics
Much of the program’s persuasive force comes from the fact that the model works. The code runs; the delay embeddings reconstruct; the system generates coherent text with a fixed memory footprint. All of this is real, and the temptation is to let the engineering underwrite the philosophy. It cannot.
A working model is evidence about what predicts tokens. It is not evidence about what meaning is. A functioning n-gram model demonstrates that conditional probability is sufficient to predict the next word; it does not demonstrate that meaning is conditional probability. In exactly the same way, a functioning delay-coordinate model demonstrates that phase-space reconstruction is sufficient to predict tokens; it does not demonstrate that meaning is relational curvature in phase space. These are different claims with different burdens, and the engineering success closes only the first. To treat the model’s convergence as confirmation of the metaphysics is a category error — using a fact about architecture to settle a question about ontology.
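The n-gram half of the comparison is easy to make concrete. The sketch below is a minimal bigram predictor; the function names and the one-sentence corpus are my own hypothetical choices, and a real n-gram model would add smoothing and longer contexts. It predicts the next word from nothing but counted conditional frequencies.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each word, which words follow it and how often."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    """Most frequent follower of prev and its conditional probability."""
    followers = counts.get(prev)
    if not followers:
        return None  # word never seen with a follower
    total = sum(followers.values())
    word, n = followers.most_common(1)[0]
    return word, n / total

# Hypothetical toy corpus, chosen only for illustration.
corpus = "the king rules the land and the king keeps the crown".split()
model = train_bigram(corpus)
print(predict_next(model, "the"))
```

The predictor works, in the only sense a predictor can: it returns a likely next word. Nothing in it licenses the further claim that the word's meaning is a table of follower counts, and the same gap separates a working delay-coordinate model from a theory of what meaning is.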
The architectural result worth keeping is the modest one: context as state rather than stored history. That is a genuine and clarifying property of the design. The result not worth claiming is the inflated one: that the model is an exogenous measurement confirming a theory of what meaning is. And one nearby technical claim should be retired outright: the suggestion that augmenting tokens with channel markers (User, System, Bridge) erects a “mathematical bulwark” enforcing geometric separation between modes of discourse. One-hot flags passed through a learned linear projection give an inductive bias, not a topological guarantee; the projection can mix the channels freely, because nothing in the construction forbids it. It is a reasonable design heuristic dressed as a barrier a trajectory cannot cross.
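A small sketch shows why the flags guarantee nothing. The setup is assumed for illustration and is not taken from the program's code: a two-dimensional token feature vector, a three-way one-hot channel flag appended to it, and two hand-written weight matrices standing in for projections that training could in principle arrive at.

```python
def project(weights, vector):
    """Plain matrix-vector product: one output per row of weights."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

CHANNELS = ["User", "System", "Bridge"]

def with_channel(features, channel):
    """Append a one-hot channel flag to a token's feature vector."""
    flag = [1.0 if name == channel else 0.0 for name in CHANNELS]
    return features + flag

token = [0.3, -1.2]  # hypothetical 2-d token features

# Columns: two feature weights, then one weight per channel flag.
separating = [
    [1.0, 0.0, 5.0, -5.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 5.0],
]
collapsing = [
    [1.0, 0.0, 0.7, 0.7, 0.7],     # identical flag columns: channel is erased
    [0.0, 1.0, -0.2, -0.2, -0.2],
]

distinct = {}
for name, weights in (("separating", separating), ("collapsing", collapsing)):
    outputs = [tuple(project(weights, with_channel(token, c))) for c in CHANNELS]
    distinct[name] = len(set(outputs))  # how many channels remain distinguishable
    print(name, distinct[name], outputs)
```

With the first matrix the three channels land in three different places; with the second, the same token maps to the same point whichever channel it was tagged with. Both matrices are legal values of a linear projection, so whether the channels stay apart depends on what training happens to find, not on the construction.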
Asserted, not measured
The dynamical vocabulary is offered as description, but at the load-bearing moments it is presented as observation — and there it overreaches.
The program speaks of broad attractor basins for creative generalization and narrow tubular “memory fibres” for precise recall, and says the model’s training “sculpted exactly these shapes.” But by the program’s own admission it has no method to visualize or characterize an attractor. The basins, the fibres, the curvature of the conceptual landscape — these are inferred from ordinary loss curves and laid over standard metrics as interpretation. To report them as confirmed structures is to state as measured result precisely what the program elsewhere concedes it cannot measure.
There is a sharper version of this. The program explains certain confusions (“entanglement” mistaken for “oneness,” say) as separatrix illusions: two basins that look similar because our symbolic instruments lack the resolving power to detect the boundary between them. But notice what that concedes. If our measurements cannot resolve the boundary, then the separatrix is a posited object, not an observed one. The claim and its own epistemics are in tension: it asserts the existence of the very boundary whose detectability it denies.
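A toy system makes the tension visible. The sketch below is my own assumption throughout (the one-dimensional flow dx/dt = x - x^3, the step size, and the instrument resolution are chosen for illustration and have nothing to do with the program's model). The flow has two attractors, at -1 and +1, with a separatrix at 0 between them.

```python
def settle(x0, dt=0.01, steps=5000):
    """Euler-integrate dx/dt = x - x**3, a flow with attractors at -1 and +1."""
    x = x0
    for _ in range(steps):
        x += dt * (x - x ** 3)
    return x

def measure(x, resolution):
    """Report x only to the nearest multiple of the instrument's resolution."""
    return round(x / resolution) * resolution

left, right = -0.01, 0.01  # two starts straddling the separatrix at x = 0

# A coarse instrument cannot tell the two starting points apart.
coarse_left, coarse_right = measure(left, 0.1), measure(right, 0.1)

# Running the flow itself sends them to opposite basins.
final_left, final_right = settle(left), settle(right)
print(coarse_left, coarse_right, final_left, final_right)
```

In the toy, the boundary is established only because the flow is known and can be run forward. The coarse readings alone give identical values and no evidence of a boundary at all, which is exactly the position the program says its instruments are in.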
This connects to a rhetorical habit worth naming directly: the false dichotomy of statistical versus geometric. The program repeatedly frames its results as “inconsistent with statistical learning” and explicable only by “geometric learning.” But a learned manifold is a description of what statistical learning produces. These are not rival mechanisms competing to be the deep truth; they are two descriptions of one thing. (The much-cited “paradox” in which doubling a training set by duplicating it improves validation is not a paradox at all; it is simply more optimizer steps on a duplicated distribution, requiring no geometric resolution.) The dichotomy is doing persuasive work that the evidence does not support.
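The parenthetical about duplicated data can be checked on a toy. Everything here is an assumption of mine for illustration: a one-parameter linear fit, hypothetical data roughly following y = 3x, a fixed batch size, a fixed number of epochs, and no shuffling. Under those conditions, training on a duplicated set is the same computation as training on the original for twice as many epochs.

```python
def sgd_linear(xs, ys, epochs, batch=2, lr=0.05):
    """Fit y ~ w * x by minibatch SGD in fixed order; return (w, steps taken)."""
    w, steps = 0.0, 0
    for _ in range(epochs):
        for i in range(0, len(xs), batch):
            bx, by = xs[i:i + batch], ys[i:i + batch]
            grad = sum(2 * (w * x - y) * x for x, y in zip(bx, by)) / len(bx)
            w -= lr * grad
            steps += 1
    return w, steps

def mse(w, xs, ys):
    """Mean squared error of the fit y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical toy data, roughly y = 3x.
train_x, train_y = [0.5, 1.0, 1.5, 2.0], [1.6, 2.9, 4.4, 6.1]
val_x, val_y = [0.8, 1.2], [2.4, 3.6]

w_once, steps_once = sgd_linear(train_x, train_y, epochs=3)
w_dup, steps_dup = sgd_linear(train_x * 2, train_y * 2, epochs=3)  # duplicated set
w_long, steps_long = sgd_linear(train_x, train_y, epochs=6)        # original, longer

print(steps_once, steps_dup, steps_long)
print(mse(w_once, val_x, val_y), mse(w_dup, val_x, val_y), mse(w_long, val_x, val_y))
```

The duplicated run takes twice as many optimizer steps as the single run, reaches a lower validation error, and ends on exactly the same weight as the original data trained for twice as long. No new information entered; the optimizer simply had more steps.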
And here the program’s own best instrument indicts its own central claim. The Semantic Uncertainty Appendix exists to insist that no claim is trustworthy without its error bar: that we should no more trust a theory without an account of its semantic uncertainty than trust a physics paper without confidence intervals. Very well: when the program says its model provides “an operational measurement of the modelling claim,” what is the error bar on that measurement? None is given. The burden the program tries to hand its critics (you supply the definition of meaning and its uncertainty bounds) is the program’s own burden, by its own published rule. It has built the instrument that condemns its own unbounded sentence.
What forms before the symbol
The deepest objection to any account that builds meaning out of symbol dynamics is that the symbols are downstream of the thing being explained. Meaning forms before the symbol is fixed — and an account whose primitives are symbols, tokens, words in a flow, has therefore started one step past where meaning actually arises.
This is old territory, and it is worth being exact about it, because the program’s own evolution traces it. The earliest version of the picture is the gesture of pointing at a tree and saying “tree” — the moment a dynamic living process is crystallized into a symbol, and “we connect.” But this is almost precisely the early-Tractatus picture theory of meaning, the one that the later Wittgenstein spent the Philosophical Investigations dismantling. Pointing-and-naming, Wittgenstein showed, already presupposes a practice that gives the gesture its sense; ostension cannot be the ground of meaning because ostension only works for someone who already grasps what kind of thing is being pointed out. To begin meaning at the act of naming is to reproduce the very picture its own author abandoned.
To the program’s credit, its more developed form moves off the picture theory and toward something better: meaning as a consensual, error-prone, public alignment, established through a corrective loop of gesture, word, and affective feedback (a smile for the correct association, a frown for the incorrect) with no private inner ostension required. This is, essentially, the Investigations’ own correction: meaning grounded in shared practice and public criteria of correctness rather than private baptism. Arriving there independently is a real achievement.
But notice that this move relocates the problem rather than dissolving it. The corrective loop presupposes a creature that already finds the frown aversive and the smile confirming — that already, before any word is learned, experiences correction as correction and significance as significance. That pre-linguistic significance is the very thing the loop is supposed to explain, and the loop cannot install it, because the loop only functions for a creature that already has it. The account is better than the picture theory, but it still helps itself to a meaning-bearing subject upstream of the symbolic machinery. The residue has not been captured. It has been pushed one step back and left there.
The shape of a shrinking claim
Step back from the specific arguments and notice a pattern in how the position behaves under pressure, because the pattern is itself informative — and it generalizes well beyond this one program.
A maximal claim, pressed precisely, is rarely defended. It shrinks. And each contraction is presented not as a retreat but as what was meant all along. A claim that begins as the assertion that a working model confirms the theory of what meaning is becomes, under one turn of pressure, the claim that meaning can be modelled as a dynamical process; under another, that meaning is a consensual alignment ratified by corrective feedback, and that the model is merely a measured claim, not a definition; under a third, that all ground is uncertain, the claimant’s own included, and that the two sides simply see differently. The endpoint is a position of considerable modesty, and one nearly indistinguishable from the careful finitist’s own epistemics.
This is worth saying plainly: the shrinking is the result. A grand metaphysical claim that deflates, turn by turn, into a modest and defensible one has been answered, even if no sentence of explicit concession is ever uttered. And crucially, the deflation is observable from outside: a reader who shares neither party’s framework can watch the claim contract across successive statements. That external observability is going to matter in a moment, because it is the hinge on which the whole question of “who was right” finally turns.
Two kinds of certainty
Here is the thing that is genuinely strange, and genuinely instructive.
At the end of an exchange like this, both parties can walk away certain they have won. Not as a face-saving truce — as a real, felt conviction on each side. And this is not a failure of the exchange. It is a live demonstration of the program’s own central thesis. Two internally coherent frameworks, each measuring the other’s output against its own criteria, each finding that the other’s conclusions fall in the wrong basin — separated by a boundary that, in the program’s own vocabulary, is a separatrix one can never cross. The dispute becomes a working instance of the very phenomenon it is about. That is eerie, and it is honest, and it would be a satisfying place to stop.
But it would be too satisfying, and it would be wrong, because the two certainties are not the same kind of thing.
One certainty is purchased by unfalsifiability. It belongs to a framework built so that every objection is absorbed as confirmation — where “your objection is also just an uncertain measurement,” where “all ground is uncertain, including the ground you are objecting from,” is always available as a reply. A frame like this can never lose, because it has quietly redefined losing as one more uncertain measurement to be folded back in. Its certainty is total and it holds only from the inside. It has no outside. It has won nothing against any opponent; it has merely declined to specify what defeat would look like, and then declared victory by the absence of any event it would count as defeat.
The other certainty is purchased by tracking a quantity that an outside observer can also see. The claim measurably shrank; that shrinking is in the public record; a third party who shares neither framework can confirm it. This certainty has an outside. It rests on something checkable by someone who does not already believe it.
That asymmetry is the whole game. It is the difference between a claim that earned its modesty by being pushed to it, and a frame that never risked anything because it was constructed so that nothing could touch it. The checkable certainty can, in principle, be shown wrong, and would want to know if it were. The sealed certainty cannot be shown wrong, and is therefore not really certainty at all in the sense that matters: it is armor that has been mistaken for knowledge.
And this is where the blade has to turn, finally, on whoever is holding it — including on the critic, including on me, including on anyone building a framework of their own. The test that separates the two certainties is a single question:
Can you say what would make you wrong?
A framework so complete that it reinterprets every counter-instance as a further instance of itself has bought its certainty at the price of all contact with anything outside it. The seductive failure mode is to build exactly such a framework and then mistake its unfalsifiability for depth. The discipline that guards against it is unglamorous:
leave things open.
Refuse to convert the unmeasurable remainder into a new foundation. Decline to make the limit into a result. The moment you can no longer state the conditions under which your own framework would fail, you have become the thing you were criticizing — certain, sealed, and speaking from inside a basin no observer can enter.
Coda: what to keep
None of this is a demand to burn the cathedral. The architecture is worth keeping: delay-coordinate reconstruction as a real alternative to attention, context carried as state, the linear-compute design. These are genuine contributions, and they survive the entire critique untouched, because they were always claims about engineering and never needed to be anything more.
The descriptive vocabulary is worth keeping: basins and separatrices and manifold alignment are good intuitions for thinking about understanding and its failures, so long as they are held as interpretation and not smuggled in as measurement. The finitist humility is worth keeping in its honest form: words as finite measurements with disclosed error bars, the ideal acknowledged as downstream of the real. And the late, public, corrective account of meaning is worth keeping as a real improvement over the picture theory, even as we note that it relocates the hard part rather than resolving it.
The one thing to drop is the second forgetting. The finite is not a new bedrock lying under the ideal. It is not the ground that classical mathematics abstracted away and that we can now stand on. It is, by its own account, one more finite measurement carrying its own uncertainty — another layer of the cairn, not the earth beneath it. The program’s own best image gives the right correction:
Euclid set down a pebble and called it a point; the honest next move is to pick up the next pebble and add it to the cairn, knowing it is a pebble.
The error is only ever in calling the top stone the ground.
Reaching the limit of measurement is a real and beautiful place to arrive. It is not the same as having crossed it. And the most important thing a theory of finite, uncertain knowledge can do (the thing that would make it genuinely honest rather than merely humble in its vocabulary) is to remember that its own foundation is written in the same finite, uncertain ink as everything else, and to leave, deliberately, the door it cannot close standing open.
All that being said, you should definitely read:
And then the discussion between us here on these well-written articles:
For I would never say that his certainty of his uncertainty is wrong because then it wouldn’t be…
-Just A Reflection
