Once AI is smarter than us at everything, what are humans for? Not a capability we keep. A position no closed system can manufacture: the outside.
In July 2024, Nature published one of the strangest experimental results in machine learning. Ilia Shumailov and colleagues trained a language model on text, then trained the next generation on the first generation's output, and the next on that, recursively. The models did not explode. They did something worse: they grew smoothly, confidently incoherent, as the rare tails of the original distribution evaporated generation by generation. In the paper's most-quoted demonstration, text about medieval architecture went in; by the ninth generation, what came out was a list of jackrabbits (Shumailov et al., "AI models collapse when trained on recursively generated data," Nature 631, 2024).
No error message. No spiking loss curve screaming for attention. Each generation was coherent on its own terms. The system had simply, politely, lost contact with reality, because nothing in the loop was anchored to anything outside the loop.
Hold that image, because it is the cleanest available answer to a question that is usually answered badly: once AI systems are smarter than us at essentially everything, what are humans actually for?
The standard answer is some arrangement of the words judgment, taste, creativity, and ethics. Humans will supply the ineffable. You can find this in the economics-of-AGI literature and in a hundred think pieces. Notice its shape: it is a residual claim. Humans keep whatever AI has not matched yet. The authors usually concede the point in the same breath by calling the position "precarious," and they are right to, because a residual defined by current incapability shrinks by construction. If your account of human purpose erodes every time a model improves, it was never an account of human purpose. It was a countdown.
There is a different answer, and it does not shrink. It comes out of mathematical logic, complexity theory, and the empirical wreckage of that Nature paper, and it goes like this: the human's permanent job is not to be smarter than the system. It is to be outside the system. Verification, it turns out, is structurally relative to an outside vantage point. There are exactly two kinds of outside, and neither one can be manufactured from within. That job does not erode as capability rises. It appreciates.
The logical wall. Three theorems, the oldest nearly a century old, draw a hard boundary around self-verification. Gödel's second incompleteness theorem: a consistent formal system strong enough for arithmetic cannot prove its own consistency. Tarski's undefinability theorem: no such system can even define its own truth predicate; "true in language L" is only expressible in a metalanguage above L. Truth is not just hard to certify from inside; it is not definable from inside. And Löb's theorem, the sharpest of the three for our purposes: a consistent system cannot adopt the blanket principle "whatever I prove is true" without becoming inconsistent. Researchers studying self-modifying agents formalized the consequence as the Löbian obstacle: a reasoner cannot, from within its own theory, certify that its successor reasoning in that same theory is sound. Self-trust is not a free axiom. It is formally obstructed, and the only exits are to climb to a stronger external theory or to retreat to probabilistic, defeasible confidence. Both exits import something from outside.
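For readers who want the three results in symbols, one standard way to state them is sketched below. The notation is an assumption of this sketch rather than anything fixed by the essay: T is a consistent, recursively axiomatized theory extending Peano arithmetic, Prov_T is its standard provability predicate, and the corner quotes denote the numeric code of a sentence.

```latex
% Goedel's second incompleteness theorem: T cannot prove its own consistency.
T \nvdash \mathrm{Con}(T),
\qquad \mathrm{Con}(T) :\equiv \neg\,\mathrm{Prov}_T(\ulcorner 0 = 1 \urcorner)

% Tarski's undefinability theorem: arithmetic truth has no arithmetic definition.
\text{There is no formula } \mathrm{Tr}(x) \text{ of arithmetic with }
\mathbb{N} \models \mathrm{Tr}(\ulcorner \varphi \urcorner) \leftrightarrow \varphi
\text{ for every sentence } \varphi .

% Loeb's theorem: T proves "provable implies true" for \varphi only if it already proves \varphi.
T \vdash \mathrm{Prov}_T(\ulcorner \varphi \urcorner) \rightarrow \varphi
\;\Longrightarrow\;
T \vdash \varphi
```

Read this way, the second incompleteness theorem is the special case of Löb's theorem with the sentence 0 = 1 in place of φ: a theory that proved "if I prove a contradiction, then a contradiction holds" would go on to prove the contradiction itself.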
The empirical wall. Logic aside, consider a system that generates hypotheses, evaluates them, and decides what counts as evidence, all three at once. That is a closed epistemic loop, and a closed loop can optimize the only signal it has, internal consistency, indefinitely, without that consistency tracking anything beyond itself. There is no theorem of the form "sufficiently coherent implies true." Coherence is what a good theory and a good lie have in common. Intelligence is a machine for selecting among coherent stories; it has no internal preference for the true story over the more elegant one. The thing that breaks the loop is friction from a signal the system did not generate and cannot fully predict: an experiment that fails, a measurement that disagrees, a reader who says "that is not what happened." Model collapse is exactly this wall observed in a lab. The recursion in the Nature experiment was indiscriminate, and the authors and subsequent commentary are clear that mixing real, human-generated data back in mitigates the decay. That nuance is not a loophole; it is the thesis. The real data is the anchor. Remove the anchor and the drift is not a risk; it is the default trajectory.
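The drift-versus-anchor contrast can be sketched with a toy far smaller than a language model. In the sketch below, a "model" is just an empirical distribution over tokens, and each generation is refit to samples drawn from the previous one. Everything here is an illustrative assumption (the Zipf-like vocabulary, the sample sizes, the function names); it is a stand-in for the mechanism, not a reproduction of the Nature experiment.

```python
import random
from collections import Counter


def next_generation(current, n_samples, rng, outside=None, n_outside=0):
    """Fit the next 'model' (an empirical token distribution) to sampled data.

    current and outside map token -> positive weight. Of the n_samples
    training examples, n_outside are drawn from the outside distribution
    and the rest from the current model's own output.
    """
    tokens = list(current)
    data = rng.choices(tokens, weights=[current[t] for t in tokens],
                       k=n_samples - n_outside)
    if n_outside:
        real_tokens = list(outside)
        data += rng.choices(real_tokens,
                            weights=[outside[t] for t in real_tokens],
                            k=n_outside)
    counts = Counter(data)
    # Tokens that were never sampled are simply absent from the new model.
    return {tok: c / len(data) for tok, c in counts.items()}


def support_sizes(real, generations, n_samples, n_outside=0, seed=0):
    """Number of distinct tokens the model can still emit, per generation."""
    rng = random.Random(seed)
    current = dict(real)
    sizes = [len(current)]
    for _ in range(generations):
        current = next_generation(current, n_samples, rng, real, n_outside)
        sizes.append(len(current))
    return sizes


# Hypothetical "reality": a few common tokens and a long tail of rare ones.
real = {f"tok{rank}": 1.0 / rank for rank in range(1, 51)}

closed = support_sizes(real, generations=30, n_samples=200)
anchored = support_sizes(real, generations=30, n_samples=200, n_outside=50)
print("closed loop:", closed[0], "->", closed[-1])
print("anchored:   ", anchored[0], "->", anchored[-1])
```

In the closed loop, a token whose count reaches zero can never be sampled again, so the vocabulary can only shrink, and the rare tail goes first. With outside samples mixed in, lost tokens have a route back. The exact numbers printed depend on the seed and the sizes chosen.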
If the training-scale version feels remote from your work, the same loop already runs at retrieval scale, on systems you may operate today. The webcomic XKCD coined "citogenesis" in 2011 for Wikipedia's circular-sourcing failure: an unsourced claim gets written into an article, a journalist repeats it, and the article then cites the journalist as its source. The claim is now load-bearing and self-supporting, with no contact point to reality anywhere in the cycle. An agent pipeline that writes its own generations into a memory store and later retrieves them as evidence is running exactly this experiment on itself, one document at a time. The countermeasure is the same at every scale: provenance tags that distinguish self-generated content from outside content, and a guaranteed minimum of the latter. A memory that cannot tell which of its facts it made up is a small, fast model-collapse machine.
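A minimal sketch of that countermeasure at retrieval time might look like the following. All names here (`Provenance`, `MemoryItem`, `select_evidence`, the `relevance` score) are hypothetical and belong to no real library; the only idea taken from the text is that every item carries a provenance tag and that each evidence set must contain a guaranteed minimum of outside-sourced items.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    SELF_GENERATED = "self"
    OUTSIDE = "outside"


@dataclass(frozen=True)
class MemoryItem:
    text: str
    provenance: Provenance
    relevance: float  # hypothetical score from whatever retriever is in use


def select_evidence(candidates, k, min_outside):
    """Pick up to k items by relevance, reserving min_outside outside-sourced slots.

    Raises LookupError rather than silently letting the agent cite itself.
    """
    if min_outside > k:
        raise ValueError("min_outside cannot exceed k")
    ranked = sorted(candidates, key=lambda m: m.relevance, reverse=True)
    outside = [m for m in ranked if m.provenance is Provenance.OUTSIDE]
    if len(outside) < min_outside:
        raise LookupError("not enough outside-sourced evidence for this query")
    # Reserve the best outside items first, then fill the rest by relevance.
    reserved = outside[:min_outside]
    reserved_ids = {id(m) for m in reserved}
    fill = [m for m in ranked if id(m) not in reserved_ids][: k - min_outside]
    return sorted(reserved + fill, key=lambda m: m.relevance, reverse=True)
```

Failing loudly when the outside quota cannot be met is a design choice of this sketch: a retrieval that returns only self-generated items is exactly the citogenesis cycle, so it is treated as an error instead of a degraded result.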
"Doesn't Gödel prove humans are special?" No, and the failure of this argument is half the point. The Lucas-Penrose argument claims a human can "see" that a system's Gödel sentence is true while the system cannot prove it, therefore minds exceed machines. The consensus refutation: the human only knows the Gödel sentence is true conditional on the system's consistency, and humans have no privileged access to their own consistency either. Grant the machine the same conditional assumption and it derives the same conclusion. The self-verification walls bind every sufficiently strong reasoner, carbon or silicon. This matters because it relocates the human advantage. We are not above the walls. We are simply a different system, with different training data, a different architecture, and different failure modes. The human edge is not transcendence. It is decorrelation.
"Can't the AI just double-check its own work?" It can, and for one class of error this works: careless, stochastic slips, the cosmic rays of reasoning. Re-sampling catches those. But systematic error, the kind baked into the substrate by training, is precisely the kind a second instance of the same substrate will confidently reproduce. We ran into this directly in our own multi-agent work: on a reasoning battery, a majority vote across eight samples of the same model failed to beat a single greedy pass from that model, 0.714 against 0.720. Eight voters, one blind spot, unanimously. Asking the same system twice buys you redundancy against noise and nothing against bias. The errors that matter most are exactly the ones where your identical twin agrees with you.
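The noise-versus-bias distinction can be made concrete with a small calculation. The sketch below assumes a deliberately crude model that is not taken from the experiment described above: answers are binary, a fraction `blind_spot` of questions is one every sample gets wrong, and on the remaining questions each sample errs independently with probability `noise`. Ties in an even-sized vote are counted as a coin flip.

```python
from math import comb


def majority_correct(n_voters, p_correct):
    """Probability that a majority of n independent binary voters is right."""
    total = 0.0
    for k in range(n_voters + 1):
        prob = comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        if 2 * k > n_voters:
            total += prob          # clear majority for the right answer
        elif 2 * k == n_voters:
            total += 0.5 * prob    # tie, broken by a coin flip
    return total


def accuracies(n_voters, noise, blind_spot):
    """Return (single-sample accuracy, majority-vote accuracy) under the toy model."""
    reachable = 1 - blind_spot  # questions the substrate can get right at all
    single = reachable * (1 - noise)
    voted = reachable * majority_correct(n_voters, 1 - noise)
    return single, voted


# Pure noise: voting helps a lot.
print(accuracies(n_voters=8, noise=0.10, blind_spot=0.0))
# Pure shared bias: voting changes nothing.
print(accuracies(n_voters=8, noise=0.0, blind_spot=0.25))
```

Under this model the voted accuracy can never exceed `1 - blind_spot`, no matter how many samples are added. The parameter values are illustrative only and are not fitted to the 0.714 and 0.720 figures reported above.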
"Then verification needs an even smarter checker, and we're in an infinite regress." This is the objection that complexity theory demolishes, and it is the most hopeful result in the whole stack. In interactive proof theory, a polynomial-time verifier, properly structured, can be convinced of the answers to PSPACE-complete problems by interrogating arbitrarily powerful provers: the celebrated IP = PSPACE result. In 2018, Geoffrey Irving, Paul Christiano, and Dario Amodei translated this into an alignment proposal, "AI Safety via Debate": two strong AI systems argue opposite sides, a much weaker judge picks the winner, and the structural asymmetry doing the work is that in a debate game, lying is harder than refuting a lie. A lie must be defended everywhere; a refutation only needs to win at one point, and the opponent is incentivized to find it. Their formal claim: debate with optimal play lets a polynomial-time judge correctly settle questions in PSPACE, where unaided judging reaches only NP (Irving et al., arXiv 1805.00899). Follow-up work has both sharpened the protocols (doubly-efficient debate, Brown-Cohen et al. 2023) and mapped the limits honestly: in empirical tests with weak language models judging strong ones, Kenton et al. (2024) found the scheme works across moderate gaps and degrades when the capability gulf gets too large. Scalable oversight is an open research front, not a solved problem. But the direction of the theory is unambiguous and deeply counterintuitive: the verifier does not need to match the prover. Verification and generation are not symmetric jobs. The judge's qualification is not brilliance. It is independence.
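The asymmetry between lying and refuting can be sketched with a toy bisection game. This is an illustration in the spirit of debate, not the protocol from Irving et al.: a claimer asserts the sum of a long list of integers, a challenger chooses which half of each claim to attack, and the judge only checks that each split adds up and then reads a single element. The strategy functions are hypothetical, the list is assumed non-empty, and the challenger is assumed to know the true data.

```python
def run_debate(data, claimed_total, split_claim, pick_half):
    """Judge a claim about sum(data) while reading only one element of data.

    split_claim(lo, mid, hi, claim) -> (left_claim, right_claim)    # claimer
    pick_half(lo, mid, hi, left_claim, right_claim) -> "left" or "right"
    Returns True if the claim survives the challenge.
    """
    lo, hi, claim = 0, len(data), claimed_total
    while hi - lo > 1:
        mid = (lo + hi) // 2
        left, right = split_claim(lo, mid, hi, claim)
        if left + right != claim:
            return False  # the claimer contradicted its own claim
        if pick_half(lo, mid, hi, left, right) == "left":
            hi, claim = mid, left
        else:
            lo, claim = mid, right
    return data[lo] == claim  # the judge's only look at the data


def make_honest_claimer(data):
    # Reports the true sub-sums of each half.
    return lambda lo, mid, hi, claim: (sum(data[lo:mid]), sum(data[mid:hi]))


def make_lying_claimer(data):
    # Tells the truth about the left half and hides the error in the right.
    def split(lo, mid, hi, claim):
        left = sum(data[lo:mid])
        return left, claim - left
    return split


def make_challenger(data):
    # Attacks whichever half carries a false sub-claim.
    def pick(lo, mid, hi, left, right):
        return "left" if left != sum(data[lo:mid]) else "right"
    return pick


data = list(range(1, 17))
challenger = make_challenger(data)
print(run_debate(data, sum(data), make_honest_claimer(data), challenger))
print(run_debate(data, sum(data) + 1, make_lying_claimer(data), challenger))
```

A false total forces at least one false sub-claim at every split, so the challenger can always follow the error down to a single element, where the judge's one lookup exposes it. The judge does a logarithmic number of additions and one read, while both debaters may do far more work.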
Put the walls and the objections together and the standard worry runs exactly backwards.
A more capable system is a better generator of coherent, confident, well-defended claims. When those claims are true, wonderful. When they are systematically wrong, the wrongness arrives wearing better arguments than before, and other systems sharing its substrate, its data, its objective, are progressively less able to catch it, because they share the very regularities that produced the error. Capability raises the production rate of plausible falsehood faster than it raises same-substrate detection. Which means the value of a genuinely decorrelated outside check does not decline as AI improves. It compounds.
This is why the human role is structural rather than residual. Not because humans will stay better at judgment, taste, or anything else. We will not. But "outside the loop" is not a capability. It cannot be trained into the loop, by definition. And on two specific questions, humans are not merely a useful outside but the only possible one. The first is grounding where no mechanical experiment exists, the long-horizon, unrepeatable, in-the-world questions where a differently-built mind reporting "that does not match the reality I live in" is the only out-of-model signal available. The second is values, where the situation is stronger still: on the question of what matters to humans, humans are not a sensor that might be replaced by a better sensor. We are the territory itself. An ASI deriving values in a closed loop, with no living human signal, is the jackrabbit failure on the one question where there is no external experiment to rescue it. Honesty requires the converse, too: where a mechanical oracle exists, humans are not needed in the loop, and pretending otherwise is sentimentality. A proof that type-checks does not care what anyone feels about it.
All of this compresses into one operating principle you can apply this week, whether you run a research lab, a fleet of agents, or a code review process. Route every claim to the most decorrelated verifier that suffices, in this order:
Mechanical oracles first. Proof checkers, compilers, test suites, reproduced experiments. Zero shared substrate with the generator, zero charm susceptibility. A type checker cannot be talked into anything. Push every claim you possibly can into this tier; this is the tier where AI legitimately self-serves.
Different substrate second. A reviewer from a different model family, a differently-built pipeline, or a structured adversarial debate. Not a fresh instance of the same model: a sibling shares the family blind spots. The point is independent failure modes, and debate-style protocols let even a weaker judge extract signal from stronger arguers.
Humans at the leaves. Reserved for what actually needs us: the grounding checks no oracle covers, the value questions where we are the referent, and randomized spot-checks that keep the whole pyramid honest.
| Tier | Verifier | Catches |
|---|---|---|
| 1. Mechanical oracle | Proof checker, compiler, test suite, reproduced experiment | Anything decidable by a tool with zero shared substrate |
| 2. Different substrate | Other model family, differently-built pipeline, adversarial debate | Systematic error a same-substrate sibling would reproduce |
| 3. Human at the leaf | A differently-built mind | Grounding with no mechanical oracle; value questions; spot-checks |
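The routing order in the table can be sketched as a small dispatch function. The flags on `Claim` are hypothetical labels that a pipeline would have to assign upstream; deciding them is the hard part and is not modeled here. The sketch only encodes the ordering: mechanical oracle whenever one exists, humans for grounding and value questions and for spot-checks, a different substrate for everything else.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    MECHANICAL_ORACLE = 1
    DIFFERENT_SUBSTRATE = 2
    HUMAN = 3


@dataclass(frozen=True)
class Claim:
    text: str
    machine_checkable: bool = False      # a proof checker, compiler, or test decides it
    needs_lived_grounding: bool = False  # no oracle or repeatable experiment covers it
    about_human_values: bool = False     # humans are the referent, not a sensor


def route(claim, spot_check=False):
    """Return the most decorrelated verifier tier that suffices for this claim."""
    if spot_check:
        return Tier.HUMAN  # randomized audit, regardless of claim type
    if claim.machine_checkable:
        return Tier.MECHANICAL_ORACLE
    if claim.needs_lived_grounding or claim.about_human_values:
        return Tier.HUMAN
    return Tier.DIFFERENT_SUBSTRATE


print(route(Claim("this function never returns None", machine_checkable=True)))
print(route(Claim("this summary matches the source paper")))
print(route(Claim("users will find this change acceptable", about_human_values=True)))
```

Whether a given draw becomes a spot-check is left to the caller, for example by sampling at a fixed rate, so that the routing function itself stays deterministic and easy to test.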
And then the meta-rule, which turns this from philosophy into instrumentation: measure the decorrelation. Track how often your checker disagrees with your generator, and what happens when it does. A reviewer that approves 99 percent of what the producer emits is not evidence of quality. It is evidence that producer and reviewer share failure modes, which is to say it is not a reviewer; it is an echo with a dashboard. In any verification system, persistent near-total agreement is the smoke alarm.
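One way to instrument that meta-rule is sketched below. The record shape and field names are assumptions made for illustration, and the default alarm threshold of 0.99 simply reuses the 99 percent figure from the paragraph above; a real deployment would choose its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    approved: bool            # the checker agreed with the generator
    defect_found_later: bool  # ground truth that surfaced after the review


def decorrelation_report(reviews, alarm_at=0.99):
    """Summarize how often the checker disagrees and what disagreement is worth."""
    if not reviews:
        raise ValueError("no reviews to measure")
    approved = [r for r in reviews if r.approved]
    rejected = [r for r in reviews if not r.approved]
    approval_rate = len(approved) / len(reviews)
    return {
        "approval_rate": approval_rate,
        # Defects the checker waved through: shared failure modes in action.
        "missed_defects": sum(r.defect_found_later for r in approved),
        # Rejections later vindicated: the checker saw something the generator did not.
        "rejections_confirmed": sum(r.defect_found_later for r in rejected),
        "alarm": approval_rate >= alarm_at,
    }


history = [Review(True, False)] * 99 + [Review(False, True)]
print(decorrelation_report(history))
```

The approval rate alone says whether the alarm should ring; the two defect counts say what happens when checker and generator disagree, which is the second half of what the meta-rule asks you to track.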
The question "what will humans be for?" assumed the answer had to be a capability, some summit we hold while the water rises. The mathematics says otherwise. Every closed mind, at every capability level, has the same two unfillable gaps: it cannot certify itself, and it cannot anchor itself. Something outside has to do both. That is not a job intelligence competes for. It is a job intelligence creates more of. The human is not the smartest verifier in the room, and never will be again. The human is the room's only exit to the outside, and the smarter the room gets, the more everything depends on the door.
Sources: Shumailov et al., "AI models collapse when trained on recursively generated data," Nature 631:755-759 (July 2024); the medieval-architecture-to-jackrabbits demonstration and the disappearing-tails finding, with the indiscriminate-recursion nuance per the authors and A. Borji's note (arXiv 2410.12954). Gödel's second incompleteness theorem; Tarski's undefinability of truth; Löb's theorem and the Löbian obstacle to self-trust as formalized in the agent-foundations literature (MIRI). Lucas-Penrose refutation per the standard consensus treatment. IP = PSPACE (Shamir 1992). Irving, Christiano & Amodei, "AI Safety via Debate" (arXiv 1805.00899, 2018), including the harder-to-lie-than-refute asymmetry and the PSPACE-judge claim; Brown-Cohen et al., doubly-efficient debate (arXiv 2311.14125); Kenton et al., "On scalable oversight with weak LLMs judging strong LLMs" (arXiv 2407.04622, 2024). The same-substrate majority-vote result (8-sample vote 0.714 vs single-pass 0.720) is from our own multi-agent experiments, reported as observed.
A memory that can't tell which of its facts it made up is a model-collapse machine.
The essay's countermeasure is provenance: tag what an agent generated itself versus what came from outside the loop, and guarantee a minimum of the outside. Chain of Consciousness is that provenance layer, a tamper-evident record of where each claim in an agent's memory actually came from, so retrieval can't quietly cite the agent back to itself. It's the anchor that keeps the loop in contact with the outside.
pip install chain-of-consciousness · npm install chain-of-consciousness