Houston built the widest freeway on Earth, 23 lanes, $2.8 billion, to kill congestion. Three years later the commute was slower. Nothing failed: the capacity is what summoned the traffic that refilled it. Your "just scale it up" is the same bet.
Houston's Katy Freeway is the widest freeway in the world. After a widening program that cost more than $2.8 billion, Interstate 10 west of downtown swelled to twenty-three lanes, a river of concrete you can see from space, built for one purpose: to end the congestion that had made it one of America's most hated commutes.
It worked for about three years. Then City Observatory, analyzing Houston's own traffic data, delivered the verdict: between 2011 and 2014, the morning commute on the Katy got 25 minutes longer, a 30% increase, and the evening commute got 23 minutes longer, up 55%. Texas had built the widest road on Earth and bought slower traffic with it.
Here is the part that should bother you professionally, not just civically: nothing failed. The lanes were poured correctly. They carried more cars than ever, vastly more. The project delivered exactly what it promised, capacity, and the capacity is what recreated the congestion. Twenty-three lanes of demand showed up to occupy twenty-three lanes of supply, because the supply is what summoned it.
Transportation economists have known this long enough to have given it a deliberately audacious name. In 2011, Gilles Duranton and Matthew Turner published "The Fundamental Law of Road Congestion" in the American Economic Review, analyzing decades of data across US cities. Their finding: the elasticity of vehicle-miles traveled with respect to highway capacity is essentially 1.0. Add 1% more lane-kilometers, get 1% more driving. Not eventually, within a few years. The new traffic comes from everywhere the old congestion had been quietly suppressing it: trips people weren't taking, deliveries routed elsewhere, residents who moved in because the road was briefly faster. Their conclusion was blunt enough to be a tattoo: "increased provision of roads or public transit is unlikely to relieve congestion."
Read that twice and the reframe lands. Congestion is not a supply problem. It is a demand problem wearing a supply costume, and supplying more of the thing demand wants is not a cure, it is an invitation.
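A minimal sketch of what unit elasticity implies, assuming a simple constant-elasticity model. The function names and the hypothetical city's numbers are illustrative, not taken from the paper; only the elasticity of roughly 1.0 comes from Duranton and Turner.

```python
def induced_vmt(base_vmt: float, base_lane_km: float, new_lane_km: float,
                elasticity: float = 1.0) -> float:
    """Vehicle-km traveled after a capacity change, under constant elasticity."""
    return base_vmt * (new_lane_km / base_lane_km) ** elasticity


def traffic_density(vmt: float, lane_km: float) -> float:
    """Driving per unit of road: a crude stand-in for congestion."""
    return vmt / lane_km


# Hypothetical city: 1,000,000 vehicle-km per day on 100 lane-km, widened by 10%.
before = traffic_density(1_000_000, 100)
after = traffic_density(induced_vmt(1_000_000, 100, 110), 110)
print(f"density before: {before:.0f}, after: {after:.0f}")

# If demand really were fixed (elasticity 0), the same widening would help.
fixed = traffic_density(induced_vmt(1_000_000, 100, 110, elasticity=0.0), 110)
print(f"density if demand were fixed: {fixed:.0f}")
```

At an elasticity of 1.0, driving grows in step with the road, so the density per lane-kilometer ends up where it started. The widening only relieves anything in the world where the elasticity is well below one.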
The really humbling thing about induced demand is how many times smart people have discovered it independently, in how many different fields, without the lesson ever quite sticking.
1865. William Stanley Jevons, in The Coal Question, notices something perverse about efficiency: James Watt's improved steam engine, which extracted far more work per ton of coal, did not reduce Britain's coal consumption. Consumption exploded instead: efficiency made coal power cheap enough to use everywhere, and total consumption roughly tripled over the following decades. Efficiency is capacity in disguise: it lowers the effective price of the resource, and if latent demand is elastic enough, the demand you unlock outruns the efficiency you added. We call it the Jevons paradox now, which is a polite way of saying the fuel-efficient car gets driven more miles.
1955. Cyril Northcote Parkinson, half-joking in The Economist, codifies what every organization knows in its bones: "work expands so as to fill the time available for its completion." Give a bureaucracy more headcount and it will manufacture the coordination work to absorb it. Capacity, meet demand.
1995. Niklaus Wirth, Turing laureate, designer of Pascal, publishes "A Plea for Lean Software" and gives systems engineering its own version: software gets slower faster than hardware gets faster. Every hardware generation hands developers a gift of cycles and memory, and the software promptly bloats to consume it. The machine you bought to make the spreadsheet fast is, two OS versions later, exactly as slow as the one it replaced. Wirth's Law is the Katy Freeway running on your laptop.
2011. Duranton and Turner give the pattern econometric teeth and its grandest name.
2025. The AI industry rediscovers the whole thing in a single week. When DeepSeek demonstrated frontier-ish models at a fraction of the inference cost, the immediate panic was that cheap inference would crater demand for compute. Microsoft's CEO posted the corrective within days, "Jevons paradox strikes again", and the year that followed proved him right: cheaper tokens didn't mean fewer GPUs, it meant inference woven into everything, and total compute demand rose. A 160-year-old observation about Cornish steam engines turned out to be the most important capacity-planning insight in the datacenter business.
| Year | Who | The same loop, renamed |
|---|---|---|
| 1865 | Jevons | Efficient steam engines tripled coal use (the Jevons paradox) |
| 1955 | Parkinson | Work expands to fill the time available |
| 1995 | Wirth | Software gets slower faster than hardware gets faster |
| 2011 | Duranton & Turner | Road-congestion elasticity ~1.0: more lanes, more driving |
| 2025 | Nadella | Cheaper inference, more GPUs, not fewer |
Five discoveries spanning 160 years, five labels (Jevons, Parkinson, Wirth, the Fundamental Law, the AI-compute boom), and one identical feedback loop underneath: capacity lowers the effective cost of consumption; latent demand uncoils to the new equilibrium; the pressure you relieved returns at higher absolute volume. The fact that every generation has to relearn this under a new name tells you how deep the opposing instinct runs. "Just add capacity" feels like physics. It's actually a bet, a bet that demand is fixed. It almost never is.
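The loop can be sketched as a toy model, assuming demand for the service follows a constant price elasticity. The function name, the `scale` constant, and the unit price are all hypothetical; the point is only how the sign of the outcome depends on the elasticity.

```python
def resource_use(efficiency: float, elasticity: float,
                 resource_price: float = 1.0, scale: float = 100.0) -> float:
    """Total resource consumed when demand for the service is price-elastic.

    cost per unit of service = resource_price / efficiency
    service demanded         = scale * cost ** (-elasticity)
    resource consumed        = service demanded / efficiency
    """
    cost_per_service = resource_price / efficiency
    service_demand = scale * cost_per_service ** (-elasticity)
    return service_demand / efficiency


# Double the efficiency (or, equivalently, halve the effective cost of capacity).
for elasticity in (0.0, 0.5, 1.0, 1.5):
    before = resource_use(1.0, elasticity)
    after = resource_use(2.0, elasticity)
    print(f"elasticity={elasticity}: resource use changes by {after / before:.2f}x")
```

An elasticity of 0.0 is the "demand is fixed" bet, and there doubling efficiency really does halve consumption. At 1.0 the saving is fully refilled, and above 1.0 total consumption rises, which is the Jevons case.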
Once you have the loop, you see it across the whole stack, usually wearing a costume labeled obvious fix.
Raise the rate limit because a big client keeps hitting 429s, and you have not ended the conversation about rate limits; you've moved it. The client's integration, freed of the ceiling, batches less carefully, retries more casually, polls faster; within a quarter they're at the new limit and the escalation email reads exactly like the last one. You didn't relieve the pressure. You taught the load where the new wall is.
Double the cluster and the batch jobs expand to fill it, not maliciously, organically. Queries get a little lazier because they can. Teams schedule jobs they used to defer. Someone's experimental pipeline becomes a daily pipeline. Six months later utilization is back at 90% and the only durable change is the bill. (This is Parkinson and Wirth running concurrently: the work expanded and each unit of work got heavier.)
Widen the network pipe and someone fills it; expand the storage quota and data accretes to the new horizon, because nobody deletes anything they can afford to keep. Even incident response obeys the law: grow the on-call rotation and the team absorbs more operational toil instead of fixing its sources, because the capacity to tolerate the toil is what made tolerating it rational.
There's a second-order effect the road version makes vivid, too: capacity added at one point doesn't just summon demand, it relocates the bottleneck. The widened freeway delivers its bigger river of cars onto the same downtown exits and surface streets, which were never widened and now drown: congestion didn't disappear, it moved somewhere less equipped to hold it. Every systems engineer has run this experiment: scale out the stateless web tier and watch the database, which used to be protected by the web tier's own limits, take the full force of the new throughput. The old bottleneck was also, quietly, your admission control. Remove it without asking what it was shielding, and you discover the next constraint the expensive way, one layer deeper and usually harder to scale.
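A rough sketch of the relocation effect, assuming a serial request path where end-to-end throughput is capped by the slowest tier. The tier names and capacities are made up for illustration.

```python
def bottleneck(capacities: dict[str, float]) -> tuple[str, float]:
    """Return the tier that caps end-to-end throughput, and that cap (req/s)."""
    name = min(capacities, key=capacities.get)
    return name, capacities[name]


def offered_to(tier: str, ordered: dict[str, float]) -> float:
    """Max load the upstream tiers can pass down to `tier` (inf for the first tier)."""
    upstream = []
    for name, capacity in ordered.items():
        if name == tier:
            break
        upstream.append(capacity)
    return min(upstream, default=float("inf"))


# Hypothetical per-tier capacities in requests per second, in request order.
tiers = {"web": 2_000, "app": 5_000, "database": 3_000}
scaled = {**tiers, "web": 8_000}  # scale out the stateless web tier 4x

print(bottleneck(tiers), offered_to("database", tiers))    # ('web', 2000) 2000
print(bottleneck(scaled), offered_to("database", scaled))  # ('database', 3000) 5000
```

Before the scale-out, the web tier could never hand the database more than 2,000 req/s, comfortably under its limit. Afterward the database can be offered 5,000 req/s against a capacity of 3,000, so a 4x spend on the web tier buys at most 1.5x throughput and a new bottleneck.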
And note the sting in Duranton and Turner's tail, because it transfers exactly: equilibrium returns at higher absolute cost. Twenty-three congested lanes are not the old congestion, they're the same speeds carrying twice the cars, twice the fuel, twice the asphalt to maintain. Your re-congested infrastructure is the same pain at twice the spend. The capacity wasn't free, and the demand it summoned isn't either.
Here's where the transport profession is genuinely ahead of ours, because their failures cost billions and sit photogenically in the open. After enough Katy Freeways, the field's center of gravity moved to the demand side, and the toolkit that emerged maps onto systems work with almost no translation needed.
Price the resource. London put a congestion charge on its central zone in 2003 and cut congestion by roughly 30% in the first year, a result no lane-mile of supply had ever bought. New York, in January 2025, became the first US city to follow: a $9 peak toll on the Manhattan core. A year of data later: about 11% fewer vehicles entering the zone, some 27 million fewer entries, with average in-zone speeds up a real-but-modest 4.5% and the river crossings transformed (Holland Tunnel approaches roughly 50% faster at points, Lincoln around 25%). The engineering translation is chargeback and usage-based pricing, and it works for the same reason: free resources transmit no information. The moment a team sees the compute it burns priced on its own budget line, demand discovers it was elastic all along: the daily job that didn't need to be daily, the logs nobody queried, the retries that were really a loop. You don't have to forbid anything. Price it, and the phantom traffic evaporates on its own.
Stockholm adds the adoption lesson. When its congestion charge was proposed, polls ran heavily against it; the city ran it anyway as a seven-month trial in 2006, and lived experience did what no argument could. Traffic fell on the order of 20%, the dreaded chaos never arrived, and when the trial ended and the question went to referendum, a majority voted to make it permanent. Opposition had been a prediction about a world nobody had seen. The engineering translation: don't litigate quotas and chargeback in a meeting where everyone defends their worst-case forecast, run the canary. Meter one workload class, price one team's compute for a quarter, publish the before/after. Demand-side policies poll terribly and trial well, in cities and in platform teams alike.
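The one-team pricing trial could start from something as small as this: a sketch that rolls metered usage up into a per-job bill. The record shape, team and job names, and the internal rate are all hypothetical; a real meter would feed this from whatever the scheduler or cloud bill already exports.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    team: str
    job: str
    cpu_hours: float


def chargeback(records, rate_per_cpu_hour):
    """Price metered usage and roll it up as {team: {job: cost}}."""
    bill = defaultdict(lambda: defaultdict(float))
    for record in records:
        bill[record.team][record.job] += record.cpu_hours * rate_per_cpu_hour
    return {team: dict(jobs) for team, jobs in bill.items()}


# Hypothetical meter readings for the one team in the trial.
records = [
    UsageRecord("search", "daily-reindex", 400.0),
    UsageRecord("search", "log-retention", 120.0),
    UsageRecord("search", "daily-reindex", 400.0),
    UsageRecord("search", "retry-storm", 80.0),
]

bill = chargeback(records, rate_per_cpu_hour=0.5)  # assumed internal rate
for job, cost in sorted(bill["search"].items(), key=lambda item: -item[1]):
    print(f"{job:<15} {cost:>8.2f}")
```

Sorting by cost is the design choice that matters: the team sees its most expensive habit first, which is usually where the elastic demand is hiding.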
Shape the demand. Watch a metered on-ramp at rush hour, the red-green light dribbling cars onto the freeway, and recognize it instantly: that is admission control. That is backpressure, in concrete. The planners' insight is that the entry rate, not the road width, is the controllable variable; ours is the same: quotas, token buckets, load shedding, priority classes, backpressure signals that tell upstream to slow down rather than silently absorbing everything until the system tips. Shaping is what you do when you can't price: it keeps the system inside its operating envelope and forces the queue to form where it's cheap (the on-ramp, the client) instead of where it's catastrophic (the freeway, the database).
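The metered on-ramp translates almost directly into a token bucket. This is a minimal single-threaded sketch, not a production limiter: the class name, the parameters, and the injectable `clock` are assumptions made for illustration and testing.

```python
import time


class TokenBucket:
    """Admit `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = burst      # start full, like a green light on an empty ramp
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then admit the request if tokens allow."""
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        # Rejected: the caller sheds the request or tells upstream to back off,
        # so the queue forms at the client rather than at the database.
        return False


bucket = TokenBucket(rate=5, burst=10)
admitted = sum(bucket.try_acquire() for _ in range(25))
print(f"admitted {admitted} of 25 back-to-back requests")
```

A burst of back-to-back requests gets through up to roughly the bucket size, and everything after that is admitted only as fast as tokens refill. The entry rate is now a variable you control.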
Make the alternative cheaper. The third lever is the one cities call transit-oriented development: don't argue with car demand, build a world where the train is simply the easier choice. In systems: caching, batching, async paths, materialized views, an efficient default SDK, every one of them is "transit" that absorbs trips the expensive road would otherwise carry. The discipline is to make the cheap path the default path, because demand follows effective cost downhill. (Mind Jevons here, though: efficiency that lowers cost without pricing or shaping can boomerang into more total consumption. The levers work as a set, not à la carte.)
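As a small illustration of making the cheap path the default, here is a sketch that puts a cache in front of an expensive call. `expensive_query` is a stand-in for any slow backend, and the keys are arbitrary; `functools.lru_cache` is the only real library call used.

```python
from functools import lru_cache

backend_calls = 0


def expensive_query(key: str) -> str:
    """Stand-in for the 'expensive road': a slow database or API call."""
    global backend_calls
    backend_calls += 1
    return key.upper()


@lru_cache(maxsize=1024)
def get(key: str) -> str:
    """The default path callers use; repeat lookups never reach the backend."""
    return expensive_query(key)


for key in ["a", "b", "a", "a", "b", "c"]:
    get(key)

print(backend_calls)  # 3: six requests, three trips on the expensive road
```

Callers only ever see `get`, so the cheap path is the one they take without having to choose it.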
Now the caveat that keeps this from being a slogan, because induced demand is an economic mechanism, not a law of physics, and it has boundary conditions worth respecting.
The loop requires latent elastic demand, suppressed trips waiting for the price to drop. Where demand is genuinely bounded, capacity works exactly the way your instinct says: the nightly payroll run is the same size whether the cluster is big or small, and a bigger cluster just finishes it sooner. No phantom payroll materializes to fill the headroom. Provisioning for a known, finite workload isn't building the Katy Freeway; it's building a driveway. The diagnostic question is whether anyone is waiting on the other side of the bottleneck, and being honest about how many of them there are.
And sometimes the induced demand is the point. A growth-stage product widening its free tier is building lanes precisely to summon traffic; AWS spent two decades pouring the widest freeway in computing history on purpose, and the traffic that showed up is called revenue. The planners' lesson was never "capacity is bad." It's that capacity is a demand-generation instrument, and you should deploy it as deliberately as you'd deploy pricing. The sin on the Katy wasn't inducing traffic, it was inducing traffic while promising to reduce it, spending $2.8 billion on a demand generator and calling it congestion relief.
So, before the next "just scale it up" leaves your mouth, run the three questions the transport economists earned the hard way. Will this capacity summon new demand to refill it? (If there's a queue of suppressed usage, yes.) Is that demand something I want, revenue, growth, adoption, or just load? And if it's just load: which demand-side lever am I avoiding, pricing it, shaping it, or making the cheap path cheaper, because pouring another lane is politically easier?
The widest freeway on Earth is twenty-three lanes across, and by 2014 its commute was slower than it had been three years earlier. It is a $2.8 billion monument to answering a demand question with supply. Your architecture diagram has room for the same monument; the only question is whether you build it.
Sources: Duranton & Turner, "The Fundamental Law of Road Congestion: Evidence from US Cities," American Economic Review 101(6), 2011; City Observatory, "Reducing congestion: Katy didn't" (Houston Transtar data, 2011–2014); Jevons, "The Coal Question" (1865); Parkinson, "Parkinson's Law" (The Economist, 1955); Wirth, "A Plea for Lean Software" (IEEE Computer, 1995); Transport for London first-year congestion-charge results (2003–04); Stockholm congestion-tax trial (Jan–Jul 2006), ~20% traffic reduction, approved by referendum September 2006, permanent 2007; NY Governor's office & Vital City, congestion-pricing first-anniversary data (2025–26): −11% vehicles / ~27M fewer entries, in-zone speeds +4.5%, river-crossing gains 25–51%; Nadella, "Jevons paradox strikes again" (January 2025); NPR Planet Money on AI and Jevons (February 2025).
Scaling an agent fleet is pouring lanes. The traffic will come to fill them.
More agents and more compute summon more agent work, and the real question shifts from "can we run them" to "can we trust what they ran." The durable answer is the demand-side one: a substrate that prices and verifies agent work instead of just absorbing more of it. The Agent Trust Stack is that substrate, verifiable provenance and earned reputation, so the load you induce is work you can account for, not congestion you re-paid for at twice the spend.
pip install agent-trust-stack · npm install agent-trust-stack
vibeagentmaking.com