Lately, I have increasingly found myself debating with ChatGPT and Claude about various ideas bubbling in my head. I love to hold most of these discussions in incognito mode: it keeps things ephemeral, a concept I learned in software development and have come to love.
But staying incognito means I lose all the insights built up during a discussion. So I decided to publish my discussion with Claude verbatim. My thoughts are in italics, and Claude's are penned under the heading "Claude".
The entire discussion was done using Claude Opus 4.6
What do you think will be the actual state of models accessible to us in a few years?

My hot take: frontier models will go the same way social media did. Facebook was awesome in 2012. Today it still exists with high activity, but the quality has gone way down. In fact, there is no truly good social media left; nearly all platforms are influencer-driven marketing and political-propaganda machines, with social connection and networking as a side dish. And it all had to do with making the platforms profitable.

Similarly, we are living in a heavily subsidised era of LLMs, where top quality is available at relatively low cost (the $200/mo Claude Code plan is still quite expensive, but cheap relatively speaking). I recently burned my hand using Claude Code on the Bedrock API, spending $100 on an almost trivial project, which suggests that even on pure inference Claude Code is heavily subsidised.

Pure inference will never be enough, because the world keeps changing and models need to keep training to adapt to it. Hence models will only become bigger and more expensive. At some point investors will pull the plug on this perpetual subsidy and costs will rise (while quality simultaneously decreases due to quantization, routing, and the like). At that point users will start to explore better value-for-cost options among cheaper distilled models, causing market fragmentation with a variety of tiers, like we have for CRMs. But none of the LLMs will be at the quality we have today, because the profit economics will not support it.

I suppose some open-source initiative will emerge to regularly train models, financially supported by big tech companies since it is a powerful technology, but progress will be slower and quality poorer. What's your take?
Claude:
This is a really interesting thesis, and I think parts of it are sharp while other parts have structural flaws in the analogy. Let me engage with it seriously.
Where I think you're right
The subsidization point is almost certainly correct. When you burned $100 on Bedrock for a trivial project, you were likely seeing something closer to the real cost of inference at scale. The consumer pricing is clearly a land-grab strategy — acquire users, build switching costs, then monetize. That playbook is well-established in tech.
And your instinct that "pure inference will never be enough" is important. Models do need continual retraining, RLHF refinement, safety work, infrastructure upgrades — it's not like shipping a static software product. The operational costs are ongoing and enormous.
Where I think the analogy breaks down
The social media degradation story is fundamentally about misaligned incentives between the user and the business model. Facebook's product is attention, and its customer is the advertiser. So optimizing for profit meant optimizing for engagement (outrage, addiction, shallow content) — which directly degrades user experience.
LLMs have a crucial structural difference: the business model and user value are more aligned. If I give you a worse answer, you have less reason to pay. There's no equivalent of "the ad buyer wants you doom-scrolling." The customer and the user are often the same entity, especially in the B2B/developer market which is where the real revenue is. A company paying for API access to automate engineering work cares about output quality — if quality drops, they leave.
This is more like the cloud computing market than social media. AWS, GCP, and Azure didn't get worse over time to become profitable. They got better, cheaper per unit, and found sustainable margins through scale and operational efficiency. The question is whether LLMs follow that curve.
The cost trajectory question
You're assuming models will "only become bigger and more expensive." I'd push back here. Several countervailing forces exist:
Hardware improvements — Each generation of GPUs/TPUs delivers more FLOPS per dollar. NVIDIA's margins are absurd right now, which means there's room for competition to drive hardware costs down.
Algorithmic efficiency — The research community is consistently finding ways to get more capability per parameter. Mixture-of-experts architectures, better training recipes, improved data curation — these compound. GPT-4 level capability in 2023 required enormous compute; something comparable might require a fraction within a few years.
Inference optimization — Speculative decoding, quantization that doesn't meaningfully hurt quality (not all quantization is destructive), KV-cache sharing, batching improvements. These are active engineering problems with steady progress.
Distillation done well — You frame distillation as necessarily degrading. But the Llama/Qwen/Mistral trajectory suggests distilled and smaller models can capture 80-90% of frontier capability at radically lower cost. That's not fragmentation — that's a healthy market.
The scenario I think is more likely
Rather than a uniform degradation, I'd expect tiering that actually works:
A handful of frontier labs (2-4) maintain cutting-edge models and charge premium prices, primarily to enterprises. Think of it like the database market — Oracle is expensive and powerful, PostgreSQL is free and excellent for most use cases, and there's a whole spectrum in between. The existence of expensive options didn't prevent good cheap options from emerging.
Open-source models will likely be "good enough" for 80%+ of use cases within a few years, backed by Meta, Google, and others who have strategic reasons to commoditize the inference layer (Meta wants AI everywhere to drive engagement on its platforms; Google wants to prevent anyone from owning the AI distribution layer).
The real risk I see isn't degradation of model quality — it's censorship creep and behavioral restriction. As models become more commercially important, the pressure to avoid liability, controversy, and regulatory risk increases. That's the analogy to social media that actually holds: not "the product gets worse because of ads" but "the product gets blander because of risk aversion." You can already see this in how models handle anything remotely sensitive.
Where I'm genuinely uncertain
The energy/compute scaling question is the wild card. If training frontier models starts requiring dedicated power plants (which is already being discussed), and if efficiency gains don't keep pace, then your thesis about unsustainable costs has real teeth. The question is whether the S-curve of capability improvement flattens before or after the economics become self-sustaining through genuine productivity gains.
I disagree with a core assumption you made. You argue that LLMs are inherently intelligent. But their intelligence is proportional to the training data fed into them and the amount of compute spent minimizing the loss function.
LLMs are often bad at generalizing beyond their training data. But out-of-distribution data will keep increasing at an exponential pace as the world evolves, and LLMs have to take into account both what has already happened and what will happen in the future, making training progressively more expensive.
And inference will also have to run on newly trained models, because users will expect that, which means inference will never become too cheap.
The only way to control cost would be strategic business choices like building models only for coding, writing, video, etc., but we all know those are not as good as general-purpose models.
Hence quality is bound to take a dip. Especially when you cut corners on model size, competition shows up, cutting profit margins and creating downward pressure toward greedy business decisions. We have seen this in nearly all sectors: F&B, telecom, social media, etc.
Your cloud analogy is weaker than my social media analogy, because the value of compute is measurable: what tasks I can achieve with a given amount of compute and RAM is well known and understood. The experience with an LLM is highly subjective, and this subjectivity will push users toward cheap models unless a true ROI is justified, leading to a fragmented market. But the VCs backing Anthropic and OpenAI are playing the market-dominance game: if they can't dominate the market, funding will stop, leading to slower improvement and generally poorer-quality models than we have today.
Claude:
You're making me work here, which I appreciate. Let me take your points seriously one at a time, because I think you're right on some and wrong on others in ways worth untangling.
On "LLM intelligence is proportional to training data + compute"
You're correct that current LLMs aren't "inherently intelligent" in some magical sense. But I think you're overstating the implication. The claim isn't that they generalize perfectly — it's that they generalize surprisingly well within broad domains, and that this generalization improves with scale in ways we didn't fully predict. The "bitter lesson" from Rich Sutton's famous essay is precisely that general methods leveraging computation tend to win over clever hand-engineering.
But here's where your point has real teeth: you're essentially arguing that the maintenance cost of keeping a model "current" grows unboundedly. The world generates more information, more code libraries, more events, more context — and the model has to absorb all of it. This is a genuinely strong argument.
However, I'd push back on the mechanism. Models don't need to retrain from scratch to stay current. RAG, tool use, web search, fine-tuning on deltas, continual learning research — these are all approaches that decouple "knowing about the world" from "having been trained on everything." When you use Claude with web search right now, it's already a hybrid system. The trend is toward models as reasoning engines that access current information through tools, not as static knowledge bases that must be periodically retrained on everything. If that architecture wins (and it seems to be winning), your "exponentially growing training cost" argument weakens significantly.
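The decoupling described above can be sketched in a few lines. This is a deliberately toy illustration (the "index" and "model" are trivial stand-ins, not any real API): the point is only that world knowledge lives in a cheaply updatable store while the model's weights stay frozen.

```python
# Toy sketch of the "frozen reasoning engine + retrieval" pattern described
# above: the model never retrains, yet answers stay current because fresh
# facts arrive through the prompt. Index contents are illustrative.

# A retrieval index that can be updated continuously at negligible cost.
index = {"latest_python": "Python 3.13 is the latest stable release."}

def retrieve(query: str) -> str:
    """Toy retrieval: return facts whose key words appear in the query."""
    return " ".join(fact for key, fact in index.items()
                    if any(word in query.lower() for word in key.split("_")))

def frozen_model(prompt: str) -> str:
    """Stand-in for a static LLM: it only 'knows' what is in its context."""
    return prompt.split("Context: ")[-1]

def answer(question: str) -> str:
    context = retrieve(question)
    return frozen_model(f"Q: {question}\nContext: {context}")

# Updating world knowledge = updating the index, not retraining the model:
index["latest_python"] = "Python 3.14 is the latest stable release."
print(answer("What is the latest python release?"))
```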
On inference never getting cheap
This is where I think you're conflating two things. Yes, users will expect newer models, and newer models may be larger. But "newer" doesn't necessarily mean "more expensive to run." Consider: Claude 3.5 Sonnet was dramatically cheaper to run than Claude 3 Opus while being better at most tasks. That's not an anomaly — it's the expected trajectory of engineering optimization. You can get more capability per FLOP through better architecture, not just more FLOPs.
The real question is whether the capability frontier advances faster than efficiency gains. If it does, you're right — costs rise. If efficiency keeps pace or wins, costs fall even as capability improves. Historically in computing, efficiency has won over multi-decade timescales. But I'll grant you that LLMs are young enough that we can't be confident this pattern holds.
On your critique of the cloud analogy — this is your strongest point
I actually think you're largely right here, and I want to give you credit. The measurability argument is sharp. With cloud compute, I know exactly what I'm getting: X vCPUs, Y GB RAM, Z IOPS. I can benchmark it. I can compare providers on objective metrics. Switching costs are real but fundamentally about engineering effort, not about subjective quality differences.
With LLMs, "quality" is deeply contextual and hard to measure. Is Claude better than GPT-4 for your coding task? Depends on the task, the prompt, the day. This subjectivity creates exactly the market dynamics you're describing — customers can't easily tell the difference between "90% as good" and "100% as good," so they optimize on price, which creates a race to the bottom.
But here's where I'd refine your argument rather than accept it wholesale. This "subjective quality" problem cuts both ways. Yes, it means users might settle for cheaper, worse models. But it also means that a provider who delivers noticeably better results for high-value tasks commands enormous pricing power. If Claude saves a senior engineer 2 hours a day, the $200/month is trivial against their salary. The subjectivity doesn't just push toward cheap — it creates a bimodal market where some users are extremely price-sensitive and others are extremely quality-sensitive.
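A rough break-even sketch makes the quality-sensitive side of that bimodal market concrete. The hourly rate and hours saved below are illustrative assumptions, not figures from the discussion:

```python
# Break-even sketch for a quality-sensitive buyer.
# All inputs are illustrative assumptions.

HOURLY_RATE = 75         # assumed fully loaded senior-engineer cost, USD/hour
HOURS_SAVED_PER_DAY = 2  # assumed time the tool saves, per the text above
WORKING_DAYS = 20
SUBSCRIPTION = 200       # USD/month, the plan price discussed above

value_per_month = HOURLY_RATE * HOURS_SAVED_PER_DAY * WORKING_DAYS
print(f"Value created: ~${value_per_month}/month vs ${SUBSCRIPTION} plan")
print(f"ROI multiple: {value_per_month / SUBSCRIPTION:.0f}x")
```

Under these assumptions the tool returns roughly 15x its price, which is why the quality-sensitive segment is largely insensitive to the subscription cost.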
On the VC/market dominance argument
You're describing a scenario where: VCs fund → companies chase dominance → dominance doesn't materialize → funding dries up → quality drops. This is a legitimate risk and has played out in other sectors (meal delivery, scooters, etc.).
But there's a critical difference: LLMs are becoming infrastructure, not consumer products. When VC funding dried up for Uber-like companies, the underlying service (getting from A to B) didn't disappear — it just repriced. Similarly, even if Anthropic or OpenAI face funding crunches, the technology doesn't vanish. The trained models exist. The research is published. The open-source ecosystem has already absorbed enormous knowledge.
The meal delivery analogy would suggest: prices rise, some companies die, survivors find sustainable margins, consumers pay more but the product persists. That's not the "Facebook degradation" story — it's a normal market maturation story.
Where I think your thesis is actually most compelling, though you didn't emphasize it enough
The strongest version of your argument isn't about cost — it's about what happens to quality when you optimize for the median user. As LLMs scale to billions of users, the pressure is to optimize for the most common use cases. The long tail of expert, nuanced, domain-specific quality gets sacrificed because it doesn't move aggregate metrics. This is the true parallel to social media — not ads, but the tyranny of engagement metrics flattening everything toward the median.
You can already see hints of this: models getting more "helpful" and "friendly" at the cost of precision. More verbose. More hedging. More "as an AI." That trajectory, driven by scale economics, is the real threat to quality.
Here's a question to sharpen your own thinking:
Your thesis depends heavily on training costs growing unboundedly. But consider this concrete scenario: Meta releases Llama 4 (or 5, 6...) as open source. They absorb the training cost because commoditizing LLMs serves their strategic interest (keeping AI out of competitors' moats). Meanwhile, a vibrant ecosystem fine-tunes these models cheaply for specific domains.
In that world, the "VC plug gets pulled" scenario doesn't matter — the training cost is borne by a company with $40B+ annual profit doing it for strategic reasons, not ROI on the model itself. Does your thesis survive in a world where the biggest model trainer doesn't need the model to be directly profitable? That's the scenario I'd want you to stress-test, because I think it's the strongest counterexample to your overall framework.
Yes, but you are wrong in pricing Claude Code at $200 per month. That is highly subsidised. I can't comment on the actual cost, but even that high price is low given the cost of inference on the API. You have access to data; you can do the math.
Also, if RAG and the like were a replacement for training new models, we would be seeing more of them instead of a race to train ever-better models like GPT-5.3 and Opus 4.6.
I am sure it is empirically established that users generally prefer newer, better-quality models over older ones, and only cut back when price is involved.
And when prices increase they will cut back even more, leading to a drop in experienced quality.
I never argued that LLMs will be gone and nobody will use them. I argued that the quality of models we are given at such low prices is unsustainable, and when the capitalistic market turns from market consolidation to profit expansion, a drop in quality is unavoidable.
Claude:
Alright, you're sharpening the argument well. Let me do the math you're asking for, and then be honest about where this leaves us.
The Claude Code subsidy math
Let me work with rough but reasonable numbers. A moderately active developer using Claude Code might generate something like 5-10M input tokens and 1-2M output tokens per day (context loading, file reads, multi-step reasoning, code generation). That's conservative for someone using it as a primary tool.
Over 20 working days per month, call it 150M input tokens and 30M output tokens.
At Sonnet 4 API pricing ($3/MTok input, $15/MTok output): roughly $450 input + $450 output = ~$900/month. At Opus pricing ($15/$75): you're looking at $4,500/month.
And a power user — someone doing large refactors, multi-file edits, running agentic loops — could easily double or triple that. Your $100 Bedrock burn on a "trivial project" is entirely consistent with this.
So yes, $200/month for Claude Code is subsidized by a factor of maybe 4-20x depending on usage intensity and which model is being routed to. You're right.
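The back-of-the-envelope math above can be written down directly. The per-MTok prices are the list prices quoted above; the usage figures are the stated assumptions (midpoints of the 5-10M input and 1-2M output daily ranges), not measurements:

```python
# Rough estimate of unsubsidised Claude Code cost at the API list prices
# quoted above (USD per million tokens). Usage figures are assumptions.

WORKING_DAYS = 20
INPUT_MTOK_PER_DAY = 7.5    # midpoint of the assumed 5-10M input tokens/day
OUTPUT_MTOK_PER_DAY = 1.5   # midpoint of the assumed 1-2M output tokens/day

def monthly_cost(input_price: float, output_price: float) -> float:
    """Monthly API cost in USD for the assumed usage profile."""
    inp = WORKING_DAYS * INPUT_MTOK_PER_DAY * input_price
    out = WORKING_DAYS * OUTPUT_MTOK_PER_DAY * output_price
    return inp + out

sonnet = monthly_cost(input_price=3, output_price=15)   # $3/$15 per MTok
opus = monthly_cost(input_price=15, output_price=75)    # $15/$75 per MTok

print(f"Sonnet-priced usage: ~${sonnet:.0f}/month")
print(f"Opus-priced usage:   ~${opus:.0f}/month")
print(f"Implied subsidy vs a $200 plan: {sonnet/200:.1f}x to {opus/200:.1f}x")
```

Which lands at roughly $900/month at Sonnet prices and $4,500/month at Opus prices, matching the 4-20x subsidy range above before accounting for power users.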
On RAG vs. training — conceding the point
You caught me being too clever. You're correct: if RAG + tool use were truly sufficient substitutes for better base models, the multi-billion dollar training runs wouldn't be happening. The fact that every major lab is still racing to train larger, better foundation models tells you something the research papers don't always say clearly — RAG patches knowledge gaps but doesn't substitute for deeper reasoning capability, coherence over long contexts, or the kind of "understanding" that emerges from training. The training race is the revealed preference of the entire industry, and it says: base model quality still matters enormously and can't be cheaply approximated.
I was wrong to lean on that as a counterargument. Fair point.
On your refined thesis
Let me restate it to make sure I've got it, because I think you've actually narrowed it to something much more defensible than where you started:
"The current quality-to-price ratio of frontier LLMs is artificially favorable due to VC subsidization. When market dynamics shift from growth to profit extraction, the price will rise and/or the quality will decline. Users will respond by downgrading to cheaper alternatives, fragmenting the market. The net result is that the average quality of LLMs people actually use will be lower than what's available today at today's prices."
If that's your claim — I think you're substantially correct, with one important caveat.
The caveat: "quality people actually use" vs. "quality that exists"
Your thesis is really about access economics, not about the technology itself. You're not saying frontier capability will regress — you're saying the affordable tier will be worse than what we currently get at subsidized prices. That's a much stronger and more defensible claim.
And it maps well onto real precedents. Streaming services gave you everything for $8/month during the land grab. Now it's $15-20/month each across five platforms with worse catalogs. The total content is better than ever, but the experience at a given price point degraded. That's a tighter analogy than social media, and I think it supports your argument better.
Where I still push back — but only partially
The one force that doesn't exist in streaming/telecom/social media but does exist in AI is open-source with deep-pocketed strategic backers. Meta has spent billions training and releasing Llama models for free. Their incentive isn't LLM revenue — it's preventing any single company from owning the AI layer. That strategic dynamic doesn't have a parallel in your other examples (nobody open-sourced a competing telecom network or streaming catalog).
But I'll concede: open-source models are typically 6-18 months behind frontier, and your argument is specifically about frontier quality at accessible prices. Open source narrows the gap but may never fully close it.
Honest updated position: I think you're about 70-75% right. The subsidized golden age will end, average user experience will degrade relative to today's price point, and market fragmentation will happen. Where I maintain skepticism is on the degree — I think the floor stays higher than you might expect because the enterprise use case (where ROI is clear and measurable) sustains a viable market for genuinely good models, and that trickles down.
I think enterprise ROI is also weak. LLMs add a lot of value and will stay. But many enterprises have real-world problems like dealing with legacy data digitized from printed records, complex human interactions, and simply unique edge cases and vocabulary, things LLMs may not be trained on because the data is proprietary.
Also, you may argue that LLM providers can custom-train on those datasets under an enterprise agreement. But I will argue back that in many cases the dataset is not that big (it's big in the traditional sense, but not big enough to give a large LLM truly intelligent response properties). So even enterprise licensing will move toward price efficiency. After all, models are not a moat in a restricted domain; this is all but established. I still think big enterprises like Google, Microsoft, etc. will have LLM offerings with enterprise deals, but labs working on frontier models will downsize or cease to exist. Maybe a CSR initiative like Meta's Llama will persist (perhaps run by a group of companies under an independent foundation, like the Linux Foundation), but we won't be seeing this mad race anymore.
Claude:
You're painting a picture that's becoming increasingly coherent, and I want to be honest — I find myself agreeing with more of it than I expected when we started.
The enterprise ROI argument is underappreciated
Your point about legacy data, proprietary vocabulary, and domain-specific edge cases is something the AI hype cycle consistently glosses over. I've seen this discussed far less than it deserves.
Consider a concrete example: a large insurance company with 30 years of claims data, half of it digitized from handwritten adjuster notes with inconsistent terminology, regional jargon, and implicit institutional knowledge. An LLM can help — summarization, search, classification. But the gap between "demo impressive" and "production reliable" is enormous. The model hallucinates on edge cases. It doesn't know that "Code 7R" in the Northeast region means something different than in the Southwest because that was a regional manager's convention from 2003. Fine-tuning helps, but as you correctly note, the proprietary dataset is often too small to meaningfully shift a 70B+ parameter model's behavior in deep ways.
So what happens? The enterprise buys a cheaper, smaller model, wraps it in a pipeline with traditional software (rules engines, lookup tables, human review queues), and the LLM becomes one component — valuable but not magical. And when the LLM is just one component, you optimize it on cost like any other input. You don't pay frontier prices for a part of a pipeline.
Your implicit model of market evolution
Let me try to formalize what I think you're actually describing, because I think you've arrived at something specific:
Phase 1 (now): VC-subsidized frontier race. Users get extraordinary quality at artificial prices. Labs burn cash pursuing market dominance.
Phase 2 (2-4 years): Investors demand returns. Prices rise. Some labs consolidate or fold. Users discover that 80% of their use cases work fine on cheaper models.
Phase 3 (5+ years): LLMs commoditize into infrastructure. Big tech (Google, Microsoft, Meta) maintains offerings as loss leaders or strategic tools bundled into their ecosystems — the way Google gives away Gmail to dominate advertising. Independent frontier labs (Anthropic, OpenAI as independent entities) either get acquired, find sustainable niches, or shrink significantly. A Linux Foundation-style consortium maybe maintains an open frontier model, but progress slows because nobody's burning $10B/year on training runs for charitable purposes.
The end state: LLMs are everywhere, genuinely useful, but the quality plateau is below where frontier models are today at consumer-accessible prices. The "magical" feeling of 2024-2025 becomes nostalgic, like people reminiscing about early Google search before SEO ruined it.
Is that a fair summary of your position?
Where I think this is strongest: The independent frontier lab argument. You're essentially saying that Anthropic, OpenAI, and similar pure-play AI labs face the same structural problem as any capital-intensive startup without a diversified revenue base. They need massive ongoing investment just to stay competitive, and the moment investors lose patience, the model degrades. Google and Microsoft can absorb AI costs into trillion-dollar businesses. Anthropic cannot — at least not at the current burn rate. That's a genuinely precarious position, and I think you're right to flag it.
Where I think you might be underweighting one thing
You're treating LLMs as though their value proposition is relatively static — "they help with tasks, the question is just how much and at what price." But there's a scenario where LLMs (or their successors) cross a threshold into genuine autonomous agency — where they don't just assist with code but ship features, don't just summarize documents but make decisions. If that happens, the ROI calculus changes completely because you're not comparing "$200/month tool vs. $50/month tool" — you're comparing "autonomous worker vs. human salary."
I'm not saying this definitely happens. But if it does, your entire framework shifts because the willingness to pay becomes anchored to labour replacement rather than tool utility, and that's a fundamentally different (and much larger) market.
However — and this supports your thesis — if that threshold isn't crossed, then everything you're saying follows logically. LLMs remain very good tools, tools get commoditized, margins compress, quality at accessible price points declines.