Build momentum is real. So are the costs nobody's modeling.
Issue #6

The build wave is legitimate — operators are shipping faster than vendors can sell. But three hidden trade-offs are catching teams off guard: maintenance economics, token subsidies with expiration dates, and model-version fragility that just got very concrete.

By Victor Sowers — 15 years scaling B2B SaaS GTM

Model Risk · Token Economics · Maintenance Debt · Context Engineering · Build vs Buy · 3 deep dives · ~7 min read

The Signal

    The Shift

    Last Wednesday, Anthropic shipped a model update to Opus that set my small corner of the internet on fire. Code that worked on Tuesday stopped working on Thursday. Orgs that depend on, or are calibrated to, a particular LLM's style of output are in full scramble mode. Anthropic broke backwards compatibility here deliberately. The new model is also fundamentally different, and it consumes more tokens.

    Three line items are missing from the spreadsheets justifying these builds — and the Opus breakage this week just made the third one concrete.

    The build momentum is real. I'm in communities where operators are shipping their own tooling — not prototyping, shipping. Intercom doubled engineering velocity in nine months with non-engineers writing code. Teams walking through that window are making the right call.

    **One: maintenance is infrastructure-grade, not software-grade.** Jason Lemkin's Agents #001 post-mortem: vibe-coded apps need daily maintenance. Not quarterly. Not when something breaks. Daily.

    **Two: the token subsidy has an expiration date.** OpenAI lost ~$5 billion on $3.7B in revenue last year. Some power users generated $35K in compute costs on $200/month plans — a 175x subsidy. WEKA calls it: the subsidized agent era ends by close of 2026.

    **Three: the contracts already know.** Sub-1-year contracts have tripled from 4% to 13% since 2023. The market is hedging even as builders accelerate.

    The teams pulling ahead are the ones who budget for what comes after — the maintenance, the context layer, the resilience when the model underneath you changes overnight.

    1

    Opus 4.7 — What Actually Broke

    Based on: Vibe Coding / Medium

    Key takeaway: Version-controlled prompts, eval suites before deploying, and a human who can tell you whether the output actually changed. Those three things separated hours of recovery from days.

    On April 16, Anthropic explicitly broke backwards compatibility in Opus 4.7. Not as a side effect — as a design choice. `thinking.budget_tokens`, `temperature`, `top_p`, `top_k` — all removed from the API. Old code doesn't degrade gracefully. It errors out. The new tokenizer consumes up to 1.35x more input tokens. And the model's personality changed — it argues with you, hedges on simple tasks, fights corrections. Developers called it "legendarily bad" within 24 hours. The DAAF Guide called it "a crucial reality check for anyone building with AI in 2026."
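
    The defensive pattern is simple: don't pass sampling parameters straight through to whatever model you're on. A minimal sketch, assuming a plain dict-based request builder (the model name and the removed-parameter list mirror the params named above, but are illustrative, not Anthropic's actual SDK):

```python
# Sketch: strip request parameters a newer model no longer accepts,
# instead of letting the API call hard-error. Model identifiers and the
# removed-parameter sets here are illustrative placeholders.

REMOVED_PARAMS = {
    "opus-4.7": {"temperature", "top_p", "top_k", "thinking.budget_tokens"},
}

def build_request(model: str, **params) -> dict:
    """Drop parameters the target model rejects, warning about what was removed."""
    removed = REMOVED_PARAMS.get(model, set())
    kept = {k: v for k, v in params.items() if k not in removed}
    dropped = sorted(set(params) & removed)
    if dropped:
        print(f"warning: {model} no longer accepts {dropped}; dropping them")
    return {"model": model, **kept}

req = build_request("opus-4.7", max_tokens=1024, temperature=0.2, top_p=0.9)
# temperature and top_p are stripped; max_tokens passes through.
```

    The point isn't this exact shim. It's that parameter handling lives in one place you control, so a breaking API change is a one-line table update instead of a grep across every workflow.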

    I run my entire operation on Claude Code. So this isn't me arguing against building on Claude. It's me arguing for building a resilience layer between your workflow and whatever model powers it.

    Here's the thing most people aren't saying out loud: if you built your workflow around Opus 4.6's specific behavior — how it handled long contexts, the cadence of its responses, the way it reasoned through multi-step tasks — then a model update isn't a feature drop. It's a migration. Your prompts are coupled to a version. Your team's sense of "good output" is calibrated to a model that just changed underneath them.

    The teams that adapted in hours had three things: version-controlled prompts, eval suites they could run against the new model before deploying, and a human who could tell them whether the output actually changed. Everyone else is still scrambling.
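
    The eval-suite half of that can be sketched in a few lines: pin a handful of known-good cases in version control next to your prompts, and gate any model swap on the pass rate. `call_model` is a stand-in for your real API client; the suite and threshold are illustrative:

```python
# Sketch of an eval gate before a model swap: run pinned cases against the
# candidate model and block the swap if the pass rate drops.

def call_model(model: str, prompt: str) -> str:
    # Stand-in; replace with your real API client.
    return "ACK: " + prompt

EVAL_SUITE = [
    # (prompt, check) pairs version-controlled alongside the prompts.
    ("Summarize: revenue up 12% QoQ.", lambda out: "12%" in out),
    ("Reply ACK if you can read this.", lambda out: out.startswith("ACK")),
]

def safe_to_swap(candidate_model: str, min_pass_rate: float = 0.95) -> bool:
    passed = sum(check(call_model(candidate_model, prompt))
                 for prompt, check in EVAL_SUITE)
    print(f"{candidate_model}: {passed}/{len(EVAL_SUITE)} evals passed")
    return passed / len(EVAL_SUITE) >= min_pass_rate
```

    Two cases won't catch much; the real suite is your twenty worst historical failures. What matters is that it runs before the new model touches production, not after.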

    Here's what broke for me. I run long-running agents — workflows where a deep planning skill maps out a multi-phase plan, a human reviews it, and then an execution agent chains together dozens of skills and tools over hours of unsupervised work. Tight guardrails. Tight exit criteria. These ran reliably for months on Opus 4.6. On Opus 4.7, the agents went off the rails. They stopped reading instructions. They assumed they already knew how to do things instead of following the plan. They pushed back on the user mid-execution — arguing about approach instead of building. Hallucination increased. Output quality dropped. Workflows that used to complete autonomously became unusable overnight. We weren't able to run our core operation for days while we diagnosed what changed and where our prompts had become version-coupled without us realizing it.

    Fully autonomous agents in production right now are a bet on stability that doesn't exist. The question isn't whether you need humans — it's the ratio. For anything customer-facing, my current answer is: one human checkpoint for every phase transition, and evals before every model swap. That's not conservative. That's what the Opus 4.7 week just proved is the minimum.
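
    "One human checkpoint per phase transition" is an architecture decision, not a policy memo. A minimal sketch, where `run_phase` stands in for hours of agent work and `approve` stands in for whatever review surface you use (Slack approval, ticket, CLI prompt) — all names here are placeholders:

```python
# Sketch: each phase runs autonomously, but the next phase only starts
# after an explicit human approval. Phase names and helpers are placeholders.

def run_phase(name: str) -> str:
    return f"{name}: done"  # stand-in for hours of unsupervised agent work

def run_pipeline(phases, approve):
    """`approve(phase, output)` is the human checkpoint at every transition."""
    results = []
    for phase in phases:
        output = run_phase(phase)
        if not approve(phase, output):  # gate before the next phase starts
            print(f"halted at {phase}: human rejected the output")
            break
        results.append(output)
    return results

# Auto-approve the first phase, reject the second: the pipeline halts early.
done = run_pipeline(["plan", "execute", "ship"],
                    lambda phase, out: phase == "plan")
# → ["plan: done"]
```

    The design choice worth copying is that rejection halts the chain instead of logging and continuing — the failure mode from the Opus week was agents that kept going.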

    2

    The Hidden Costs of Building

    Key takeaway: Before your next renewal, stress-test three numbers: the maintenance FTE, the vendor token cost trajectory over 12 months, and your ROI if token prices 5x tomorrow.

    Here's a stat that reframes the whole build-vs-buy conversation: widely cited MIT research puts generative AI pilot failure rates at 95%. But purpose-built solutions — teams that scope tightly before building — succeed roughly two-thirds of the time. The difference isn't the tool. It's architecture.

    The Revenue Operations Alliance put it well: "Executives don't fund time. They fund outcomes." The teams that succeed move through three phases — speed, then effectiveness, then an operating rhythm where the solution is embedded so deeply that adoption becomes unavoidable. Most teams stop at speed and call it done. That's why their pilots fail.

    The observability gap is real and expensive. One developer built a 3D visualization of agent cognition after his agents ran up $200 in a single afternoon — most teams don't even know they have that kind of blind spot. If your AI vendor can't tell you when a sequence goes quiet and why it stopped, that's not an observability gap. That's a liability gap.

    And then there's the token cost illusion. Per-token costs are falling. Total inference spend is exploding. Agentic usage turned thousands of tokens per session into millions. Ben Thompson's compute economics piece names the structural issue: reasoning models reintroduce real marginal costs into a stack everyone assumed would trend toward zero. Your "AI efficiency" savings are partially funded by a subsidy that nobody printed an expiration date on.
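
    The third stress-test number from the takeaway above — ROI if token prices 5x — is back-of-envelope arithmetic you can run before the renewal meeting. All figures below are placeholders; plug in your own:

```python
# Back-of-envelope ROI stress test under a token price shock.
# Every number here is an illustrative placeholder.

def annual_roi(value_per_year: float, build_cost: float,
               maintenance_per_year: float, token_spend_per_year: float,
               token_price_multiplier: float = 1.0) -> float:
    cost = (build_cost + maintenance_per_year
            + token_spend_per_year * token_price_multiplier)
    return (value_per_year - cost) / cost

base = annual_roi(300_000, 80_000, 60_000, 30_000)          # today's prices
shocked = annual_roi(300_000, 80_000, 60_000, 30_000, 5.0)  # 5x token shock
print(f"ROI today: {base:.0%}, after 5x token shock: {shocked:.0%}")
# ROI today: 76%, after 5x token shock: 3%
```

    With these placeholder numbers a healthy-looking build collapses to roughly break-even under the shock — which is exactly the sensitivity the subsidized-token era is hiding.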

    Meanwhile, the incumbents aren't standing still. Kyle Poyar's four strategic paths for SaaS maps where legacy vendors are heading. Median public SaaS trades at 4.1x revenue — lowest in a decade — but those same companies sit on 80-95% revenue retention and the data assets AI startups need. The build-vs-buy question isn't just "can my team build this?" It's also "will my vendor build it faster, with my data already inside?" Poyar's sharpest point: the worst outcome is hedging across all four paths and never committing to any of them.

    For my own operation, the maintenance cost surprised me more than the build cost. Prompts need updating when models change — not quarterly, but sometimes overnight, as the Opus 4.7 breakage just proved. I spend more hours per week maintaining and tuning existing AI workflows than I spent building most of them. That's the line item nobody puts in the business case.

    3

    Context Portability — Who Owns the ICP Definition?

    Based on: GTM Engineer School

    Key takeaway: Every team has the same models. What you can't swap is your first-party data. That's the layer worth owning — and nobody has figured out the governance model yet.

    Who owns the ICP definition that every AI tool in your stack depends on? When the market shifts, who updates it — and who notifies the four people who built workflows on the old version? Nobody has figured out the governance model yet. Including us.

    Zach Vidibor at GTM Engineer School named this the "strategy compression problem" — leadership builds nuanced strategy, and by the time it reaches the frontline through enablement decks and tribal knowledge, the signal degrades to a lossy copy of the original intent. Context engineering is an infrastructure answer to what most companies treat as a training problem.

    Every team has the same models. You can swap Claude for GPT for Gemini. What you can't swap is your first-party data — ICP definitions, competitive positioning, deal history, institutional knowledge about how your buyers actually buy. That's the layer worth owning. The infrastructure vendors know it — semantic and context layers are becoming a category, not a feature.

    Here's the tension. You want individual contributors building at the edges — a PM pushing copy changes, a RevOps analyst prototyping a scoring model, an AE building their own research workflow. That's the build momentum, and it's good. But who governs the context those builds depend on? Who version-controls the ICP definitions when the market shifts? Who manages skills, data access, permissions across a hundred individual builders?

    We're heading toward recentralization of context management even as building stays distributed. The data lake concept is re-emerging. AI data teams are forming. If someone on your leadership team is about to propose one, make sure they've thought through what "context governance" actually means before the meeting.
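
    What "context governance" means in practice can be sketched very simply: the ICP definition lives as versioned data, downstream workflows pin a version, and an update is an explicit new version rather than a silent overwrite. Field names and versions below are illustrative assumptions, not a real schema:

```python
# Minimal sketch of versioned context: workflows pin an ICP version so a
# market-shift update doesn't silently change four people's workflows.
# All field names, values, and version labels are placeholders.

ICP_VERSIONS = {
    "v1": {"segment": "mid-market SaaS", "headcount": "200-2000", "owner": "revops"},
    "v2": {"segment": "mid-market + enterprise SaaS", "headcount": "200-10000", "owner": "revops"},
}
LATEST = "v2"

def get_icp(pinned_version=None) -> dict:
    """Workflows that pin a version keep their behavior until they migrate."""
    version = pinned_version or LATEST
    if version not in ICP_VERSIONS:
        raise KeyError(f"unknown ICP version {version!r}")
    return ICP_VERSIONS[version]

# A scoring model built on v1 is unaffected when leadership ships v2.
assert get_icp("v1")["headcount"] == "200-2000"
assert get_icp()["segment"] == "mid-market + enterprise SaaS"
```

    The dict is a toy; the real version lives wherever your context layer does. The governance question is who gets to write `v3` and who gets notified when `LATEST` moves.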

    Reading Corner

    • AEO: How to Make AI Recommend Your Product. 89% of B2B buyers are using AI during purchasing (Forrester). AEO determines whether your product shows up when buyers ask AI "what should I use for X?" If you're not optimizing for that, you're invisible in early buying stages.
    • The Slow Decay of Growth. Hello Operator maps the PLG decay curve that caught Ramp, Notion, Airtable, Figma, Miro, and Canva. The useful part: companies that bent the curve back up and how they did it.
    • The SaaS Empire Strikes Back. Kyle Poyar's four strategic paths for legacy SaaS responding to AI: maintain status quo, maximize profitability, AI offensive, or full pivot. Pair with the build economics deep dive above.
    • Klaviyo at $1.2B ARR. Trading at 4-5x forward revenue with 32% growth, 110% NRR, and expanding margins. The market is pricing in AI disruption that hasn't materialized. Look at NRR trends before you pull the trigger.
    • Hybrid Outbound Governance Chaos. A sales manager documents the governance mess in real time: ownership gaps, conflicting metrics, disputed attribution. What works: automation for prospecting and sequencing, humans for calls and qualification.
    • Salesforce Posted a $350K CS Leadership Role. The job reads less like customer success and more like revenue engineering. If you're restructuring CS, you're not hiring for the same job you had three years ago.

    Get the verdict every Wednesday.

    The AI x GTM briefing for operators. Free forever.

    One email per week. Unsubscribe anytime. No spam, ever.