AI Development · Thought Leadership · Swyx

[AINews] Is Harness Engineering real?


Why I picked this

Victor's instinct here cuts to the existential question nobody in AI tooling wants to say out loud: what if the orchestration layer just... evaporates? Swyx surfaces a fascinating tension from inside the model providers themselves — Anthropic's Claude Code team rewrites their harness every 3-4 weeks because they believe the model should do the heavy lifting, not the wrapper. That's not a technical preference; that's a philosophical position. And it puts every company building 'AI agent frameworks' in an awkward spot: you're betting your business on a layer that the people building the actual intelligence think shouldn't exist. The finance analogy is perfect — was it the trader's skill or the institutional seat? Except here, the seat is getting smarter every quarter, and the trader might be obsolete by Q3. This isn't just an architecture debate; it's a market structure question with real consequences for where you place your bets.

ai-coding-tools · agent-engineering · vendor-positioning · market-consolidation

Three lenses

Builder

If I'm building an AI product today, I'm watching this closely but not panicking yet — models still need guardrails, logging, fallback logic, and cost management that someone has to write. The question is whether that 'someone' is a $50M venture-backed framework company or just 200 lines of Python I maintain myself.
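The "200 lines of Python" claim is easy to picture concretely. A minimal sketch of such a homegrown harness might look like the following — `call_model` is a hypothetical stand-in for whatever provider SDK you actually use, and the per-token prices are illustrative, not real rates:

```python
# Minimal harness sketch: logging, retry + fallback model, rough cost tracking.
# `call_model` and the PRICES table are illustrative placeholders.
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("harness")

# Hypothetical per-1K-token prices; real prices vary by provider and model.
PRICES = {"big-model": 0.01, "small-fallback": 0.001}

def call_model(model: str, prompt: str) -> str:
    """Stand-in for a provider SDK call; replace with a real client."""
    return f"[{model}] response to: {prompt[:40]}"

def run(prompt: str, models=("big-model", "small-fallback"), retries=2) -> str:
    spent = 0.0
    for model in models:
        for attempt in range(retries):
            try:
                start = time.time()
                out = call_model(model, prompt)
                # Crude token estimate: ~4 characters per token.
                tokens = (len(prompt) + len(out)) / 4
                spent += tokens / 1000 * PRICES[model]
                log.info("model=%s attempt=%d latency=%.2fs est_cost=$%.5f",
                         model, attempt, time.time() - start, spent)
                return out
            except Exception as exc:  # guardrail: log, retry, then fall back
                log.warning("model=%s failed (%s); retrying or falling back",
                            model, exc)
    raise RuntimeError("all models failed")

print(run("Summarize the Big Model vs Big Harness debate."))
```

The point of the sketch is that every line here is commodity plumbing: if the model itself starts handling retries, routing, and budgets, this is exactly the layer that evaporates.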

Revenue Leader

The Big Harness vs Big Model debate matters less to me than deployment reality — I need something my team can actually use Monday morning, and right now that's still a framework with docs and support. But if I'm evaluating a vendor whose entire value prop is 'orchestration,' I'm asking hard questions about their moat in 18 months.

Contrarian

Here's what nobody's saying: the framework companies know this is coming, which is why they're all pivoting to 'observability' and 'governance' — the last defensible layer before you're just reselling API credits. Watch for the rebrand wave in Q2.

"I'm not even sure these guys want me to exist." - AI framework founder at an OpenAI event

Key takeaways

  • Central debate emerging: 'Big Model' (minimal harness, model does everything) vs 'Big Harness' (orchestration/framework layer adds value) - mirrors the finance debate about trader skill vs the institutional seat
  • Model providers like Anthropic/OpenAI are philosophically minimalist on harness - Claude Code rewrites from scratch every 3-4 weeks, emphasizing 'thinnest possible wrapper' with all secret sauce in the model itself
  • Existential threat to AI framework/orchestration companies as reasoning models improve - framework founders questioning their own necessity as models become more capable of self-orchestration

People mentioned

  • Boris Cherny, Engineer @ Anthropic/Claude Code
  • Cat Wu, Engineer @ Anthropic/Claude Code
  • Ryan Lopopolo, Codex Team @ OpenAI
  • Noam Brown, Researcher @ OpenAI
  • Swyx, Author/Analyst @ Latent Space

Companies

OpenAI · Anthropic · Claude Code

Key metrics

  • Claude Code harness: rewritten from scratch every 3-4 weeks
  • $3M in profits (finance analogy)

Why this matters for operators: Reveals a philosophical divide between model providers and orchestration vendors that will shape build-vs-buy decisions in AI agent infrastructure — and vendor viability — over the next 12-18 months.

I cover AI×GTM intelligence like this every Wednesday.

Get STEEPWORKS Weekly

More picks

Personal Productivity & AI-Augmented Work · Lenny's Newsletter

How Intercom 2x’d their engineering velocity in 9 months with Claude Code | Brian Scanlan

  • Intercom doubled engineering throughput (merged PRs per R&D employee) in 9 months using Claude Code while maintaining code quality
  • Built custom telemetry infrastructure to measure AI adoption and quality impact across hundreds of engineers, plus skills repository with automated enforcement hooks
  • Achieved 100% adoption across engineering AND expanded to non-technical roles (designers, PMs, TPMs) shipping code—suggesting AI coding tools democratize development
ai-coding-tools · cursor-vs-copilot · automation-stacks
AI Development · r/artificial · Victor's pick

I built a 3D brain that watches AI agents think in real-time (free & gives your agents memory, shared memory audit trail and decision analysis)

the repo for this is kinda cool - https://github.com/RyjoxTechnologies/Octopoda-OS

  • Agent memory persistence is the #1 pain point (38%) for multi-agent systems, followed by debugging complexity (24%)
  • Loop detection prevents runaway costs - one case saved $200 in a single afternoon from stuck GPT-4 calls
  • Visual observability (3D graph showing agent activity, memory operations, and inter-agent communication) addresses debugging complexity that affects 24% of users
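The loop-detection idea in the second bullet is simple to sketch: hash recent agent steps and halt when the same call repeats, before a stuck loop burns through the API budget. The names below (`LoopGuard`, `check`) are illustrative, not the repo's actual API:

```python
# Hedged sketch of loop detection for agent runs: count repeated
# (tool, args) calls and halt the run once a threshold is crossed.
from collections import Counter

class LoopGuard:
    def __init__(self, max_repeats: int = 3):
        self.seen = Counter()
        self.max_repeats = max_repeats

    def check(self, tool: str, args: str) -> None:
        """Raise if the same (tool, args) call repeats too often."""
        key = (tool, args)
        self.seen[key] += 1
        if self.seen[key] > self.max_repeats:
            raise RuntimeError(
                f"loop detected: {key} repeated {self.seen[key]} times")

guard = LoopGuard(max_repeats=2)
guard.check("search", "agent memory")      # first call: fine
guard.check("search", "agent memory")      # second call: still fine
try:
    guard.check("search", "agent memory")  # third call: halt the run
except RuntimeError as exc:
    print("stopped:", exc)
```

A guard like this is cheap insurance: the $200-afternoon failure mode above is precisely an agent re-issuing the same call indefinitely.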
ai-agent-observability · ai-agent-memory · ai-cost-control
GTM Ops · SaaStr (Jason Lemkin) · Victor's pick

5 Interesting Learnings from Klaviyo at $1.2 Billion in ARR: 32% Growth, 110% NRR, and Somehow Only 4x Revenue

SaaS dead, dying, or underpriced? Feels like a stock picker's market with attractive opportunities to me

  • Klaviyo trading at 4-5x revenue despite 32% growth, 110% NRR, and profitability—potentially most mispriced public B2B company or signal of 'New Normal' for SaaS valuations
  • NRR improved to 110% while scaling to $1.2B ARR by doubling $1M+ ARR customers and growing $50K+ customers 37% YoY—rare upmarket expansion success at scale
  • International revenue grew 42% YoY and now represents 33%+ of business, breaking 'Shopify add-on' narrative with regional hubs in Dublin and Singapore
market-consolidation · revenue-platform-consolidation · back-to-basics-gtm

This analysis was produced using the STEEPWORKS system — the same agents, skills, and knowledge architecture available in the GrowthOS package.