Is AGI close?

Define AGI. If it's 'a model that can reliably do any cognitive task a human can,' we're not there. If it's 'a system that meaningfully completes useful work in production,' we crossed that line about 18 months ago and we're still figuring out the implications. The marketing chatter about AGI is mostly noise; the actual work of building agents that earn their keep is where the value is.

Will AI replace developers?

It'll replace some of the work developers do. Boring, well-specified code (API clients, schema mapping, internal tools, glue logic, tests) — agents write that 5–10× faster than humans now. Novel architecture, performance-sensitive systems, ambiguous product requirements — those still need humans. Developers who lean into AI tools as accelerators are shipping 2–3× the surface area they used to. Developers who refuse are getting outshipped.

What's the biggest mistake teams make with AI in 2026?

Wrapping AI around workflows that should have been deterministic automation. Followed closely by: shipping LLM-touched workflows without evals or observability. Both are the difference between something that works in a demo and something that works in production.

Is open-source AI catching up?

Yes and no. Llama 3.3 70B and Mistral Large are competitive with GPT-4 for many tasks at lower cost when self-hosted. For frontier reasoning, Claude and GPT still lead. Open source matters when data residency, cost-at-scale, or vendor independence is the priority. For most production workloads, the API trade-off (slightly more expensive, much lower operational overhead) still wins.

What should I bet on for the next 12 months?

Agentic workflows that complete jobs, not just answer questions. Longer context (1M+ tokens is normalizing). Cheaper inference (Gemini Flash and Claude Haiku are pushing this hard). Better tool calling. MCP adoption normalising agent-tool integration. Voice AI quietly becoming a real category.

The state of AI development in 2026

May 4, 2026 · updated May 21, 2026 · 6 min read

The honest version

It's mid-2026. Here's what's actually happening in AI development on the ground, not the version that gets pitched on Twitter.

What's working in production:

Document processing agents (invoices, contracts, KYC) — meaningful cost reduction, broadly shipping.
Voice agents — Twilio + GPT-4o Realtime is uncanny enough that real businesses are deploying it for booking and qualification.
Internal knowledge-base chatbots — when grounded in real docs with proper retrieval, they work.
Workflow orchestrators with one or two LLM-judgment steps — replacing RPA stacks.
Code agents as developer accelerators — every serious engineering team uses them in some form.

What's underrated:

Boring automation with one LLM call inside. Most of the value is in shipping that, not in building "fully autonomous" anything.
Evals and observability tooling. The teams that invest here ship faster; the teams that don't grind to a halt.
Microsoft Power Platform as a sandwich layer between AI and enterprise data. Boring, but it's the path of least resistance for many M365 shops.

What's overhyped:

"AGI" marketing. The work that ships value is decidedly non-AGI.
Multi-agent systems for problems that one agent solves fine. Adding agents adds coordination overhead; most teams aren't ready.
Fine-tuning for tasks that RAG solves.
"AI-first" everything. Most products benefit from AI on specific steps, not as an overhaul.

What changed since 2024

If you tuned out for 18 months, here are the deltas:

Tool calling went from flaky to reliable

Models in 2024 would invent tool arguments, ignore schemas, or fail to call tools when they should. By mid-2025 Claude tool calling crossed reliability thresholds that made production agents workable; GPT and Gemini followed. Agent frameworks (LangGraph, Vercel AI SDK, Anthropic's Claude SDK) matured to match.
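Even with more reliable models, a common guard against invented or malformed tool arguments is to validate every call before running it. The following is a minimal sketch only: the `create_invoice` tool, its fields, and the hand-rolled schema format are hypothetical, and a real system would more likely lean on JSON Schema or a validation library.

```python
from typing import Any

# Hypothetical schema for an illustrative "create_invoice" tool:
# argument name -> (expected type, required?)
CREATE_INVOICE_SCHEMA: dict[str, tuple[type, bool]] = {
    "customer_id": (str, True),
    "amount_cents": (int, True),
    "memo": (str, False),
}


def validate_tool_args(
    args: dict[str, Any], schema: dict[str, tuple[type, bool]]
) -> list[str]:
    """Return a list of problems; an empty list means the call may run."""
    errors: list[str] = []

    # Arguments the model invented that the tool does not accept.
    for name in args:
        if name not in schema:
            errors.append(f"unexpected argument: {name}")

    # Missing or wrongly typed arguments.
    for name, (expected_type, required) in schema.items():
        if name not in args:
            if required:
                errors.append(f"missing required argument: {name}")
            continue
        value = args[name]
        # bool is a subclass of int in Python, so reject it for int fields.
        type_ok = isinstance(value, expected_type) and not (
            expected_type is int and isinstance(value, bool)
        )
        if not type_ok:
            errors.append(f"{name} should be {expected_type.__name__}")

    return errors
```

When the list is non-empty, one option is to send the errors back to the model and ask it to retry rather than executing the call.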

Long context normalised

In 2024, 128k tokens was the practical maximum for most production work. In 2026, 200k to 1M tokens is routine. This changes what's possible: you can feed whole contracts, whole codebases, or whole knowledge bases into a single agent invocation and get reasoned output back.

Cost dropped sharply

Frontier models in 2024 cost €15–€60 per million tokens. By 2026 you can run agents on Claude Haiku or Gemini Flash for €0.30–€2 per million tokens, with quality good enough for many production workloads. High-volume agentic workflows that were not economically viable 18 months ago now are.
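To make the price drop concrete, here is a back-of-the-envelope sketch. The workload (50,000 documents a month at 8,000 tokens each) is an assumption for illustration, and the function uses a single blended price per million tokens, whereas real pricing usually differs for input and output tokens.

```python
def monthly_cost_eur(
    runs_per_month: int, tokens_per_run: int, eur_per_million_tokens: float
) -> float:
    """Rough monthly spend, assuming one blended price for all tokens."""
    total_tokens = runs_per_month * tokens_per_run
    return total_tokens / 1_000_000 * eur_per_million_tokens


# Assumed workload: 50,000 documents a month, 8,000 tokens per document.
RUNS = 50_000
TOKENS_PER_RUN = 8_000

# Low end of each price range quoted in the text.
cost_2024 = monthly_cost_eur(RUNS, TOKENS_PER_RUN, 15.0)
cost_2026 = monthly_cost_eur(RUNS, TOKENS_PER_RUN, 0.30)

print(f"2024 frontier pricing: about EUR {cost_2024:,.0f} per month")
print(f"2026 small-model pricing: about EUR {cost_2026:,.0f} per month")
```

Under these assumptions the same workload goes from roughly €6,000 to roughly €120 a month.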

Voice realtime arrived

GPT-4o Realtime in late 2024 changed voice AI from "novelty" to "shippable." Sub-second response with proper barge-in made AI phone agents feel, for the first time, like talking to a real entity. The voice AI category is real now.

MCP started normalising agent-tool integration

Model Context Protocol (Anthropic, late 2024 → 2025) is becoming the standard way agents connect to tools and data sources. It is not universal yet, but it is heading there rapidly. Agents and tools that can be reused across ecosystems are starting to appear.

Eval tooling caught up

Langfuse, Braintrust, Promptfoo, Helicone — proper observability and eval tooling exists now. The teams that adopt it ship better agents; the teams that don't get stuck.

What's hard about AI development in 2026

The hard parts aren't what people expect.

Not hard: getting an LLM to do something interesting. Models are smart enough now that the model isn't the bottleneck.

Hard: getting an LLM to do something useful reliably at scale, with sensible cost, with good error handling, with proper evals, with auditable traces, with respect for guardrails, integrated with your actual business systems, maintainable by humans who understand neither the model nor the framework.

The hard parts are engineering, not AI.

What we ship vs what we don't

Honest list of what our work looks like in 2026:

We ship:

Document processing agents (invoices, contracts, KYC, claims).
Voice agents (bookings, qualification, after-hours).
Conversational agents grounded in customer or internal knowledge.
Workflow orchestrators with judgment steps (classification, routing, extraction).
Custom dashboards and operations tools for the teams running these systems.
Power Platform builds that integrate AI into the Microsoft 365 stack.

We don't ship:

Fully autonomous agents that touch money without approval gates. (Not because we can't, because we shouldn't.)
Multi-agent orchestration for problems a single agent handles fine.
Fine-tuned models when RAG works.
"Replace the entire customer support team with AI" projects. The math doesn't work the way the pitch deck claims.
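The approval gate mentioned in the first item of this list can be as simple as a check that sits between the agent's proposed action and its execution. This is a sketch under assumptions: the `ProposedAction` shape, the action kinds, and the policy are all hypothetical, not a description of any particular production system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProposedAction:
    kind: str          # e.g. "refund" or "send_email" (illustrative kinds)
    amount_cents: int  # 0 for actions that move no money
    description: str


# Hypothetical policy: these action kinds always need a human sign-off.
MONEY_ACTIONS = {"refund", "payment", "credit_note"}


def requires_approval(action: ProposedAction) -> bool:
    """Anything that moves money goes through a human."""
    return action.kind in MONEY_ACTIONS or action.amount_cents > 0


def execute(action: ProposedAction, approved_by: Optional[str] = None) -> str:
    """Return a status string; money actions wait until someone approves."""
    if requires_approval(action) and approved_by is None:
        return "queued_for_approval"
    # A real system would call the downstream API here and record approved_by
    # in an audit log.
    return "executed"
```

The agent stays free to propose whatever it likes; only the execution path is gated.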

Where the next 12 months go

Best-guess predictions, with the usual humility about prediction accuracy:

More agents in production, fewer agent demos. The "look what's possible" phase is mostly over; the "is it shipped and is it earning?" phase is on.
Voice as a real category. Phone agents will move from novelty to baseline expectation for service businesses by end of 2026.
Vertical agents emerge. AI agents specialized to specific industries (AP automation, legal contract review, clinical documentation) become productized.
MCP adoption widens. Reusable tools and agents across ecosystems become the norm.
Cost continues to drop. Inference cost halves every 12–18 months; this lets more workflows pay back.
Evals tooling consolidates. A handful of platforms win out; the rest fade.
The "AGI in 2027" pitch keeps not happening on the timeline its loudest proponents claim, while incremental progress keeps mattering more than the milestone debate suggests.

How to think about it as a buyer

Three principles:

Solve the problem first, pick the technology second. If a deterministic automation does it, that's the answer. AI is the answer when judgment or unstructured inputs make automation impossible.
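In code, this principle often looks like a deterministic rule that handles the easy cases, with the model called only when the rule cannot decide. A minimal sketch, assuming a hypothetical invoice-number format (`INV-` plus six digits) and a caller-supplied `llm_classify` function standing in for whatever model call you use:

```python
import re
from typing import Callable

# Hypothetical invoice-number format, used only for illustration.
INVOICE_NUMBER = re.compile(r"\bINV-\d{6}\b")


def route_message(text: str, llm_classify: Callable[[str], str]) -> tuple[str, str]:
    """Return (route, decided_by). Rules run first; the model is the fallback."""
    if INVOICE_NUMBER.search(text):
        # Structured input: no judgment needed, so no model call and no cost.
        return ("accounts_payable", "rule")
    # Unstructured input: this is where a model's judgment earns its keep.
    return (llm_classify(text), "llm")


def fake_classifier(text: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return "support"


print(route_message("Please pay INV-004217 by Friday", fake_classifier))
print(route_message("My order arrived damaged, what now?", fake_classifier))
```

Recording which path made the decision also makes it easy to see how much of the traffic actually needs the model.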

Insist on evals and observability. If a vendor or in-house team can't show you how they measure quality and how they debug failures, the system will degrade.

Prefer agencies and engineers who say "we won't do that" sometimes. The honest answer in AI right now is often "this isn't the right problem for AI." If your vendor never says that, they're selling something.

How to think about it as a builder

Three principles:

Boring engineering scales; cargo-cult frameworks don't. Pick the simplest stack that works. Write tests. Add observability. Treat agent steps like any other engineering component.

Evals are your test suite for AI. Build them as you build the agent, not after. Run them in CI. Surface scores on a dashboard. They are what lets you ship changes without fear.
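As a rough illustration of evals as a test suite, the sketch below scores an agent against labelled cases and reports whether it clears a threshold. The cases, the 0.9 threshold, and the keyword-based `toy_agent` are all assumptions made so the example runs without a model; in practice the agent would wrap a real model call and the cases would come from production data.

```python
from typing import Callable

# Hypothetical eval cases: (input text, expected label).
EVAL_CASES: list[tuple[str, str]] = [
    ("Invoice attached for March services", "invoice"),
    ("Please find our signed contract", "contract"),
    ("Attached is the invoice you requested", "invoice"),
    ("Master services agreement, final draft", "contract"),
]


def run_evals(
    agent: Callable[[str], str],
    cases: list[tuple[str, str]],
    threshold: float = 0.9,
) -> tuple[float, bool]:
    """Return (score, passed), where score is the fraction of correct cases."""
    correct = sum(1 for text, expected in cases if agent(text) == expected)
    score = correct / len(cases)
    return score, score >= threshold


def toy_agent(text: str) -> str:
    """Keyword stand-in for a real agent; it misses the 'agreement' case."""
    lowered = text.lower()
    if "invoice" in lowered:
        return "invoice"
    if "contract" in lowered:
        return "contract"
    return "unknown"


score, passed = run_evals(toy_agent, EVAL_CASES)
print(f"score={score:.2f} passed={passed}")
```

In CI, the same function can fail the build when `passed` is false (for example by exiting with a non-zero status), which blocks a prompt or model change that lowers the score from merging.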

Talk to the operator, not the executive. The person who'll actually use the system day-to-day is the one whose feedback matters. Their input shapes the agent more than any roadmap document.

The bottom line

AI development in 2026 is engineering, with AI as one component of the system. The teams that treat it that way ship; the teams that treat it as a magic wand don't.

If you have an idea you want to ship, our services cover the shapes we build. If you want our internal version of how to ship AI fast and reliably, see The AI Development playbook. Or just drop us a note.

Keep reading

Article

What is an AI agent? The full breakdown

An AI agent is a system that turns a goal into a sequence of tool calls. Where a chatbot answers questions, an agent completes jobs. It plans steps, picks tools, executes them, recovers from failures, and either finishes the task or hands off to a human. The defining ingredients are a goal, retrieval, tools, guardrails, evals, and observability.

Article

ChatGPT API vs Claude API vs Gemini: which to pick (2026)

Claude Sonnet 4.6/4.7 is our default for production agents — most reliable tool calling, best structured output, strong reasoning. GPT-4o wins for voice (Realtime is best-in-class) and the largest ecosystem. Gemini 2.5/2.0 wins for long-context, vision-heavy document work, and cost-sensitive volume workloads. Pick per task; abstract behind a provider interface.

Article

The AI Development playbook: how we ship agents in 6 weeks

We ship production AI agents in 6 weeks by being opinionated about tools, refusing to skip discovery, building evals from day one, and treating code agents as a force multiplier. This is the playbook — what we use, what we refuse, and why it lands consistently.

Service

AI Agents Development

Custom agents that read documents, hold conversations, take phone calls, and execute multi-step workflows — wired into the systems you already run.

Want this delivered in your stack?

If the article describes a workflow you'd like to ship, drop us a note. We reply within one business day.

Frequently asked questions

Is AGI close?

Will AI replace developers?

What's the biggest mistake teams make with AI in 2026?

Is open-source AI catching up?

What should I bet on for the next 12 months?
