Most board-level conversations about agentic AI in 2026 land on the same question, asked with varying degrees of patience: where does the AI actually sit? The vendor pitch deck talks about agents and autonomy. The proof of concept demonstrates a model answering a complex question. The production system then must make a real operational decision that affects working hours, revenue, or eligibility for a benefit, and the question of where the AI sits in that decision suddenly matters quite a lot. To the auditor. To the regulator. To the operations team. To the person on the wrong end of the decision.

The honest answer in most 2026 enterprise systems is that the AI sits in the wrong place. It is asked to compose, decide, calculate, validate and notify, all inside a single prompt. The output is plausible most of the time. It is wrong some of the time. And when it is wrong, there is no clean line between the part of the system that orchestrated, the part that calculated, the part that decided, and the part that explained - because all four were the same prompt.

Gartner projects that 40% of enterprise applications will integrate task-specific AI agents by the end of 2026, up from less than 5% in 2025. Yet AaiNova's March 2026 enterprise architecture study found that while 79% of organisations report some AI agent adoption, only 11% are in production and just 2% have deployed at full scale. Much of that gap is architectural. The most common architectural failure is the lack of a clear division of labour between the components that compose, calculate, decide and explain.

This post argues for a specific four-part architectural pattern that has proven itself across regulated, safety-critical, and audit-sensitive AI deployments. The pattern names four distinct components and assigns each one a job. The deliberate design move is that no component is asked to do another component's job - and in particular, the language model is not asked to do any job that requires arithmetic, deterministic rule application, or final decision authority.

The four-part division of labour

Four components, four distinct jobs, in a strict separation of concerns:

The optimisation solver decides

Where the underlying problem is combinatorial - allocating 30 staff across 14 locations and 48 time slots under hard constraints, scheduling 80 stand assignments against safety geometry, routing fleet movements across a network - the deciding mechanism is a mathematical optimiser, not a language model. Mixed-integer linear programming (MILP) solvers, constraint programming engines, or hybrid approaches running on tools such as Google OR-Tools or Gurobi produce provably feasible solutions under explicit constraint sets. They cannot hallucinate. They cannot return an answer that violates a hard constraint (for MILP, up to the solver's numerical tolerances). If no feasible answer exists, they report the problem as infeasible rather than inventing one. Run with fixed settings (time limit, thread count, random seed), they are deterministic and reproducible: the same inputs produce the same outputs. This is the component that holds final decision authority on the combinatorial layer of the problem.
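A deliberately tiny sketch makes these properties concrete. It uses exhaustive search from Python's standard library as a stand-in for a real MILP or CP solver. At realistic sizes you would use OR-Tools, Gurobi or similar, not enumeration. The staff names, locations, costs and constraints are all hypothetical. The point is the contract. The solver returns either the cheapest assignment that satisfies every hard constraint, or `None` when no such assignment exists, and it gives the same answer on every run.

```python
from itertools import product

# Hypothetical toy instance: 3 staff, 2 locations.
STAFF = ["ana", "ben", "cai"]
LOCATIONS = ["north", "south"]
MIN_COVER = {"north": 1, "south": 1}      # hard: minimum staff per location
NOT_QUALIFIED = {("cai", "north")}         # hard: qualification exclusions
COST = {                                   # objective: e.g. travel cost
    ("ana", "north"): 2, ("ana", "south"): 5,
    ("ben", "north"): 4, ("ben", "south"): 1,
    ("cai", "north"): 3, ("cai", "south"): 2,
}


def feasible(assignment, min_cover):
    """True only if every hard constraint holds for this assignment."""
    if any(pair in NOT_QUALIFIED for pair in assignment.items()):
        return False
    return all(
        sum(1 for loc in assignment.values() if loc == target) >= minimum
        for target, minimum in min_cover.items()
    )


def solve(min_cover=MIN_COVER):
    """Enumerate every assignment and return (cost, assignment) for the cheapest
    feasible one, or None if no assignment satisfies the hard constraints."""
    best = None
    for choice in product(LOCATIONS, repeat=len(STAFF)):
        assignment = dict(zip(STAFF, choice))
        if not feasible(assignment, min_cover):
            continue  # infeasible candidates are never returned
        cost = sum(COST[pair] for pair in assignment.items())
        # Strict "<" keeps the first optimum in enumeration order,
        # so ties are broken the same way on every run.
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


print(solve())                              # cheapest feasible allocation
print(solve({"north": 2, "south": 2}))      # needs 4 staff, so None
```

Enumeration grows as locations to the power of staff, which is why real instances need a proper solver. What carries over is the contract: the output is feasible or the result is reported as infeasible, and it is reproducible.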

The agent orchestrates

The agent's job is to manage the flow of work, not to make the decision. It calls typed tools to fetch data, passes inputs to the optimisation solver, receives results, calls validation tools, handles bounded retry logic on validation failure, and passes the final structured output downstream. The agent uses a language model under the hood for orchestration reasoning, such as deciding which tool to call next, summarising tool outputs and handling edge cases. It does not make any of the substantive decisions itself. Microsoft's Foundry Agents, LangGraph and similar agent frameworks can all be configured for this orchestrator-only role.
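The orchestration loop can be sketched as below. It assumes hypothetical stub tools (`fetch_inputs`, `run_solver`, `validate`) and a single illustrative rule ID. The control flow is hard-coded for clarity. In an agent framework the language model might choose the next tool call, but the same limits apply: a bounded number of retries, no direct edits to the solver's output, and escalation to a human when retries run out.

```python
from dataclasses import dataclass, field


@dataclass
class Validation:
    passed: bool
    violations: list = field(default_factory=list)


# Hypothetical stub tools. In a real system each would be a typed tool call
# to a data service, an optimisation solver and a rules engine.
def fetch_inputs(request_id):
    return {"request_id": request_id, "max_hours": 40}


def run_solver(inputs, penalty):
    # Stand-in for a solver: a higher penalty weight lowers allocated hours.
    return {"hours": 48 - 4 * penalty}


def validate(solution, inputs):
    if solution["hours"] > inputs["max_hours"]:
        return Validation(False, ["WTR-01: hours exceed max_hours"])
    return Validation(True)


def orchestrate(request_id, max_attempts=3):
    """Fetch, solve, validate; retry a bounded number of times, then escalate.

    The orchestrator never edits the solution. It only changes solver inputs
    (here, a penalty weight) and routes the result downstream or to a human.
    """
    inputs = fetch_inputs(request_id)
    penalty = 0
    violations = []
    for attempt in range(1, max_attempts + 1):
        solution = run_solver(inputs, penalty)
        result = validate(solution, inputs)
        if result.passed:
            return {"status": "validated", "attempts": attempt, "solution": solution}
        violations = result.violations
        penalty += 1  # adjust the objective and re-solve
    return {"status": "needs_human_review", "attempts": max_attempts,
            "violations": violations}


print(orchestrate("req-1"))                  # passes validation on attempt 3
print(orchestrate("req-1", max_attempts=2))  # retries exhausted, escalated
```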

The rules engine constrains

Between the solver's solution and the agent's downstream output sits a deterministic rules engine that validates against the constraints that cannot be expressed cleanly inside the optimiser - policy rules, soft constraints with override paths, contextual rules that depend on combinations of data, and rules that change frequently and need pull-request-style change management. The rules engine returns a structured validation result. If the solver's solution fails a rule, the agent re-runs the solver with an updated objective function or surfaces the violation to a human reviewer. The rules engine is the guardrail layer; it cannot be bypassed by the model.
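Here is a minimal rules-as-code sketch. It assumes rules are plain Python predicates, each with a named owner and a flag that marks soft rules with an override path. The rule IDs, owners and thresholds are hypothetical and do not state any real working-time regulation. The engine returns a structured result that separates hard failures, which must be re-solved or rejected, from soft failures, which are candidates for a documented human override.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    rule_id: str
    owner: str                       # owning team, e.g. "HR" or "Operations"
    description: str
    check: Callable[[dict], bool]    # returns True when the solution complies
    overridable: bool = False        # soft rule with a documented override path


def evaluate(rules, solution):
    """Apply every rule deterministically and return a structured result."""
    failed = [r for r in rules if not r.check(solution)]
    return {
        "passed": not failed,
        # Agent must re-solve or reject on any hard failure.
        "hard_failures": [r.rule_id for r in failed if not r.overridable],
        # May go to a human for a documented override.
        "soft_failures": [r.rule_id for r in failed if r.overridable],
        "checked": [r.rule_id for r in rules],
    }


# Hypothetical rules; IDs, owners and thresholds are illustrative only.
RULES = [
    Rule("WTR-01", "HR", "No more than 48 hours in the week",
         lambda s: all(h <= 48 for h in s["weekly_hours"].values())),
    Rule("OPS-07", "Operations", "At least 11 hours' rest between shifts",
         lambda s: s["min_rest_hours"] >= 11, overridable=True),
]

solution = {"weekly_hours": {"ana": 50, "ben": 40}, "min_rest_hours": 10}
print(evaluate(RULES, solution))
```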

The language model explains

The language model's job is the one it is genuinely best at: rendering structured outputs into natural language. Once the solver has decided, the rules engine has validated, and the agent has orchestrated, the language model translates the result into one-line rationales per allocation, plain-English summaries of edge-case handling, and human-readable audit explanations. It does not compute numbers, apply rules, or change decisions. It explains decisions that have already been made by other components. This is the role for which large language models are uniquely well-suited and least dangerous.
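One way to keep the model in the explainer role is to give it only facts that other components have already decided. A deterministic template can serve as the fallback. The sketch below assumes a flat `allocation` record with hypothetical fields. The prompt wording is illustrative. The model call itself is left out because it depends on whichever model API you use.

```python
import json


def build_explanation_prompt(allocation):
    """Build a prompt that asks the model to explain a decision, not make one.

    The wording is illustrative; the actual model call is deliberately omitted.
    """
    facts = json.dumps(allocation, sort_keys=True)
    return (
        "Write a one-line rationale for a decision that has already been made.\n"
        "Use only the facts below. Do not calculate new numbers, apply rules, "
        "or change the decision. Cite the rule_id given.\n"
        f"FACTS: {facts}"
    )


def template_rationale(allocation):
    """Deterministic fallback if the model's text is rejected or unavailable."""
    return (f"{allocation['staff']} assigned to {allocation['location']}: "
            f"rule {allocation['rule_id']} satisfied, cost {allocation['cost']}.")


allocation = {"staff": "ana", "location": "north", "rule_id": "WTR-01", "cost": 2}
print(build_explanation_prompt(allocation))
print(template_rationale(allocation))
```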

Why the separation matters

Four reasons the strict separation pays off in production, each one corresponding to a specific failure mode of architectures that do not separate.

The first is auditability. When the four components are separated, every decision has a clear trail - the inputs the agent fetched, the constraints the solver received, the solution it returned, the rules the validation engine applied, the result it produced, and the explanation the language model rendered. The audit trail is structural, not retrofitted. When the four jobs are fused into a single prompt, the audit trail is whatever the prompt happened to log - which is rarely enough to defend in front of a regulator.
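A structural audit trail can be as simple as an append-only record per stage. The sketch below assumes that each component hands its output to a hypothetical `AuditTrail` object. Chaining SHA-256 hashes, so that each entry covers the one before it, is one common way to make later edits detectable. It is an optional extra, not a requirement of the pattern.

```python
import copy
import hashlib
import json
from dataclasses import dataclass, field


def _digest(prev_hash, stage, component, payload):
    body = json.dumps(
        {"stage": stage, "component": component, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


@dataclass
class AuditTrail:
    decision_id: str
    entries: list = field(default_factory=list)

    def record(self, stage, component, payload):
        """Append one stage. Each hash covers the previous hash, so an edited
        entry cannot be re-hashed without breaking every later hash."""
        payload = copy.deepcopy(payload)  # freeze what was actually recorded
        prev = self.entries[-1]["hash"] if self.entries else ""
        self.entries.append({
            "stage": stage, "component": component, "payload": payload,
            "hash": _digest(prev, stage, component, payload),
        })

    def verify(self):
        """Recompute the chain; False means some entry was altered."""
        prev = ""
        for e in self.entries:
            if _digest(prev, e["stage"], e["component"], e["payload"]) != e["hash"]:
                return False
            prev = e["hash"]
        return True


trail = AuditTrail("alloc-001")
trail.record("inputs", "agent", {"staff": 3, "locations": 2})
trail.record("solution", "solver", {"cost": 5})
trail.record("validation", "rules_engine", {"passed": True, "checked": ["WTR-01"]})
trail.record("explanation", "language_model", {"text": "ana assigned to north."})
print(trail.verify())                    # True
trail.entries[1]["payload"]["cost"] = 4  # simulate tampering
print(trail.verify())                    # False
```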

The second is correctness on the parts that have to be correct. For combinatorial optimisation problems, a solver can return a provably optimal solution, or a feasible one with a proven bound on how far it is from optimal. A language model asked to allocate 30 staff to 14 locations will produce something that looks like an allocation. An optimisation solver will produce one that is mathematically guaranteed to satisfy every hard constraint. It will also either minimise the chosen objective or, if stopped at a time limit, report its optimality gap. The difference between "looks like" and "provably is" is the difference between a demo and a production system.

The third is governance. Different components answer to different governance owners. The rules engine is owned by the policy teams: HR for working-time rules, the Data Protection Officer (DPO) for privacy rules, Legal for contractual rules. They review and approve rule changes through pull-request workflows in Git. The optimisation model is owned by the operations research team. The agent and language model are owned by the AI engineering team. When the four jobs are fused, every change to any of them is everybody's change, and governance collapses into committee fatigue. When they are separated, each component has the right owner.

The fourth is the human-in-the-loop architecture that regulated AI requires. Article 22 of the UK and EU GDPR restricts decisions based solely on automated processing that produce legal or similarly significant effects. Under the EU AI Act, high-risk obligations apply from 2 August 2026, and they require meaningful human oversight of high-risk systems in production. A separated architecture exposes natural points where a human can intervene: reviewing the solver's solution before publication, overriding the rules engine on documented exceptions, approving the language model's rendering before it goes to the customer. A fused architecture has only one intervention point, which is either too early to be useful or too late to matter.
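A publication gate for one of those intervention points might look like the sketch below. It rests on simple assumptions: a boolean `significant_effect` flag set upstream, and a named reviewer string. All names are hypothetical. Whether a particular review step counts as meaningful human oversight is a legal and organisational question that the code cannot settle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    decision_id: str
    significant_effect: bool         # e.g. affects pay, hours or eligibility
    validated: bool                  # structured result from the rules engine
    reviewer: Optional[str] = None
    approved: bool = False


def approve(decision, reviewer):
    """Record a named human approval (a real system would also log why)."""
    decision.reviewer = reviewer
    decision.approved = True
    return decision


def can_publish(decision):
    """Never publish an unvalidated decision; decisions with significant
    effect additionally need a named human approval first."""
    if not decision.validated:
        return False
    if decision.significant_effect:
        return decision.approved and decision.reviewer is not None
    return True


d = Decision("alloc-001", significant_effect=True, validated=True)
print(can_publish(d))                       # False: awaiting review
print(can_publish(approve(d, "j.smith")))   # True: approved by a named reviewer
```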

Where the rest of the market is going wrong

Vector Labs' March 2026 framing of the five mainstream agentic AI architectures - ReAct, Multi-Agent Systems, Agentic RAG, Tool-Use, Hierarchical - captures the dominant industry conversation honestly. What all five share is an assumption that the language model is the dominant intelligence in the system. They differ in how they extend the model with retrieval, with other agents, with tool calling, or with hierarchical decomposition - but the model is always at the centre of the decision.

For genuinely combinatorial problems, this is the wrong centre. The model cannot beat a properly configured MILP solver at constrained optimisation, and trying to do so produces solutions that pass plausibility checks but fail provability ones. The right architecture for combinatorial workloads is one where the model is the orchestrator and explainer, not the decision-maker.

For non-combinatorial problems - document triage, eligibility decisioning, customer service - the four-part pattern still holds, with the optimisation component absent or simplified. The agent orchestrates document intelligence tools rather than an optimisation solver; the rules engine still constrains; the language model still explains. The framework adapts to the workload. What does not adapt is the discipline of keeping decision authority out of the language model.

What good looks like in practice

Four signals that a four-part separation has been implemented properly, useful for evaluating any vendor architecture in a shortlisting conversation.

The vendor can answer where the decision is made. If the answer is "the agent decides" or "the model decides," the architecture has not separated. The right answer names the optimisation solver, the rules engine, or the human reviewer - and is specific about which decisions go to which component.

The model cannot produce numerical outputs as decisions. Arithmetic, counts, allocations, scores, monetary values - all come from deterministic tools, not from the model. If the production output includes a number that the model generated, the architecture has not separated properly.
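This signal can also be checked mechanically. The sketch below compares every number in the model's explanation against the numbers present in the deterministic output, and it flags any number the model introduced. The matching is deliberately naive: it handles plain digits only and compares them as formatted strings. Treat it as a starting point rather than a complete guard. The field names in the example decision are hypothetical.

```python
import re

# Simplification: matches plain integers and decimals only, so "1,200" is
# read as "1" and "200", and 5.0 in the data will not match "5" in the text.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def numbers_in_text(text):
    return set(NUMBER.findall(text))


def numbers_in_output(value):
    """Collect every number in a nested structured output, as strings."""
    if isinstance(value, bool):          # bool is a subclass of int; skip it
        return set()
    if isinstance(value, (int, float)):
        return {str(value)}
    if isinstance(value, str):
        return numbers_in_text(value)
    if isinstance(value, dict):
        value = list(value.values())
    if isinstance(value, (list, tuple)):
        found = set()
        for item in value:
            found |= numbers_in_output(item)
        return found
    return set()


def unsupported_numbers(explanation, decision):
    """Numbers in the model's text that no deterministic component produced."""
    return numbers_in_text(explanation) - numbers_in_output(decision)


decision = {"staff": "ana", "hours": 40, "cost": 5, "rule_id": "WTR-01"}
print(unsupported_numbers("ana works 40 hours under WTR-01, cost 5.", decision))
print(unsupported_numbers("ana works 42 hours.", decision))
```

A non-empty result means the explanation should be rejected or replaced with a deterministic template, never published as-is.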

The rules engine has named owners and a pull-request workflow. Rules-as-code in Git, with HR, DPO, Legal and Operations each named as the owner of their relevant rule category, and pull-request review on every rule change. This is the test of whether the rules engine is real or whether it is a configuration file the vendor maintains.
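The ownership half of this test can run as an automated check on every pull request. Here is a minimal sketch. It assumes rules are stored as records with hypothetical `id` and `owner` fields, and that the four owner categories named above are the only valid ones.

```python
ALLOWED_OWNERS = {"HR", "DPO", "Legal", "Operations"}


def check_rule_ownership(rules):
    """Return a list of problems; an empty list means every rule has a unique
    id and a named owner from the agreed categories."""
    problems = []
    seen = set()
    for rule in rules:
        rule_id = rule.get("id")
        if not rule_id:
            problems.append("rule without an id")
            continue
        if rule_id in seen:
            problems.append(f"{rule_id}: duplicate id")
        seen.add(rule_id)
        owner = rule.get("owner")
        if owner not in ALLOWED_OWNERS:
            problems.append(f"{rule_id}: owner {owner!r} is not a named rule owner")
    return problems


rules = [
    {"id": "WTR-01", "owner": "HR"},
    {"id": "PRIV-02", "owner": "DPO"},
    {"id": "CON-03", "owner": "vendor"},
]
print(check_rule_ownership(rules))
```

On GitHub, a CODEOWNERS file combined with a branch protection rule that requires code-owner review can route changes in each rule directory to the owning team. The check above covers the other half of the test: no rule is left unowned.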

The language model is asked to explain, not to decide. The model produces one-line rationales referencing the rule consulted and the signal weighted, not standalone judgements. If the model's output is the decision, the model is the decision-maker - and you are back to a fused architecture.

Closing

The conversation about agentic AI in 2026 has been dominated by autonomy: how much of the work can the agent do without us? The more useful conversation is about division of labour. Which component does which part of the work, and does that part genuinely play to the component's strengths? The four-part separation is optimisation solver decides, agent orchestrates, rules engine constrains, language model explains. It is a pattern that holds across regulated, safety-critical and audit-sensitive workloads, and it is the pattern that survives contact with production scrutiny. The deployments that have not reached production (89% of organisations, on AaiNova's figures) are mostly the ones that have not done this separation. The ones that are getting through to production are mostly the ones that have.

If you are designing or evaluating an agentic AI architecture for a regulated or audit-sensitive workload and want to discuss how the four-part division of labour applies to your use case, the VE3 team would welcome a 30-minute conversation.
