The FCA launched the Mills Review in January 2026 to examine how increasingly autonomous AI systems will reshape retail financial services by 2030. The review is asking a question that every financial services architect should already be answering: when an AI agent performs a function that looks like a regulated activity, who is accountable, and can you prove it?
This is not a theoretical concern. Agentic AI systems (those that can plan, execute multi-step tasks, and take actions with real-world consequences) are moving into production in financial services. They are routing customer queries, pre-populating advice suitability assessments, executing trades within parameters, and managing collections workflows. The design choices made now will determine whether these systems survive regulatory scrutiny or become the subject of it.
Why s.166 Is the Right Lens
A section 166 skilled person review is the FCA's sharpest investigative tool for examining how a firm's systems and controls actually operate. Unlike a thematic review, which examines an issue across the sector, a s.166 targets a specific firm and demands evidence: not what your policy says, but what your systems do in practice.
For agentic AI workflows, a s.166 review would examine three things. First, whether the firm can demonstrate that the AI system operates within its intended boundaries at all times, not just on average. Second, whether human oversight is genuine or performative. Third, whether the audit trail is complete enough to reconstruct any individual decision the agent made, including the reasoning path.
Designing for s.166 survivability is not about passing an exam. It is about building systems that are genuinely controllable, auditable, and explainable. The firms that treat regulatory compliance as a design constraint, rather than a post-hoc documentation exercise, build better systems.
Design Pattern 1: Bounded Autonomy with Hard Limits
The most common failure mode in agentic systems is scope creep: an agent designed to do one thing gradually being asked to do adjacent things, with each incremental expansion tested lightly or not at all. In the firms I've advised, this rarely happens as a single deliberate decision. It happens through a series of small accommodations. A product owner asks the agent to handle one more edge case, then another, until the system's effective scope has drifted well beyond what was originally tested and documented.
The pattern that works in regulated environments is bounded autonomy. The agent operates within a defined action space, with hard limits enforced at the infrastructure level, not just in the prompt or application logic. If an agent is authorised to send a collections letter, it cannot also offer a payment plan unless that action is separately authorised, tested, and documented.
In practice, this means:

- An enumerated action space: every action the agent is permitted to take is defined, tested, and documented before deployment, and anything outside it is refused by default.
- Hard limits enforced at the infrastructure level, so the constraint holds even if the prompt or application logic fails.
- Separate authorisation for each scope expansion: adding a new action is a controlled change with its own testing and documentation, not a prompt tweak.
The firms getting this right separate the agent's reasoning capability from its execution capability. The model can think broadly. The execution layer constrains what it can do. This separation is what makes the system auditable: you can inspect the agent's reasoning and independently verify that the execution stayed within bounds.
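A minimal sketch of this separation, assuming an illustrative `ExecutionLayer` interface and action names (neither is a real firm's schema): the model can propose any action, but the execution layer checks every proposal against an authorised action space and refuses anything outside it.

```python
from dataclasses import dataclass, field

# Illustrative authorisation table. "offer_payment_plan" is deliberately
# absent: the agent may reason about it, but the execution layer will
# refuse it until the action is separately authorised and tested.
AUTHORISED_ACTIONS = {"send_collections_letter"}

@dataclass
class ProposedAction:
    name: str
    params: dict = field(default_factory=dict)

class ExecutionLayer:
    """Enforces hard limits independently of the model's reasoning."""

    def execute(self, action: ProposedAction) -> str:
        if action.name not in AUTHORISED_ACTIONS:
            # Raising, rather than silently dropping, makes scope-creep
            # pressure visible to oversight and the audit trail.
            raise PermissionError(
                f"Action '{action.name}' is outside the authorised action space"
            )
        return self._dispatch(action)

    def _dispatch(self, action: ProposedAction) -> str:
        # A real implementation would call the downstream system here.
        return f"executed {action.name}"
```

Expanding the agent's scope then means changing `AUTHORISED_ACTIONS` through normal change control, with its own testing and documentation, rather than editing a prompt.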
Design Pattern 2: Genuine Human Oversight
The FCA has flagged human-in-the-loop protocols as a "live issue" and signalled that guidance is coming in 2026. The reason is straightforward: many firms claim human oversight of AI systems, but the oversight is nominal. A human "reviews" 200 AI-generated decisions per hour, which is not review. It is rubber-stamping.
Genuine human oversight in agentic workflows requires design choices that make oversight meaningful:

- Review volumes calibrated so a human can actually engage with each decision, not approve hundreds per hour.
- Decision presentations that show the agent's reasoning path and the evidence behind it, not just the proposed outcome.
- Real authority to intervene: the reviewer can reject, amend, or escalate a decision, and those interventions are recorded.
The test is simple: if a regulator asked your human reviewer to explain the last 10 decisions they approved, could they do so with specificity? If the answer is no, the oversight mechanism needs redesigning.
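One way to make that test enforceable by design, sketched below under assumed parameters (the `ReviewGate` name and the 30-per-hour cap are illustrative choices, not regulatory figures): an approval gate that refuses blank sign-offs and caps per-reviewer throughput, so review cannot silently degrade into rubber-stamping.

```python
import time

class ReviewGate:
    """Approval gate that makes rubber-stamping structurally difficult."""

    def __init__(self, max_decisions_per_hour: int = 30):
        self.max_per_hour = max_decisions_per_hour
        # Each approval keeps its rationale, so a reviewer can later
        # explain any decision they signed off with specificity.
        self.approvals = []  # (timestamp, reviewer, decision_id, rationale)

    def approve(self, reviewer, decision_id, rationale, now=None):
        now = time.time() if now is None else now
        if not rationale.strip():
            raise ValueError("Approval requires a specific rationale, not a blank sign-off")
        recent = [t for (t, r, *_) in self.approvals
                  if r == reviewer and now - t < 3600]
        if len(recent) >= self.max_per_hour:
            raise RuntimeError("Throughput cap reached: queue the decision, do not rubber-stamp it")
        self.approvals.append((now, reviewer, decision_id, rationale))
```

The stored rationales double as the evidence base for the regulator's "explain your last 10 approvals" question.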
Design Pattern 3: Reconstructible Decision Trails
The audit trail for an agentic system is fundamentally different from a traditional model audit. A scoring model takes an input and produces an output. An agent takes an input, reasons about it, takes multiple actions, observes the results, adjusts its approach, and produces an outcome through a chain of decisions. Auditing that chain requires a different kind of logging.
The minimum standard for a s.166-survivable audit trail:

- Full reasoning capture: every prompt, intermediate reasoning step, action, and observation in the chain, logged as it happens.
- State snapshots: the data the agent saw at each decision point, so the context of any decision can be reconstructed after the fact.
- Counterfactual replay: versioned models and controlled inference, so any decision can be re-run against the same inputs to verify how the outcome was produced.
This is expensive. Full reasoning capture increases storage costs and adds latency. State snapshots require point-in-time data architecture. Counterfactual replay requires model versioning and controlled inference. These are engineering costs that regulated firms must budget for. The alternative, deploying agents without adequate audit trails, is a cost that materialises later, in enforcement actions and remediation programmes.
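A minimal sketch of such a trail, assuming a hash-chained, append-only log (the chaining scheme and field names are illustrative design choices, not an FCA requirement): each record captures one step and commits to its predecessor, so the full chain of decisions can be replayed in order and after-the-fact edits are detectable.

```python
import hashlib
import json
import time

class DecisionTrail:
    """Append-only, tamper-evident log of an agent's decision chain."""

    def __init__(self):
        self.records = []
        self._prev_hash = "genesis"

    def log_step(self, step_type: str, payload: dict) -> str:
        record = {
            "ts": time.time(),
            "type": step_type,       # e.g. "observation", "reasoning", "action"
            "payload": payload,      # reasoning text, state snapshot, or action taken
            "prev_hash": self._prev_hash,
        }
        # Hash over the canonical JSON form, chained to the previous record.
        record["hash"] = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        self._prev_hash = record["hash"]
        self.records.append(record)
        return record["hash"]

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered record breaks it."""
        prev = "genesis"
        for r in self.records:
            body = {k: v for k, v in r.items() if k != "hash"}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if r["prev_hash"] != prev or r["hash"] != expected:
                return False
            prev = r["hash"]
        return True
```

Counterfactual replay then reduces to iterating `records` in order and re-running each step against the versioned model and the logged state snapshots.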
The SMCR Question
The FCA's Mills Review raised a question that cuts to the heart of agentic AI governance: how does the Senior Managers and Certification Regime operate where AI systems perform functions traditionally subject to direct human oversight?
Under SMCR, a senior manager is personally accountable for the activities within their area of responsibility. When an AI agent performs those activities, the accountability does not transfer to the machine; it remains with the senior manager, who must be able to demonstrate that they understood what the agent was doing, that adequate controls were in place, and that they were in a position to intervene. A senior manager I work with in retail banking described this as the "accountability gap": they are personally liable for outcomes produced by a system whose inner reasoning they cannot directly inspect. Closing that gap is a design problem, not a governance problem.
The design implication is clear: agentic systems in regulated workflows need dashboards, alerts, and reporting that are designed for the accountable senior manager, not just for the technology team. The senior manager needs to see, in near-real-time, what the agent is doing, whether it is operating within its boundaries, and what the key risk indicators look like.
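A minimal sketch of the indicator feed this implies (the metric names and the 2% alert threshold are assumptions, to be set against the firm's own risk appetite): boundary refusals and escalations rolled up into a small set of rates, with an alert flag the accountable senior manager can act on.

```python
from dataclasses import dataclass

@dataclass
class AgentMetrics:
    decisions: int
    boundary_refusals: int   # actions blocked by the execution layer
    escalations: int         # decisions routed to human review

def risk_indicators(m: AgentMetrics, refusal_alert_rate: float = 0.02) -> dict:
    """Roll raw counts into the rates a senior-manager dashboard would show."""
    refusal_rate = m.boundary_refusals / m.decisions if m.decisions else 0.0
    escalation_rate = m.escalations / m.decisions if m.decisions else 0.0
    return {
        "boundary_refusal_rate": refusal_rate,
        "escalation_rate": escalation_rate,
        # A rising refusal rate suggests the agent is under pressure to
        # act outside its authorised scope: exactly the scope-creep signal
        # the accountable senior manager needs to see early.
        "alert": refusal_rate > refusal_alert_rate,
    }
```

The point of the design choice is the audience: these rates answer the senior manager's questions ("is it inside its boundaries?"), not the technology team's ("is it up?").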
Building for 2026 and Beyond
The FCA has not yet published specific rules for agentic AI. The PRA and Bank of England have consistently signalled that AI will be overseen through existing frameworks rather than bespoke AI-specific regulation. This means the design patterns above are not speculative. They are applications of existing regulatory expectations (SMCR accountability, adequate systems and controls, treating customers fairly) to a new class of technology.
The firms building agentic systems today have a choice: design for the regulator you have, or redesign when the regulator tells you to. The first approach is cheaper and faster, and it produces better systems.
*To discuss how the 90-Day AI Acceleration programme can help your organisation design agentic AI systems for regulated environments, contact the Value Institute.*
