The PRA's Supervisory Statement SS1/23, in effect since May 2024, is the first dedicated model risk management framework for UK banks. It was written with traditional models in mind: credit scorecards, market risk VaR, stress-testing engines. But its principles apply to every model a bank operates, and that now includes generative AI.
Most banks I advise have not yet reconciled their SS1/23 compliance programmes with their GenAI deployments. The gap is widening as GenAI moves from experimentation into production workflows. Closing it requires understanding what the PRA actually expects, and where GenAI breaks the assumptions that traditional MRM frameworks were built on.
The Four Principles, Applied to GenAI
SS1/23 in fact sets out five principles: model identification and risk classification; governance; model development, implementation and use; independent model validation; and model risk mitigants. For this discussion I group them into four areas: governance, inventory and documentation, development and validation, and ongoing risk assessment. Each creates specific challenges when applied to generative AI.
Principle 1: Governance. The PRA expects boards and senior management to provide effective oversight of model risk. For traditional models, this is straightforward: the model does one thing, its risk profile is stable, and oversight can be periodic. GenAI models are different. A large language model deployed for customer correspondence, internal summarisation, or regulatory reporting can produce a different output for the same input. Its behaviour can shift with prompt changes, context window variations, or provider-side model updates. Board oversight of a system whose behaviour is inherently non-deterministic requires different reporting, different metrics, and a different cadence.
In practice, this means the model risk committee needs GenAI-specific reporting: not just performance metrics but behavioural drift indicators, output quality sampling results, and escalation summaries. The senior manager accountable under SMCR needs enough visibility to explain what the model is doing, even when the model itself cannot fully explain it.
Principle 2: Inventory and documentation. The PRA expects firms to maintain a comprehensive inventory of all models, including those embedded in vendor products. For GenAI, this creates two problems. First, the definition: under SS1/23, a model is a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into output, and that output can be qualitative as well as quantitative. Large language models fit this definition, but so do the dozens of smaller models and fine-tuned adapters that firms are building on top of foundation models. The inventory must capture the full stack, not just the headline model.
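One way to make "the full stack" concrete is to give each GenAI deployment a structured inventory record. The sketch below is illustrative only: the field names, tiering labels, and example values are my assumptions, not a PRA-mandated schema.

```python
from dataclasses import dataclass, field

@dataclass
class GenAIInventoryEntry:
    """Illustrative inventory record for one GenAI deployment (not a PRA schema)."""
    entry_id: str
    use_case: str                   # e.g. "internal document summarisation"
    foundation_model: str           # provider model identifier (hypothetical below)
    hosting: str                    # "self-hosted" or "vendor API"
    fine_tuned_variants: list[str] = field(default_factory=list)
    prompt_template_versions: list[str] = field(default_factory=list)
    rag_sources: list[str] = field(default_factory=list)  # corpora feeding retrieval
    downstream_consumers: list[str] = field(default_factory=list)
    owner_smf: str = ""             # accountable senior manager under SMCR
    risk_tier: str = "unassessed"   # drives validation depth and review cadence

entry = GenAIInventoryEntry(
    entry_id="GENAI-0042",
    use_case="internal document summarisation",
    foundation_model="vendor-llm-v4",   # hypothetical vendor model name
    hosting="vendor API",
    prompt_template_versions=["summarise-v7"],
    rag_sources=["policy-library"],
    owner_smf="SMF4",
    risk_tier="tier-2",
)
```

Recording prompt template versions and RAG sources alongside the foundation model is what lets the inventory answer the supervisor's question: not just "which model?", but "which configuration of which model, seeing which context?"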
Second, documentation. Traditional model documentation describes the model's methodology, assumptions, limitations, and validation results. For a GenAI system, the methodology includes the foundation model (which the bank likely did not build), the fine-tuning data (which may include proprietary client data), the prompt engineering layer (which changes frequently), and the retrieval-augmented generation architecture (which determines what context the model sees). Documenting all of this to a standard that survives PRA scrutiny is a material effort.
Principle 3: Development and validation. The PRA expects independent validation of models, including assessment of conceptual soundness, outcome analysis, and benchmarking. For GenAI, "conceptual soundness" is a challenge: the internal workings of a large language model are not fully interpretable even to their developers. Banks cannot explain why the model generated a specific output in the way they can explain why a logistic regression scored a borrower at a particular level.
The practical response is to focus validation on outputs rather than internals. This means systematic testing across a defined set of scenarios, with quantitative measures of accuracy, consistency, and safety. It means adversarial testing: deliberately probing the model for failure modes, including hallucination, bias, and prompt injection. And it means ongoing monitoring, not just pre-deployment validation, because GenAI models can degrade or drift in ways that traditional models do not.
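A minimal sketch of output-focused validation, under stated assumptions: the model is any callable from prompt to text, each scenario declares required and prohibited content, and the consistency floor of 0.8 is an illustrative threshold, not a regulatory value.

```python
def validate_outputs(model, scenarios, runs=5, consistency_floor=0.8):
    """Output-focused validation sketch: score consistency and content checks
    across a scenario set. `model` is any callable prompt -> text."""
    results = []
    for s in scenarios:
        outputs = [model(s["prompt"]) for _ in range(runs)]
        # Consistency: share of runs matching the modal output.
        modal = max(set(outputs), key=outputs.count)
        consistency = outputs.count(modal) / runs
        # Accuracy/safety proxy: every run must contain required content
        # and avoid prohibited content defined by the scenario author.
        content_ok = all(
            all(req in o for req in s.get("must_contain", []))
            and not any(bad in o for bad in s.get("must_not_contain", []))
            for o in outputs
        )
        results.append({
            "scenario": s["name"],
            "consistency": consistency,
            "passed": content_ok and consistency >= consistency_floor,
        })
    return results

# Deterministic stand-in model, purely for illustration.
report = validate_outputs(
    lambda p: f"Summary: {p}",
    [{"name": "basic", "prompt": "Q3 figures", "must_contain": ["Summary"]}],
)
```

Adversarial scenarios slot into the same harness: a prompt-injection test is simply a scenario whose `must_not_contain` list includes the injected instruction's tell-tale output.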
Principle 4: Ongoing risk assessment. The PRA expects firms to assess model risk on an ongoing basis, including monitoring for model deterioration and reassessing risk when models are used in new contexts. For GenAI, this is where most firms fall short. A model that performs well in testing may behave differently in production when exposed to unexpected inputs, adversarial users, or edge cases that the training data did not cover. Ongoing risk assessment for GenAI requires automated monitoring of output quality, regular human review of a statistically meaningful sample of outputs, and clear escalation protocols when anomalies are detected.
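The monitoring, sampling, and escalation loop described above can be sketched as follows. The sampling rate, window size, and alert threshold are illustrative choices a firm would calibrate to the use case, not standards from SS1/23.

```python
import random
from collections import deque

class OutputMonitor:
    """Sketch of ongoing output monitoring: route a random sample of production
    outputs to human review, and escalate when the rolling quality rate drops."""
    def __init__(self, sample_rate=0.05, window=200, alert_below=0.95, seed=0):
        self.sample_rate = sample_rate
        self.scores = deque(maxlen=window)  # rolling window of review verdicts
        self.alert_below = alert_below
        self.review_queue = []
        self._rng = random.Random(seed)

    def observe(self, output_id, output_text):
        # Route a random sample of production outputs to human review.
        if self._rng.random() < self.sample_rate:
            self.review_queue.append((output_id, output_text))

    def record_review(self, acceptable: bool):
        # Reviewer verdicts feed the rolling quality rate.
        self.scores.append(1 if acceptable else 0)

    def should_escalate(self) -> bool:
        if len(self.scores) < 20:  # insufficient evidence yet
            return False
        return sum(self.scores) / len(self.scores) < self.alert_below

monitor = OutputMonitor()
for verdict in [True] * 18 + [False] * 6:
    monitor.record_review(verdict)
```

The design point is that escalation is mechanical: once the rolling rate breaches the threshold, the anomaly reaches the model risk committee regardless of whether anyone was watching the dashboard that week.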
The Resource Reality
The immediate impact of SS1/23 has been a substantial uplift in MRM resources. Banks are investing in additional skilled personnel, validation tooling, and governance infrastructure. Adding GenAI to the scope of this programme multiplies the resource requirement.
A traditional credit risk model is validated once, then monitored periodically. A GenAI system used across multiple business functions requires continuous monitoring, frequent re-validation as the underlying model or its usage patterns change, and a documentation burden that scales with the number of use cases. Three of the last five banking clients I have worked with have underestimated the MRM headcount required for their GenAI programmes by at least 40%.
The temptation is to treat GenAI as out of scope for SS1/23 on the grounds that it is "not really a model" in the traditional sense. This is a mistake. The PRA has explicitly called out AI as within the scope of SS1/23. Firms that exclude GenAI from their model inventories are creating a regulatory gap that will surface during supervisory review.
What "Comprehensive" Actually Means
The word "comprehensive" in SS1/23 is doing more work than most banks acknowledge. It means every model, in every business unit, including those embedded in third-party systems. For GenAI, it means:
Every foundation model the bank uses, whether self-hosted or accessed via API. Every fine-tuned variant. Every prompt template that materially affects the model's behaviour (because a prompt is, functionally, a configuration of the model). Every RAG pipeline that determines what context the model sees. And every downstream system that acts on the model's outputs.
This is not a theoretical standard. It is what a PRA supervisor will expect to see when they ask for your model inventory. Banks that define "model" narrowly to avoid the compliance burden are taking a position that is difficult to defend.
Practical Steps for 2026
First, expand the model inventory to include all GenAI systems, using a definition broad enough to capture foundation models, fine-tuned variants, prompt engineering layers, and RAG architectures. If in doubt, include it. The cost of over-inclusion is documentation effort. The cost of under-inclusion is regulatory risk.
Second, build GenAI-specific validation capabilities. Traditional model validation teams may not have the skills to validate large language models. Invest in adversarial testing, output quality measurement, and automated monitoring tooling. This is a capability build, not a one-off exercise.
Third, establish a GenAI-specific reporting cadence for the model risk committee. Quarterly reviews are too slow for systems that can change behaviour between meetings. Monthly or continuous reporting, depending on the criticality of the use case, is the emerging standard among the banks I see managing this well.
Fourth, address the third-party dimension. If the bank uses a foundation model from an external provider, the provider's model updates are a risk event that needs to be captured in the bank's MRM framework. Model version changes, capability updates, and safety patches all need to trigger a reassessment of the bank's validation results.
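Treating a provider-side update as a risk event can be as simple as comparing the version recorded at last validation against whatever the provider currently reports. The sketch below assumes a generic version-fetching callable; real provider APIs expose this differently, and the version strings are hypothetical.

```python
def check_vendor_version(recorded_version: str, fetch_current_version) -> dict:
    """Sketch: flag a vendor-side model version change as a risk event.
    `fetch_current_version` stands in for whatever the provider's API exposes."""
    current = fetch_current_version()
    return {
        "recorded": recorded_version,
        "current": current,
        # A mismatch should trigger the MRM reassessment workflow.
        "revalidation_required": current != recorded_version,
    }

# Hypothetical version strings for illustration.
event = check_vendor_version("vendor-llm-v4.1", lambda: "vendor-llm-v4.2")
```

Run on a schedule, a check like this turns silent provider updates into logged events with an owner, which is what the ongoing-risk-assessment principle requires.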
The banks that treat SS1/23 as a GenAI governance framework, not just a traditional model risk framework, will find themselves better prepared for whatever the PRA does next. Those that wait for explicit GenAI-specific guidance are deferring a compliance obligation that already exists.
*To discuss how the 90-Day AI Acceleration programme can help your bank align GenAI deployments with SS1/23 expectations, contact the Value Institute.*
