TECHNICAL · 8 min read

AI in Collections

Patterns that pass Consumer Duty review and deliver measured outcomes.


Clint Sookermany

28 April 2026


Collections is one of the highest-stakes environments for AI in banking. The customers are vulnerable. The regulatory scrutiny is intense. The conduct risks are material. And the pressure to reduce cost-to-collect is relentless. AI can help with all of this, but only if the design choices prioritise demonstrable outcomes over operational efficiency.

The FCA's Consumer Duty, now embedded as the standard against which firms are assessed, requires firms to evidence good outcomes, not merely assert them. For collections, this means outcome KPIs must go beyond cure rates to include sustainability of arrangements, repeat delinquency, vulnerability identification, root-cause resolution, and auditable decision trails. Any AI system deployed in collections will be measured against these standards.

In the collections transformations I have been involved in, the most common failure pattern is not technical but architectural: banks invest in the AI model but under-invest in the outcome measurement framework that makes the model defensible. Two of the last four collections programmes I reviewed had sophisticated propensity-to-pay models but no mechanism to evidence whether the arrangements those models recommended actually succeeded at six and twelve months. Without that feedback loop, the AI cannot improve and the firm cannot demonstrate good outcomes to the regulator. Getting the measurement architecture right before deployment is not a nice-to-have; it is the difference between a system that strengthens your regulatory position and one that creates a new conduct risk.

Where AI Adds Value in Collections

The collections process has three phases where AI patterns are maturing: triage, engagement, and arrangement.

Triage is where the largest gains are available. Traditional collections models segment customers by days past due, balance, and a credit score. AI can incorporate a wider signal set: transaction patterns indicating temporary cash flow disruption versus structural financial difficulty, vulnerability indicators (changes in spending on essentials, missed payments on other products, contact centre interaction patterns), and propensity to self-cure without intervention. Better triage means allocating human agents to the cases that need them and automating outreach for the cases that do not.

The design choice that matters: the triage model must explain its segmentation in terms a reviewer can interrogate. "Customer A was classified as high-likelihood self-cure because of X, Y, Z" is auditable. "Customer A was classified as low priority by the model" is not. Under Consumer Duty, the firm must be able to demonstrate that its collections approach delivered good outcomes for each customer segment. This requires the AI to produce interpretable classifications, not just scores.

A UK lender I advised restructured its collections triage around three segments (self-cure, guided resolution, and intensive support) with the AI producing a plain-language rationale for every classification. The shift from opaque scores to interpretable segmentation cut unnecessary agent contact by a third and gave the compliance team an audit artefact they could actually present to the regulator.
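
To make the idea concrete, here is a minimal sketch of the output contract such a triage step might expose, with a simple rule-based stand-in for the model. The segment names follow the example above; the feature names and thresholds are illustrative assumptions, not a production policy.

```python
from dataclasses import dataclass

@dataclass
class TriageResult:
    segment: str    # "self-cure", "guided resolution", or "intensive support"
    rationale: str  # plain-language explanation a reviewer can interrogate

def triage(customer: dict) -> TriageResult:
    """Classify a customer in arrears and say why.

    A rule-based stand-in for the model; feature names and thresholds
    are illustrative assumptions.
    """
    # Vulnerability indicators override every other signal.
    flags = customer.get("vulnerability_flags", [])
    if flags:
        return TriageResult(
            "intensive support",
            f"vulnerability indicators present: {', '.join(flags)}",
        )

    # Signals of temporary cash flow disruption rather than structural difficulty.
    if customer["missed_payments_12m"] <= 1 and customer["essential_spend_stable"]:
        return TriageResult(
            "self-cure",
            "one missed payment on an otherwise clean 12-month record; "
            "spending on essentials unchanged, suggesting temporary disruption",
        )

    return TriageResult(
        "guided resolution",
        f"{customer['missed_payments_12m']} missed payments in 12 months; "
        "no vulnerability indicators detected",
    )
```

Whatever model sits behind it, the return contract is the point: a segment plus a rationale that can be logged and presented, not a bare score.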

Engagement is where AI is most visible to the customer, and where the conduct risks are highest. AI-driven outreach (automated messages, chatbot interactions, personalised communication timing) can improve contact rates and reduce the cost of early-stage collections. But the tone, timing, and content of these communications must pass Consumer Duty review.

The patterns that work:

  • Personalised timing based on when the customer is most likely to engage, drawn from their historical interaction data. This improves contact rates without increasing message volume.
  • Adaptive messaging that adjusts tone based on the customer's situation. A customer who has missed one payment on an otherwise clean account needs a different message from a customer in persistent arrears. The AI must distinguish these cases and adjust accordingly.
  • Vulnerability detection integrated into every interaction. If a customer's responses indicate potential vulnerability (language suggesting distress, references to health issues, erratic interaction patterns), the system must escalate to a specialist human agent. This is not optional under Consumer Duty; it is a core requirement.

The design choice that matters: the AI must have hard escalation triggers for vulnerability, not probabilistic ones. A model that assigns a 70% probability of vulnerability and does not escalate has made a conduct decision that will be difficult to defend in an FCA review.
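
A minimal sketch of what a hard trigger looks like in practice, assuming hypothetical trigger names; the essential property is that escalation is a deterministic rule the engagement model cannot trade off against its own confidence.

```python
# Hypothetical hard escalation triggers: any single one escalates the case,
# regardless of what the engagement model would otherwise recommend.
HARD_TRIGGERS = {
    "distress_language_detected",
    "health_issue_mentioned",
    "erratic_interaction_pattern",
    "third_party_support_requested",
}

def route_interaction(model_action: str, detected_signals: set[str]) -> str:
    """Return the next action for this interaction.

    The model's recommendation is used only when no hard trigger has fired.
    A probabilistic vulnerability score is deliberately not consulted here:
    the escalation decision must be reconstructable as a rule, not a guess.
    """
    fired = sorted(HARD_TRIGGERS & detected_signals)
    if fired:
        return f"escalate_to_specialist_agent (triggers: {', '.join(fired)})"
    return model_action
```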

Arrangement is where AI can improve both outcomes and efficiency, but where the risks of getting it wrong are acute. AI systems that propose repayment arrangements based on the customer's income, expenditure, and financial situation can produce more sustainable outcomes than a human agent working from a standard affordability template. But the arrangement must be genuinely affordable, not just mathematically feasible.

The patterns that work:

  • Income and expenditure estimation using transaction data, supplemented by customer-provided information. The AI can identify disposable income more accurately than a static template, but the estimate must be conservative. An arrangement that fails within three months because the AI overestimated disposable income is a worse outcome for the customer and the firm than a slightly longer arrangement that succeeds.
  • Sustainability scoring that predicts the probability of the customer completing the arrangement. Arrangements with a predicted completion probability below a defined threshold should be flagged for human review before being offered; a minimal sketch of these checks follows the list.
  • Breathing space detection. AI systems should identify customers who may be eligible for the Breathing Space scheme and route them appropriately, rather than continuing with standard collections activity.
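
A sketch of the first two checks described above; the 15% haircut and the 0.8 completion threshold are illustrative assumptions, not recommended values.

```python
def conservative_disposable_income(monthly_income: float,
                                   estimated_essentials: float,
                                   haircut: float = 0.15) -> float:
    """Estimate disposable income with a deliberate conservative haircut.

    The 15% haircut is an assumption for the sketch: overestimating
    disposable income produces arrangements that fail within months,
    a worse outcome than a slightly longer arrangement that succeeds.
    """
    return max(0.0, (monthly_income - estimated_essentials) * (1 - haircut))

def needs_human_review(predicted_completion_probability: float,
                       threshold: float = 0.8) -> bool:
    """Flag an arrangement whose predicted completion probability is too low.

    The 0.8 threshold is hypothetical; what matters is that it is defined
    and documented before go-live, not tuned after the fact.
    """
    return predicted_completion_probability < threshold
```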

The Outcome Measurement Framework

The FCA's 2026/27 work programme confirms that Consumer Duty outcomes monitoring is a priority area for multi-firm review. For collections, the outcome framework needs to capture four dimensions, and the discipline I find most effective is defining all four with explicit thresholds before the system goes live, then publishing those thresholds to the board. In every collections programme I have helped design, that single step is what separates systems that strengthen the firm's regulatory position from those that quietly accumulate conduct risk.

Cure sustainability. Not just whether the customer cleared the arrears, but whether they remained current for six and twelve months afterwards. An AI system that achieves high cure rates through aggressive arrangements that subsequently fail is delivering poor outcomes.

Vulnerability outcomes. What happened to customers identified as vulnerable? Were they routed to specialist support? Did their outcomes differ materially from the general population? If the AI identified vulnerability but the process did not adapt accordingly, the identification has no value.

Customer understanding. Did the customer understand the arrangement they entered into? This is measurable through post-arrangement surveys, complaint rates, and early default rates (which can indicate that the customer agreed to something they could not sustain).

Fairness across segments. Are outcomes consistent across customer demographics? An AI system that produces systematically worse outcomes for specific demographic groups has a conduct problem regardless of its aggregate performance. This requires regular bias testing against protected characteristics.
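
One way to make "thresholds defined before go-live" concrete is a declarative set of targets, one block per dimension, that the board signs off and the monitoring job reads. The metric names and values below are illustrative assumptions, not recommended targets.

```python
# Illustrative pre-deployment outcome thresholds, one block per dimension.
# All values are assumptions for the sketch. "min_" metrics must stay at or
# above their threshold; "max_" metrics must stay at or below it.
OUTCOME_THRESHOLDS = {
    "cure_sustainability": {
        "min_still_current_at_6_months": 0.75,
        "min_still_current_at_12_months": 0.65,
    },
    "vulnerability_outcomes": {
        "min_vulnerable_routed_to_specialist": 0.99,
        "max_outcome_gap_vs_general_population": 0.05,
    },
    "customer_understanding": {
        "max_complaint_rate": 0.02,
        "max_early_default_rate": 0.10,
    },
    "fairness_across_segments": {
        "max_outcome_disparity_across_groups": 0.05,
    },
}

def breaches(measured: dict) -> list[str]:
    """Return every metric that breaches its pre-agreed threshold."""
    found = []
    for dimension, metrics in OUTCOME_THRESHOLDS.items():
        for metric, threshold in metrics.items():
            value = measured.get(dimension, {}).get(metric)
            if value is None:
                found.append(f"{dimension}.{metric}: not measured")
            elif metric.startswith("max_") and value > threshold:
                found.append(f"{dimension}.{metric}: {value} exceeds {threshold}")
            elif metric.startswith("min_") and value < threshold:
                found.append(f"{dimension}.{metric}: {value} below {threshold}")
    return found
```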

Building for Regulatory Review

A collections AI system that survives FCA scrutiny shares three characteristics.

First, every decision the system makes is logged with enough context to reconstruct the reasoning. Not just "customer was sent message A at time T" but "customer was in segment X, with vulnerability score Y, and the system selected message A because of Z." This audit trail is the foundation of any regulatory defence.
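
A sketch of the level of context each decision record might carry, using Python's standard logging module; the field names are illustrative.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("collections.decisions")

def log_decision(customer_id: str, segment: str, vulnerability_score: float,
                 action: str, rationale: str, model_version: str) -> None:
    """Record enough context to reconstruct the reasoning later.

    "Customer was sent message A at time T" is not reconstructable;
    a record tying the action to segment, score, rationale and model
    version is.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "customer_id": customer_id,
        "segment": segment,
        "vulnerability_score": vulnerability_score,
        "action": action,
        "rationale": rationale,          # plain-language reason the action was chosen
        "model_version": model_version,  # ties the decision to a specific model release
    }
    logger.info(json.dumps(record))
```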

Second, the system has hard constraints that override the model's optimisation. If the model suggests an arrangement that exceeds 50% of disposable income, the constraint blocks it regardless of the model's confidence. If the model detects potential vulnerability, escalation happens regardless of the model's assessment of severity. These constraints exist because the firm's conduct obligations are not probabilistic; they are absolute.
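
A sketch of that constraint layer sitting between the model and the customer; the 50% cap comes from the paragraph above, and everything else is an assumption.

```python
def apply_hard_constraints(proposed_monthly_payment: float,
                           disposable_income: float,
                           vulnerability_detected: bool,
                           model_confidence: float) -> str:
    """Constraints that override the model's optimisation.

    model_confidence is accepted as an argument but deliberately never
    consulted: the conduct obligations these rules encode are absolute,
    not probabilistic.
    """
    if vulnerability_detected:
        # Escalation regardless of the model's assessment of severity.
        return "escalate_to_specialist"
    if proposed_monthly_payment > 0.5 * disposable_income:
        # Blocks any arrangement above the 50% affordability cap.
        return "block_and_refer_for_human_review"
    return "offer_arrangement"
```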

Third, the outcome framework is defined before deployment, not after. The firm knows, before the AI goes live, what metrics it will measure, what thresholds constitute good outcomes, and what triggers a review. Post-hoc metrics selection is a red flag in any regulatory review.

The banks deploying AI in collections most effectively are the ones that treat Consumer Duty not as a constraint on their AI but as the design specification for it. Having advised on Consumer Duty-compliant collections design for both challenger and incumbent banks, I see one pattern most clearly: the question that produces better systems is not "how do we deploy AI in collections without breaching Consumer Duty?" but "how do we use AI to deliver better Consumer Duty outcomes than we can achieve without it?" The firms that frame the problem that way end up with stronger systems and stronger regulatory positions.

*To discuss how the 90-Day AI Acceleration programme can help your bank design Consumer Duty-compliant AI collections systems, contact the Value Institute.*


Clint Sookermany

Founder, The AI Value Institute by Regenvita

25 years of enterprise transformation experience across financial services, healthcare, technology, and government. Helping senior leaders turn AI ambition into measurable business value.
