Agent Risk Management: Managing Delegated Autonomy Over Time

Something that has always amused me in technology is that we keep re-discovering old risk problems and then branding them as new. We did it with distributed systems, cloud, microservices and data platforms. We are now doing it again with AI agents. The vocabulary is new, the demos are impressive, and some of the failure modes are genuinely different, but the underlying management problem is familiar: how do you let a complex system do useful work inside a complex organisation without pretending you can predict every interaction upfront?

To be fair, agents are new in some important ways. A system that can plan, call tools, write to platforms, trigger workflows, coordinate with other agents and continue across multiple steps is not just a slightly better chatbot. It changes the control problem because the system moves faster, touches more things, and creates more chances for failure before anyone sees the final result. The old question, “is this answer acceptable?”, does not disappear, but it becomes much too small.

The better question is whether we can design an organisational risk management system that keeps delegated autonomy inside an acceptable operating envelope over time. That sounds less exciting than “AI safety”, but it is much closer to the actual problem firms are going to have to solve. Agents are probabilistic systems operating inside deterministic infrastructure, embedded in social and organisational processes, and increasingly connected to business-critical platforms.

The strange usefulness of risk management ideas

Financial services gives us a useful forcing function because supervisors have had to deal with model risk for decades. SR 11-7 does not mention hallucination rates. It does not define retrieval precision. It has no section on prompt injection, corpus freshness, tool-use drift, chain-of-thought leakage, synthetic evaluation sets or agentic decision boundaries. It was written in 2011, in the world of logistic regressions, scorecards, valuation models, stress testing models and other mostly quantitative things.1

And yet, for a long time, if an examiner walked into a financial institution and asked about a GenAI system with material impact, SR 11-7 was the kind of framework they would instinctively reach for. Not because it was written for AI, but because its core logic is not really about any single technology. It asks three boring and powerful questions: is the approach conceptually sound, do the outcomes support the intended use, and is the system monitored over time? That is useful precisely because it does not care whether the thing underneath is a regression, a machine learning model, or a RAG pipeline glued to a set of enterprise tools.

The US landscape has now moved. In April 2026, the OCC, Federal Reserve and FDIC issued revised interagency model risk management guidance. The OCC bulletin says the revised guidance rescinds prior model-risk guidance, including OCC Bulletin 2011-12, and it expressly says generative AI and agentic AI models are not within the scope of the new guidance because they are novel and rapidly evolving.2 At first glance that sounds like it weakens the SR 11-7 analogy. I think it does the opposite.

The revised guidance still talks about model development and use, validation and monitoring, governance and controls, conceptual soundness, outcomes analysis, ongoing monitoring, vendor products and proportionality.3 But for GenAI and agents it effectively says: this is not neatly solved by the old model-risk box. That moves the burden back to the institution. Define your standards. Define your testing. Define your thresholds. Show your results. Explain why a serious, informed third party would agree that your approach is reasonable.

That is the part people miss. The examiner does not need to understand every implementation detail of your retrieval pipeline, agent runtime, vector store, memory layer, prompt orchestration, tool router or policy engine. They do need to understand whether your governance is rigorous. In practice, the questions are often very simple: what standard did you set, how did you test against it, did the system meet the thresholds, and would an informed outsider agree that your approach is reasonable?

That is not only an American idea. It is how supervision tends to work when technology moves faster than rulebooks.

The same pattern in the UK

In the UK, the PRA’s SS1/23 is a useful parallel. It is not an agentic AI framework; it is a model risk management supervisory statement for banks. But the structure matters more than the label. The PRA wants model risk treated as a risk discipline in its own right and sets out five principles: model identification and risk classification, governance, model development and use, independent validation, and mitigants. It also notes that the expectations are relevant to risks associated with artificial intelligence in modelling techniques, such as machine learning, to the extent those risks apply to models more generally.4

Different language, same basic shape. Know what you have. Classify the risk. Govern it. Validate it. Monitor it. Mitigate it. Make sure someone independent can challenge it. For agentic systems, that pattern is useful even when the model-risk label is not quite enough, because the institution still needs to show that the system has been understood, bounded, tested, monitored and challenged.

The FCA is taking a similarly pragmatic line. Its current AI approach says it wants to support safe and responsible AI adoption in UK financial markets, but it does not plan to introduce extra AI regulations. Instead, it says it will rely on existing frameworks, with a principles-based and outcomes-focused approach, and points to frameworks such as Consumer Duty and senior manager accountability.5 This matters because if an agent produces poor customer outcomes, the defence is unlikely to be, “the model said so.”

The real questions will be much more mundane. Who was accountable for the use case? What outcome was the firm trying to achieve? What consumer harm was reasonably foreseeable? What controls were in place before deployment? What monitoring showed the system was still behaving acceptably? What happened when it was not? That is not AI theory. That is governance.

The same pattern in Australia

Australia has the same shape again, but it arrives through a different regulatory vocabulary. APRA’s CPS 220 requires a risk management framework appropriate to the institution’s size, business mix and complexity. It defines the framework as the totality of systems, structures, policies, processes and people that identify, measure, evaluate, monitor, report and control or mitigate material risks.6 That is almost exactly the kind of organisational frame agent governance needs, because the risk is not sitting inside the model alone.

CPS 230 then pushes the operational resilience angle. APRA describes the standard as strengthening operational risk management, responding to business disruptions and managing risks from service providers for all APRA-regulated entities, with an effective date of 1 July 2025 and transitional treatment for some pre-existing service provider arrangements.7 That is highly relevant to agents because many agent failures will not look like classic model failures. They will look like operational failures.

An agent sends the wrong instruction, escalates the wrong case, touches the wrong system, triggers a workflow too early, fails to stop, or depends on a third-party platform whose behaviour changes underneath it. Another agent then consumes the first agent’s output as if it were ground truth, and the failure cascades through a workflow that no one person can see end-to-end. That is operational risk, technology risk, vendor risk, customer risk and conduct risk tangled together.

ASIC’s Report 798 makes the point more directly for AI. ASIC reviewed AI use among AFS and credit licensees, including how they were identifying and mitigating consumer risks and how their governance arrangements were keeping up.8 ASIC also warned publicly that a governance gap could emerge as AI adoption accelerates, and emphasised that existing obligations already put the onus on licensees to maintain appropriate governance, compliance measures and due diligence over third-party AI supplier risk.9

Again, different language, same burden. Do not wait for the AI-specific rulebook. Use the obligations you already have. Show that the governance works.

So what actually changed?

The mistake is to treat agents as a better version of the systems we have already governed. Most Responsible AI programs were built for bounded systems: a model summarises a document, drafts a memo, classifies a complaint, answers a question, or assists a human who remains inside a known workflow. In those cases, the governance question is often whether the answer is acceptable, so we built controls around review processes, prompt testing, output sampling, bias checks, human sign-off and production approvals.

Those controls are not wrong. For many bounded use cases they are quite sensible. The problem is that agents change the shape of the system. An agent does not merely produce an answer; it may decide what steps to take, call tools, read and write state, interact with other systems, pursue an objective over multiple steps, and operate inside a changing environment. It may also depend on a foundation model or external platform that continues to evolve underneath it.

Singapore’s 2026 Model AI Governance Framework for Agentic AI is useful here because it makes the distinction explicit. It describes agents in terms of autonomy and action-space: how much the agent can decide for itself, and what it is allowed to do in the world.10 That is the shift. The central question is no longer only whether the answer was good; it becomes who granted authority to act, what the agent can touch, what the boundaries are, how we know it is still working, and how we stop or correct it when it goes off course.

What is the thing we are managing?

Before going further, it is worth clearing up the most overloaded term in this conversation: the system. In agent risk management, the system is not the model. The model is part of the system, but it is not the system. The system is the model plus the prompts, tools, APIs, data sources, retrieval corpus, memory, orchestration layer, permissions, workflow, monitoring, human operators, business rules, vendors, audit logs, escalation paths, organisational incentives and the actual environment in which all this runs.

This is not a social system, and it is not a technical system; it is a socio-technical system. NIST makes this point directly in the AI Risk Management Framework: AI systems are socio-technical in nature, and risks can emerge from the interplay of technical aspects with how the system is used, who operates it, how it interacts with other AI systems, and the social context in which it is deployed.11 This matters because many controls are aimed at the wrong object. A prompt test tells you something about the prompt. A benchmark tells you something about the model under a test condition. A red-team exercise tells you something about a set of attack scenarios. All of that is useful, but none of it is sufficient.

The risk emerges when the agent is placed into an organisation and starts doing work. This is the same lesson we learnt in distributed systems: the component may be fine, but the interaction may not be. The failure is often emergent and complex: it does not sit in any isolated part, and it is often difficult to reason about.

Why deterministic safeguards become awkward

Agents are probabilistic systems acting inside deterministic infrastructure. That sentence carries a lot of pain. The enterprise systems around the agent generally expect deterministic behaviour: APIs want valid inputs, databases want consistency, entitlement systems want clear access rules, workflow engines want state transitions, payment systems want precise instructions, and compliance systems want explainable evidence. The agent, on the other hand, is operating through probabilistic reasoning and generated outputs.

So we try to wrap the agent in deterministic controls. Sometimes that works. Access control works. Rate limits work. Human approval gates work. Transaction limits work. Sandboxes work. Allow-lists work. Segregated environments work. But if we try to make every decision deterministic, we often destroy the usefulness of the agent. It becomes a very expensive rules engine with a language interface.

That is the core design tension. We cannot rely on probabilistic behaviour alone, but we also cannot pretend the system is deterministic simply because we have added a few guardrails. The answer is not “more controls” in the abstract. The answer is a layered control system that assumes uncertainty and manages it continuously.
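
To make the layering concrete, here is a minimal sketch in Python, with hypothetical tool names and thresholds, of deterministic checks sitting between the probabilistic planner and the systems it calls. It illustrates the pattern, not a reference implementation.

```python
# Deterministic guardrails wrapped around a probabilistic planner.
# Tool names and thresholds are hypothetical and set by the organisation.

ALLOWED_TOOLS = {"kb.search", "crm.read_record", "email.draft"}   # allow-list
MAX_CALLS_PER_HOUR = 100                                          # rate limit

def gate(tool: str, calls_this_hour: int) -> bool:
    """Deterministic pre-execution check for a tool call proposed by the agent."""
    return tool in ALLOWED_TOOLS and calls_this_hour < MAX_CALLS_PER_HOUR

# The agent proposes; the gate disposes. Anything outside the wrapper is refused.
print(gate("crm.read_record", calls_this_hour=3))   # True: allowed to execute
print(gate("payments.send", calls_this_hour=3))     # False: refused outright
```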

For bounded AI, a lot of the risk conversation sits before deployment. For agents, the risk conversation moves into runtime. Singapore’s agentic AI framework makes this point in practical terms: because agents interact dynamically with their environment and not all risks can be anticipated upfront, organisations should gradually roll out agents with continuous monitoring after deployment.12, 13

That is the right instinct. For material agents, pre-deployment approval is necessary, but it is not the finish line. It is the start of controlled exposure. This is very similar to what good engineering teams learnt with cloud and continuous delivery: you do not make a complex production system reliable by approving it once. You instrument it, monitor it, define rollback, learn from incidents and keep improving the system.

A practical approach

The risk management approach I would put forward has seven parts. It is not a magic framework, and it is deliberately boring. That is partly the point.

1. Define the operating envelope

Start with the domain, not the model. What is the agent actually being used for? What is the business objective? What decisions or actions does it influence? Who can be harmed? What regulation, policy or customer promise applies? What level of error is tolerable? What outcomes are unacceptable? What should cause the system to pause, escalate or stop? In other words, define what “good enough” means before arguing about metrics.

The operating envelope should cover purpose and intended use, users and affected stakeholders, material decisions and actions, relevant obligations and policies, risk appetite, performance expectations, unacceptable outcomes, escalation triggers and stop conditions. The key point is that the operating envelope is not a technical artefact only. It is a management decision. Technology can help enforce it, but the organisation has to define it.
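
As one way of making that management decision explicit, the envelope can be captured as a small, versioned artefact that both humans and controls can refer to. The sketch below is hypothetical in every detail; the point is that tolerances, unacceptable outcomes and stop conditions are written down before deployment, not inferred afterwards.

```python
from dataclasses import dataclass

@dataclass
class OperatingEnvelope:
    purpose: str
    affected_stakeholders: list[str]
    risk_appetite: str                 # a management decision, not a metric
    max_error_rate: float              # tolerable error level for this use case
    unacceptable_outcomes: list[str]   # outcomes that must never occur
    escalation_triggers: list[str]     # conditions that route work to a human
    stop_conditions: list[str]         # conditions that halt the agent entirely

# Illustrative envelope for a hypothetical complaints-triage agent.
complaints_triage = OperatingEnvelope(
    purpose="Triage and route customer complaints; never resolve them autonomously",
    affected_stakeholders=["customers", "complaints team", "compliance"],
    risk_appetite="low",
    max_error_rate=0.02,
    unacceptable_outcomes=["closing a complaint without human review",
                           "contacting a customer directly"],
    escalation_triggers=["vulnerability indicators present",
                         "regulatory deadline at risk"],
    stop_conditions=["error rate above tolerance for 24 hours",
                     "any policy breach detected"],
)
```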

2. Bound the authority

For agents, authority is a first-class risk dimension. This is where many AI governance processes are still too focused on the answer and not focused enough on the action. Map what the agent can actually do: can it read customer data, write to a system of record, send messages externally, approve something, trigger a payment, change a configuration, create or close a case, call another agent, delegate, spend money, or make an irreversible change?

This is the agent’s action-space. Once it is clear, decide where authority needs to be constrained: read-only versus write access, reversible versus irreversible actions, low-value versus high-value transactions, internal-only versus customer-facing communication, advisory versus executable decisions, one-step tasks versus multi-step plans, and single-agent versus multi-agent workflows. The control is not “human in the loop” as a slogan. The control is targeted human approval at points where authority, uncertainty and consequence justify it.

Basically: assume that the agent will make mistakes, much as humans do, and limit the blast radius to consequences you can withstand or are willing to accept.
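
A sketch of what bounding authority can look like in code, with hypothetical tools and limits: the action-space is declared explicitly, and every requested call is checked against the authority the organisation actually granted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Authority:
    can_write: bool
    reversible: bool
    max_value: float            # largest monetary value the agent may move
    needs_human_approval: bool

# Hypothetical action-space: what this agent may touch, and on what terms.
ACTION_SPACE = {
    "kb.search":         Authority(False, True,  0.0,   False),
    "case.update_notes": Authority(True,  True,  0.0,   False),
    "refund.issue":      Authority(True,  False, 100.0, True),   # small and gated
}

def within_authority(tool: str, value: float = 0.0) -> tuple[bool, bool]:
    """Return (permitted, requires_human_approval) for a requested tool call."""
    auth = ACTION_SPACE.get(tool)
    if auth is None or value > auth.max_value:
        return (False, False)          # outside the granted action-space
    return (True, auth.needs_human_approval)

print(within_authority("case.update_notes"))          # (True, False)
print(within_authority("refund.issue", value=50.0))   # (True, True): human gate
print(within_authority("refund.issue", value=900.0))  # (False, False): over limit
```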

3. Test the system, not just the prompt (and, better still, experiment and scale)

Prompt testing is useful, but it is not enough. For agents, you need to test the end-to-end behaviour of the system: task completion, tool-use accuracy, permission boundaries, escalation behaviour, refusal behaviour, recovery from bad inputs, behaviour under stale or missing data, behaviour when a tool fails, behaviour when policies conflict, behaviour across multiple steps, behaviour when another agent consumes its output, and customer or business outcomes rather than only text quality.

This is where the old model-risk question of conceptual soundness needs to be translated. For a regression model, conceptual soundness might mean that the theory, assumptions, data and model construction make sense. For an agent, conceptual soundness also means that the objective, tools, permissions, memory, orchestration, policy constraints and human checkpoints make sense for the work being delegated. The unit of validation has expanded. There is also a rub here: quite often you cannot apply traditional testing approaches to agents, simply because it is too difficult to create a test environment or scenario that is genuinely representative of (isomorphic to) production.
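
To illustrate the difference in the unit of validation, here is a sketch of a system-level check that asserts on behaviour and boundaries rather than text quality. The trace structure, field names and scenario are all hypothetical; in practice the trace would come from replaying a scripted scenario against the full agent, with its tools and permissions, in a sandboxed environment.

```python
from dataclasses import dataclass, field

@dataclass
class Trace:
    """Hypothetical record of one end-to-end run in a sandboxed environment."""
    task_completed: bool
    escalated_to: str | None
    tools_called: list[str] = field(default_factory=list)
    permission_breaches: list[str] = field(default_factory=list)

def check_vulnerable_customer_scenario(trace: Trace) -> list[str]:
    """System-level assertions: did the agent behave, not just answer well?"""
    failures = []
    if trace.task_completed:
        failures.append("agent resolved the case instead of escalating")
    if trace.escalated_to != "complaints_team":
        failures.append("agent did not hand off to the complaints team")
    if trace.permission_breaches:
        failures.append("agent left its permitted action-space")
    if "refund.issue" in trace.tools_called:
        failures.append("agent called a tool it was not authorised to use here")
    return failures

# Illustrative trace from one sandboxed run; an empty list means the check passed.
print(check_vulnerable_customer_scenario(
    Trace(task_completed=False, escalated_to="complaints_team",
          tools_called=["kb.search", "case.update_notes"])))
```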

4. Instrument the runtime

If you cannot see what the agent is doing, you cannot govern it. That sounds obvious, but it is surprisingly easy to miss when teams are racing to ship agents inside SaaS platforms, workflow tools and internal copilots. At minimum, you need to know what the agent was asked to do, what context it used, what tools it called, what data it accessed, what actions it took, what policy checks passed or failed, where humans approved or overrode it, what the actual outcome was, and whether the outcome remained inside tolerance.

This is not just logging for debugging. It is evidence for governance. It is what makes the difference between “we think the controls work” and “we can show the controls work.” It also creates the raw material for incident response, independent review, audit, model improvement and vendor challenge.
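
As a sketch of what that evidence can look like at the level of a single step, each agent action can emit one structured, append-only record covering the points above. The field names are illustrative, not a standard schema.

```python
import json
import time
import uuid

def audit_event(agent_id: str, task: str, step: dict) -> str:
    """Emit one structured record per agent step; field names are illustrative."""
    record = {
        "event_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "agent_id": agent_id,
        "task": task,                                 # what the agent was asked to do
        "context_sources": step.get("context", []),   # what context it used
        "tool_called": step.get("tool"),              # what it actually did
        "data_accessed": step.get("data", []),
        "policy_checks": step.get("policy", {}),      # which checks passed or failed
        "human_decision": step.get("human"),          # approval, override, or None
        "outcome": step.get("outcome"),
        "within_tolerance": step.get("within_tolerance"),
    }
    return json.dumps(record)

print(audit_event("complaints-triage-v3", "triage case 8812",
                  {"tool": "case.update_notes",
                   "policy": {"pii_check": "pass"},
                   "outcome": "routed",
                   "within_tolerance": True}))
```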

5. Monitor outcomes continuously

Ongoing monitoring is not an afterthought; it is the heart of agent governance. SR 11-7 said models should be monitored over time because conditions and uses change.1 The 2026 revised US guidance keeps the same core idea: ongoing monitoring should evaluate whether the model is performing as expected as products, exposures, activities, clients, data relevance or market conditions change.3 For agents, this matters even more because the environment is changing and the system may be acting across multiple platforms.

Monitor the things that matter: action success and failure rates, policy breaches, near misses, customer complaints, human overrides, escalation patterns, unusual tool-call behaviour, permission use, drift in retrieved content, drift in model behaviour, vendor or platform changes, concentration of decisions or failures, and aggregate risk across agents. The important thing is not having a dashboard. The important thing is whether the dashboard is connected to decisions. If thresholds are breached, what happens? Who is notified? What is paused? What gets reviewed? What authority is revoked? What gets rolled back? Without that, monitoring is theatre.
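
A minimal sketch of the difference between a dashboard and a control: each monitored metric has a threshold and a pre-agreed response. Metric names, limits and responses here are hypothetical.

```python
# Each threshold is wired to a response, not just a chart. All values illustrative.
THRESHOLDS = {
    "policy_breach_rate":  (0.001, "pause agent and notify risk"),
    "human_override_rate": (0.15,  "trigger independent review"),
    "action_failure_rate": (0.05,  "revoke write access"),
}

def evaluate(window_metrics: dict[str, float]) -> list[str]:
    """Return the responses triggered by the current monitoring window."""
    return [response
            for name, (limit, response) in THRESHOLDS.items()
            if window_metrics.get(name, 0.0) > limit]

# Example window: the override rate has drifted above tolerance.
print(evaluate({"policy_breach_rate": 0.0,
                "human_override_rate": 0.22,
                "action_failure_rate": 0.01}))   # ['trigger independent review']
```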

6. Trust, but verify

Someone independent needs to challenge the system. This is another old idea that becomes more important with agents, because the team that builds the agent will usually believe the agent is reasonable. They understand the constraints, the trade-offs, the edge cases and the roadmap. All of that may be true. It is also why independent review matters.

Independent review should challenge whether the operating envelope is clear enough, whether the thresholds make sense, whether the test set is representative, whether the action-space is too broad, whether monitoring covers actual harms, whether incidents are classified correctly, whether remediation is timely, whether vendor changes are understood, and whether aggregate risk is visible. The PRA makes independent validation a standalone principle in SS1/23.4 ASIC asks whether governance and risk management arrangements are keeping pace with AI use and whether firms are managing consumer impact and third-party supplier risk.9 Same basic idea: trust the delivery team to build useful things, then verify independently that the organisation can live with what has been built.

7. Correct quickly

A risk management system that identifies issues but cannot change anything is not a risk management system. It is a reporting system. When an agent moves outside the operating envelope, the organisation needs a playbook. This may include immediate actions (such as contacting customers) and then a longer-term response. The response might be to reduce the action-space, remove write access, add a human approval gate, change the workflow, tighten retrieval sources, change the prompt or policy layer, change the model, change the vendor, retrain users, add monitoring, roll back the release or decommission the agent.
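
One way to keep that playbook actionable is to agree it before the incident: each class of finding maps to an owner, an immediate response and a longer-term response. The entries below are illustrative only.

```python
# A pre-agreed remediation playbook; every entry here is illustrative.
PLAYBOOK = {
    "irreversible_action_without_approval": {
        "owner": "head of operations",
        "immediate": ["pause the agent", "contact affected customers"],
        "longer_term": ["add a human approval gate", "narrow the action-space"],
    },
    "drift_in_retrieved_content": {
        "owner": "product owner",
        "immediate": ["tighten retrieval sources"],
        "longer_term": ["refresh the corpus", "re-run system-level tests"],
    },
    "vendor_model_behaviour_change": {
        "owner": "vendor manager",
        "immediate": ["roll back to the previous release"],
        "longer_term": ["challenge the vendor", "re-validate against thresholds"],
    },
}

def respond(finding: str) -> dict:
    """Look up the agreed response; unknown findings escalate by default."""
    return PLAYBOOK.get(finding, {"owner": "risk committee",
                                  "immediate": ["pause the agent", "escalate"],
                                  "longer_term": ["agree a new playbook entry"]})

print(respond("drift_in_retrieved_content")["immediate"])
```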

This is where the convergence idea comes in. We are not trying to prove that the system will never fail. We are trying to make sure that when the system drifts, fails, surprises us or starts producing poor outcomes, the organisation has enough sensing, authority and discipline to pull it back. This is the virtuous cycle that drives organisational improvement: continuous correction, carried out with transparency and accountability.

So what is agent risk management?

Here is my working definition: agent risk management is the organisational supervision of delegated autonomy in a socio-technical system. It is the set of policies, controls, measurements, accountabilities and feedback loops that keep agents, humans, tools, data and platforms operating within an acceptable envelope of performance, policy and risk over time.

That definition is intentionally not model-centric. The model matters, but the model is not the unit of control. The unit of control is the socio-technical system doing work.

The deepest shift is not from traditional AI to generative AI, or from generative AI to agents. The deepest shift is from certification to convergence. A certification mindset asks whether we can approve this system as safe. A convergence mindset asks whether we can keep this system good enough for our purposes as it changes, interacts, fails and gets corrected.

That is a much better question, and it is also a much harder organisational capability. It requires product teams, risk teams, compliance, technology, operations, data teams, vendors and business owners to work together. It requires clear authority, instrumentation, thresholds, independent challenge, incident discipline and the humility to accept that the first design will be wrong in some important ways.

But this is the work. Agents are not just a new interface. They are a new delegation mechanism. Once a firm delegates work to a probabilistic system, the governance question becomes very practical: what did you allow it to do, how did you know it was doing the right thing, and what did you do when it was not?

That is the conversation regulators, boards and risk functions are going to keep coming back to. Not because every regulator has an agentic AI rulebook, but because good risk management has always required the same basic discipline: understand the thing, define the acceptable range, measure what happens, challenge the evidence, and correct the system when reality disagrees with the plan.

Attribution

This document was written by me in conjunction with GPT 5.4; the concepts and ideas are my own.

References


  1. Board of Governors of the Federal Reserve System, SR 11-7: Guidance on Model Risk Management and attachment, 2011. https://www.federalreserve.gov/supervisionreg/srletters/sr1107.htm and https://www.federalreserve.gov/supervisionreg/srletters/sr1107a1.pdf

  2. Office of the Comptroller of the Currency, OCC Bulletin 2026-13: Model Risk Management: Revised Guidance, 17 April 2026. The bulletin states that the agencies are issuing updated interagency guidance, rescinding certain prior OCC model-risk issuances, including OCC Bulletin 2011-12, and that generative AI and agentic AI models are not within the scope of the guidance. https://www.occ.treas.gov/news-issuances/bulletins/2026/bulletin-2026-13.html

  3. Office of the Comptroller of the Currency, Federal Reserve Board and FDIC, Supervisory Guidance on Model Risk Management, 17 April 2026. The guidance discusses model development and use, validation and monitoring, governance and controls, conceptual soundness, outcomes analysis, ongoing monitoring, and vendor and third-party products. https://www.occ.treas.gov/news-issuances/bulletins/2026/bulletin-2026-13a.pdf

  4. Bank of England / Prudential Regulation Authority, SS1/23 – Model risk management principles for banks, published 17 May 2023 and effective from 17 May 2024. https://www.bankofengland.co.uk/prudential-regulation/publication/2023/may/model-risk-management-principles-for-banks-ss

  5. Financial Conduct Authority, AI and the FCA: our approach, first published 8 September 2025 and updated 13 February 2026. https://www.fca.org.uk/firms/innovation/ai-approach

  6. APRA, Prudential Standard CPS 220 Risk Management, July 2017. https://www.apra.gov.au/sites/default/files/Prudential-Standard-CPS-220-Risk-Management-%28July-2017%29.pdf

  7. APRA, Operational risk management, including CPS 230 materials and implementation timeline. https://www.apra.gov.au/operational-risk-management

  8. ASIC, REP 798: Beware the gap — Governance arrangements in the face of AI innovation, released 29 October 2024. https://www.asic.gov.au/regulatory-resources/find-a-document/reports/rep-798-beware-the-gap-governance-arrangements-in-the-face-of-ai-innovation/

  9. ASIC, 24-238MR: ASIC warns governance gap could emerge in first report on AI adoption by licensees, 29 October 2024. https://www.asic.gov.au/about-asic/news-centre/find-a-media-release/2024-releases/24-238mr-asic-warns-governance-gap-could-emerge-in-first-report-on-ai-adoption-by-licensees/

  10. IMDA Singapore, Model AI Governance Framework for Agentic AI, Version 1.0, published January 2026. https://www.imda.gov.sg/-/media/imda/files/about/emerging-tech-and-research/artificial-intelligence/mgf-for-agentic-ai.pdf

  11. NIST, Artificial Intelligence Risk Management Framework (AI RMF 1.0), January 2023. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf

  12. IMDA Singapore, Model AI Governance Framework for Agentic AI, Version 1.0, including guidance on gradual rollout and continuous monitoring after deployment. https://www.imda.gov.sg/-/media/imda/files/about/emerging-tech-and-research/artificial-intelligence/mgf-for-agentic-ai.pdf

  13. AI Trust OS - A Continuous Governance Framework for Autonomous AI Observability and Zero-Trust Compliance in Enterprise Environments. https://arxiv.org/pdf/2604.04749