The governance trade-off that will define how public sector organisations adopt AI-powered simulation

AI-powered scenario gaming is gaining real traction in government. Defence agencies, transport departments, and policy teams are turning to simulation tools to stress-test decisions before they are made in the real world. But as these tools mature, a critical design question has emerged: should AI in a scenario game be tightly controlled, with pre-approved content and human-governed outputs, or should it generate scenarios dynamically, adapting to each session in real time?

The answer matters far more in a government context than it does in a commercial one. Getting it wrong does not just affect the quality of a workshop. It can undermine policy integrity, create reputational exposure, and produce outputs that no one can defend.

The Rise of AI-Enabled Simulation in Government

Scenario gaming and tabletop exercises have long been standard practice in government. What is changing is the use of AI to power them. The US Army ran its second AI Tabletop Exercise in April 2026, focused on agentic AI for cyber defence. The AI 2027 project has run more than 30 tabletop exercises with policymakers and congressional staffers to model AI risk trajectories. CISA has used AI tabletop exercises with government and industry partners to test cyber resilience at scale.

In the UK, the Department for Transport published its Transport Artificial Intelligence Action Plan in June 2025, setting out a clear ambition to embed AI across the transport ecosystem. Public sector bodies are now exploring AI-enabled simulation as a tool for foresight, policy development, and risk identification, not just crisis response.

Alongside this, Bloomberg Cities reported in March 2025 that government leaders are increasingly deploying AI as a practice partner to run teams through crisis simulations, using plain-English prompts rather than technical tools. The demand is real. So is the governance gap.

What the Distinction Actually Means

The terms are sometimes used loosely, so it is worth being precise.

Controlled AI operates within a pre-agreed framework. Scenarios, event injects, scoring dimensions, and consequence logic are designed and reviewed upfront by subject matter experts. AI supports the session by explaining trade-offs, summarising discussion, varying presentation, and analysing outputs. But the substantive content (the actual policy scenarios, the decisions on offer, and the outcomes) has been human-governed from the start.

Generative AI creates content dynamically during the session. When a team makes a decision, it generates consequences, produces follow-on events, and adapts the scenario in real time. The experience is richer and more reactive. But the content has not been reviewed, verified, or approved before it appears on screen in front of a senior leader.

The distinction is not about which model is more advanced. Generative mode often feels more sophisticated. The question is which model is appropriate for the context in which it is being used.

Why Government Faces a Different Risk Profile

In a commercial innovation workshop, a generatively produced scenario that slightly misrepresents a regulatory position is a minor inconvenience. In a government policy session, the same output could be taken as a signal of departmental intent, be quoted in a press briefing, or shape a senior leader's understanding of a live policy area.

Three specific risks make this governance question acute for public sector organisations.

Policy contamination. When AI generates scenario content live, it draws on its training data rather than on verified departmental information. It may reflect outdated policy, reproduce incorrect regulatory positions, or generate plausible-sounding but inaccurate consequence logic. In a room of senior officials, inaccurate content presented through a technology interface can carry unwarranted authority.

Auditability and accountability. Government decisions, even those made in a training exercise, carry scrutiny. If a session produces an output that feeds into a policy paper or a ministerial briefing, and that output was generated by an uncontrolled AI model, there is no audit trail. No one can point to the human who reviewed and approved the content before it was used.

Reputational and legal exposure. Generative AI can produce content about real organisations, real policies, and real people. In a live policy gaming session, this creates genuine exposure. An AI-generated event inject that speculates about a named operator's failure, or an AI-produced consequence that implies a particular regulatory outcome, may not be intended as anything more than a simulation prompt. But it can cause harm if it leaves the room.

What a Controlled Model Actually Enables

A well-designed controlled AI model is not a compromise. It is the appropriate architecture for a government context, and it enables significant capability that pure human facilitation cannot match.

Before a session, AI can help design and refine scenario content at scale. What might take a team of policy analysts weeks to develop, AI can produce as a set of variants in hours, with subject matter experts then reviewing and approving the final content. The speed advantage is real, even if the generative step happens before the session rather than within it.

During a session, controlled AI can explain complex concepts in plain English, summarise discussion themes in real time, suggest follow-on questions for the facilitator, and respond to participant queries within the boundaries of the pre-approved scenario. This is genuinely valuable. It reduces the facilitation burden and makes sessions accessible to non-specialists.

After a session, AI can analyse decisions across multiple runs, identify patterns, generate risk maps, and produce structured debrief outputs. This is arguably the highest-value application. It turns a single workshop into a body of evidence that can feed into policy development, strategic planning, or capability assessments.
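As a minimal sketch of the cross-run analysis described above, assume each run's decision log is stored as a simple mapping from decision point to the option the team chose. The decision names and data below are invented for illustration, and a real analysis would go well beyond counting.

```python
from collections import Counter, defaultdict

# Hypothetical decision logs: one dict per session run, mapping a
# decision point to the option the team chose.
runs = [
    {"close_line": "yes", "public_statement": "delay"},
    {"close_line": "yes", "public_statement": "immediate"},
    {"close_line": "no", "public_statement": "delay"},
]


def tally_decisions(runs):
    """Count how often each option was chosen at each decision point."""
    tallies = defaultdict(Counter)
    for run in runs:
        for decision_point, choice in run.items():
            tallies[decision_point][choice] += 1
    return tallies


tallies = tally_decisions(runs)
for decision_point, counts in tallies.items():
    option, count = counts.most_common(1)[0]
    print(f"{decision_point}: '{option}' chosen in {count} of {len(runs)} runs")
```

Because every run draws on the same pre-approved decision points, the logs are directly comparable, which is what makes this kind of aggregation possible in the first place.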

The strongest argument for controlled AI is not that it is safer. It is that it produces outputs government can actually use. A decision log from a controlled session is citable. A consequence generated by an unreviewed model is not.

When Generative Mode Has a Place

Generative AI in scenario gaming is not inherently inappropriate. There are contexts where it adds genuine value and the risks are manageable.

Internal innovation labs, where the purpose is to stretch thinking rather than produce formal outputs, can benefit from the unpredictability of generative content. Exploratory horizon-scanning sessions, where no formal policy decision is downstream, are lower-risk environments. Scenarios focused on blue-sky futures, where speculative content is expected and clearly framed as such, are different from sessions that examine live operational risks.

The key variable is what happens to the output. If the session is exploratory and the outputs are used purely to stimulate discussion, with no downstream policy or communications use, generative mode may be entirely appropriate. If the output will inform a ministerial brief, feed into a risk register, or be shared beyond the room, the governance standard needs to be higher.

A Framework for Decision-Makers

Organisations commissioning AI-powered scenario gaming should ask these questions before agreeing a delivery model.

Who will be in the room? Sessions involving senior leaders and ministers require a higher governance standard than those run for policy analyst teams.

What will happen to the outputs? Formal documentation, policy papers, and communications use all require human-reviewed content.

Are the scenarios live or hypothetical? Sessions that touch live policy areas, named organisations, or real operational risks need controlled content.

How many times will this run? Controlled AI is designed for reuse. Pre-approved content can be run repeatedly, with AI supporting variation and analysis across sessions.

What is the assurance requirement? Public sector procurement often requires auditability. Content generated live has no record of pre-use review, and that record cannot be created after the fact.

The answer to most of these questions, in a government context, points toward controlled AI as the default, with generative features available for specific, lower-risk applications.
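The checklist above could be encoded as a small triage function. The sketch below is illustrative only: the field names are hypothetical, and the rule that any single trigger is enough to default to controlled mode is an assumption, not a standard.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SessionProfile:
    """Hypothetical answers to the five commissioning questions."""
    senior_leaders_present: bool   # Who will be in the room?
    outputs_used_formally: bool    # Policy papers, briefs, communications
    touches_live_policy: bool      # Live policy, named organisations, real risks
    repeated_runs: bool            # Will the scenario be reused?
    audit_required: bool           # Assurance or procurement requirement


def recommend_mode(profile: SessionProfile) -> Tuple[str, List[str]]:
    """Return a delivery mode and the reasons behind it.

    Assumption: any single trigger is enough to default to controlled AI.
    """
    checks = [
        (profile.senior_leaders_present, "senior leaders or ministers in the room"),
        (profile.outputs_used_formally, "outputs feed formal documents or communications"),
        (profile.touches_live_policy, "scenario touches live policy or named organisations"),
        (profile.repeated_runs, "content will be reused across sessions"),
        (profile.audit_required, "auditability is required"),
    ]
    reasons = [reason for triggered, reason in checks if triggered]
    mode = "controlled" if reasons else "generative"
    return mode, reasons


# An exploratory lab session with no downstream use.
print(recommend_mode(SessionProfile(False, False, False, False, False)))
# A ministerial session whose outputs feed a risk register.
print(recommend_mode(SessionProfile(True, True, True, False, True)))
```

Returning the reasons alongside the mode keeps the recommendation explainable, which matters when the commissioning decision itself may be scrutinised.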

What This Means in Practice

The most effective architecture for government scenario gaming combines both approaches in sequence rather than in competition. AI generates and refines content during the design phase, subject to human review and approval. Controlled AI governs the live session. Generative analytics operate freely on the output data after the session, where there is no risk of unreviewed content entering a policy process.
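To make the sequencing concrete, here is a minimal sketch of an approval gate between the design phase and the live session. All class names, fields, and example injects are hypothetical; the point is only that the live session refuses content without a named human reviewer and records what it shows.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Inject:
    """A scenario event drafted in the design phase (hypothetical structure)."""
    inject_id: str
    text: str
    approved_by: Optional[str] = None  # named human reviewer, if any


def approve(inject: Inject, reviewer: str) -> Inject:
    """Design phase: a named human signs off AI-drafted content."""
    inject.approved_by = reviewer
    return inject


@dataclass
class LiveSession:
    """Serves only human-approved injects and logs each one shown."""
    library: Dict[str, Inject]
    audit_log: List[dict] = field(default_factory=list)

    def present(self, inject_id: str) -> str:
        inject = self.library[inject_id]
        if inject.approved_by is None:
            raise PermissionError(f"Inject {inject_id!r} has no human approval")
        self.audit_log.append(
            {"inject_id": inject_id, "approved_by": inject.approved_by}
        )
        return inject.text


# Design phase: these drafts stand in for AI-generated variants.
drafts = [
    Inject("signal-failure", "A signalling fault halts services on a key route."),
    Inject("funding-cut", "A mid-year budget revision removes planned funding."),
]
approve(drafts[0], reviewer="Reviewer A")

session = LiveSession({d.inject_id: d for d in drafts})
print(session.present("signal-failure"))
# session.present("funding-cut") would raise PermissionError: not yet reviewed.
```

The audit log produced here is the kind of output post-session analytics can work on freely, since every entry traces back to a reviewer.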

This is not a limitation of ambition. It is a recognition that AI tools deliver maximum value in government when they are configured for the accountability standards the context demands. The question is not whether AI should power scenario gaming. It clearly should, and it will. The question is how the governance architecture is designed around it.

For organisations building these capabilities now, that design decision is one of the most consequential choices they will make.