The use case everyone wants and underestimates
If you wanted to pick the single AI use case with the most obvious payoff in public services, you would pick document generation. Statutory reports, committee papers, FOI responses, complaint replies, assessments, letters to residents: drafting consumes an enormous share of professional time, and much of it is structured, repetitive and ripe for assistance. Point an AI at it and the savings are immediate and real. Little wonder it is the use case vendors lead with and buyers reach for first.
Here is the uncomfortable argument of this article. Document generation is not only the highest-payoff use case in the public sector - it is also the most quietly dangerous. And the reason is precisely what makes it appealing: the work product is fluent, confident prose, which is exactly what large language models are best at producing and worst at guaranteeing to be true.
In public services, the most dangerous AI output is not the one that reads badly. It is the one that reads well.
The fluency trap
Large language models are, at their core, engines for predicting plausible language. They optimise for how a sentence should sound, not for whether it is true. That is why they hallucinate - generating confident, well-formed statements that are subtly or completely wrong, delivered in exactly the authoritative register a reader expects from an official document. Research on generative AI in legal and administrative settings is blunt about this: these systems prioritise linguistic fluency over factual accuracy, and the persuasive, human-like quality of the output actively encourages professional overreliance.
That overreliance has a name - automation bias - and it is the second half of the trap. People tend to accept AI-generated output even when contradictory evidence is in front of them, and the more polished the output looks, the stronger the pull. So the failure mode in document generation is not the obvious garbage a reviewer would catch. It is the immaculate paragraph with a transposed figure, a misremembered policy, a subtly wrong eligibility rule or an invented reference - signed off precisely because it looked right. The better the draft reads, the less it gets checked.
In a marketing email, that is an inconvenience. In a statutory document, it is a decision taken on a false basis.
Why this matters more in public services than anywhere else
Public sector documents are not just text. They carry decisions and discharge legal duties. An education, health and care plan shapes the support a child receives. A complaint response can be escalated to an ombudsman. An FOI answer is a statutory obligation with deadlines and appeal rights. A committee report informs a decision taken in public and subject to challenge. The content of these documents has consequences for real people, and accountability for that content cannot be delegated to a model that cannot be held responsible for anything.
This is the heart of what some scholars now call the capability–accountability gap: AI's ability to produce official-looking work has raced ahead of any mechanism to hold it accountable for the work's correctness. In the public sector, that gap is not an abstract concern. It is the difference between a tool that helps an officer discharge their duty and one that quietly transfers risk onto them while appearing to lift it.
The wrong metric and the right one
Most AI document tools are sold on speed. “Draft a complaint response in seconds.” “Generate a report in one click.” It is a seductive pitch, and it optimises for the wrong thing. Speed of drafting is trivial; any model can produce fluent text fast. The hard part - the part that determines whether the tool creates value or liability - is verification.
So the question to ask of any document-generation tool is not “how good is the draft?” It is: “how easily, and how reliably, can a human verify it?” Reframed that way, the metric that matters is not raw time saved but trustworthy time saved - time returned to officers that does not create downstream risk, rework or challenge. A tool that produces a beautiful draft a reviewer cannot efficiently check has not saved time at all. It has merely moved the work, and added a hazard.
Don't measure a document tool by how good the draft looks. Measure it by how easily the draft can be trusted.
What good actually looks like: built for verification, not just generation
If verification is the real challenge, then a responsible document-generation capability is designed around it. Six features distinguish a tool that saves trustworthy time from one that manufactures risk at scale.
1. Grounded in approved sources, not the model's memory
The draft should be assembled from the organisation's own approved content and the actual case record - using retrieval from authoritative sources - rather than from whatever the underlying model happens to have absorbed. Grounding is the single most effective defence against hallucination: if every claim must come from a real, supplied source, the model has far less room to invent.
2. Citation and traceability by default
Every factual claim in the draft should link back to where it came from, so a reviewer can confirm it in seconds rather than re-researching from scratch. As commentators tracking AI and law into 2026 observe, provable transparency and audit trails are shifting from nice-to-have to expected - and “explainable by design” is becoming a competitive advantage, not a compliance chore.
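To make "every claim links back to its source" concrete, here is a minimal sketch of one check a tool might run before a draft reaches a reviewer. It assumes a hypothetical convention in which each sentence carries a bracketed source ID such as [S1] before its full stop. It also assumes the approved sources supplied for the case are held in a simple dictionary. The marker format, the function names and the sample data are illustrative only, not any particular product's behaviour. The check flags sentences that cite nothing and sentences that cite a source that was never supplied. Those are exactly the places where the model may have filled a gap from memory.

```python
import re

# Hypothetical convention: each sentence carries its source ID in square
# brackets before the full stop, e.g. "... £250 per month [S1]."
CITATION = re.compile(r"\[(S\d+)\]")


def split_sentences(text: str) -> list:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    # Real documents need more care (abbreviations, lists, headings).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_untraced_sentences(draft: str, sources: dict) -> list:
    """Return sentences that cite nothing, or cite a source that was not supplied."""
    flagged = []
    for sentence in split_sentences(draft):
        cited = CITATION.findall(sentence)
        if not cited or any(ref not in sources for ref in cited):
            flagged.append(sentence)
    return flagged


# Usage: the approved sources supplied for this case (illustrative data).
sources = {
    "S1": "Case record: award confirmed at £250 per month.",
    "S2": "Complaints policy, section 3.2: 20 working day response target.",
}
draft = (
    "Your award was confirmed at £250 per month [S1]. "
    "We aim to respond to complaints within 20 working days [S2]. "
    "You are also entitled to a backdated payment. "
    "This decision was reviewed by the panel [S7]."
)
for sentence in find_untraced_sentences(draft, sources):
    print("CHECK:", sentence)
# CHECK: You are also entitled to a backdated payment.
# CHECK: This decision was reviewed by the panel [S7].
```

A check like this only proves that each sentence points at a real, supplied source. It cannot prove the source says what the sentence claims. That judgement stays with the reviewing officer, but the check tells them exactly where to look first.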
3. Friction in the right place
Good design fights automation bias rather than feeding it. The tool should make genuine review easy and expected - surfacing what to check, flagging low-confidence passages, and never offering a frictionless one-click path to send an unread document. A little well-placed friction is a feature, not a flaw.
4. Human authorship and accountability
The officer edits, approves and owns the final document. AI drafts; it never decides, signs or sends. This is not a courtesy to add later - it is the architecture that keeps accountability where the law and the public expect it to sit. (Our companion guide on responsible AI sets this out in full.)
5. Fit to template, tone and statutory requirement
A draft that ignores the organisation's house style, mandatory sections or statutory wording creates as much rework as it saves. A tool fit for public sector use is shaped to the document types it serves - not a generic writer bent awkwardly to fit.
6. Auditable and under your control
Who generated what, from which sources, and who approved it should all be logged. And the case data and the drafts themselves should stay within the organisation's environment, never used to train external models - the data-sovereignty point we explore separately. Sensitive content does not stop being sensitive because an AI helped write it.
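Features 4 and 6 can be enforced in the software's structure rather than left to policy alone. The minimal sketch below assumes a simple in-memory log. The class names, the `new_draft` helper, the event names and the "drafting-model" actor are all hypothetical. It shows how such a tool might behave: a draft cannot be released until a named officer has approved it, any later edit cancels that approval, and each step records who acted and which sources were involved. A real system would need tamper-evident storage kept inside the organisation's own environment.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AuditEvent:
    timestamp: str  # UTC, ISO 8601
    actor: str      # officer ID, or the name of the drafting model
    action: str     # "generated", "edited", "approved" or "released"
    detail: dict


@dataclass
class AuditLog:
    events: list = field(default_factory=list)

    def record(self, actor: str, action: str, **detail) -> None:
        # Append-only in this sketch; nothing here edits or deletes events.
        stamp = datetime.now(timezone.utc).isoformat()
        self.events.append(AuditEvent(stamp, actor, action, detail))


@dataclass
class Draft:
    doc_id: str
    text: str
    source_ids: list
    log: AuditLog
    approved_by: Optional[str] = None

    def edit(self, officer: str, new_text: str) -> None:
        self.text = new_text
        self.approved_by = None  # any change cancels an earlier approval
        self.log.record(officer, "edited", doc_id=self.doc_id)

    def approve(self, officer: str) -> None:
        self.approved_by = officer
        self.log.record(officer, "approved", doc_id=self.doc_id,
                        sources=list(self.source_ids))

    def release(self, officer: str) -> str:
        # The model never sends: release requires a named officer's approval.
        if self.approved_by is None:
            raise PermissionError(f"{self.doc_id} has not been approved by an officer")
        self.log.record(officer, "released", doc_id=self.doc_id,
                        approved_by=self.approved_by)
        return self.text


def new_draft(log: AuditLog, doc_id: str, text: str,
              source_ids: list, model_name: str) -> Draft:
    # Record which model produced the draft and from which sources.
    log.record(model_name, "generated", doc_id=doc_id, sources=list(source_ids))
    return Draft(doc_id, text, list(source_ids), log)


# Usage: an unapproved draft is blocked; an officer's approval unlocks release.
log = AuditLog()
draft = new_draft(log, "FOI-0042", "Initial model draft.", ["S1", "S2"], "drafting-model")
try:
    draft.release("officer.jones")
except PermissionError as err:
    print("Blocked:", err)
draft.edit("officer.jones", "Officer-revised reply.")
draft.approve("officer.jones")
sent = draft.release("officer.jones")
print([event.action for event in log.events])
# prints ['generated', 'edited', 'approved', 'released']
```

The design choice to note is that resetting `approved_by` on every edit ties the approval to the exact text that goes out. An officer cannot approve one version and send another.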
The checklist: questions to ask before AI drafts anything official
A practical test for any document-generation solution headed for statutory or public-facing work.
- Are drafts grounded in our approved content and the actual record - or in the model's general training?
- Does every factual claim link to a verifiable source we can check quickly?
- Does the design encourage genuine review, or a one-click path to accept and send?
- Does an accountable officer always edit, approve and own the final document?
- Is output shaped to our templates, tone and statutory requirements?
- Is there an audit trail of what was generated, from where, and who approved it?
- Do our data and the drafts stay in our environment, and are they ever used to train any model?
- How is accuracy measured and monitored over time, not just demonstrated once?
The real prize - claimed safely
None of this is an argument against AI document generation. Done well, the prize is exactly as large as the hype suggests - and, crucially, safe to bank. Officers freed from the blank page and the boilerplate can spend their judgement where it actually matters. Responses become faster and more consistent. Backlogs that have dogged FOI and complaints functions for years become tractable. Public bodies elsewhere are already using automated drafting to cut cycle times while maintaining compliance - the two goals are not in tension when the tool is built right.
The shift that makes this real is a shift in what we ask for. In 2026, human-in-the-loop generative AI is no longer an innovation story; it is the baseline expectation. The organisations that benefit most will be those that stop being dazzled by fluent drafts and start demanding verifiable ones - treating the ability to check the work as the headline feature, not the fine print.
Build it in, don't bolt it on
The clearest sign of a document-generation tool fit for public services is that it is built around verification, not just generation. Our own approach reflects that: drafts are grounded in the organisation's approved sources and records; claims are traceable to where they came from; an accountable officer always reviews, edits and owns the final document; output is shaped to the organisation's templates and statutory needs; and everything is logged, kept within the organisation's environment, and never used to train external models. The goal is not a faster way to produce documents nobody has truly checked. It is a faster way to produce documents people can stand behind.
The best-written draft was never the goal. The most verifiable one is. Reframe success that way, and AI document generation becomes one of the most valuable - and most defensible - investments a public body can make.
If you are weighing up AI for drafting in statutory or public-facing work, we would be glad to share how to capture the time savings without inheriting the risk. Visit us for more information.