There's a hope, rarely said out loud, that the Federated Data Platform will somehow tidy things up. That once the messy, fragmented data of a large trust is connected to a shiny national platform, the inconsistencies will resolve, the duplicates will reconcile, and everyone will finally be looking at the same numbers.

It won't, and they won't. A federated intelligence layer is a magnifying glass, not a filter. It surfaces and amplifies the quality of whatever you feed it - and it does so with a confidence and a reach that make poor data more dangerous, not less, because now it's presented cleanly and shared widely. The single most important principle for any trust onboarding the FDP is also the least glamorous: assure your data quality before it flows in, not after. This post is about how to do that - practically, and in a way that holds.

Why the FDP makes data quality non-negotiable

A local report built on shaky data is a contained problem. The analyst who made it usually knows its caveats, the audience is small, and the blast radius is limited. Push that same data through a federated platform and three things change at once.

It becomes visible - surfaced in standardised products and dashboards that carry an implicit authority. It becomes comparable - set alongside other organisations' data through a shared model, so any inconsistency in how you define or record something now shows up as a discrepancy. And it becomes actionable - feeding operational decisions about waiting lists, capacity, and flow. Data that was merely untidy in a local spreadsheet becomes a basis for decisions once it's in the platform.

This is why the consistent message from FDP practitioners is that quality has to be sorted upstream. The platform's strength - connecting and standardising data across the NHS - is precisely what turns an upstream quality problem into a system-wide one. (It's also why the FDP doesn't replace your local data layer; that layer is where the quality work happens, as we covered in our piece on whether the FDP replaces your data warehouse.)

What "data quality" actually means

"Data quality" is too vague to act on until you break it into dimensions you can measure. For NHS data, six matter most:

Accuracy - does the data correctly describe the real-world thing? (Is the recorded NHS number actually the patient's?)

Completeness - are required fields populated, or riddled with nulls and "unknown"s?

Consistency - does the same fact agree across systems, or does the EPR say one thing and the warehouse another?

Timeliness - is the data current enough for the decision it informs?

Validity - does it conform to expected formats, ranges, and code sets (e.g. valid SNOMED or ICD codes)?

Uniqueness - is each entity recorded once, or are duplicate patient and episode records inflating your counts?

Naming the dimensions matters because it turns "our data's a bit rough" into a set of specific, testable questions - and because different fixes apply to different dimensions. You can't improve what you haven't defined.
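
To make "specific, testable questions" concrete, here is a minimal Python sketch that scores three of the six dimensions - completeness, validity, and uniqueness - over a handful of records. The field names (nhs_number, ethnicity), the sample rows, and the helper functions are illustrative assumptions, not a prescribed schema. The validity test uses the Modulus 11 check-digit algorithm defined for NHS numbers; a real profile would cover many more fields and the other three dimensions too.

```python
from collections import Counter


def nhs_number_is_valid(value: str) -> bool:
    """Validity: Modulus 11 check-digit test for a 10-digit NHS number."""
    digits = value.replace(" ", "")
    if len(digits) != 10 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weight the first nine digits 10, 9, ..., 2 and sum the products.
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check digit of 10 means the number can never be valid.
    return check != 10 and check == int(digits[9])


def completeness(records, field):
    """Share of records where the field is populated (not None, blank, or 'unknown')."""
    if not records:
        return 0.0
    missing = {"", "unknown"}
    filled = sum(
        1 for r in records
        if r.get(field) is not None and str(r[field]).strip().lower() not in missing
    )
    return filled / len(records)


def validity(records, field, is_valid):
    """Share of populated values that pass the supplied validity test."""
    values = [r[field] for r in records if r.get(field)]
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)


def duplicate_keys(records, field):
    """Uniqueness: key values that appear on more than one record."""
    counts = Counter(r[field] for r in records if r.get(field))
    return sorted(key for key, n in counts.items() if n > 1)


# Illustrative sample only: hypothetical rows, not real patient data.
records = [
    {"nhs_number": "9434765919", "ethnicity": "A"},
    {"nhs_number": "9434765919", "ethnicity": "unknown"},
    {"nhs_number": "1234567890", "ethnicity": None},
    {"nhs_number": "", "ethnicity": "B"},
]

print("ethnicity completeness:", completeness(records, "ethnicity"))
print("nhs_number validity:", validity(records, "nhs_number", nhs_number_is_valid))
print("duplicate nhs_numbers:", duplicate_keys(records, "nhs_number"))
```

Each function answers one question with one number or list, which is the point: a score per dimension per field can be tracked, given a threshold, and assigned to someone.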

Where NHS data quality actually breaks

Quality problems are rarely random. In an acute trust they cluster in a few predictable places, and knowing them tells you where to look first.

Manual re-keying and shadow tools. Wherever a human retypes data from one system into another - or into one of the hundreds of ungoverned Access databases and spreadsheets most trusts run - accuracy and consistency erode. Every manual hop is an opportunity for divergence.

Interface fragmentation. Multiple integration points and inconsistent messaging (the familiar tangle of HL7 v2 feeds) mean the same data can arrive differently in different places.

Merger divergence. Where organisations have combined, the same concept is often recorded two ways, against two code sets, by two teams who never agreed a single definition.

Free text and inconsistent coding. Unstructured entry and variable coding discipline undermine validity and comparability - exactly the dimensions a national platform stresses hardest.

Notice that none of these is fixed inside the FDP. They're all upstream, in the systems, interfaces, and habits that produce the data. Which points to the central discipline.

Fix it at source, not in the pipeline

When quality problems surface, the tempting shortcut is to patch them downstream - a cleansing script in the pipeline, a manual correction in the report. It feels faster. It's a trap. Downstream fixes are invisible, undocumented, and fragile: they break when the data shifts, they multiply as each team builds its own, and they leave the actual source untouched, so the problem regenerates indefinitely.

Fixing at source - correcting the EPR configuration, the data-entry process, the interface mapping, or the definition that's causing the error - is slower to start and far cheaper to sustain. It fixes the problem once, for every downstream consumer, permanently. The rule of thumb: a downstream cleanse is acceptable only as a documented, temporary stopgap while the source fix is in train - never as the answer.
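
One way to keep a stopgap honest is to record it somewhere that can nag. The sketch below is an assumption about how a trust might do that, not an established tool: a small register in which every downstream cleanse carries an owner, a reference to the source fix it is waiting on, and a review date. The class, field names, and change references (CHG-0001 and so on) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Stopgap:
    """One documented downstream cleanse (all field names are illustrative)."""
    name: str            # what the cleanse does
    source_fix_ref: str  # hypothetical change reference for the fix at source
    owner: str           # person accountable for retiring the stopgap
    review_by: date      # date by which the source fix should have landed


def overdue(stopgaps, today):
    """Return the names of stopgaps that have outlived their review date."""
    return [s.name for s in stopgaps if s.review_by < today]


# Hypothetical register entries for illustration.
REGISTER = [
    Stopgap("Trim whitespace from ward codes", "CHG-0001",
            "Ward data steward", date(2025, 3, 31)),
    Stopgap("Map legacy specialty codes", "CHG-0002",
            "Clinical coding lead", date(2025, 9, 30)),
]

for name in overdue(REGISTER, today=date(2025, 6, 1)):
    print("Stopgap past its review date:", name)
```

Run on a schedule, a check like this turns "temporary" from an intention into something that gets chased.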

There's an organisational dimension to this that's easy to miss. Source fixes often require engaging the people who create the data - the ward clerk, the coding team, the clinician at the keyboard - not just the technical team who move it. That's harder than writing a script, because it means understanding why the data is entered the way it is, and sometimes redesigning a process or a form rather than blaming the user. But it's also where the durable wins live: a free-text field turned into a structured drop-down, a confusing screen redesigned, a definition agreed and trained out - each removes a whole class of errors at the point they would otherwise be born. Quality improvement, done properly, is as much about workflow and engagement as it is about pipelines.

Governance that makes quality stick

Data quality isn't a one-off cleanse; it's a state you maintain, which makes it a governance problem as much as a technical one. Four things turn a one-time improvement into a durable standard:

Clear ownership. Every critical dataset needs a named owner accountable for its quality, and data stewards close to the source who can act on issues. Quality with no owner decays by default.

Agreed definitions. A shared business glossary so "admission," "discharge," or "waiting time" means one thing across the trust. Most consistency problems are really definition problems in disguise.

Automated, continuous checks. Quality rules that run automatically against the six dimensions and flag breaches early - embedded health checks rather than periodic manual audits. Automation is what makes quality affordable at scale.

Data lineage. The ability to trace any figure back to its source, through every transformation. Lineage is what lets you diagnose a problem at root rather than guessing - and it's increasingly essential evidence for compliance.
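
As a rough illustration of how ownership and automated checks fit together, the Python sketch below declares quality rules as data, each with a dimension, a named owner, and a minimum score, then reports any breach alongside the person who should act on it. The rule names, owners, thresholds, and sample episodes are assumptions made up for the example; in practice the rules would run inside whatever scheduling or data-testing tooling the trust already uses.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Rule:
    """A declarative quality rule (names and thresholds are illustrative)."""
    name: str
    dimension: str
    owner: str
    threshold: float  # minimum acceptable score, from 0.0 to 1.0
    measure: Callable[[Sequence[dict]], float]


def populated(field):
    """Build a completeness measure: share of records with the field filled."""
    def measure(records):
        if not records:
            return 0.0
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        return filled / len(records)
    return measure


def unique(field):
    """Build a uniqueness measure: distinct values as a share of populated values."""
    def measure(records):
        values = [r[field] for r in records if r.get(field) not in (None, "")]
        return len(set(values)) / len(values) if values else 0.0
    return measure


def run_rules(rules, records):
    """Score every rule and return the ones that fall below their threshold."""
    breaches = []
    for rule in rules:
        score = rule.measure(records)
        if score < rule.threshold:
            breaches.append({
                "rule": rule.name,
                "dimension": rule.dimension,
                "owner": rule.owner,
                "score": round(score, 3),
                "threshold": rule.threshold,
            })
    return breaches


# Hypothetical rules and sample data for illustration.
RULES = [
    Rule("episode_id populated", "completeness", "Inpatient data steward",
         0.99, populated("episode_id")),
    Rule("discharge_date populated", "completeness", "Inpatient data steward",
         0.95, populated("discharge_date")),
    Rule("episode_id unique", "uniqueness", "Information team lead",
         1.0, unique("episode_id")),
]

episodes = [
    {"episode_id": "E1", "discharge_date": "2024-05-01"},
    {"episode_id": "E2", "discharge_date": ""},
    {"episode_id": "E2", "discharge_date": "2024-05-03"},
    {"episode_id": "E4", "discharge_date": "2024-05-04"},
]

for breach in run_rules(RULES, episodes):
    print(breach)
```

Keeping the rules as data rather than burying them in pipeline code means stewards can review them, and the rule list itself becomes part of the governance record.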

This is also where quality work and your wider obligations converge. The same ownership, definitions, and controls that improve quality are the evidence base for the Data Security and Protection Toolkit, support compliance under the NHS Records Management Code of Practice, and underpin the trust in data that every downstream use - operational, statutory, and research - depends on. Good governance isn't a parallel workstream to quality; it's the mechanism that makes quality last.

Be pragmatic: prioritise what flows into the FDP

A trust could spend years trying to perfect every dataset and never finish. Don't. The pragmatic move is to scope quality effort by what's heading into the platform and what decisions it drives. Identify the datasets that feed the FDP's operational use cases, assess them against the six dimensions, and fix the highest-impact problems at source first. Quality is not a binary you achieve estate-wide before you can start; it's a risk you reduce, deliberately, where it matters most. Sequence it like any other migration work - by impact and risk, not by trying to boil the ocean.
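
Sequencing by impact and risk can be as simple as a scored list. The sketch below is one hypothetical way to do it: keep only the datasets that feed the platform, then rank them by impact multiplied by risk. The dataset names, the 1-to-5 scales, and the scores are invented for illustration; a trust would agree its own scales and scoring with dataset owners.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """A candidate dataset with illustrative 1 (low) to 5 (high) scores."""
    name: str
    feeds_fdp: bool
    impact: int  # how much the decisions it drives matter
    risk: int    # how likely known quality problems are to mislead


def prioritise(datasets):
    """Keep datasets that feed the platform, highest impact x risk first."""
    in_scope = [d for d in datasets if d.feeds_fdp]
    return sorted(in_scope, key=lambda d: d.impact * d.risk, reverse=True)


# Hypothetical estate for illustration.
ESTATE = [
    Dataset("Theatre sessions", True, impact=4, risk=2),
    Dataset("Waiting list", True, impact=5, risk=4),
    Dataset("Estates asset register", False, impact=2, risk=5),
    Dataset("Bed state", True, impact=4, risk=4),
]

for d in prioritise(ESTATE):
    print(d.name, d.impact * d.risk)
```

The multiplication is deliberately crude. What matters is that the order is written down, defensible, and revisited as source fixes land.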

A practical first step: five questions before you onboard

Which datasets will flow into the FDP, and have we assessed them against the six quality dimensions?

Where is data manually re-keyed or sourced from shadow tools - the most likely points of divergence?

Does every critical dataset have a named owner and steward accountable for its quality?

Are we fixing problems at source, or quietly patching them downstream where they'll regenerate?

Can we trace a figure back through its lineage to diagnose issues - and evidence compliance?

If the answers aren't clear, that's the work to do before onboarding gathers pace - not after the platform has already amplified whatever you gave it.

At VE3, we help NHS acute trusts assure data quality ahead of FDP onboarding - profiling the data that matters, fixing problems at source, and standing up the ownership, automated checks, and lineage that keep quality and compliance durable. If you'd like our pre-FDP data quality assessment checklist, get in touch.