Digital Transformation

What Does It Actually Mean to Build a Data Twin of an Acute Trust?

Pamela Sengupta

There is a phrase circulating in NHS digital transformation circles that sounds simultaneously obvious and impossibly ambitious: the data twin of an acute trust. It appears in strategy documents, it surfaces in CDO conversations, and it sits quietly behind many of the more interesting discussions happening around the NHS Federated Data Platform right now. But most of the time when it gets said out loud, the room fills with a mixture of enthusiasm and quiet uncertainty. Everyone nods. Fewer people could tell you, in concrete architectural terms, what it would actually take to build one.

This article is an attempt to answer that question with some precision. The concept is not a vision statement. It is an architectural commitment with very specific engineering implications, and the trusts and technology partners who understand those implications early will be the ones shaping national standards rather than scrambling to meet them.

1. The FDP Context: Why This Moment Matters

The NHS Federated Data Platform, built on Palantir's Foundry software and live across more than 110 hospital trusts in England as of early 2026, represents the most significant data infrastructure investment the NHS has made in a generation. NHS England's Medium-Term Planning Framework for 2026 to 2029 sets out the expectation that all providers in acute, community, and mental health sectors will onboard to the FDP and begin using its core products. The programme's own business case estimates benefits in the order of £780 million over its seven-year appraisal period, with a benefit-cost ratio of close to five to one. (Federated Data Platform Programme: accounting officer assessment, 2025)

Against that backdrop, the stakes of getting foundational architecture right could not be higher. Trusts that are onboarded to FDP and connect a few dashboards will capture some value. Trusts that use FDP as the occasion to build a genuine data twin of their organisation will create something of a different order entirely — an asset that serves operational reporting, national statutory reporting, research, population health management, ICB data flows, and AI modelling from a single, unified, semantically coherent source.

The data twin is not a destination that exists beyond the horizon. It is the goal that gives all the hard engineering and data modelling work its purpose.

University Hospitals of Leicester NHS Trust is among the acute pathfinder organisations doing this work now. As an FDP pioneer leading a programme to modernise its entire data and analytics platform, UHL is co-designing the acute canonical data model with NHS England, creating architecture that more than a hundred trusts behind it will eventually adopt. The decisions being made at this vanguard will define what best practice looks like for the whole system.

2. What a Data Twin Actually Is and Is Not

A data twin of an acute trust is a digital representation of how that trust operates, expressed entirely in data. Not a reporting dashboard. Not a well-organised data warehouse. A living, structured, semantically coherent model of the trust that captures patient pathways, clinical workflows, operational flows, workforce patterns, and resource states in a form that can be queried, analysed, and acted upon in near real time — and that serves every downstream use from operational management to research from a single source of truth.

The distinction matters enormously. Most NHS trusts have data. They have plenty of it, distributed across legacy systems that accumulated over decades of departmental purchasing decisions. An acute trust of significant size might have thirty or more source systems: an EPR, theatre management systems, radiology information systems, laboratory systems, patient administration systems, HR platforms, finance systems, community care records, and various departmental databases carrying years of business logic encoded in stored procedures that nobody fully documented. (Sollof, 2021)

The problem has never been the absence of data. The problem is that none of it talks to the rest in a coherent, standardised way, which means every analytical request requires bespoke extraction, every AI model trains on a different slice of reality, and the organisation can never develop a single authoritative view of what is happening across its operations. A data twin solves this by placing a canonical data model at the centre of everything.

The Canonical Data Model: The Intellectual Backbone

The canonical data model, or CDM, is the architectural backbone of the entire effort. It is a structured, domain-by-domain definition of what the organisation's data means: what a patient encounter is, what an inpatient episode consists of, what the components of a referral-to-treatment pathway are, how theatre utilisation is defined, what constitutes a discharge, how workforce availability is represented. NHS England owns the data ontology commissioned to support the FDP, and that ownership matters because the CDM is ultimately where the long-term value sits — portable, interoperable, and free from any single vendor's intellectual property.

In an acute trust, building a meaningful CDM means working across at least fifteen distinct clinical and operational domains: inpatient care, outpatient management, emergency department flows, theatre and surgical services, maternity, critical care, diagnostics, pharmacy, radiology, workforce management, finance, community care interfaces, RTT pathway tracking, cancer tracking, and supply chain. Each domain has its own data characteristics, its own source systems, its own coding conventions, and its own definitional complexities that have often evolved differently even within the same organisation.

The development of a CDM is not a software problem. It is a knowledge problem. It requires deep clinical domain expertise, an understanding of how source systems encode information, familiarity with national standards such as FHIR and SNOMED CT, and the organisational relationships to engage with clinical leads, information governance teams, and operational managers to resolve definitional disagreements that have sometimes been festering for years. (NHS Federated Data Platform infrastructure, 2023)
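To make the CDM idea concrete, a single domain entity might be sketched as a typed structure. This is a deliberately simplified, hypothetical illustration in Python; the field names and code conventions are assumptions for the purpose of the sketch, not the national model:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class InpatientEpisode:
    """Hypothetical, simplified CDM entity for one domain. Field names
    and code systems are illustrative assumptions, not the national model."""
    episode_id: str
    patient_pseudo_id: str                  # pseudonymised patient identifier
    admission_datetime: datetime
    discharge_datetime: Optional[datetime]  # None while the episode is open
    admission_method_code: str              # e.g. elective vs emergency admission
    main_specialty_code: str                # national specialty coding assumed
    primary_diagnosis_snomed: str           # SNOMED CT concept identifier

    @property
    def is_open(self) -> bool:
        return self.discharge_datetime is None

    @property
    def length_of_stay_days(self) -> Optional[int]:
        if self.discharge_datetime is None:
            return None
        return (self.discharge_datetime - self.admission_datetime).days
```

Even a toy definition like this forces exactly the questions the CDM work exists to answer: which admission method codes are valid, which specialty coding scheme applies, and what "discharge" means when the episode spans a ward transfer.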

3. The Architecture: Medallion Layers Inside Palantir Foundry

The architectural approach that makes a data twin genuinely viable as opposed to just another data warehouse migration is the medallion architecture. The name comes from the tiered structure of the data, moving through raw, curated, and semantic layers, sometimes labelled bronze, silver, and gold. (Medallion Architecture Migration Guide | Bronze Silver Gold Layers | Lakehouse, 2025)

Raw Layer: Ingest and Preserve

The raw layer captures data as it arrives from source systems, preserving its original form and provenance. Nothing is transformed or interpreted at this stage. The raw layer is the system of record for what arrived, when it arrived, and from where, providing the audit trail that NHS information governance obligations require. Between the raw and semantic layers sits the curated layer, where data is cleansed, deduplicated, and conformed to consistent structures: the preparation that makes semantic modelling tractable.
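The raw-layer contract can be shown in a short sketch: wrap each inbound payload with provenance metadata and leave the payload itself untouched. The function and field names here are hypothetical illustrations of the principle, not a Foundry API:

```python
import hashlib
import json
from datetime import datetime, timezone

def land_raw_record(payload: dict, source_system: str) -> dict:
    """Wrap an inbound payload with provenance metadata, leaving the
    payload itself untouched. A sketch of the raw-layer contract:
    record what arrived, when, and from where."""
    serialised = json.dumps(payload, sort_keys=True)
    return {
        "source_system": source_system,
        "received_at": datetime.now(timezone.utc).isoformat(),
        "payload_sha256": hashlib.sha256(serialised.encode()).hexdigest(),
        "payload": payload,  # stored exactly as received, no transformation
    }
```

The content hash and arrival timestamp are what make later lineage and audit questions answerable: any downstream value can, in principle, be traced back to a specific, verifiable arrival event.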

Semantic Layer: Apply the CDM and Ontology

The semantic layer applies the canonical data model and the ontology, making the data meaningful and queryable in clinical and operational terms rather than system terms. In Palantir Foundry, this layer is instantiated through the Ontology Manager, with analytical access provided through Contour, Quiver, and Workshop. (Cooper, 2023) This is the layer that makes it possible for a clinician to ask a question about a patient pathway and receive an answer that draws on data from a dozen source systems, harmonised and interpreted through a consistent semantic framework.
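A miniature example shows what the semantic layer does: two source systems describe the "same" encounter in different shapes, and harmonisation maps both into one canonical schema. Every field name below is invented for illustration and bears no relation to the FDP ontology itself:

```python
# Canonical shape that both source systems must be mapped into.
# All names here are hypothetical illustrations.
CANONICAL_FIELDS = ("encounter_id", "patient_id", "start", "specialty_code")

def from_epr(row: dict) -> dict:
    """Map a (fictional) EPR extract row to the canonical shape."""
    return {
        "encounter_id": f"EPR-{row['EncounterKey']}",
        "patient_id": row["PatID"],
        "start": row["AdmitDT"],
        "specialty_code": row["SpecCd"],
    }

def from_theatre_system(row: dict) -> dict:
    """Map a (fictional) theatre-system row to the same canonical shape."""
    return {
        "encounter_id": f"TH-{row['case_no']}",
        "patient_id": row["patient_ref"],
        "start": row["session_start"],
        "specialty_code": row["surgical_specialty"],
    }

def to_semantic_layer(rows, mapper):
    """Apply a source-specific mapper and enforce the schema contract."""
    out = []
    for row in rows:
        mapped = mapper(row)
        assert set(mapped) == set(CANONICAL_FIELDS)  # schema contract check
        out.append(mapped)
    return out
```

The value is not in the mapping functions themselves but in the contract: once every source passes through a mapper that satisfies the canonical schema, a single query can answer questions that previously required a dozen bespoke extracts.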

Getting the full pipeline right, from source system connectivity through to a queryable semantic layer, requires genuine Foundry engineering capability alongside clinical informatics expertise. These are two very different skill sets, and the scarcity of people who hold both is one of the central constraints on how quickly this work can scale across the NHS.

Having the tools and the framework is not the same as having the implementation. The CDM sets national standards, but AI-ready analytics require ontological depth at the trust level that no national programme can specify centrally.

4. The Legacy Data Challenge: More Archaeology Than Engineering

The data ingestion question is more complicated than it is often made to appear in vendor presentations. Source systems in acute trusts vary enormously in how they expose data. Some offer modern APIs. Many rely on database extracts through SSIS packages or similar ETL tooling that dates back years and encodes business logic that has never been formally documented.

When an organisation has been running the same SQL Server data warehouse for a decade, the stored procedures that transform raw clinical data into reportable metrics carry institutional knowledge that cannot be recovered simply by reading the code. Migrating this kind of environment into Foundry requires a discovery phase that is as much archaeology as engineering: identifying what each transformation was designed to achieve, which ones are still valid, which ones encode clinical definitions that have since changed, and which ones are simply workarounds for problems in source systems that no longer exist.

Skipping this phase in favour of moving quickly is one of the most common reasons FDP implementations struggle to deliver value after the initial onboarding. The data arrives in Foundry, but it does not behave the way the team expects, because the business logic that gave it meaning in the old environment has not been carried across. The pipeline runs; the outputs mislead.
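Parts of that discovery phase can be automated. As a first pass, a crude script can inventory which tables each stored procedure touches, giving the archaeology somewhere to start. A serious exercise would use a proper SQL parser; this regex sketch only illustrates the shape of the work:

```python
import re

def table_dependencies(procedure_sql: str) -> set:
    """First-pass inventory of the tables a stored procedure reads or
    writes, for building a migration discovery catalogue. A regex is
    deliberately crude: it illustrates the idea, nothing more."""
    pattern = r"\b(?:FROM|JOIN|INTO|UPDATE)\s+([#\w.\[\]]+)"
    matches = re.findall(pattern, procedure_sql, re.IGNORECASE)
    # Normalise bracketed SQL Server identifiers like [dbo].[Spell]
    return {m.replace("[", "").replace("]", "") for m in matches}
```

Run across an entire warehouse, even this rough dependency map reveals which procedures feed which reports, which tables nothing reads any more, and where the undocumented business logic is concentrated.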

Incremental vs Snapshot: An Architectural Decision with Long-Term Consequences

The choice between incremental and snapshot ingestion is another area where decisions made at the outset have consequences that compound over time. Snapshot ingestion, which periodically copies entire datasets, is simpler to implement but creates significant compute and storage overhead as data volumes grow, and makes it harder to maintain a clear lineage of how data has changed over time. Incremental ingestion, which captures only changes since the last extraction, is more efficient and supports near-real-time use cases but requires more sophisticated engineering to handle late-arriving data, deletions, and source system changes.

For a trust building toward a genuine data twin rather than a periodic reporting environment, incremental ingestion aligned with clear data lineage tracking is the right architectural direction even though it demands more upfront investment to implement correctly. Trusts that choose the simpler path for speed will typically find themselves re-engineering the ingestion layer within eighteen months.
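The engineering difference is easiest to see in a sketch. A watermark-based incremental extract, with a lookback window to catch late-arriving rows, might look like the following (assuming each source row carries an updated_at timestamp; field and function names are hypothetical):

```python
from datetime import datetime, timedelta

def incremental_extract(source_rows, last_watermark, lookback=timedelta(hours=1)):
    """Sketch of watermark-based incremental ingestion. Re-reads a small
    lookback window behind the watermark so late-arriving rows are not
    missed, which means the downstream merge must be idempotent
    (an upsert keyed on the row's natural key)."""
    cutoff = last_watermark - lookback
    batch = [r for r in source_rows if r["updated_at"] > cutoff]
    new_watermark = max((r["updated_at"] for r in batch), default=last_watermark)
    return batch, new_watermark
```

The lookback window is the simple part; the hard engineering lives in what the sketch omits: detecting deletions at source, surviving schema changes, and keeping the upsert idempotent when the same row arrives twice.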

5. Reference Data Management, Lineage, and Governance

The Reference Data Problem

Reference data management is one of those infrastructure challenges that receives less attention in architectural discussions than it deserves in operational reality. An acute trust data environment contains dozens of lookup tables, coding schemes, and reference datasets that underpin the meaning of clinical data: SNOMED CT codes, ICD-10 and ICD-11 classifications, OPCS procedure codes, specialty codes, ward and location hierarchies, staff role classifications, and commissioning reference data, among many others. (DAPB0084: OPCS-4.11 Requirements Specification, n.d.)

These reference datasets change over time. They are used inconsistently across source systems. They create significant alignment challenges when data from multiple systems is brought into a unified model. Building a reference data management capability that tracks the version and provenance of every reference dataset, maintains mappings between different coding schemes, and propagates changes through the data model correctly is unglamorous infrastructure work. It is also absolutely foundational to the reliability of everything built on top of it. An AI model trained on data where specialty codes are inconsistently applied across source systems will learn patterns that reflect coding variation rather than clinical reality.
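A minimal sketch of effective-dated reference data shows why the capability matters: the question is never just "what does this code mean?" but "what did it mean on the date this record was created?". The class, codes, and descriptions below are invented for illustration:

```python
from datetime import date

class VersionedReference:
    """Effective-dated reference data: resolve what a code meant on a
    given date. A conceptual sketch; codes and descriptions are made up."""

    def __init__(self):
        # code -> sorted list of (effective_from, value)
        self._versions = {}

    def add(self, code, effective_from, value):
        self._versions.setdefault(code, []).append((effective_from, value))
        self._versions[code].sort()

    def resolve(self, code, as_of):
        """Return the value in force on as_of, or None if unknown then."""
        current = None
        for effective_from, value in self._versions.get(code, []):
            if effective_from <= as_of:
                current = value
            else:
                break
        return current
```

Without this temporal dimension, a pipeline silently reinterprets historical records through today's definitions, which is exactly the kind of error that surfaces as an inexplicable discontinuity in a trend chart years later.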

Data Lineage as Governance Infrastructure

The data lineage question connects directly to the NHS information governance obligations that no FDP implementation can sidestep. Any platform serving as an organisation's primary analytical environment needs to make it clear, at any point in the data lifecycle, where a given data element came from, how it was transformed, who can access it, and what it has been used for. A robust data lineage framework is not an optional enhancement to add later. It is a prerequisite for using the platform in clinical and operational contexts where decisions have patient safety implications.

The FDP governance model places NHS trusts as the data controller for their own instances. The trust is accountable for how data is processed, who has access to it, and what analytical use cases it supports. (NHS Federated Data Platform infrastructure, 2025) Building a data twin without a clear lineage and access control architecture is not just technically incomplete. It is an information governance liability.
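Conceptually, transformation-level lineage is simple to capture: every derived dataset records its inputs and the transform that produced it, so provenance can be walked back to the raw layer. Foundry provides its own lineage tooling; the sketch below only illustrates the concept, with hypothetical dataset names:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class LineageRecord:
    dataset: str
    derived_from: list
    transform: str
    run_at: str

class LineageLog:
    """Minimal conceptual sketch of transformation-level lineage capture.
    All dataset and transform names are hypothetical."""

    def __init__(self):
        self.records = {}

    def record(self, dataset, derived_from, transform):
        self.records[dataset] = LineageRecord(
            dataset, list(derived_from), transform,
            datetime.now(timezone.utc).isoformat())

    def provenance(self, dataset):
        """Walk lineage back to root (raw-layer) datasets."""
        rec = self.records.get(dataset)
        if rec is None:
            return {dataset}  # no upstream record: treat as a raw input
        roots = set()
        for parent in rec.derived_from:
            roots |= self.provenance(parent)
        return roots
```

The point of the sketch is the governance question it answers: for any figure on any dashboard, which raw-layer arrivals does it ultimately depend on, and therefore which source systems, access controls, and processing justifications are in scope.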

6. The Service Management Reality: A Data Twin Is Never Done

Perhaps the most underappreciated aspect of the entire undertaking is what happens after the initial build. A data twin is not a project with a completion date. It is an operational environment that needs to be continuously maintained, monitored, and evolved. Source systems change. Clinical processes change. National reporting requirements change. New EPR implementations alter the data landscape significantly. Foundry itself receives platform updates that affect pipeline behaviour. The trust's analytical appetite grows as capability matures.

All of this means that the engineering, architectural, and data expertise deployed to build the data twin needs to remain engaged, not just to deliver an initial implementation and walk away. The distinction between the project resource needed to build and the managed service resource needed to operate is one that many procurement exercises fail to think through properly. It leads to situations where organisations invest significantly in an initial build only to find the environment degrades because no one is maintaining the pipelines or managing schema changes from source system upgrades.

A well-designed service management model for an FDP instance distinguishes clearly between three types of ongoing activity: operational management (keeping the platform running, monitoring pipelines, managing incidents), change management (handling the small but continuous flow of schema changes, new data sources, and business definition updates), and transformation (delivering new analytical capabilities and extending the data model into new domains). Each has a different cost profile, a different skill requirement, and a different engagement model. Conflating them under a single resource model is a recipe for either over-spend or under-delivery.

7. The National Opportunity: Why Pathfinder Trusts Are Building for Everyone

The pathfinder trusts currently leading FDP adoption in the acute sector are doing something with implications far beyond their own organisations. Because the national CDM is still being defined and refined, the architectural decisions and data modelling choices made at the vanguard of acute FDP implementation are effectively contributing to what will become the national standard.

NHS England is looking to early adopters to help define what an acute CDM should look like, which domains it should cover, how ontological relationships between clinical concepts should be structured, and what the reference architecture for a well-implemented acute trust instance actually is. Trusts and their technology partners doing this work now are building reusable assets that the rest of the NHS will eventually adopt. The investment is not just in their own capability. It is in the shared infrastructure of the entire health system.

This is also why the data twin concept, ambitious as it sounds, is the right frame for thinking about FDP implementation at the acute trust level. Implementing FDP to solve a narrow problem — to manage a waiting list or track a specific pathway — is valuable and achievable, but it misses the strategic opportunity. The trusts that will be most valuable to themselves and to the broader NHS are the ones that use FDP as the occasion to finally build the unified, semantically coherent, continuously maintained data environment that acute care has always needed and never quite managed to create.

How VE3 Supports FDP+ Enablement and Assurance

VE3 works with NHS trusts and ICBs on FDP+ enablement and assurance, bringing Palantir Foundry engineering capability alongside clinical informatics expertise to support the foundational data work that a genuine data twin requires. Our experience spans acute, ICB, and government data platforms, including work with the Department of Health and Social Care on the national MedTech PIM platform. We have built advanced analyst environments on Azure, developed ontological frameworks for NHS data domains, and supported trusts through the legacy-to-Foundry migration journey. If your organisation is navigating the transition to FDP and wants to understand what a well-architected acute data platform looks like in practice, we would welcome the conversation.

© 2026 VE3. All rights reserved.