Digital Transformation

Legacy Data Warehouse to Foundry: The Migration Decisions Nobody Tells You About

Pamela Sengupta
March 30, 2026

Almost every NHS trust that is now navigating its path to the Federated Data Platform started from the same place: a SQL Server data warehouse grown over the better part of a decade, a cluster of SSIS packages handling the overnight ETL runs, a set of SSRS reports that half the organisation depends on and nobody wants to rebuild, and a Business Objects or Power BI layer sitting on top of it all that has accumulated somewhere between fifty and five hundred reports over the years, each one slightly different from the last, each one owned by a different directorate with strong opinions about how their numbers should look.

This is not a failure state. It is the entirely predictable result of NHS trusts building analytical capability incrementally, with limited resources, under constant operational pressure, making pragmatic decisions at each stage that made sense at the time. The SQL Server data warehouse was a reasonable choice in 2012. SSIS was the right tool for the job when it was adopted. Business Objects served its purpose for years before Power BI arrived. The problem is not that these decisions were wrong. The problem is that they now need to give way to Palantir Foundry, and nobody is giving trusts an honest account of what that transition actually involves.

The standard narrative around FDP migration tends to emphasise the destination rather than the journey. It talks about unified data, near-real-time visibility, AI-ready analytics, and a platform that will serve the trust for the next decade. All of this is achievable. But between where most trusts are today and where they need to get to lies a migration exercise that is more complex, more time-consuming, and more dependent on specific types of expertise than the programme documentation tends to acknowledge. This article is an attempt to be honest about the decisions that actually determine whether a legacy-to-Foundry migration succeeds, and what makes them hard.

1. Understanding What You Are Actually Migrating

The first and most important mistake a trust can make is treating this as a data migration. It is not. Migrating data from one system to another — extracting records from a SQL Server database and loading them into Foundry — is the easy part. Any competent data engineer can do it. What is genuinely difficult, and what most migration plans underestimate, is migrating the business logic that gives the data its meaning.

In a mature NHS data warehouse, the data itself is almost incidental. The real intellectual property of the environment sits in the transformation layer — in the stored procedures that calculate whether a patient is on the right RTT pathway, in the SSIS packages that reconcile admission records from the PAS against theatre bookings from the theatre management system and flag discrepancies, in the SSRS reports that apply the specific rounding rules that finance agreed with commissioners three years ago, in the Business Objects universes that define what a completed outpatient episode means for the purposes of the Payment by Results calculation. This logic was never written down in a specification document. It accumulated over the years, each piece added by someone who understood the clinical or operational context at the time, often by someone who has since left the organisation.

This is what makes the discovery phase of a Foundry migration so much more demanding than teams typically plan for. Before a single pipeline is built in Foundry, the migration team needs to understand what every significant transformation in the legacy environment actually does, why it does it, whether the original reason still applies, and whether there are alternative or conflicting definitions of the same concept in other parts of the environment. A large trust might have fifteen or twenty different definitions of a delayed discharge spread across its data warehouse, each one correct for the particular reporting context in which it was originally created, none of them appropriate as the single canonical definition for an enterprise data platform. Resolving this is not a technical problem. It is a governance problem that requires clinical and operational stakeholders to make decisions they may have been avoiding for years.

The real intellectual property of a legacy NHS data warehouse is not the data. It is the transformation logic — the stored procedures, SSIS packages, and report definitions that encode a decade of institutional knowledge about what the data means.

The Inventory Problem

One of the most common early shocks in a legacy-to-Foundry migration is discovering how large and how opaque the existing environment actually is. Trusts that believe they have a manageable data warehouse often find, on proper inspection, that the stored procedure count runs into the hundreds, that the SSIS package catalogue contains jobs that nobody has reviewed in years but which are still running on a nightly schedule, that the reporting layer has grown to include reports that reference tables which no longer exist in the live database but which have been kept alive by a shadow ETL process that one person set up years ago and which only that person understands.

Before migration can begin in earnest, the trust needs a complete inventory of what exists: every source system connection, every ETL package, every stored procedure, every report, every scheduled job, every database table, and the dependencies among them. This inventory is not just an administrative exercise. It is the foundation on which every subsequent decision rests. Trusts that skip it or rush it discover the gaps at the worst possible moment — when a pipeline they thought they had migrated correctly is producing results that differ from the legacy system, and nobody can explain why, because nobody documented what the legacy system was actually doing.
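The inventory becomes far more useful once it is treated as a dependency graph rather than a flat list. The sketch below, with entirely hypothetical object names, shows one minimal way to represent those dependencies and flag objects that nothing references and that reference nothing, which are the first candidates for the "is this still needed?" conversation.

```python
# Sketch: representing a legacy-warehouse inventory as a dependency graph so
# that orphaned objects (nothing depends on them; they depend on nothing)
# can be flagged for review. All object names are hypothetical.

from collections import defaultdict

def build_graph(dependencies):
    """dependencies: iterable of (consumer, producer) pairs,
    e.g. ('rpt_RTT_Weekly', 'usp_CalcRTTClock')."""
    consumers_of = defaultdict(set)   # producer -> set of consumers
    producers_for = defaultdict(set)  # consumer -> set of producers
    for consumer, producer in dependencies:
        consumers_of[producer].add(consumer)
        producers_for[consumer].add(producer)
    return consumers_of, producers_for

def orphans(all_objects, dependencies):
    """Objects that nothing references and that reference nothing."""
    consumers_of, producers_for = build_graph(dependencies)
    return sorted(
        obj for obj in all_objects
        if not consumers_of[obj] and not producers_for[obj]
    )

objects = {"usp_CalcRTTClock", "rpt_RTT_Weekly", "stg_OldAdmissions"}
deps = [("rpt_RTT_Weekly", "usp_CalcRTTClock")]
print(orphans(objects, deps))  # ['stg_OldAdmissions']
```

In practice the pairs would be extracted from SQL Server's catalog views and SSIS package metadata, but the principle of surfacing unreferenced objects before deciding what to migrate holds regardless of tooling.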

2. The ETL Archaeology Challenge

The phrase data archaeology is not hyperbole. In the context of a legacy NHS data warehouse with a decade of accumulated ETL logic, excavating the meaning of the transformation layer is genuinely similar to archaeological work: you are recovering the intentions of people who are no longer present, using artefacts that were never designed to explain themselves, in a context where the surrounding evidence has often been partially overwritten by subsequent changes.

NHS trusts have typically built their data warehouses using the SQL Server stack — SQL Server as the database engine, SSIS for data integration, SSAS for analytical cubes, and SSRS or Power BI for reporting. This is a coherent, capable stack that has served the NHS well, and it remains actively supported by Microsoft, with SQL Server 2025 continuing to ship updated SSIS components. But the way the stack has typically been used in NHS environments has created specific archaeological challenges that a Foundry migration team needs to be prepared for.

Stored Procedures as Undocumented Specifications

The stored procedure is where much of the critical business logic in NHS data warehouses lives, and it is also where the most significant knowledge loss occurs over time. A stored procedure that calculates an RTT clock position, for example, might reference eight or nine tables, apply a sequence of conditional logic that handles a dozen different edge cases, and produce a result that is critically dependent on the order in which operations are applied. When the person who wrote it leaves the organisation, the stored procedure becomes a black box. It produces the right answer, or at least an answer that nobody has yet been able to disprove, but the reasoning behind it is inaccessible.

The challenge in a Foundry migration is not to copy these stored procedures into Foundry's transformation layer. Foundry uses a fundamentally different execution model — its Pipeline Builder application, built on Spark and Flink, operates on distributed compute rather than the set-based operations of a relational database engine. Stored procedure logic cannot simply be lifted and shifted. It needs to be understood and reimplemented in Foundry's transformation paradigm, which requires someone to work through each procedure, understand what it does, validate that understanding against the expected outputs, and then reproduce that logic in Foundry's pipeline model. For a warehouse with hundreds of stored procedures, this is a substantial programme of work that requires people who understand both the SQL Server environment and the Foundry platform — a combination of skills that is genuinely rare in the NHS market.
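One pragmatic pattern is to first re-express each stored procedure as a pure, unit-testable function, validated against rows with known legacy outputs, before rebuilding it in the target pipeline. The sketch below uses an invented and heavily simplified RTT clock-stop rule (the outcome codes and field names are illustrative, not the real RTT rules) purely to show the shape of that validation step.

```python
# Sketch: re-expressing (invented, simplified) stored-procedure logic as a
# pure function so it can be validated against legacy outputs before being
# rebuilt in a distributed pipeline. The rule and codes are illustrative only.

def rtt_clock_stopped(pathway):
    """Return True if the RTT clock has stopped for this pathway record.
    The order of checks matters, mirroring procedural SQL: an explicit
    clock-stop outcome takes precedence over a treatment-start date."""
    if pathway.get("outcome_code") in {"30", "31", "32"}:  # hypothetical codes
        return True
    if pathway.get("treatment_start_date") is not None:
        return True
    if pathway.get("patient_declined"):
        return True
    return False

# Validate the reimplementation against rows with known legacy outputs.
legacy_cases = [
    ({"outcome_code": "30", "treatment_start_date": None}, True),
    ({"outcome_code": "20", "treatment_start_date": "2025-11-03"}, True),
    ({"outcome_code": "20", "treatment_start_date": None,
      "patient_declined": False}, False),
]
assert all(rtt_clock_stopped(row) == expected for row, expected in legacy_cases)
```

The value of this intermediate form is that the discrepancies surface during migration, case by case, rather than months later when the legacy and Foundry numbers are first compared in anger.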

SSIS Packages and the Fragility of Visual ETL

SSIS packages present a different but equally significant challenge. SSIS is a visual ETL tool — its logic is expressed through a drag-and-drop interface of data flow tasks, conditional splits, lookup transformations, and script components, stored in XML package files that are, in principle, readable but extremely difficult to interpret without the SSIS designer. This means that the business logic encoded in an SSIS package is even harder to extract than that in a stored procedure, because it was never expressed in a queryable form to begin with. The only way to understand what an SSIS package does is to open it, run it against test data, and trace the outputs.

SSIS packages in NHS environments also tend to accumulate dependencies that are not visible in the package itself. A package might rely on a staging table created by a different package, a reference data table updated by a manual process, a linked server connection to an external system, or a Windows scheduled task that must run first to prepare an input file. These dependencies are rarely documented. They are discovered by running the package and watching what fails. In a migration context, this means that every SSIS package needs to be treated as a potential dependency graph that must be traced before any part of it can be safely retired.
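Once those hidden dependencies have been traced, they can be recorded explicitly and used to compute a safe migration and retirement order. The sketch below, with hypothetical package names, uses Python's standard-library `graphlib` (3.9+) to topologically sort a package dependency map so that nothing is retired before the things that feed it.

```python
# Sketch: treating each SSIS package (or external preparatory step) as a
# node in a dependency graph and computing a safe migration/retirement
# order via topological sort. Names and dependencies are hypothetical.

from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# package -> set of things it depends on
deps = {
    "Load_Warehouse": {"Stage_PAS", "Stage_Theatres"},
    "Stage_PAS": {"Prepare_Input_File"},   # e.g. a Windows scheduled task
    "Stage_Theatres": set(),
    "Prepare_Input_File": set(),
}

order = list(TopologicalSorter(deps).static_order())
# Everything a package depends on appears before it in the order, so the
# environment can be migrated and retired leaves-first, never orphaning a
# still-running consumer.
print(order)
```

The hard part, as the text notes, is not the sort; it is discovering the edges in the first place, which usually means running the packages and watching what fails.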

The Clinical Coding Inconsistency Layer

Underneath the ETL complexity lies a more fundamental data quality challenge specific to NHS environments and with no direct equivalent in most other sectors: the inconsistency of clinical coding across source systems and over time. An NHS trust's data warehouse typically draws from a patient administration system, an EPR, a theatre management system, a radiology information system, a pathology system, and several others, each of which encodes clinical events using a mixture of national coding standards and local conventions that have evolved independently.

ICD-10 diagnosis codes, OPCS-4 procedure codes, speciality codes, ward and location identifiers, consultant codes, and referral source codes are used inconsistently across these systems in ways that matter enormously for analytical outputs. A diagnosis code that means one thing in the PAS may be applied with subtly different criteria in the EPR. A speciality code that was used consistently for years may have been partially migrated to a new coding scheme following a system upgrade, leaving the warehouse with a mixture of old and new codes that require a lookup table to harmonise — a lookup table that someone created at the time and that may or may not still be current. NHS England's own Data Quality Maturity Index tracks these issues at a national level, but resolving coding inconsistencies is always a local problem that requires local knowledge.
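The harmonisation step itself is conceptually simple once the lookup table exists; what matters is that unmapped codes are surfaced for review rather than silently passed through. The sketch below uses an invented old-to-new speciality-code mapping to illustrate that pattern.

```python
# Sketch: harmonising a column containing a mixture of old and new
# speciality codes via a locally maintained lookup table. The codes and
# mapping are invented; unmapped codes are flagged, not silently dropped.

OLD_TO_NEW = {"110": "110", "X01": "301", "X02": "302"}  # hypothetical mapping

def harmonise(codes, mapping):
    mapped, unmapped = [], []
    for code in codes:
        if code in mapping:
            mapped.append(mapping[code])
        else:
            unmapped.append(code)   # surface for data-quality review
            mapped.append(None)
    return mapped, sorted(set(unmapped))

mapped, unknown = harmonise(["110", "X01", "ZZZ"], OLD_TO_NEW)
print(mapped, unknown)  # ['110', '301', None] ['ZZZ']
```

The returned `unknown` list is the governance artefact: it is the concrete evidence that forces someone with local knowledge to decide what those codes meant and whether the lookup table is still current.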

The reason this matters for a Foundry migration is that Foundry's analytical capabilities — its ontology layer, its object model, its AI-ready pipeline architecture — all depend on the data being semantically consistent before it reaches the semantic layer. If diagnosis codes are inconsistent in the raw layer, the ontological model built on top will encode that inconsistency, and every downstream analytical and AI use case will inherit it. The migration is therefore an opportunity — and, more precisely, an obligation — to clean up the coding inconsistencies that have been accumulating in the legacy environment, rather than simply carrying them forward into a more expensive platform.

3. The Ingestion Architecture Decisions

Once the legacy environment has been properly understood, the migration team faces a series of architectural decisions about how data will flow from source systems into Foundry. These decisions appear technical on the surface, but they have significant long-term implications for the trust's analytical capability and are made under time pressure with incomplete information. Getting them right requires both engineering expertise and an understanding of the analytical use cases the platform needs to support.

Batch, Micro-batch, or Streaming: Choosing the Right Ingestion Pattern

The most fundamental decision in ingestion architecture is choosing between batch, micro-batch, and streaming processing. Foundry's Pipeline Builder supports all three patterns, and each has different implications for data latency, compute cost, and engineering complexity.

Legacy NHS data warehouses are almost universally built on overnight batch processing. The SSIS jobs run after the working day, extract data from source systems, apply transformations, and load the results into the warehouse ready for the next morning's reporting. This means that operational reports in the morning reflect last night's data, not this morning's clinical activity. For most traditional reporting purposes, this has been acceptable. For the use cases that FDP is designed to enable — real-time bed management, near-real-time RTT pathway monitoring, intraday theatre utilisation tracking — overnight batch processing is a fundamental limitation.

The question for migration teams is not simply whether to move to real-time data, but which data elements genuinely need to be near-real-time and which can continue on a batch cadence without sacrificing the analytical value the platform is meant to deliver. Making everything real-time is technically achievable, but it is expensive in compute terms and operationally complex to maintain. Making nothing real-time defeats the purpose of moving to Foundry. The right answer is a tiered approach: high-priority operational datasets on a near-real-time or micro-batch schedule; administrative and reference data on a daily or weekly batch schedule; and historical analytical datasets on a snapshot basis. Designing this tiering correctly requires a conversation with operational and analytical users about which decisions they actually need to make in real time, a conversation that is rarely had with sufficient rigour before migration begins.
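One way to keep that tiering honest is to make it explicit configuration rather than an implicit scatter of scheduler settings, so every dataset's latency decision is documented and reviewable. The sketch below uses invented dataset names and cadences purely to show the shape of such a design artefact.

```python
# Sketch: expressing the tiered ingestion design as explicit configuration,
# so each dataset's latency decision is documented rather than buried in
# scheduler settings. Dataset names and cadences are illustrative examples.

INGESTION_TIERS = {
    "near_real_time": {"cadence": "5m",  "datasets": ["bed_state", "ed_attendances"]},
    "daily_batch":    {"cadence": "24h", "datasets": ["rtt_pathways", "theatre_sessions"]},
    "weekly_batch":   {"cadence": "7d",  "datasets": ["reference_consultants"]},
}

def cadence_for(dataset):
    """Look up the agreed cadence; an unclassified dataset is an error,
    forcing the tiering conversation rather than defaulting silently."""
    for tier in INGESTION_TIERS.values():
        if dataset in tier["datasets"]:
            return tier["cadence"]
    raise KeyError(f"{dataset!r} has no agreed ingestion tier")

print(cadence_for("bed_state"))  # 5m
```

Deliberately raising on an unclassified dataset, rather than defaulting it to a tier, is the point: it forces the conversation with operational users that the text argues is so often skipped.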

Incremental Ingestion and the Change Data Capture Problem

Even for data ingested on a batch schedule, the choice between snapshot and incremental ingestion has major long-term consequences. Snapshot ingestion, which periodically copies entire tables from source systems, is simple to implement and easy to understand, but it creates several problems at scale. For large tables in NHS source systems, such as the patient encounter tables in an EPR that may contain tens of millions of rows, snapshot ingestion is prohibitively slow and expensive. It makes it impossible to build an accurate change history, because snapshots only capture the current state of a record, not the sequence of changes that led to it. And it generates large volumes of redundant data on every cycle, increasing storage costs and pipeline execution times.

Incremental ingestion, which captures only the records that have changed since the last extraction, is more efficient but requires source systems to support change detection. In NHS environments, this is highly variable. Modern EPR systems typically provide change data capture mechanisms, either via database-level CDC or API-based event streams. Legacy patient administration systems, laboratory systems, and theatre management systems may have no reliable mechanism for identifying which records have changed since a given timestamp, forcing the migration team to implement workarounds: comparing hash values across full table scans, relying on update timestamp fields that may not be reliably populated, or accepting that some changes will be missed and building reconciliation processes to catch them.

The documentation for these source system limitations is almost never complete. The migration team discovers them through testing, which means the ingestion architecture must be designed with sufficient flexibility to accommodate source systems whose change-detection capabilities differ from expectations. Trusts that commit to a pure incremental architecture before understanding their source systems' limitations will find themselves building compensating mechanisms under pressure.
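The hash-comparison workaround mentioned above can be sketched in a few lines. The example below, with a hypothetical column layout, computes a per-row hash over the business columns and diffs two extracts to classify rows as inserted, updated, or deleted, which is the essence of change detection for a source with no reliable CDC.

```python
# Sketch: hash-based change detection for a source table with no reliable
# CDC. Per-row hashes from the previous extract are compared against the
# current one to emit only inserts, updates, and deletes. The column
# layout is hypothetical.

import hashlib

def row_hash(row):
    # Stable hash over the row's business columns (key order fixed by sort).
    payload = "|".join(str(row[k]) for k in sorted(row) if k != "pk")
    return hashlib.sha256(payload.encode()).hexdigest()

def diff(previous, current):
    """previous/current: dicts of pk -> row. Returns changed pk lists."""
    prev_hashes = {pk: row_hash(r) for pk, r in previous.items()}
    curr_hashes = {pk: row_hash(r) for pk, r in current.items()}
    inserted = [pk for pk in curr_hashes if pk not in prev_hashes]
    deleted  = [pk for pk in prev_hashes if pk not in curr_hashes]
    updated  = [pk for pk in curr_hashes
                if pk in prev_hashes and curr_hashes[pk] != prev_hashes[pk]]
    return inserted, updated, deleted

prev = {1: {"pk": 1, "ward": "A1"}, 2: {"pk": 2, "ward": "B2"}}
curr = {1: {"pk": 1, "ward": "A1"}, 2: {"pk": 2, "ward": "C3"},
        3: {"pk": 3, "ward": "D4"}}
print(diff(prev, curr))  # ([3], [2], [])
```

The operational cost is visible in the sketch: both extracts must be scanned in full on every cycle, which is exactly why this is a workaround for sources that cannot do better, not a preferred architecture.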

Schema Validation and the Data Contract

One of the most important and consistently underestimated aspects of a Foundry migration is establishing a formal data contract between the source systems and the ingestion pipeline. A data contract is a specification of what the ingestion pipeline expects to receive from a source system: the schema of each dataset, the data types of each field, acceptable value ranges, referential integrity relationships between tables, and expected volumes and update frequencies.

In a legacy NHS data warehouse, this contract was often implicit and informal. The SSIS packages were written to handle whatever the source system sent, accommodating schema variations and data quality issues through a combination of explicit transformation logic and quiet failure modes: records that failed validation were silently dropped or written to an error table that nobody reviewed. Over time, the accumulation of these quiet failures creates a data environment where nobody is entirely sure which records have been excluded from reporting and why.

Foundry's architecture creates an opportunity to make this contract explicit and enforced. Pipeline Builder supports schema validation as a first-class feature, and the platform's data lineage capabilities trace every record from source to output, making it possible to understand exactly what has been included and excluded at every stage of the pipeline. But enforcing a strict data contract in a Foundry migration requires the migration team to deliberately choose to do so, document the contract formally, implement validation rules in the pipeline, and establish a process for handling the inevitable validation failures. This is more work upfront than the loose, implicit contract of the legacy environment. The organisations that choose to do it correctly discover, often painfully, how much data quality debt they were carrying in their legacy environment without knowing it.
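A data contract does not need to start sophisticated to be useful. The sketch below shows a minimal, explicitly documented contract enforced at ingestion; the fields, types, and ranges are invented, and a real contract would be richer and enforced through the platform's own schema-validation features rather than hand-rolled code.

```python
# Sketch: a minimal explicit data contract enforced at ingestion. Fields,
# types, and ranges are invented for illustration. Validation failures are
# returned explicitly rather than silently dropped.

CONTRACT = {
    "nhs_number": {"type": str, "required": True},
    "los_days":   {"type": int, "required": True, "min": 0, "max": 3650},
    "ward_code":  {"type": str, "required": False},
}

def validate(record, contract):
    errors = []
    for field, rule in contract.items():
        value = record.get(field)
        if value is None:
            if rule["required"]:
                errors.append(f"{field}: missing")
            continue
        if not isinstance(value, rule["type"]):
            errors.append(f"{field}: wrong type {type(value).__name__}")
            continue
        if "min" in rule and value < rule["min"]:
            errors.append(f"{field}: below minimum")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{field}: above maximum")
    return errors

ok = validate({"nhs_number": "0000000000", "los_days": 4}, CONTRACT)
bad = validate({"los_days": -1}, CONTRACT)
print(ok, bad)  # [] ['nhs_number: missing', 'los_days: below minimum']
```

The contrast with the legacy pattern is the return value: every exclusion is named and attributable, rather than vanishing into an unreviewed error table.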

4. The Parallel Running Problem

No serious data migration programme decommissions the legacy environment the day the new platform goes live. There will be a period, typically measured in months rather than weeks, during which both the legacy data warehouse and the Foundry environment are running simultaneously, producing analytical outputs that may or may not agree with each other. Managing this parallel-running period is one of the most practically demanding aspects of a Foundry migration and one of the least discussed.

The core challenge is that when the outputs of the legacy and Foundry environments disagree (and they will, because migrating complex business logic always introduces discrepancies that must be investigated and resolved), someone needs to adjudicate. This means having enough understanding of the business logic in both environments to identify whether the discrepancy reflects an error in the migration, an improvement in the new environment, or a data quality issue in the source systems that the legacy environment was quietly obscuring. It requires clinical informatics expertise and operational knowledge that is genuinely hard to find, and it requires organisational willingness to make definitive decisions about which version of a disputed number is correct.
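The adjudication process works best when fed by a routine reconciliation report rather than ad hoc complaints. The sketch below, with invented metric names and figures, compares the same metrics from both environments against an agreed tolerance and emits only the discrepancies that need a decision.

```python
# Sketch: a reconciliation report for the parallel-running period, comparing
# the same metrics from the legacy warehouse and from Foundry and flagging
# discrepancies above an agreed tolerance for adjudication. All metric
# names and figures are invented.

def reconcile(legacy, foundry, tolerance=0.0):
    """legacy/foundry: dicts of metric -> value. Returns discrepancies."""
    issues = {}
    for metric in sorted(set(legacy) | set(foundry)):
        lv, fv = legacy.get(metric), foundry.get(metric)
        if lv is None or fv is None:
            issues[metric] = ("missing", lv, fv)   # present in only one side
        elif abs(lv - fv) > tolerance:
            issues[metric] = ("mismatch", lv, fv)  # needs adjudication
    return issues

legacy  = {"rtt_open_pathways": 14230, "delayed_discharges": 87}
foundry = {"rtt_open_pathways": 14198, "delayed_discharges": 87}
print(reconcile(legacy, foundry))
# {'rtt_open_pathways': ('mismatch', 14230, 14198)}
```

The code is trivial; what is not trivial, as the text argues, is deciding who owns each flagged discrepancy and which number becomes the organisation's official one.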

Report Migration: The Iceberg Below the Waterline

Most legacy NHS data warehouse environments contain far more reports than anyone realises. The inventory exercise described earlier will typically uncover hundreds of reports across SSRS, Business Objects, and Power BI, many of which have not been accessed in years but which cannot be decommissioned because nobody is certain whether they are still needed. These reports represent a significant migration challenge that is often underestimated in project plans.

Not all of these reports need to be migrated to Foundry. Some will genuinely be obsolete. Some will be superseded by new analytical capabilities in Foundry that deliver the same information better. But the process of identifying which reports are genuinely obsolete, which need to be rebuilt in Foundry's analytical tools, and which can be replaced by the standard Foundry products requires a review of each report against current business requirements, a process that is slow, requires engagement from operational stakeholders who have limited time, and often surfaces disagreements about what the numbers should mean that had been buried in the legacy environment.

The particular challenge with SSRS-to-Foundry migration is that SSRS and Foundry's analytical tools, Contour and Quiver, have fundamentally different presentation models. SSRS produces paginated, pixel-perfect reports that are optimised for printing and regulatory submission. Foundry's tools produce interactive, exploratory visualisations that are optimised for analysis and decision support. Many SSRS reports in NHS environments are submission reports — they exist to produce a specific output in a specific format for a specific national return. These cannot simply be replaced by a Contour dashboard. They need to continue functioning, which means the migration plan must account for ongoing SSRS or equivalent capability alongside the Foundry environment, rather than treating it as a clean cutover.

5. The People Problem: What Skills This Work Actually Requires

Every conversation about NHS data platform migration eventually arrives at the same point: the skills required to do this work well are not the skills that most NHS informatics teams have, and they are not the skills that most technology vendors bring either. The combination of capabilities needed is specific and genuinely uncommon.

The migration team needs people who understand the NHS clinical and operational domain deeply enough to make authoritative decisions about what business logic means and how it should be translated. They need people who understand the SQL Server stack (stored procedures, SSIS packages, SSRS reports, SSAS cubes) well enough to read the legacy environment accurately and understand what each component does. They need people who understand Palantir Foundry's architecture, pipeline model, and ontological framework well enough to design the target environment correctly. And they need people who can hold conversations with clinical leads, directorate managers, information governance teams, and senior operational staff to resolve the definitional disputes and governance questions that the migration inevitably surfaces.

These are four different skill profiles. Most NHS trusts have some capability in the first two and very little in the third and fourth. Most Foundry implementation partners have capability in the third and limited depth in the first and second. The organisations that navigate this migration most successfully are those that bring together NHS clinical informatics expertise and Foundry engineering capability in the same team, working collaboratively rather than in separate workstreams that hand off to each other at defined interfaces.

The Knowledge Transfer Imperative

There is also a knowledge transfer dimension to the migration that is frequently neglected. When the Foundry environment goes live and the legacy data warehouse is eventually decommissioned, the organisation needs to retain the institutional knowledge that was embedded in the legacy environment — not in the form of archived SSIS packages that nobody can run, but in the form of documented business logic, validated data definitions, and operational understanding of how the data flows and what each transformation does. This documentation needs to be created during the migration, not after it, because the migration is the only time when both the people who understand the legacy environment and the people who are building the new one are working on the same problem simultaneously.

Trusts that treat the migration as a pure engineering exercise and defer the documentation work tend to find that they arrive at a Foundry environment that works, but that is as opaque as the legacy environment it replaced — a new set of Pipeline Builder graphs that produce the right outputs, but whose underlying logic is understood by the implementation team and nobody else. The moment the implementation team moves on, the organisation has the same knowledge fragility it started with, just on a more expensive platform.

6. The Coexistence Phase and What Comes After

The final set of migration decisions that nobody talks about honestly concerns what happens after the initial migration is complete. The NHS's FDP implementation timeline expects trusts to onboard to the platform, connect their source systems, and begin using the core FDP products. But connecting source systems and ingesting data is only the beginning of what a trust needs to do to realise the analytical and AI capabilities that the platform is designed to enable. The journey from initial onboarding to a fully functional, AI-ready analytical environment is measured in years, not months.

During the coexistence phase — when the Foundry environment is live but the legacy data warehouse is still running — the trust needs to be clear about which analytical outputs are authoritative. This is not a technical question. It is a governance question. The trust needs to decide, for each significant analytical domain, at what point the Foundry output becomes the organisation's official number and the legacy output is retired. These decisions need to be made by the people who are accountable for those numbers — finance directors, clinical directors, information governance leads — not by the migration team. Getting these stakeholders engaged in the migration governance at the right moment is one of the most consistently underestimated project management challenges of the whole exercise.

Managed Service as a Migration Outcome, Not an Afterthought

The question of who maintains the Foundry environment after the migration team hands over is one that should be answered before the migration begins, not after. A Foundry environment for an acute trust is not self-maintaining. Pipelines need to be monitored. Schema changes in source systems need to be detected and handled. New data sources need to be onboarded. Business definitions need to be updated as clinical and operational processes change. The data model needs to be extended as new analytical use cases are identified.

All of this requires a sustained, skilled team — a combination of Foundry engineers who can maintain and extend the pipeline layer, data architects who can evolve the CDM and ontological model, clinical informatics specialists who can adjudicate definitional questions, and service management professionals who can operate the environment to agreed quality standards. This team does not need to be large, but it needs to be stable, well-defined, and adequately funded. Trusts that treat the migration as the expensive part and the ongoing management as the cheap part consistently underestimate what good ongoing management actually costs and overestimate how much the platform can maintain itself.

 

The legacy-to-Foundry migration is not a technology project. It is a knowledge management project, a governance project, a clinical informatics project, and an organisational change project, all of which happen to require significant technology expertise. Trusts that understand this framing from the outset — and that bring together the right mix of clinical domain knowledge, NHS data engineering experience, and Foundry technical capability in a single, integrated delivery team — are the ones that emerge from the migration with a Foundry environment that genuinely delivers on the platform's analytical potential. Trusts that treat it as a lift-and-shift exercise with a new interface on top find that they have spent considerably more than planned to arrive at an environment that is not significantly better than the one they left.

VE3 works with NHS trusts and ICBs on FDP+ enablement and assurance, including the legacy data migration work that determines whether a Foundry implementation delivers lasting value. Our teams combine NHS clinical informatics expertise, deep SQL Server data warehouse experience, and Palantir Foundry engineering capability — bringing together the skills that a migration of this complexity genuinely requires. If your organisation is planning or has already begun a legacy-to-Foundry migration and wants a frank conversation about the decisions that will determine its success, we would welcome the dialogue.

 

ve3.global  |  info@ve3.global  |  FDP+ Enablement & Assurance Practice

© 2026 VE3. All rights reserved.