Subsurface anomalies are variations in geophysical data that may point to something of interest beneath the surface. Detecting them is often the easy part; determining which anomalies represent real targets is where the challenge begins.

Ask any geophysicist who has spent time in the field what the biggest frustration is, and the answer is rarely the sensors themselves. Modern magnetometers are highly sensitive, remarkably stable, and more affordable than ever. UAV and AUV platforms have transformed data acquisition, making large-area surveys routine and efficient. In many cases, the data arrives clean, consistent, and ready for analysis.

The real challenge begins after the survey is complete.

Consider a 200-square-kilometre survey area where the system has identified 600 potential anomalies. At first glance, that may seem like a successful outcome. In reality, it creates a different problem.

A geophysicist must now determine which of those 600 anomalies represent genuine targets and which are simply geological background variations, sensor artefacts, processing effects, or threshold-triggered false alarms. In practice, a significant proportion—often the majority—will prove to be false positives.

The challenge is that every flagged anomaly carries a cost. Each one requires analysis, prioritisation, and, in many cases, field verification. That means additional mobilisation, equipment deployment, personnel time, and budget. As survey coverage expands and datasets grow larger, the bottleneck is no longer data acquisition—it is efficiently separating the few meaningful targets from the hundreds of distractions.

The industry has long looked to AI to reduce false positives and accelerate target detection. Yet progress has been slower than many anticipated.

The bottleneck is not the algorithms. It is the data - specifically, the lack of large, high-quality labelled datasets needed to train and validate AI models.

This post explores why that shortage exists, why the common workarounds fall short, and why physics-based synthetic data generation may be the most practical path forward.

The Labelled Data Problem Is Structural, Not Incidental

Labelled data is the fuel that powers supervised machine learning. A model is trained on thousands of examples—this is a target, this is background noise—and gradually learns to distinguish between the two. This approach has proven highly effective in domains where labels are readily available, such as medical imaging, manufacturing quality control, and fraud detection.

Geophysical surveying, however, presents a fundamentally different challenge.

Think about what labelling requires in a marine UXO survey. You need to know the exact location, depth, orientation, and material composition of each buried object - before the survey. If you already knew all of that, you would not be running the survey. The ground truth that makes supervised learning work is precisely the thing you are trying to discover.

On land, the situation is marginally better, but not by much. Labelled datasets do exist for certain subsurface detection applications, built painstakingly from excavation records, known burial sites, controlled test ranges, and verified ground-truth data. But these datasets are typically narrow in scope and rarely capture the full range of conditions encountered in real surveys.

A model trained on 155mm artillery shells in sandy loam at 0.5 metre depth does not reliably transfer to a different ordnance type, a different soil composition, or a different sensor altitude. The physics changes. The model does not know that.

Infrastructure inspection faces a similar challenge. You may know a pipeline exists within a survey corridor, but its exact condition, depth, and any anomalous sections are often uncertain. The survey exists to resolve that uncertainty. Yet supervised machine learning depends on labelled data, and reliable labels require the very certainty the survey is trying to provide.

This is not a data collection problem that more budget solves. It is structural. The nature of subsurface survey work means labelled training data will always be scarce relative to what supervised learning needs to perform reliably.

Three Workarounds and Why None of Them Is Enough

The industry has not been sitting still. Teams have developed practical responses to the labelled data problem. Each one is rational. Each one has a ceiling.

Manual interpretation by experienced operators

This remains the backbone of much operational survey analysis today. Geophysicists often work with Total Magnetic Intensity (TMI) data, examining anomaly maps and profiles for signatures that may indicate a subsurface target. A skilled interpreter can often make a reasonable judgement about whether an anomaly warrants further investigation. That expertise is real and should not be dismissed.

But it does not scale, and it is not consistent. Two interpreters looking at the same dataset will not always agree. Interpretation speed becomes a bottleneck on large surveys. And when that experienced operator leaves the organisation, the knowledge goes with them. You cannot version-control a person's intuition.

Rule-based thresholding

Set a signal amplitude threshold for TMI. Flag anything above it. Simple, fast, auditable.

Also brittle in a way that anyone who has worked with real survey data will recognise immediately. The right threshold shifts with sensor altitude, background geological variation, object depth, and object orientation relative to the Earth's magnetic field. Calibrate for one survey area and you are over-flagging in the next one, or worse, missing targets entirely. A threshold is a heuristic pretending to be a method.
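To make that brittleness concrete, here is a minimal sketch using the on-axis field of a point dipole, which falls off with the cube of distance. The 1 A·m² moment, the 10 nT threshold, and the function names are illustrative assumptions, not values from any real survey.

```python
MU0_OVER_4PI = 1e-7  # mu_0 / (4*pi) in T·m/A


def peak_on_axis_nt(moment_am2, distance_m):
    """Peak field (nT) on the axis of a point dipole at the given distance."""
    return MU0_OVER_4PI * 2.0 * moment_am2 / distance_m ** 3 * 1e9


def is_flagged(peak_nt, threshold_nt):
    """Rule-based detection: flag anything at or above the amplitude threshold."""
    return abs(peak_nt) >= threshold_nt


THRESHOLD_NT = 10.0  # hypothetical threshold calibrated for one survey
MOMENT_AM2 = 1.0     # hypothetical object, identical in every case below

# Same object, same threshold; only the sensor-to-object distance changes.
for distance_m in (2.0, 3.0, 4.0):
    peak = peak_on_axis_nt(MOMENT_AM2, distance_m)
    print(f"{distance_m:.1f} m: {peak:6.2f} nT  flagged={is_flagged(peak, THRESHOLD_NT)}")
```

Under these assumptions the object is flagged at 2 m and missed at 4 m, even though nothing about the object itself has changed.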

Transfer learning from adjacent domains

Another common approach is transfer learning: taking a model trained on a different domain - such as industrial inspection, medical imaging, or non-destructive testing - and adapting it to geophysical data. The idea is theoretically sound. In practice, however, the underlying signal characteristics and feature distributions are often too different for the model to transfer effectively. Significant fine-tuning is usually required, which in turn demands labelled geophysical data and brings you straight back to the original problem.

These are not bad approaches. In many operational contexts they are the best available option, and they get the job done. But they leave a consistent gap between what AI-assisted detection could deliver and what it actually delivers on operational surveys. Closing that gap requires a different starting point.

The Physics Has Always Been There. We Just Were Not Using It Properly.

Here is the thing about magnetic anomaly detection that makes it different from most detection problems. The physics governing how a buried ferromagnetic object disturbs the Earth's ambient field is not mysterious. It is mathematically precise, well-documented, and computationally tractable.

A buried object with a magnetic moment produces a disturbance in the local field that follows the magnetic dipole equations. Given the object's moment vector, its depth, the sensor geometry, and the background field direction, you can calculate what the TMI signal will look like at the sensor. Not as a rough estimate - to within the accuracy of the dipole model, which holds well when the object is small relative to its distance from the sensor, modulo measurement noise.
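For reference, the standard point-dipole expressions behind that statement can be written as follows. The symbols are notation chosen here for illustration: m is the object's moment vector, r the vector from the object to the sensor with length r, and F-hat the unit vector along the ambient field. The second expression is the usual small-anomaly approximation, valid when the disturbance is much weaker than the ambient field.

```latex
\mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi}\,
  \frac{3\,(\mathbf{m}\cdot\hat{\mathbf{r}})\,\hat{\mathbf{r}} - \mathbf{m}}{r^{3}},
\qquad
\Delta T \approx \mathbf{B}(\mathbf{r})\cdot\hat{\mathbf{F}}
```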

That means you do not need field excavations to generate labelled training data. You need a physics model and a parameter range.

Vary the object's magnetic moment across a realistic range. Vary the burial depth. Vary the sensor altitude. Vary the orientation. Add measurement noise. Run the calculation for each combination. What you get is a synthetic dataset that is physically accurate, fully labelled by construction, and as large as you need it to be.
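As a rough sketch of that recipe, the snippet below generates labelled single-channel TMI profiles from a point-dipole model. Everything specific in it is an assumption made for illustration rather than the setup used in the research: the parameter ranges, the 65-degree field inclination, the 0.5 nT noise level, the straight north-south flight line, and all function names.

```python
import math
import random

MU0_OVER_4PI = 1e-7  # mu_0 / (4*pi) in T·m/A


def dipole_field(moment, offset):
    """Field (T) of a point dipole `moment` (A·m²) at displacement `offset` (m)."""
    r = math.sqrt(sum(c * c for c in offset))
    r_hat = [c / r for c in offset]
    m_dot_r = sum(m * c for m, c in zip(moment, r_hat))
    return [MU0_OVER_4PI * (3.0 * m_dot_r * rh - m) / r ** 3
            for m, rh in zip(moment, r_hat)]


def ambient_direction(inclination_deg, declination_deg):
    """Unit vector of the ambient field in (north, east, down) axes."""
    inc = math.radians(inclination_deg)
    dec = math.radians(declination_deg)
    return [math.cos(inc) * math.cos(dec),
            math.cos(inc) * math.sin(dec),
            math.sin(inc)]


def tmi_profile(moment, depth, altitude, field_dir, xs, noise_nt, rng):
    """Noisy TMI anomaly (nT) along a north-south line passing over the object."""
    profile = []
    for x in xs:
        # Object sits `depth` below ground, sensor `altitude` above it (z is down).
        offset = [x, 0.0, -(depth + altitude)]
        b = dipole_field(moment, offset)
        anomaly_nt = sum(bc * fc for bc, fc in zip(b, field_dir)) * 1e9
        profile.append(anomaly_nt + rng.gauss(0.0, noise_nt))
    return profile


def generate_dataset(n_samples, seed=0, noise_nt=0.5):
    """Return labelled samples; label 1 = buried object, 0 = noise only."""
    rng = random.Random(seed)
    field_dir = ambient_direction(65.0, 0.0)       # assumed mid-latitude field
    xs = [i * 0.5 - 10.0 for i in range(41)]       # 20 m line, 0.5 m spacing
    samples = []
    for _ in range(n_samples):
        has_target = rng.random() < 0.5
        depth = rng.uniform(0.5, 3.0)              # assumed range, metres
        altitude = rng.uniform(1.0, 3.0)           # assumed range, metres
        strength = rng.uniform(0.5, 5.0) if has_target else 0.0  # A·m²
        # Random moment orientation, drawn uniformly over the sphere.
        direction = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in direction))
        moment = [strength * c / norm for c in direction]
        samples.append({
            "profile": tmi_profile(moment, depth, altitude, field_dir,
                                   xs, noise_nt, rng),
            "label": int(has_target),
            "depth": depth,
            "altitude": altitude,
            "moment": moment,
        })
    return samples


dataset = generate_dataset(200, seed=42)
print(len(dataset), "samples,", sum(s["label"] for s in dataset), "with targets")
```

Every sample carries its own ground truth because the parameters that produced it are stored alongside the signal. Making the ranges and the noise model realistic is where the real effort goes.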

This is not a shortcut. It is not an ad hoc approximation. It is a rigorous application of the same physics that geophysicists have used to interpret magnetic survey data for decades - applied upstream, at the data generation stage, rather than only at the interpretation stage.

The insight sounds simple once stated. The execution requires care - making sure the parameter ranges are realistic, the noise model is appropriate, the feature extraction preserves the physically meaningful signal. But the principle is sound, and the results from applying it are compelling.

What This Looks Like in Practice

Research from VE3 puts this approach to the test in a structured experiment. Synthetic multi-sensor magnetometry data is generated using a magnetic dipole model across a wide range of object configurations - varying moment strength and orientation, burial depth, sensor altitude, and Gaussian noise levels. The output is a multi-channel time-series dataset that looks, statistically, like real survey data.

Rather than feeding raw time-series into a model - expensive and noise-sensitive - the framework extracts a compact set of physically meaningful features: the maximum TMI value, the minimum, the mean, and the peak-to-peak amplitude across each sensor channel. These four features capture what matters about an anomaly response. They discard what does not.
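A minimal sketch of that feature extraction might look like the following. The function names are hypothetical, and concatenating the per-channel features into one vector is an assumption about how the channels are combined.

```python
from statistics import fmean


def channel_features(channel):
    """Return [max, min, mean, peak-to-peak] for one sensor channel (nT)."""
    highest = max(channel)
    lowest = min(channel)
    return [highest, lowest, fmean(channel), highest - lowest]


def extract_features(channels):
    """Concatenate the four features of every channel into one flat vector."""
    features = []
    for channel in channels:
        features.extend(channel_features(channel))
    return features


# Two hypothetical sensor channels recorded over the same anomaly.
channels = [
    [0.2, 1.5, 12.0, 3.1, -4.0, -0.3],
    [0.1, 0.9, 7.5, 2.0, -2.5, -0.2],
]
print(extract_features(channels))
```

Each window of samples collapses to four numbers per channel, which keeps the clustering step cheap and keeps every input interpretable.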

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) - a density-based clustering algorithm that does not require you to specify the number of clusters in advance - is applied to that feature space. No labels are provided. The algorithm partitions the data into coherent groups based on density alone. The result is a clean separation between anomaly and non-anomaly clusters.
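For readers who have not used DBSCAN, here is a dependency-free sketch of the algorithm applied to toy four-feature vectors. In practice a library implementation such as scikit-learn's `DBSCAN` would be the usual choice. The feature values and the ε of 2.0 are made up for illustration and are not taken from the research.

```python
import math

NOISE = -1


def region_query(points, index, eps):
    """Indices of all points within `eps` of points[index], itself included."""
    return [j for j, q in enumerate(points)
            if math.dist(points[index], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue                      # already assigned
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE             # may later become a border point
            continue
        cluster += 1                      # i is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster       # border point: join, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also core: keep growing
    return labels


# Toy [max, min, mean, peak-to-peak] vectors in nT (illustrative values only).
features = [
    (1.0, -1.0, 0.0, 2.0), (1.1, -0.9, 0.0, 2.0), (0.9, -1.1, 0.0, 2.0),
    (25.0, -3.0, 4.0, 28.0), (25.5, -3.2, 4.1, 28.7), (24.6, -2.9, 3.9, 27.5),
    (80.0, -40.0, 5.0, 120.0),
]
print(dbscan(features, eps=2.0, min_pts=3))
```

No labels go in. The low-amplitude and high-amplitude groups fall into separate clusters, and the isolated vector is marked as noise.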

What stands out in the results is not just that it works. It is that it works predictably. The research provides specific parameter guidance. For a 40-sample dataset, values of ε (the maximum distance between two data points for them to be considered neighbours) between 1.04 and 1.05, with a minimum neighbour count (MinPts) of 3, produce stable clusters with minimal noise points. As the dataset grows to 300 samples, the optimal ε drops to 0.6 and MinPts rises to 9. That kind of concrete, actionable guidance is what separates a research result from a research experiment.

Why This Actually Matters for Practitioners

A few things in this research deserve more attention than they typically get in academic writeups.

No labelled field data required - at any stage. The synthetic data generation produces the labels. The clustering finds the structure. An operational team can run this framework on a new survey environment without waiting to accumulate site-specific training data.

The features are physically interpretable. Maximum TMI, minimum TMI, peak-to-peak amplitude - every number means something to a geophysicist. When the model flags an anomaly, you can explain why in terms that domain experts already understand. That matters enormously for safety-critical clearance decisions, where a black-box result is not acceptable.

It scales. The parameter guidance works across dataset sizes from 20 to 300 samples with documented settings at each scale. This is not a proof of concept that breaks when you move to operational data volumes.

It is computationally lightweight. DBSCAN on a four-feature vector runs fast. This opens the door to onboard processing on UAV and AUV platforms - real-time flagging during the survey pass rather than post-processing back at the office.

To be direct about the limitations: this framework is validated on synthetic data. Real survey environments introduce complexities - geological heterogeneity, sensor drift, multi-object interference, platform motion artefacts - that controlled synthetic generation cannot fully replicate. The research is explicit about this. It is a foundation for preliminary anomaly identification, not a complete replacement for the full analysis pipeline.

But that is exactly the right framing. Preliminary identification is where the false positive problem is worst. If a physics-based unsupervised model can reduce the 600-anomaly list to 80 high-confidence targets before human interpretation begins, without increasing the risk of missing a genuine target, the impact on survey efficiency is substantial.

The Labelled Data Problem Has a Credible Answer. Here It Is.

The labelled data problem in subsurface anomaly detection is not going away. But for the first time, it is not the blocker it used to be.

Physics has always told us what a magnetic anomaly looks like. We just needed to let it speak before the survey, not after.

Download the full paper and see exactly how that works in practice.