When an AI agent gives a confidently wrong answer or takes the wrong action, the instinct is to blame the model. Usually, the model is not the problem. The data underneath it is.

Agents reason over whatever data they can reach. Feed them a clean, trustworthy foundation and a modest model performs reliably; feed them a messy one and the most capable model on the market will inherit every flaw and present it back to you with total confidence. The difference between a chatbot and an agent makes this acute: a chatbot that errs wastes a moment; an agent that errs acts on the error.

Here are the five data problems we see break enterprise agents most often. Each is common, each is fixable, and each is worth checking for before you scale.

1. Duplicate and unresolved records

What it looks like: The same customer, supplier or product exists as several slightly different records across your systems - "Acme Ltd", "ACME Limited", "Acme (UK)" - and nothing tells you they are the same thing.

Why it breaks agents: An agent treats each record as a separate entity. It double-counts, contradicts itself, and acts on the wrong version - updating one record while the truth lives in another. Ask it "What's our total exposure to this client?" and it answers for one fragment, not the whole.

The fix: Resolve and match records to a single version of the truth, so the agent reasons about one entity rather than five ghosts of it. This is unglamorous, high-impact work and usually the fastest win - it is exactly what our MatchX platform is built for.
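
To make the idea concrete, here is a minimal sketch of name-based matching in Python. It is an illustration only and not a description of how MatchX works: the suffix list, the record fields and the `exposure` figures are all hypothetical, and real entity resolution typically needs fuzzy matching, extra attributes (address, registration number) and human review of borderline cases.

```python
import re
from collections import defaultdict

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"ltd", "limited", "inc", "plc", "llc"}

def normalise_name(name: str) -> str:
    """Reduce a company name to a crude matching key."""
    # Drop bracketed qualifiers such as "(UK)" and lowercase the rest.
    name = re.sub(r"\([^)]*\)", " ", name.lower())
    tokens = re.findall(r"[a-z0-9]+", name)
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)

def group_records(records):
    """Group records that share the same normalised name."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalise_name(rec["name"])].append(rec)
    return dict(groups)

# Made-up sample data: three fragments of one client plus one other client.
records = [
    {"id": 1, "name": "Acme Ltd", "exposure": 100},
    {"id": 2, "name": "ACME Limited", "exposure": 250},
    {"id": 3, "name": "Acme (UK)", "exposure": 50},
    {"id": 4, "name": "Globex Inc", "exposure": 75},
]

groups = group_records(records)
# Total exposure per resolved entity rather than per fragment.
total_exposure = {
    key: sum(r["exposure"] for r in recs) for key, recs in groups.items()
}
print(total_exposure)
```

With the three Acme fragments grouped under one key, a question about total exposure is answered for the whole client rather than for whichever record the agent happened to find first.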

2. Inconsistent definitions

What it looks like: "Active customer" means one thing to sales, another to finance, and something else again in the data warehouse. There is no single, agreed definition of your core business terms.

Why it breaks agents: Two agents - or the same agent on two days - give different answers to the same question, because they are quietly using different definitions. The answers are precise and irreconcilable, which is worse than being obviously wrong, because no one notices until a decision has been made on the wrong basis.

The fix: Agree and capture the definitions of your core business concepts in a shared semantic layer, so every agent starts from the same understanding. This is the role Microsoft's Fabric IQ plays - but it only delivers if the definitions feeding it are agreed and trustworthy in the first place.
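
As a rough sketch of what "every agent starts from the same understanding" can mean in practice, the Python below keeps one agreed definition in a single registry that every caller uses. The 90-day rule, the `DEFINITIONS` registry and the field names are assumptions made up for illustration; this is not Fabric IQ's API, only the underlying principle.

```python
from datetime import date, timedelta

# Hypothetical agreed definition: a customer is "active" if they
# placed an order within the last 90 days.
ACTIVE_WINDOW_DAYS = 90

def is_active_customer(customer: dict, today: date) -> bool:
    last_order = customer.get("last_order_date")
    if last_order is None:
        return False
    return (today - last_order) <= timedelta(days=ACTIVE_WINDOW_DAYS)

# One shared registry of business terms; agents look terms up here
# instead of each carrying a private interpretation.
DEFINITIONS = {"active_customer": is_active_customer}

def count_matching(term: str, customers: list, today: date) -> int:
    """Count customers that satisfy the shared definition of `term`."""
    rule = DEFINITIONS[term]
    return sum(1 for c in customers if rule(c, today))

# Made-up sample data.
today = date(2024, 6, 30)
customers = [
    {"id": 1, "last_order_date": date(2024, 6, 1)},
    {"id": 2, "last_order_date": date(2024, 1, 15)},
    {"id": 3, "last_order_date": None},
    {"id": 4, "last_order_date": date(2024, 5, 15)},
]

# Two different "agents" asking the same question get the same answer,
# because both resolve the term through the same registry.
sales_view = count_matching("active_customer", customers, today)
finance_view = count_matching("active_customer", customers, today)
print(sales_view, finance_view)
```

The code is trivial on purpose. The hard part is getting sales and finance to agree on the rule in the first place; once they do, it should live in exactly one place.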

3. Stale and incomplete records

What it looks like: Records that are out of date, missing fields, or never finished - the contact who left two years ago, the order with no status, the half-populated profile.

Why it breaks agents: Until now, a human in the loop caught the obvious gaps - they knew the contact had left, so they ignored the stale entry. Agents remove that safety net. They take the data at face value and act on it, so an incomplete record becomes an incomplete action, and a stale one becomes a wrong action taken with confidence.

The fix: Establish a baseline of data quality - completeness, freshness, validation - and treat it as a prerequisite for any data an agent will touch, not a nice-to-have. We dig into this in data quality for AI agents.
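
A baseline like this can be expressed as a simple gate that runs before a record is handed to an agent. The sketch below is one possible shape, in Python; the required fields, the one-year freshness threshold and the list of valid statuses are hypothetical and would need to be set per dataset.

```python
from datetime import date, timedelta

# All three thresholds below are illustrative assumptions.
REQUIRED_FIELDS = ("name", "email", "status")
MAX_AGE = timedelta(days=365)
VALID_STATUSES = {"open", "shipped", "closed"}

def quality_issues(record: dict, today: date) -> list:
    """Return a list of quality problems; an empty list means the record passes."""
    issues = []
    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            issues.append(f"missing:{field}")
    # Freshness: the record must have been updated recently enough.
    updated = record.get("last_updated")
    if updated is None or today - updated > MAX_AGE:
        issues.append("stale")
    # Validation: a status, if present, must be one we recognise.
    status = record.get("status")
    if status and status not in VALID_STATUSES:
        issues.append("invalid:status")
    return issues

# Made-up sample records.
today = date(2024, 6, 30)
good = {"name": "Acme Ltd", "email": "ops@example.com",
        "status": "open", "last_updated": date(2024, 5, 1)}
bad = {"name": "Acme Ltd", "email": "old@example.com",
       "status": "", "last_updated": date(2022, 3, 10)}

good_issues = quality_issues(good, today)
bad_issues = quality_issues(bad, today)
print(good_issues, bad_issues)
```

Records that return any issues can be held back, routed to a person, or passed to the agent with the problems attached, so the gap is explicit rather than silently acted on.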

4. Fragmented, siloed sources

What it looks like: The data an agent needs is scattered across systems that do not talk to each other, each with its own format, access model and quirks.

Why it breaks agents: An agent forced to stitch together five disconnected sources at the moment of a query is slow, brittle and prone to gaps - and it often simply cannot reach part of the picture, so it answers from the fragment it can see. Fragmentation also makes governance nearly impossible, because no one has a view of the whole.

The fix: Bring your data into a connected, accessible foundation - the thinking behind consolidating onto a platform like OneLake - so agents reason over one coherent estate rather than a patchwork. This does not mean copying everything into one place; it means an accessible, governed view across it.

5. Ungoverned, unclassified data

What it looks like: Nobody can say with confidence which of your data is sensitive, who should be able to see it, or where it came from. Classification and lineage are patchy or absent.

Why it breaks agents: An agent inherits the governance of the data it touches. If you do not know what is sensitive, you cannot stop an agent reaching it - so a well-meaning agent becomes a fast, tireless way to expose data it should never have seen. And without lineage, you cannot explain where an agent's answer came from, which is a problem the moment a regulator asks.

The fix: Classify your data by sensitivity, control access on a least-privilege basis, and maintain lineage - the governance half of an agent-ready foundation. This connects directly to agent governance: you cannot govern the agent without governing its data, and platforms like MatchX help bring that data under control.
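
One way to picture classification plus least privilege is a filter that sits between the data and the agent. The Python sketch below assumes a hypothetical three-level sensitivity scale and a hand-written field classification; it treats anything unclassified as the most sensitive level, so unknown data is denied by default. It does not cover lineage, which needs its own tooling.

```python
from enum import IntEnum

class Sensitivity(IntEnum):
    # Hypothetical three-level scale, lowest to highest.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2

# Illustrative classification of fields; in practice this would come
# from a data catalogue rather than a hard-coded dictionary.
FIELD_CLASSIFICATION = {
    "company": Sensitivity.PUBLIC,
    "account_manager": Sensitivity.INTERNAL,
    "credit_limit": Sensitivity.CONFIDENTIAL,
}

def visible_fields(record: dict, clearance: Sensitivity) -> dict:
    """Return only the fields an agent with this clearance may see.

    Unclassified fields default to CONFIDENTIAL (deny by default).
    """
    return {
        key: value
        for key, value in record.items()
        if FIELD_CLASSIFICATION.get(key, Sensitivity.CONFIDENTIAL) <= clearance
    }

# Made-up sample record; "notes" has no classification.
record = {
    "company": "Acme Ltd",
    "account_manager": "J. Smith",
    "credit_limit": 50000,
    "notes": "unclassified free text",
}

agent_view = visible_fields(record, Sensitivity.INTERNAL)
print(agent_view)
```

The deny-by-default choice matters most here: an agent with internal clearance never sees the unclassified `notes` field, so patchy classification results in less access, not more.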

The common thread

Notice what these five have in common: not one of them is a model problem. They are all foundation problems. That is the uncomfortable, liberating truth about agent reliability - the biggest gains usually come not from a better model but from better data underneath it. It is less exciting than choosing a model, and considerably more decisive.

It is also why the foundation pays compound interest. The work you do to resolve duplicates, agree definitions, raise quality, connect sources and govern data is not spent on one agent - every future agent stands on it.

Where to start

You do not need a six-month audit to know which of these five apply to you; you can usually feel it the moment you read the list. To turn that instinct into a plan, our Agent-Ready Data checklist scores your foundation across exactly these dimensions in an afternoon - so you find the problems before an agent does.

VE3 helps organisations build agent-ready data foundations - quality, matching, semantics and governance - so agentic AI is safe to scale. Start a readiness conversation.