Most organisations move to Microsoft Fabric for the capability. The thing that catches them out is the bill. Unlike a fixed on-premises platform, Fabric's cost tracks consumption: every query, notebook run, pipeline and data movement draws on a shared pool of compute, and the size of that pool is what you pay for. Get the sizing and the controls right and Microsoft Fabric capacity is a predictable, attributable cost. Get them wrong and the first ninety days deliver an unpleasant surprise: consultants report that spend can balloon to two or three times the estimate when no one is managing consumption - and, conversely, that deliberate optimisation routinely strips 25-45% off the bill.

That gap between the two outcomes is not luck. It is FinOps: treating cloud analytics spend as an ongoing engineering discipline rather than a one-off procurement decision. Here is how to size Fabric capacity properly and keep it under control.

How Fabric pricing actually works

Fabric is bought as a capacity, measured in Capacity Units (CU) and billed hourly through Azure. Capacities come as "F-SKUs" ranging from F2 up to F2048, and the maths is simple: the number in the SKU name is its CU count, so each doubling of the SKU number doubles the available compute. An F128 has twice the horsepower of an F64.
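To make the doubling ladder concrete, here is a minimal Python sketch that lists the tiers from F2 to F2048. The function name f_sku_ladder is purely illustrative, and no prices are included because rates vary by region and change over time.

```python
# F-SKU ladder: the number in the SKU name is its Capacity Unit (CU) count,
# and each tier doubles the one before it, from F2 up to F2048.
def f_sku_ladder(smallest: int = 2, largest: int = 2048) -> dict[str, int]:
    """Return a mapping of SKU name to CU count (illustrative helper)."""
    ladder: dict[str, int] = {}
    cu = smallest
    while cu <= largest:
        ladder[f"F{cu}"] = cu
        cu *= 2
    return ladder


LADDER = f_sku_ladder()

for name, cu in LADDER.items():
    print(f"{name:>6}: {cu:>4} CU")
```

Eleven tiers is a short list to choose from, which is why sizing is a matter of picking a rung rather than tuning a number.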

Two points decision-makers often miss. First, there is a meaningful threshold at F64: at this size and above, report consumers no longer need individual paid Power BI licences (a free licence is enough to view content, although report authors still need Pro), which can transform the economics for organisations with large numbers of viewers. Second, OneLake storage is billed separately from compute - so your capacity SKU and your storage footprint are two different cost lines to manage.
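The F64 threshold can be treated as a simple break-even calculation. The sketch below is an assumption-laden illustration: every monthly figure is a hypothetical placeholder rather than a Microsoft price, and it ignores whether the smaller capacity would cope with the workload at all.

```python
import math


def breakeven_viewers(smaller_capacity_cost: float,
                      f64_cost: float,
                      licence_cost_per_viewer: float) -> int:
    """Smallest viewer count at which F64 costs no more than a smaller
    capacity plus one paid licence per viewer (all inputs per month)."""
    extra_capacity_cost = f64_cost - smaller_capacity_cost
    if extra_capacity_cost <= 0:
        return 0
    return math.ceil(extra_capacity_cost / licence_cost_per_viewer)


# Hypothetical placeholder figures - substitute your own quoted prices.
viewers_needed = breakeven_viewers(
    smaller_capacity_cost=4000.0,
    f64_cost=8000.0,
    licence_cost_per_viewer=10.0,
)
print(f"F64 breaks even at about {viewers_needed} viewers")
```

With these placeholder inputs the step up to F64 pays for itself once roughly 400 people only view reports; with real prices the break-even point will differ.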

Step 1 - Size with evidence, not guesswork

Sizing is where most cost problems are created. Use Microsoft's official capacity estimator as a starting point, then size for your peak with sensible headroom - not your average. For many organisations the peak is a known event (a financial close, an admissions cycle, a regulatory deadline) where concurrency spikes. The aim is enough capacity to absorb that peak comfortably, plus a margin, without permanently paying for headroom you only need occasionally (Step 3 covers how to handle the peaks more cheaply).

The doubling ladder makes this easier than it sounds: you are choosing a tier, not fine-tuning a number, and you can resize later as real usage data arrives.
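As a rough sketch of "size for the peak plus headroom", the function below picks the smallest tier that covers an estimated peak. The 20% default headroom and the name recommend_sku are assumptions for illustration; the peak CU figure should come from the capacity estimator or from measured usage.

```python
# CU counts of the F-SKU tiers, F2 through F2048.
F_SKUS = [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]


def recommend_sku(peak_cu: float, headroom: float = 0.2) -> str:
    """Return the smallest F-SKU whose CU count covers peak_cu plus headroom.

    headroom is a fraction (0.2 = 20%); the default is an assumed value,
    not a Microsoft recommendation.
    """
    target = peak_cu * (1 + headroom)
    for cu in F_SKUS:
        if cu >= target:
            return f"F{cu}"
    raise ValueError(f"Peak of {peak_cu} CU exceeds the largest single capacity")


print(recommend_sku(45))   # peak of 45 CU plus 20% headroom
print(recommend_sku(60))   # peak of 60 CU plus 20% headroom
```

Because the tiers double, a peak that sits just above a rung pushes you to the next one, which is exactly the case where the temporary scale-up options in Step 3 are worth comparing against a permanent larger SKU.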

Step 2 - Choose your capacity architecture

A genuinely consequential decision: one large capacity, or several smaller ones?

One large capacity pools resources efficiently and is simpler to manage, and because headroom is shared rather than duplicated it usually needs fewer total CUs than several small capacities each sized for its own peak. The downside is "blast radius" - one capacity serving everything means one throttling event affects everyone, and it is hard to attribute cost back to individual departments.

Several smaller capacities give you isolation and clean cost attribution (each team or domain on its own), and you can pause non-critical ones independently. The trade-off is less pooling efficiency and more to manage.

The pragmatic answer for most enterprises is a hybrid: a single large capacity for production workloads to capture pooling efficiency, plus smaller, separate capacities for development and test that can be paused outside working hours. If departmental chargeback matters to you, factor that in early - it is far easier to design in than to retrofit.

Step 3 - Use the cost levers

Fabric gives you several ways to pay for peaks without paying for them all the time:

Pause and resume. For non-production capacities on pay-as-you-go, pausing outside working hours simply stops the compute charge (a reserved commitment is paid for whether or not the capacity is running). Dev and test environments rarely need to run overnight or at weekends.

Scale up and down. For predictable surges, scale up before the peak and back down afterwards - this can be automated via the Fabric CLI, Azure Automation or the REST APIs so it happens without anyone remembering to.

Mix reserved and pay-as-you-go. A reserved capacity earns a discount for committed, baseline usage; pay-as-you-go covers the occasional spike on top. For a predictable Monday peak, for instance, running a reserved baseline and adding temporary pay-as-you-go capacity on those days can beat reserving the larger SKU permanently.

Autoscale Billing for Spark. For bursty data-engineering workloads, this scales Spark compute automatically and bills it separately at pay-as-you-go rates. As a rule of thumb, if your peak utilisation sits below around 60% of your SKU, or your Spark jobs are heavy at night and idle by day, autoscale is likely to save money.
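The arithmetic behind the first and third levers can be sketched in a few lines of Python. The rates below are relative placeholders, not real prices: pay-as-you-go is set to 1.0 per CU-hour and the reserved rate to 0.6 purely as an assumed discount, and the function names are illustrative.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def paused_saving(hours_on_per_day: float, days_on_per_week: int) -> float:
    """Fraction of pay-as-you-go compute cost avoided by pausing the
    capacity outside the stated running hours."""
    running_hours = hours_on_per_day * days_on_per_week
    return 1 - running_hours / HOURS_PER_WEEK


def weekly_cost_reserved(cu: int, reserved_rate: float) -> float:
    """Weekly cost of reserving a capacity of `cu` CUs around the clock."""
    return cu * HOURS_PER_WEEK * reserved_rate


def weekly_cost_mixed(baseline_cu: int, peak_cu: int, peak_hours: float,
                      reserved_rate: float, payg_rate: float) -> float:
    """Weekly cost of a reserved baseline plus pay-as-you-go top-up CUs
    that run only during the peak hours."""
    top_up_cu = max(0, peak_cu - baseline_cu)
    return (weekly_cost_reserved(baseline_cu, reserved_rate)
            + top_up_cu * peak_hours * payg_rate)


# Dev/test capacity running 10 hours a day, 5 days a week.
print(f"Pause saving: {paused_saving(10, 5):.0%}")

# Hypothetical relative rates: reserved 0.6, pay-as-you-go 1.0 per CU-hour.
always_large = weekly_cost_reserved(128, reserved_rate=0.6)
baseline_plus_peak = weekly_cost_mixed(64, 128, peak_hours=24,
                                       reserved_rate=0.6, payg_rate=1.0)
print(f"Reserve F128 all week: {always_large:.1f}")
print(f"Reserve F64 + top-up:  {baseline_plus_peak:.1f}")
```

Under these assumed rates the mixed approach is clearly cheaper for a one-day-a-week peak. The comparison tips the other way as the peak hours grow, so rerun it with your own rates and peak pattern before committing.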

Step 4 - Protect against runaway cost

Consumption models can spike unexpectedly, so build in guardrails. Surge protection lets you cap background activity (refreshes, AI jobs) so it cannot starve user-facing reports during busy periods, using configurable background rejection and recovery thresholds. Newer capacity overage protection features (in preview in early 2026) go further in preventing runaway costs from unexpected spikes.

One important caveat: surge protection is a safety net, not a substitute for correct sizing. It protects the user experience by deferring or rejecting background work - which can mean reruns - so genuinely critical workloads still need capacity provisioned for them, not just protection around them.
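The two-threshold behaviour described above is a form of hysteresis, which the following sketch illustrates. It is a simplified model for intuition only, not Fabric's implementation: the class name, the 80% and 60% values and the exact boundary handling are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class SurgeGuard:
    """Toy model of background rejection and recovery thresholds."""
    rejection_threshold: float  # background utilisation (%) at which rejection starts
    recovery_threshold: float   # utilisation (%) below which work is accepted again
    rejecting: bool = False

    def __post_init__(self) -> None:
        if self.recovery_threshold >= self.rejection_threshold:
            raise ValueError("recovery threshold must be below rejection threshold")

    def accepts_background_job(self, background_utilisation_pct: float) -> bool:
        """Update state for the latest reading and say whether a new
        background job would be accepted."""
        if self.rejecting:
            if background_utilisation_pct < self.recovery_threshold:
                self.rejecting = False
        elif background_utilisation_pct >= self.rejection_threshold:
            self.rejecting = True
        return not self.rejecting


guard = SurgeGuard(rejection_threshold=80, recovery_threshold=60)  # assumed values
readings = [50, 85, 70, 55, 65]
decisions = [guard.accepts_background_job(r) for r in readings]
print(list(zip(readings, decisions)))
```

Note the reading of 70: it is below the rejection threshold, yet background work is still refused because utilisation has not fallen below the recovery threshold. That gap is what stops the capacity flapping between states, and it is also why rejected jobs may wait longer than expected before they can be rerun.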

Step 5 - Monitor and attribute relentlessly

You cannot control what you cannot see. The Fabric Capacity Metrics app is the single most important tool here: it shows real-time and historical CU consumption against your SKU, when and how throttling was applied, overage carryforward and burndown, OneLake storage by workspace, and - crucially - an item-level breakdown so you can identify which semantic models, notebooks or pipelines are the heaviest consumers.

Use it to set simple operating rules: sustained utilisation above ~80% is a signal to scale up or enable autoscale; frequent throttling means a workload needs optimising or isolating; and month-on-month storage growth needs watching before it becomes a runaway line. If you have organised workspaces by domain, this is also where per-domain cost attribution and chargeback come from - turning an opaque capacity bill into a number each business area owns.
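These operating rules are simple enough to encode as a monthly checklist. The sketch below assumes you have already extracted three summary figures from the Capacity Metrics app by hand; the 80% utilisation limit comes from the rule above, while the throttling and storage-growth limits, and all names, are hypothetical defaults to tune.

```python
SCALE_UP = "scale up or enable autoscale"
OPTIMISE = "optimise or isolate the throttled workload"
WATCH_STORAGE = "review storage growth"


def review_signals(avg_utilisation_pct: float,
                   throttled_days: int,
                   storage_growth_pct: float,
                   *,
                   utilisation_limit: float = 80.0,    # from the ~80% rule
                   throttling_limit: int = 3,           # assumed: days per month
                   storage_growth_limit: float = 10.0   # assumed: % month on month
                   ) -> list[str]:
    """Return the actions suggested by one month's summary figures."""
    actions: list[str] = []
    if avg_utilisation_pct > utilisation_limit:
        actions.append(SCALE_UP)
    if throttled_days >= throttling_limit:
        actions.append(OPTIMISE)
    if storage_growth_pct > storage_growth_limit:
        actions.append(WATCH_STORAGE)
    return actions


# Example month: high sustained utilisation, little throttling, modest growth.
print(review_signals(avg_utilisation_pct=85, throttled_days=1, storage_growth_pct=4))
```

Running the same check per domain, rather than once for the whole capacity, gives each business area its own list of actions alongside its share of the bill.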

Step 6 - Design to consume less in the first place

The cheapest CU is the one you never spend. Architecture choices have a direct cost effect:

Direct Lake and Mirroring avoid the compute cost of repeated import-and-refresh cycles, so your CUs only work where they add value.

A well-designed medallion structure - doing heavy transformation once in Silver and Gold rather than repeatedly at query time - keeps consumption down and reports fast.

Storage lifecycle policies that archive cold data to lower-cost tiers stop OneLake storage quietly inflating over time. (Note that mirroring storage, free up to a SKU-based limit, becomes billable if you manually pause the capacity - worth knowing before you pause.)

The operating model: FinOps as a culture, not a project

The tools only work if someone owns them. The organisations that keep Fabric costs predictable treat consumption as an ongoing responsibility: a named owner, a monthly review of the Capacity Metrics app, cost attributed to the teams that generate it, and continuous optimisation built into platform operations. Done this way, Fabric's consumption model stops feeling unpredictable and becomes a lever - you can see exactly what analytics costs and align that spend with the value it returns.
