The OR Efficiency Metrics That Actually Matter

A guide to the OR efficiency metrics that actually predict performance — FCOTS, turnover, utilization, cancellations, case-duration accuracy — and the vanity metrics to ignore.

ORbit Surgical · June 24, 2026 · 9 min read

If you track operating room performance, you are almost certainly measuring some things that matter and several that do not. OR efficiency metrics have proliferated faster than the discipline to use them well — most scorecards are crowded with numbers that look authoritative, move for reasons no one can explain, and change no one's behavior. This guide separates the perioperative KPIs that actually predict efficiency from the vanity metrics that just decorate a slide, and lays out how to keep the good ones honest enough that your surgeons will trust them.

The stakes are concrete. A 2018 JAMA Surgery cost analysis put the mean cost of operating room time at roughly $37.45 per minute in inpatient settings and $36.14 per minute in ambulatory settings — on the order of $2,200 an hour, covering staff, supplies, and overhead but not surgeon fees. At that rate, every metric you track should ultimately answer one question: where are we losing minutes we could get back?

~$37/min

Mean cost of inpatient operating room time (about $2,200 per hour), covering staff, supplies, and overhead but not surgeon professional fees.

Childers & Maggard-Gibbons, JAMA Surgery, 2018

Why most OR scorecards measure the wrong things

Two failure modes show up again and again.

The first is aggregation without action. A facility-wide utilization figure or an average start time is a fine headline and a terrible lever. It hides the distribution — the two surgeons, the one service line, the Tuesday block — where the actual problem lives. By the time a number is rolled up to the facility level, it has been smoothed into something you cannot pull on.
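To make the aggregation problem concrete, here is a minimal Python sketch using invented data (the surgeon labels and on-time flags are hypothetical, not drawn from any study). The same records produce a respectable facility-wide rate while one surgeon accounts for every late start.

```python
from collections import defaultdict

# Hypothetical first-case records: (surgeon, started_on_time)
first_cases = [
    ("A", True), ("A", True), ("A", True), ("A", True),
    ("B", True), ("B", True), ("B", True), ("B", True),
    ("C", False), ("C", False), ("C", True), ("C", False),
]

def on_time_rate(records):
    """Share of records flagged on time, as a fraction between 0 and 1."""
    return sum(1 for _, ok in records if ok) / len(records)

def on_time_rate_by_surgeon(records):
    """The same rate, broken out per surgeon."""
    groups = defaultdict(list)
    for surgeon, ok in records:
        groups[surgeon].append((surgeon, ok))
    return {surgeon: on_time_rate(rows) for surgeon, rows in groups.items()}

facility_rate = on_time_rate(first_cases)
by_surgeon = on_time_rate_by_surgeon(first_cases)

print(f"facility: {facility_rate:.0%}")
for surgeon, rate in sorted(by_surgeon.items()):
    print(f"surgeon {surgeon}: {rate:.0%}")
```

In this made-up sample the facility figure is 75 percent, which hides that surgeons A and B are at 100 percent and surgeon C is at 25 percent. The breakdown is the lever; the roll-up is only the headline.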

The second is gameability. A startling share of OR metrics can be improved without changing anything real, simply by editing the definition. That is the central warning of a 2020 British Journal of Surgery review by Charlesworth and Pandit, which argues that several routinely tracked theatre metrics — start time, utilization, cancellations, number of operations, gap time — are flawed and gameable as commonly used. Their alternative is a tighter definition of efficiency itself: completing the scheduled list within the allocated time, with no over-runs or under-runs. It is a useful north star precisely because you cannot fake it with a generous grace period.

On the evidence behind these metrics

Much of the OR-efficiency literature is quality-improvement (QI) work and cost modeling rather than randomized trials. It is genuinely useful for direction and order-of-magnitude, but specific figures are context-dependent. Throughout this guide we cite the source and its type so you can weight it accordingly.

The metrics that hold up

A short list does more than a long one. These five map cleanly to behaviors you can change, and together they cover the surgical day from its first minute to its last. The table summarizes them; the sections below add the detail.

Metric	What it measures	Leading or lagging	Most common way it's gamed
First case on-time starts	Share of first cases that begin on time	Both	Generous grace period; lenient start event
Turnover time	Gap between cases in the same room	Both	Quoting the average, hiding slow outliers
Block utilization	Share of allocated block actually used	Lagging	Mixing raw and adjusted; short windows
Cancellation rate	Cases cancelled on the day of surgery	Lagging	No reason codes; excluding categories
Case-duration accuracy	Scheduled vs. actual case time	Leading	Optimistic, round-number estimates

First case on-time starts (FCOTS)

The percentage of first cases that begin on time. It earns its place because a late first case is the only delay with the whole day to compound, which makes it the cheapest minute to save. One multi-service-line Six Sigma project moved FCOTS from 49 percent to a 92 percent peak and sustained about 78 percent, the kind of durable swing that ripples into every downstream case. We go deep on definitions and benchmarks in the complete FCOTS guide.

Turnover time

The interval between one case finishing and the next beginning in the same room. It is worth tracking, but it is also the most over-blamed metric in the building, so set realistic targets and watch the distribution rather than the average: your slowest turnovers, not your typical one, are where the money is. A 2025 systematic review in Surgery mapping the changeable factors behind turnover time concluded that the biggest gains come from parallel processing, team coordination, and a "focused factory" approach, not from telling staff to move faster. See turnover benchmarks and how to reduce turnover time.
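One way to watch the distribution rather than the average is to report the median alongside the slow tail. The sketch below is illustrative only: the turnover values are invented, and `slow_threshold` is an assumed local target, not a published benchmark.

```python
import statistics

# Hypothetical turnover times in minutes for one room over two weeks.
turnovers = [22, 24, 25, 25, 26, 27, 28, 28, 30, 31, 33, 35, 58, 74]

def turnover_summary(minutes, slow_threshold):
    """Median turnover plus a count and total overage for the slow tail."""
    slow = [m for m in minutes if m > slow_threshold]
    return {
        "median": statistics.median(minutes),
        "slow_count": len(slow),
        "slow_excess_minutes": sum(m - slow_threshold for m in slow),
    }

summary = turnover_summary(turnovers, slow_threshold=40)
print(summary)
```

With these numbers the median is 28 minutes, yet two turnovers account for 52 minutes beyond the assumed 40-minute target. Those two are the ones worth a conversation.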

Block utilization (raw and adjusted)

How much of allocated block time is actually used. The nuance is that raw and adjusted utilization tell different stories, and neither should be used on its own to reallocate time. Foundational scheduling work by Dexter and colleagues showed that even the choice of scheduling algorithm can shift achievable utilization by a few percentage points, so utilization is an input to a decision, not the decision itself. We unpack it in block time utilization explained.
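As a rough illustration of how the two figures diverge, the sketch below assumes one common convention: raw utilization counts only in-block case minutes, and adjusted utilization also credits turnover minutes. Definitions vary between facilities (some also remove released time from the denominator), so treat the formulas and the sample block as assumptions rather than a standard.

```python
def block_utilization(case_minutes, turnover_minutes, block_minutes):
    """Return (raw, adjusted) utilization as fractions.

    Assumed convention; local definitions may differ:
      raw      = in-block case minutes / allocated block minutes
      adjusted = (case minutes + turnover minutes) / allocated block minutes
    """
    if block_minutes <= 0:
        raise ValueError("block_minutes must be positive")
    used = sum(case_minutes)
    raw = used / block_minutes
    adjusted = (used + sum(turnover_minutes)) / block_minutes
    return raw, adjusted

# Hypothetical 8-hour block: three cases and two turnovers.
raw, adjusted = block_utilization(
    case_minutes=[95, 120, 85],
    turnover_minutes=[30, 30],
    block_minutes=480,
)
print(f"raw {raw:.1%}, adjusted {adjusted:.1%}")
```

The same block reads as 62.5 percent raw and 75 percent adjusted, which is why a utilization number quoted without its definition is hard to interpret.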

Day-of-surgery cancellation rate

The share of scheduled cases cancelled on the day of surgery. Reported rates vary enormously across the literature, from low single digits in high-performing centers to far higher elsewhere, and programs in high-income settings generally treat a rate under 5 percent as the target. The value here is less the headline rate than the reason codes underneath it, because most day-of cancellations fall into a handful of avoidable categories. See the hidden cost of cancellations.
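A small sketch of reporting the rate together with its reason codes follows. The log and the reason-code names are hypothetical; a real program would use its own agreed code list.

```python
from collections import Counter

# Hypothetical day-of-surgery log: None means the case proceeded,
# otherwise the value is an illustrative reason code.
outcomes = [None] * 76 + [
    "patient_no_show",
    "incomplete_workup",
    "incomplete_workup",
    "no_bed",
]

def cancellation_summary(log):
    """Return (cancellation rate, reason codes ordered by frequency)."""
    reasons = Counter(code for code in log if code is not None)
    cancelled = sum(reasons.values())
    return cancelled / len(log), reasons.most_common()

rate, reasons = cancellation_summary(outcomes)
print(f"day-of cancellation rate: {rate:.1%}")
for code, count in reasons:
    print(f"  {code}: {count}")
```

The rate alone (5 percent in this invented sample) says little. The ranked reason codes show where to start, here with incomplete workups.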

Case-duration accuracy

How close scheduled case times are to actual ones. This is the quiet metric that makes every other scheduling decision better or worse: if your booked times are systematically wrong, your utilization and your start times inherit the error. Interestingly, Dexter and Traub found that perfect duration prediction reduces overtime by only a few minutes per OR versus well-used historical data — so the goal is not a crystal ball, it is good surgeon-and-procedure-specific history instead of optimistic guesses. More in why your scheduled case times are wrong.
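To illustrate what surgeon-and-procedure-specific history can look like, the sketch below groups past cases and reports the median actual duration and the median booking error for each pair. The records, surgeon labels, and the `lap_chole` procedure code are invented, and using the median as the suggested booking time is a simplifying assumption, not the method from the cited work.

```python
import statistics
from collections import defaultdict

# Hypothetical history: (surgeon, procedure, scheduled_min, actual_min)
history = [
    ("A", "lap_chole", 60, 74),
    ("A", "lap_chole", 60, 81),
    ("A", "lap_chole", 60, 70),
    ("B", "lap_chole", 60, 58),
    ("B", "lap_chole", 60, 63),
    ("B", "lap_chole", 60, 61),
]

def duration_accuracy(rows):
    """Per surgeon and procedure: median actual time and median booking error."""
    groups = defaultdict(list)
    for surgeon, procedure, scheduled, actual in rows:
        groups[(surgeon, procedure)].append((scheduled, actual))
    report = {}
    for key, pairs in groups.items():
        report[key] = {
            "suggested_minutes": statistics.median([a for _, a in pairs]),
            "median_error": statistics.median([a - s for s, a in pairs]),
        }
    return report

report = duration_accuracy(history)
for key, stats in sorted(report.items()):
    print(key, stats)
```

Both surgeons are booked at a round 60 minutes. In this sample surgeon A's cases run a median of 14 minutes over, while surgeon B's run about 1 minute over, so one shared estimate is wrong for at least one of them.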

The vanity metrics to ignore (or at least demote)

Just as important as the keepers is what to stop celebrating. A metric is a vanity metric when it looks impressive but changes no decision. Common offenders:

Total case volume, quoted alone. More cases is not more efficient; it can simply mean more rooms or longer days. Volume without utilization and duration context tells you nothing actionable.
Raw room utilization with no definition. Without knowing whether it counts turnover, or over what window, the percentage is unanchored — and easy to inflate.
A facility-average on-time rate propped up by a generous grace period. It can look great while specific surgeons and services run chronically late.
"Rooms running" counts. A busy-looking OR can still be leaking minutes to slow turnovers and unfilled blocks.

The test for any metric is simple: if this number moves, does someone know what to do differently? If not, it belongs in a footnote, not on the dashboard.

Rational vs. gameable metrics

If you take one idea from this guide, make it this: a metric you can flatter by changing its definition will, eventually, be flattered instead of fixed. Not maliciously — it is just the path of least resistance.

The clearest example is the grace period on on-time starts. A program that counts a case as "on time" if it starts within ten minutes of schedule will post a number that looks excellent next to one measuring incision time with no grace period at all. When one QI case study switched to the strict definition — incision, no grace period — its on-time rate dropped to about 74 percent, not because performance fell but because the measure stopped flattering it. That honesty is what made improvement possible.
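The effect of a grace period is easy to reproduce. In the sketch below the delays are invented, and `grace_minutes` is a hypothetical parameter name. Only the definition changes between the two calls, not the underlying performance.

```python
# Hypothetical first-case delays: minutes between the scheduled start and
# the chosen start event (negative means early).
delays = [-3, 0, 0, 2, 4, 5, 6, 8, 9, 12, 15, 25]

def on_time_rate(delay_minutes, grace_minutes=0):
    """Share of cases starting no more than grace_minutes after schedule."""
    on_time = sum(1 for d in delay_minutes if d <= grace_minutes)
    return on_time / len(delay_minutes)

strict = on_time_rate(delays)                     # no grace period
lenient = on_time_rate(delays, grace_minutes=10)  # ten-minute grace period

print(f"strict: {strict:.0%}, with 10-minute grace: {lenient:.0%}")
```

The same twelve mornings score 25 percent under the strict definition and 75 percent with a ten-minute grace period. Whichever definition is chosen, it has to be written down and held fixed for the trend to mean anything.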

The defense is simple to state and hard to maintain: pick a defensible definition, write it down, and apply it identically everywhere and every week. The threshold matters far less than the consistency.

Leading vs. lagging indicators

Most OR reporting is lagging — it tells you, accurately and too late, what last month looked like. Lagging metrics are essential for trend and accountability, but they cannot change the outcome of a day that is already over.

Leading indicators are the ones you can still act on: a first case trending late at 7:25, a turnover running long right now, a case whose pre-op readiness is incomplete an hour before wheels-in. The difference is who the metric is for. A monthly utilization report is for the administrator; a live signal that the 10:00 room is slipping is for the charge nurse who can still do something about it. A complete metric set has both, but the leading half is what turns measurement into a better day rather than a better post-mortem — which is the whole point of an OR day board.
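As one possible shape for such a live signal, the sketch below flags a case that is inside a lead window and still has open pre-op readiness items. The function name, the checklist item names, and the 60-minute window are all assumptions made for illustration. This is not a description of how any particular day board works.

```python
from datetime import datetime, timedelta

def readiness_alert(scheduled_start, now, readiness, lead=timedelta(minutes=60)):
    """Flag a case inside the lead window that still has open readiness items.

    readiness maps a (hypothetical) checklist item name to True when complete.
    Returns (alert, open_items).
    """
    open_items = sorted(item for item, done in readiness.items() if not done)
    inside_window = timedelta(0) <= scheduled_start - now <= lead
    return inside_window and bool(open_items), open_items

alert, open_items = readiness_alert(
    scheduled_start=datetime(2026, 6, 24, 7, 30),
    now=datetime(2026, 6, 24, 6, 45),
    readiness={"consent_signed": True, "h_and_p_current": False, "site_marked": True},
)
print(alert, open_items)
```

The point of the design is timing: the check is only useful while someone can still close the open item, which is why it looks forward from the current time instead of summarizing the day afterward.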

Tie every metric back to dollars (and cases)

Metrics earn executive attention when they translate into money and capacity. The bridge is the per-minute cost of OR time: at roughly $37 a minute, a recurring ten-minute daily inefficiency in one room costs about $370 a day, which approaches six figures a year over roughly 250 operating days. Even more intuitive for a mixed audience is converting wasted minutes into lost-case scoring, meaning the cases you could have done with the capacity you already paid for. A surgeon hears "two more cases a week"; a CFO hears the margin those cases carry; a charge nurse hears the minutes behind them. Anchoring your metric set to that translation is what gets it taken seriously beyond the OR.
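The arithmetic behind that translation can be sketched in a few lines. The per-minute cost is the inpatient mean from the 2018 JAMA Surgery analysis cited earlier. The 250 operating days and the 90 minutes per case (including turnover) are assumptions chosen for illustration, and the case count is a capacity equivalent, not a forecast.

```python
COST_PER_MINUTE = 37.45  # inpatient mean, 2018 JAMA Surgery cost analysis

def annual_cost(minutes_per_day, operating_days, cost_per_minute=COST_PER_MINUTE):
    """Yearly cost of a recurring daily loss of OR minutes in one room."""
    return minutes_per_day * operating_days * cost_per_minute

def lost_cases(minutes_per_day, operating_days, minutes_per_case):
    """Whole cases that would fit into the same number of minutes."""
    return (minutes_per_day * operating_days) // minutes_per_case

# Assumed inputs: 10 lost minutes a day, 250 operating days, 90 minutes per case.
cost = annual_cost(minutes_per_day=10, operating_days=250)
cases = lost_cases(minutes_per_day=10, operating_days=250, minutes_per_case=90)

print(f"annual cost: ${cost:,.0f}")
print(f"capacity equivalent: {cases} cases")
```

Under these assumptions ten minutes a day comes to about $93,600 a year, or the time equivalent of 27 cases, in a single room. A room that runs more days, or loses the same ten minutes in more than one place, passes $100,000.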

Building a metric set your surgeons will trust

Metrics only change behavior if the people being measured believe them. Trust is built less by the choice of KPIs than by how they are handled:

Keep the list short. Five metrics that everyone understands beat fifteen no one trusts.
Fix the definitions and freeze them. Consistency over cleverness; a stable, slightly imperfect definition beats a "better" one that keeps changing.
Show the distribution, not just the average. Surgeon-level and service-line-level views are where action happens — and where credibility is won or lost.
Attribute fairly. Blame-neutral reason codes, applied evenly, do more for adoption than any benchmark.
Use medians, not means. A single marathon case or one disastrous morning will drag an average around and hand any skeptic a reason to dismiss the whole report. Medians are harder to distort and easier to defend.
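The median-versus-mean point is easy to check with a toy example. The values below are invented: five ordinary turnovers, then the same week with one disastrous morning added.

```python
import statistics

# Hypothetical turnover minutes for one week.
typical_week = [24, 25, 26, 27, 28]
with_outlier = typical_week + [120]  # one disastrous morning

def center(values):
    """Mean and median of the same values, side by side."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
    }

before = center(typical_week)
after = center(with_outlier)

print("typical week:", before)
print("with outlier:", after)
```

One bad morning moves the mean from 26 minutes to nearly 42, while the median moves from 26 to 26.5. The outlier still deserves its own line in the report, but it should not rewrite the summary of an otherwise normal week.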

That last point is a principle, not a preference. ORbit is built median-first for exactly this reason: every metric in this guide — FCOTS, turnover, utilization, cancellations, case-duration accuracy — is computed on medians, broken out by surgeon and service line, and shown both as a live signal for today and a trend over time. The aim is a single, honest view of the OR that the surgeon and the CFO can look at together without arguing about whose numbers are right.

If you want to see where your own minutes are leaking, the fastest way is to look at these metrics on your own data. The same metrics also underpin the shift of high-value cases, such as outpatient total joints, to the ASC.

Frequently asked questions

What are the most important OR efficiency metrics?

The metrics that hold up best are first case on-time starts (FCOTS), turnover time, block utilization (raw and adjusted), day-of-surgery cancellation rate, and case-duration accuracy (scheduled versus actual). Each maps to a specific, fixable operational behavior, unlike broad vanity numbers that move for reasons no one can act on.

What is a vanity metric in the OR?

A vanity metric is one that looks good on a dashboard but does not change a decision — either because it is too aggregated to act on, or because it can be improved by redefining it rather than by running the OR better. Raw room utilization quoted without context, or an on-time-start rate propped up by a generous grace period, are common examples.

Why do OR metrics need consistent definitions?

Because many of them can be flattered by changing the definition. A 2020 British Journal of Surgery review argued that several routinely tracked theatre metrics are flawed and gameable, and that efficiency is better defined as completing the scheduled list within the allocated time without over- or under-runs. A metric you can improve by editing its definition will eventually be improved that way instead of by real change.

What is the difference between a leading and a lagging OR metric?

A lagging metric tells you what already happened (last quarter's utilization); a leading metric gives you a chance to intervene before the day is lost (a first case trending late, a turnover running long right now). A good metric set has both, but real-time leading indicators are what let a charge nurse change the outcome of today.

How much does operating room time cost per minute?

A 2018 JAMA Surgery cost analysis of California hospitals estimated mean operating room cost at about $37.45 per minute in inpatient settings and $36.14 per minute in ambulatory settings — roughly $2,200 per hour — covering staff, supplies, and overhead but not surgeon professional fees. That per-minute cost is why small, repeated inefficiencies add up to real money.

How many OR metrics should a surgery center track?

Fewer than most do. A short set of five or six metrics that everyone understands and trusts beats a crowded dashboard of fifteen that no one acts on. The goal is a focused set that maps to fixable behaviors — FCOTS, turnover, utilization, cancellations, and case-duration accuracy — reported consistently and broken out by surgeon and service line.

Should OR metrics use mean or median?

Median. A single marathon case or one disastrous morning will drag a mean around and give any skeptic a reason to dismiss the whole report. Medians are far harder to distort, which makes them both more representative of a typical day and more defensible in front of surgeons. That is why median-first reporting is a platform-wide principle for trustworthy analytics.