Measuring What Actually Matters
Why "measure outcomes, not effort" is true but useless on its own, and what to do instead.
Almost every operator has heard the advice to measure outcomes, not effort. Don’t track work done; track results achieved. It’s true, and it’s nearly useless.
It’s useless because most of the work people do every day looks like effort, not outcome. Calls made. Code committed. Features shipped. Meetings held. Documents drafted. If you only measure the outcome and pretend the work doesn’t exist, you can’t tell whether your team is on track until the quarter ends and the number is already in. By then it’s too late to do anything about it.
The honest version of the advice is more useful: measure outcomes primarily, but build the rest of your dashboard to explain why the outcome is or isn’t happening.
The framework we use breaks every metric a team owns into three buckets: Targets, Guards, and Diagnostics. Each does a different job. Get the split right, and you have a real operating system. Get it wrong, and you end up with either a vanity dashboard or a guilt trip.
Targets: the outcome you’re actually trying to produce
A target is the metric that shows you won. It has to be objectively measurable. It has to be the thing itself, not a proxy for the thing.
A sales team’s target is not “calls made,” or even “qualified meetings held with prospects who showed up.” It’s dollars and deals closed. A growth team’s target is not “campaigns shipped.” It’s “new users retained at 30 days.” An engineering velocity team’s target is not “commits per week.” It’s “features shipped that were either scaled up because they worked or switched off because they didn’t, with the learning captured.” The distinction sounds pedantic until you watch teams hit their proxy metric for four quarters in a row while the actual outcome flatlines.
The rule for choosing a target: if a robot could maximize this number without producing the result your business needs, it’s the wrong target. Calls made can be maximized by dialing nonsense. Qualified meetings can basically be bought. Dollars closed is the real thing.
Targets should be small in number. Two or three per team is usually right. The instinct to add more is almost always wrong. Every additional target dilutes focus on the ones that matter most.
Guards: the boundaries that keep the target from becoming a loophole
The reason most outcome metrics fail isn’t that they’re wrong. It’s that they create incentives to game them.
If your target is four qualified meetings a week (for a BDR team), a rep might book fourteen meetings of which only four are qualified, hit the number, and call it a week. The target was met. The system was abused. That’s where guards come in.
A guard is a boundary you’re not allowed to cross. “Qualified meetings must be at least 60% of all meetings booked.” “No more than two no-shows per week.” “Customer churn below 5%.” “Bug count not rising quarter over quarter.”
Guards protect quality, efficiency, and the integrity of the target. They’re the answer to the question: “What would have to be true alongside this target for it to actually mean we’re winning?”
Most teams skip guards entirely and then wonder why hitting the target didn’t feel like winning.
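To make the BDR example concrete, here is a minimal sketch of checking a week against the target and its guards together. It assumes the thresholds quoted above (four qualified meetings, a 60% qualified share, at most two no-shows). The `BdrWeek` fields, the `review_week` helper, and the no-show count in the sample week are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BdrWeek:
    """One rep's week. Field names are illustrative, not from a real CRM."""
    meetings_booked: int
    qualified_meetings: int
    no_shows: int


def review_week(week, target_qualified=4, min_qualified_share=0.60, max_no_shows=2):
    """Return (target_met, breaches), where breaches lists every guard crossed."""
    target_met = week.qualified_meetings >= target_qualified

    breaches = []
    if week.meetings_booked > 0:
        share = week.qualified_meetings / week.meetings_booked
        if share < min_qualified_share:
            breaches.append(
                f"qualified share {share:.0%} is below {min_qualified_share:.0%}"
            )
    if week.no_shows > max_no_shows:
        breaches.append(f"{week.no_shows} no-shows exceeds {max_no_shows}")
    return target_met, breaches


# The week from the example: fourteen booked, four qualified.
# The no-show count is an assumed value, not taken from the text.
gamed = BdrWeek(meetings_booked=14, qualified_meetings=4, no_shows=1)
target_met, breaches = review_week(gamed)
print(target_met, breaches)
```

In this sketch the gamed week still reports the target as met, but the qualified-share guard is flagged, which is the signal that hitting the number did not mean winning.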
Diagnostics: the activity metrics, in their proper place
This is where most of the work-done numbers belong. Calls made. Hours logged. Outreach sent. Lines of code committed. Cycle time. Lead response time.
Diagnostics aren’t goals. You don’t hit them or miss them. Their job is to tell you why the target is or isn’t moving. If qualified meetings are down this week, diagnostics tell you whether the problem is too few calls (an activity problem) or the same calls converting at a lower rate (a quality or targeting problem). One requires more effort. The other requires better aim. The diagnostic is what tells you which.
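One way to read that split off the numbers is to treat qualified meetings as calls multiplied by a conversion rate, then ask how much of the week-over-week change comes from each factor. The sketch below is a simple decomposition under that assumption. The function name and the sample call and meeting counts are made up for illustration.

```python
def diagnose(prev_calls, prev_meetings, curr_calls, curr_meetings):
    """Split the change in qualified meetings into a volume part and a rate part.

    Assumes meetings = calls * conversion_rate, so the two parts add up
    exactly to the total change in meetings.
    """
    prev_rate = prev_meetings / prev_calls
    curr_rate = curr_meetings / curr_calls

    # Change explained by making more or fewer calls, at last week's rate.
    volume_effect = (curr_calls - prev_calls) * prev_rate
    # Change explained by calls converting better or worse, at this week's volume.
    rate_effect = (curr_rate - prev_rate) * curr_calls

    if abs(volume_effect) > abs(rate_effect):
        cause = "activity"
    else:
        cause = "quality or targeting"
    return {"volume_effect": volume_effect, "rate_effect": rate_effect, "cause": cause}


# Half the calls at the same conversion rate: an activity problem.
fewer_calls = diagnose(prev_calls=200, prev_meetings=8, curr_calls=100, curr_meetings=4)
# The same calls converting at half the rate: a quality or targeting problem.
worse_aim = diagnose(prev_calls=200, prev_meetings=8, curr_calls=200, curr_meetings=4)
print(fewer_calls["cause"], "|", worse_aim["cause"])
```

Both sample weeks lose the same four meetings, yet the decomposition points to different fixes, which is the job a diagnostic is supposed to do.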
You can have as many diagnostics as you want. They cost almost nothing to track and they make the targets legible. The mistake to avoid is letting any diagnostic creep into your performance review and become a target by accident. Once “calls per week” becomes something the team is evaluated on, it stops being a diagnostic and starts being a number that gets gamed.
The trickier cases: roles where the outcome is hard to name
Sales is easy. Dollars closed is the target. Marketing is usually easy. Pipeline generated, attributed to specific channels.
The harder cases are the ones where the output isn’t directly tied to revenue. Engineering. HR. Operations. C-suite roles like CTO or COO.
The move for these roles is the same; it just requires more thought. Find the outcome that, if achieved, would mean the function is doing its job. For a CTO, that might be “ship the three roadmap features in the quarter, with uptime above 99.9% and bug count not increasing.” Target: three features shipped. Guards: uptime threshold, bug count ceiling. Diagnostics: average time to close a bug, deploy frequency, on-call alert volume.
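The CTO example can be written down as a small scorecard, which makes the three roles explicit: the target is graded, the guards can veto a win, and the diagnostics are reported but never graded. This is a rough sketch only. The `Guard` and `Scorecard` classes, the metric keys, and every number in the sample quarter are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Guard:
    name: str
    holds: Callable[[Dict[str, float]], bool]  # True when the boundary is respected


@dataclass
class Scorecard:
    target_name: str
    target_goal: float
    guards: List[Guard] = field(default_factory=list)
    diagnostics: List[str] = field(default_factory=list)  # reported, never graded

    def evaluate(self, numbers: Dict[str, float]) -> dict:
        target_hit = numbers[self.target_name] >= self.target_goal
        breached = [g.name for g in self.guards if not g.holds(numbers)]
        return {
            "target_hit": target_hit,
            "guards_breached": breached,
            # A win requires the target AND every guard.
            "won": target_hit and not breached,
            # Diagnostics are passed through for context only.
            "diagnostics": {d: numbers[d] for d in self.diagnostics},
        }


cto = Scorecard(
    target_name="roadmap_features_shipped",
    target_goal=3,
    guards=[
        Guard("uptime above 99.9%", lambda n: n["uptime_pct"] > 99.9),
        Guard("bug count not increasing",
              lambda n: n["open_bugs"] <= n["open_bugs_prev_quarter"]),
    ],
    diagnostics=["avg_days_to_close_bug", "deploys_per_week", "oncall_alerts"],
)

# Invented numbers for one quarter, for illustration only.
quarter = {
    "roadmap_features_shipped": 3, "uptime_pct": 99.95,
    "open_bugs": 48, "open_bugs_prev_quarter": 41,
    "avg_days_to_close_bug": 6.5, "deploys_per_week": 9, "oncall_alerts": 120,
}
result = cto.evaluate(quarter)
print(result["target_hit"], result["guards_breached"], result["won"])
```

In this invented quarter all three features ship and uptime holds, but the bug count rises, so the scorecard does not count it as a win. The diagnostics come back alongside the verdict to help explain it.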
For an HR function, the target might be “key roles filled within X weeks of opening, with first-year retention above Y%.” For ops, “monthly close completed by the fifth business day with no material errors.” None of these are revenue numbers. All of them are outcome numbers, in the sense that achieving them means the function did what it was supposed to do.
The trick is to stop looking for the exact equivalent of a sales quota and start looking for the result the function exists to produce.
One last principle: keep the list short
Cascading goals down to every level of the organization sounds great in theory. In small companies, it’s usually a mistake.
A small company runs on focus. Two or three top-level targets that everyone is aware of and working within will outperform a system where every team has its own pyramid of objectives rolling up to the company OKRs.
Cascading becomes useful once you’re large enough that teams can actually operate independently on different problems. Until then, a few focused goals, with everyone pulling in the same direction, beats a complete OKR architecture every time.
The bigger principle underneath all of this is that measurement is itself a design problem. Every metric you choose is a behavior you’re incentivizing. Every metric you don’t choose is a behavior you’re either tolerating or trusting to take care of itself.
Targets define the win. Guards keep the win clean. Diagnostics explain the score. Get those three buckets right, and you’ll finally be doing more than just looking at numbers.






