// Playbook 02 — Design

Strategic planning · setting OKRs as bets.

A six-week playbook for the work between diagnosis and delivery. Turn the maturity baseline into three OKRs the sponsor can defend and the sceptic can falsify. Calibrated confidence, backcasted key results, a premortem on the slate. Borrowed openly from Annie Duke's Thinking in Bets. Worked example throughout: Halcyon Financial's annual OKR slate.

6 wks · from diagnosis to commitment
3 OKRs · not eight, not twelve
4 phases · translate, calibrate, test, commit
3/5 target confidence · not 5, not 1
01 —

Meet Halcyon Financial.

Worked example · the OKR slate that follows the diagnosis

A diagnosis in hand. A slate still to commit.

Halcyon Financial has just finished Playbook 01. The brief landed. The CEO summarised it back to the room in her own words: "we're better on strategy and capability than I thought; we're materially worse on governance and people-impact than I thought." The two priority dimensions are named. The cohort agrees.

What hasn't happened is the work of turning that diagnosis into three OKRs the executive can fund, the operators can deliver, and the sceptic can falsify. The next instinct is to write ten OKRs and call it ambition. The discipline of this playbook is the opposite move.

Three OKRs. Each scored 3/5 on confidence — not 5, not 1. Each with key results that can be measured by someone who doesn't already believe in them. Each pressure-tested against a premortem before the slate is committed.

By the end of the six weeks, Halcyon has an OKR slate, a quarterly review cadence with truth-seeking norms named, and a year-end retrospective discipline already on the calendar. OKRs are bets. They get set as bets, reviewed as bets, and — at year end — judged as bets.

The worked example. This playbook follows Halcyon's annual OKR slate from diagnosis-translation to commitment. Three OKRs target Governance & Risk, People Impact, and Capability & Fluency. The KR that gets reworked most — KR1 of the Governance OKR, on shadow AI usage — is the recurring illustration of how measurement honesty earns its keep.
3 OKRs committed · three per twelve months
7 candidates drafted · four cut, three kept
3/5 confidence target · stretch, not fantasy
1 premortem · before commitment

// And the people you'll meet · the room that sets the bets

// Cast 01

The Transformation Lead

You, the reader

Holds the pen on the OKR slate. Translates the diagnosis into candidates. Runs the calibration and the premortem. Does not own the OKRs themselves — the executive sponsor does.

// Cast 02

Diana Whitfield

CEO · the sponsor

Owns the OKR slate at year-end. The person who has to defend the bets when they're set, and judge the reasoning honestly when the results land — good or bad.

// Cast 03

Sam Patel

CTO · the sceptic

The room's falsifier. His job is to argue the OKRs won't land — and to name precisely why. If he can't find a way to falsify them, they're too soft. If he can find five, they're too hard.

// Cast 04

Helen Bautista

COO · the operator

Reality check on what can actually be delivered alongside the day-to-day. Holds the line on whether the slate is achievable with the people and time available.

// Cast 05

Daniel Okafor

Head of P&C · the people lens

Owns the People Impact OKR. The voice that protects the staff-side of every KR — making sure the slate measures what changes for them, not just what changes around them.

// Cast 06

Priya Nair

Head of CX · the voice from the desk

Sits in this room of executives, but brings the desk-level reality with her. The reason the OKRs reflect what Maya's role actually looks like — even though Maya isn't in the room. The OKRs are about the people doing the work; the people doing the work are not setting them.

// Cast 07

External OKR practitioner

Advisor · brings the discipline

One outside voice who has set, missed, and re-set OKRs before. Cheap insurance against the room talking itself into something it would later regret.

02 —

Four phases. Translate, calibrate, pressure-test, commit.

The framework

// The framework is deliberately small. Most OKR exercises fail not because they pick the wrong targets — they fail because they treat OKRs as promises rather than bets. The discipline here is to set them as bets, review them as bets, judge them as bets.

Phase 01

Translate

Week 1

Convert the diagnosis from Playbook 01 into candidate objectives. Each candidate tied to a maturity dimension and a stage transition.

Phase 02

Calibrate

Weeks 2–3

Score confidence on each candidate. Drop the too-easy (4+) and the fantasy (below 2). Backcast each key result. Rework the measures that can't be measured.

Phase 03

Pressure-test

Week 4

Premortem on the slate. "It's twelve months from now. We hit none of these. What killed us?" The failure modes become the watchlist.

Phase 04

Commit & cadence

Weeks 5–6

Final commitment to 3 OKRs. Set the quarterly review forum with truth-seeking norms named. Lock in the year-end retrospective discipline.

03 —

Phase one · Translate.

Week 1

The first week is the work of turning the diagnosis into candidates. Not three OKRs yet — six or seven candidates, each tied to a maturity dimension and a stage transition. The discipline is to generate more than you'll commit to, so the calibration phase has something to cut. Translation isn't poetry. The candidate text can be ugly. The shape matters more than the wording.

01

Re-read the brief.

Start with the three-page brief from Playbook 01. The priority dimensions are already named. The stage gaps are already measured. Every candidate OKR has to connect to a specific row of the maturity grid and a specific stage transition. Anything that doesn't connect that way is the start of mission creep — usually someone's pet project — and gets parked, not committed.

02

Draft six or seven candidate objectives.

One sentence each. Move [dimension] from Stage X to Stage Y by [horizon]. No metric language yet. No key results yet. Just the objective. Generate more than you'll commit — three OKRs is the target; six candidates gives the calibration phase room to cut. Resist drafting fewer to "save time"; the cutting is the work, not the drafting.

03

Score initial confidence per candidate.

For each candidate, the executive sponsor and the operator score their confidence (1–5) that the organisation will land the objective with current resources. The target is a 3. A 4 or 5 is sandbagged: the room knows it can hit it. Anything below 2 is fantasy: the room knows it can't. Adapted from Annie Duke's calibrated-confidence technique. The 3/5 is the bet worth taking; the others aren't bets at all.

04

Cut the obvious mismatches.

Any candidate scored 4+ on confidence is too soft: either drop it or sharpen it until it's a 3. Any candidate scored below 2 is fantasy: either drop it or split it into a smaller objective the organisation can actually take a swing at. By the end of week one, the candidate list should be four to five sharpened objectives, all of them genuine 3-out-of-5 bets, ready for the calibration phase.
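
The week-one cut rules can be written down as a small triage function. This is a minimal Python sketch, not part of the playbook's toolkit. It assumes the sponsor's and operator's scores are averaged (the playbook does not say how to combine them) and uses the cut-offs from the phase summary: 4+ is too easy, below 2 is fantasy. The candidate names and scores are hypothetical.

```python
from statistics import mean

SANDBAG_FLOOR = 4.0    # 4+ means the room already knows it can hit it
FANTASY_CEILING = 2.0  # below 2 means the room already knows it can't


def triage(scores):
    """Classify one candidate from the scorers' 1-5 confidence ratings."""
    confidence = mean(scores)  # assumption: ratings are simply averaged
    if confidence >= SANDBAG_FLOOR:
        return "sandbagged: drop or sharpen"
    if confidence < FANTASY_CEILING:
        return "fantasy: drop or split"
    return "bet: carry into calibration"


# Hypothetical candidates, each scored by sponsor and operator.
candidates = {
    "Candidate A": [4, 4],
    "Candidate B": [1, 2],
    "Candidate C": [3, 3],
}

for name, scores in candidates.items():
    print(f"{name}: {mean(scores):.1f} -> {triage(scores)}")
```

The function only makes the cut rule explicit. Whether a surviving candidate is a genuine 3 still has to be argued in the room.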

// In practice — Halcyon Financial

Seven candidates by Friday, five by Monday.

The Transformation Lead and Diana (CEO) spend Wednesday and Thursday drafting candidate objectives from the Playbook 01 brief. By Friday, seven candidates are on the table.

The Monday confidence-scoring session is brutal. Two candidates score a 4 from Helen (COO) and Sam (CTO) within the first ten minutes; both involve work Halcyon already has running, dressed up as new ambition. They're cut. One candidate scores a 1.5: "reach Stage 6 on AI-Native architecture within twelve months". Sam falsifies it in two sentences (the data estate isn't there; rebuilding it is a three-year programme, not a one-year OKR). Cut.

Four candidates remain. By the end of week one, Daniel (Head of P&C) adds a fifth — the People Impact candidate — that hadn't been drafted because it didn't feel "strategic enough." It scores a clean 3. Five candidates go into Phase 2.

  • The cut list: 2 candidates · sandbagged at 4+ · already running work
  • The fantasy: 1 candidate · 1.5/5 · Sam's two-sentence falsification
  • The save: People Impact candidate · added late · scored 3 · the human work that almost got cut for "not strategic enough"
  • Into Phase 2: 5 sharpened candidates · all 3/5 confidence · three rows of the maturity grid
// The signature insight · OKRs are bets, not promises

Most organisations set OKRs they're confident they'll hit, then celebrate hitting them. That isn't ambition; it's theatre. Annie Duke's framing helps: a good OKR sits at the edge of what the organisation can plausibly do — not at the edge of what makes a deck look brave. A 3/5 confidence rating means the room genuinely doesn't know whether the OKR will land. That uncertainty is the feature, not a bug. OKRs are bets. They get set as bets, reviewed as bets, judged as bets — and missed bets aren't failures if the reasoning was sound.

// Watch for · the comfortable five

When every candidate scores 3/5 within fifteen minutes of debate, the room may have learned the answer rather than the question. Pressure-test by asking the sceptic: "what would a candidate scored 1.5 look like, in this dimension?" If they can name one, the slate is calibrated. If they can't, the room has narrowed prematurely — go broader before you go sharper.

// Phase 1 deliverables
4–5 candidate objectives · each tied to a maturity dimension
Confidence scores · each candidate at 2–3/5
Cut list · candidates rejected, with reasoning
Dimensions in scope · which rows of the grid the slate targets
04 —

Phase two · Calibrate.

Weeks 2–3

Two weeks on the key results. For each surviving candidate, draft two to four KRs — then test each one by backcasting from the success state. "It's twelve months from now. We hit this KR. What happened?" If the narrative isn't believable, the KR is wrong. If the KR can't be measured by someone who doesn't already believe in it, the measure is wrong. Most OKRs fail not at the objective level but at the KR level.

01

Draft two to four KRs per candidate.

Each KR is a single sentence. Each one names what gets measured, the direction of movement, and the threshold. Mix leading and lagging indicators: a slate of purely lagging KRs is a slate you can't course-correct on. Aim for at least one leading indicator per KR set. Resist composite KRs. A KR that joins two measures with an "and" is two KRs pretending to be one.

02

Backcast each KR.

For each KR, the room walks the success state out loud. "It's twelve months from now. We hit this KR. Walk me through what happened — month by month — that got us there." If the narrative isn't believable, the KR is wrong. The most common failure: the narrative requires three things to all go right, and the team can't name which is the dependency. That KR gets broken in half or dropped. Adapted from Annie Duke's mental-time-travel discipline.

03

Rework the unmeasurable.

Any KR that can't be measured by someone who doesn't already believe in it is broken. Most KRs about "shadow" anything fail this test — including, awkwardly, Halcyon's shadow AI KR. The discipline is to swap a single unreliable measure for a paired-measure pattern: one measure of the positive direction (e.g. sanctioned-tool penetration, captured by tool logs), one measure of the residual gap (e.g. repeated anonymous pulse survey). Both flawed; together, defensible.

04

Add the guardrail KR.

For each objective, one KR explicitly protects against the worst-case version of hitting the others. If the Capability OKR can be hit by burning out the CX team, the guardrail KR measures team load. If the Governance OKR can be hit by surveillance, the guardrail KR measures trust. The guardrail KR is the room's promise that the OKR will be hit honestly, not just hit.
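
A guardrail KR usually comes down to a floor plus a limit on how fast the measure may fall. The sketch below checks a series of quarterly pulse results against both. The default thresholds (at least 75% favourable, no quarter-on-quarter drop over 10 percentage points) are the ones Halcyon's trust guardrail uses in the worked example. The function name and the pulse figures are hypothetical.

```python
def guardrail_breaches(pulses, floor=75.0, max_drop=10.0):
    """Return (quarter_index, reason) for every guardrail breach.

    pulses: favourable-response percentages, one per quarterly pulse,
    oldest first.
    """
    breaches = []
    for i, value in enumerate(pulses):
        # Breach 1: the measure falls below the agreed floor.
        if value < floor:
            breaches.append((i, "below floor"))
        # Breach 2: the measure drops too fast versus the previous quarter.
        if i > 0 and pulses[i - 1] - value > max_drop:
            breaches.append((i, "drop too steep"))
    return breaches


# Hypothetical pulse series.
steady = [82.0, 80.0, 76.0, 78.0]
slipping = [82.0, 70.0, 76.0]

print(guardrail_breaches(steady))
print(guardrail_breaches(slipping))
```

A breach here is a prompt for the room to slow down and ask how the other KRs are being hit, not an automatic verdict.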

// In practice — Halcyon Financial · three OKRs, twelve KRs

The Governance OKR · where measurement honesty earns its keep.

By the end of week two, the five candidates have been cut to three. The Capability candidate that scored a 4 in week one is re-sharpened to a 3 by tightening the cohort scope. The People Impact candidate, which Daniel almost lost, is the clearest of the three. The Governance OKR is where the team spends most of week three.

  • OKR 01 · Governance & Risk: Move Governance from Stage 2 (practice) to Stage 4 by Q4 · confidence 3/5 · owner Anna (Compliance), accountable Diana (CEO)
  • KR1 · paired measure: Sanctioned AI tooling reaches 80% penetration in CX-facing roles and 60% organisation-wide by Q3 (tool admin logs) · shadow AI rate falls from 41% to <15% in the six-monthly pulse (repeated Playbook 01 instrument, anonymous)
  • KR2 · policy + behaviour: AI policy revised, published, and acknowledged by 95% of staff with a one-line behavioural commitment by Q2 · validated by quarterly random-sample interviews (5 per quarter)
  • KR3 · governance forum operational: Quarterly governance forum running by Q2 with audit trail across all AI initiatives · zero unreported incidents over a rolling six months
  • KR4 · guardrail · trust: Staff trust in AI governance ("the organisation is fair in how it manages AI usage") maintains ≥75% favourable across quarterly pulses · no quarter-on-quarter drop >10pp

The KR1 rework is the recurring illustration. The first draft read "shadow AI usage drops from 41% to under 10%." Sam (CTO) falsified it in the calibration session: "how would we know? Are we monitoring network traffic? If yes, we've just bought a surveillance problem. If no, this number is whatever the pulse survey says, and once governance starts being enforced, the pulse number drops because people stop telling us — not because behaviour changed."

The room reworked it. The paired-measure form admits the limitation honestly: sanctioned-tool penetration is the leading positive measure (it goes up when alternatives are good enough to draw usage); the repeated pulse is the lagging residual measure (anonymous, so people remain honest); and the guardrail KR4 protects against hitting the others by surveillance. Three measures, none of them perfect, all of them defensible.
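
The paired-measure logic can be sketched as a status check that reads both instruments together rather than either one alone. This is an illustrative Python sketch using the KR1 thresholds stated above (80% CX penetration, 60% organisation-wide, shadow rate under 15%). The class, the function, and the status wording are hypothetical, not Halcyon's actual tooling.

```python
from dataclasses import dataclass


@dataclass
class PairedReading:
    cx_penetration: float   # % of CX-facing roles on sanctioned tools (admin logs)
    org_penetration: float  # % organisation-wide on sanctioned tools (admin logs)
    shadow_rate: float      # % reporting shadow AI use (anonymous pulse)


def kr1_status(reading: PairedReading) -> str:
    """Read the positive measure and the residual measure as a pair."""
    positive_met = reading.cx_penetration >= 80 and reading.org_penetration >= 60
    residual_met = reading.shadow_rate < 15

    if positive_met and residual_met:
        return "met"
    if positive_met:
        return "adoption on target, residual gap still open"
    if residual_met:
        # The pulse fell without adoption rising: Sam's objection in code form.
        return "pulse on target without adoption: check for under-reporting"
    return "not met"


# Hypothetical readings.
print(kr1_status(PairedReading(85, 62, 12)))
print(kr1_status(PairedReading(55, 40, 9)))
```

The third branch is the reason for pairing. A falling pulse number with flat sanctioned-tool adoption is more likely to mean people have stopped reporting than that behaviour has changed.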

// The signature insight · paired measures for AI-era KRs

Most AI-era OKRs target behaviour that staff have reason not to fully report: shadow usage, unauthorised tool adoption, productivity gains people don't want to surface. A single instrument is easy to game. The honest pattern is two measures: one captures the positive direction (something the organisation provides, that has logs), one captures the residual gap (something staff report, anonymously, on a stable instrument). Both are imperfect. Together they're defensible. Naming the imperfection in the KR itself is the move that earns trust, including with the regulator, the board, and the staff being measured.

// Watch for · the believable narrative test

When the backcasting narrative requires three or more things to all go right, the KR is fragile. "We hit this by having engineering deliver on time AND adoption running ahead of plan AND the regulator not changing the rules" is a fantasy dressed as a plan. Break it. One KR per dependency. The room will hate this — it makes the slate look bigger. It also makes it honest.
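
The arithmetic behind that fragility is worth seeing once. This is a sketch under two loud assumptions: the dependencies are independent, and each is given a 70% chance purely for illustration (the playbook assigns no probabilities). The function names are hypothetical.

```python
from math import prod


def joint_probability(chances):
    """Chance that every dependency lands, assuming independence."""
    return prod(chances)


def is_fragile(dependencies, limit=3):
    """Flag a backcast narrative that needs `limit` or more things to go right."""
    return len(dependencies) >= limit


# Illustrative only: the 0.7 figures are assumptions, not source data.
narrative = {
    "engineering delivers on time": 0.7,
    "adoption runs ahead of plan": 0.7,
    "regulator leaves the rules unchanged": 0.7,
}

print(f"fragile: {is_fragile(narrative)}")
print(f"chance all three land: {joint_probability(narrative.values()):.2f}")
```

Three things that are each more likely than not combine into a narrative that lands about one time in three, which is why one KR per dependency is the more honest shape.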

// Phase 2 deliverables
3 OKRs with 9–12 KRs · drafted and backcasted
Each KR measurable · by someone who doesn't already believe in it
One guardrail KR · per OKR
Reworked measures · paired where single measures fail
05 —

Phase three · Pressure-test.

Week 4

A half-day premortem on the full OKR slate. "It's twelve months from now. We hit zero of these. What killed us?" Capture every failure mode the room can name — political, technical, capacity, regulatory, the things no one wanted to say. The premortem is the cheapest insurance the slate buys. Most of what surfaces won't change the OKRs themselves — but it'll change how the room defends them, monitors them, and reviews them through the year.

01

Run the premortem.

Three hours, the full room. The slate on the wall. The prompt is fixed: "It's twelve months from now. We hit zero of these OKRs. Walk me through what killed us." Round the room. Each person names at least three failure modes. Sam (the sceptic) goes last — anchoring on his list before the room has generated its own narrows the thinking. From Gary Klein's premortem technique, popularised in Annie Duke's Thinking in Bets.

02

Cluster the failure modes.

Group the failure modes by category: political risk, capacity risk, technical risk, regulatory risk, people risk, measurement risk. The clusters reveal which OKRs are most fragile, and which failure modes the room sees more than once across different OKRs. Repeated failure modes (the ones that surface against more than one OKR) are the watchlist for the year; they are usually one or two underlying problems dressed up as several.
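
The clustering step can be sketched as a grouping of failure modes by underlying cause, keeping only the causes that touch more than one OKR. This is a minimal Python sketch. The function name and the sample entries are hypothetical, and deciding which cause a failure mode belongs to remains a judgement the room makes, not something code can do.

```python
from collections import defaultdict


def build_watchlist(failure_modes):
    """Group (okr, underlying_cause) pairs and keep causes hitting 2+ OKRs."""
    okrs_by_cause = defaultdict(set)
    for okr, cause in failure_modes:
        okrs_by_cause[cause].add(okr)
    return {
        cause: sorted(okrs)
        for cause, okrs in okrs_by_cause.items()
        if len(okrs) > 1
    }


# Hypothetical premortem output, already tagged with a cause by the room.
failure_modes = [
    ("Governance", "engineering capacity"),
    ("Capability", "engineering capacity"),
    ("People Impact", "engineering capacity"),
    ("Governance", "trust in governance"),
    ("People Impact", "trust in governance"),
    ("Governance", "regulator change"),
]

for cause, okrs in build_watchlist(failure_modes).items():
    print(f"{cause}: {', '.join(okrs)}")
```

Causes that touch only one OKR are not discarded; they stay with that OKR's owner rather than going on the slate-wide watchlist.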

03

Decide what changes.

Most failure modes lead to a small adjustment, not a rewrite. Three honest categories of response: "change the KR" (rare, but happens — usually when the measure is broken); "add a leading indicator" (most common — track the failure mode itself, not the outcome); "name the risk and accept it" (also common — some risks can't be mitigated, only watched). The point of the premortem is not to remove all risk; it's to name it before the year starts.

04

Re-score confidence after the premortem.

After the failure modes are clustered and the slate adjusted, re-score the confidence on each OKR. The number should move down — usually from 3 to between 2 and 3. That's healthy. The premortem has surfaced risks the room didn't fully see at the start of the week. A slate that stays at 3/5 confidence after a premortem usually means the premortem was too gentle.

// In practice — Halcyon Financial

Two underlying risks, three OKRs at stake.

The premortem session runs three hours. Sixteen failure modes surface in the first hour. By the end of the second hour, when the cluster work is done, those sixteen have collapsed to two underlying problems, each touching all three OKRs.

  • Underlying risk 01: Engineering capacity is over-committed · the data work for KR1 of Governance, the platform work for the Capability OKR, and the workflow integration for People Impact all hit the same team in Q2 and Q3
  • Underlying risk 02: Trust in governance is the fragile prerequisite · KR4 (guardrail on trust) is named as a "watch with care" measure; if it dips below 60%, the room agrees in advance that the Governance OKR pace will slow, not accelerate
  • The change: One KR moved · KR2 of Capability rescoped from "all twenty CX officers complete training" to "the first ten complete by Q2, the next ten by Q3" · acknowledges capacity reality
  • The watchlist: Engineering capacity reviewed monthly at the OKR forum · trust pulse reviewed quarterly · regulator change in scope (flagged but not actionable) noted as accepted risk
  • Re-scored confidence: Governance OKR moved from 3 to 2.5/5 · others held at 3 · slate as a whole more honest, harder, and unanimously committed

The room emerges with the same three OKRs it walked in with — but one rescoped KR, a named watchlist of two underlying risks, and a lower-and-more-honest confidence level on the most ambitious OKR. Diana (CEO) closes the session by saying out loud, on the record: "this is the slate. We'll judge it at year-end by the reasoning, not just the outcomes." That sentence is the year-end discipline being installed twelve months in advance.

// The signature insight · the premortem makes the OKRs honest

Without the premortem, the OKR slate is the room's case for. With it, the slate is the room's case for plus the room's case against, structured so the case against can be heard. This is the move that distinguishes OKRs-as-bets from OKRs-as-promises. A bet acknowledges the case against and chooses to make the bet anyway. A promise hides the case against and hopes for the best. The premortem is three hours that protect twelve months of work.

// Watch for · the premortem-as-theatre

If the premortem produces only safe failure modes — "competitor moves faster", "market shifts", "unforeseen events" — the room hasn't done the work. The honest failure modes are usually internal: "engineering won't deliver on time because we've over-committed them", "the regulator change in March will block KR3", "P&C doesn't have the bandwidth to do the change comms". Name them. If no one in the room is uncomfortable, the premortem is theatre.

// Phase 3 deliverables
Premortem session run · 3 hours, full room
Failure-mode clusters · 2–4 underlying risks named
Slate adjustments · usually 1–2 KRs reworked
Re-scored confidence · most OKRs settle at 2.5–3/5
Watchlist of risks · review cadence named for each
06 —

Phase four · Commit & cadence.

Weeks 5–6

Two weeks to commit the slate, stand up the review rhythm, and install the year-end discipline before the work begins. The OKR slate that ships from Phase 3 is the slate the organisation lives with for twelve months. What this phase adds is the operating mechanism that will keep the bets honest through the year — and the framing that will let the room judge the bets honestly at year-end.

01

Commit the slate.

Diana (CEO) signs off on the final three OKRs and twelve KRs. The slate is published — same instinct as the maturity grid, visible to the organisation, not hidden in a planning doc. The published slate names confidence levels explicitly: "the room is 2.5/5 confident on Governance, 3/5 on People Impact, 3/5 on Capability. These are bets, not promises." Naming the bet shape inoculates the room against the year-end performance theatre.

02

Stand up the quarterly review forum.

90 minutes, quarterly, the room from Phase 1 plus the operational owners of each KR. Fixed agenda. For each OKR: where is the KR trending? What's the confidence level now? Has any failure mode from the premortem watchlist surfaced? Do we adjust the path or the slate? Truth-seeking norms named in the charter — same ones from Phase 1, with one addition: "a KR going red is data, not failure. Red means we're learning. Green means we may have set the bet too low."

03

Install the mid-year recalibration.

At month six, a half-day session. Not a review — a recalibration. For each OKR: given what we've learned in six months, is the bet still the right one? Most OKRs survive the recalibration. Some get reshaped. Occasionally one gets retired — because the premortem failure mode happened, or because new information arrived that changes the priority. Recalibration is the move that distinguishes annual OKRs from annual decoration.

04

Schedule the year-end retrospective.

Twelve months out. On the calendar. Two hours. Full room. The prompt is fixed — and it is not "did we hit the OKRs." The prompt is: "given what we knew when we set these OKRs, was the reasoning sound? Where did we get lucky? Where did we get unlucky? What would we ask differently next time?" This is the Annie Duke discipline at its sharpest — judge the decisions, not the outcomes. Section 07 of this playbook covers the year-end discipline in full.
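
The retrospective prompt amounts to placing each OKR on a two-by-two grid of reasoning quality against outcome. The sketch below shows that grid in Python. The function name, the labels, and the sample slate are hypothetical, and the two boolean judgements are inputs the room supplies after discussion, not something that can be computed.

```python
def retrospective_quadrant(reasoning_sound: bool, landed: bool) -> str:
    """Place one OKR in the decision-versus-outcome grid."""
    if reasoning_sound and landed:
        return "sound bet, landed: keep the reasoning"
    if reasoning_sound:
        return "sound bet, missed: unlucky, honour it"
    if landed:
        return "unsound bet, landed: lucky or sandbagged, interrogate it"
    return "unsound bet, missed: fix the reasoning"


# Hypothetical year-end judgements: (reasoning sound?, landed?)
slate = {
    "OKR A": (True, False),
    "OKR B": (False, True),
    "OKR C": (True, True),
}

for okr, (sound, landed) in slate.items():
    print(f"{okr}: {retrospective_quadrant(sound, landed)}")
```

An outcome-only review collapses this grid into a single column. Keeping both axes is what lets a missed but sound bet be honoured and a hit but soft one be questioned.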

// In practice — Halcyon Financial · the slate that ships

Three OKRs, twelve KRs, a year of cadence on the calendar.

By end of week six, Halcyon has its slate committed and its review machinery scheduled before the work begins. Diana publishes the slate to the organisation on a Monday morning — a one-page summary that names the three OKRs, the confidence levels, the year-end discipline, and the names of the operational owners.

  • OKR 01 · Governance & Risk: Stage 2 → Stage 4 by Q4 · confidence 2.5/5 · owner Anna · 4 KRs with paired measures
  • OKR 02 · People Impact: Stage 2 → Stage 3 by Q3 in AI-impacted cohorts · confidence 3/5 · owner Daniel · 3 KRs including one guardrail
  • OKR 03 · Capability & Fluency: Stage 2 → Stage 4 in CX-facing roles by Q3 (rescoped staged delivery) · confidence 3/5 · owner Priya · 3 KRs
  • Quarterly forum: Scheduled Q1 wk 12, Q2 wk 24, Q3 wk 36, Q4 wk 48 · in the calendar before the year starts
  • Mid-year recalibration: Half-day, month 6 · OKRs reshaped if learning warrants
  • Year-end retrospective: Month 12 · prompt fixed: "judge the reasoning, not the outcome"

The most important sentence in the published slate is the smallest. At the bottom, in the same font as the rest: "These are bets. We will judge them at year-end by whether the reasoning was sound — not only by whether they landed." That sentence is the year-end discipline installed twelve months in advance, in writing, where the room can be held to it.

// The signature insight · the year-end discipline is set on day one

The hardest moment in OKR work isn't setting the OKRs. It's the year-end review, when the room has to honestly distinguish between OKRs that missed because the bet was unsound and OKRs that missed because the bet was sound but luck went the other way. The discipline of judging decisions, not outcomes, has to be installed before the year starts — written into the published slate, named in the forum charter, scheduled into the calendar. Installing it on day three-hundred-sixty-five is too late. The room will already have spent twelve months performing the OKRs rather than betting them.

// Watch for · the OKR-as-performance-review

The fastest way to corrupt an OKR slate is to tie individual performance reviews to OKR attainment. The KR that went red because of an honest premortem failure mode becomes the KR no one wants on their record. The room learns to set softer bets next year. The OKRs become wallpaper. Keep the OKR review forum and the individual performance review machinery separate — different cadence, different room, different conversation. OKRs are organisational bets; performance reviews are individual judgements. Combining them breaks both.

// Phase 4 deliverables
Published slate · 3 OKRs, 12 KRs, confidence levels named
Quarterly forum charter · attendees, agenda, norms
Mid-year recalibration · scheduled, six months out
Year-end retrospective · scheduled, twelve months out, prompt fixed
The bet-frame statement · in writing, on the slate
The hardest part of OKR work isn't picking the targets — it's resisting the urge to celebrate the soft ones twelve months later.
07 —

Three cadences. Quarterly, mid-year, year-end.

The discipline that follows commitment

// Setting OKRs is six weeks. Living with them is twelve months. The three cadences below are the operating mechanism — each with a different job. The year-end retrospective is the most important and the most often skipped. Schedule all three before the work begins.

Quarterly · 90 min

The forum.

  • Where is each KR trending?
  • What's the current confidence level?
  • Any premortem failure modes surfacing?
  • Red is data · green may be soft
Mid-year · ½ day

The recalibration.

  • Is the bet still the right one?
  • What have we learned in six months?
  • Reshape · rarely retire · never quietly drop
  • Republish if the slate changes
Year-end · 2 hours

The retrospective.

  • Judge the reasoning, not the outcome
  • Where did we get lucky? Where unlucky?
  • What would we ask differently next time?
  • Decisions and outcomes · separated honestly
08 —

Four ways this fails.

Common pitfalls

// OKR slates that don't survive their first year tend to fail in one of these four ways. Watch for them across the six weeks, and especially at the quarterly forum where they first show themselves.

// Pitfall 01

The sandbag slate.

Every OKR scores 4 or 5 on initial confidence and the room celebrates the slate as "ambitious." Twelve months later, every OKR lands green. The room performs success. Nothing about the organisation actually changed because the targets were chosen to be hit.

The fix

Hold the 3/5 confidence target as non-negotiable. Anything scored 4+ is either dropped or sharpened until the room genuinely doesn't know if it'll land. The discomfort of "we might not hit this" is the discomfort that makes the bet honest.

// Pitfall 02

The twelve-OKR sprawl.

The room can't agree on the three most important things, so it commits to twelve "to be inclusive." By month three, the organisation has stopped paying attention to any of them. OKRs that try to cover everything cover nothing.

The fix

Three OKRs maximum. If the room wants a fourth, one of the first three has to come out. The discipline of the cut is the value. An OKR that didn't make the slate isn't dead — it's a candidate for next year, or a workstream inside an existing OKR.

// Pitfall 03

The unmeasurable KR.

A KR ships with a measure that sounds rigorous and isn't. "Improve customer trust", "reduce shadow AI", "increase AI fluency". By Q2, the team is fighting about what the measure even means. By year-end, the answer is whatever the team that owns the KR says it is.

The fix

No KR ships without a defined instrument, baseline number, and target. Where a single instrument doesn't work — common in AI-era KRs — use a paired measure (positive + residual) and name the imperfection in the KR itself. The honesty earns trust; the false precision destroys it.

// Pitfall 04

The outcome-only review.

Year-end review judges every OKR by whether the KR number was hit. Missed KRs become failures the team has to defend. Hit KRs become victories no one interrogates. The room learns to set softer bets next year. The OKR culture rots.

The fix

Install the year-end retrospective discipline on day one, in writing, on the slate. The question is "was the reasoning sound?" not "did we hit the number?" A missed KR with sound reasoning is a better signal than a hit KR that was sandbagged. This is the Annie Duke move that distinguishes OKRs-as-bets from OKRs-as-promises.

09 —

Three disciplines underneath.

What the methodology page covers in full

// The four phases are the mechanics. These three are the thinking habits that keep them honest. They're surfaced in full on the methodology page; named here so the playbook is honest about what it rests on.

// Discipline 01

Set OKRs as bets, not promises.

  • 3/5 confidence is the target
  • 4+ is sandbagged · 1 is fantasy
  • Backcast every KR · believable narrative
  • Premortem before commitment
// Discipline 02

Measure with honest instruments.

  • Paired measures for shadow phenomena
  • Leading + lagging · not all lagging
  • One guardrail KR per OKR
  • Name the imperfection in the KR
// Discipline 03

Judge reasoning, not outcomes.

  • Year-end retrospective scheduled day one
  • Hit KRs may be sandbagged · interrogate
  • Missed KRs may be sound bets · honour
  • Luck and skill · separated explicitly
10 —

By week six, you should have.

Starter checklist

// If you can tick all fifteen, the slate is committable. Anything missing at week six is debt — it'll surface at the first quarterly forum, or worse, at the year-end retrospective.

A candidate list · 6–7 candidates drafted from the diagnosis · Phase 1
Confidence scored · each candidate rated 1–5 · Phase 1
A cut list · sandbags and fantasies removed, reasoning captured · Phase 1
Each surviving candidate tied to a maturity dimension · Phase 1
2–4 KRs per OKR · drafted, with measurement instruments named · Phase 2
A backcasted narrative for each KR · believable to the room · Phase 2
Paired measures used where single measures fail · Phase 2
A guardrail KR per OKR · protects against bad-faith wins · Phase 2
A premortem session held · 3 hours, full room · Phase 3
Failure modes clustered · 2–4 underlying risks named · Phase 3
Confidence re-scored · most OKRs settle at 2.5–3/5 · Phase 3
A published slate · 3 OKRs, with confidence levels named · Phase 4
A quarterly forum charter · attendees, agenda, truth-seeking norms · Phase 4
A mid-year recalibration · scheduled, month 6 · Phase 4
A year-end retrospective · scheduled, month 12, prompt fixed · Phase 4
11 —

Using this in practice.

Closer

OKRs are bets, not promises.

This playbook is a starting point, not a prescription. Every organisation has its own gravity — political, technical, cultural — that bends the OKR work in different ways. Halcyon Financial is one shape; yours will be different. The number of OKRs may move from three to four. The KR instruments will fit your stack, not Halcyon's.

What travels is the discipline, not the artefacts. The 3/5 confidence target. The backcasted narrative. The premortem before commitment. The quarterly forum with truth-seeking norms. And the move that distinguishes OKRs-as-bets from OKRs-as-promises: at year-end, judge the reasoning — not just the outcome.

The hardest part is the year-end retrospective. Most organisations install it on day three-hundred-sixty-five, by which point the room has already spent twelve months performing the OKRs rather than betting them. Install it on day one, in writing, on the published slate.

If you're setting OKRs for the first time — or trying to fix a slate that's slid into theatre — I'm happy to talk through it.