BearPlex Arsenal · deep research

The CTO Succession Framework.

Forty-eight checks for the company one departure from stalling: bus-factor measurement, knowledge transfer, documentation standards, deputy development, and org design.

Most software organizations are at most two departures away from stalling. In a study of 133 popular GitHub systems, 65% had a truck factor of 2 or less. Tenure is short as well: technology leaders expect to stay in their current role an average of just 3.3 years.

This framework treats CTO succession as an engineering discipline, not an HR formality: measure where knowledge concentrates, transfer what is tacit, document what is durable, grow the deputy deliberately, and redesign the organization so the risk stops regenerating. Forty-eight checks, each paired with what failing it costs.

Succession is an engineering discipline: measure the concentration, transfer the tacit, and redesign so the risk stops regenerating.

Pillar 01

Measure the bus factor before it measures you.

Knowledge concentration is the default state of software, not an anomaly: 65% of the popular codebases studied depend on at most two people. The measurement methods exist, from peer-reviewed truck-factor algorithms to commercial tools like CodeScene, but git history alone misses code review, meetings, and coordination work. Treat the audit as quantified risk management, and assume you are in the danger bucket until the numbers say otherwise.
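To make the measurement concrete, here is a minimal sketch of the greedy truck-factor idea: keep removing the person who authors the most files until more than half the files have no author left. It assumes the per-file author sets are already computed (a published method such as degree-of-authorship would derive them from history); the `truck_factor` function, the sample repository, and the names in it are all hypothetical.

```python
from collections import Counter


def truck_factor(file_authors, threshold=0.5):
    """Greedy estimate of the truck factor.

    file_authors maps a file path to the set of people treated as its authors.
    Returns (count, names): how many people must leave, and who, before more
    than `threshold` of the files have no remaining author.
    """
    remaining = {path: set(authors) for path, authors in file_authors.items()}
    total = len(remaining)
    removed = []
    while total:
        orphaned = sum(1 for authors in remaining.values() if not authors)
        if orphaned / total > threshold:
            break
        coverage = Counter(a for authors in remaining.values() for a in authors)
        if not coverage:
            break
        # Ties are broken alphabetically so the result is deterministic.
        top = max(sorted(coverage), key=lambda name: coverage[name])
        removed.append(top)
        for authors in remaining.values():
            authors.discard(top)
    return len(removed), removed


# Hypothetical repository: six files, three people.
repo = {
    "billing/engine.py": {"ana"},
    "billing/tax.py": {"ana"},
    "api/routes.py": {"ana", "ben"},
    "api/auth.py": {"ben"},
    "infra/deploy.sh": {"cy"},
    "docs/runbook.md": {"ben", "cy"},
}

count, who = truck_factor(repo)
print(count, who)  # 2 ['ana', 'ben']
```

In this toy repository, losing ana and ben orphans four of six files, so the truck factor is 2. The sketch reads authorship only, so it inherits the blind spot named above: review, meetings, and coordination work never show up in it.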

01
Run a truck-factor analysis on every production repository using an established method such as Avelino's degree-of-authorship algorithm or CodeScene's knowledge maps. A company that has never measured should assume it sits in the 65% of systems where losing two people stalls the project.
02
Cross-check git-derived scores against code review participation and meeting load before trusting them. The Bus Factor In Practice study (a survey of 269 engineers, tested on JetBrains projects) found existing tools rely solely on version control data even though knowledge also moves through code reviews and meetings, so a clean git score can hide a fatal dependency.
03
Identify the glue people whose coordination work, onboarding, and unblocking never appear in commit history. Commit-based analysis ranks them as low risk, and their departure stalls the organization anyway because what they held was coordination knowledge, not code.
04
Run the vacation test: send each key person genuinely offline for two to three weeks and keep a written log of everything that stalls and everyone who pings them anyway. A leader who cannot take an uninterrupted vacation is carrying unpriced organizational risk, and the gap list is next quarter's succession plan.
05
Extend the audit past the org chart into your dependency chain, including critical open source libraries. Before Heartbleed, OpenSSL ran on about $2,000 a year in donations with only one person working on it full time, while securing huge swaths of the internet.
06
Simulate planned departures before they happen, using offboarding simulation in tooling or a structured tabletop exercise. Discovering the gap during a four-week notice period leaves you four weeks to transfer years of context.
07
Maintain a skills matrix that targets at least three people per critical capability. A bus factor of two means one vacation plus one resignation equals zero coverage.
08
Re-run the full audit quarterly and govern the trend, not the snapshot. Refactors, hires, and attrition silently move knowledge concentrations, so a stale audit is false comfort dressed as diligence.
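The skills-matrix check above reduces to a small coverage calculation. The sketch below is one way to express it, assuming the matrix is kept as a mapping from capability to the people who can genuinely operate it; `coverage_gaps`, the capability names, and the people are hypothetical.

```python
def coverage_gaps(matrix, target=3, away=()):
    """Return {capability: available_people} for every capability that has
    fewer than `target` people available once everyone in `away` is removed."""
    absent = set(away)
    gaps = {}
    for capability, people in matrix.items():
        available = len(set(people) - absent)
        if available < target:
            gaps[capability] = available
    return gaps


# Hypothetical skills matrix.
matrix = {
    "payments ledger": {"ana", "ben"},
    "deploy pipeline": {"ana", "ben", "cy"},
    "data warehouse": {"dee"},
}

# Which capabilities miss the three-person target today?
print(coverage_gaps(matrix))
# {'payments ledger': 2, 'data warehouse': 1}

# One vacation plus one resignation: what is left with nobody at all?
print(coverage_gaps(matrix, target=1, away={"ana", "ben"}))
# {'payments ledger': 0}
```

The second call is the arithmetic from check 07: a capability held by two people drops to zero coverage as soon as both are unavailable at once. The output is only as honest as the matrix, which is why self-reported entries are worth verifying through rotation.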

Pillar 02

Knowledge transfer that actually transfers.

The standard knowledge-transfer session held during someone's notice period is theater: nobody retains a passive walkthrough delivered weeks before the knowledge is needed. The protocols that work are hands-on, including disaster rehearsal, recorded expert interviews, and immediate application, which is how NASA's Spacesuit Knowledge Capture program has preserved safety-critical engineering judgment since 2007. Start in peacetime, because transfer under departure pressure mostly fails.

09
Ban the standalone walkthrough meeting as your primary transfer mechanism. Nobody remembers the details of a complex decision engine five weeks after listening to a presentation about it.
10
Pair every transfer with immediate hands-on work on the system, with the expert reviewing rather than driving. Understanding comes from actions taken on the system, not from listening, so transfer without application evaporates.
11
Run disaster role-play on critical systems regularly, in the style of Google SRE's Wheel of Misfortune. Rehearsal transfers the tacit operational knowledge that written runbooks never capture, and it surfaces decisions normally made only under incident pressure.
12
Record structured expert interviews with your key engineers while they are still engaged, not during their exit. NASA's Spacesuit Knowledge Capture program has run expert lectures and interviews since 2007 precisely because waiting for the departure is too late.
13
Hold after-action reviews immediately after incidents and major launches, while memories are fresh. Lessons written down months later feed the famously unread lessons-learned database instead of the next engineer's judgment.
14
Treat documentation as necessary but insufficient for judgment-heavy knowledge, and plan socialization time alongside it. The hardest CTO knowledge resists writing down and transfers only through working together.
15
Start succession transfer years before any expected departure. In a study of 1,932 popular GitHub projects, 16% were abandoned by all their core developers, and only 41% of those survived through new maintainers taking over.
16
Verify every transfer by having the receiver operate the system with the expert out of the room. A ticked 'KT session held' checkbox proves a meeting happened; demonstrated capability is the actual deliverable.

Pillar 03

Documentation standards wired into the work.

The numbers are damning: 93% of surveyed open source contributors call incomplete or outdated documentation a pervasive problem, yet 60% rarely or never contribute to docs themselves, so any standard that relies on voluntary upkeep is structurally doomed. The fix is structure plus enforcement: established formats like ADRs, Diataxis, arc42, and C4, with updates wired into the definition of done. DORA's research turns this from hygiene into leverage, because quality documentation amplifies the payoff of every technical capability it measured.

17
Adopt Architecture Decision Records for every significant technical decision, stored in the repo in Nygard's context, decision, consequences format. Architectural rationale lives in two or three senior heads, and ADRs are the cheap insurance against an unexplainable codebase after they leave.
18
Structure all documentation using the Diataxis split of tutorials, how-to guides, reference, and explanation. Knowledge capture without structure produces an unusable wiki dump that nobody trusts and nobody reads under pressure.
19
Document the system itself with an arc42-style template and C4 diagrams at context, container, and component levels. This gives an at-risk architect a concrete deliverable checklist instead of the vague and unfinishable instruction to document the system.
20
Go handbook-first: one source of truth, updated by merge request, with named section owners. GitLab proved the model scales to a multi-thousand-person all-remote company, and it directly removes dependence on any individual's memory.
21
Wire documentation updates into the definition of done, for example by blocking release checklists that lack a link to the updated runbook section. Even people who feel the pain of bad docs do not fix them voluntarily, so behavior contracts beat exhortation every time.
22
Execute runbooks on a schedule instead of reviewing them. Runbooks rot faster than systems change, and once an on-call engineer hits one dead dashboard link they learn to distrust the entire wiki.
23
Give every document an owner and a review date, and expire anything nobody renews. A page that is confidently and authoritatively wrong is more dangerous than no page at all.
24
Measure documentation quality and findability, not page count. DORA found quality documentation amplifies the organizational payoff of every technical practice studied: the estimated lift from trunk-based development is 1,525% with above-average documentation versus 36% with below-average documentation.
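The owner-and-review-date rule from check 23 is easy to enforce mechanically. The following is a minimal sketch, assuming each document carries an owner and a review-by date as metadata; the `Doc` record, the `needs_expiry` function, and the sample paths are hypothetical, and how the metadata is stored (front matter, a registry file, a wiki field) is left open.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Doc:
    path: str
    owner: Optional[str]        # None means nobody has claimed the page
    review_by: Optional[date]   # None means no review date was ever set


def needs_expiry(docs, today):
    """Return paths of documents with no owner, no review date,
    or a review date that has already passed."""
    return [
        d.path
        for d in docs
        if d.owner is None or d.review_by is None or d.review_by < today
    ]


# Hypothetical inventory.
docs = [
    Doc("runbooks/failover.md", "ana", date(2026, 9, 1)),
    Doc("adr/0007-queue-choice.md", None, date(2026, 12, 1)),
    Doc("wiki/oncall-setup.md", "ben", date(2026, 1, 15)),
]

print(needs_expiry(docs, today=date(2026, 6, 1)))
# ['adr/0007-queue-choice.md', 'wiki/oncall-setup.md']
```

Run as a scheduled job or a merge check, a list like this turns "expire anything nobody renews" from a policy statement into a failing build.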

Pillar 04

Deputy development, not deputy designation.

Naming a successor on paper is not succession: only 20% of HR leaders report having successors ready for critical roles, and algorithmically recommended successors were correct just 34% to 48% of the time in the ICSE 2016 knowledge-loss study. A real deputy program starts with the leader auditing their own invisible work, hands off real responsibility on a cadence, and grows more than one candidate. The hard part is emotional, not procedural, which is why Molly Graham's give-away-your-Legos framing matters more than any template.

25
Have the CTO audit their invisible work across Will Larson's categories: meeting roles, hiring, recurring processes, individual support, inbound questions, six months of calendar and TODO patterns, and external relationships. Leaders fill a hundred little holes they never think about, and those holes are the real job description a successor inherits.
26
Split every identified gap into closable in under four hours versus needing months, and schedule both classes of work. Undifferentiated gap lists rot, and the months-long items are the ones that actually break transitions.
27
Develop at least two candidates for every critical role rather than anointing one. Recommended successors were right only 34% to 48% of the time in the ICSE 2016 knowledge-loss study, so a single bet is a coin flip.
28
Hand off one real responsibility every quarter, permanently and visibly. The blocker to founder-CTO succession is identity fused to the system they built, and only repeated handoffs dissolve it.
29
Move deputies through genuine leadership passages, not just system knowledge, following the Leadership Pipeline model. A brilliant engineer who skipped the people-leadership passage fails as CTO for reasons that have nothing to do with the codebase.
30
Prefer growing an internal successor over betting on a cold external hire. External CEO hires are paid 15% more than internal hires and have an 84% greater chance of turnover than insiders in the first three years, usually for poor performance.
31
Actively manage the key person's load while they teach. 22% of engineering leaders report critical levels of burnout, and the burned-out bus-factor-of-one is exactly the person most likely to walk before the transfer completes.
32
Vet anyone who volunteers to take load off an exhausted key person, inside the company or in your dependencies. The xz utils backdoor reached global Linux infrastructure because sock puppet accounts spent years pressuring a burned-out solo maintainer into accepting a helpful new co-maintainer.
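The case for two candidates in check 27 can be put as rough arithmetic. The sketch below makes a loud assumption: it treats the 34% to 48% recommendation accuracy as the independent chance that any single named candidate works out, which is a simplification of what the study measured. The function name is hypothetical.

```python
def chance_at_least_one_works(p_single, candidates):
    """Probability that at least one of `candidates` independent bets succeeds,
    given each succeeds with probability `p_single`."""
    return 1 - (1 - p_single) ** candidates


for p in (0.34, 0.48):
    one = chance_at_least_one_works(p, 1)
    two = chance_at_least_one_works(p, 2)
    print(f"single-candidate odds {one:.2f} -> two-candidate odds {two:.2f}")
# single-candidate odds 0.34 -> two-candidate odds 0.56
# single-candidate odds 0.48 -> two-candidate odds 0.73
```

Under that assumption, a second candidate lifts the odds from roughly a coin flip or worse to between 56% and 73%. Real candidates are rarely independent, since they often share the same mentor and the same blind spots, so treat the numbers as an upper-bound intuition rather than a forecast.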

Pillar 05

Org design that lowers the bus factor by default.

Mitigating key-person risk one person at a time is a treadmill; the durable fix is structural. If knowledge concentrates in one individual, the team boundaries were probably drawn around that person rather than around a value stream, and the code ownership model is probably amplifying the problem. Team Topologies, ownership models, and constraint management give you levers that keep working after any individual leaves.

33
Draw team boundaries around value streams and products, not around individuals. Team Topologies' core insight applies directly: knowledge concentrates in one person when the structure was built around that person.
34
Move from strong individual code ownership toward weak or collective ownership backed by review and test discipline. Strong ownership concentrates key-person risk, while undisciplined collective ownership degrades into nobody owning anything, so the discipline is non-negotiable.
35
Put a buffer in front of your most depended-on engineer and stop direct escalations to them. The Phoenix Project's Brent problem is real: every act of heroism deepens the dependency because heroes never have time to document or teach.
36
Stop rewarding indispensability in performance reviews and promote people for making others capable. An irreplaceable engineer is a failure of the system around them, and most organizations quietly pay people to stay irreplaceable.
37
Size what each team owns by cognitive load, not headcount. An overloaded team defaults to letting its one expert hold the system in their head, which recreates the bus factor you just fixed.
38
Rotate engineers through critical systems on a planned cadence. Rotation is the cheapest standing cross-training program and it keeps the skills matrix honest instead of self-reported green.
39
Refuse to let haunted forests form: carve unowned legacy systems behind strong API boundaries and rewrite incrementally. Code nobody understands gets routed around with protective shims and quietly taxes every adjacent team for years after the expert leaves.
40
Track scope creep on your engineering leaders as a standing metric. 65% of engineering leaders reported their scope expanded in the past year and 40% took on more direct reports, so the overloaded single leader is the norm, which is precisely the risk scenario.

Pillar 06

Board-grade governance: make succession a number.

Boards already treat CEO succession as ongoing scenario planning, but engineering key-person risk rarely gets the same rigor, and mainstream executive-search offerings are role-generic, with nothing specific to codebase knowledge or architecture continuity. ISO 30414 supplies the vocabulary, the knowledge-at-risk research supplies the math, and the gap between board-grade process and engineering-grade transfer is where most companies fail. Close it with metrics a board can interrogate twice a year.

41
Report successor coverage and readiness for engineering-critical roles using ISO 30414 succession-planning definitions. Boards cannot govern what is never quantified, and only 20% of HR leaders currently have successors ready for critical roles.
42
Model knowledge loss as a tail-risk number, adapting the knowledge-at-risk approach from the ICSE 2016 research. Studied projects were susceptible to losses more than three times larger than the expected loss, and the tail event is the one that kills companies.
43
Designate successors proactively and in writing, before any departure signal. Having readily available successors reduced expected knowledge loss by as much as 15% in the same study, which is cheap insurance by any standard.
44
Put a clock on the dependency and plan as if roughly three years of CTO tenure remain. Technology leaders expect to leave within 3.3 years on average, and a record 40 technology-sector CEOs departed in 2024, up 90% on the prior year.
45
Pair every named successor with verified technical readiness, not just a name in a planning document. Role-generic succession metrics count whether a successor exists but cannot tell you whether that person can actually run the systems.
46
Review the succession plan at board cadence, at least twice a year, as scenario planning rather than a one-time event. A plan written once and filed is stale by the next reorg, and stale plans fail exactly when invoked.
47
Budget honest ramp time into any external-hire scenario. 72% of engineering leaders say it takes a new hire more than a month to submit their first three meaningful pull requests, and a cold CTO hire ramps far slower across every undocumented system.
48
Write down what happens if the CTO disappears tomorrow, including credentials, vendor relationships, and decision authority. Roughly half of US business owners surveyed either plan to close or have no succession plan, so the absence of a written answer is itself the plan.
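The tail-risk framing in check 42 can be illustrated with a toy model. This is not the method from the ICSE 2016 research; it is a minimal sketch that enumerates every departure scenario for a small team, assuming independent and entirely hypothetical departure probabilities, and counts a file as lost when everyone who knows it has left. All function names and data are illustrative.

```python
from itertools import product


def loss_distribution(file_knowers, leave_prob):
    """Enumerate every combination of departures.

    Returns a list of (files_lost, probability) pairs, one per scenario.
    """
    people = sorted(leave_prob)
    outcomes = []
    for leaves in product([False, True], repeat=len(people)):
        gone = {p for p, left in zip(people, leaves) if left}
        prob = 1.0
        for p, left in zip(people, leaves):
            prob *= leave_prob[p] if left else 1 - leave_prob[p]
        lost = sum(1 for knowers in file_knowers.values()
                   if knowers and knowers <= gone)
        outcomes.append((lost, prob))
    return outcomes


def expected_loss(outcomes):
    return sum(lost * prob for lost, prob in outcomes)


def loss_at_quantile(outcomes, q=0.95):
    """Smallest loss L such that P(loss <= L) >= q."""
    cumulative = 0.0
    for lost, prob in sorted(outcomes):
        cumulative += prob
        if cumulative >= q:
            return lost
    return max(lost for lost, _ in outcomes)


# Hypothetical system: ten files, three people, 10% yearly departure odds each.
files = {f"billing/{i}.py": {"ana"} for i in range(6)}
files.update({f"api/{i}.py": {"ben"} for i in range(2)})
files["api/shared.py"] = {"ana", "ben"}
files["infra/deploy.sh"] = {"cy"}
odds = {"ana": 0.1, "ben": 0.1, "cy": 0.1}

outcomes = loss_distribution(files, odds)
print(round(expected_loss(outcomes), 2))  # 0.91
print(loss_at_quantile(outcomes, 0.95))   # 6
```

In this toy case the expected loss is under one file, while the 95th-percentile loss is six: the average looks harmless and the tail does not. That gap is the number a board should see, and it shrinks only when knowledge of ana's six files is shared with someone else.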

Sources

Every statistic in this framework was re-verified against its primary source in June 2026. The receipts ship with the page.

What now

Use it. Then bring us the bill.

If the framework shows red flags you can't fix in a quarter, that's the conversation we're built for. If it shows a bus factor you cannot fix from inside, fractional senior leadership is a bridge we have built before.

Talk to engineering

The door is open

Bring the problem. We bring the discipline.

Tell us which world your problem lives in, or let the diagnostic find out. The first conversation is with an engineer, not an account manager.

NDA-first process · SOC 2 Type II audit in progress · GDPR compliant