BearPlex Arsenal · deep research

The AI Readiness Audit.

Forty-eight checks across six dimensions for CTOs and VPs of engineering. AI readiness is organizational, not technical, and this audit scores the parts no procurement decision can fix.

Most AI initiatives do not fail on model quality. They fail on problem selection, data readiness, evaluation discipline, and organizational wiring. S&P Global found 42% of companies abandoned most of their AI initiatives before production in 2025, up from 17% a year earlier, and Gartner predicts 60% of AI projects will be abandoned through 2026 because the data underneath them is not AI-ready.

This audit gives engineering leaders 48 checks across six dimensions, each scored independently, built from the strongest public research and practitioner postmortems rather than any single vendor's maturity model. Every statistic was re-verified against its primary source in June 2026.

Score each dimension independently, and audit strategy first: a weak answer there invalidates everything downstream.
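A minimal sketch of what independent scoring can look like, assuming each check is answered yes or no. The pillar names come from this audit; the `PillarScore` type, the `score_audit` and `strategy_is_weak` helpers, and the 0.5 floor are illustrative assumptions, not a description of how the AI Readiness Scorer works.

```python
from dataclasses import dataclass

# Pillar names are taken from the audit; the rest of this layout is hypothetical.
PILLARS = [
    "Strategy and value definition",
    "Data infrastructure",
    "Model pipelines and evaluation",
    "Team capabilities",
    "Governance and risk",
    "Change management and operating model",
]
CHECKS_PER_PILLAR = 8  # 48 checks spread evenly across 6 pillars


@dataclass
class PillarScore:
    name: str
    passed: int
    total: int

    @property
    def ratio(self) -> float:
        return self.passed / self.total


def score_audit(answers: dict[str, list[bool]]) -> list[PillarScore]:
    """Score each pillar on its own; no cross-pillar average is computed."""
    scores = []
    for name in PILLARS:
        checks = answers[name]
        if len(checks) != CHECKS_PER_PILLAR:
            raise ValueError(f"{name}: expected {CHECKS_PER_PILLAR} answers, got {len(checks)}")
        scores.append(PillarScore(name, sum(checks), len(checks)))
    return scores


def strategy_is_weak(scores: list[PillarScore], floor: float = 0.5) -> bool:
    """Flag a weak first pillar. The 0.5 floor is an assumed threshold."""
    return scores[0].ratio < floor


# Example: every pillar passes all checks except strategy, which passes 3 of 8.
answers = {name: [True] * CHECKS_PER_PILLAR for name in PILLARS}
answers[PILLARS[0]] = [True, False, False, True, False, False, False, True]
scores = score_audit(answers)
for s in scores:
    print(f"{s.name}: {s.passed}/{s.total}")
print("Strategy weak:", strategy_is_weak(scores))
```

Reporting six separate numbers instead of one blended total keeps a strong tooling score from hiding a weak strategy score.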

Interactive

Score yourself in 4 minutes.

The AI Readiness Scorer runs these 48 checks as yes-or-no calls and hands back your score, readiness band, and highest-impact gaps. Free, no email needed for the result.

Pillar 01

Strategy and value definition.

RAND found more than 80% of AI projects fail, twice the rate of ordinary IT projects, and the leading root cause is misunderstanding the problem rather than the technology. Most readiness checklists score tooling; the failure data points at problem selection and value definition. Audit this pillar first, because a weak answer here invalidates everything downstream.

01
Define the business problem and the metric it moves before any model work starts. RAND found the top cause of AI project failure is stakeholders misunderstanding or miscommunicating the problem, not the technology, and that mistake is locked in on day one.
02
Assign a P&L owner and a target number to every AI initiative. Gartner predicted that at least 30% of gen AI projects would be abandoned after proof of concept by the end of 2025, and named unclear business value as a leading reason.
03
Run a buy-versus-build decision with real cost data for every use case. MIT NANDA found externally purchased tools and partnerships reached deployment about 67% of the time versus roughly 33% for internal builds, so defaulting to building burns budget and time.
04
Set explicit kill criteria for every pilot before it starts. S&P Global found the average organization already scraps 46% of AI proofs of concept before production, and without criteria the wrong half survives.
05
Measure AI investment outcomes on a fixed cadence, not just at launch. Cisco's 2025 AI Readiness Index found only 32% of organizations measure the value of their AI investments, which turns every roadmap debate into opinion.
06
Refresh AI cost models at least quarterly. Stanford's AI Index documents a more than 280-fold fall in GPT-3.5-class inference cost between November 2022 and October 2024, so build and capacity decisions priced on last year's tokens are simply wrong.
07
Require a concrete answer to what a proposed agent does that a workflow plus an LLM call does not. Gartner expects over 40% of agentic AI projects to be canceled by end of 2027 and estimates only about 130 of the thousands of self-described agentic AI vendors are real.
08
Weigh back-office and operations use cases before flashier customer-facing ones. MIT NANDA found the strongest ROI in back-office automation while more than half of gen AI budgets go to sales and marketing tools.
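To see why check 06 asks for a quarterly refresh, the Stanford figure can be turned into a rough rate. This is a back-of-envelope sketch that assumes the 280-fold fall was a smooth exponential decline over the 23 months from November 2022 to October 2024. Real prices drop in steps, so treat the output as an order of magnitude; the `stale_price_overestimate` helper is hypothetical.

```python
FOLD_DROP = 280  # "more than 280-fold", the figure cited in check 06
MONTHS = 23      # November 2022 to October 2024

# Assumption: the whole decline is smoothed into one constant monthly factor.
monthly_factor = FOLD_DROP ** (1 / MONTHS)
monthly_decline = 1 - 1 / monthly_factor
quarterly_decline = 1 - 1 / monthly_factor ** 3
annual_factor = monthly_factor ** 12


def stale_price_overestimate(months_old: float) -> float:
    """How many times too high a price is after `months_old` months at the smoothed rate."""
    return monthly_factor ** months_old


print(f"Implied monthly decline:   {monthly_decline:.1%}")
print(f"Implied quarterly decline: {quarterly_decline:.1%}")
print(f"Implied annual fold drop:  {annual_factor:.1f}x")
print(f"A 12-month-old price overstates cost by about {stale_price_overestimate(12):.0f}x")
```

Under that smoothing assumption the cost falls by roughly a fifth each month and about half each quarter, so a cost model left untouched for a year is off by more than an order of magnitude.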

Pillar 02

Data infrastructure.

Gartner predicts organizations will abandon 60% of AI projects through 2026 because the data underneath them is not AI-ready, and 63% of organizations either lack or are unsure they have the right data management practices for AI. The EDM Association's 2026 benchmark of more than 435 organizations found only 31% report advanced data strategy capability. The model is the smallest box in the system diagram, and this is where most of the real engineering work lives.

09
Inventory the data each approved use case needs and confirm it exists with rights to use it. RAND lists lacking the data needed to train an effective model among the five leading causes of AI project failure, and it usually surfaces only after months of sunk work.
10
Stand up lineage and quality monitoring on every dataset that feeds a model. Informatica found 67% of data leaders are struggling to move gen AI pilots to production, with data quality, completeness, and readiness cited by 43% as a top obstacle.
11
Centralize or federate data access so teams can discover and reuse datasets and features. Cisco found 64% of organizations struggle to centralize data, and every duplicated extract adds silent inconsistency to model behavior.
12
Extend data management to AI-specific capabilities: vector stores, chunking, embeddings, and RAG integration. Gartner's 60% abandonment prediction applies precisely to projects unsupported by AI-ready data in this sense.
13
Score data management maturity against an external model such as DCAM, not a self-assessment. The EDM Association's 2026 benchmark found only 31% of organizations report advanced data strategy capability even though more than 70% have CDOs and formal governance structures on paper.
14
Track data dependencies and undeclared consumers of every pipeline. Sculley's hidden technical debt research showed unmanaged data dependencies make a change to one system a change to all of them.
15
Classify sensitive data and enforce access controls before it reaches any model or prompt. IBM found 97% of organizations breached through AI models or applications lacked proper AI access controls.
16
Test retrieval quality separately from model quality in every RAG system. When retrieval is wrong, the best model available produces confident garbage and the failure gets misdiagnosed as a model problem.
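Check 16 can be sketched as a retrieval-only evaluation that never calls a model. The example below assumes you have a small labeled set of queries with known relevant document IDs; `evaluate_retriever`, `fake_index`, and the document IDs are hypothetical stand-ins for your own retriever and data.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        raise ValueError("each labeled query needs at least one relevant document")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate_retriever(retrieve, labeled_queries, k=5):
    """Average recall@k and mean reciprocal rank over a labeled set, with no LLM in the loop."""
    recalls, rranks = [], []
    for query, relevant in labeled_queries:
        retrieved = retrieve(query)
        recalls.append(recall_at_k(retrieved, relevant, k))
        rranks.append(reciprocal_rank(retrieved, relevant))
    n = len(labeled_queries)
    return {"recall@k": sum(recalls) / n, "mrr": sum(rranks) / n}


# Hypothetical retriever: a dict lookup standing in for a vector store query.
fake_index = {
    "refund policy": ["doc-7", "doc-2", "doc-9"],
    "reset password": ["doc-4", "doc-1", "doc-3"],
}
labeled = [
    ("refund policy", {"doc-2"}),   # relevant doc is retrieved at rank 2
    ("reset password", {"doc-8"}),  # relevant doc is never retrieved
]
report = evaluate_retriever(lambda q: fake_index[q], labeled, k=3)
print(report)
```

Because the second query never surfaces its relevant document, no choice of model could answer it correctly. That is the failure this check is meant to isolate before anyone blames the model.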

Pillar 03

Model pipelines and evaluation.

Google's MLOps levels and Microsoft's maturity model still define the pipeline mechanics, but the LLM era moved the bottleneck to evaluation. Hamel Husain's field work finds the most common root cause of stalled LLM products is the absence of a robust evaluation system, not bad prompts. Teams that have shipped at scale, like GoDaddy, learned to treat malformed outputs, provider outages, and version regressions as routine engineering problems.
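Treating malformed outputs and provider outages as routine can look something like the sketch below. It assumes each provider is wrapped in a plain callable that takes a prompt and returns raw text; `call_with_fallback`, `ProviderError`, and the two fake providers are hypothetical names, not any vendor's SDK and not GoDaddy's implementation.

```python
import json
import time


class ProviderError(Exception):
    """Raised by a provider callable when the upstream service is unavailable."""


def call_with_fallback(providers, prompt, required_keys, max_attempts=2, backoff_s=0.0):
    """Try each provider in order, retrying on outages and on malformed JSON output.

    `providers` is a list of (name, callable) pairs. Returns (provider_name, parsed_dict).
    """
    errors = []
    for name, call in providers:
        for attempt in range(1, max_attempts + 1):
            try:
                parsed = json.loads(call(prompt))
                if not isinstance(parsed, dict):
                    raise ValueError("expected a JSON object")
                missing = [key for key in required_keys if key not in parsed]
                if missing:
                    raise ValueError(f"missing keys: {missing}")
                return name, parsed
            except (ProviderError, ValueError) as exc:
                # json.JSONDecodeError subclasses ValueError, so bad JSON lands here too.
                errors.append(f"{name} attempt {attempt}: {exc}")
                time.sleep(backoff_s * attempt)  # linear backoff between attempts
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Simulated providers for illustration only.
def flaky_primary(prompt):
    raise ProviderError("simulated outage")


def secondary(prompt):
    return '{"intent": "refund", "confidence": 0.9}'


used, result = call_with_fallback(
    [("primary", flaky_primary), ("secondary", secondary)],
    "classify: I want my money back",
    required_keys=["intent"],
)
print(used, result)
```

Validating the parsed structure in code, rather than trusting that the model returned what was asked for, is what turns an occasional invalid response into a retry instead of an incident.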

17
Automate training and deployment pipelines with versioned data, code, and models. Google's MLOps level 0, the all-manual notebook stage, is where models silently rot because retraining never actually happens.
18
Build an evaluation suite before scaling any LLM feature: unit-test assertions, human and LLM-judge trace review, then A/B tests. Hamel Husain finds unsuccessful AI products almost always share one root cause: the failure to create robust evaluation systems.
19
Validate every LLM judge against human labels before trusting its scores. Husain's guidance is to track correlation between model-based and human evaluation before relying on automatic scores; an unvalidated judge launders failure modes into a green dashboard.
20
Review production traces on a regular cadence with near-zero-friction tooling. The same eval research treats regular trace review as the foundation of error analysis, and skipping it means flying blind on real user failures.
21
Pin model versions and run regression evals before any provider or version migration. Applied LLMs documents Voiceflow taking a 10% drop on intent classification just migrating between GPT-3.5 versions, caught only because evals existed.
22
Handle malformed output and provider outages in code with retries, fallbacks, and multi-provider switching. GoDaddy measured invalid output on about 1% of GPT-3.5 structured calls and lost its chatbots to a multi-hour provider outage.
23
Gate irreversible or sensitive actions with deterministic code, never the model's judgment. GoDaddy triggers human handoff from stop phrases detected in code rather than leaving that decision to the model, because a model that is right most of the time still fires the wrong action constantly at scale.
24
Monitor for drift and define exactly what triggers retraining or prompt revision. Microsoft's MLOps maturity model puts drift-triggered automatic retraining at its top level, and without it quality decays invisibly.
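For check 19, one way to validate a judge is to compare its labels with human labels on the same traces. The sketch below uses raw agreement plus Cohen's kappa, which corrects for agreement that would happen by chance. Kappa is a technique swapped in here for illustration; the source guidance only says to track how well model-based and human evaluation line up. The labels are made up.

```python
from collections import Counter


def agreement_and_kappa(human: list[str], judge: list[str]) -> tuple[float, float]:
    """Raw agreement and Cohen's kappa between human labels and LLM-judge labels."""
    if len(human) != len(judge) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(human)
    observed = sum(h == j for h, j in zip(human, judge)) / n
    human_freq, judge_freq = Counter(human), Counter(judge)
    # Agreement expected by chance if each rater labeled independently at its own base rates.
    expected = sum(human_freq[label] * judge_freq[label] for label in human_freq) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one label for every item")
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Made-up labels for ten traces: the judge passes two traces the humans failed.
human = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail", "pass"]
judge = ["pass", "pass", "pass", "fail", "pass", "fail", "pass", "pass", "pass", "pass"]

observed, kappa = agreement_and_kappa(human, judge)
print(f"raw agreement: {observed:.2f}, Cohen's kappa: {kappa:.2f}")
```

In this toy data the judge agrees with humans on 8 of 10 traces yet misses half of the human-labeled failures, which is why raw agreement alone can flatter a judge. Any pass threshold on kappa is a team decision, not something the source prescribes.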

Pillar 04

Team capabilities.

DORA's 2024 research delivered the sharpest finding in this report: for every 25% increase in AI adoption, software delivery throughput dipped an estimated 1.5% and delivery stability fell an estimated 7.2%. AI amplifies existing engineering discipline rather than substituting for it. Readiness here means strong delivery basics, the right hiring sequence, and proficiency training that goes well beyond awareness decks.

25
Verify engineering fundamentals first: small batches, robust automated tests, fast feedback loops. DORA found AI adoption correlated with an estimated 7.2% drop in delivery stability, consistent with AI-enlarged change batches overwhelming weak review and testing practices.
26
Hire in sequence: product and platform engineers first, data instrumentation next, ML specialists last. Applied LLMs practitioners found hiring an ML engineer too early wastes money and causes churn when there is no product or data foundation for them to optimize.
27
Train for role-specific AI proficiency, not generic awareness sessions. Prosci attributes 38% of AI implementation struggles to user proficiency gaps, and EY found only 12% of employees receive sufficient AI training.
28
Put AI initiatives in line teams with empowered managers rather than a central lab that owns everything. MIT NANDA found successful deployments put adoption ownership with empowered line managers and domain leaders, not central AI labs.
29
Measure actual employee AI usage instead of relying on leadership estimates. McKinsey found employees self-report heavy gen AI use at three times the rate their C-suite estimates, so plans built on leadership's picture start from the wrong baseline.
30
Give every engineer sanctioned, paid AI tooling with clear usage guidance. MIT NANDA found workers at over 90% of companies use personal AI tools while only 40% of companies hold official LLM subscriptions, and that gap is filled with company data on personal accounts.
31
Define ownership of prompts, retrieval logic, and eval assets the way you define code ownership. Unowned prompts become the new configuration debt: edited by everyone, tested by no one.
32
Budget upskilling as an ongoing program line, not a one-off workshop. EY finds AI on a strong talent foundation can unlock up to 40% more productivity gains, value that weak training strategies leave on the table.

Pillar 05

Governance and risk.

IBM's 2025 breach data puts hard numbers on ungoverned AI: 13% of organizations reported breaches of AI models or applications, and 97% of those lacked proper AI access controls. Regulation is still coming, just on a revised clock: the EU's May 2026 omnibus agreement deferred Annex III high-risk obligations to December 2027, while the Air Canada ruling already establishes liability for AI output today. Use NIST AI RMF as the working floor and ISO/IEC 42001 as the ceiling where certifiable governance matters.

33
Adopt a named risk framework, with NIST AI RMF as the floor, and map every AI system to it. IBM found 63% of breached organizations either had no AI governance policy or were still developing one.
34
Implement AI-specific access controls on models, prompts, and training data. IBM found 97% of organizations breached through AI models or applications lacked proper AI access controls.
35
Detect and inventory shadow AI from observed behavior, not policy attestations. IBM found high shadow AI usage added an average $670,000 to breach costs, and one in five organizations reported a breach due to shadow AI.
36
Assign a named owner for the accuracy of every customer-facing AI output. A BC tribunal made Air Canada honor its chatbot's invented bereavement-fare policy, establishing that a company is liable for what its AI tells customers.
37
Classify your systems against the EU AI Act's high-risk categories now, even though the deadline moved. The May 2026 omnibus agreement deferred Annex III high-risk obligations to December 2027 and Annex I to August 2028, but Article 99 penalties still reach EUR 35 million or 7% of worldwide turnover at the top tier, and classification work does not fit in a last-minute sprint.
38
Stand up agentic AI guardrails before deploying agents, not after. Deloitte found 74% of organizations expect at least moderate AI agent use by 2027 while only 21% have a mature governance model for agentic AI.
39
Build and rehearse an AI incident response runbook. Stanford's AI Index counts 233 documented AI incidents in 2024 rising to 362 in 2025, and scaling adoption multiplies your share of them.
40
Audit regularly for unsanctioned AI use even where policies exist. IBM found that among organizations with AI governance policies, only 34% perform regular audits for unsanctioned AI, which makes the policy decorative.

Pillar 06

Change management and operating model.

McKinsey tested 25 organizational attributes and found workflow redesign had the biggest effect on EBIT impact from gen AI, yet only 21% of organizations have done it. Cisco's data says the same thing from another angle: 91% of its top-performing Pacesetters run change management plans versus 35% of everyone else. This is the pillar where most audits go thin and most value is lost.

41
Redesign the workflow around the AI capability instead of layering AI onto the old process. McKinsey found workflow redesign is the single biggest driver of EBIT impact from gen AI among 25 attributes tested, and roughly four in five organizations have not done it.
42
Write a change management plan for every AI deployment with the same rigor as the technical plan. Cisco found 91% of Pacesetters have change management plans versus 35% overall, and Pacesetters move pilots to production four times more often.
43
Allocate effort roughly 70% to people and process, 20% to technology, and 10% to algorithms. BCG's leading adopters follow this 70-20-10 split, while BCG found 74% of companies struggle to achieve and scale value from AI.
44
Put a senior executive, ideally the CEO, visibly accountable for AI outcomes. McKinsey found CEO oversight of AI governance was the element most correlated with bottom-line gen AI impact at larger companies, yet only 28% of organizations using AI have it.
45
Pair every efficiency target with a quality guardrail metric. Klarna leaned on AI to do the work of roughly 700 customer service agents, then its CEO admitted the cost focus produced lower quality service and the company publicly returned to hiring humans, as reported by Fortune.
46
Keep a human escalation path in every customer-facing AI flow. Klarna's CEO now says it is critical that customers can always reach a human if they want one, a lesson bought with public brand damage.
47
Close the trust gap by level: involve and train frontline staff, not just leadership. Prosci measured executive trust in AI at roughly three times frontline levels, and unconvinced users quietly route around the tool.
48
Track adoption depth and process change, not deployment counts. Deloitte found 37% of organizations use AI at a surface level with little or no change to existing processes, which produces tool spend without transformation.

Sources

Every statistic in this audit was re-verified against its primary source in June 2026. The receipts ship with the page.

What now

Use it. Then bring us the bill.

If the audit shows red flags you can't fix in a quarter, that's the conversation we're built for. The same six dimensions are how we scope every AI engagement, so a scored audit translates directly into a plan.

The door is open

Bring the problem. We bring the discipline.

Tell us which world your problem lives in, or let the diagnostic find out. The first conversation is with an engineer, not an account manager.

NDA-first process · SOC 2 Type II audit in progress · GDPR compliant