The Executive AI Scorecard: A Weighted Framework for Project Selection
Executive Summary
Despite record investment in Generative AI, many organizations remain stuck in the "GenAI Divide"—spending heavily on pilot programs with limited measurable impact. At Retrivika, we believe the solution isn't more "magic buttons," but better project selection.
This article introduces the Executive AI Scorecard, a weighted framework (rubric) designed to objectively evaluate AI initiatives before a single line of code is written. By scoring potential projects across the following four critical dimensions, leaders can separate costly "science projects" from high-value strategic wins.
Data Readiness - Accessibility & Quality
Net ROI - Operational Leverage
Risk Profile - Oversight
Operational Friction - Urgency
Inside this framework:
The Scoring Matrix: A step-by-step guide to calculating the viability of any AI proposal.
Case Studies & Comparative Analysis: A real-world scoring walkthrough of construction Submittal Automation, plus a head-to-head comparison showing why a "Generative Schedule Oracle" gets a Red Light while an "RFI Triage & Draft Agent" gets a Green Light.
Strategic Focus: Why the highest returns are often found in "boring" back-office operations rather than flashy consumer-facing features.
In the rush to adopt Artificial Intelligence, many organizations fall into what's often described as the "GenAI Divide": widespread investment, limited measurable impact. Industry research (including MIT's recent work on AI adoption) consistently highlights this gap.
Why? Because they invest in "magic buttons" rather than integrated systems. The distinction matters: a chatbot answers, but an AI Agent acts — ingesting digital content, discovering information, and completing multi-step workflows, making decisions and taking action along the way.
At Retrivika, we believe the highest ROI comes from Human Enablement — deploying AI Agents to remove drudgery so your experts can focus on high-value decision-making.
We use this Weighted Scoring Matrix to evaluate potential AI initiatives, separating "science projects" (high cost, low return) from "strategic wins."
The Weighted Scoring Matrix
Assign a score of 0 to 5 for each category, multiply each score by its weight, and sum the four weighted values to get a total out of 5.0. (A minimal calculation sketch follows the table.)
| Category | Weight | Scoring Guide (0 = Low, 5 = High) |
|---|---|---|
| 1. Data Readiness (Feasibility) | 30% | 0: Knowledge exists only in employee minds or scanned notes. 3: Data is digital (files/email) but lacks API access. 5: AI Agent Ready: data is structured (SQL) and accessible via MCP. |
| 2. Net ROI Multiplier (Magnitude) | 30% | 0: Negative value: cost of compute exceeds labor saved. 3: Meaningful efficiency: AI drafts/summarizes with low overhead. 5: Scale efficiency: AI Agent processes volumes impractical for manual staff. |
| 3. Risk Profile (Safety) | 20% | 0: Unchecked autonomy: AI Agent executes high-stakes actions without review. 3: Human-on-the-loop: AI Agent acts, but logs are audited daily. 5: Human-in-the-loop: human acts as final gatekeeper. |
| 4. Operational Friction (Demand) | 20% | 0: Solution looking for a problem: no clear pain point. 3: Routine drudgery: work done daily by specialists. 5: "Shadow AI" signal: high burnout; employees already using AI to survive. |
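To make the arithmetic concrete, here is a minimal Python sketch of the scorecard calculation. The weights come from the matrix above; the function name, category keys, and input dictionary are our own illustrative choices, not part of a shipped tool.

```python
# Minimal sketch of the Executive AI Scorecard calculation.
# Weights mirror the matrix above; all names are illustrative.

WEIGHTS = {
    "data_readiness": 0.30,
    "net_roi": 0.30,
    "risk_profile": 0.20,
    "operational_friction": 0.20,
}

def score_project(scores: dict[str, int]) -> float:
    """Return the weighted total (0.0 to 5.0) for raw 0-5 category scores."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"Missing categories: {sorted(missing)}")
    for category, score in scores.items():
        if not 0 <= score <= 5:
            raise ValueError(f"Score out of range for {category}: {score}")
    return sum(weight * scores[category] for category, weight in WEIGHTS.items())

# Usage: the Submittal Automation case study scored later in this article.
# The article treats a total of 4.0+ as a strong candidate.
submittal = {
    "data_readiness": 5,
    "net_roi": 5,
    "risk_profile": 4,
    "operational_friction": 5,
}
print(f"Total: {score_project(submittal):.1f} / 5.0")  # Total: 4.8 / 5.0
```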
Detailed Case Study: Construction Submittal Automation
To illustrate the Scorecard in action, let's analyze a high-value process in the Construction Industry: Creating the Submittal Register.
The Scenario: Before a commercial building can be built, a Project Engineer (PE) must read a 1,500-page PDF "Project Manual" (the specs). They must identify every single instance where the architect requires a "Submittal" (e.g., "Submit tile samples for approval," "Submit concrete mix design").
The Drudgery: A PE spends 2–3 weeks (often 80+ hours) manually reading, copying, and pasting these requirements into Excel.
The Risk: If they miss one, the material isn't ordered, causing construction delays.
The Proposal:
"Deploy an Agentic AI workflow to parse the Project Manual PDF, identify all 'Action Submittals,' extract the Section Number and Description, and populate the Submittal Register Excel file for PE review."
The Rubric Analysis
1. Data Readiness (Score: 5/5)
Analysis: Construction specifications follow a rigid industry standard (CSI MasterFormat). The documents are almost always born-digital PDFs (not scans), and the structure is predictable (Section 03 30 00 is always Concrete).
Verdict: The data is clean, structured, and digital.
Weighted Score: 5 x 0.30 = 1.5
2. Net ROI Multiplier (Score: 5/5)
The Math (illustrative):
Human cost: 80 hours × $65/hr (PE rate) ≈ $5,200 per project.
AI cost: A large-document run (model + orchestration) can be ~$25 per project, depending on model choice, guardrails, and volume.
Analysis: If the system produces a draft register quickly and cheaply, the ROI can be compelling—provided the review workflow is strong.
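A back-of-envelope version of that math, using the illustrative figures above; the four hours of PE review time is our added assumption, not a figure from the case study.

```python
# Illustrative ROI math from the figures above; not a guarantee.
human_cost = 80 * 65   # 80 PE hours at $65/hr = $5,200 per project
ai_cost = 25           # rough per-project run cost (model + orchestration)
review_cost = 4 * 65   # assumption: ~4 hours of PE review of the draft

net_savings = human_cost - (ai_cost + review_cost)
ratio = human_cost / (ai_cost + review_cost)
print(f"Net savings: ${net_savings:,} per project")  # Net savings: $4,915 per project
print(f"Cost ratio: ~{ratio:.0f}x")                  # Cost ratio: ~18x
```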
Weighted Score: 5 x 0.30 = 1.5
3. Risk Profile (Score: 4/5)
Analysis: This is a Human-in-the-Loop workflow. The AI does not order the concrete; it simply builds the list of things to order.
Safety Valve: The system provides a citation (a link to the exact page/section) for every row it creates. The PE reviews the output, verifying against the source.
Why not 5/5? Even with a human gatekeeper, a missed submittal still creates delay risk, so the rigor of the review process is critical.
Weighted Score: 4 x 0.20 = 0.8
4. Operational Friction (Score: 5/5)
Analysis: This is one of the most disliked tasks in the industry and a major driver of burnout among junior engineers. Automating it improves retention and morale.
Weighted Score: 5 x 0.20 = 1.0
Final Score: 4.8 / 5.0 (Strong Candidate)
Comparative Analysis: A Tale of Two Projects (Construction Industry)
Often, applying the rubric helps kill bad ideas early. Let's compare two common proposals in the construction sector.
Proposal A: "The Generative Schedule Oracle"
The Idea: "Let's feed 20 years of old Microsoft Project (.mpp) files into an AI so it can predict the perfect timeline for our new hospital project."
| Category | Score | Weight | Weighted Value | Note |
|---|---|---|---|---|
| Data Readiness | 1 | 30% | 0.3 | Garbage In—Old schedules are rarely updated to match "as-built" reality. |
| Net ROI | 2 | 30% | 0.6 | Low—Every site is unique (soil, weather). A generic prediction has low value. |
| Risk Profile | 2 | 20% | 0.4 | High Risk—If the AI hallucinates a timeline that is physically impossible, it creates liability. |
| Op Friction | 3 | 20% | 0.6 | Scheduling is difficult, but it is high-skill strategic work, not repetitive drudgery. |
| TOTAL | | 100% | 1.9 | RED LIGHT: Do not build. |
Proposal B: "The RFI Triage & Draft Agent"
The Idea: "Project Managers spend 15 hours/week answering RFIs (Requests for Information). AI should read the PDF specs and draft a response."
| Category | Score | Weight | Weighted Value | Note |
|---|---|---|---|---|
| Data Readiness | 5 | 30% | 1.5 | High—We have the "Truth" (Project Manual & Plans) in clean, searchable PDF format. |
| Net ROI | 5 | 30% | 1.5 | Huge—15 hours/week x 10 PMs = 150 expensive hours saved. |
| Risk Profile | 5 | 20% | 1.0 | Safe—The AI drafts the email; the PM reviews it. Human-in-the-loop. |
| Op Friction | 5 | 20% | 1.0 | Severe—PMs are burning out from email overload. |
| TOTAL | | 100% | 5.0 | GREEN LIGHT: Pilot now with clear review gates. |
Insights from the Field (The GenAI Divide)
Three patterns we see repeatedly across failed and successful AI initiatives:
1. The "Learning Gap" Kills Pilots
Pilots often fail because teams implement "static" tools that lack context, provenance, or integration. Users quickly abandon systems that don't fit real workflows.
Our Take: If your Data Readiness score is low, you cannot build a "learning" system (RAG or Agentic), and your project will likely fail.
2. Back-Office > Front-Office
While 50% of AI budgets go to Sales & Marketing, the highest measurable returns are often found in Operations and Finance.
Our Take: Look for "boring" high-volume tasks like the Submittal Register above. These are the hidden gold mines of the enterprise.
3. The "Build vs. Partner" Reality
Organizations that attempt to build entirely in-house often underestimate integration and governance costs, and internal staff, however capable, rarely have deep experience in AI Agent architecture. Strategic partnerships close both gaps: they accelerate time-to-value while building internal capability through shared knowledge and hands-on mentoring, leaving your team stronger, not dependent.
In practice, the rubric tends to be driven by data readiness and operational leverage — get those two right and the other dimensions follow.
If Your Score is 4.0+
Here's a pragmatic, low-risk 90-day path:
Discovery (Weeks 1–3): Map the data reality, systems-of-record, and permissions.
Pilot (Weeks 4–8): Build one workflow with human-in-the-loop review and audit trails that trace every output back to its data sources, systems-of-record, and security permissions, validated by subject-matter experts (a minimal audit-record sketch follows this list). Establish baseline metrics before launch and track them continuously; if the pilot isn't demonstrating measurable cost reduction or time savings, scaling is a gamble, not a strategy. In that case, evaluate why the pilot showed no value, adjust the rubric, and launch a new pilot, either refining the current workflow or selecting a higher-scoring candidate from your backlog.
Scale (Weeks 9–12): Expand to adjacent tasks and harden governance.
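As referenced in the Pilot step, here is a minimal sketch of what a traceable audit record could look like. The field names and schema are illustrative assumptions, not a prescribed standard.

```python
# Minimal sketch of a pilot audit record: every AI output carries a
# trace back to its source and a human reviewer. Field names are
# illustrative, not a fixed schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AuditRecord:
    output_id: str          # e.g. one row in the submittal register
    source_document: str    # system-of-record the output came from
    source_location: str    # page/section citation for verification
    reviewer: str           # the human-in-the-loop gatekeeper
    approved: bool
    reviewed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

record = AuditRecord(
    output_id="submittal-0042",
    source_document="project_manual.pdf",
    source_location="Section 03 30 00, p. 412",
    reviewer="j.doe@example.com",
    approved=True,
)
```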
Our Take: Don't reinvent the wheel. Partnering with experts allows you to leverage established architectures (like the Model Context Protocol) rather than building brittle internal wrappers.
Ready to run your initiatives through the scorecard? Retrivika's Information Science architects will validate your assumptions and help you build a pipeline of strategic wins.