
Pillar 5: AI FinOps & Operational Economics

How to establish CFO-legible metrics and financial controls for sustainable AI investment.

Overview

Pillar 5 answers the question: Does the math work?

This pillar establishes the economic framework that ensures AI investments are sustainable, measurable, and deliver real return—with metrics the CFO can present to the board.


Why Pillar 5 Matters

The Problem We're Solving

After four pillars of work—identifying opportunities, validating data readiness, establishing governance, and designing experiences—a critical question remains: What does it cost to run, and how do we know it's working financially?

This is fiduciary responsibility. Organizations have seen too many technology initiatives launch with enthusiasm and wither without economic accountability.

Common economic failures:

  • No cost model before deployment; shock when bills arrive
  • Successful pilots that become uncontrolled cost centers at scale
  • No visibility into which agents deliver value and which drain budgets
  • Finance excluded from AI decisions until it's too late

What Success Looks Like

By the end of Pillar 5, the client organization has:

  • Economic thesis for each prioritized agent (bottom-line or top-line value)
  • 12-month financial forecast with monthly tracking
  • Budget controls with alerts and circuit breakers
  • Real-time dashboards showing cost vs. value
  • CFO sign-off: "Now I can explain this to the board"

What is AI FinOps?

FinOps (Financial Operations) is the practice of bringing financial accountability to variable cloud spend. It emerged as organizations moved from predictable capital expenses (buy a server) to unpredictable operating expenses (pay per API call).

How AI Cost Differs from Traditional IT

| Traditional IT | AI Systems |
| --- | --- |
| Pay for capacity (servers, licenses) | Pay for usage (tokens, calls) |
| Costs scale with infrastructure | Costs scale with intelligence |
| Predictable monthly spend | Variable, usage-dependent spend |
| Cost visible in procurement | Cost hidden in API calls |
| Optimize by right-sizing | Optimize by prompt engineering |

The OAIO FinOps Methodology

Pillar 5 establishes four components for every prioritized agent:

| Component | Purpose | Key Questions |
| --- | --- | --- |
| Economic Thesis | Define why this agent exists economically | Is it saving money (bottom line) or making money (top line)? |
| Financial Forecast | Predict costs over time | What will it cost to operate at expected volumes? |
| Financial Controls | Prevent runaway costs | What guardrails prevent budget overruns? |
| Financial Observability | Track actual vs. predicted | How do we monitor cost and value in real-time? |

Economic Value Definition: Bottom Line vs. Top Line

The first step is defining the economic purpose of each agent. This is not abstract—it determines how success is measured.

Bottom Line Agents (Cost Reduction)

These agents reduce operational costs through:

  • Labor cost reduction — Automating manual processes
  • Quality cost reduction — Reducing errors and rework
  • Opportunity cost reduction — Accelerating cycle times

Example: QA Validation Agent

| Metric | Before | After | Impact |
| --- | --- | --- | --- |
| Manual QA hours/month | 480 | 120 | 75% reduction |
| Average hourly cost | $65 | $65 | |
| Monthly labor cost | $31,200 | $7,800 | $23,400 saved |
| Agent operating cost | | $2,100 | Token + compute |
| Net monthly savings | | | $21,300 |
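The net-savings arithmetic in this table generalizes to any bottom-line agent; here is a minimal sketch using the example figures (which are illustrative, not real client data):

```python
def bottom_line_savings(hours_before, hours_after, hourly_cost, agent_cost):
    """Net monthly savings for a cost-reduction (bottom-line) agent."""
    labor_before = hours_before * hourly_cost
    labor_after = hours_after * hourly_cost
    gross_savings = labor_before - labor_after
    return gross_savings - agent_cost

# QA Validation Agent example: 480 -> 120 hours/month at $65/hr,
# with $2,100/month in agent operating cost
net = bottom_line_savings(480, 120, 65, 2100)
print(net)  # 21300
```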

Top Line Agents (Revenue Generation)

These agents create or accelerate revenue through:

  • Capacity expansion — Enabling faster time-to-market
  • Conversion improvement — Better customer outcomes
  • New offerings — Unlocking new services

Example: Client Insights Agent

| Metric | Before | After | Impact |
| --- | --- | --- | --- |
| Reports delivered/month | 45 | 78 | 73% increase |
| Average report value | $12,000 | $12,000 | |
| Monthly revenue capacity | $540,000 | $936,000 | +$396,000 |
| Agent operating cost | | $8,500 | Token + compute |
| Net revenue enablement | | | +$387,500 |

Understanding AI Operating Costs

Unlike traditional software with fixed infrastructure costs, AI agents have variable operating costs driven by usage. Understanding these costs is essential for accurate forecasting.

Key Terms

| Term | Definition |
| --- | --- |
| Token | Unit AI models use to process text (~4 characters). You pay per token. |
| Inference | A single call to an AI model. One user request may trigger multiple inferences. |
| Agent | AI system that reasons and acts autonomously, making multiple inference calls per task. |
| RAG | Retrieval-Augmented Generation: fetching documents to include in context. Multiplies token consumption. |

Token Economics

LLMs measure usage in tokens—roughly 4 characters or 0.75 words per token. Costs differ between:

  • Input Tokens: The context and prompt you send to the model
  • Output Tokens: The response the model generates (typically 2-5x more expensive)
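A minimal cost-per-call calculation follows from these two rates; the prices below are assumed for illustration (check your provider's current rate card):

```python
# Assumed example prices per 1M tokens; actual rates vary by model and provider.
PRICE_IN = 3.00    # $ per 1M input tokens
PRICE_OUT = 15.00  # $ per 1M output tokens (output is typically more expensive)

def request_cost(input_tokens, output_tokens):
    """Dollar cost of a single inference call."""
    return input_tokens / 1e6 * PRICE_IN + output_tokens / 1e6 * PRICE_OUT

# A 2,000-token prompt with a 500-token response
print(round(request_cost(2000, 500), 4))  # 0.0135
```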

Agent Unit Economics

Unlike simple chatbot interactions, agents perform multi-step reasoning—each "thought" is a separate LLM call.

| Metric | What It Measures | Why It Matters |
| --- | --- | --- |
| Cost per task completion | Total tokens to finish one request | Captures the full reasoning chain |
| Calls per task | LLM invocations per request | Agents may call 10-50x internally |
| Cost per tool use | Tokens consumed when invoking tools | Tool descriptions add context |
| Conversation cost trajectory | How cost grows across turns | Context accumulation causes exponential growth |

Agent Cost Patterns

Different agent architectures have dramatically different cost profiles:

| Pattern | Calls/Task | Cost Profile | Example |
| --- | --- | --- | --- |
| Simple extraction | 1-2 | Low, predictable | "Summarize this survey response" |
| Guided workflow | 3-5 | Moderate | "Validate report against template" |
| Autonomous reasoning | 10-30 | High, variable | "Analyze trends and recommend actions" |
| Multi-agent orchestration | 30-100+ | Very high | "Coordinate client, data, and QA agents" |
💡 AWS accomplishes this with Amazon Bedrock Agents action groups and optimized tool definitions. Define concise action group schemas to minimize token overhead when agents invoke tools.

The Cost Revolution

The AI industry has experienced unprecedented cost reduction. Understanding this trajectory is essential for financial planning:

Frontier AI API costs per 1 million tokens fell 99.7% in 21 months (Mar 2023 to Dec 2024): from $60 to $0.40 per million output tokens.

| Milestone | Date | Input ($/1M tokens) | Output ($/1M tokens) |
| --- | --- | --- | --- |
| GPT-4 launch | Mar 2023 | $30.00 | $60.00 |
| GPT-4 Turbo | Nov 2023 | $10.00 | $30.00 |
| Claude 3 Opus | Mar 2024 | $15.00 | $75.00 |
| GPT-4o | May 2024 | $5.00 | $15.00 |
| GPT-4o mini | Jul 2024 | $0.15 | $0.60 |
| Gemini 2.0 Flash | Dec 2024 | $0.10 | $0.40 |

What This Means for AI Adoption

A task that cost $60 in March 2023 cost just $0.40 by December 2024. Cost is no longer the barrier—the question is whether you have the right strategy, governance, and measurement in place to capture value.

Prices shown per 1M tokens.

Self-Hosted vs. API: The Build vs. Buy Decision

For organizations considering self-hosted models, the economics shift significantly:

| Factor | API/Cloud | Self-Hosted |
| --- | --- | --- |
| Upfront cost | $0 | $10,000-50,000 (GPUs) |
| Monthly infrastructure | Pay-per-token | $5,000-70,000+ |
| Engineering overhead | Minimal | 2-3 FTEs minimum |
| Break-even point | N/A | ~1M queries/month |
| Data residency | Provider-controlled | Full control |
| Model flexibility | Provider's models | Any open model |
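The ~1M queries/month break-even can be sanity-checked with a rough model; all figures below are illustrative and assume near-zero marginal cost per self-hosted query once capacity exists:

```python
def breakeven_queries(monthly_infra, monthly_staff, api_cost_per_query):
    """Monthly query volume at which self-hosting's fixed costs
    equal what the same volume would cost via a pay-per-token API.
    Assumes self-hosted marginal cost per query is ~0."""
    fixed = monthly_infra + monthly_staff
    return fixed / api_cost_per_query

# Illustrative: $8,000/mo infrastructure, $30,000/mo engineering,
# $0.04 average API cost per query
print(round(breakeven_queries(8000, 30000, 0.04)))  # 950000
```

Below that volume, APIs are cheaper; above it, the fixed-cost investment starts to pay off, subject to the engineering overhead actually staying fixed.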

Building the Financial Forecast

For each prioritized agent, create a 12-month financial forecast.

Forecast Components

  1. Volume Projections — Expected usage based on business drivers
  2. Token Estimates — Input/output per interaction (validated through prototyping)
  3. Model Selection — Cost-optimized model choice for the use case
  4. Growth Assumptions — How usage scales with adoption
  5. Contingency — Buffer for prompt iteration and model changes
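The five components above can be combined in a simple compounding-growth model; the volumes, growth rate, and per-task cost below are illustrative assumptions, not client figures:

```python
def forecast(base_volume, growth, cost_per_task, months=12, contingency=0.15):
    """Project monthly agent operating cost: task volume compounds at a
    monthly growth rate, and a contingency buffer covers prompt iteration
    and model changes. All parameters are illustrative assumptions."""
    costs = []
    volume = base_volume
    for _ in range(months):
        costs.append(volume * cost_per_task * (1 + contingency))
        volume *= 1 + growth
    return costs

# 500 tasks in month 1, 20% monthly adoption growth, $0.90 per task
monthly = forecast(base_volume=500, growth=0.20, cost_per_task=0.90)
year_total = round(sum(monthly), 2)
```

Replacing the projected values with actuals each month turns this into the living document described in this section.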

Example Forecast

| Month | Volume | Est. Cost | Actual | Variance |
| --- | --- | --- | --- | --- |
| M1 (Pilot) | 500 | $450 | TBD | |
| M2 | 1,500 | $890 | TBD | |
| M3 | 2,500 | $1,400 | TBD | |
| M4-6 | 3,000/mo | $1,650/mo | TBD | |
| M7-12 | 3,300/mo | $1,800/mo | TBD | |
| Year 1 Total | | $18,490 | | |

The forecast becomes a living document, updated monthly with actuals to refine predictions and catch anomalies early.


Financial Controls: Guardrails Before Runaway

Financial controls must be in place before any agent goes live.

Budget Controls

| Control Type | Implementation | Purpose |
| --- | --- | --- |
| Daily spend limits | Alert at 80%, hard stop at 100% | Catch problems fast |
| Monthly budget allocation | By agent and department | Accountability |
| Quarterly review triggers | Automatic when variance exceeds 20% | Course correction |
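The daily spend limit translates into a simple check; the 80%/100% thresholds are the example values and would be tuned per budget:

```python
def budget_action(spend_today, daily_limit):
    """Daily spend control: alert at 80% of the limit, hard stop at 100%."""
    ratio = spend_today / daily_limit
    if ratio >= 1.0:
        return "hard_stop"
    if ratio >= 0.8:
        return "alert"
    return "ok"

print(budget_action(85, 100))   # alert
print(budget_action(120, 100))  # hard_stop
```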

Rate Limiting

| Control | Purpose |
| --- | --- |
| Requests per minute | Prevents sudden spikes |
| Concurrent sessions | Caps simultaneous usage |
| Token limits per request | Prevents context window abuse |
💡 Microsoft Azure accomplishes this with Azure OpenAI Service deployment quotas and tokens-per-minute limits. Allocate TPM quotas across deployments and configure rate limits to manage capacity.

Circuit Breakers

| Control | Trigger | Action |
| --- | --- | --- |
| Cost anomaly detection | Spend exceeds threshold | Automatic pause |
| Error rate threshold | Failures exceed limit | Disable agent |
| Loop detection | Repeated identical calls | Force termination |

Controlling Runaway Agents

Autonomous agents can enter infinite loops or pursue unproductive reasoning paths. Essential controls:

| Control | Implementation | Example |
| --- | --- | --- |
| Call limits | Hard cap on LLM calls per task | Max 25 calls/task |
| Token budgets | Per-task ceiling triggering termination | 50,000 tokens/task |
| Timeout thresholds | Maximum execution time | 5 minutes |
| Cost circuit breakers | Spend threshold triggers pause | $10/hour cap |
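These controls can be enforced with a small per-task guard wrapped around every LLM call; the ceilings below mirror the example values and would be tuned per agent:

```python
import time

class RunawayGuard:
    """Per-task guard: hard caps on LLM calls, tokens, and wall-clock time.
    A minimal sketch; the default limits are the illustrative values above."""

    def __init__(self, max_calls=25, max_tokens=50_000, max_seconds=300):
        self.max_calls = max_calls
        self.max_tokens = max_tokens
        self.max_seconds = max_seconds
        self.calls = 0
        self.tokens = 0
        self.start = time.monotonic()

    def check(self, tokens_used):
        """Record one LLM call; raise to force termination if any ceiling is hit."""
        self.calls += 1
        self.tokens += tokens_used
        if self.calls > self.max_calls:
            raise RuntimeError("call limit exceeded")
        if self.tokens > self.max_tokens:
            raise RuntimeError("token budget exceeded")
        if time.monotonic() - self.start > self.max_seconds:
            raise RuntimeError("timeout exceeded")
```

In practice the raised error would trigger the circuit-breaker action (pause, alert, log) rather than crash the process.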

Cost Optimization Strategies

Prompt Caching for Agents

Multi-turn agent conversations repeat significant context on every call: system prompts, tool definitions, conversation history.

  • How it works: Cache the static prefix of prompts
  • Savings: Up to 90% reduction on cached tokens
  • Best for: Agents with long system prompts, many tools, extended conversations
  • Caveat: Cache expires (typically 5 minutes); only helps active sessions
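Under the assumptions above (a ~90% discount on cached tokens, which varies by provider), the per-turn saving for a tool-heavy agent can be estimated:

```python
def cached_call_cost(static_tokens, dynamic_tokens, price_per_tok, cache_discount=0.9):
    """Input cost of one call when the static prefix is served from cache.
    Assumes the provider discounts cached tokens by ~90% (varies by vendor)."""
    cached = static_tokens * price_per_tok * (1 - cache_discount)
    fresh = dynamic_tokens * price_per_tok
    return cached + fresh

# 8,000-token system prompt + tool definitions (cached), 500 new tokens per turn,
# at an assumed $3 per 1M input tokens
price = 3.00 / 1e6
without = (8000 + 500) * price
with_cache = cached_call_cost(8000, 500, price)
print(round(1 - with_cache / without, 2))  # 0.85 -> ~85% saved per turn
```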
💡 Google Cloud accomplishes this with Vertex AI context caching for Gemini models. Cache large contexts like documents or code repositories to reduce costs on subsequent queries.

Model Selection

Not every task requires the most capable model:

| Task Complexity | Recommended Model | Savings vs. Flagship |
| --- | --- | --- |
| Simple Q&A, extraction | Smaller/faster models | 70-80% |
| Standard workflows | Mid-tier models | 40-50% |
| Complex reasoning | Flagship models | Baseline |
| Multi-step research | Flagship with caching | 30% |

Context Window Management

Agent conversations accumulate tokens rapidly:

  • Sliding window: Keep only the last N turns
  • Summarization: Compress older context
  • Selective memory: Store key facts, discard routine exchanges
  • Session boundaries: Clear context at natural breakpoints
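A sliding window is the simplest of these strategies; a minimal sketch:

```python
def sliding_window(history, max_turns=6):
    """Keep only the last N conversation turns to bound context growth.
    Production systems often combine this with summarization of the
    dropped turns so key facts are not lost."""
    return history[-max_turns:]

turns = [f"turn {i}" for i in range(10)]
print(sliding_window(turns))
# ['turn 4', 'turn 5', 'turn 6', 'turn 7', 'turn 8', 'turn 9']
```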
💡 Google Cloud accomplishes this with Vertex AI Agent Builder conversation history and session management. Control how much conversation history is retained and when to reset context for cost efficiency.


Financial Observability

The final component is visibility—seeing economics in real-time.

Dashboard Requirements

| View | Shows | Audience |
| --- | --- | --- |
| Cost per agent, per day | Token consumption trends | Operations |
| Cost vs. forecast variance | Budget tracking | Finance |
| Cost vs. value delivered | ROI tracking | Leadership |
| Anomaly alerts | Unexpected spikes | All |

Alerting Configuration

| Alert Type | Trigger | Recipient |
| --- | --- | --- |
| Budget threshold | 80% of daily/monthly limit | Operations, Finance |
| Anomaly detection | 2x normal spend rate | Operations |
| Forecast deviation | >20% variance | Finance, Leadership |

Reporting Cadence

| Report | Frequency | Audience |
| --- | --- | --- |
| FinOps summary | Monthly | Finance, Leadership |
| Business review metrics | Quarterly | Executive team |
| AI economics review | Annually | Board |

The FinOps Framework Phases

The FinOps Foundation defines three phases forming a continuous cycle:

1. Inform

Build visibility into AI costs and usage before you can manage them.

  • Real-time cost dashboards by team, project, use case
  • Usage attribution and trend analysis
  • Anomaly detection for spend spikes
  • Token consumption at the task level

2. Optimize

Reduce costs without reducing value—more intelligence per dollar.

  • Model selection (right-size to task complexity)
  • Prompt engineering for efficiency
  • Caching strategies
  • Architecture optimization

3. Operate

Manage costs as continuous practice, not one-time project.

  • Budget setting and enforcement
  • Showback and chargeback
  • Regular review cadences
  • Continuous improvement

Session Structure

Pillar 5 is delivered through a focused working session with finance involvement.

Required Personas

| Persona | Role | Why They're Essential |
| --- | --- | --- |
| CFO or Finance Lead | Financial accountability | Owns budget approval and board reporting |
| CIO | Technology accountability | Owns implementation decisions |
| Agent Owners | Per-agent accountability | Own value delivery |
| OAIO Facilitators | FinOps expertise | Guide economic modeling |

Session Agenda (Half-Day)

| Time | Focus | Purpose |
| --- | --- | --- |
| 0:00–0:30 | Context Setting | Review agents from Pillars 1-4, introduce FinOps framework |
| 0:30–1:30 | Economic Thesis Definition | Define bottom-line or top-line purpose per agent |
| 1:30–1:45 | Break | |
| 1:45–2:45 | Financial Forecast Building | Create 12-month cost projections |
| 2:45–3:30 | Controls and Observability | Define guardrails, dashboards, alerts |
| 3:30–4:00 | Sign-off and Next Steps | CFO approval, implementation planning |

Partner Ecosystem for AI FinOps

OAIO integrates with specialized platforms for comprehensive financial management.

Jellyfish (jellyfish.co)

Engineering Intelligence & DevFinOps

For organizations deploying coding assistants and development agents:

  • Multi-agent benchmarking (Copilot, Cursor, Claude Code, Amazon Q)
  • Usage-to-outcome correlation
  • Spend tracking by tool, team, initiative
  • Adoption insights

Pay-i (pay-i.com)

GenAI Cost Management & Observability

For production agent deployments:

  • Token-level tracking through complete workflows
  • Real-time margin calculations
  • Anomaly detection at the API call level
  • Workflow analytics for multi-step tasks

Outputs and Handoffs

Pillar 5 Deliverables

| Deliverable | Description |
| --- | --- |
| Agent Economics Register | Economic thesis per agent |
| Cost Forecast Model | 12-month projection with assumptions |
| Financial Controls Spec | Budget caps, rate limits, circuit breakers |
| Observability Requirements | Dashboard specs, alerting rules |
| Partner Integration Plan | Tooling selection and roadmap |

Handoff to Implementation

Pillar 5 completes the OAIO blueprint. Outputs flow to Orion Innovation's AI Service Delivery team for implementation:

  • Economic models inform build vs. buy decisions
  • Cost constraints shape architecture choices
  • Observability requirements drive monitoring implementation
  • Financial controls become operational procedures

Delivery Timeline

1-2 weeks total:

  • Week 1: Preparation & Working Session (half day of client effort)
  • Week 2: Documentation & Sign-off (2-3 hours of client effort)


Facilitator Guidance

Mission & Charter

The Pillar 5 FinOps Session exists to make AI economically accountable. This is not a technology cost estimation exercise or a finance compliance checkpoint. The mission is to:

  • Define clear economic thesis for each agent (bottom-line cost reduction or top-line revenue enablement)
  • Build financial forecasts that can withstand CFO scrutiny
  • Establish controls that prevent runaway costs before they happen
  • Create observability that makes AI economics visible in real-time

What this session is NOT:

  • A procurement exercise to negotiate vendor pricing
  • An abstract ROI calculation divorced from specific agents
  • A rubber-stamp approval process after decisions are made
  • A technical architecture review

Session Inputs

Participants should have reviewed:

  • Pillar 1-4 outputs: agents, data, governance, experience designs
  • Current operational costs for processes agents will affect
  • AI model pricing for planned integrations

Orion enters with prepared artifacts:

  • Economic thesis templates (pre-populated with agent names)
  • 12-month forecast model templates
  • Financial controls specification templates
  • Example dashboards from similar engagements

Preparation Checklist

  1. Gather Pillar 1-4 outputs — Understand each agent's purpose, data requirements, governance constraints, and interaction patterns
  2. Research current pricing — Model pricing for AWS Bedrock, Azure OpenAI, Anthropic, or planned integrations
  3. Prepare economic templates — Economic thesis, forecast, and controls templates ready for population
  4. Brief Finance stakeholders — Ensure CFO/Finance lead understands token economics basics before session
  5. Collect operational baselines — Current costs for processes agents will affect (labor hours, error rates, cycle times)
  6. Review Pillar 4 usage projections — Interaction patterns and volume estimates inform cost modeling

Delivery Tips

Opening:

  • Lead with the CFO question: "What does it cost, and how do we know it's working?"
  • Establish that this is about accountability and visibility, not restriction
  • Briefly educate on token economics if Finance is unfamiliar (5-10 minutes max)

Managing the Room:

  • Keep agent owners honest about value projections—challenge optimism with "what if adoption is 50% of projections?"
  • Help Finance understand token economics without overwhelming technical detail
  • Ground abstract discussions in specific agent examples with real numbers
  • Watch for optimistic adoption curves—base forecasts on conservative scenarios
  • Ensure controls are operationally realistic, not just theoretically sound

Economic Thesis Development:

  • Force the sentence completion: "This agent exists to [reduce cost X / generate revenue Y] by [specific mechanism]"
  • If the sentence can't be completed clearly, the agent isn't ready for production

Controls Design:

  • Start with circuit breakers—what stops a runaway agent?
  • Then layer in budget caps and alerts
  • Ensure someone is named for each alert (not a role—a person)

Closing:

  • Confirm economic thesis for each agent—document in plain language
  • Get explicit CFO sign-off on forecasts and controls
  • Schedule monthly tracking reviews (put them on the calendar now)
  • Confirm dashboard requirements and who receives weekly summaries

Output Artifacts Checklist

By session end, confirm you have:

  • Economic thesis for each agent (bottom-line or top-line, specific mechanism)
  • 12-month financial forecast with documented assumptions
  • Budget caps defined (daily, monthly, quarterly triggers)
  • Rate limits specified (requests/minute, concurrent sessions, tokens/request)
  • Circuit breakers defined (cost anomaly, error rate, loop detection)
  • Observability requirements (dashboards, alerts, reporting cadence)
  • Named recipients for alerts and reports
  • CFO sign-off documented
  • Monthly review cadence scheduled

When Variations Are Required

Extended session may be needed when:

  • Multiple agents with complex economics require individual deep-dives
  • Finance stakeholders are new to AI—add education component
  • Organization has existing FinOps practice that requires integration

Scope adjustments:

  • If no FinOps practice exists: add 30-45 minutes for framework introduction
  • If agents span multiple cost centers: may need separate forecasts by department
  • If pilot phase: focus on controls and observability; defer full forecast to post-pilot

Follow-up requirements:

  • Partner integration (Jellyfish, Pay-i) may require separate technical session
  • Dashboard implementation needs handoff to IT/Operations
  • Monthly reviews should have standing calendar invites before session ends

Pricing and Positioning

Scope Options

| Scope | Duration | Description |
| --- | --- | --- |
| Economics Workshop | 1 week | Half-day session with documentation |
| Comprehensive FinOps | 2-3 weeks | Full framework with partner integration |
| Ongoing FinOps | Retainer | Continuous economic management and optimization |

Integration with Cloud Partners

Pillar 5 connects to cloud partner cost management:

  • AWS — Cost Explorer, Budgets, anomaly detection
  • Microsoft — Azure Cost Management, Power BI integration

Required Collateral

Pillar 5 Collateral Status

  • AI FinOps Workshop Guide: COMPLETE
  • Economic Thesis Template: TODO
  • Cost Forecast Worksheet: TODO
  • Financial Controls Specification Template: TODO
  • Observability Requirements Template: TODO
  • FinOps Review Checklist: TODO
  • CFO Briefing Deck Template: TODO

Reference Materials

External Resources

FinOps Foundation (finops.org):

Agentic AI Infrastructure Forum (aaif.io):

  • AAIF Foundation — Open foundation for agentic AI standards and interoperability
  • AAIF Members — Founding members include AWS, Anthropic, Google, Microsoft, OpenAI
  • Key Projects:

Cloud Provider Pricing:

Model Provider Pricing:


Source: content/methodology/05-finops-economics-guide.mdx