Enterprise Cost Considerations · v1.1 · April 2026
secd3v Claude Code Service

Enterprise cost considerations for DevSecOps (Brownfield), Agentic (Greenfield), and Spec-Driven development with Claude Code and/or GitLab (via the secd3v GitLab MCP service). Covers model access policies, prompt caching, TTL selection, developer optimisation, and how spec-driven workflows modify token consumption, session shape, and model allocation within the Agentic pattern. Scoped for logically airgapped secd3v deployments via Bedrock ap-southeast-2 (Sydney) and ap-southeast-4 (Melbourne) regional endpoints.

Models: Haiku 4.5 · Sonnet 4.6 · Opus 4.6
Region: ap-southeast-2 (Sydney) · ap-southeast-4 (Melbourne)
Endpoint: Regional (+10%)
Currency: USD
Validated against: Anthropic Claude Code Docs
Section 01

Executive Summary

$87 / month — DevSecOps heavy dev (standard policy)
$178 / month — Agentic heavy dev (standard policy)
~$155 / month — Agentic spec-driven heavy (estimated)
46–57% — cost reduction from enabling prompt caching

For secd3v, Claude Code usage organises into two foundational patterns — DevSecOps (Brownfield) and Agentic (Greenfield) — with a third mode, Spec-Driven Development, that modifies how Agentic sessions run. All three operate across Haiku 4.5, Sonnet 4.6, and Opus 4.6 on AWS Bedrock AU regional endpoints. All costs include the 10% regional endpoint premium required for data sovereignty.

The analysis covers: (1) why the development pattern determines the dominant cost driver; (2) how spec-driven development (SDD) restructures Agentic session shape, token consumption, and model allocation; (3) the financial impact of restricting Opus access to senior developers — and why spec authoring creates a justified exception for standard developers; (4) how prompt caching reduces costs by 46–57%; and (5) Anthropic-validated developer behaviours that directly control per-user costs.

The biggest cost lever remains prompt caching, not model selection. Spec-driven development (SDD) adds a structural second lever: by front-loading reasoning into a stable cached specification, it reduces regular input tokens per execution turn by ~48% and eliminates the most expensive tail-cost scenario — wrong-direction agentic sessions. The 1-hour TTL becomes non-negotiable for spec-driven execution.
Section 02

Infrastructure & Pricing

secd3v uses only AWS Bedrock regional endpoints for data sovereignty compliance, pinning all inference to the ap-southeast-2 (Sydney) and ap-southeast-4 (Melbourne) AU regions. AWS Bedrock charges per token on a pay-as-you-go basis, and regional endpoints carry a 10% premium over global pricing. For clarity, all figures in this document use AU regional pricing (in USD).

Claude Haiku 4.5
Fast · Routine tasks · Sub-agents
Input: $1.10 / MTok
Output: $5.50 / MTok
Cache write (5-min): $1.375 / MTok
Cache write (1-hr): $2.20 / MTok
Cache read: $0.11 / MTok
Base global $1.00/$5.00 · +10% regional
Claude Sonnet 4.6
Balanced · Primary workhorse · Code review
Input: $3.30 / MTok
Output: $16.50 / MTok
Cache write (5-min): $4.125 / MTok
Cache write (1-hr): $6.60 / MTok
Cache read: $0.33 / MTok
Base global $3.00/$15.00 · +10% regional
Claude Opus 4.6
Maximum intelligence · Senior-approved · Spec-write exception
Input: $5.50 / MTok
Output: $27.50 / MTok
Cache write (5-min): $6.875 / MTok
Cache write (1-hr): $11.00 / MTok
Cache read: $0.55 / MTok
Base global $5.00/$25.00 · +10% regional
Cache multipliers (all models): 5-min write = 1.25× base input · 1-hr write = 2.0× base input · Cache read = 0.1× base input. 1-hr TTL available on Bedrock for Haiku 4.5, Sonnet 4.6, and Opus 4.6. Default if no TTL specified is 5 minutes. Minimum 1,024 tokens per cache checkpoint; up to 4 checkpoints per request.
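The multiplier relationships above can be expressed as a small sketch. The base prices are the document's regional figures; nothing here is measured from Bedrock itself.

```python
# Sketch: derive the cache price tiers (USD/MTok) from a model's base
# regional input price, using the multipliers stated above.
BASE_INPUT = {"haiku-4.5": 1.10, "sonnet-4.6": 3.30, "opus-4.6": 5.50}

def cache_prices(base_input: float) -> dict:
    """Return the cache write/read prices implied by the base input price."""
    return {
        "write_5min": round(base_input * 1.25, 4),  # 5-min TTL write
        "write_1hr":  round(base_input * 2.0, 4),   # 1-hr TTL write
        "read":       round(base_input * 0.1, 4),   # cache read
    }
```

For example, `cache_prices(BASE_INPUT["sonnet-4.6"])` reproduces the $4.125 / $6.60 / $0.33 figures in the Sonnet 4.6 card.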

Tool overhead: Claude Code's built-in tool suite (Bash, Read, Write, Glob, Grep, Task, TodoWrite, WebFetch) adds approximately 16,600 tokens to every session's system prompt. This is included in all context size figures throughout.
Section 03

Development Patterns & Session Definitions

Why DevSecOps Is a Brownfield Pattern

Brownfield development means working on an existing production system: live users, an established architecture, years of accumulated code, technical debt, and actively enforced security constraints. Unlike greenfield work, one cannot start fresh, cannot discard wrong implementations without consequence, and cannot grant broad autonomous permissions without risk. Every change to a brownfield system affects real users and real infrastructure. In government, defence, and high-compliance contexts, this describes the overwhelming majority of day-to-day development work.

DevSecOps is not merely a methodology that happens to be applied to brownfield systems — it is the direct, necessary response to the constraints imposed by brownfield development. When a codebase has an existing security posture that cannot be accidentally degraded, existing patterns that must be understood before being changed, and a production environment where mistakes have immediate consequences, the human-gated, incremental, review-at-every-step workflow of DevSecOps becomes mandatory rather than optional discipline.

This shapes how Claude Code is used. The developer cannot say "explore this codebase and make the changes you think are needed" — they must say "look at auth.py lines 42–89 and identify any SQL injection risk." They cannot grant Claude Code broad file-write permissions — they must approve each individual change. Sessions are short, targeted, and bounded because the work itself demands precision over autonomy. The brownfield constraint and the DevSecOps pattern are two sides of the same coin.

Why Agentic Is Primarily the Greenfield Pattern

Greenfield development means building something new: a new service, module, or application where there are no live users to disrupt, no established architecture to accidentally break, and no accumulated constraints that an AI might misunderstand. A wrong-direction implementation can simply be discarded; the cost of an incorrect autonomous attempt is low.

This is where the Agentic development pattern is most appropriate. Anthropic describes Claude Code as an environment where "Claude explores, plans, and implements" — it reads many files, builds its own context map of the codebase, and works autonomously over long sessions. This works well precisely because Greenfield work has a low-consequence failure mode.

Agentic development can also be used for specific Brownfield scenarios — large-scale migration sprints, comprehensive test generation, or automated documentation across an existing codebase — provided the developer uses Plan Mode before any execution, works in a Git worktree for isolation, and treats every checkpoint as a safety gate. In these cases, the automation is targeted and reversible. Agentic development is not appropriate for routine brownfield maintenance, security operations, or any work where an incorrect autonomous change has immediate production consequences. The session risk profile, not the label alone, determines whether agentic development is appropriate.

Spec-Driven Development (SDD) — a Structured Variant of the Agentic Pattern

Spec-driven development (SDD) is a workflow overlay that modifies how Agentic sessions run. The developer authors a structured specification (interface contracts, data shapes, acceptance criteria, file layout, security constraints, and test coverage requirements) and then executes against the spec rather than exploring freely.

The cost effect is front-loading: reasoning cost is concentrated at the start of the workflow in a short spec-writing session, and all subsequent execution sessions become cheaper and more predictable. The spec functions as a persistent, cross-session Plan Mode — wrong-direction errors surface at spec-review stage (~500 tokens to correct) rather than after 20 agentic turns (tens of thousands of tokens wasted). The dominant cost driver in standard agentic — regular input from file exploration and history growth — is cut by approximately 48% in spec-driven execution.

SDD is critical for high-compliance, government, and defence sectors because it fundamentally inverts the relationship between intent and implementation, establishing the specification as the authoritative source of truth and treating code as a derivative artefact. With Agentic development, SDD serves as the essential governance bridge that prevents "vibe coding"—the reliance on loose, non-deterministic prompts—by providing autonomous agents like Claude Code with unambiguous, executable contracts that ground their reasoning in architectural and security constraints. This methodology naturally facilitates the rigorous traceability required by the Australian Information Security Manual (ISM), ISO 27001 and other government and defence compliance standards, ensuring that every code change is validated against documented requirements through automated validation gates in CI/CD pipelines. By shifting from reactive verification to proactive governance, SDD ensures that security corrections propagate across future regeneration cycles, thereby mitigating architectural drift and preserving individual human accountability for AI-enabled outcomes in mission-critical environments.

Phase 1 · Spec Writing
Author before executing
Interface contracts, acceptance criteria, file layout, security constraints, test scope. Quality here has compounding leverage — the spec directs all downstream execution.
Opus — justified, all tiers
8–15 turns · 15–30 min
Regular input: ~4,000 tok/turn
Output: 1,500–2,500 token spec
Est. cost: ~$8–15 / spec session
Phase 2 · Spec Execution
Implement against spec
Spec replaces file-discovery turns. Regular input ~48% lower than standard agentic. Do not /clear between Phase 1 and 2 — the spec is the context that must persist.
Sonnet — primary · Haiku — simple impl turns
40–65 turns · up to 4 hrs
Regular input: ~13,000 tok/turn
1-hr TTL mandatory · /compact at 80%
Spec cached at 0.1× cost from turn 2
Phase 3 · Conformance Review
Check output vs spec
Pattern matching against spec criteria — does implementation satisfy contracts, acceptance criteria, and security constraints? Haiku-eligible. Failures return to Phase 2 with specific deviation notes.
Haiku — primary · Sonnet — edge cases
5–10 turns · 15–30 min
Regular input: ~3,000 tok/turn
Est. cost: ~$1.50–3 / review session

Pattern Contrast

DevSecOps (Brownfield)
  • Existing production system — live users, established architecture, active security constraints, technical debt
  • 8–11 sessions/day, 5–45 min each — short, targeted, bounded by task scope
  • Developer provides explicit file refs, specific line ranges — Claude Code doesn't explore freely
  • Human approves every response before Claude Code proceeds — accountability is non-negotiable
  • 5–15 API turns per session · avg 2,500–9,000 tokens regular input/turn
  • Surgical output: patches, analysis, test stubs — 400–1,000 tokens/turn
  • Plan Mode before any multi-file change — mandatory practice
  • Cache writes amortise poorly per session — cross-session sharing is the key optimisation
  • Haiku suitable for ~25–30% of interactions
Agentic (Greenfield) — Standard & Spec-Driven Variant
  • New codebase under construction — no live users, no constraints to violate, wrong implementations are cheap to discard
  • Standard: 1–2 long sessions/day · Spec-driven: Phase 1 (30 min) + Phase 2 (up to 4 hrs) + Phase 3 (30 min)
  • Standard: Claude Code explores freely — 25,000 tok avg regular input/turn
  • Spec-driven: Claude executes against spec — ~13,000 tok avg regular input/turn (−48%)
  • 50–65 API Turns/heavy session · auto-compaction at ~80% context fill
  • Spec-driven: single Opus spec session amortised across 40–65 Sonnet execution turns
  • Cache writes amortise very well — spec-driven improves cache hit rate further (spec = stable fixed context)
  • Standard: Haiku ~10–15% · Spec-driven: Haiku rises to ~20% (Phase 3 conformance eligible)

Session Type Definitions — DevSecOps

Micro
5 turns · 5–10 min
Doc update, syntax check, single-function review, pipeline triage, quick explanation
API turns: 5
Avg regular input/turn: 2,500 tok
Avg output/turn: 400 tok
Expected 5-min re-writes: 0.5
Standard
9 turns · 15–25 min
Code review, targeted bug fix, test generation, single-file security check, MR feedback
API turns: 9
Avg regular input/turn: 5,000 tok
Avg output/turn: 700 tok
Expected 5-min re-writes: 1.2
Extended
15 turns · 30–45 min
Multi-file security audit, SAST triage, compliance check, refactoring plan + execute
API turns: 15
Avg regular input/turn: 9,000 tok
Avg output/turn: 1,000 tok
Expected 5-min re-writes: 2.8

Session Type Definitions — Agentic

Light
5 × 10-turn sessions · 10–20 min each
Small feature additions, single-module builds, code explanations, focused bug fixes
API turns/day: 50 (5 sessions)
Avg regular input/turn: 4,000 tok
Avg output/turn: 600 tok
Expected 5-min re-writes: 1.0 / session
Medium
2 × 25-turn sessions · 30–60 min each
Feature implementation, module construction, multi-file build, MR creation
API turns/day: 50 (2 sessions)
Avg regular input/turn: 13,000 tok
Avg output/turn: 800 tok
Expected 5-min re-writes: 6.0 / session
Heavy
1 × 65-turn + 1 × 25-turn session
New service construction, greenfield architecture, large autonomous implementation from spec
API turns/day: 90 (2 sessions)
Avg regular input/turn: 25,000 tok
Avg output/turn: 1,100 tok
Expected 5-min re-writes: 22 (65-turn) / 6 (25-turn)

Session Type Definitions — Agentic (Spec-Driven Variant)

Spec-driven sessions replace the open-ended exploration of standard agentic with a three-phase lifecycle. The session cards below represent Phase 1, 2, and 3 respectively — each with a distinct turn count, token profile, and primary model.

Phase 1 · Spec Write
8–15 turns · 15–30 min · Opus primary
Author spec: interface contracts, data shapes, acceptance criteria, file layout, security constraints, test coverage scope
API turns: 8–15
Avg regular input/turn: 4,000 tok
Avg output/turn: 800 tok (spec body)
Spec output size: 1,500–2,500 tok
Est. session cost: ~$8–15 (Opus)
Phase 2 · Spec Execution
40–65 turns · up to 4 hrs · Sonnet primary
Implement against spec. Do not /clear between Phase 1 and 2. /compact at 80% context fill.
API turns: 40–65
Avg regular input/turn: ~13,000 tok (−48%)
Avg output/turn: 1,000 tok
Spec cached at: 0.1× from turn 2
TTL requirement: 1-hr mandatory
Phase 3 · Conformance Review
5–10 turns · 15–30 min · Haiku primary
Verify output against spec criteria. Pattern matching — not novel reasoning. Failures return to Phase 2 with specific deviation notes.
API turns: 5–10
Avg regular input/turn: ~3,000 tok
Primary model: Haiku (pattern matching)
Est. session cost: ~$1.50–3 (Haiku)

Daily Usage Tier Definitions

Tier · Session Mix / Day · Sessions · Turns · Developer Profile
DevSecOps
Light · 5 micro + 3 standard · 8 · 52 · Part-time AI assistance; quick reviews and consultations
Medium · 3 micro + 4 standard + 1 extended · 8 · 66 · Active developer; daily code review, security checks, bug fixes
Heavy · 5 micro + 4 standard + 2 extended · 11 · 91 · Lead developer / security engineer; audits, MR reviews, compliance
Agentic — Standard
Light · 5 × light sessions · 5 · 50 · Light autonomous tasks; feature additions, focused builds
Medium · 2 × medium sessions · 2 · 50 · Active builder; feature implementation, module construction
Heavy · 1 × heavy + 1 × medium session · 2 · 90 · Power developer; new service construction, long autonomous sessions
Agentic — Spec-Driven Variant (per sprint day)
Spec Day · 1 × spec-write + 1 × execution start · 2 · ~25 · Phase 1 + Phase 2 kickoff; spec authored, execution begins
Exec Day · 1–2 × execution sessions · 1–2 · 65–90 · Phase 2 sustained; building against spec, /compact as needed
Review Day · 1 × conformance + 1 × correction · 2 · ~20 · Phase 3 review; deviation notes → Phase 2 correction turn
Section 04

Token Modelling & Fixed Context

Claude Code's context is cumulative — every API call processes the full conversation history to date. The fixed system context (tool definitions, system prompt, project memory) is the prime candidate for prompt caching: written once per session and read cheaply on every subsequent turn.

Fixed Cached Context Per Session

Use Case A — Standalone (no GitLab MCP)
System prompt (agent instructions): 3,900 tok
Built-in tool definitions (Bash, Read, Write, Glob…): 16,600 tok
CLAUDE.md + memory files: 1,500 tok
Total fixed cached context: 22,000 tokens
Use Case B — With GitLab MCP
All standalone context: 22,000 tok
GitLab MCP tool names (deferred by default): ~500 tok
GitLab tool schemas (loaded on demand): ~8,500 tok
Max fixed cached context: 31,000 tokens
With MCP tool search (default): only names load upfront; schemas load on demand. Cost model assumes eager-load scenario for conservative estimates.
MCP Tool Search changes GitLab overhead. Anthropic's Claude Code now defers MCP tool definitions by default — only tool names load at session start (~500 tokens for GitLab MCP). Full schemas (~8,500 tokens) load on demand when Claude needs a specific tool. This means the 9,000-token GitLab MCP overhead is now largely avoided unless you deliberately disable tool search. The cost model uses the full 31,000-token figure as a conservative worst case. In practice, the actual GitLab overhead is closer to 500–3,000 tokens depending on which tools are used.
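The fixed-context arithmetic above reduces to a simple sum. This sketch uses the document's stated token counts, which are estimates rather than measured values:

```python
# Sketch: fixed cached context per session, summing the components above.
# All token counts are the document's estimated figures.
standalone = 3_900 + 16_600 + 1_500        # system prompt + tools + CLAUDE.md

mcp_deferred = standalone + 500            # tool-search default: names only
mcp_eager    = standalone + 500 + 8_500    # worst case: all schemas loaded
```

`standalone` gives the 22,000-token figure used for Use Case A, and `mcp_eager` the conservative 31,000-token figure used for Use Case B.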

Daily Token Volumes

↔ scroll if needed
Pattern · Use Case · Tier · Cache Writes · Cache Reads · Regular Input · Output
DevSecOps / Brownfield — daily token totals
DevSecOps · Standalone · Light · 187,000 · 968,000 · 174,000 · 28,900
DevSecOps · Standalone · Medium · 192,000 · 1,276,000 · 320,000 · 46,200
DevSecOps · Standalone · Heavy · 265,000 · 1,760,000 · 467,500 · 65,200
DevSecOps · + GitLab MCP · Light · 259,000 · 1,364,000 · 226,500 · 33,400
DevSecOps · + GitLab MCP · Medium · 264,000 · 1,798,000 · 393,300 · 53,100
DevSecOps · + GitLab MCP · Heavy · 364,000 · 2,480,000 · 569,600 · 74,700
Agentic / Greenfield — daily token totals
Agentic · Standalone · Light · 120,000 · 990,000 · 185,000 · 30,000
Agentic · Standalone · Medium · 64,000 · 1,056,000 · 626,000 · 40,000
Agentic · Standalone · Heavy · 79,000 · 1,936,000 · 1,914,000 · 91,500
Agentic · + GitLab MCP · Light · 165,000 · 1,395,000 · 207,500 · 30,000
Agentic · + GitLab MCP · Medium · 82,000 · 1,488,000 · 661,000 · 43,000
Agentic · + GitLab MCP · Heavy · 97,000 · 2,728,000 · 1,971,500 · 96,000
All figures use 1-hr TTL caching (recommended default). Cache writes include fixed system context plus incremental session context. Cache reads are fixed context re-reads on turns 2–N. Regular input is non-cached conversation context (messages, file contents, tool outputs). Agentic heavy regular input is high because growing conversation history in long autonomous sessions is not fully cached — context window management is the primary cost driver.
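Pricing a row of the table above is a matter of multiplying each token category by its rate. This sketch prices the Agentic standalone heavy day at pure Sonnet 4.6 regional rates with 1-hr TTL caching:

```python
# Sketch: pricing one day of token volumes at pure Sonnet 4.6 regional
# rates with 1-hr TTL caching (all rates in USD per MTok).
RATES = {"cache_write_1hr": 6.60, "cache_read": 0.33,
         "regular_input": 3.30, "output": 16.50}

def daily_cost_usd(writes, reads, regular, output):
    return (writes * RATES["cache_write_1hr"] + reads * RATES["cache_read"]
            + regular * RATES["regular_input"] + output * RATES["output"]) / 1e6

# Agentic standalone heavy row: 79,000 / 1,936,000 / 1,914,000 / 91,500
day = daily_cost_usd(79_000, 1_936_000, 1_914_000, 91_500)   # ~$8.99/day
month = day * 22                                             # ~$197.70/mo
```

The monthly result matches the pure-Sonnet heavy standalone figure ($197.70) in Section 06's per-model reference table.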
Section 05

Model Access Policy & Recommended Splits

It is recommended that organisations restrict Opus 4.6 access due to cost (Opus input and output are each 1.67× the corresponding Sonnet prices). This section defines two developer tiers and the recommended model split for each. Under standard agentic, the split is fixed by role. Under spec-driven agentic, the split becomes phase-aware — the optimal model depends on which phase the developer is in, not just their tier.

Standard Developer — No Opus
DevSecOps: 30% Haiku / 70% Sonnet / 0% Opus
Agentic (standard): 15% Haiku / 85% Sonnet / 0% Opus
Agentic (spec-driven): 20% Haiku / 72% Sonnet / 8% Opus
  • Applies to the majority of an engineering organisation
  • DevSecOps Haiku share unchanged — doc updates, pipeline triage, syntax checks are genuinely Haiku-suitable
  • Spec-driven opens Opus access for standard developers at Phase 1 (spec authoring) — see amortisation note below
  • Spec-driven Haiku share rises to 20% — Phase 3 conformance review is pattern matching, not reasoning
  • Quality impact minimal on SWE-bench standard tasks — Sonnet 79.6% vs Opus 80.8%
Senior / Approved — Limited Opus
DevSecOps: 25% Haiku / 65% Sonnet / 10% Opus
Agentic (standard): 10% Haiku / 75% Sonnet / 15% Opus
Agentic (spec-driven): 18% Haiku / 67% Sonnet / 15% Opus
  • Applies to tech leads, security engineers, principal developers
  • Spec-driven: Opus percentage unchanged — but front-loaded into Phase 1 rather than spread across all turns
  • This is more efficient: Opus reasoning concentrated where it has maximum downstream leverage
  • Haiku rises from 10% to 18% — Phase 3 conformance sessions are Haiku-eligible
  • Service layer enforces model access via role-based routing; Phase 1 spec sessions need explicit Opus permit
Why spec-driven opens Opus access for standard developers: The cost model normally gates Opus by role. Spec authoring creates a justification that bypasses this: a single Opus session (~$8–15) producing a tight 1,500–2,500 token specification amortises across 40–65 Sonnet execution turns. The Opus overhead is recovered within the first execution session through reduced file-exploration turns and eliminated wrong-direction corrections. The service layer should implement a "spec-write" session profile that permits Opus and enforces Sonnet-only for subsequent sessions in the same sprint.
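The amortisation argument above can be put in per-session and per-turn terms. This sketch uses the document's ~$8–15 spec-session estimate and the stated sprint cadence of 1 spec per 5 execution sessions; the 50-turn midpoint is an assumption for illustration:

```python
# Sketch: amortising a Phase 1 Opus spec session across execution work,
# assuming 1 spec per 5 execution sessions and 40-65 turns per session.
spec_cost = (8.0, 15.0)            # document's ~$8-15 estimate (low, high)
sessions_per_spec = 5
turns_per_session = 50             # assumed midpoint of 40-65

per_session = tuple(c / sessions_per_spec for c in spec_cost)   # $1.60-3.00
per_turn = tuple(s / turns_per_session for s in per_session)    # ~$0.03-0.06
```

At roughly $0.03–0.06 of Opus overhead per execution turn, the spec session is a small surcharge on each Sonnet turn rather than a standing Opus entitlement, which is the basis for the "spec-write" session profile recommended above.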
Why the Haiku split differs between patterns: In DevSecOps, a substantial fraction of daily interactions — documentation strings, pipeline status checks, dependency lookups, boilerplate generation — genuinely don't require Sonnet-level intelligence. In standard Agentic sessions, even "simple" tasks involve sustained multi-turn reasoning where Haiku creates quality drag and more wrong-direction attempts. In spec-driven Agentic, Phase 3 conformance review is pattern matching against defined criteria — Haiku handles it at 3× lower cost without quality loss.

Split Bar Comparison — Agentic Standard vs Spec-Driven

Standard developer — Agentic standard: 15% Haiku / 85% Sonnet / 0% Opus
Standard developer — Agentic spec-driven (blended): 20% Haiku / 72% Sonnet / 8% Opus
Senior developer — Agentic standard: 10% Haiku / 75% Sonnet / 15% Opus (distributed)
Senior developer — Agentic spec-driven (blended): 18% Haiku / 67% Sonnet / 15% Opus (front-loaded)
Spec-driven blended splits are weighted averages across all three phases at heavy usage (1 spec session per 5 execution sessions). Phase 2 Haiku share ≈20% for simple implementation turns; Phase 3 is 70% Haiku. Opus is Phase 1 only. Actual splits vary by sprint cadence.
Section 06

Complete Cost Reference

The complete cost reference uses the following constraints: monthly cost per developer (USD), 22 working days per month, 1-hour TTL caching, Bedrock AU regional pricing (+10%). Standard = no-Opus split. Senior = limited-Opus split. Per-model pure costs are shown only for building custom blends.

DevSecOps (Brownfield) Use Case A: Standalone

Policy / Model · Split · Light / mo · Medium / mo · Heavy / mo
Standard developer · 30% / 70% / 0% · $45.84 · $61.72 · $87.09
Senior (limited Opus) · 25% / 65% / 10% · $51.57 · $69.43 · $97.98
Saving: Standard vs Senior · −$5.73 (11%) · −$7.71 (11%) · −$10.89 (11%)
Per-model pure reference
Haiku 4.5 · $19.10 · $25.71 · $36.29
Sonnet 4.6 · $57.30 · $77.14 · $108.86
Opus 4.6 · $95.51 · $128.57 · $181.44

DevSecOps (Brownfield) Use Case B: + GitLab MCP

Policy / Model · Split · Light / mo · Medium / mo · Heavy / mo
Standard developer · 30% / 70% / 0% · $60.86 · $79.37 · $111.46
Senior (limited Opus) · 25% / 65% / 10% · $68.47 · $89.29 · $125.39
Saving: Standard vs Senior · −$7.61 (11%) · −$9.92 (11%) · −$13.93 (11%)
Per-model pure reference
Haiku 4.5 · $25.36 · $33.07 · $46.44
Sonnet 4.6 · $76.08 · $99.22 · $139.33
Opus 4.6 · $126.80 · $165.36 · $232.21

Agentic (Greenfield) Use Case A: Standalone

Policy / Model · Split · Light / mo · Medium / mo · Heavy / mo
Standard developer · 15% / 85% / 0% · $44.04 · $69.23 · $177.93
Standard developer — spec-driven est. · 20% / 72% / 8% · ~$40 · ~$60 · ~$155
Saving: Standard vs Spec-Driven (est.) · ~$4 (9%) · ~$9 (13%) · ~$23 (13%)
Senior (limited Opus) · 10% / 75% / 15% · $50.56 · $79.49 · $204.29
Senior — spec-driven est. · 18% / 67% / 15% · ~$46 · ~$69 · ~$178
Saving: Standard vs Senior (role restriction) · −$6.52 (13%) · −$10.26 (13%) · −$26.36 (13%)
Per-model pure reference
Haiku 4.5 · $16.31 · $25.64 · $65.90
Sonnet 4.6 · $48.93 · $76.93 · $197.70
Opus 4.6 · $81.55 · $128.21 · $329.50

Agentic (Greenfield) Use Case B: + GitLab MCP

Policy / Model · Split · Light / mo · Medium / mo · Heavy / mo
Standard developer · 15% / 85% / 0% · $54.04 · $77.68 · $190.68
Standard developer — spec-driven est. · 20% / 72% / 8% · ~$49 · ~$68 · ~$167
Saving: Standard vs Spec-Driven (est.) · ~$5 (9%) · ~$10 (13%) · ~$24 (13%)
Senior (limited Opus) · 10% / 75% / 15% · $62.04 · $89.18 · $218.93
Saving: Standard vs Senior (role restriction) · −$8.00 (13%) · −$11.50 (13%) · −$28.25 (13%)
Per-model pure reference
Haiku 4.5 · $20.01 · $28.77 · $70.62
Sonnet 4.6 · $60.04 · $86.31 · $211.87
Opus 4.6 · $100.07 · $143.84 · $353.11

Pattern Comparison — Standard Developer Policy

Pattern · Use Case · Light / mo · Medium / mo · Heavy / mo · Key Cost Driver at Heavy
DevSecOps · Standalone · $45.84 · $61.72 · $87.09 · Small regular input (467k tok/day) dominates
Agentic · Standalone · $44.04 · $69.23 · $177.93 · 4.1× more regular input (1,914k tok/day)
Spec-Driven · Standalone (est.) · ~$40 · ~$60 · ~$155 · ~48% lower regular input via spec; Phase 3 on Haiku
DevSecOps · + GitLab MCP · $60.86 · $79.37 · $111.46 · MCP adds +28% for light users, +7% for heavy
Agentic · + GitLab MCP · $54.04 · $77.68 · $190.68 · Heavy agentic 72% more expensive than DevSecOps heavy
Spec-Driven · + GitLab MCP (est.) · ~$49 · ~$68 · ~$167 · Spec-driven narrows gap vs DevSecOps to ~50% at heavy
Blended costs = weighted average of per-model monthly costs using stated split. All computed at token level (cache writes at 2.0× input, reads at 0.1×, regular input at 1.0×, output at 5.0× ratio), summed over 22 working days. Spec-driven estimates include Phase 1 Opus session overhead (1 spec per 5 execution sessions) and Phase 3 Haiku conformance sessions. Spec-driven regular input estimated at 13,000 tok/turn (vs 25,000 standard). Marked est. — actual results depend on spec quality and sprint cadence.
Section 07

Prompt Caching & TTL Analysis

Prompt caching is the single largest cost lever available, more impactful than any other optimisation. It is enabled in secd3v by default; without it, the 22,000-token system context is billed as full-price regular input on every API call. With prompt caching, the fixed context is written once per session and read at 10% of the input price on every subsequent turn. The Claude Code CLI applies prompt caching automatically.

Three Billing Scenarios (Sonnet 4.6 regional, 22k context, per turn)

No caching: 22,000 × $3.30/MTok = $0.073 per turn. At 91 turns/day (DevSecOps heavy), that is ~$6.61/day in context cost alone — before any conversation input or output.
5-min TTL: Write once at 1.25× ($4.125/MTok). Reads at 0.1× ($0.33/MTok). If a developer pauses >5 min between turns, the cache expires and must be re-written at 1.25×. Each re-write costs $0.091 (22k context) or $0.128 (31k GitLab MCP).
1-hr TTL (secd3v default): Write once at 2.0× ($6.60/MTok). Reads at 0.1×. TTL resets on every cache hit — active sessions stay warm automatically. One write covers the full session regardless of developer pauses.
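The per-turn arithmetic behind the three scenarios can be sketched directly from the Sonnet 4.6 regional rates:

```python
# Sketch: per-turn cost of the 22k-token fixed context in each billing
# scenario (Sonnet 4.6 regional rates above, USD/MTok).
CTX = 22_000
INPUT, WRITE_5MIN, WRITE_1HR, READ = 3.30, 4.125, 6.60, 0.33

no_cache  = CTX * INPUT / 1e6       # ~$0.073, paid on every turn
rewrite5  = CTX * WRITE_5MIN / 1e6  # ~$0.091 per 5-min (re-)write
write1hr  = CTX * WRITE_1HR / 1e6   # ~$0.145, once per session
read_turn = CTX * READ / 1e6        # ~$0.007 per cached turn
```

The 1-hr write costs roughly twice a cached turn's worth of no-cache context, which is why it pays for itself within the first few turns of any session.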

Per-Session Cache Costs

Session Type · Turns · No Cache · 5-min TTL · 1-hr TTL · 1-hr saves vs 5-min · Re-writes (5-min)
DevSecOps sessions — 22k context, Sonnet 4.6 regional
Micro · 5 · $0.363 · $0.162 · $0.174 · −$0.013 (5-min wins) · 0.5
Standard · 9 · $0.653 · $0.249 · $0.203 · +$0.046 · 1.2
Extended · 15 · $1.089 · $0.426 · $0.247 · +$0.179 · 2.8
Agentic sessions — 22k context, Sonnet 4.6 regional
Light · 10 · $0.726 · $0.240 · $0.210 · +$0.029 · 1.0
Medium · 25 · $1.815 · $0.766 · $0.319 · +$0.447 · 6.0
Heavy · 65 · $4.719 · $2.392 · $0.610 · +$1.782 · 22.0
Micro sessions slightly favour 5-min TTL. With only 0.5 expected re-writes, the lower write cost (1.25×) outweighs the re-write risk for the shortest DevSecOps sessions — saving $0.013/session. Monthly impact: $0.02 for DevSecOps light users. For all sessions of 9+ turns, 1-hr TTL is cheaper.

Break-Even

Extra cost of 1-hr write vs 5-min write per session:
  22k (Standalone): 22,000 × ($6.60 − $4.125) / MTok = $0.054
  31k (GitLab MCP): 31,000 × ($6.60 − $4.125) / MTok = $0.077
Cost of one unexpected re-write at 5-min TTL:
  22k context: 22,000 × $4.125 / MTok = $0.091 per event
  31k context: 31,000 × $4.125 / MTok = $0.128 per event
Break-even re-writes to avoid per session: $0.054 / $0.091 = 0.60 re-writes — identical ratio for both context sizes.
Conclusion: for any interactive session where a developer is likely to pause >5 min at least once every two sessions, 1-hr TTL is cheaper. This covers every DevSecOps standard/extended session and every Agentic session.
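The break-even arithmetic above can be sketched as a function; note that the context size cancels out, which is why 22k and 31k contexts share the same 0.60 threshold:

```python
# Sketch: break-even re-write count for 1-hr vs 5-min TTL
# (Sonnet 4.6 regional cache write rates, USD/MTok).
W5, W1H = 4.125, 6.60

def break_even_rewrites(ctx_tokens: int) -> float:
    extra_1hr   = ctx_tokens * (W1H - W5) / 1e6   # 1-hr write premium
    one_rewrite = ctx_tokens * W5 / 1e6           # one 5-min re-write
    return extra_1hr / one_rewrite                # ctx_tokens cancels out

# break_even_rewrites(22_000) == break_even_rewrites(31_000) -> 0.6
```

The ratio reduces to (2.0 − 1.25) / 1.25 = 0.6, so the threshold holds for any model and any context size under the same multipliers.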

Monthly Cost by Caching Scenario — All Patterns (Standard Developer Policy with Sonnet 4.6)

Pattern · Use Case · Tier · No Cache / mo · 5-min TTL / mo · 1-hr TTL / mo (recommended) · Saved (no cache → 1-hr) · Saved (5-min → 1-hr)
DevSecOps
DevSecOps · Standalone · Light · $106.16 · $57.31 · $57.29 · $48.87 (46%) · $0.02
DevSecOps · Standalone · Medium · $145.41 · $81.94 · $77.13 · $68.28 (47%) · $4.81 (6%)
DevSecOps · Standalone · Heavy · $202.97 · $116.06 · $108.88 · $94.09 (46%) · $7.18 (6%)
DevSecOps · + GitLab MCP · Light · $145.58 · $76.74 · $76.05 · $69.53 (48%) · $0.69 (1%)
DevSecOps · + GitLab MCP · Medium · $196.35 · $106.92 · $99.20 · $97.15 (49%) · $7.72 (7%)
DevSecOps · + GitLab MCP · Heavy · $273.27 · $150.81 · $139.33 · $133.94 (49%) · $11.48 (8%)
Agentic
Agentic · Standalone · Light · $104.20 · $50.69 · $48.95 · $55.25 (53%) · $1.74 (3%)
Agentic · Standalone · Medium · $139.81 · $93.65 · $76.91 · $62.90 (45%) · $16.74 (18%)
Agentic · Standalone · Heavy · $315.91 · $241.64 · $197.69 · $118.22 (37%) · $43.95 (18%)
Agentic · + GitLab MCP · Light · $138.48 · $63.09 · $60.04 · $78.44 (57%) · $3.05 (5%)
Agentic · + GitLab MCP · Medium · $176.13 · $111.08 · $86.31 · $89.82 (51%) · $24.77 (22%)
Agentic · + GitLab MCP · Heavy · $380.52 · $275.87 · $211.86 · $168.66 (44%) · $64.01 (23%)
No-cache baseline reprices all cached context as regular input at full rate every turn. 5-min TTL models expected re-writes per session type; actual rates vary by developer behaviour. Savings scale with model: Haiku saves 3× less in absolute terms; Opus 1.67× more. All Sonnet 4.6 regional.

Key Insights

46–57% — cost reduction from enabling caching vs no-cache baseline
$44–64 — monthly saving: 1-hr over 5-min TTL (heavy agentic)
$0–11 — monthly saving: 1-hr over 5-min TTL (DevSecOps, any tier)
TTL choice matters far more for Agentic than DevSecOps. For DevSecOps (short sessions), the 5-min vs 1-hr difference is at most $11.48/month. For heavy agentic users, 1-hr TTL saves $43.95–$64.01/month — equivalent to 18–23% of total monthly cost. Heavy agentic sessions accumulate 22 expected re-writes per session (35% risk × 64 inter-turn gaps for test runs, build waits, and review). 5-min TTL nearly quadruples the cache cost on those sessions.

TTL Selection Guide

5-Minute TTL — Use When
  • Fully automated CI/CD pipelines with no human in the loop
  • Sequential scripted invocations (claude -p) with <5 min gaps
  • Pre-commit hook automation running in-process
  • DevSecOps micro sessions only (marginal saving, $0.02/month)
1-Hour TTL — Default for Everything Else
  • All interactive DevSecOps sessions (standard and extended)
  • All agentic sessions regardless of length
  • All spec-driven Phase 2 execution sessions — non-negotiable. The spec must survive as warm cached context across the full execution session without re-write cost
  • Any session involving human review, test execution, or build waits
  • GitLab MCP workflows where CI pipeline waits create natural pauses
  • Cross-session DevSecOps blocks on the same project within an hour
Section 08

Developer Cost Optimisation Factors

After model policy and caching configuration, cost is controlled by developer behaviour. Each factor below is validated against Anthropic's Claude Code best practices documentation. Spec-driven development (SDD) is included as a first-class optimisation factor — it addresses the dominant agentic cost driver (regular input from file exploration and wrong-direction turns) at the workflow level rather than the session level.

🎯
Prompt Specificity & Context Front-Loading
Highest impact · All patterns
"The more precise your instructions, the fewer corrections you'll need. Reference specific files, mention constraints, and point to example patterns." A prompt like "review @auth.py lines 42–89 for SQL injection — here is the schema" costs 3–5× less than "check my auth code for security issues" because Claude doesn't explore files to find context it was never given.

In DevSecOps, specificity keeps sessions within their session-type scope and prevents drift from Micro into Extended territory. In agentic, vague prompts trigger broad file scanning — each file read compounds into every subsequent turn. In spec-driven execution, the spec itself is the specificity mechanism — but the execution prompt must still reference specific spec sections, not leave Claude to interpret the full spec freely.
DevSecOps · Agentic · 3–5× regular input reduction possible
📋
Plan Mode Before Execution
Highest impact · Agentic
"Claude reads files and answers questions without making changes." Enter Plan Mode by prefixing your prompt with /plan or pressing Shift+Tab. Review the plan file — Claude writes it to your project. Switch back to Normal Mode to execute.

Plan Mode is most useful when you're uncertain about approach, when the change touches multiple files, or when you're unfamiliar with the code. Skip Plan Mode for small, clearly scoped tasks — "if you could describe the diff in one sentence, skip the plan." Wrong-direction correction at the plan stage costs ~500 tokens; correction after 20 turns of wrong implementation costs tens of thousands.

For spec-driven development, the spec is inter-session Plan Mode — it surfaces wrong directions before execution begins. /plan is still valuable within execution sessions for multi-file changes inside the spec scope.
Agentic · prevents expensive wrong-direction sessions · DevSecOps · mandatory for multi-file brownfield changes
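The cost asymmetry between the two correction points can be sketched as follows. All token counts are hypothetical averages consistent with the figures above; rates assume Sonnet 4.6 base pricing plus the 10% regional premium:

```python
SONNET_IN = 3.00 * 1.10 / 1_000_000    # assumed regional input rate, USD/token
SONNET_OUT = 15.00 * 1.10 / 1_000_000  # assumed regional output rate, USD/token

# Correcting direction at the plan stage: re-reading a ~500-token plan plus
# a short corrected plan emission (hypothetical counts).
plan_correction = 500 * SONNET_IN + 200 * SONNET_OUT

# Correcting after 20 wrong-direction turns: each turn re-reads growing
# history (~25k regular input) and emits ~1.5k output (hypothetical averages).
wrong_direction = 20 * (25_000 * SONNET_IN + 1_500 * SONNET_OUT)

print(f"plan-stage: ${plan_correction:.4f}  after 20 turns: ${wrong_direction:.2f}")
```

Under these assumptions the late correction costs several hundred times the plan-stage one, which is the economic case for Plan Mode on anything multi-file.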
📐
Spec-Driven Development — Phase 1 Spec Authoring
Highest impact · Agentic only · ~13% monthly saving at heavy usage
Author a structured specification before any agentic execution begins. The spec defines interface contracts, data shapes, acceptance criteria, file layout, security constraints, and test coverage requirements. Claude Code then executes against the spec rather than discovering scope through open-ended file exploration — the dominant cost driver in standard agentic sessions.

The cost effect: standard agentic heavy sessions average 25,000 tokens regular input per turn from file exploration and history growth. Spec-driven execution sessions average ~13,000 tokens per turn — a 48% reduction — because the spec replaces the file-discovery phase. Wrong-direction turns drop from 4–8 per session to 0–1, eliminating the most expensive tail-cost scenario. Phase 3 conformance review sessions are Haiku-eligible (pattern matching, not reasoning), further reducing the blended model cost.

CLAUDE.md rule for spec-driven projects: the spec belongs in a separate file referenced at session start — not embedded in CLAUDE.md. A 600-line spec in CLAUDE.md adds ~9,000 tokens of cache-read cost per call with no benefit over a tight 150-line CLAUDE.md pointing to the spec file. Target CLAUDE.md under 200 lines regardless of spec-driven status.

Session hygiene rule: do not /clear between Phase 1 (spec write) and Phase 2 (execution) — the spec is the context that must persist. /clear only between unrelated task types. /compact at 80% context fill during Phase 2 execution. Phase 3 conformance sessions can be started fresh; they reference the spec directly.
Agentic · Phase 1 Opus justified all tiers · Phase 2 Sonnet/Haiku · Phase 3 Haiku · ~48% regular input reduction · 1-hr TTL non-negotiable
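The per-turn arithmetic above can be expressed directly. The 25,000 and 13,000 figures come from this analysis; the 50-turn session length and Sonnet rate are illustrative assumptions:

```python
SONNET_IN = 3.00 * 1.10 / 1_000_000  # assumed regional input rate, USD/token

STANDARD_PER_TURN = 25_000  # avg regular input/turn, standard agentic heavy session
SPEC_PER_TURN = 13_000      # avg regular input/turn, spec-driven Phase 2 execution

reduction = 1 - SPEC_PER_TURN / STANDARD_PER_TURN
per_turn_saving = (STANDARD_PER_TURN - SPEC_PER_TURN) * SONNET_IN

# Over a hypothetical 50-turn execution session:
session_saving = 50 * per_turn_saving
print(f"reduction: {reduction:.0%}  saving/turn: ${per_turn_saving:.4f}  "
      f"per 50-turn session: ${session_saving:.2f}")
```

This counts regular input only; the elimination of 4–8 wrong-direction turns per session adds further savings on top.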
🧹
Session Hygiene — /clear and /compact
High impact · DevSecOps especially
"Run /clear between unrelated tasks to reset context. If you've corrected Claude more than twice on the same issue in one session, the context is cluttered with failed approaches. Run /clear and start fresh." A clean session with a better prompt almost always outperforms a long session with accumulated corrections. Use /rename before clearing to preserve session identity for later.

For long agentic sessions, /compact summarises conversation history rather than clearing it. You can focus compaction: /compact focus on the API changes. Claude Code also auto-compacts when approaching context limits. Put persistent rules in CLAUDE.md rather than relying on conversation history.

Spec-driven rule: do not /clear between Phase 1 (spec write) and Phase 2 (execution). The spec must remain in context. /compact at 80% during Phase 2. Phase 3 conformance sessions can start fresh — they reference the spec file directly.
DevSecOps · /clear between every task · Agentic · /compact at ~80% · Spec-Driven · no /clear between Ph1→Ph2
🔑
Model Selection — Haiku First, Effort Levels
High impact · DevSecOps especially
Defaulting to Sonnet for everything is the most common unnecessary cost. In DevSecOps, documentation strings, pipeline triage, dependency lookups, simple formatting, and boilerplate scaffolding are genuinely Haiku-suitable at 3× lower cost.

Anthropic also introduces effort levels (/effort): low, medium, high, and max (Opus only). Medium is recommended for most coding tasks. For simple tasks, "you can reduce costs by lowering the effort level" — it controls adaptive reasoning depth, with lower effort being faster and cheaper. High effort and max provide deeper reasoning for complex problems but consume significantly more output tokens. Set effort per-task, not as a session default.
DevSecOps · 25–30% Haiku achievable · Agentic · 10–15% Haiku realistic · /effort for complex tasks · Spec-Driven · Phase 3 conformance is Haiku-primary (rises to ~20% blended)
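A blended-rate sketch shows what the 25–30% Haiku split is worth. It relies only on the relative pricing stated above (Haiku at roughly one-third of Sonnet's rate); the absolute figure is the assumed regional Sonnet input rate used throughout this analysis:

```python
SONNET_IN = 3.00 * 1.10   # assumed regional input rate, USD/MTok
HAIKU_IN = SONNET_IN / 3  # "3x lower cost" per this analysis

def blended_rate(haiku_share: float) -> float:
    """Blended input rate (USD/MTok) for a given Haiku share of token volume."""
    return haiku_share * HAIKU_IN + (1 - haiku_share) * SONNET_IN

all_sonnet = blended_rate(0.0)
disciplined = blended_rate(0.30)  # 25-30% Haiku, achievable in DevSecOps

print(f"all-Sonnet: ${all_sonnet:.2f}/MTok  30% Haiku: ${disciplined:.2f}/MTok  "
      f"saving: {1 - disciplined / all_sonnet:.0%}")
```

A 30% Haiku share cuts the blended input rate by 20% under these assumptions, with no caching or workflow changes required.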
🔍
Subagents for Research & Cost Routing
High impact · Agentic
"Delegate research with 'use subagents to investigate X'. They explore in a separate context, keeping your main conversation clean for implementation." When Claude researches a codebase it reads many files — all consuming your context. Subagents run in separate context windows and report back summaries, keeping the main session lean.

Subagents also serve as cost routers: Anthropic's subagent documentation explicitly notes you can "control costs by routing tasks to faster, cheaper models like Haiku." Configure Haiku subagents for file scanning, documentation lookup, and log analysis — they return summaries to a Sonnet main session without the main session paying full Sonnet input prices for verbose exploration results.
Agentic · context preservation + Haiku routing · DevSecOps · useful for large SAST result triage
🗂️
CLAUDE.md & .claudeignore Discipline
Medium impact · Both patterns
Anthropic recommends keeping CLAUDE.md specific and concise — "specific, concise, well-structured instructions work best." A bloated CLAUDE.md (500+ lines) loads into every session, consuming tokens on every turn. Move specialist content to Skills (which load on demand) and keep project-wide rules tight. CLAUDE.md can also include a "Compact Instructions" section to guide what gets preserved during /compact.

The .claudeignore file prevents Claude from accidentally reading node_modules, build artifacts, generated code, and binaries. A single accidental glob-all read of a large brownfield repository can consume 50,000–150,000 tokens in one API call — equivalent to a day's DevSecOps budget. Configure .claudeignore on day one of any brownfield project.

Spec-driven bloat risk — the highest-cost CLAUDE.md failure mode: developers may embed verbose spec prose into CLAUDE.md. A 600-line spec adds ~9,000 tokens of cache-read cost per call with no output benefit over a tight 150-line CLAUDE.md that references the spec as a separate file. The spec belongs as a project file loaded at Phase 2 session start — not in CLAUDE.md. Telemetry signal: CLAUDE.md above 3,000 tokens on a spec-driven project indicates this failure is active.
DevSecOps · Agentic · one bad file read can cost a day's budget
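A minimal .claudeignore starting point for a brownfield repository might look like the following. The entries are illustrative; tune them to the project's actual dependency and build-output directories:

```
# Dependency trees and package caches
node_modules/
vendor/
.venv/

# Build artifacts and generated code
dist/
build/
target/
*.min.js

# Binaries and large assets
*.so
*.zip
*.pdf
```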
Cross-Session Cache Grouping
High structural impact · DevSecOps
With 1-hr TTL, the service layer can route same-developer, same-project sessions to share a Bedrock cache state within the hour. Sessions 2 and 3 on the same project read the cache (at 0.1× input) instead of re-writing it (at 2.0× input) — saving $0.054–$0.077 per session.

Developer practice: Group related micro and standard sessions into focused 30–60 minute blocks rather than scattering them. Keep CLAUDE.md stable during the block — changes invalidate the cache. The TTL resets on every cache hit, so continuous active sessions stay warm automatically. This cross-session sharing is the most impactful DevSecOps-specific optimisation and has no equivalent in agentic patterns.
DevSecOps · cross-session sharing saves $6–14/dev/month (Sonnet) · Agentic · within-session amortisation
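The quoted $0.054–$0.077 per-session range is reproducible from the cache multipliers above, assuming a shared cached prefix (CLAUDE.md plus stable project context) of roughly 8,600–12,300 tokens; the prefix size is an inferred assumption, not a measured value:

```python
SONNET_IN = 3.00 * 1.10 / 1_000_000  # assumed regional input rate, USD/token
WRITE_MULT, READ_MULT = 2.0, 0.1     # 1-hour cache write vs cache read multipliers

def followup_session_saving(cached_prefix_tokens: int) -> float:
    """USD saved when a follow-up session reads the warm cache (0.1x)
    instead of re-writing it (2.0x)."""
    return cached_prefix_tokens * SONNET_IN * (WRITE_MULT - READ_MULT)

# A ~8.6k-12.3k token shared prefix reproduces the quoted per-session range.
print(f"${followup_session_saving(8_600):.3f} - ${followup_session_saving(12_300):.3f}")
```

The saving applies to sessions 2 and 3 within the hour, which is why grouping related work into 30–60 minute blocks matters.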
🔌
MCP Tool Search — Default Deferred Loading
Changed behaviour — verify configuration
Anthropic has changed the default MCP behaviour: "Tool search keeps MCP context usage low by deferring tool definitions until Claude needs them. Only tool names load at session start, so adding more MCP servers has minimal impact on your context window." Tool search is enabled by default from recent Claude Code versions.

This significantly reduces the GitLab MCP overhead previously modelled as 9,000 tokens — with tool search active, only tool names (~500 tokens) load at session start; full schemas load on demand. If your deployment disables tool search (ENABLE_TOOL_SEARCH=0), you revert to eager loading of all tool definitions. Verify your Claude Code version and configuration — recent versions (v2.1.x+) default to deferred loading.
DevSecOps · Agentic · deferred by default on v2.1+ · verify with /mcp in session
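The first-order saving from deferred loading can be estimated as follows. The 9,000 and 500 token figures come from this analysis; valuing them at the plain input rate is a simplification, since cache-write and cache-read multipliers would scale the result:

```python
SONNET_IN = 3.00 * 1.10 / 1_000_000  # assumed regional input rate, USD/token

EAGER_TOKENS = 9_000   # full GitLab MCP tool schemas loaded at session start
DEFERRED_TOKENS = 500  # tool names only, with tool search active

# First-call saving at the base input rate; because this context would
# otherwise sit in every turn's prefix, the real saving compounds per session.
saving = (EAGER_TOKENS - DEFERRED_TOKENS) * SONNET_IN
print(f"per-session-start saving: ${saving:.4f}")
```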
👥
Agent Teams — Explicit Cost Governance
Critical for agentic power users
"Agent teams use approximately 7× more tokens than standard sessions when teammates run in Plan Mode, because each teammate maintains its own context window and runs as a separate Claude instance." A heavy agentic developer using Agent Teams moves from the $178/month model to potential $1,200+/month exposure.

Agent teams are disabled by default (CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 to enable). They are appropriate for genuinely parallelisable work — independent feature branches, parallel test generation, concurrent module construction — not for tasks that can be done serially. Keep teams small; teammates consume tokens for as long as the team runs, even when individually idle. Your service-layer rate limits and budget caps are the primary governance mechanism.
Agentic · 7× cost multiplier · disabled by default · service-layer governance essential
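The exposure figure follows directly from applying Anthropic's 7× multiplier to the heavy-agentic baseline used throughout this analysis:

```python
HEAVY_AGENTIC_MONTHLY = 178.0  # USD/month, heavy agentic developer (std policy)
TEAM_MULTIPLIER = 7            # Anthropic's figure for teammates in Plan Mode

exposure = HEAVY_AGENTIC_MONTHLY * TEAM_MULTIPLIER
print(f"potential exposure: ${exposure:.0f}/month")
```

That is the "$1,200+/month" figure quoted above, and why service-layer budget caps must be in place before enabling the feature.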
🔒
Extended Thinking — Selective Use Only
High cost when unmanaged
Extended thinking tokens are billed as output tokens at standard output rates — the most expensive token category. Anthropic confirms it is enabled by default in Claude Code: "Extended thinking is enabled by default because it significantly improves performance on complex planning and reasoning tasks." Thinking tokens use the default budget unless overridden.

For simpler tasks, disable or reduce: /effort low reduces thinking depth; MAX_THINKING_TOKENS=8000 caps the budget; setting to 0 disables thinking entirely. At Sonnet regional rates, a 4,000-token thinking block adds $0.066/call. In a 25-turn medium agentic session this can add $1.65 to session cost at default settings — worth managing explicitly on routine work.
Opus / seniors · use /effort for high-complexity tasks · Sonnet · /effort low for routine DevSecOps tasks
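The thinking-token figures quoted above reduce to a two-line calculation, assuming Sonnet output at the base rate plus the 10% regional premium:

```python
SONNET_OUT = 15.00 * 1.10 / 1_000_000  # assumed regional output rate, USD/token

THINKING_TOKENS_PER_CALL = 4_000  # representative default-budget thinking block
per_call = THINKING_TOKENS_PER_CALL * SONNET_OUT
per_session = 25 * per_call       # 25-turn medium agentic session

print(f"per call: ${per_call:.3f}  per 25-turn session: ${per_session:.2f}")
```

Capping with MAX_THINKING_TOKENS=8000 bounds the worst case; /effort low shrinks the typical case on routine work.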
📊
Token Telemetry — Measure First
Foundation for all other optimisation
The secd3v Claude Code service includes token telemetry to support cost optimisation. Metrics are aggregated per user, per team, per model, and organisation-wide.

Key telemetry signals — all patterns:
  • Cache hit rate under 70% → session hygiene is poor
  • Regular input per turn over 15,000 on DevSecOps → broad prompting or missing /clear
  • Haiku actual split below 15% on DevSecOps → model discipline not applied
The X-Claude-Code-Session-Id header added in v2.1.86+ lets proxies aggregate requests by session without parsing the body — enabling accurate per-session cost attribution.

Additional signals for spec-driven agentic:
  • Regular input per turn above 18,000 during Phase 2 → spec not being referenced; Claude is still file-exploring
  • CLAUDE.md above 3,000 tokens on a spec-driven project → spec prose has been embedded in CLAUDE.md
  • Opus usage outside Phase 1 sessions → policy drift; the service layer should enforce Sonnet-only post-spec
  • Phase 3 sessions using Sonnet for all turns → Haiku is sufficient for conformance checking
The 30-day telemetry review is the single most operationally valuable practice.
DevSecOps · Agentic · service-layer responsibility · 30-day review cadence
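As a sketch of how the thresholds above could be automated at the service layer: the record shape and field names below are assumptions for illustration, not the telemetry service's actual schema:

```python
def flag_signals(session: dict) -> list[str]:
    """Return human-readable flags for a per-session telemetry record
    (hypothetical field names), applying the thresholds listed above."""
    flags = []
    if session["cache_hit_rate"] < 0.70:
        flags.append("poor session hygiene (cache hit rate < 70%)")
    if session["pattern"] == "devsecops" and session["regular_input_per_turn"] > 15_000:
        flags.append("broad prompting or missing /clear")
    if session["pattern"] == "devsecops" and session["haiku_split"] < 0.15:
        flags.append("model discipline not applied")
    if (session.get("spec_driven") and session.get("phase") == 2
            and session["regular_input_per_turn"] > 18_000):
        flags.append("spec not referenced - still file-exploring")
    if session.get("spec_driven") and session.get("claude_md_tokens", 0) > 3_000:
        flags.append("spec prose embedded in CLAUDE.md")
    return flags

example = {"pattern": "devsecops", "cache_hit_rate": 0.62,
           "regular_input_per_turn": 19_500, "haiku_split": 0.22}
print(flag_signals(example))
```

The example session trips two flags (low cache hit rate and over-broad input), which is the typical signature of missing /clear discipline.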
Section 09

Opus Approval Guide: Senior Access

Opus 4.6 scores 80.8% on SWE-bench vs Sonnet 4.6 at 79.6% — a 1.2pp gap on standard coding tasks. The meaningful Opus advantage appears on ARC-AGI-2 novel reasoning (68.8% vs Sonnet 58.3%) and Terminal-Bench autonomous operation (65.4% vs 59.1%). The approval gate should reflect this: Opus is justified when the task requires novel reasoning under genuine ambiguity, sustained autonomous operation over many turns, or where downstream error costs are high and the 10-point reasoning gap materially changes outcomes.

Spec-driven development introduces a new Opus justification that applies to all developer tiers: spec authoring. A single Opus spec session (~$8–15) producing a tight 1,500–2,500 token specification amortises across 40–65 Sonnet execution turns. The Opus cost is recouped within the first execution session through reduced file-exploration turns and eliminated wrong-direction corrections. The service layer should implement a "spec-write" session profile that permits Opus and enforces Sonnet-only for subsequent sessions in the same sprint.
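The amortisation claim is consistent with the headline numbers in the executive summary, assuming the ~$155 spec-driven estimate already includes spec-authoring sessions:

```python
HEAVY_STANDARD = 178.0     # USD/month, standard agentic heavy developer
HEAVY_SPEC_DRIVEN = 155.0  # USD/month, spec-driven heavy estimate
SPEC_SESSION_COST = (8.0, 15.0)  # single Opus spec-authoring session, USD

monthly_saving = HEAVY_STANDARD - HEAVY_SPEC_DRIVEN
print(f"monthly saving: ${monthly_saving:.0f} ({monthly_saving / HEAVY_STANDARD:.0%})")
# A $8-15 Opus spec session sits inside the $23/month headroom even at one
# spec per sprint, matching the ~13% saving quoted for heavy usage.
```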

Task or Scenario
Pattern
Opus Justified?
Rationale
Spec authoring — greenfield service or controlled migration sprint
Agentic
Yes — all tiers
Phase 1 only. Single Opus session amortised across 40–65 Sonnet execution turns. ARC-AGI-2 advantage (68.8% vs 58.3%) most material for architectural scope and interface decisions. Standard developers eligible under spec-write session profile.
Security vulnerability exploit chain assessment — novel threat vectors
DevSecOps
Yes
ARC-AGI-2 gap (68.8% vs 58.3%) is most material for novel reasoning under genuine ambiguity
Compliance gap analysis — assessing system against complex regulatory controls
DevSecOps
Yes
Multi-control reasoning; compliance errors have significant downstream cost
Architectural design for new greenfield service from spec
Agentic
Yes
Single high-value planning session in Plan Mode prevents many expensive wrong-direction turns
Multi-service brownfield refactor with ambiguous legacy coupling
Agentic
Conditional
Use Sonnet first in Plan Mode; escalate to Opus only if plan proposals are inadequate twice
Security threat modelling — genuinely novel threat scenarios
DevSecOps
Conditional
Opus for novel scenarios; Sonnet handles known patterns (OWASP Top 10, CVE triage) well
Spec execution — implementing against an authored spec
Agentic
No
Phase 2. Spec provides the reasoning frame — Sonnet executes against it. Haiku eligible for bounded implementation turns within spec scope.
Standard MR code review (1–5 files)
DevSecOps
No
Sonnet 4.6 at 79.6% SWE-bench is indistinguishable from Opus (80.8%) on routine review
Feature implementation — well-defined greenfield module
Agentic
No
Well-defined implementation is Sonnet territory; Plan Mode compensates for Sonnet's narrower reasoning
Conformance review — verifying output against spec
Agentic
No
Phase 3. Pattern matching against defined spec criteria — Haiku-level task at 3× lower cost than Sonnet, 5× lower than Opus
Documentation, comments, type annotations
DevSecOps
No
Haiku-level task — Opus is 5× input cost for equivalent output quality
Pipeline failure triage and CI configuration
DevSecOps
No
Haiku is appropriate; this is pattern matching, not novel reasoning
Automated CI/CD pipeline tasks (no human in loop)
Agentic
No
No human to verify reasoning; Sonnet provides better cost-reliability tradeoff in automation