Enterprise Cost Considerations · v1.1 · April 2026
secd3v Claude Code Service

Enterprise cost considerations for DevSecOps (Brownfield), Agentic (Greenfield), and Spec-Driven development with Claude Code and/or GitLab (via the secd3v GitLab MCP service). Covers model access policies, prompt caching, TTL selection, developer optimisation, and how spec-driven workflows modify token consumption, session shape, and model allocation within the Agentic pattern. Scoped for logically airgapped secd3v deployments via Bedrock ap-southeast-2 (Sydney) and ap-southeast-4 (Melbourne) regional endpoints.

Models: Haiku 4.5 · Sonnet 4.6 · Opus 4.6
Region: ap-southeast-2 (Sydney) · ap-southeast-4 (Melbourne)
Endpoint: Regional (+10%)
Currency: USD
Validated against: Anthropic Claude Code Docs
Section 01

Executive Summary

$87 / month — DevSecOps heavy dev (standard policy)
$178 / month — Agentic heavy dev (standard policy)
~$155 / month — Agentic spec-driven heavy (estimated)
46–57% — cost reduction from enabling prompt caching

For secd3v, Claude Code usage organises into two foundational patterns — DevSecOps (Brownfield) and Agentic (Greenfield) — with a third mode, Spec-Driven Development, that modifies how Agentic sessions run. All three operate across Haiku 4.5, Sonnet 4.6, and Opus 4.6 on AWS Bedrock AU regional endpoints. All costs include the 10% regional endpoint premium required for data sovereignty.

The analysis covers: (1) why the development pattern determines the dominant cost driver; (2) how spec-driven development (SDD) restructures Agentic session shape, token consumption, and model allocation; (3) the financial impact of restricting Opus access to senior developers — and why spec authoring creates a justified exception for standard developers; (4) how prompt caching reduces costs by 46–57%; and (5) Anthropic-validated developer behaviours that directly control per-user costs.

The biggest cost lever remains prompt caching, not model selection. Spec-driven development (SDD) adds a structural second lever: by front-loading reasoning into a stable cached specification, it reduces regular input tokens per execution turn by ~48% and eliminates the most expensive tail-cost scenario — wrong-direction agentic sessions. The 1-hour TTL becomes non-negotiable for spec-driven execution.
Section 02

Infrastructure & Pricing

secd3v uses only AWS Bedrock regional endpoints for data sovereignty compliance, pinning all inference to the ap-southeast-2 (Sydney) and ap-southeast-4 (Melbourne) AU regions. AWS Bedrock charges per token on a pay-as-you-go basis, and regional endpoints carry a 10% premium over global pricing. For clarity, all figures in this document use AU regional pricing (in USD).

Claude Haiku 4.5
Fast · Routine tasks · Sub-agents
Input: $1.10 / MTok
Output: $5.50 / MTok
Cache write (5-min): $1.375 / MTok
Cache write (1-hr): $2.20 / MTok
Cache read: $0.11 / MTok
Base global $1.00/$5.00 · +10% regional
Claude Sonnet 4.6
Balanced · Primary workhorse · Code review
Input: $3.30 / MTok
Output: $16.50 / MTok
Cache write (5-min): $4.125 / MTok
Cache write (1-hr): $6.60 / MTok
Cache read: $0.33 / MTok
Base global $3.00/$15.00 · +10% regional
Claude Opus 4.6
Maximum intelligence · Senior-approved · Spec-write exception
Input: $5.50 / MTok
Output: $27.50 / MTok
Cache write (5-min): $6.875 / MTok
Cache write (1-hr): $11.00 / MTok
Cache read: $0.55 / MTok
Base global $5.00/$25.00 · +10% regional
Cache multipliers (all models): 5-min write = 1.25× base input · 1-hr write = 2.0× base input · Cache read = 0.1× base input. 1-hr TTL available on Bedrock for Haiku 4.5, Sonnet 4.6, and Opus 4.6. Default if no TTL specified is 5 minutes. Minimum 1,024 tokens per cache checkpoint; up to 4 checkpoints per request.
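The multiplier relationships above can be expressed as a small sketch. The base prices are the document's regional figures; nothing here is measured from Bedrock itself.

```python
# Sketch: derive the cache price tiers (USD/MTok) from a model's base
# regional input price, using the multipliers stated above.
BASE_INPUT = {"haiku-4.5": 1.10, "sonnet-4.6": 3.30, "opus-4.6": 5.50}

def cache_prices(base_input: float) -> dict:
    """Return the cache write/read prices implied by the base input price."""
    return {
        "write_5min": round(base_input * 1.25, 4),  # 5-min TTL write
        "write_1hr":  round(base_input * 2.0, 4),   # 1-hr TTL write
        "read":       round(base_input * 0.1, 4),   # cache read
    }
```

For example, `cache_prices(BASE_INPUT["sonnet-4.6"])` reproduces the $4.125 / $6.60 / $0.33 figures in the Sonnet 4.6 card.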

Tool overhead: Claude Code's built-in tool suite (Bash, Read, Write, Glob, Grep, Task, TodoWrite, WebFetch) adds approximately 16,600 tokens to every session's system prompt. This is included in all context size figures throughout.
Section 03

Development Patterns & Session Definitions

Why DevSecOps Is a Brownfield Pattern

Brownfield development means working on an existing production system: live users, an established architecture, years of accumulated code, technical debt, and actively enforced security constraints. Unlike greenfield work, one cannot start fresh, cannot discard wrong implementations without consequence, and cannot grant broad autonomous permissions without risk. Every change to a brownfield system affects real users and real infrastructure. In government, defence, and high-compliance contexts, this describes the overwhelming majority of day-to-day development work.

DevSecOps is not merely a methodology that happens to be applied to brownfield systems — it is the direct, necessary response to the constraints imposed by brownfield development. When a codebase has an existing security posture that cannot be accidentally degraded, existing patterns that must be understood before being changed, and a production environment where mistakes have immediate consequences, the human-gated, incremental, review-at-every-step workflow of DevSecOps becomes mandatory rather than optional discipline.

This shapes how Claude Code is used. The developer cannot say "explore this codebase and make the changes you think are needed" — they must say "look at auth.py lines 42–89 and identify any SQL injection risk." They cannot grant Claude Code broad file-write permissions — they must approve each individual change. Sessions are short, targeted, and bounded because the work itself demands precision over autonomy. The brownfield constraint and the DevSecOps pattern are two sides of the same coin.

Why Agentic Is Primarily the Greenfield Pattern

Greenfield development means building something new: a new service, module, or application where there are no live users to disrupt, no established architecture to accidentally break, and no accumulated constraints that an AI might misunderstand. A wrong-direction implementation can simply be discarded; the cost of an incorrect autonomous attempt is low.

This is where the Agentic development pattern is most appropriate. Anthropic describes Claude Code as an environment where "Claude explores, plans, and implements" — it reads many files, builds its own context map of the codebase, and works autonomously over long sessions. This works well precisely because Greenfield work has a low-consequence failure mode.

Agentic development can also be used for specific Brownfield scenarios — large-scale migration sprints, comprehensive test generation, or automated documentation across an existing codebase — provided the developer uses Plan Mode before any execution, works in a Git worktree for isolation, and treats every checkpoint as a safety gate. In these cases, the automation is targeted and reversible. Agentic development is not appropriate for routine brownfield maintenance, security operations, or any work where an incorrect autonomous change has immediate production consequences. The session risk profile, not the label alone, determines whether agentic development is appropriate.

Spec-Driven Development (SDD) — a Structured Variant of the Agentic Pattern

Spec-driven development (SDD) is a workflow overlay that modifies how Agentic sessions run. The developer authors a structured specification (interface contracts, data shapes, acceptance criteria, file layout, security constraints, and test coverage requirements) and then executes against the spec rather than exploring freely.

The cost effect is front-loading: reasoning cost is concentrated at the start of the workflow in a short spec-writing session, and all subsequent execution sessions become cheaper and more predictable. The spec functions as a persistent, cross-session Plan Mode — wrong-direction errors surface at spec-review stage (~500 tokens to correct) rather than after 20 agentic turns (tens of thousands of tokens wasted). The dominant cost driver in standard agentic — regular input from file exploration and history growth — is cut by approximately 48% in spec-driven execution.

SDD is critical for high-compliance, government, and defence sectors because it fundamentally inverts the relationship between intent and implementation, establishing the specification as the authoritative source of truth and treating code as a derivative artefact. With Agentic development, SDD serves as the essential governance bridge that prevents "vibe coding"—the reliance on loose, non-deterministic prompts—by providing autonomous agents like Claude Code with unambiguous, executable contracts that ground their reasoning in architectural and security constraints. This methodology naturally facilitates the rigorous traceability required by the Australian Information Security Manual (ISM), ISO 27001 and other government and defence compliance standards, ensuring that every code change is validated against documented requirements through automated validation gates in CI/CD pipelines. By shifting from reactive verification to proactive governance, SDD ensures that security corrections propagate across future regeneration cycles, thereby mitigating architectural drift and preserving individual human accountability for AI-enabled outcomes in mission-critical environments.

Phase 1 · Spec Writing
Author before executing
Interface contracts, acceptance criteria, file layout, security constraints, test scope. Quality here has compounding leverage — the spec directs all downstream execution.
Opus — justified, all tiers
8–15 turns · 15–30 min
Regular input: ~4,000 tok/turn
Output: 1,500–2,500 token spec
Est. cost: ~$8–15 / spec session
Phase 2 · Spec Execution
Implement against spec
Spec replaces file-discovery turns. Regular input ~48% lower than standard agentic. Do not /clear between Phase 1 and 2 — the spec is the context that must persist.
Sonnet — primary · Haiku — simple impl turns
40–65 turns · up to 4 hrs
Regular input: ~13,000 tok/turn
1-hr TTL mandatory · /compact at 80%
Spec cached at 0.1× cost from turn 2
Phase 3 · Conformance Review
Check output vs spec
Pattern matching against spec criteria — does implementation satisfy contracts, acceptance criteria, and security constraints? Haiku-eligible. Failures return to Phase 2 with specific deviation notes.
Haiku — primary · Sonnet — edge cases
5–10 turns · 15–30 min
Regular input: ~3,000 tok/turn
Est. cost: ~$1.50–3 / review session

Pattern Contrast

DevSecOps (Brownfield)
  • Existing production system — live users, established architecture, active security constraints, technical debt
  • 8–11 sessions/day, 5–45 min each — short, targeted, bounded by task scope
  • Developer provides explicit file refs, specific line ranges — Claude Code doesn't explore freely
  • Human approves every response before Claude Code proceeds — accountability is non-negotiable
  • 5–15 API turns per session · avg 2,500–9,000 tokens regular input/turn
  • Surgical output: patches, analysis, test stubs — 400–1,000 tokens/turn
  • Plan Mode before any multi-file change — mandatory practice
  • Cache writes amortise poorly per session — cross-session sharing is the key optimisation
  • Haiku suitable for ~25–30% of interactions
Agentic (Greenfield) — Standard & Spec-Driven Variant
  • New codebase under construction — no live users, no constraints to violate, wrong implementations are cheap to discard
  • Standard: 1–2 long sessions/day · Spec-driven: Phase 1 (30 min) + Phase 2 (up to 4 hrs) + Phase 3 (30 min)
  • Standard: Claude Code explores freely — 25,000 tok avg regular input/turn
  • Spec-driven: Claude executes against spec — ~13,000 tok avg regular input/turn (−48%)
  • 50–65 API Turns/heavy session · auto-compaction at ~80% context fill
  • Spec-driven: single Opus spec session amortised across 40–65 Sonnet execution turns
  • Cache writes amortise very well — spec-driven improves cache hit rate further (spec = stable fixed context)
  • Standard: Haiku ~10–15% · Spec-driven: Haiku rises to ~20% (Phase 3 conformance eligible)

Session Type Definitions — DevSecOps

Micro
5 turns · 5–10 min
Doc update, syntax check, single-function review, pipeline triage, quick explanation
API turns: 5
Avg regular input/turn: 2,500 tok
Avg output/turn: 400 tok
Expected 5-min re-writes: 0.5
Standard
9 turns · 15–25 min
Code review, targeted bug fix, test generation, single-file security check, MR feedback
API turns: 9
Avg regular input/turn: 5,000 tok
Avg output/turn: 700 tok
Expected 5-min re-writes: 1.2
Extended
15 turns · 30–45 min
Multi-file security audit, SAST triage, compliance check, refactoring plan + execute
API turns: 15
Avg regular input/turn: 9,000 tok
Avg output/turn: 1,000 tok
Expected 5-min re-writes: 2.8

Session Type Definitions — Agentic

Light
5 × 10-turn sessions · 10–20 min each
Small feature additions, single-module builds, code explanations, focused bug fixes
API turns/day: 50 (5 sessions)
Avg regular input/turn: 4,000 tok
Avg output/turn: 600 tok
Expected 5-min re-writes: 1.0 / session
Medium
2 × 25-turn sessions · 30–60 min each
Feature implementation, module construction, multi-file build, MR creation
API turns/day: 50 (2 sessions)
Avg regular input/turn: 13,000 tok
Avg output/turn: 800 tok
Expected 5-min re-writes: 6.0 / session
Heavy
1 × 65-turn + 1 × 25-turn session
New service construction, greenfield architecture, large autonomous implementation from spec
API turns/day: 90 (2 sessions)
Avg regular input/turn: 25,000 tok
Avg output/turn: 1,100 tok
Expected 5-min re-writes: 22 (65-turn) / 6 (25-turn)

Session Type Definitions — Agentic (Spec-Driven Variant)

Spec-driven sessions replace the open-ended exploration of standard agentic with a three-phase lifecycle. The session cards below represent Phase 1, 2, and 3 respectively — each with a distinct turn count, token profile, and primary model.

Phase 1 · Spec Write
8–15 turns · 15–30 min · Opus primary
Author spec: interface contracts, data shapes, acceptance criteria, file layout, security constraints, test coverage scope
API turns: 8–15
Avg regular input/turn: 4,000 tok
Avg output/turn: 800 tok (spec body)
Spec output size: 1,500–2,500 tok
Est. session cost: ~$8–15 (Opus)
Phase 2 · Spec Execution
40–65 turns · up to 4 hrs · Sonnet primary
Implement against spec. Do not /clear between Phase 1 and 2. /compact at 80% context fill.
API turns: 40–65
Avg regular input/turn: ~13,000 tok (−48%)
Avg output/turn: 1,000 tok
Spec cached at: 0.1× from turn 2
TTL requirement: 1-hr mandatory
Phase 3 · Conformance Review
5–10 turns · 15–30 min · Haiku primary
Verify output against spec criteria. Pattern matching — not novel reasoning. Failures return to Phase 2 with specific deviation notes.
API turns: 5–10
Avg regular input/turn: ~3,000 tok
Primary model: Haiku (pattern matching)
Est. session cost: ~$1.50–3 (Haiku)

Daily Usage Tier Definitions

Tier · Session Mix / Day · Sessions · Turns · Developer Profile
DevSecOps
Light · 5 micro + 3 standard · 8 · 52 · Part-time AI assistance; quick reviews and consultations
Medium · 3 micro + 4 standard + 1 extended · 8 · 66 · Active developer; daily code review, security checks, bug fixes
Heavy · 5 micro + 4 standard + 2 extended · 11 · 91 · Lead developer / security engineer; audits, MR reviews, compliance
Agentic — Standard
Light · 5 × light sessions · 5 · 50 · Light autonomous tasks; feature additions, focused builds
Medium · 2 × medium sessions · 2 · 50 · Active builder; feature implementation, module construction
Heavy · 1 × heavy + 1 × medium session · 2 · 90 · Power developer; new service construction, long autonomous sessions
Agentic — Spec-Driven Variant (per sprint day)
Spec Day · 1 × spec-write + 1 × execution start · 2 · ~25 · Phase 1 + Phase 2 kickoff; spec authored, execution begins
Exec Day · 1–2 × execution sessions · 1–2 · 65–90 · Phase 2 sustained; building against spec, /compact as needed
Review Day · 1 × conformance + 1 × correction · 2 · ~20 · Phase 3 review; deviation notes → Phase 2 correction turn
Section 04

Token Modelling & Fixed Context

Claude Code's context is cumulative — every API call processes the full conversation history to date. The fixed system context (tool definitions, system prompt, project memory) is the prime candidate for prompt caching: written once per session and read cheaply on every subsequent turn.

Fixed Cached Context Per Session

Use Case A — Standalone (no GitLab MCP)
System prompt (agent instructions): 3,900 tok
Built-in tool definitions (Bash, Read, Write, Glob…): 16,600 tok
CLAUDE.md + memory files: 1,500 tok
Total fixed cached context: 22,000 tokens
Use Case B — With GitLab MCP
All standalone context: 22,000 tok
GitLab MCP tool names (deferred by default): ~500 tok
GitLab tool schemas (loaded on demand): ~8,500 tok
Max fixed cached context: 31,000 tokens
With MCP tool search (default): only names load upfront; schemas load on demand. Cost model assumes eager-load scenario for conservative estimates.
MCP Tool Search changes GitLab overhead. Anthropic's Claude Code now defers MCP tool definitions by default — only tool names load at session start (~500 tokens for GitLab MCP). Full schemas (~8,500 tokens) load on demand when Claude needs a specific tool. This means the 9,000-token GitLab MCP overhead is now largely avoided unless you deliberately disable tool search. The cost model uses the full 31,000-token figure as a conservative worst case. In practice, the actual GitLab overhead is closer to 500–3,000 tokens depending on which tools are used.
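The fixed-context arithmetic above reduces to a simple sum. This sketch uses the document's stated token counts, which are estimates rather than measured values:

```python
# Sketch: fixed cached context per session, summing the components above.
# All token counts are the document's estimated figures.
standalone = 3_900 + 16_600 + 1_500        # system prompt + tools + CLAUDE.md

mcp_deferred = standalone + 500            # tool-search default: names only
mcp_eager    = standalone + 500 + 8_500    # worst case: all schemas loaded
```

`standalone` gives the 22,000-token figure used for Use Case A, and `mcp_eager` the conservative 31,000-token figure used for Use Case B.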

Daily Token Volumes

↔ scroll if needed
Pattern · Use Case · Tier · Cache Writes · Cache Reads · Regular Input · Output
DevSecOps / Brownfield — daily token totals
DevSecOps · Standalone · Light · 187,000 · 968,000 · 174,000 · 28,900
DevSecOps · Standalone · Medium · 192,000 · 1,276,000 · 320,000 · 46,200
DevSecOps · Standalone · Heavy · 265,000 · 1,760,000 · 467,500 · 65,200
DevSecOps · + GitLab MCP · Light · 259,000 · 1,364,000 · 226,500 · 33,400
DevSecOps · + GitLab MCP · Medium · 264,000 · 1,798,000 · 393,300 · 53,100
DevSecOps · + GitLab MCP · Heavy · 364,000 · 2,480,000 · 569,600 · 74,700
Agentic / Greenfield — daily token totals
Agentic · Standalone · Light · 120,000 · 990,000 · 185,000 · 30,000
Agentic · Standalone · Medium · 64,000 · 1,056,000 · 626,000 · 40,000
Agentic · Standalone · Heavy · 79,000 · 1,936,000 · 1,914,000 · 91,500
Agentic · + GitLab MCP · Light · 165,000 · 1,395,000 · 207,500 · 30,000
Agentic · + GitLab MCP · Medium · 82,000 · 1,488,000 · 661,000 · 43,000
Agentic · + GitLab MCP · Heavy · 97,000 · 2,728,000 · 1,971,500 · 96,000
All figures use 1-hr TTL caching (recommended default). Cache writes include fixed system context plus incremental session context. Cache reads are fixed context re-reads on turns 2–N. Regular input is non-cached conversation context (messages, file contents, tool outputs). Agentic heavy regular input is high because growing conversation history in long autonomous sessions is not fully cached — context window management is the primary cost driver.
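Pricing a row of the table above is a matter of multiplying each token category by its rate. This sketch prices the Agentic standalone heavy day at pure Sonnet 4.6 regional rates with 1-hr TTL caching:

```python
# Sketch: pricing one day of token volumes at pure Sonnet 4.6 regional
# rates with 1-hr TTL caching (all rates in USD per MTok).
RATES = {"cache_write_1hr": 6.60, "cache_read": 0.33,
         "regular_input": 3.30, "output": 16.50}

def daily_cost_usd(writes, reads, regular, output):
    return (writes * RATES["cache_write_1hr"] + reads * RATES["cache_read"]
            + regular * RATES["regular_input"] + output * RATES["output"]) / 1e6

# Agentic standalone heavy row: 79,000 / 1,936,000 / 1,914,000 / 91,500
day = daily_cost_usd(79_000, 1_936_000, 1_914_000, 91_500)   # ~$8.99/day
month = day * 22                                             # ~$197.70/mo
```

The monthly result matches the pure-Sonnet heavy standalone figure ($197.70) in Section 06's per-model reference table.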
Section 05

Model Access Policy & Recommended Splits

It is recommended that organisations restrict Opus 4.6 access due to cost (Opus input and output are each 1.67× the corresponding Sonnet prices). This section defines two developer tiers and the recommended model split for each. Under standard agentic, the split is fixed by role. Under spec-driven agentic, the split becomes phase-aware — the optimal model depends on which phase the developer is in, not just their tier.

Standard Developer — No Opus
DevSecOps: 30% Haiku / 70% Sonnet / 0% Opus
Agentic (standard): 15% Haiku / 85% Sonnet / 0% Opus
Agentic (spec-driven): 20% Haiku / 72% Sonnet / 8% Opus
  • Applies to the majority of an engineering organisation
  • DevSecOps Haiku share unchanged — doc updates, pipeline triage, syntax checks are genuinely Haiku-suitable
  • Spec-driven opens Opus access for standard developers at Phase 1 (spec authoring) — see amortisation note below
  • Spec-driven Haiku share rises to 20% — Phase 3 conformance review is pattern matching, not reasoning
  • Quality impact minimal on SWE-bench standard tasks — Sonnet 79.6% vs Opus 80.8%
Senior / Approved — Limited Opus
DevSecOps: 25% Haiku / 65% Sonnet / 10% Opus
Agentic (standard): 10% Haiku / 75% Sonnet / 15% Opus
Agentic (spec-driven): 18% Haiku / 67% Sonnet / 15% Opus
  • Applies to tech leads, security engineers, principal developers
  • Spec-driven: Opus percentage unchanged — but front-loaded into Phase 1 rather than spread across all turns
  • This is more efficient: Opus reasoning concentrated where it has maximum downstream leverage
  • Haiku rises from 10% to 18% — Phase 3 conformance sessions are Haiku-eligible
  • Service layer enforces model access via role-based routing; Phase 1 spec sessions need explicit Opus permit
Why spec-driven opens Opus access for standard developers: The cost model normally gates Opus by role. Spec authoring creates a justification that bypasses this: a single Opus session (~$8–15) producing a tight 1,500–2,500 token specification amortises across 40–65 Sonnet execution turns. The Opus overhead is recovered within the first execution session through reduced file-exploration turns and eliminated wrong-direction corrections. The service layer should implement a "spec-write" session profile that permits Opus and enforces Sonnet-only for subsequent sessions in the same sprint.
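The amortisation argument above can be put in per-session and per-turn terms. This sketch uses the document's ~$8–15 spec-session estimate and the stated sprint cadence of 1 spec per 5 execution sessions; the 50-turn midpoint is an assumption for illustration:

```python
# Sketch: amortising a Phase 1 Opus spec session across execution work,
# assuming 1 spec per 5 execution sessions and 40-65 turns per session.
spec_cost = (8.0, 15.0)            # document's ~$8-15 estimate (low, high)
sessions_per_spec = 5
turns_per_session = 50             # assumed midpoint of 40-65

per_session = tuple(c / sessions_per_spec for c in spec_cost)   # $1.60-3.00
per_turn = tuple(s / turns_per_session for s in per_session)    # ~$0.03-0.06
```

At roughly $0.03–0.06 of Opus overhead per execution turn, the spec session is a small surcharge on each Sonnet turn rather than a standing Opus entitlement, which is the basis for the "spec-write" session profile recommended above.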
Why the Haiku split differs between patterns: In DevSecOps, a substantial fraction of daily interactions — documentation strings, pipeline status checks, dependency lookups, boilerplate generation — genuinely don't require Sonnet-level intelligence. In standard Agentic sessions, even "simple" tasks involve sustained multi-turn reasoning where Haiku creates quality drag and more wrong-direction attempts. In spec-driven Agentic, Phase 3 conformance review is pattern matching against defined criteria — Haiku handles it at 3× lower cost without quality loss.

Split Bar Comparison — Agentic Standard vs Spec-Driven

Standard developer — Agentic standard: 15% Haiku / 85% Sonnet / 0% Opus
Standard developer — Agentic spec-driven (blended): 20% Haiku / 72% Sonnet / 8% Opus
Senior developer — Agentic standard: 10% Haiku / 75% Sonnet / 15% Opus (distributed)
Senior developer — Agentic spec-driven (blended): 18% Haiku / 67% Sonnet / 15% Opus (front-loaded)
Spec-driven blended splits are weighted averages across all three phases at heavy usage (1 spec session per 5 execution sessions). Phase 2 Haiku share ≈20% for simple implementation turns; Phase 3 is 70% Haiku. Opus is Phase 1 only. Actual splits vary by sprint cadence.
Section 06

Complete Cost Reference

The complete cost reference uses the following constraints: monthly cost per developer (USD), 22 working days per month, 1-hour TTL caching, Bedrock AU regional pricing (+10%). Standard = no-Opus split. Senior = limited-Opus split. Per-model pure costs are shown only for building custom blends.

DevSecOps (Brownfield) Use Case A: Standalone

Policy / Model · Split · Light / mo · Medium / mo · Heavy / mo
Standard developer · 30% / 70% / 0% · $45.84 · $61.72 · $87.09
Senior (limited Opus) · 25% / 65% / 10% · $51.57 · $69.43 · $97.98
Saving: Standard vs Senior · −$5.73 (11%) · −$7.71 (11%) · −$10.89 (11%)
Per-model pure reference
Haiku 4.5 · $19.10 · $25.71 · $36.29
Sonnet 4.6 · $57.30 · $77.14 · $108.86
Opus 4.6 · $95.51 · $128.57 · $181.44

DevSecOps (Brownfield) Use Case B: + GitLab MCP

Policy / Model · Split · Light / mo · Medium / mo · Heavy / mo
Standard developer · 30% / 70% / 0% · $60.86 · $79.37 · $111.46
Senior (limited Opus) · 25% / 65% / 10% · $68.47 · $89.29 · $125.39
Saving: Standard vs Senior · −$7.61 (11%) · −$9.92 (11%) · −$13.93 (11%)
Per-model pure reference
Haiku 4.5 · $25.36 · $33.07 · $46.44
Sonnet 4.6 · $76.08 · $99.22 · $139.33
Opus 4.6 · $126.80 · $165.36 · $232.21

Agentic (Greenfield) Use Case A: Standalone

Policy / Model · Split · Light / mo · Medium / mo · Heavy / mo
Standard developer · 15% / 85% / 0% · $44.04 · $69.23 · $177.93
Standard developer — spec-driven est. · 20% / 72% / 8% · ~$40 · ~$60 · ~$155
Saving: Standard vs Spec-Driven (est.) · ~$4 (9%) · ~$9 (13%) · ~$23 (13%)
Senior (limited Opus) · 10% / 75% / 15% · $50.56 · $79.49 · $204.29
Senior — spec-driven est. · 18% / 67% / 15% · ~$46 · ~$69 · ~$178
Saving: Standard vs Senior (role restriction) · −$6.52 (13%) · −$10.26 (13%) · −$26.36 (13%)
Per-model pure reference
Haiku 4.5 · $16.31 · $25.64 · $65.90
Sonnet 4.6 · $48.93 · $76.93 · $197.70
Opus 4.6 · $81.55 · $128.21 · $329.50

Agentic (Greenfield) Use Case B: + GitLab MCP

Policy / Model · Split · Light / mo · Medium / mo · Heavy / mo
Standard developer · 15% / 85% / 0% · $54.04 · $77.68 · $190.68
Standard developer — spec-driven est. · 20% / 72% / 8% · ~$49 · ~$68 · ~$167
Saving: Standard vs Spec-Driven (est.) · ~$5 (9%) · ~$10 (13%) · ~$24 (13%)
Senior (limited Opus) · 10% / 75% / 15% · $62.04 · $89.18 · $218.93
Saving: Standard vs Senior (role restriction) · −$8.00 (13%) · −$11.50 (13%) · −$28.25 (13%)
Per-model pure reference
Haiku 4.5 · $20.01 · $28.77 · $70.62
Sonnet 4.6 · $60.04 · $86.31 · $211.87
Opus 4.6 · $100.07 · $143.84 · $353.11

Pattern Comparison — Standard Developer Policy

Pattern · Use Case · Light / mo · Medium / mo · Heavy / mo · Key Cost Driver at Heavy
DevSecOps · Standalone · $45.84 · $61.72 · $87.09 · Small regular input (467k tok/day) dominates
Agentic · Standalone · $44.04 · $69.23 · $177.93 · 4.1× more regular input (1,914k tok/day)
Spec-Driven · Standalone (est.) · ~$40 · ~$60 · ~$155 · ~48% lower regular input via spec; Phase 3 on Haiku
DevSecOps · + GitLab MCP · $60.86 · $79.37 · $111.46 · MCP adds +28% for light users, +7% for heavy
Agentic · + GitLab MCP · $54.04 · $77.68 · $190.68 · Heavy agentic 72% more expensive than DevSecOps heavy
Spec-Driven · + GitLab MCP (est.) · ~$49 · ~$68 · ~$167 · Spec-driven narrows gap vs DevSecOps to ~50% at heavy
Blended costs = weighted average of per-model monthly costs using stated split. All computed at token level (cache writes at 2.0× input, reads at 0.1×, regular input at 1.0×, output at 5.0× ratio), summed over 22 working days. Spec-driven estimates include Phase 1 Opus session overhead (1 spec per 5 execution sessions) and Phase 3 Haiku conformance sessions. Spec-driven regular input estimated at 13,000 tok/turn (vs 25,000 standard). Marked est. — actual results depend on spec quality and sprint cadence.
Section 07

Prompt Caching & TTL Analysis

Prompt caching is the single largest cost lever available, more impactful than any other optimisation. It is enabled in secd3v by default; without it, the 22,000-token system context is billed as full-price regular input on every API call. With prompt caching, the fixed context is written once per session and read at 10% of the input price on every subsequent turn. The Claude Code CLI applies prompt caching automatically.

Three Billing Scenarios (Sonnet 4.6 regional, 22k context, per turn)

No caching: 22,000 × $3.30/MTok = $0.073 per turn. At 91 turns/day (DevSecOps heavy), that is ~$6.61/day in context cost alone — before any conversation input or output.
5-min TTL: Write once at 1.25× ($4.125/MTok). Reads at 0.1× ($0.33/MTok). If a developer pauses >5 min between turns, the cache expires and must be re-written at 1.25×. Each re-write costs $0.091 (22k context) or $0.128 (31k GitLab MCP).
1-hr TTL (secd3v default): Write once at 2.0× ($6.60/MTok). Reads at 0.1×. TTL resets on every cache hit — active sessions stay warm automatically. One write covers the full session regardless of developer pauses.
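The per-turn arithmetic behind the three scenarios can be sketched directly from the Sonnet 4.6 regional rates:

```python
# Sketch: per-turn cost of the 22k-token fixed context in each billing
# scenario (Sonnet 4.6 regional rates above, USD/MTok).
CTX = 22_000
INPUT, WRITE_5MIN, WRITE_1HR, READ = 3.30, 4.125, 6.60, 0.33

no_cache  = CTX * INPUT / 1e6       # ~$0.073, paid on every turn
rewrite5  = CTX * WRITE_5MIN / 1e6  # ~$0.091 per 5-min (re-)write
write1hr  = CTX * WRITE_1HR / 1e6   # ~$0.145, once per session
read_turn = CTX * READ / 1e6        # ~$0.007 per cached turn
```

The 1-hr write costs roughly twice a cached turn's worth of no-cache context, which is why it pays for itself within the first few turns of any session.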

Per-Session Cache Costs

Session Type · Turns · No Cache · 5-min TTL · 1-hr TTL · 1-hr saves vs 5-min · Re-writes (5-min)
DevSecOps sessions — 22k context, Sonnet 4.6 regional
Micro · 5 · $0.363 · $0.162 · $0.174 · −$0.013 (5-min wins) · 0.5
Standard · 9 · $0.653 · $0.249 · $0.203 · +$0.046 · 1.2
Extended · 15 · $1.089 · $0.426 · $0.247 · +$0.179 · 2.8
Agentic sessions — 22k context, Sonnet 4.6 regional
Light · 10 · $0.726 · $0.240 · $0.210 · +$0.029 · 1.0
Medium · 25 · $1.815 · $0.766 · $0.319 · +$0.447 · 6.0
Heavy · 65 · $4.719 · $2.392 · $0.610 · +$1.782 · 22.0
Micro sessions slightly favour 5-min TTL. With only 0.5 expected re-writes, the lower write cost (1.25×) outweighs the re-write risk for the shortest DevSecOps sessions — saving $0.013/session. Monthly impact: $0.02 for DevSecOps light users. For all sessions of 9+ turns, 1-hr TTL is cheaper.

Break-Even

Extra cost of 1-hr write vs 5-min write per session:
  22k (Standalone): 22,000 × ($6.60 − $4.125) / MTok = $0.054
  31k (GitLab MCP): 31,000 × ($6.60 − $4.125) / MTok = $0.077
Cost of one unexpected re-write at 5-min TTL:
  22k context: 22,000 × $4.125 / MTok = $0.091 per event
  31k context: 31,000 × $4.125 / MTok = $0.128 per event
Break-even re-writes to avoid per session: $0.054 / $0.091 = 0.60 re-writes — identical ratio for both context sizes.
Conclusion: for any interactive session where a developer is likely to pause >5 min at least once every two sessions, 1-hr TTL is cheaper. This covers every DevSecOps standard/extended session and every Agentic session.
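The break-even arithmetic above can be sketched as a function; note that the context size cancels out, which is why 22k and 31k contexts share the same 0.60 threshold:

```python
# Sketch: break-even re-write count for 1-hr vs 5-min TTL
# (Sonnet 4.6 regional cache write rates, USD/MTok).
W5, W1H = 4.125, 6.60

def break_even_rewrites(ctx_tokens: int) -> float:
    extra_1hr   = ctx_tokens * (W1H - W5) / 1e6   # 1-hr write premium
    one_rewrite = ctx_tokens * W5 / 1e6           # one 5-min re-write
    return extra_1hr / one_rewrite                # ctx_tokens cancels out

# break_even_rewrites(22_000) == break_even_rewrites(31_000) -> 0.6
```

The ratio reduces to (2.0 − 1.25) / 1.25 = 0.6, so the threshold holds for any model and any context size under the same multipliers.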

Monthly Cost by Caching Scenario — All Patterns (Standard Developer Policy with Sonnet 4.6)

Pattern · Use Case · Tier · No Cache / mo · 5-min TTL / mo · 1-hr TTL / mo (recommended) · Saved (no cache → 1-hr) · Saved (5-min → 1-hr)
DevSecOps
DevSecOps · Standalone · Light · $106.16 · $57.31 · $57.29 · $48.87 (46%) · $0.02
DevSecOps · Standalone · Medium · $145.41 · $81.94 · $77.13 · $68.28 (47%) · $4.81 (6%)
DevSecOps · Standalone · Heavy · $202.97 · $116.06 · $108.88 · $94.09 (46%) · $7.18 (6%)
DevSecOps · + GitLab MCP · Light · $145.58 · $76.74 · $76.05 · $69.53 (48%) · $0.69 (1%)
DevSecOps · + GitLab MCP · Medium · $196.35 · $106.92 · $99.20 · $97.15 (49%) · $7.72 (7%)
DevSecOps · + GitLab MCP · Heavy · $273.27 · $150.81 · $139.33 · $133.94 (49%) · $11.48 (8%)
Agentic
Agentic · Standalone · Light · $104.20 · $50.69 · $48.95 · $55.25 (53%) · $1.74 (3%)
Agentic · Standalone · Medium · $139.81 · $93.65 · $76.91 · $62.90 (45%) · $16.74 (18%)
Agentic · Standalone · Heavy · $315.91 · $241.64 · $197.69 · $118.22 (37%) · $43.95 (18%)
Agentic · + GitLab MCP · Light · $138.48 · $63.09 · $60.04 · $78.44 (57%) · $3.05 (5%)
Agentic · + GitLab MCP · Medium · $176.13 · $111.08 · $86.31 · $89.82 (51%) · $24.77 (22%)
Agentic · + GitLab MCP · Heavy · $380.52 · $275.87 · $211.86 · $168.66 (44%) · $64.01 (23%)
No-cache baseline reprices all cached context as regular input at full rate every turn. 5-min TTL models expected re-writes per session type; actual rates vary by developer behaviour. Savings scale with model: Haiku saves 3× less in absolute terms; Opus 1.67× more. All Sonnet 4.6 regional.

Key Insights

46–57% — cost reduction from enabling caching vs no-cache baseline
$44–64 — monthly saving: 1-hr over 5-min TTL (heavy agentic)
$0–11 — monthly saving: 1-hr over 5-min TTL (DevSecOps, any tier)
TTL choice matters far more for Agentic than DevSecOps. For DevSecOps (short sessions), the 5-min vs 1-hr difference is at most $11.48/month. For heavy agentic users, 1-hr TTL saves $43.95–$64.01/month — equivalent to 18–23% of total monthly cost. Heavy agentic sessions accumulate 22 expected re-writes per session (35% risk × 64 inter-turn gaps for test runs, build waits, and review). 5-min TTL nearly quadruples the cache cost on those sessions.

TTL Selection Guide

5-Minute TTL — Use When
  • Fully automated CI/CD pipelines with no human in the loop
  • Sequential scripted invocations (claude -p) with <5 min gaps
  • Pre-commit hook automation running in-process
  • DevSecOps micro sessions only (marginal saving, $0.02/month)
1-Hour TTL — Default for Everything Else
  • All interactive DevSecOps sessions (standard and extended)
  • All agentic sessions regardless of length
  • All spec-driven Phase 2 execution sessions — non-negotiable. The spec must survive as warm cached context across the full execution session without re-write cost
  • Any session involving human review, test execution, or build waits
  • GitLab MCP workflows where CI pipeline waits create natural pauses
  • Cross-session DevSecOps blocks on the same project within an hour
Section 08

Developer Cost Optimisation Factors

After model policy and caching configuration, cost is controlled by developer behaviour. Each factor below is validated against Anthropic's Claude Code best practices documentation. Spec-driven development (SDD) is included as a first-class optimisation factor — it addresses the dominant agentic cost driver (regular input from file exploration and wrong-direction turns) at the workflow level rather than the session level.

🎯
Prompt Specificity & Context Front-Loading
Highest impact · All patterns
"The more precise your instructions, the fewer corrections you'll need. Reference specific files, mention constraints, and point to example patterns." A prompt like "review @auth.py lines 42–89 for SQL injection — here is the schema" costs 3–5× less than "check my auth code for security issues" because Claude doesn't explore files to find context it was never given.

In DevSecOps, specificity keeps sessions within their session-type scope and prevents drift from Micro into Extended territory. In agentic, vague prompts trigger broad file scanning — each file read compounds into every subsequent turn. In spec-driven execution, the spec itself is the specificity mechanism — but the execution prompt must still reference specific spec sections, not leave Claude to interpret the full spec freely.
DevSecOps · Agentic · 3–5× regular input reduction possible
📋
Plan Mode Before Execution
Highest impact · Agentic
"Claude reads files and answers questions without making changes." Enter Plan Mode by prefixing your prompt with /plan or pressing Shift+Tab. Review the plan file — Claude writes it to your project. Switch back to Normal Mode to execute.

Plan Mode is most useful when you're uncertain about approach, when the change touches multiple files, or when you're unfamiliar with the code. Skip Plan Mode for small, clearly scoped tasks — "if you could describe the diff in one sentence, skip the plan." Wrong-direction correction at the plan stage costs ~500 tokens; correction after 20 turns of wrong implementation costs tens of thousands.

For spec-driven development, the spec is inter-session Plan Mode — it surfaces wrong directions before execution begins. /plan is still valuable within execution sessions for multi-file changes inside the spec scope.
Agentic · prevents expensive wrong-direction sessions · DevSecOps · mandatory for multi-file brownfield changes
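The cost asymmetry between the two correction points can be sketched as follows. All token counts are hypothetical averages consistent with the figures above; rates assume Sonnet 4.6 base pricing plus the 10% regional premium:

```python
SONNET_IN = 3.00 * 1.10 / 1_000_000    # assumed regional input rate, USD/token
SONNET_OUT = 15.00 * 1.10 / 1_000_000  # assumed regional output rate, USD/token

# Correcting direction at the plan stage: re-reading a ~500-token plan plus
# a short corrected plan emission (hypothetical counts).
plan_correction = 500 * SONNET_IN + 200 * SONNET_OUT

# Correcting after 20 wrong-direction turns: each turn re-reads growing
# history (~25k regular input) and emits ~1.5k output (hypothetical averages).
wrong_direction = 20 * (25_000 * SONNET_IN + 1_500 * SONNET_OUT)

print(f"plan-stage: ${plan_correction:.4f}  after 20 turns: ${wrong_direction:.2f}")
```

Under these assumptions the late correction costs several hundred times the plan-stage one, which is the economic case for Plan Mode on anything multi-file.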
📐
Spec-Driven Development — Phase 1 Spec Authoring
Highest impact · Agentic only · ~13% monthly saving at heavy usage
Author a structured specification before any agentic execution begins. The spec defines interface contracts, data shapes, acceptance criteria, file layout, security constraints, and test coverage requirements. Claude Code then executes against the spec rather than discovering scope through open-ended file exploration — the dominant cost driver in standard agentic sessions.

The cost effect: standard agentic heavy sessions average 25,000 tokens regular input per turn from file exploration and history growth. Spec-driven execution sessions average ~13,000 tokens per turn — a 48% reduction — because the spec replaces the file-discovery phase. Wrong-direction turns drop from 4–8 per session to 0–1, eliminating the most expensive tail-cost scenario. Phase 3 conformance review sessions are Haiku-eligible (pattern matching, not reasoning), further reducing the blended model cost.

CLAUDE.md rule for spec-driven projects: the spec belongs in a separate file referenced at session start — not embedded in CLAUDE.md. A 600-line spec in CLAUDE.md adds ~9,000 tokens of cache-read cost per call with no benefit over a tight 150-line CLAUDE.md pointing to the spec file. Target CLAUDE.md under 200 lines regardless of spec-driven status.

Session hygiene rule: do not /clear between Phase 1 (spec write) and Phase 2 (execution) — the spec is the context that must persist. /clear only between unrelated task types. /compact at 80% context fill during Phase 2 execution. Phase 3 conformance sessions can be started fresh; they reference the spec directly.
Agentic · Phase 1 Opus justified all tiers · Phase 2 Sonnet/Haiku · Phase 3 Haiku · ~48% regular input reduction · 1-hr TTL non-negotiable
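The per-turn arithmetic above can be expressed directly. The 25,000 and 13,000 figures come from this analysis; the 50-turn session length and Sonnet rate are illustrative assumptions:

```python
SONNET_IN = 3.00 * 1.10 / 1_000_000  # assumed regional input rate, USD/token

STANDARD_PER_TURN = 25_000  # avg regular input/turn, standard agentic heavy session
SPEC_PER_TURN = 13_000      # avg regular input/turn, spec-driven Phase 2 execution

reduction = 1 - SPEC_PER_TURN / STANDARD_PER_TURN
per_turn_saving = (STANDARD_PER_TURN - SPEC_PER_TURN) * SONNET_IN

# Over a hypothetical 50-turn execution session:
session_saving = 50 * per_turn_saving
print(f"reduction: {reduction:.0%}  saving/turn: ${per_turn_saving:.4f}  "
      f"per 50-turn session: ${session_saving:.2f}")
```

This counts regular input only; the elimination of 4–8 wrong-direction turns per session adds further savings on top.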
🧹
Session Hygiene — /clear and /compact
High impact · DevSecOps especially
"Run /clear between unrelated tasks to reset context. If you've corrected Claude more than twice on the same issue in one session, the context is cluttered with failed approaches. Run /clear and start fresh." A clean session with a better prompt almost always outperforms a long session with accumulated corrections. Use /rename before clearing to preserve session identity for later.

For long agentic sessions, /compact summarises conversation history rather than clearing it. You can focus compaction: /compact focus on the API changes. Claude Code also auto-compacts when approaching context limits. Put persistent rules in CLAUDE.md rather than relying on conversation history.

Spec-driven rule: do not /clear between Phase 1 (spec write) and Phase 2 (execution). The spec must remain in context. /compact at 80% during Phase 2. Phase 3 conformance sessions can start fresh — they reference the spec file directly.
DevSecOps · /clear between every task · Agentic · /compact at ~80% · Spec-Driven · no /clear between Ph1→Ph2
🔑
Model Selection — Haiku First, Effort Levels
High impact · DevSecOps especially
Defaulting to Sonnet for everything is the most common unnecessary cost. In DevSecOps, documentation strings, pipeline triage, dependency lookups, simple formatting, and boilerplate scaffolding are genuinely Haiku-suitable at 3× lower cost.

Anthropic also introduces effort levels (/effort): low, medium, high, and max (Opus only). Medium is recommended for most coding tasks. For simple tasks, "you can reduce costs by lowering the effort level" — it controls adaptive reasoning depth, with lower effort being faster and cheaper. High effort and max provide deeper reasoning for complex problems but consume significantly more output tokens. Set effort per-task, not as a session default.
DevSecOps · 25–30% Haiku achievable · Agentic · 10–15% Haiku realistic · /effort for complex tasks · Spec-Driven · Phase 3 conformance is Haiku-primary (rises to ~20% blended)
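A blended-rate sketch shows what the 25–30% Haiku split is worth. It relies only on the relative pricing stated above (Haiku at roughly one-third of Sonnet's rate); the absolute figure is the assumed regional Sonnet input rate used throughout this analysis:

```python
SONNET_IN = 3.00 * 1.10   # assumed regional input rate, USD/MTok
HAIKU_IN = SONNET_IN / 3  # "3x lower cost" per this analysis

def blended_rate(haiku_share: float) -> float:
    """Blended input rate (USD/MTok) for a given Haiku share of token volume."""
    return haiku_share * HAIKU_IN + (1 - haiku_share) * SONNET_IN

all_sonnet = blended_rate(0.0)
disciplined = blended_rate(0.30)  # 25-30% Haiku, achievable in DevSecOps

print(f"all-Sonnet: ${all_sonnet:.2f}/MTok  30% Haiku: ${disciplined:.2f}/MTok  "
      f"saving: {1 - disciplined / all_sonnet:.0%}")
```

A 30% Haiku share cuts the blended input rate by 20% under these assumptions, with no caching or workflow changes required.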
🔍
Subagents for Research & Cost Routing
High impact · Agentic
"Delegate research with 'use subagents to investigate X'. They explore in a separate context, keeping your main conversation clean for implementation." When Claude researches a codebase it reads many files — all consuming your context. Subagents run in separate context windows and report back summaries, keeping the main session lean.

Subagents also serve as cost routers: Anthropic's subagent documentation explicitly notes you can "control costs by routing tasks to faster, cheaper models like Haiku." Configure Haiku subagents for file scanning, documentation lookup, and log analysis — they return summaries to a Sonnet main session without the main session paying full Sonnet input prices for verbose exploration results.
Agentic · context preservation + Haiku routing · DevSecOps · useful for large SAST result triage
🗂️
CLAUDE.md & .claudeignore Discipline
Medium impact · Both patterns
Anthropic recommends keeping CLAUDE.md specific and concise — "specific, concise, well-structured instructions work best." A bloated CLAUDE.md (500+ lines) loads into every session, consuming tokens on every turn. Move specialist content to Skills (which load on demand) and keep project-wide rules tight. CLAUDE.md can also include a "Compact Instructions" section to guide what gets preserved during /compact.

The .claudeignore file prevents Claude from accidentally reading node_modules, build artifacts, generated code, and binaries. A single accidental glob-all read of a large brownfield repository can consume 50,000–150,000 tokens in one API call — equivalent to a day's DevSecOps budget. Configure .claudeignore on day one of any brownfield project.

Spec-driven bloat risk — the highest-cost CLAUDE.md failure mode: developers may embed verbose spec prose into CLAUDE.md. A 600-line spec adds ~9,000 tokens of cache-read cost per call with no output benefit over a tight 150-line CLAUDE.md that references the spec as a separate file. The spec belongs as a project file loaded at Phase 2 session start — not in CLAUDE.md. Telemetry signal: CLAUDE.md above 3,000 tokens on a spec-driven project indicates this failure is active.
DevSecOps · Agentic · one bad file read can cost a day's budget
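A minimal .claudeignore starting point for a brownfield repository might look like the following. The entries are illustrative; tune them to the project's actual dependency and build-output directories:

```
# Dependency trees and package caches
node_modules/
vendor/
.venv/

# Build artifacts and generated code
dist/
build/
target/
*.min.js

# Binaries and large assets
*.so
*.zip
*.pdf
```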
Cross-Session Cache Grouping
High structural impact · DevSecOps
With 1-hr TTL, the service layer can route same-developer, same-project sessions to share a Bedrock cache state within the hour. Sessions 2 and 3 on the same project read the cache (at 0.1× input) instead of re-writing it (at 2.0× input) — saving $0.054–$0.077 per session.

Developer practice: Group related micro and standard sessions into focused 30–60 minute blocks rather than scattering them. Keep CLAUDE.md stable during the block — changes invalidate the cache. The TTL resets on every cache hit, so continuous active sessions stay warm automatically. This cross-session sharing is the most impactful DevSecOps-specific optimisation and has no equivalent in agentic patterns.
DevSecOps · cross-session sharing saves $6–14/dev/month (Sonnet) · Agentic · within-session amortisation
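The quoted $0.054–$0.077 per-session range is reproducible from the cache multipliers above, assuming a shared cached prefix (CLAUDE.md plus stable project context) of roughly 8,600–12,300 tokens; the prefix size is an inferred assumption, not a measured value:

```python
SONNET_IN = 3.00 * 1.10 / 1_000_000  # assumed regional input rate, USD/token
WRITE_MULT, READ_MULT = 2.0, 0.1     # 1-hour cache write vs cache read multipliers

def followup_session_saving(cached_prefix_tokens: int) -> float:
    """USD saved when a follow-up session reads the warm cache (0.1x)
    instead of re-writing it (2.0x)."""
    return cached_prefix_tokens * SONNET_IN * (WRITE_MULT - READ_MULT)

# A ~8.6k-12.3k token shared prefix reproduces the quoted per-session range.
print(f"${followup_session_saving(8_600):.3f} - ${followup_session_saving(12_300):.3f}")
```

The saving applies to sessions 2 and 3 within the hour, which is why grouping related work into 30–60 minute blocks matters.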
🔌
MCP Tool Search — Default Deferred Loading
Changed behaviour — verify configuration
Anthropic has changed the default MCP behaviour: "Tool search keeps MCP context usage low by deferring tool definitions until Claude needs them. Only tool names load at session start, so adding more MCP servers has minimal impact on your context window." Tool search is enabled by default from recent Claude Code versions.

This significantly reduces the GitLab MCP overhead previously modelled as 9,000 tokens — with tool search active, only tool names (~500 tokens) load at session start; full schemas load on demand. If your deployment disables tool search (ENABLE_TOOL_SEARCH=0), you revert to eager loading of all tool definitions. Verify your Claude Code version and configuration — recent versions (v2.1.x+) default to deferred loading.
DevSecOps · Agentic · deferred by default on v2.1+ · verify with /mcp in session
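The first-order saving from deferred loading can be estimated as follows. The 9,000 and 500 token figures come from this analysis; valuing them at the plain input rate is a simplification, since cache-write and cache-read multipliers would scale the result:

```python
SONNET_IN = 3.00 * 1.10 / 1_000_000  # assumed regional input rate, USD/token

EAGER_TOKENS = 9_000   # full GitLab MCP tool schemas loaded at session start
DEFERRED_TOKENS = 500  # tool names only, with tool search active

# First-call saving at the base input rate; because this context would
# otherwise sit in every turn's prefix, the real saving compounds per session.
saving = (EAGER_TOKENS - DEFERRED_TOKENS) * SONNET_IN
print(f"per-session-start saving: ${saving:.4f}")
```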
👥
Agent Teams — Explicit Cost Governance
Critical for agentic power users
"Agent teams use approximately 7× more tokens than standard sessions when teammates run in Plan Mode, because each teammate maintains its own context window and runs as a separate Claude instance." A heavy agentic developer using Agent Teams moves from the $178/month model to potential $1,200+/month exposure.

Agent teams are disabled by default (CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 to enable). They are appropriate for genuinely parallelisable work — independent feature branches, parallel test generation, concurrent module construction — not for tasks that can be done serially. Keep teams small; teammates consume tokens for as long as the team runs, even when individually idle. Your service-layer rate limits and budget caps are the primary governance mechanism.
Agentic · 7× cost multiplier · disabled by default · service-layer governance essential
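The exposure figure follows directly from applying Anthropic's 7× multiplier to the heavy-agentic baseline used throughout this analysis:

```python
HEAVY_AGENTIC_MONTHLY = 178.0  # USD/month, heavy agentic developer (std policy)
TEAM_MULTIPLIER = 7            # Anthropic's figure for teammates in Plan Mode

exposure = HEAVY_AGENTIC_MONTHLY * TEAM_MULTIPLIER
print(f"potential exposure: ${exposure:.0f}/month")
```

That is the "$1,200+/month" figure quoted above, and why service-layer budget caps must be in place before enabling the feature.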
🔒
Extended Thinking — Selective Use Only
High cost when unmanaged
Extended thinking tokens are billed as output tokens at standard output rates — the most expensive token category. Anthropic confirms it is enabled by default in Claude Code: "Extended thinking is enabled by default because it significantly improves performance on complex planning and reasoning tasks." Thinking tokens use the default budget unless overridden.

For simpler tasks, disable or reduce: /effort low reduces thinking depth; MAX_THINKING_TOKENS=8000 caps the budget; setting to 0 disables thinking entirely. At Sonnet regional rates, a 4,000-token thinking block adds $0.066/call. In a 25-turn medium agentic session this can add $1.65 to session cost at default settings — worth managing explicitly on routine work.
Opus / seniors · use /effort for high-complexity tasks · Sonnet · /effort low for routine DevSecOps tasks
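The thinking-token figures quoted above reduce to a two-line calculation, assuming Sonnet output at the base rate plus the 10% regional premium:

```python
SONNET_OUT = 15.00 * 1.10 / 1_000_000  # assumed regional output rate, USD/token

THINKING_TOKENS_PER_CALL = 4_000  # representative default-budget thinking block
per_call = THINKING_TOKENS_PER_CALL * SONNET_OUT
per_session = 25 * per_call       # 25-turn medium agentic session

print(f"per call: ${per_call:.3f}  per 25-turn session: ${per_session:.2f}")
```

Capping with MAX_THINKING_TOKENS=8000 bounds the worst case; /effort low shrinks the typical case on routine work.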
📊
Token Telemetry — Measure First
Foundation for all other optimisation
The secd3v Claude Code service includes token telemetry to support cost optimisation. Metrics are aggregated per user, per team, per model, and organisation-wide.

Key telemetry signals — all patterns:
  • Cache hit rate under 70% → session hygiene is poor
  • Regular input per turn over 15,000 on DevSecOps → broad prompting or missing /clear
  • Haiku actual split below 15% on DevSecOps → model discipline not applied
The X-Claude-Code-Session-Id header added in v2.1.86+ lets proxies aggregate requests by session without parsing the body — enabling accurate per-session cost attribution.

Additional signals for spec-driven agentic:
  • Regular input per turn above 18,000 during Phase 2 → spec not being referenced; Claude is still file-exploring
  • CLAUDE.md above 3,000 tokens on a spec-driven project → spec prose has been embedded in CLAUDE.md
  • Opus usage outside Phase 1 sessions → policy drift; the service layer should enforce Sonnet-only post-spec
  • Phase 3 sessions using Sonnet for all turns → Haiku is sufficient for conformance checking
The 30-day telemetry review is the single most operationally valuable practice.
DevSecOps · Agentic · service-layer responsibility · 30-day review cadence
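As a sketch of how the thresholds above could be automated at the service layer: the record shape and field names below are assumptions for illustration, not the telemetry service's actual schema:

```python
def flag_signals(session: dict) -> list[str]:
    """Return human-readable flags for a per-session telemetry record
    (hypothetical field names), applying the thresholds listed above."""
    flags = []
    if session["cache_hit_rate"] < 0.70:
        flags.append("poor session hygiene (cache hit rate < 70%)")
    if session["pattern"] == "devsecops" and session["regular_input_per_turn"] > 15_000:
        flags.append("broad prompting or missing /clear")
    if session["pattern"] == "devsecops" and session["haiku_split"] < 0.15:
        flags.append("model discipline not applied")
    if (session.get("spec_driven") and session.get("phase") == 2
            and session["regular_input_per_turn"] > 18_000):
        flags.append("spec not referenced - still file-exploring")
    if session.get("spec_driven") and session.get("claude_md_tokens", 0) > 3_000:
        flags.append("spec prose embedded in CLAUDE.md")
    return flags

example = {"pattern": "devsecops", "cache_hit_rate": 0.62,
           "regular_input_per_turn": 19_500, "haiku_split": 0.22}
print(flag_signals(example))
```

The example session trips two flags (low cache hit rate and over-broad input), which is the typical signature of missing /clear discipline.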
Section 09

Opus Approval Guide: Senior Access

Opus 4.6 scores 80.8% on SWE-bench vs Sonnet 4.6 at 79.6% — a 1.2pp gap on standard coding tasks. The meaningful Opus advantage appears on ARC-AGI-2 novel reasoning (68.8% vs Sonnet 58.3%) and Terminal-Bench autonomous operation (65.4% vs 59.1%). The approval gate should reflect this: Opus is justified when the task requires novel reasoning under genuine ambiguity, sustained autonomous operation over many turns, or where downstream error costs are high and the 10-point reasoning gap materially changes outcomes.

Spec-driven development introduces a new Opus justification that applies to all developer tiers: spec authoring. A single Opus spec session (~$8–15) producing a tight 1,500–2,500 token specification amortises across 40–65 Sonnet execution turns. The Opus cost is recouped within the first execution session through reduced file-exploration turns and eliminated wrong-direction corrections. The service layer should implement a "spec-write" session profile that permits Opus and enforces Sonnet-only for subsequent sessions in the same sprint.
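The amortisation claim is consistent with the headline numbers in the executive summary, assuming the ~$155 spec-driven estimate already includes spec-authoring sessions:

```python
HEAVY_STANDARD = 178.0     # USD/month, standard agentic heavy developer
HEAVY_SPEC_DRIVEN = 155.0  # USD/month, spec-driven heavy estimate
SPEC_SESSION_COST = (8.0, 15.0)  # single Opus spec-authoring session, USD

monthly_saving = HEAVY_STANDARD - HEAVY_SPEC_DRIVEN
print(f"monthly saving: ${monthly_saving:.0f} ({monthly_saving / HEAVY_STANDARD:.0%})")
# A $8-15 Opus spec session sits inside the $23/month headroom even at one
# spec per sprint, matching the ~13% saving quoted for heavy usage.
```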

Task or Scenario
Pattern
Opus Justified?
Rationale
Spec authoring — greenfield service or controlled migration sprint
Agentic
Yes — all tiers
Phase 1 only. Single Opus session amortised across 40–65 Sonnet execution turns. ARC-AGI-2 advantage (68.8% vs 58.3%) most material for architectural scope and interface decisions. Standard developers eligible under spec-write session profile.
Security vulnerability exploit chain assessment — novel threat vectors
DevSecOps
Yes
ARC-AGI-2 gap (68.8% vs 58.3%) is most material for novel reasoning under genuine ambiguity
Compliance gap analysis — assessing system against complex regulatory controls
DevSecOps
Yes
Multi-control reasoning; compliance errors have significant downstream cost
Architectural design for new greenfield service from spec
Agentic
Yes
Single high-value planning session in Plan Mode prevents many expensive wrong-direction turns
Multi-service brownfield refactor with ambiguous legacy coupling
Agentic
Conditional
Use Sonnet first in Plan Mode; escalate to Opus only if plan proposals are inadequate twice
Security threat modelling — genuinely novel threat scenarios
DevSecOps
Conditional
Opus for novel scenarios; Sonnet handles known patterns (OWASP Top 10, CVE triage) well
Spec execution — implementing against an authored spec
Agentic
No
Phase 2. Spec provides the reasoning frame — Sonnet executes against it. Haiku eligible for bounded implementation turns within spec scope.
Standard MR code review (1–5 files)
DevSecOps
No
Sonnet 4.6 at 79.6% SWE-bench is indistinguishable from Opus (80.8%) on routine review
Feature implementation — well-defined greenfield module
Agentic
No
Well-defined implementation is Sonnet territory; Plan Mode compensates for Sonnet's narrower reasoning
Conformance review — verifying output against spec
Agentic
No
Phase 3. Pattern matching against defined spec criteria — Haiku-level task at 3× lower cost than Sonnet, 5× lower than Opus
Documentation, comments, type annotations
DevSecOps
No
Haiku-level task — Opus is 5× input cost for equivalent output quality
Pipeline failure triage and CI configuration
DevSecOps
No
Haiku is appropriate; this is pattern matching, not novel reasoning
Automated CI/CD pipeline tasks (no human in loop)
Agentic
No
No human to verify reasoning; Sonnet provides better cost-reliability tradeoff in automation