The 12 developer behaviours that directly control how much Claude Code costs you, your team, and your organisation. Each factor explains why it matters, shows before-and-after examples from real DevSecOps, Agentic, and Spec-Driven workflows, and tells you exactly what to do.
REGION=AU · COMPLIANCE=IRAP assessed to PROTECTED · v2.0.0
How to use this guide. Start with Factor 01 — prompt specificity is the highest-impact change any developer can make and takes less than a day to internalise. Factor 03 covers spec-driven development — the highest-impact structural change for agentic developers, changing how you think about entire sprints. Work through the others in order; each builds on the previous. The Reference section is a quick lookup for model selection decisions.
The 12 Optimisation Factors
Factor 01
Prompt Specificity
Highest impact · All patterns
Factor 02
Plan Mode
Highest impact · Agentic
Factor 03
Spec-Driven Development
Highest impact · Agentic only · ~13% monthly saving
Factor 04
Session Hygiene
High impact · DevSecOps
Factor 05
Model Selection & Effort
High impact · DevSecOps
Factor 06
Subagents
High impact · Agentic
Factor 07
CLAUDE.md & .claudeignore
Medium impact · All patterns
Factor 08
Cross-Session Cache Grouping
High impact · structural · DevSecOps
Factor 09
MCP Tool Search
Medium · GitLab MCP users
Factor 10
Agent Teams
Critical for agentic power users
Factor 11
Extended Thinking
High cost when unmanaged
Factor 12
Token Telemetry
Foundation for optimisation
The Three Work Patterns
This guide covers three Claude Code usage patterns. Spec-driven is a structured variant of Agentic — not a separate pattern — but it changes the session shape, model allocation, and cost profile enough to warrant its own treatment.
Pattern
When it applies
Session shape
Primary cost driver
DevSecOps
Working on an existing production system — code review, bug fixes, security checks, MR review on a live codebase with real users
8–11 short sessions/day, 5–45 min each, human approves every step
Cache write amortisation and regular input from broad prompting
Agentic
Building something new — new service, new module, greenfield implementation where wrong output can be discarded without consequence
1–2 long sessions/day, up to 4+ hours, Claude works autonomously
Growing conversation history as regular input — 25,000 tok/turn at heavy usage
Spec-Driven
Agentic variant: author a spec before executing. Interface contracts, acceptance criteria, file layout, constraints. Claude executes against the spec rather than discovering scope freely.
Phase 1: 30 min spec-write (Opus) → Phase 2: up to 4 hrs execution (Sonnet) → Phase 3: 30 min conformance review (Haiku)
Reduced to ~13,000 tok/turn in execution (−48%) — spec replaces file-discovery turns
The single largest per-session cost lever you control. How you phrase a request determines how much context Claude must read before it can act — and every token read compounds across the entire session.
TL;DR
Tell Claude exactly what file, what lines, and what you need. A prompt with an explicit file path, line range, and specific concern costs 3–5× less than an equivalent vague prompt — because Claude reads only what you point to, not everything that might be relevant.
💰
Cost Impact: Highest — All Patterns
"The more precise your instructions, the fewer corrections you'll need. Reference specific files, mention constraints, and point to example patterns." Vague prompts trigger broad file scanning. Each file read adds tokens that compound across every subsequent turn.
3–5× regular input reduction possible
Why it matters
Claude Code's context window holds everything — your messages, every file read, every command output. When you write a vague prompt, Claude reads broadly to find relevant files. Those reads stay in context for every subsequent message, inflating input token counts for the rest of the session.
~40,000 tokens from broad file exploration triggered by one vague prompt on a medium module
~8,000 tokens for the same task with an explicit file reference and line range
5× reduction in regular input from specific vs vague prompting on this task type
DevSecOps examples
Security review of an authentication function
✗ Vague
Check my auth code for security issues
Claude scans auth.py, user.py, middleware.py, session.py, config.py, database.py, models.py — all of them, then their imports. Context grows before the real work begins.
✓ Specific
Review @src/auth/login.py lines 42–89 for SQL injection risk. The database layer uses psycopg2 and our connection is in @src/db/connection.py. Flag any raw string interpolation into queries.
Two files. Claude knows exactly what to look for and where. Session starts work immediately.
Fixing a known bug
✗ Vague
The user profile API is returning 500 errors. Fix it.
No file, no error message, no environment. Claude reads the entire module and its dependencies — potentially 10+ files — before forming a hypothesis.
✓ Specific
@src/api/user_profile.py line 134 is throwing:
KeyError: 'preferred_name'
Field added to the User model last sprint but the serializer at line 134 still expects the old schema. Fix the serializer to handle this field being absent, defaulting to None.
Reviewing a merge request
✗ Vague
Review my latest merge request
No MR number, no focus area, no security standards. Claude asks clarifying questions (wasted turns) or speculatively reads the entire diff plus related files.
✓ Specific
Review MR !847 — adds JWT token refresh to session middleware. Focus on:
1. Token expiry edge cases (expired-during-request scenarios)
2. Whether refresh tokens are properly invalidated on logout
3. Any timing windows that could allow token reuse
Our session middleware is at @src/middleware/session.py
Three specific concerns, one target file. Focused analysis costs far less than a general sweep.
Agentic example — building a new API endpoint
✗ Vague
Add an endpoint to the API for managing notifications.
Claude reads the entire API module, models, routing config, and tests to understand patterns — all before writing anything. Those reads stay in context for 60+ subsequent turns.
✓ Specific
Add GET /api/v2/notifications following the pattern in @src/api/v2/messages.py.
- Paginated (use PaginationMixin from @src/api/mixins.py)
- Filter by ?status=read|unread
- Auth via existing @require_auth decorator
- Return: id, title, body, created_at, is_read
Notification model already exists — no new models needed.
Two reference files. Claude implements immediately, saving 10+ exploration turns at the start of what will be a long session.
The 5W1H checklist
Before sending a prompt, answer these six questions. If you can answer them, write the answers into the prompt.
Question
What it covers
Example in prompt
What
Exactly what to change, review, or understand
"review the token validation logic" not "check auth"
Where
Specific file path and line numbers
"@src/auth/tokens.py lines 88–134"
Why
Error message, security concern, test failure
"returning 401 for valid tokens after migration"
Look for
Specific patterns or vulnerability class
"flag non-constant-time string comparisons"
Don't touch
Files or logic to leave unchanged
"do not modify the database schema or migrations"
How
Pattern or library to follow — point to an example
"use the same approach as @src/auth/refresh.py line 44"
One task per prompt. Bundling multiple tasks — "fix the bug, add tests, and update the README" — forces Claude to hold all three dependency trees in context simultaneously. Sending them separately reduces total token cost even though it's more messages.
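The checklist can even be turned into a habit-forming helper. A minimal Python sketch — the `build_prompt` function and its field names are illustrative conventions, not part of Claude Code:

```python
def build_prompt(what, where, why=None, look_for=None, dont_touch=None, how=None):
    """Assemble a 5W1H-style prompt; only 'what' and 'where' are mandatory."""
    parts = [f"{what} in {where}."]
    if why:
        parts.append(f"Context: {why}.")
    if look_for:
        parts.append(f"Specifically look for: {look_for}.")
    if dont_touch:
        parts.append(f"Do not modify: {dont_touch}.")
    if how:
        parts.append(f"Follow the pattern in {how}.")
    return " ".join(parts)

prompt = build_prompt(
    what="Review the token validation logic",
    where="@src/auth/tokens.py lines 88-134",
    why="returning 401 for valid tokens after migration",
    look_for="non-constant-time string comparisons",
    dont_touch="the database schema or migrations",
)
```

Any field you cannot fill in is a signal that the task may not be well enough understood to prompt cheaply yet.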
Quick reference by task type
Task
Instead of…
Say…
Security check
"check for security issues"
"review @api/views.py lines 55–90 for IDOR — ensure user_id is validated against request.user"
Test generation
"write tests for the payment module"
"write pytest unit tests for charge() in @payments/processor.py using fixtures from @tests/conftest.py — cover: success, declined card, network timeout"
Documentation
"document the API"
"add Google-style docstrings to the 4 public methods in @src/api/client.py — type annotations already present, skip them"
Pipeline triage
"CI is failing, fix it"
"job 'test-unit' in pipeline #4821 fails with ImportError: cannot import 'TokenCache' from 'src.cache' — class renamed last commit. Fix imports in test files only"
Refactoring
"refactor the database layer"
"extract retry logic from @db/connection.py lines 120–156 into RetryPolicy class in @db/retry.py — keep existing interface, callers must not change"
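As an illustration of the refactoring row, the extracted `RetryPolicy` the specific prompt asks for might look roughly like this — the class shape, defaults, and the `flaky` caller are hypothetical, not taken from any real codebase:

```python
import time

class RetryPolicy:
    """Sketch of retry logic extracted into its own class (illustrative interface)."""
    def __init__(self, max_attempts=3, base_delay=0.5, backoff=2.0):
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.backoff = backoff

    def run(self, fn, retryable=(ConnectionError, TimeoutError)):
        delay = self.base_delay
        for attempt in range(1, self.max_attempts + 1):
            try:
                return fn()
            except retryable:
                if attempt == self.max_attempts:
                    raise  # exhausted: surface the last error to the caller
                time.sleep(delay)
                delay *= self.backoff  # exponential backoff between attempts

# Demo: a call that fails twice, then succeeds on the third attempt.
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient")
    return "connected"

result = RetryPolicy(base_delay=0.01).run(flaky)
```

Note how the prompt's constraint — "keep existing interface, callers must not change" — is exactly the kind of line that prevents a wrong-direction refactor.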
Knowledge Check — Factor 01
Scenario: You need to review a payment function for timing attack vulnerabilities. You know it's in src/payments/processor.py but not the exact lines. Which prompt is most cost-effective?
Show answer
C is correct. Naming the specific functions and exact vulnerability class limits Claude's scope to one file and one issue type — even without knowing exact line numbers. Options A and D scan entire modules. Option B scans an entire directory. C is precise enough to produce a targeted, affordable session.
Use Plan Mode to separate what Claude should figure out from what it should build. The cost of a wrong plan is ~500 tokens. The cost of wrong-direction implementation discovered at turn 20 is tens of thousands.
Spec-driven development extends Plan Mode to the inter-session level. Plan Mode prevents wrong-direction turns within a single session. A spec prevents wrong-direction sessions from starting at all — by front-loading scope resolution before any execution session begins. See Factor 03 — Spec-Driven Development for how to apply this to entire agentic sprints. Both are essential; they operate at different granularities.
TL;DR
For any task that touches multiple files or where you're uncertain about approach, prefix your prompt with /plan or press Shift+Tab before typing. Review Claude's proposed plan. Only switch back to execute when you're confident in the direction. Skip Plan Mode for small, clearly scoped tasks where you could describe the diff in one sentence.
Anthropic's internal research confirms Claude Code succeeds on its first autonomous attempt approximately one-third of the time on complex tasks. For the other two-thirds, Plan Mode prevents the expensive correction loop. Wrong-direction correction at the plan stage costs ~500 tokens. Correction after 20 turns of wrong implementation costs tens of thousands.
Prevents 1–3 wasted sessions per week for active developers
How Plan Mode works
In Plan Mode, Claude reads files and runs shell commands to explore — but makes no edits to your source code. It produces a plan file you can review (and edit directly) before execution begins.
1
Enter Plan Mode. Type /plan before your prompt, or press Shift+Tab to cycle through permission modes. Claude will confirm it's in Plan Mode.
2
Claude explores. It reads relevant files and asks clarifying questions without touching your code. This phase costs tokens but far fewer than an incorrect implementation.
3
Review and edit the plan. Press Ctrl+G to open the plan file in your editor. Correct the approach, add constraints, or reject it entirely — at near-zero cost.
4
Execute against the plan. Press Shift+Tab again to return to Normal Mode. Claude implements what was agreed, verifying against its own plan.
When to use Plan Mode
✓ Use Plan Mode when…
The task touches multiple files or modules
You're unfamiliar with the area of code being changed
You're uncertain what the right approach is
Any brownfield change to a production system
New feature implementation in an existing service
Refactoring with potential side-effects
Security-sensitive changes
Spec-driven Phase 2: multi-file changes within the spec scope — use /plan to confirm approach before implementing each significant component
○ Skip Plan Mode when…
You could describe the complete diff in one sentence
Fixing a typo, renaming a variable, adding a log line
Adding a docstring to a clearly understood function
Updating a dependency version
A formatting-only change
Any task where the scope is fully clear upfront
Plan Mode adds overhead for trivial tasks. Anthropic's docs: "Plan Mode is useful, but also adds overhead. For tasks where the scope is clear and the fix is small — like fixing a typo, adding a log line, or renaming a variable — ask Claude to do it directly."
DevSecOps example — multi-file bug fix
✗ Without Plan Mode
Fix the password reset flow — users aren't receiving the reset email and the link expires too quickly.
Claude dives straight into implementation. It might fix the email delivery in the wrong layer, change token expiry in a way that conflicts with the session middleware, and miss a third issue in the validation logic. Realising this at turn 15 means rewinding and re-implementing — expensive.
✓ With Plan Mode
/plan Fix the password reset flow — users aren't receiving the reset email and the link expires too quickly.
Claude maps the password reset flow across auth/reset.py, notifications/email.py, and middleware/session.py, identifies all three issues, and proposes a plan. You review: the email fix looks right, but the session expiry approach is wrong for your architecture. You correct it in the plan file before a single line of implementation is written.
Agentic example — new service construction
✗ Without Plan Mode
Build a notification service that sends email, SMS, and push notifications. It should be async and support retry logic.
Claude starts building. At turn 30 you discover it's built the queue on Redis when your infrastructure uses SQS, and the retry logic doesn't match your existing circuit-breaker pattern. Starting over costs the entire 30-turn investment.
✓ With Plan Mode
/plan Build a notification service that sends email, SMS, and push notifications. It should be async and support retry logic.
We use SQS for queuing (@infrastructure/queues.py) and our circuit-breaker pattern is in @src/resilience/circuit_breaker.py
Claude reads your existing infrastructure files and proposes an architecture that fits. You review and catch that it's missed the dead-letter queue requirement — you add it to the plan. Implementation then proceeds correctly from turn 1.
Plan Mode in DevSecOps is a compliance practice, not just a cost practice. On brownfield production systems, executing before planning is how you accidentally modify the wrong component or create a change that passes tests but introduces a subtle security regression.
Knowledge Check — Factor 02
Scenario: You need to add input validation to a single function in @src/api/users.py. The validation logic is straightforward — just checking that the email field matches a regex. Should you use Plan Mode?
Show answer
B is correct. This is exactly the scenario Anthropic describes as "skip the plan — if you could describe the diff in one sentence." The scope is fully clear: one function, one field, one regex check. Plan Mode would add overhead without benefit. Reserve Plan Mode for tasks with genuine scope uncertainty or multi-file impact.
The highest-impact structural change available to agentic developers. Author a complete specification before execution begins — and cut the dominant agentic cost driver by approximately half.
TL;DR
Before starting any agentic execution session, write a specification: interface contracts, data shapes, acceptance criteria, file layout, security constraints, test coverage scope. Claude Code then executes against the spec rather than discovering scope through file exploration. This replaces the most expensive phase of every agentic session — and the spec itself becomes the cached context that makes every subsequent execution turn cheaper.
💰
Cost Impact: Highest — Agentic. ~13% monthly saving at heavy usage.
Standard agentic heavy sessions average 25,000 tokens regular input per turn — driven by file exploration and growing conversation history. Spec-driven execution sessions average ~13,000 tokens per turn: a 48% reduction. Wrong-direction turns drop from 4–8 per session to 0–1. Phase 3 conformance review runs on Haiku. The blended monthly cost at heavy usage drops from ~$178 to ~$155 for a standard developer.
~48% reduction in regular input/turn · Phase 3 Haiku-eligible · Opus justified at Phase 1 (all tiers)
The three-phase lifecycle
Spec-driven development restructures Agentic work into three phases. Each has a distinct session profile, dominant model, and cost character. This is not a third work pattern — it is a structured variant of Agentic that modifies how Agentic sessions run.
Phase 1 · Spec Writing
Author before executing
Write the full specification. Interface contracts, data shapes, acceptance criteria, file layout, security constraints, test coverage scope. Quality here has compounding leverage across all downstream sessions.
8–15 turns · 15–30 min · Opus (all developer tiers)
Avg regular input: ~4,000 tok/turn
Output: 1,500–2,500 token spec file
Est. cost: ~$8–15 per spec session
Phase 2 · Spec Execution
Implement against spec
Execute against the spec. Claude reads the spec and only the files it names. Do not /clear between Phase 1 and Phase 2 — the spec is the context that must persist. /compact at 80% context fill.
40–65 turns · up to 4 hrs · Sonnet primary / Haiku for bounded turns
Avg regular input: ~13,000 tok/turn (vs 25,000 standard)
Spec cached at 0.1× from turn 2 · 1-hr TTL mandatory
Phase 3 · Conformance Review
Check output vs spec
Verify implementation against spec criteria. Pattern matching — does the output satisfy the contracts, acceptance criteria, and security constraints? Not novel reasoning. Failures return to Phase 2 with specific deviation notes.
5–10 turns · 15–30 min · Haiku primary / Sonnet for edge cases
Avg regular input: ~3,000 tok/turn
Est. cost: ~$1.50–3 per review session
Why it works: the front-loading principle
The dominant agentic cost driver is regular input from file exploration and growing conversation history. In a standard agentic session, Claude reads 10–15 files to discover scope before work begins — each read compounds into every subsequent turn. Spec-driven development replaces this discovery phase with a single authored document. Claude reads the spec and the specific files it names. Nothing else.
25,000 avg regular input tokens per turn — standard agentic heavy
~13,000 avg regular input tokens per turn — spec-driven execution
−48% reduction in the dominant cost driver from spec-driven execution
The second saving is wrong-direction turns. A standard agentic session at heavy usage accumulates 4–8 wrong-direction turns per session — turns where Claude builds something that gets discarded because a constraint wasn't understood upfront. Wrong-direction correction after 20 turns costs tens of thousands of tokens. A spec surfaces these errors at Phase 1 review, where correction costs ~500 tokens. The spec is Plan Mode applied to the inter-session lifecycle, not just a single session.
Why Opus is justified at Phase 1 — for all developer tiers
The cost model normally gates Opus by developer tier. Spec authoring creates an amortisation justification that applies regardless of tier.
# Opus amortisation — standard developer, standalone, heavy usage

Phase 1: Spec writing session (Opus regional)
  Input: ~4,000 tok/turn × 12 turns = 48,000 tokens
  Output: ~2,000 token spec (written once, cached in Phase 2)
  Est. cost: ~$12.50 per spec session

Phase 2: Standard agentic execution without spec
  Regular input: 25,000 tok/turn × 65 turns = 1,625,000 tokens
  Wrong-direction corrections: ~6 per session (tens of thousands of tokens)
  Monthly cost (heavy): ~$177.93

Phase 2: Spec-driven execution (Sonnet)
  Regular input: ~13,000 tok/turn × 65 turns = 845,000 tokens
  Wrong-direction corrections: ~0–1 per session
  Monthly cost (heavy): ~$155

Break-even: Opus spec overhead ($12.50) vs execution saving ($23/month)
  At 1 spec per 5 execution sessions → net saving ≈ $20.50/month
  At 1 spec per 2 execution sessions → still net positive ≈ $10.25/month

CONCLUSION: Spec-driven Opus is recouped within the first execution session.
What a good spec contains
A spec is not a project overview or a requirements document. It is a set of actionable constraints Claude can follow and verify against. Every line should be something Claude will do differently because it's there.
○ Not a spec — too vague
Build a notification service. It should support email and SMS and be async. Handle errors properly and write tests.
No interface contracts. No queue technology specified. No error-handling pattern. No test coverage requirements. Claude will make assumptions — some wrong — and the execution session will diverge from intent.
✓ A spec — actionable constraints
Service: NotificationService
Interface: send(notification: Notification) → Result[str, NotificationError]
Queue: SQS via @infrastructure/queues.py — do not use Redis
Retry: circuit-breaker in @src/resilience/circuit_breaker.py — max 3 retries, exponential backoff
Channels: EmailProvider, SMSProvider — both must implement ChannelProtocol
Errors: NotificationError(channel, reason, retryable: bool)
Tests: unit per provider + integration test per channel · coverage ≥ 90%
File layout:
src/notifications/service.py
src/notifications/providers/{email,sms}.py
src/notifications/errors.py
tests/notifications/
Interface contract, queue technology, retry pattern, error type, test requirements, file layout — all explicit. Phase 2 execution proceeds from turn 1 without discovery.
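For concreteness, the contracts that spec names could be stubbed in Python roughly as follows. This is an assumed reading of the spec — the field shapes, the `Notification` dataclass, and the provider stub are illustrative, not generated output:

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Notification:
    # Minimal shape for illustration — a real spec would pin this down
    channel: str
    recipient: str
    body: str

@dataclass
class NotificationError:
    # Error type exactly as the spec names it: channel, reason, retryable
    channel: str
    reason: str
    retryable: bool

class ChannelProtocol(Protocol):
    # Both EmailProvider and SMSProvider must satisfy this structural contract
    def send(self, notification: Notification) -> str: ...

class EmailProvider:
    def send(self, notification: Notification) -> str:
        # Stub: a real provider would call the email gateway here
        return f"email-sent:{notification.recipient}"

err = NotificationError(channel="sms", reason="gateway timeout", retryable=True)
ok = EmailProvider().send(Notification("email", "user@example.com", "hi"))
```

Because the spec fixes these names and signatures, Phase 3 conformance review reduces to checking the implementation against stubs like these — pattern matching, which is why Haiku can run it.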
CLAUDE.md rule for spec-driven projects
The spec belongs in a separate project file — not embedded in CLAUDE.md. CLAUDE.md is loaded on every API call for every session. A 600-line spec embedded in CLAUDE.md adds ~9,000 tokens of cache-read cost per call, including calls that have nothing to do with the spec. A tight CLAUDE.md that references the spec file keeps cost low and the spec accessible only when Phase 2 execution loads it explicitly.
Telemetry signal. If CLAUDE.md exceeds 3,000 tokens on a spec-driven project, the spec has been embedded there. Strip it out, save it as SPEC.md or similar in the project root, and reference it from CLAUDE.md with a single line: Current sprint spec: @SPEC.md. This is the most common spec-driven cost failure mode.
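The 3,000-token threshold is easy to check mechanically. A rough sketch using the common ~4-characters-per-token heuristic — the helper names are illustrative, and a real tokenizer would be more accurate:

```python
from pathlib import Path

def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English text and code."""
    return len(text) // 4

def check_claude_md(path: str, limit: int = 3000) -> str:
    tokens = estimate_tokens(Path(path).read_text(encoding="utf-8"))
    if tokens >= limit:
        return f"{tokens} tokens — spec likely embedded; move it to SPEC.md"
    return f"{tokens} tokens — OK"

# A 600-line spec at ~60 chars/line lands around 9,000 tokens,
# consistent with the per-call overhead figure quoted above.
spec_like = ("x" * 60 + "\n") * 600
print(estimate_tokens(spec_like))  # → 9150
```

Running `check_claude_md("CLAUDE.md")` as a pre-commit or CI step makes this failure mode hard to reintroduce.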
Session hygiene rules specific to spec-driven work
Rule
Why
Do not /clear between Phase 1 and Phase 2
The spec is the context that must persist. /clear destroys it. Start Phase 2 immediately after Phase 1 in the same session, or resume the named session.
/compact at 80% context fill during Phase 2
Long execution sessions hit context limits. /compact preserves spec and architecture decisions, drops verbose tool output. Focus: /compact focus on spec compliance and decisions made so far
1-hour TTL — non-negotiable for Phase 2
The spec must survive as warm cached context across the full execution session without re-write cost. Standard agentic TTL guidance applies; spec-driven removes any flexibility here.
Phase 3 can start fresh
Conformance review sessions reference the spec file directly. /clear before Phase 3 is fine — provide the spec path explicitly at the start of the review session.
/clear between unrelated sprints
When a sprint is complete, /rename and /clear. The next spec-driven sprint starts with a fresh Phase 1 session.
Telemetry signals to watch
✓ Healthy spec-driven indicators
Regular input per turn in Phase 2: <15,000 tokens — spec is being used, exploration eliminated
Cache hit rate: >80% in Phase 2 — spec as stable fixed context is warming the cache
CLAUDE.md token count: <3,000 tokens — spec not embedded in CLAUDE.md
Opus usage concentrated in Phase 1 sessions only
Phase 3 sessions using Haiku for ≥70% of turns
○ Warning signals — take action
Regular input >20,000 tok/turn during Phase 2 → Claude is still file-exploring; spec scope not defined tightly enough
CLAUDE.md >3,000 tokens on spec-driven project → spec embedded; strip and externalise
Opus usage during Phase 2 or 3 → policy drift; Phase 2 should be Sonnet
Phase 3 using Sonnet for all turns → Haiku is sufficient for conformance checking
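These thresholds are straightforward to compute from whatever per-turn telemetry you collect (see Factor 12). A sketch, assuming a JSONL log with an `input_tokens` field per turn — the field name and log shape are hypothetical, so adapt them to your exporter:

```python
import json

# Hypothetical telemetry: one JSON object per API turn during Phase 2.
log_lines = [
    '{"phase": "execution", "input_tokens": 12500}',
    '{"phase": "execution", "input_tokens": 13800}',
    '{"phase": "execution", "input_tokens": 21400}',
]

def avg_regular_input(lines):
    """Average regular-input tokens per turn across the logged session."""
    turns = [json.loads(line)["input_tokens"] for line in lines]
    return sum(turns) / len(turns)

avg = avg_regular_input(log_lines)
# Turns above 20,000 tokens suggest Claude is still file-exploring.
flagged = [json.loads(line) for line in log_lines
           if json.loads(line)["input_tokens"] > 20000]
```

Here the average lands at 15,900 — above the <15,000 healthy band — and one turn trips the 20,000-token warning, which per the table above means the spec scope is not tight enough.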
Knowledge Check — Factor 03
Scenario: You've completed Phase 1 and written a 2,000-token spec for a new authentication service. Phase 2 execution is about to begin. Your teammate suggests embedding the spec into CLAUDE.md so it's always available, and then running /clear to start Phase 2 fresh. Are they right?
Show answer
B is correct. Two mistakes in the suggestion: (1) CLAUDE.md loads on every API call in every session. A 2,000-token spec embedded there adds ~$0.007 of cache-read overhead per call across all sessions including those unrelated to this sprint. Keep CLAUDE.md under 200 lines of actionable instructions; save the spec as SPEC.md and reference it. (2) /clear between Phase 1 and 2 destroys the spec context Phase 2 depends on. Continue Phase 2 directly from Phase 1, or /rename the session and resume it. Phase 2 requires the spec to be live in context — not rebuilt from scratch.
Every API call processes the full conversation history to date. Accumulated irrelevant turns add tokens to every subsequent message. The fix is simple: clear aggressively between tasks.
TL;DR
Use /clear between every unrelated task in DevSecOps work. If Claude has made the same mistake twice on the same issue, clear and start fresh — a clean session with a better prompt outperforms a long session with accumulated corrections. For long agentic sessions, use /compact at around 80% context fill instead of clearing.
💰
Cost Impact: High — DevSecOps. Important — Agentic.
A 20-turn session where turns 15–20 are a different task than turns 1–14 carries 14 turns of irrelevant context as overhead on those last 6 messages. Each of those 6 messages pays full input price for context it will never use. Anthropic: "If you've corrected Claude more than twice on the same issue in one session, the context is cluttered with failed approaches. Run /clear and start fresh with a more specific prompt."
15–25% regular input reduction from consistent session hygiene
The two commands
Command
What it does
When to use it
/clear
Completely resets the context window — all conversation history is gone. A truly fresh start.
Between unrelated tasks in DevSecOps work. After two failed corrections on the same issue. Switching to a different project or codebase.
/rename
Names the current session before clearing, so you can resume it later with /resume.
Always run /rename before /clear if you might want to return to the session.
/compact
Summarises the conversation — keeps important code and decisions, removes verbose history. Preserves continuity.
In long agentic sessions when context approaches 70–80% full, or when you want to continue the same thread but reduce noise.
/compact focus on X
Focuses the summary on what matters, e.g. /compact focus on the API changes made so far.
When you want to continue with a specific subset of context from a long session.
/context
Shows what's currently using context — system prompt, files, conversation, tools. Helps you diagnose bloat.
Before deciding whether to compact or clear. After connecting new MCP servers.
DevSecOps — the session boundary rule
In DevSecOps work, every meaningful task boundary is a /clear boundary. The goal is to keep sessions within their intended type — Micro (5 turns), Standard (9 turns), Extended (15 turns).
✗ No hygiene — "session drift"
Turn 1: Review @auth/login.py for SQL injection
Turn 5: Good. Now check the session middleware too
Turn 9: And while we're at it, what about the API rate limiting?
Turn 13: OK that's interesting — can you also look at the Redis config?
Turn 17: One more thing — the password reset flow has a bug...
By turn 17, the context contains auth code, session middleware, rate limiting analysis, Redis configuration, and password reset logic. Each new message pays input cost for all of it.
✓ With hygiene — one task per session
Session 1: Review @auth/login.py for SQL injection → /rename, /clear
Session 2: Review the session middleware at @src/middleware/session.py → /clear
Session 3: Assess the API rate limiting → /clear
Session 4: Check the Redis config → /clear
Session 5: Fix the password reset bug
Each task starts fresh. Turn 1 of each session has only the 22k system context plus the immediate task. Every session stays micro-sized. Total cost is a fraction of the drifted session.
Agentic — when to compact vs clear
Compact — use when continuity matters
# You're 50 turns into building a notification service.
# Context is 75% full. Claude has built the email and SMS
# providers. You still need push notifications.
/compact focus on the notification service structure built so far and the interfaces of the email and SMS providers
Claude preserves the architecture decisions and interfaces, drops verbose tool output and exploration history. You continue with a clean but informed context for the push notification work.
Clear — use when switching direction
# You've finished the notification service.
# Now you're starting an entirely different task:
# building the user preference settings API.
/rename notification-service-complete
/clear
# Fresh session for the new task
Build a user preference settings API following the pattern in @src/api/v2/messages.py...
Nothing from the notification service is relevant to preferences. Keeping it in context adds cost with zero benefit. Clear completely.
The two-correction rule. Anthropic's guidance: if you've corrected Claude on the same mistake twice in a session, the context is cluttered with failed attempts. /clear and restate the task with the corrections already incorporated into the prompt. A clean session with a better prompt almost always produces a better result at lower cost.
Spec-driven — the phase boundary rule
Spec-driven development introduces a different /clear logic. The standard rule — clear between every unrelated task — does not apply between Phase 1 and Phase 2. The spec written in Phase 1 is the context Phase 2 depends on. Clearing it destroys the primary cost benefit of the entire approach.
✗ Incorrect — /clear between Phase 1 and Phase 2
# Phase 1 complete — spec written
/clear
# Phase 2 begins with an empty context window
The spec is gone from live context. Claude re-reads it from disk on every reference — as regular input, not cached context. The primary cost benefit of spec-driven execution is lost. The spec must be live in the context window from Phase 1 through Phase 2.
✓ Correct — continue directly from Phase 1
# Phase 1 complete — spec is live in context
# Do NOT /clear — continue Phase 2 immediately
Now execute against the spec. Start with
src/auth/service.py — implement the AuthService
interface as defined...
The spec stays warm in cached context from Phase 1. Every Phase 2 turn reads it at 0.1× cost. This is the core caching benefit of spec-driven development — do not break it with a /clear.
Phase
/clear rule
Why
Phase 1 → Phase 2
Do not /clear
Spec must remain live in context. Continue directly or resume named session.
Within Phase 2 (80% context)
/compact — not /clear
Preserve spec and architecture decisions. Focus: /compact focus on spec compliance and decisions made
Phase 2 → Phase 3
/clear is fine
Conformance review starts fresh, references spec file directly by path.
Between unrelated sprints
/rename then /clear
New sprint = new Phase 1 = new spec. Old context has no value.
Persistent instructions go in CLAUDE.md, not conversation
If you find yourself re-stating the same instructions every session — "always use our custom logger, never use print() directly" — these belong in CLAUDE.md, not in your prompt. Instructions in CLAUDE.md survive /clear. Instructions in conversation history do not. See Factor 07 — CLAUDE.md for how to structure persistent project instructions. The spec, however, is not CLAUDE.md material — see Factor 03 for why.
Knowledge Check — Factor 04
Scenario: You're in turn 8 of a DevSecOps session. You asked Claude to fix a bug in the payment validator, and it's made the same mistake twice — using float() for currency amounts when your codebase requires Decimal(). What should you do next?
Show answer
B is correct. Two corrections on the same issue is the trigger to clear. The context is now cluttered with two failed attempts. A fresh session with the constraint stated upfront — "all currency amounts must use Decimal, never float" — will be faster, cheaper, and produce a correct result. Also consider adding this to CLAUDE.md so you never need to state it again.
Defaulting to Sonnet for every task is the most common source of unnecessary cost. Haiku handles a large fraction of DevSecOps and spec-driven conformance work at 3× lower price. Effort levels let you tune reasoning depth without changing model.
TL;DR
Use Haiku for routine tasks and spec-driven Phase 3 conformance review (pattern matching, not reasoning). Use Sonnet for code review, security analysis, bug fixes, and spec-driven Phase 2 execution. Use Opus for spec authoring (Phase 1) — justified for all developer tiers through amortisation — and for novel reasoning tasks with senior approval. Use /effort low to reduce thinking depth on routine Sonnet tasks.
💰
Cost Impact: High — DevSecOps. Moderate — Agentic.
Haiku costs $1.10/MTok input vs Sonnet at $3.30/MTok — a 3× difference. In DevSecOps work where 25–30% of interactions are routine tasks, consistently routing these to Haiku produces meaningful monthly savings. Under spec-driven development, Phase 3 conformance review is pattern matching — Haiku handles it at the same quality as Sonnet at 3× lower cost, raising the blended Haiku share from 15% toward 20%.
Dependency version lookups and compatibility checks
Haiku
Factual retrieval
Boilerplate and scaffold generation
Haiku
Template filling, no design decisions
GitLab issue summarisation
Haiku
Text processing
Spec-driven Phase 3 — conformance review
Haiku
Pattern matching against defined spec criteria — not novel reasoning. Haiku handles it at the same quality at 3× lower cost
Code review and security analysis
Sonnet
Requires understanding of intent and edge cases
Bug fix analysis and implementation
Sonnet
Reasoning about cause, effect, and constraints
Test generation (non-trivial coverage)
Sonnet
Understanding behaviour under failure modes
Multi-file feature implementation
Sonnet
Sustained multi-turn reasoning
Spec-driven Phase 2 — execution
Sonnet
Implementation against spec. Haiku eligible for simple bounded turns within spec scope
IRAP / compliance gap analysis
Opus (senior)
Complex multi-control reasoning; error consequences are high
Novel exploit chain assessment
Opus (senior)
ARC-AGI-2 advantage: 68.8% Opus vs 58.3% Sonnet
Architectural design for greenfield service
Opus (senior)
Single planning session prevents many wrong-direction turns
Spec-driven Phase 1 — spec authoring
Opus (all tiers)
One Opus session amortised across 40–65 Sonnet execution turns. ARC-AGI-2 advantage most material for interface design and scope decisions. See Factor 03.
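The routing table above can be sketched as a simple lookup. This is a hedged illustration only: the category names and the route_model helper are inventions for this sketch, not Claude Code configuration; in practice you choose the model yourself with /model at the prompt.

```python
# Illustrative model router following the table above. Category names
# and this helper are assumptions for the sketch, not a Claude Code API.

ROUTES = {
    # Haiku: retrieval, templating, pattern matching
    "dependency_lookup": "haiku",
    "boilerplate": "haiku",
    "issue_summary": "haiku",
    "phase3_conformance": "haiku",
    # Sonnet: reasoning about intent, cause and constraints
    "code_review": "sonnet",
    "bug_fix": "sonnet",
    "test_generation": "sonnet",
    "multi_file_feature": "sonnet",
    "phase2_execution": "sonnet",
    # Opus: novel reasoning, senior approval (Phase 1 open to all tiers)
    "compliance_gap_analysis": "opus",
    "exploit_chain_assessment": "opus",
    "greenfield_architecture": "opus",
    "phase1_spec_authoring": "opus",
}

def route_model(task_category: str) -> str:
    # Default to Sonnet when a task doesn't match a known category.
    return ROUTES.get(task_category, "sonnet")
```

The default-to-Sonnet fallback mirrors the guide's advice: Haiku only when the task is clearly routine, Opus only with approval.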
Switching models mid-session
Switch models at any time during a session with the /model command. The conversation history carries over, but new API calls use the new model's pricing.
# Start a session in Haiku for triage work
/model haiku
Tell me why pipeline #4821 test-unit job is failing based on this log: [paste log]

# Issue identified — switch to Sonnet to implement the fix
/model sonnet
Fix the ImportError in @tests/test_cache.py — the TokenCache class was renamed to CacheToken in the last commit
Effort levels
Effort levels control how deeply Claude reasons before responding — separate from which model you're using. Sonnet 4.6 and Opus 4.6 support four levels:
/effort max: the most complex reasoning tasks. Senior-approved Opus sessions only.
Knowledge Check — Factor 05
Scenario: You're about to start a session that will generate docstrings for 30 functions across 5 files. Which setup minimises cost?
Show answer
B is correct. Docstring generation is pattern completion — no deep reasoning needed. Haiku handles it well at 3× less cost than Sonnet. Low effort eliminates the thinking block overhead. Starting with /model haiku and /effort low before the first prompt sets both optimisations for the whole session.
Subagents are lightweight Claude instances spawned within your session for specific sub-tasks. Used correctly, they reduce your main session's context overhead by offloading research, verification, and low-intelligence work to cheaper models.
TL;DR
Use subagents by asking Claude directly: "Use a subagent to investigate X." Route research, verification, and routine sub-tasks to Haiku subagents to avoid polluting the main session's context. Subagents are enabled by default — no special configuration needed.
💰
Cost Impact: High — Agentic.
Without subagents, all investigation output — file reads, shell command results, exploration — accumulates in the main session context and is re-sent on every subsequent API call. A subagent handles the investigation and returns only its summary to the main session. Main session input stays clean.
Reduces main session regular input by 20–40% on investigation-heavy tasks
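The arithmetic behind that reduction can be sketched as follows. All figures here are assumptions for illustration, not measured values: 12k tokens of investigation output, 20 remaining turns, and the Sonnet regional input rate quoted in this guide.

```python
# Why offloading investigation to a subagent pays off. The token and
# turn counts below are assumptions for this sketch.

SONNET_INPUT_PER_TOK = 3.30 / 1_000_000  # Sonnet regional input rate

investigation_tokens = 12_000   # file reads, shell output, exploration
remaining_turns = 20            # later turns that would re-send that output
subagent_summary_tokens = 400   # what a subagent returns instead

# Without a subagent: the full output rides along on every later call.
without = investigation_tokens * remaining_turns * SONNET_INPUT_PER_TOK
# With a subagent: only the short summary is re-sent.
with_sub = subagent_summary_tokens * remaining_turns * SONNET_INPUT_PER_TOK

saving = without - with_sub  # ≈ $0.77 under these assumptions
```

The subagent's own cost (a Haiku instance reading the files once) is small against the repeated Sonnet re-sends it avoids.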
How to invoke subagents
You don't need special syntax. Ask Claude naturally, specifying the model if you want cost control:
# Research task — route to Haiku subagent
Use a Haiku subagent to check whether our Redis client version (redis-py 4.5.4) is compatible with Redis 7.2. Return just the compatibility verdict and any breaking changes.

# Verification task — Haiku subagent
Use a subagent to verify that all test files in @tests/ import from the correct module paths after the refactor. Return a list of any files with broken imports.

# Complex investigation — let Claude choose the model
Use a subagent to investigate why the auth middleware is adding 200ms latency. Check the middleware chain in @src/middleware/ and return the most likely cause.
Good subagent use cases
✓ Route to subagents
Library/version compatibility checks
Scanning for a pattern across many files
Verifying test coverage for a specific module
Checking whether a dependency is already present
Summarising a large file or log output
Running shell commands and returning results
○ Keep in main session
Any work that needs the full conversation context
Implementation decisions that depend on prior turns
Subagents vs Agent Teams. Subagents work within your session and are always available. Agent Teams spawn entirely separate parallel sessions and require explicit approval — they're covered in Factor 10. If you're not sure which you need, use a subagent.
Knowledge Check — Factor 06
Scenario: You're implementing a new feature and need to know whether three specific third-party libraries are already in your requirements.txt, and if so, what versions. What's the most cost-effective approach?
Show answer
B is correct — but C is nearly as good. B is optimal because the Haiku subagent reads the file and returns only the specific answer, keeping main session context clean and using the cheapest model. C (reading it yourself and pasting) is also excellent — you've offloaded the file read and added only the relevant lines to context. A is worse because Claude reads the full file into main session context. D is wrong — Plan Mode is for code changes, not simple file lookups.
CLAUDE.md is loaded into every session automatically. That's powerful — but it means every token in it is billed on every API call. Keep it sharp: instructions Claude follows, not background context Claude reads.
TL;DR
Keep CLAUDE.md under 200 lines. Write only actionable instructions — things Claude will actually do differently because they're there. Explanatory background, historical context, and architecture decisions don't belong here; they cost tokens on every call without influencing output. Configure .claudeignore before your first session on any brownfield project.
💰
Cost Impact: Medium-High — All Patterns.
Every token in CLAUDE.md is included in the cached system context, which means it is billed (at the cache-read price) on every single API call in every session on that project. A bloated 600-line CLAUDE.md padded with explanatory prose adds ~9,000 tokens of cache-read cost to every call yet produces no better output than a focused 150-line CLAUDE.md.
Target: under 200 lines · 100% actionable · zero narrative prose
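A rough sketch of what that bloat costs, using the Sonnet regional cache-read rate quoted elsewhere in this guide; the monthly call volume is an assumption for illustration.

```python
# Per-call and monthly cost of CLAUDE.md bloat at the cache-read rate
# (Sonnet regional $3.30/MTok × 0.1). Call volume is an assumption.

CACHE_READ_PER_TOK = 3.30 * 0.1 / 1_000_000

bloated = 9_000   # ~600-line CLAUDE.md with narrative prose
lean = 1_500      # ~150-line actionable-only CLAUDE.md

per_call_waste = (bloated - lean) * CACHE_READ_PER_TOK
calls_per_month = 2_000  # assumption: a moderately active developer

monthly_waste = per_call_waste * calls_per_month  # ≈ $4.95/month
```

Small per call, but it is paid on every call of every session on the project, which is why trimming CLAUDE.md is a one-off fix with a recurring payoff.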
What belongs in CLAUDE.md
✓ Include
Language and framework versions in use
Coding conventions Claude must follow (e.g. "use Decimal for all currency")
Security rules (e.g. "never log PII, never use raw string queries")
Test patterns and required coverage level
Key file paths for common reference
Forbidden patterns that Claude should refuse to implement
Build and run commands
○ Do not include
Why you chose a technology (history/rationale)
Architecture evolution narrative
Git branching strategy explanations
Team onboarding context
Long lists of "things to know about the project"
Content from other docs pasted in wholesale
CLAUDE.md template — secd3v projects
# Project: [service-name]
# Environment: ap-southeast-2 · IRAP Protected

## Stack
Python 3.12 · Django 5.1 · PostgreSQL 16 · Redis 7.2 · Celery 5.4
AWS: ECS Fargate, RDS, ElastiCache, SQS, S3

## Critical rules — follow always
- All currency: use Decimal, never float
- All logging: use src/logging/structured.py — never print()
- All API responses: use ResponseWrapper from src/api/base.py
- No raw SQL string interpolation — parameterised queries only
- Never log: passwords, tokens, PII, credit card data
- Migrations: never auto-generate — write manually and review

## Test requirements
- pytest · fixtures in tests/conftest.py
- Coverage must not drop below 85%
- Security-related functions: always add failure-mode tests

## Key paths
src/api/base.py — base views and ResponseWrapper
src/auth/ — all authentication logic
src/logging/ — structured logging utilities
infrastructure/ — IaC (Terraform) — do not modify in Claude sessions

## Forbidden
- Do not modify: migrations/, infrastructure/, .env files
- Do not use: requests library (use httpx), time.sleep() (use asyncio)
Spec-driven projects — the critical bloat failure mode
When using spec-driven development, the most common CLAUDE.md failure is embedding the spec directly into it. A 2,000-token spec embedded in CLAUDE.md adds that cost to every API call in every session — including sessions that have nothing to do with the current sprint. The spec should live as a separate file loaded only during Phase 2 execution.
✗ Wrong — spec embedded in CLAUDE.md
# CLAUDE.md — 350 lines
## Project rules...
## Current sprint spec:
Service: NotificationService
Interface: send(notification: Notification)...
[300 more lines of spec content]
Every API call in every session — including Phase 3 conformance review, unrelated DevSecOps sessions, CI automation — pays cache-read cost for 2,000+ tokens of spec content. At Sonnet rates: ~$0.007 extra per call, ~$5–8/month unnecessary cost per developer.
✓ Correct — spec as separate file
# CLAUDE.md — 45 lines
## Project rules...
## Current sprint spec: @SPEC.md
# SPEC.md — separate file, 300 lines
Service: NotificationService
Interface: send(notification: Notification)...
[300 lines of spec content]
CLAUDE.md stays under 200 lines. SPEC.md loads on demand during Phase 2 execution. Phase 3 and unrelated sessions don't pay for spec content they don't use.
Telemetry signal: CLAUDE.md above 3,000 tokens on a spec-driven project. Run /context and check the CLAUDE.md line. If it's above 3,000 tokens, spec content has been embedded. Strip it, save as SPEC.md in the repo root, and replace with a single reference line in CLAUDE.md. Target: CLAUDE.md under 200 lines regardless of spec-driven status.
.claudeignore — essential for Brownfield
Without .claudeignore, Claude can read and index any file in your repository when prompted broadly. In Brownfield repos with years of accumulated build artifacts, this can exhaust your session budget on a single general query.
# .claudeignore — place in repository root

# Build outputs
dist/
build/
*.egg-info/
__pycache__/
*.pyc
.pytest_cache/

# Dependencies
node_modules/
vendor/
.venv/
env/

# Generated files
*.min.js
*.min.css
coverage/
.coverage
htmlcov/

# Binaries and large files
*.pdf
*.png
*.jpg
*.zip
*.tar.gz
migrations/   # if auto-generated

# IDE and OS
.idea/
.vscode/
.DS_Store
Brownfield repositories are especially dangerous. Years of accumulated build artifacts, test output, generated migration files, and cached data can easily total millions of tokens. A prompt like "what's in the project?" without .claudeignore can trigger a scan that exhausts a session budget before producing a single useful response. Configure .claudeignore on day one of every brownfield project.
Knowledge Check — Factor 07
Scenario: Your CLAUDE.md currently includes a 400-word explanation of why your team chose Django over FastAPI, a description of your Git branching strategy, a detailed history of the project's architecture evolution, and the full 2,000-token spec for the current sprint. Your team is hitting high token costs. What should you prioritise fixing first?
Show answer
C is correct. The 2,000-token spec is the highest-impact fix — it adds cost to every API call including unrelated sessions. Extract it to SPEC.md immediately and replace with a single reference line. Then remove the Django rationale, branching strategy, and architecture history — none of these are actionable instructions Claude follows differently on any specific task. If architecture decisions are expressed as rules (e.g. "always use ResponseWrapper") they belong; if they're background reading, they don't.
With 1-hour TTL caching, the platform can share a cache write across multiple sessions on the same project within an hour. Your job as a developer is to work in a way that lets this happen.
TL;DR
Group related DevSecOps sessions on the same project into focused 30–60 minute working blocks. Keep your CLAUDE.md stable during these blocks — changes invalidate the cache. Sessions 2 and 3 on the same project within an hour benefit from the cache established by session 1, saving the cost of a fresh write.
💰
Cost Impact: High Structural — DevSecOps.
The 22,000-token system context (31,000 with GitLab MCP) is written to cache at the start of each session at 2.0× input price. With 1-hr TTL, the service routes subsequent sessions on the same project to the same cache state — sessions 2 and 3 read at 0.1× price instead of re-writing at 2.0×. This saves $6–14 per developer per month (Sonnet regional).
$6–14/dev/month savings from cross-session sharing (Sonnet)
Cache economics at a glance
$0.145 — cost of one 22k cache write (Sonnet regional, 1-hr TTL)
$0.007 — cost of one 22k cache read (Sonnet regional), 95% cheaper
3 sessions grouped in 60 min: 1 write + 2 reads = $0.159 vs $0.435 for 3 separate writes
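The figures above can be re-derived from the rates used throughout this guide: Sonnet regional input $3.30/MTok, 1-hr cache write billed at 2.0×, cache read at 0.1×, 22k-token fixed context.

```python
# Cache economics for a 22k-token fixed context at Sonnet regional rates.

INPUT_PER_MTOK = 3.30
FIXED_CONTEXT = 22_000

write_cost = FIXED_CONTEXT * INPUT_PER_MTOK * 2.0 / 1_000_000  # ≈ $0.145
read_cost = FIXED_CONTEXT * INPUT_PER_MTOK * 0.1 / 1_000_000   # ≈ $0.007

grouped = write_cost + 2 * read_cost   # 3 sessions within the hour, ≈ $0.16
scattered = 3 * write_cost             # 3 cold starts, ≈ $0.44
```

Grouping turns two of the three cache writes into reads, which is where the roughly $0.28 difference per block comes from.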
How to group sessions in practice
1
Plan your morning DevSecOps block. Identify which project you'll work on for the first hour. MR reviews, security checks, and bug triage on the same repository are natural groupings.
2
Start your first session. This writes the cache for the project. The 1-hour TTL clock starts now.
3
Use /clear between tasks, not between projects. /clear resets conversation history but does not invalidate the prompt cache. Same project, same CLAUDE.md = cache remains warm.
4
Keep CLAUDE.md stable during the block. Editing CLAUDE.md changes the cached content and requires a new write. If you need to update it, do so outside your working block.
5
The TTL resets on every cache hit. Actively working sessions stay warm automatically. A 3-hour focused session on one project writes the cache once and benefits from cheap reads throughout.
✗ Scattered sessions — multiple cache writes
9:00 AM — Review MR !231 on project-alpha [cache write: $0.145]
9:20 AM — Check a config in project-beta [cache write: $0.145]
9:40 AM — Quick fix in project-alpha [cache write: $0.145 — alpha cache expired]
10:00 AM — Back to MR review on project-alpha [cache write: $0.145]
10:20 AM — Audit issue in project-gamma [cache write: $0.145]
5 cache writes = $0.725
Jumping between projects prevents the cache from warming. Each return to a project requires a fresh write because the previous session's TTL has expired during the detour.
✓ Grouped sessions — cache shared
9:00 AM — Review MR !231 on project-alpha [cache write: $0.145]
9:20 AM — /clear, then: fix auth bug, project-alpha [cache READ: $0.007]
9:40 AM — /clear, then: security audit, project-alpha [cache READ: $0.007]
10:00 AM — /clear, then: final MR check, project-alpha [cache READ: $0.007]
10:20 AM — Move to project-beta [cache write: $0.145]
2 cache writes + 3 reads = $0.311 vs $0.725
Grouping project-alpha work into a single focused block means 4 sessions share one cache write. Total cache cost is less than half of the scattered approach.
You don't need to do anything special to enable this. The platform handles cache routing. Your job is to create the conditions: group related sessions on the same project, keep CLAUDE.md stable during your working block, and don't unnecessarily jump between projects mid-hour.
Knowledge Check — Factor 08
Scenario: You've been working on project-alpha for 45 minutes — 3 sessions, all within the 1-hour TTL window. You've just used /clear to start session 4. You then remember you need to update one line in CLAUDE.md. Should you update it now?
Show answer
B is correct. Editing CLAUDE.md changes the cached content hash, forcing a new cache write at the start of the next session. With 15 minutes remaining in your block, you'd lose the benefit of the warm cache. State the new constraint in your current prompt, then update CLAUDE.md at the end of your block. That way session 4 continues on the warm cache.
MCP tool definitions used to load entirely at session start, adding thousands of tokens of overhead before you typed a word. Tool Search changed this — now only names load upfront, and full schemas load on demand.
TL;DR
Tool Search is enabled by default in recent Claude Code versions (v2.1+) and significantly reduces MCP context overhead. Verify it's active with /mcp — you should see "deferred" status for tool definitions. Connect GitLab MCP only for sessions where you'll actually use it, and disconnect for pure coding sessions where you won't.
💰
Cost Impact: Medium — GitLab MCP users.
"Tool search keeps MCP context usage low by deferring tool definitions until Claude needs them. Only tool names load at session start, so adding more MCP servers has minimal impact on your context window." Previously, GitLab MCP's ~35 tool definitions added approximately 9,000 tokens at session start regardless of whether GitLab was used. With Tool Search, only tool names (~500 tokens) load upfront.
Saves $8–14/dev/month in mixed workflows (Sonnet) vs eager loading
Before and after Tool Search
Before Tool Search (eager loading)
# Session starts — GitLab MCP connected
# Context window at session start:
System prompt: 3,900 tokens
Tool definitions: 16,600 tokens
GitLab MCP schemas: 9,000 tokens ← loaded regardless
CLAUDE.md: 1,500 tokens
─────────────────────────────────────
Total fixed context: 31,000 tokens
# Even if you never use a GitLab tool in this session,
# you pay 9,000 tokens overhead on every API call.
With Tool Search (deferred — default v2.1+)
# Session starts — GitLab MCP connected
# Context window at session start:
System prompt: 3,900 tokens
Tool definitions: 16,600 tokens
GitLab MCP tool names: 500 tokens ← names only
CLAUDE.md: 1,500 tokens
─────────────────────────────────────
Total fixed context: 22,500 tokens
# GitLab tool schemas load only when Claude
# needs to call a specific GitLab tool.
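The per-call saving from deferral can be estimated from the two fixed-context totals above, at the Sonnet regional cache-read rate used elsewhere in this guide.

```python
# Per-call saving from Tool Search deferring GitLab MCP schemas,
# using the fixed-context totals shown above.

CACHE_READ_PER_TOK = 3.30 * 0.1 / 1_000_000

eager = 31_000      # GitLab MCP schemas loaded up front
deferred = 22_500   # tool names only

# 8,500 fewer cached tokens on every API call of the session.
per_call_saving = (eager - deferred) * CACHE_READ_PER_TOK
```

Under $0.01 per call, but multiplied across every call of every mixed-workflow session; sessions that do call GitLab tools load only the schemas they actually use.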
Verify Tool Search is active
# Run this at the start of any session with MCP connected
/mcp

# Good — Tool Search active:
GitLab MCP: connected
Status: deferred (tool search enabled)
Loaded: 0/35 tool schemas

# Warning — eager loading (older config or Tool Search disabled):
GitLab MCP: connected
Status: loaded
Loaded: 35/35 tool schemas [9,247 tokens]
Connect GitLab MCP. Tool Search loads schemas only for tools you actually call.
Mixed session — some coding, occasional GitLab checks
Connect GitLab MCP. With Tool Search active, schemas only load when needed.
Multiple MCP servers connected that you rarely use
Disconnect servers you don't need. Each server adds tool names at session start even with deferred loading.
# Add a server when needed
claude mcp add --transport http GitLab https://<your-instance>/api/v4/mcp

# List connected servers
/mcp

# Remove a server when done
claude mcp remove GitLab
Watch for large diff outputs. A GitLab MR diff on a large feature branch can return 30,000–50,000 tokens. Ask Claude to summarise the diff rather than loading it entirely: "Use GitLab MCP to get MR !847's diff, but only read and summarise the changes in src/auth/ — skip other directories."
Knowledge Check — Factor 09
Scenario: You're starting a 2-hour coding session to refactor the payment module. You have GitLab MCP connected from yesterday's MR review work. You won't need any GitLab operations today — just local file editing and testing. What should you do?
Show answer
B is correct. Tool Search significantly reduces (but doesn't eliminate) MCP overhead. Tool names still load at session start, and Claude may occasionally initiate tool discovery calls during complex reasoning. For a 2-hour pure coding session with no GitLab operations planned, disconnecting is the right call.
Agent Teams are Claude Code's most powerful capability and its most expensive. They multiply your token consumption approximately 7× per additional team member. Misuse can turn a $178/month developer into a $1,200+/month developer overnight. For further information, refer to the Claude Code docs.
TL;DR
Agent Teams are disabled by default and require explicit enabling. Use them only for genuinely parallelisable work — independent feature branches, concurrent test generation across separate modules. Never use them for tasks that can be done serially. Clean up active teammates immediately when done; idle teammates continue consuming tokens.
Anthropic: "Agent teams use approximately 7× more tokens than standard sessions when teammates run in Plan Mode, because each teammate maintains its own context window and runs as a separate Claude instance." A heavy agentic developer running Agent Teams daily moves from ~$178/month to ~$1,200+/month. Agent Teams are disabled by default and require explicit service-layer approval.
~7× token multiplier per team member · requires explicit approval
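The monthly arithmetic behind that jump, using the heavy-agentic baseline from the cost reference table at the end of this guide; treating the entire workload as moving onto a team is a worst-case assumption for illustration.

```python
# Worst-case monthly impact of the ~7× Agent Team multiplier.
# Baseline: heavy agentic developer, Sonnet standalone ($177.93/month,
# from the monthly cost reference table). Moving 100% of the workload
# onto a team is an assumption, not a measured scenario.

baseline = 177.93
multiplier = 7  # approx. token multiplier per teammate in Plan Mode

worst_case = baseline * multiplier  # roughly $1,250/month,
                                    # consistent with the $1,200+ figure above
```

Real teams rarely run 100% of the time, which is why the guide's figure is stated as "$1,200+" rather than a fixed number; the point is the order of magnitude.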
Single Agent vs Subagent vs Agent Team
Single Agent
Subagent
Agent Team
Context windows
1
2 (main + subagent)
1 per team member
Token multiplier
1×
~1.2–1.5×
~7× per member in Plan Mode
Enabled by default
Yes
Yes
No — requires explicit flag
Good for
All work
Research, verification, cost routing
Genuinely parallel independent tasks only
Service approval needed
No
No
Yes
When Agent Teams are appropriate
✓ Good use cases
Building 3 independent microservices simultaneously
Generating test suites for separate, unrelated modules in parallel
Running concurrent documentation generation across disconnected packages
Parallel migration scripts for independent database tables
Any work where tasks are truly independent and have no shared state
○ Wrong use cases — use serial instead
Sequential tasks disguised as parallel ones
Tasks that share files or modules
Work where output of task A feeds input of task B
Feature implementation (use Plan Mode + Sonnet instead)
Any task a single focused session could handle
Clean up immediately when done. Active Agent Team members continue consuming tokens even when idle between instructions. Always explicitly dismiss teammates when their task is complete. Leaving a 3-member team active overnight is a significant unintended cost event.
Knowledge Check — Factor 10
Scenario: You need to implement a new authentication service and write a comprehensive test suite for it. The tests depend on the implementation being complete. Should you use an Agent Team?
Show answer
B is correct. The tests depend on the implementation — they're inherently sequential, not parallel. An Agent Team would waste the 7× multiplier on two instances that can't actually work simultaneously. The right approach: single session, Plan Mode to design the service, implementation, then test generation. This is exactly what Plan Mode + Sonnet is designed for.
Extended thinking is enabled by default and significantly improves performance on complex reasoning tasks. It's also invisible — you don't see the thinking tokens, but you pay for them. Unconstrained, it can add substantial cost to sessions that don't need deep reasoning. For further information, see the Claude Code docs.
TL;DR
Use /effort low for routine DevSecOps tasks — documentation, formatting, simple fixes, pipeline triage. Use /effort high for security analysis, architectural decisions, novel problems where Claude needs to think carefully. The default (medium) is appropriate for most coding work. Never leave effort at high or max for entire sessions of routine work.
💰
Cost Impact: Significant when unmanaged — especially on high-volume simple tasks.
Anthropic: "Extended thinking is enabled by default because it significantly improves performance on complex planning and reasoning tasks. Thinking tokens are billed as output tokens." At Sonnet 4.6 regional rates, a 4,000-token thinking block costs $0.066 per call. In a 25-turn medium session, unconstrained thinking can add $1.65 to session cost on work that didn't need it.
4k thinking block at Sonnet: $0.066/call · 25-turn session: up to +$1.65
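Re-deriving the figures above; the $16.50/MTok output rate is inferred from the $0.066 figure in this guide and should be treated as an assumption.

```python
# Cost of unconstrained extended thinking. Thinking tokens bill as
# output tokens; the output rate here is inferred from this guide's
# $0.066-per-block figure, not an official price list.

OUTPUT_PER_TOK = 16.50 / 1_000_000

thinking_block = 4_000                       # tokens per thinking block
per_call = thinking_block * OUTPUT_PER_TOK   # ≈ $0.066 per call
session = per_call * 25                      # 25-turn session: ≈ $1.65
```

That $1.65 is pure overhead when the turns are routine, which is the case /effort low exists for.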
Controlling extended thinking
Method
Effect
When to use
/effort low
Reduces thinking depth to minimum. Claude still reasons but briefly.
Automation pipelines where deep reasoning is never needed
A practical session strategy
# Start a DevSecOps session — most tasks today are routine
/effort low
Add Google-style docstrings to the 4 public methods in @src/api/client.py
# → Low effort. Fast, cheap, correct.

Fix the typo on line 44 of @src/utils/validator.py: 'paramter' → 'parameter'
# → Low effort. Trivial. Done.

# Now a genuinely complex task
/effort high
Review @src/auth/oauth.py for security vulnerabilities. We're integrating with a third-party OAuth provider that supports token exchange. Identify any flows that could allow token substitution or replay attacks.
# → High effort. Deep reasoning justified.

# After the analysis, return to low effort for the rest of the session
/effort low
Extended thinking vs Opus for seniors
Sonnet + high effort — when:
The problem is complex but well-defined
You have enough context to specify constraints clearly
The task is within Sonnet's 79.6% SWE-bench capability
Opus (senior approval) — when:
Security threat modelling with ambiguous attack surfaces
Compliance analysis where multiple control interpretations exist
Sonnet has already failed twice at high effort
Knowledge Check — Factor 11
Scenario: You're about to start a session that will generate docstrings for 30 functions across 5 files. Which setup minimises cost?
Show answer
B is correct. Docstring generation is pattern completion — no deep reasoning needed. Haiku handles it well at 3× less cost than Sonnet. Low effort eliminates the thinking block overhead. Starting with /model haiku and /effort low before the first prompt sets both optimisations for the whole session. Option D (Sonnet + high effort) would be the most expensive possible choice for the task with no quality benefit.
Without instrumentation, cost optimisation is guesswork. The Claude Code service and CLI give you several tools to see exactly what's consuming tokens before you try to fix it.
TL;DR
Run /context at the start of any session where you suspect bloat. Run /cost at the end of sessions to understand total usage. Check the platform dashboard monthly for team-level patterns. If your cache hit rate is under 70%, your DevSecOps regular input per turn is over 15,000 tokens, or your CLAUDE.md exceeds 3,000 tokens on a spec-driven project — something is wrong and fixable.
📊
Cost Impact: Foundation — Enables All Other Optimisations.
You can't fix what you can't measure. Token telemetry tells you whether your CLAUDE.md is too large, whether your sessions are drifting, whether your cache is actually warm, whether a spec-driven project has embedded spec in CLAUDE.md, and whether a particular developer is consuming 10× more than peers. Without these signals, optimisation is guesswork.
In-session commands
Command
What it shows
When to use
/context
Full breakdown of what's in context: system prompt, CLAUDE.md, files, conversation history, tool definitions. Shows token counts per component.
At session start when cost seems high. After connecting MCP servers. When debugging unexpected token usage. At Phase 2 start to verify spec is loaded.
/cost
Total cost for the current session: input tokens, output tokens, cache writes, cache reads, estimated cost.
At the end of any session to understand where money went. After Agent Team sessions to verify the multiplier. After Phase 1 spec sessions to confirm Opus cost.
/tokens
Current token count and how close you are to the context limit.
Before deciding whether to /compact or /clear. In long agentic and spec-driven Phase 2 sessions to track growth.
Key health metrics — all patterns
>70% — target cache hit rate; below this means sessions are too scattered or CLAUDE.md is changing too often
<15k — target regular input tokens per turn in DevSecOps and spec-driven Phase 2; above this suggests session drift or broad prompting
<50% — cache hit rate requiring action: review session grouping and CLAUDE.md stability
Additional health metrics — spec-driven agentic
>80% — target cache hit rate in Phase 2 execution; the spec as stable fixed context should lift the hit rate above the standard 70% target
>18k — regular input per turn in Phase 2 indicates the spec is not being used and Claude is still file-exploring; review spec scope and the Phase 2 prompt
>3k — CLAUDE.md tokens on a spec-driven project means the spec has been embedded in CLAUDE.md; strip it to SPEC.md immediately
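The thresholds above can be applied mechanically, as in this sketch. The check_health helper and its inputs are inventions for illustration; nothing here is a Claude Code API, and the values are read manually from /context and /cost output.

```python
# Sketch: evaluate the health thresholds from this section against
# numbers read off /context and /cost. Helper name and inputs are
# assumptions for illustration only.

def check_health(cache_hit_rate: float, input_per_turn: int,
                 claude_md_tokens: int, spec_driven: bool) -> list[str]:
    warnings = []
    if cache_hit_rate < 0.50:
        warnings.append("cache hit <50%: action required; review grouping")
    elif cache_hit_rate < 0.70:
        warnings.append("cache hit <70%: sessions scattered or CLAUDE.md unstable")
    if input_per_turn > 15_000:
        warnings.append("input/turn >15k: session drift or broad prompting")
    if spec_driven and claude_md_tokens > 3_000:
        warnings.append("CLAUDE.md >3k on spec-driven project: spec embedded")
    return warnings
```

Fed the Knowledge Check figures below (45% hit rate, 38k input/turn, 8.2k CLAUDE.md), this flags all three problems at once.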
Reading a /context breakdown
/context

# Example output — healthy DevSecOps session
System prompt: 3,920 tokens [cached]
Tool definitions: 16,580 tokens [cached]
CLAUDE.md: 1,480 tokens [cached]
GitLab MCP names: 490 tokens [cached]
─────────────────────────────────────────────
Fixed context: 22,470 tokens [all cached — warm]
Conversation (turns 1–6):
Turn 1 user: 380 tokens
Turn 1 assistant: 890 tokens
Turn 2 user: 210 tokens
Turn 2 assistant: 1,440 tokens
...
Regular input: 8,430 tokens [not cached — billed at full rate]
─────────────────────────────────────────────
Total this call: 30,900 tokens

# Warning signs — all patterns:
CLAUDE.md: 9,800 tokens ← bloated — review and trim
Regular input: 42,000 tokens ← session drift — consider /clear
GitLab MCP schemas: 8,700 tokens ← Tool Search not active

# Warning signs — spec-driven Phase 2 specifically:
CLAUDE.md: 4,200 tokens ← likely contains embedded spec — extract to SPEC.md
Regular input: 18,000 tokens ← spec not being used; Claude still file-exploring
SPEC.md: 0 tokens ← spec not loaded in context; Phase 2 running without it

# Healthy spec-driven Phase 2 context:
CLAUDE.md: 850 tokens ← tight, actionable only
SPEC.md: 2,100 tokens [cached from Phase 1]
Regular input: 11,400 tokens ← file exploration eliminated; within target
Monthly review cadence. Check your admin dashboard monthly. Look for: developers with consistently high regular input per turn (broad prompting), low cache hit rates (session grouping problem), unexpected Agent Team costs, or CLAUDE.md above 3,000 tokens on spec-driven projects. One developer paying 3× peers on the same work type almost always has a fixable behaviour driving the gap.
Knowledge Check — Factor 12
Scenario: You run /context at turn 5 of a DevSecOps session and see: CLAUDE.md = 8,200 tokens, Regular input = 38,000 tokens, Cache hit rate = 45%. What are the two most urgent issues to fix?
Show answer
B is correct. Two clear signals: CLAUDE.md at 8,200 tokens is 5× the efficient target — that's cached prose and history, not actionable instructions. Regular input at 38,000 tokens at turn 5 means the session has already drifted across multiple unrelated topics. The 45% cache hit rate confirms the session pattern is broken. Immediate actions: end this session, trim CLAUDE.md to under 1,500 tokens of actionable instructions, then restart with a single-task specific prompt.
Monthly cost reference (Sonnet 4.6 standard developer)
Pattern
Use case
Light / mo
Medium / mo
Heavy / mo
DevSecOps
Standalone
$45.84
$61.72
$87.09
DevSecOps
+ GitLab MCP
$60.86
$79.37
$111.46
Agentic
Standalone
$44.04
$69.23
$177.93
Agentic
+ GitLab MCP
$54.04
$77.68
$190.68
Spec-Driven
Standalone (est.)
~$40
~$60
~$155
Spec-Driven
+ GitLab MCP (est.)
~$49
~$68
~$167
Spec-driven estimates include Phase 1 Opus overhead (1 spec per 5 sessions), Phase 3 Haiku conformance, and ~48% regular input reduction in Phase 2. Marked est. — actual results depend on spec quality and sprint cadence.
When Opus approval is warranted
Task
Pattern
Approved?
Why
Spec authoring — greenfield service or controlled migration sprint
Spec-Driven
Yes — all tiers
Phase 1 only. One Opus session amortised across 40–65 Sonnet execution turns. ARC-AGI-2 advantage most material for interface design and scope decisions. Standard developers eligible.