The 12 developer behaviours that directly control how much Claude Code costs you, your team, and your organisation. Each factor explains why it matters, shows before-and-after examples from real DevSecOps, Agentic, and Spec-Driven workflows, and tells you exactly what to do.
REGION=AU · COMPLIANCE=IRAP assessed to PROTECTED · v2.0.0
How to use this guide. Start with Factor 01 — prompt specificity is the highest-impact change any developer can make and takes less than a day to internalise. Factor 03 covers spec-driven development — the highest-impact structural change for agentic developers, changing how you think about entire sprints. Work through the others in order; each builds on the previous. The Reference section is a quick lookup for model selection decisions.
The 12 Optimisation Factors
Factor 01
Prompt Specificity
Highest impact · All patterns
Factor 02
Plan Mode
Highest impact · Agentic
Factor 03
Spec-Driven Development
Highest impact · Agentic only · ~13% monthly saving
Factor 04
Session Hygiene
High impact · DevSecOps
Factor 05
Model Selection & Effort
High impact · DevSecOps
Factor 06
Subagents
High impact · Agentic
Factor 07
CLAUDE.md & .claudeignore
Medium impact · All patterns
Factor 08
Cross-Session Cache Grouping
High impact · structural · DevSecOps
Factor 09
MCP Tool Search
Medium · GitLab MCP users
Factor 10
Agent Teams
Critical for agentic power users
Factor 11
Extended Thinking
High cost when unmanaged
Factor 12
Token Telemetry
Foundation for optimisation
The Three Work Patterns
This guide covers three Claude Code usage patterns. Spec-driven is a structured variant of Agentic — not a separate pattern — but it changes the session shape, model allocation, and cost profile enough to warrant its own treatment.
Pattern
When it applies
Session shape
Primary cost driver
DevSecOps
Working on an existing production system — code review, bug fixes, security checks, MR review on a live codebase with real users
8–11 short sessions/day, 5–45 min each, human approves every step
Cache write amortisation and regular input from broad prompting
Agentic
Building something new — new service, new module, greenfield implementation where wrong output can be discarded without consequence
1–2 long sessions/day, up to 4+ hours, Claude works autonomously
Growing conversation history as regular input — 25,000 tok/turn at heavy usage
Spec-Driven
Agentic variant: author a spec before executing. Interface contracts, acceptance criteria, file layout, constraints. Claude executes against the spec rather than discovering scope freely.
Phase 1: 30 min spec-write (Opus) → Phase 2: up to 4 hrs execution (Sonnet) → Phase 3: 30 min conformance review (Haiku)
Reduced to ~13,000 tok/turn in execution (−48%) — spec replaces file-discovery turns
The single largest per-session cost lever you control. How you phrase a request determines how much context Claude must read before it can act — and every token read compounds across the entire session.
TL;DR
Tell Claude exactly what file, what lines, and what you need. A prompt with an explicit file path, line range, and specific concern costs 3–5× less than an equivalent vague prompt — because Claude reads only what you point to, not everything that might be relevant.
💰
Cost Impact: Highest — All Patterns
"The more precise your instructions, the fewer corrections you'll need. Reference specific files, mention constraints, and point to example patterns." Vague prompts trigger broad file scanning. Each file read adds tokens that compound across every subsequent turn.
3–5× regular input reduction possible
Why it matters
Claude Code's context window holds everything — your messages, every file read, every command output. When you write a vague prompt, Claude reads broadly to find relevant files. Those reads stay in context for every subsequent message, inflating input token counts for the rest of the session.
~40,000 tokens from broad file exploration triggered by one vague prompt on a medium module
~8,000 tokens for the same task with an explicit file reference and line range
5× reduction in regular input from specific vs vague prompting on this task type
DevSecOps examples
Security review of an authentication function
✗ Vague
Check my auth code for security issues
Claude scans auth.py, user.py, middleware.py, session.py, config.py, database.py, models.py — all of them, then their imports. Context grows before the real work begins.
✓ Specific
Review @src/auth/login.py lines 42–89 for SQL injection risk. The database layer uses psycopg2 and our connection is in @src/db/connection.py. Flag any raw string interpolation into queries.
Two files. Claude knows exactly what to look for and where. Session starts work immediately.
Fixing a known bug
✗ Vague
The user profile API is returning 500 errors. Fix it.
No file, no error message, no environment. Claude reads the entire module and its dependencies — potentially 10+ files — before forming a hypothesis.
✓ Specific
@src/api/user_profile.py line 134 is throwing:
KeyError: 'preferred_name'
Field added to the User model last sprint but the serializer at line 134 still expects the old schema. Fix the serializer to handle this field being absent, defaulting to None.
Reviewing a merge request
✗ Vague
Review my latest merge request
No MR number, no focus area, no security standards. Claude asks clarifying questions (wasted turns) or speculatively reads the entire diff plus related files.
✓ Specific
Review MR !847 — adds JWT token refresh to session middleware. Focus on:
1. Token expiry edge cases (expired-during-request scenarios)
2. Whether refresh tokens are properly invalidated on logout
3. Any timing windows that could allow token reuse
Our session middleware is at @src/middleware/session.py
Three specific concerns, one target file. Focused analysis costs far less than a general sweep.
Agentic example — building a new API endpoint
✗ Vague
Add an endpoint to the API for managing notifications.
Claude reads the entire API module, models, routing config, and tests to understand patterns — all before writing anything. Those reads stay in context for 60+ subsequent turns.
✓ Specific
Add GET /api/v2/notifications following the pattern in @src/api/v2/messages.py.
- Paginated (use PaginationMixin from @src/api/mixins.py)
- Filter by ?status=read|unread
- Auth via existing @require_auth decorator
- Return: id, title, body, created_at, is_read
Notification model already exists — no new models needed.
Two reference files. Claude implements immediately, saving 10+ exploration turns at the start of what will be a long session.
The 5W1H checklist
Before sending a prompt, answer these six questions. If you can answer them, write the answers into the prompt.
Question
What it covers
Example in prompt
What
Exactly what to change, review, or understand
"review the token validation logic" not "check auth"
Where
Specific file path and line numbers
"@src/auth/tokens.py lines 88–134"
Why
Error message, security concern, test failure
"returning 401 for valid tokens after migration"
Look for
Specific patterns or vulnerability class
"flag non-constant-time string comparisons"
Don't touch
Files or logic to leave unchanged
"do not modify the database schema or migrations"
How
Pattern or library to follow — point to an example
"use the same approach as @src/auth/refresh.py line 44"
One task per prompt. Bundling multiple tasks — "fix the bug, add tests, and update the README" — forces Claude to hold all three dependency trees in context simultaneously. Sending them separately reduces total token cost even though it's more messages.
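The checklist can even be turned into a habit-forming helper. A minimal Python sketch — the `build_prompt` function and its field names are illustrative conventions, not part of Claude Code:

```python
def build_prompt(what, where, why=None, look_for=None, dont_touch=None, how=None):
    """Assemble a 5W1H-style prompt; only 'what' and 'where' are mandatory."""
    parts = [f"{what} in {where}."]
    if why:
        parts.append(f"Context: {why}.")
    if look_for:
        parts.append(f"Specifically look for: {look_for}.")
    if dont_touch:
        parts.append(f"Do not modify: {dont_touch}.")
    if how:
        parts.append(f"Follow the pattern in {how}.")
    return " ".join(parts)

prompt = build_prompt(
    what="Review the token validation logic",
    where="@src/auth/tokens.py lines 88-134",
    why="returning 401 for valid tokens after migration",
    look_for="non-constant-time string comparisons",
    dont_touch="the database schema or migrations",
)
```

Any field you cannot fill in is a signal that the task may not be well enough understood to prompt cheaply yet.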
Quick reference by task type
Task
Instead of…
Say…
Security check
"check for security issues"
"review @api/views.py lines 55–90 for IDOR — ensure user_id is validated against request.user"
Test generation
"write tests for the payment module"
"write pytest unit tests for charge() in @payments/processor.py using fixtures from @tests/conftest.py — cover: success, declined card, network timeout"
Documentation
"document the API"
"add Google-style docstrings to the 4 public methods in @src/api/client.py — type annotations already present, skip them"
Pipeline triage
"CI is failing, fix it"
"job 'test-unit' in pipeline #4821 fails with ImportError: cannot import 'TokenCache' from 'src.cache' — class renamed last commit. Fix imports in test files only"
Refactoring
"refactor the database layer"
"extract retry logic from @db/connection.py lines 120–156 into RetryPolicy class in @db/retry.py — keep existing interface, callers must not change"
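As an illustration of the refactoring row, the extracted `RetryPolicy` the specific prompt asks for might look roughly like this — the class shape, defaults, and the `flaky` caller are hypothetical, not taken from any real codebase:

```python
import time

class RetryPolicy:
    """Sketch of retry logic extracted into its own class (illustrative interface)."""
    def __init__(self, max_attempts=3, base_delay=0.5, backoff=2.0):
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.backoff = backoff

    def run(self, fn, retryable=(ConnectionError, TimeoutError)):
        delay = self.base_delay
        for attempt in range(1, self.max_attempts + 1):
            try:
                return fn()
            except retryable:
                if attempt == self.max_attempts:
                    raise  # exhausted: surface the last error to the caller
                time.sleep(delay)
                delay *= self.backoff  # exponential backoff between attempts

# Demo: a call that fails twice, then succeeds on the third attempt.
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient")
    return "connected"

result = RetryPolicy(base_delay=0.01).run(flaky)
```

Note how the prompt's constraint — "keep existing interface, callers must not change" — is exactly the kind of line that prevents a wrong-direction refactor.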
Knowledge Check — Factor 01
Scenario: You need to review a payment function for timing attack vulnerabilities. You know it's in src/payments/processor.py but not the exact lines. Which prompt is most cost-effective?
Show answer
C is correct. Naming the specific functions and exact vulnerability class limits Claude's scope to one file and one issue type — even without knowing exact line numbers. Options A and D scan entire modules. Option B scans an entire directory. C is precise enough to produce a targeted, affordable session.
Use Plan Mode to separate what Claude should figure out from what it should build. The cost of a wrong plan is ~500 tokens. The cost of wrong-direction implementation discovered at turn 20 is tens of thousands.
Spec-driven development extends Plan Mode to the inter-session level. Plan Mode prevents wrong-direction turns within a single session. A spec prevents wrong-direction sessions from starting at all — by front-loading scope resolution before any execution session begins. See Factor 03 — Spec-Driven Development for how to apply this to entire agentic sprints. Both are essential; they operate at different granularities.
TL;DR
For any task that touches multiple files or where you're uncertain about approach, prefix your prompt with /plan or press Shift+Tab before typing. Review Claude's proposed plan. Only switch back to execute when you're confident in the direction. Skip Plan Mode for small, clearly scoped tasks where you could describe the diff in one sentence.
Anthropic's internal research confirms Claude Code succeeds on its first autonomous attempt approximately one-third of the time on complex tasks. For the other two-thirds, Plan Mode prevents the expensive correction loop. Wrong-direction correction at the plan stage costs ~500 tokens. Correction after 20 turns of wrong implementation costs tens of thousands.
Prevents 1–3 wasted sessions per week for active developers
How Plan Mode works
In Plan Mode, Claude reads files and runs shell commands to explore — but makes no edits to your source code. It produces a plan file you can review (and edit directly) before execution begins.
1
Enter Plan Mode. Type /plan before your prompt, or press Shift+Tab to cycle through permission modes. Claude will confirm it's in Plan Mode.
2
Claude explores. It reads relevant files and asks clarifying questions without touching your code. This phase costs tokens but far fewer than an incorrect implementation.
3
Review and edit the plan. Press Ctrl+G to open the plan file in your editor. Correct the approach, add constraints, or reject it entirely — at near-zero cost.
4
Execute against the plan. Press Shift+Tab again to return to Normal Mode. Claude implements what was agreed, verifying against its own plan.
When to use Plan Mode
✓ Use Plan Mode when…
The task touches multiple files or modules
You're unfamiliar with the area of code being changed
You're uncertain what the right approach is
Any brownfield change to a production system
New feature implementation in an existing service
Refactoring with potential side-effects
Security-sensitive changes
Spec-driven Phase 2: multi-file changes within the spec scope — use /plan to confirm approach before implementing each significant component
○ Skip Plan Mode when…
You could describe the complete diff in one sentence
Fixing a typo, renaming a variable, adding a log line
Adding a docstring to a clearly understood function
Updating a dependency version
A formatting-only change
Any task where the scope is fully clear upfront
Plan Mode adds overhead for trivial tasks. Anthropic's docs: "Plan Mode is useful, but also adds overhead. For tasks where the scope is clear and the fix is small — like fixing a typo, adding a log line, or renaming a variable — ask Claude to do it directly."
DevSecOps example — multi-file bug fix
✗ Without Plan Mode
Fix the password reset flow — users aren't receiving the reset email and the link expires too quickly.
Claude dives straight into implementation. It might fix the email delivery in the wrong layer, change token expiry in a way that conflicts with the session middleware, and miss a third issue in the validation logic. Realising this at turn 15 means rewinding and re-implementing — expensive.
✓ With Plan Mode
/plan Fix the password reset flow — users aren't receiving the reset email and the link expires too quickly.
Claude maps the password reset flow across auth/reset.py, notifications/email.py, and middleware/session.py, identifies all three issues, and proposes a plan. You review: the email fix looks right, but the session expiry approach is wrong for your architecture. You correct it in the plan file before a single line of implementation is written.
Agentic example — new service construction
✗ Without Plan Mode
Build a notification service that sends email, SMS, and push notifications. It should be async and support retry logic.
Claude starts building. At turn 30 you discover it's built the queue on Redis when your infrastructure uses SQS, and the retry logic doesn't match your existing circuit-breaker pattern. Starting over costs the entire 30-turn investment.
✓ With Plan Mode
/plan Build a notification service that sends email, SMS, and push notifications. It should be async and support retry logic.
We use SQS for queuing (@infrastructure/queues.py) and our circuit-breaker pattern is in @src/resilience/circuit_breaker.py
Claude reads your existing infrastructure files and proposes an architecture that fits. You review and catch that it's missed the dead-letter queue requirement — you add it to the plan. Implementation then proceeds correctly from turn 1.
Plan Mode in DevSecOps is a compliance practice, not just a cost practice. On brownfield production systems, executing before planning is how you accidentally modify the wrong component or create a change that passes tests but introduces a subtle security regression.
Knowledge Check — Factor 02
Scenario: You need to add input validation to a single function in @src/api/users.py. The validation logic is straightforward — just checking that the email field matches a regex. Should you use Plan Mode?
Show answer
B is correct. This is exactly the scenario Anthropic describes as "skip the plan — if you could describe the diff in one sentence." The scope is fully clear: one function, one field, one regex check. Plan Mode would add overhead without benefit. Reserve Plan Mode for tasks with genuine scope uncertainty or multi-file impact.
The highest-impact structural change available to agentic developers. Author a complete specification before execution begins — and cut the dominant agentic cost driver by approximately half.
TL;DR
Before starting any agentic execution session, write a specification: interface contracts, data shapes, acceptance criteria, file layout, security constraints, test coverage scope. Claude Code then executes against the spec rather than discovering scope through file exploration. This replaces the most expensive phase of every agentic session — and the spec itself becomes the cached context that makes every subsequent execution turn cheaper.
💰
Cost Impact: Highest — Agentic. ~13% monthly saving at heavy usage.
Standard agentic heavy sessions average 25,000 tokens regular input per turn — driven by file exploration and growing conversation history. Spec-driven execution sessions average ~13,000 tokens per turn: a 48% reduction. Wrong-direction turns drop from 4–8 per session to 0–1. Phase 3 conformance review runs on Haiku. The blended monthly cost at heavy usage drops from ~$178 to ~$155 for a standard developer.
~48% reduction in regular input/turn · Phase 3 Haiku-eligible · Opus justified at Phase 1 (all tiers)
The three-phase lifecycle
Spec-driven development restructures Agentic work into three phases. Each has a distinct session profile, dominant model, and cost character. This is not a third work pattern — it is a structured variant of Agentic that modifies how Agentic sessions run.
Phase 1 · Spec Writing
Author before executing
Write the full specification. Interface contracts, data shapes, acceptance criteria, file layout, security constraints, test coverage scope. Quality here has compounding leverage across all downstream sessions.
8–15 turns · 15–30 min · Opus (all developer tiers)
Avg regular input: ~4,000 tok/turn
Output: 1,500–2,500 token spec file
Est. cost: ~$8–15 per spec session
Phase 2 · Spec Execution
Implement against spec
Execute against the spec. Claude reads the spec and only the files it names. Do not /clear between Phase 1 and Phase 2 — the spec is the context that must persist. /compact at 80% context fill.
40–65 turns · up to 4 hrs · Sonnet primary / Haiku for bounded turns
Avg regular input: ~13,000 tok/turn (vs 25,000 standard)
Spec cached at 0.1× from turn 2 · 1-hr TTL mandatory
Phase 3 · Conformance Review
Check output vs spec
Verify implementation against spec criteria. Pattern matching — does the output satisfy the contracts, acceptance criteria, and security constraints? Not novel reasoning. Failures return to Phase 2 with specific deviation notes.
5–10 turns · 15–30 min · Haiku primary / Sonnet for edge cases
Avg regular input: ~3,000 tok/turn
Est. cost: ~$1.50–3 per review session
Why it works: the front-loading principle
The dominant agentic cost driver is regular input from file exploration and growing conversation history. In a standard agentic session, Claude reads 10–15 files to discover scope before work begins — each read compounds into every subsequent turn. Spec-driven development replaces this discovery phase with a single authored document. Claude reads the spec and the specific files it names. Nothing else.
25,000 avg regular input tokens per turn — standard agentic heavy
~13,000 avg regular input tokens per turn — spec-driven execution
−48% reduction in the dominant cost driver from spec-driven execution
The second saving is wrong-direction turns. A standard agentic session at heavy usage accumulates 4–8 wrong-direction turns per session — turns where Claude builds something that gets discarded because a constraint wasn't understood upfront. Wrong-direction correction after 20 turns costs tens of thousands of tokens. A spec surfaces these errors at Phase 1 review, where correction costs ~500 tokens. The spec is Plan Mode applied to the inter-session lifecycle, not just a single session.
Why Opus is justified at Phase 1 — for all developer tiers
The cost model normally gates Opus by developer tier. Spec authoring creates an amortisation justification that applies regardless of tier.
# Opus amortisation — standard developer, standalone, heavy usage

Phase 1: Spec writing session (Opus regional)
  Input: ~4,000 tok/turn × 12 turns = 48,000 tokens
  Output: ~2,000 token spec (written once, cached in Phase 2)
  Est. cost: ~$12.50 per spec session

Phase 2: Standard agentic execution without spec
  Regular input: 25,000 tok/turn × 65 turns = 1,625,000 tokens
  Wrong-direction corrections: ~6 per session (tens of thousands of tokens)
  Monthly cost (heavy): ~$177.93

Phase 2: Spec-driven execution (Sonnet)
  Regular input: ~13,000 tok/turn × 65 turns = 845,000 tokens
  Wrong-direction corrections: ~0–1 per session
  Monthly cost (heavy): ~$155

Break-even: Opus spec overhead ($12.50) vs execution saving ($23/month)
  At 1 spec per 5 execution sessions → net saving ≈ $20.50/month
  At 1 spec per 2 execution sessions → still net positive ≈ $10.25/month

CONCLUSION: Spec-driven Opus is recouped within the first execution session.
What a good spec contains
A spec is not a project overview or a requirements document. It is a set of actionable constraints Claude can follow and verify against. Every line should be something Claude will do differently because it's there.
○ Not a spec — too vague
Build a notification service. It should support email and SMS and be async. Handle errors properly and write tests.
No interface contracts. No queue technology specified. No error-handling pattern. No test coverage requirements. Claude will make assumptions — some wrong — and the execution session will diverge from intent.
✓ A spec — actionable constraints
Service: NotificationService
Interface: send(notification: Notification) → Result[str, NotificationError]
Queue: SQS via @infrastructure/queues.py — do not use Redis
Retry: circuit-breaker in @src/resilience/circuit_breaker.py — max 3 retries, exponential backoff
Channels: EmailProvider, SMSProvider — both must implement ChannelProtocol
Errors: NotificationError(channel, reason, retryable: bool)
Tests: unit per provider + integration test per channel · coverage ≥ 90%
File layout:
src/notifications/service.py
src/notifications/providers/{email,sms}.py
src/notifications/errors.py
tests/notifications/
Interface contract, queue technology, retry pattern, error type, test requirements, file layout — all explicit. Phase 2 execution proceeds from turn 1 without discovery.
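For concreteness, the contracts that spec names could be stubbed in Python roughly as follows. This is an assumed reading of the spec — the field shapes, the `Notification` dataclass, and the provider stub are illustrative, not generated output:

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Notification:
    # Minimal shape for illustration — a real spec would pin this down
    channel: str
    recipient: str
    body: str

@dataclass
class NotificationError:
    # Error type exactly as the spec names it: channel, reason, retryable
    channel: str
    reason: str
    retryable: bool

class ChannelProtocol(Protocol):
    # Both EmailProvider and SMSProvider must satisfy this structural contract
    def send(self, notification: Notification) -> str: ...

class EmailProvider:
    def send(self, notification: Notification) -> str:
        # Stub: a real provider would call the email gateway here
        return f"email-sent:{notification.recipient}"

err = NotificationError(channel="sms", reason="gateway timeout", retryable=True)
ok = EmailProvider().send(Notification("email", "user@example.com", "hi"))
```

Because the spec fixes these names and signatures, Phase 3 conformance review reduces to checking the implementation against stubs like these — pattern matching, which is why Haiku can run it.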
CLAUDE.md rule for spec-driven projects
The spec belongs in a separate project file — not embedded in CLAUDE.md. CLAUDE.md is loaded on every API call for every session. A 600-line spec embedded in CLAUDE.md adds ~9,000 tokens of cache-read cost per call, including calls that have nothing to do with the spec. A tight CLAUDE.md that references the spec file keeps cost low and the spec accessible only when Phase 2 execution loads it explicitly.
Telemetry signal. If CLAUDE.md exceeds 3,000 tokens on a spec-driven project, the spec has been embedded there. Strip it out, save it as SPEC.md or similar in the project root, and reference it from CLAUDE.md with a single line: Current sprint spec: @SPEC.md. This is the most common spec-driven cost failure mode.
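The 3,000-token threshold is easy to check mechanically. A rough sketch using the common ~4-characters-per-token heuristic — the helper names are illustrative, and a real tokenizer would be more accurate:

```python
from pathlib import Path

def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English text and code."""
    return len(text) // 4

def check_claude_md(path: str, limit: int = 3000) -> str:
    tokens = estimate_tokens(Path(path).read_text(encoding="utf-8"))
    if tokens >= limit:
        return f"{tokens} tokens — spec likely embedded; move it to SPEC.md"
    return f"{tokens} tokens — OK"

# A 600-line spec at ~60 chars/line lands around 9,000 tokens,
# consistent with the per-call overhead figure quoted above.
spec_like = ("x" * 60 + "\n") * 600
print(estimate_tokens(spec_like))  # → 9150
```

Running `check_claude_md("CLAUDE.md")` as a pre-commit or CI step makes this failure mode hard to reintroduce.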
Session hygiene rules specific to spec-driven work
Rule
Why
Do not /clear between Phase 1 and Phase 2
The spec is the context that must persist. /clear destroys it. Start Phase 2 immediately after Phase 1 in the same session, or resume the named session.
/compact at 80% context fill during Phase 2
Long execution sessions hit context limits. /compact preserves spec and architecture decisions, drops verbose tool output. Focus: /compact focus on spec compliance and decisions made so far
1-hour TTL — non-negotiable for Phase 2
The spec must survive as warm cached context across the full execution session without re-write cost. Standard agentic TTL guidance applies; spec-driven removes any flexibility here.
Phase 3 can start fresh
Conformance review sessions reference the spec file directly. /clear before Phase 3 is fine — provide the spec path explicitly at the start of the review session.
/clear between unrelated sprints
When a sprint is complete, /rename and /clear. The next spec-driven sprint starts with a fresh Phase 1 session.
Telemetry signals to watch
✓ Healthy spec-driven indicators
Regular input per turn in Phase 2: <15,000 tokens — spec is being used, exploration eliminated
Cache hit rate: >80% in Phase 2 — spec as stable fixed context is warming the cache
CLAUDE.md token count: <3,000 tokens — spec not embedded in CLAUDE.md
Opus usage concentrated in Phase 1 sessions only
Phase 3 sessions using Haiku for ≥70% of turns
○ Warning signals — take action
Regular input >20,000 tok/turn during Phase 2 → Claude is still file-exploring; spec scope not defined tightly enough
CLAUDE.md >3,000 tokens on spec-driven project → spec embedded; strip and externalise
Opus usage during Phase 2 or 3 → policy drift; Phase 2 should be Sonnet
Phase 3 using Sonnet for all turns → Haiku is sufficient for conformance checking
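These thresholds are straightforward to compute from whatever per-turn telemetry you collect (see Factor 12). A sketch, assuming a JSONL log with an `input_tokens` field per turn — the field name and log shape are hypothetical, so adapt them to your exporter:

```python
import json

# Hypothetical telemetry: one JSON object per API turn during Phase 2.
log_lines = [
    '{"phase": "execution", "input_tokens": 12500}',
    '{"phase": "execution", "input_tokens": 13800}',
    '{"phase": "execution", "input_tokens": 21400}',
]

def avg_regular_input(lines):
    """Average regular-input tokens per turn across the logged session."""
    turns = [json.loads(line)["input_tokens"] for line in lines]
    return sum(turns) / len(turns)

avg = avg_regular_input(log_lines)
# Turns above 20,000 tokens suggest Claude is still file-exploring.
flagged = [json.loads(line) for line in log_lines
           if json.loads(line)["input_tokens"] > 20000]
```

Here the average lands at 15,900 — above the <15,000 healthy band — and one turn trips the 20,000-token warning, which per the table above means the spec scope is not tight enough.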
Knowledge Check — Factor 03
Scenario: You've completed Phase 1 and written a 2,000-token spec for a new authentication service. Phase 2 execution is about to begin. Your teammate suggests embedding the spec into CLAUDE.md so it's always available, and then running /clear to start Phase 2 fresh. Are they right?
Show answer
B is correct. Two mistakes in the suggestion: (1) CLAUDE.md loads on every API call in every session. A 2,000-token spec embedded there adds ~$0.007 of cache-read overhead per call across all sessions including those unrelated to this sprint. Keep CLAUDE.md under 200 lines of actionable instructions; save the spec as SPEC.md and reference it. (2) /clear between Phase 1 and 2 destroys the spec context Phase 2 depends on. Continue Phase 2 directly from Phase 1, or /rename the session and resume it. Phase 2 requires the spec to be live in context — not rebuilt from scratch.
Every API call processes the full conversation history to date. Accumulated irrelevant turns add tokens to every subsequent message. The fix is simple: clear aggressively between tasks.
TL;DR
Use /clear between every unrelated task in DevSecOps work. If Claude has made the same mistake twice on the same issue, clear and start fresh — a clean session with a better prompt outperforms a long session with accumulated corrections. For long agentic sessions, use /compact at around 80% context fill instead of clearing.
💰
Cost Impact: High — DevSecOps. Important — Agentic.
A 20-turn session where turns 15–20 are a different task than turns 1–14 carries 14 turns of irrelevant context as overhead on those last 6 messages. Each of those 6 messages pays full input price for context it will never use. Anthropic: "If you've corrected Claude more than twice on the same issue in one session, the context is cluttered with failed approaches. Run /clear and start fresh with a more specific prompt."
15–25% regular input reduction from consistent session hygiene
The two commands
Command
What it does
When to use it
/clear
Completely resets the context window — all conversation history is gone. A truly fresh start.
Between unrelated tasks in DevSecOps work. After two failed corrections on the same issue. Switching to a different project or codebase.
/rename
Names the current session before clearing, so you can resume it later with /resume.
Always run /rename before /clear if you might want to return to the session.
/compact
Summarises the conversation — keeps important code and decisions, removes verbose history. Preserves continuity.
In long agentic sessions when context approaches 70–80% full, or when you want to continue the same thread but reduce noise.
/compact focus on X
Focuses the summary on what matters, e.g. /compact focus on the API changes made so far.
When you want to continue with a specific subset of context from a long session.
/context
Shows what's currently using context — system prompt, files, conversation, tools. Helps you diagnose bloat.
Before deciding whether to compact or clear. After connecting new MCP servers.
DevSecOps — the session boundary rule
In DevSecOps work, every meaningful task boundary is a /clear boundary. The goal is to keep sessions within their intended type — Micro (5 turns), Standard (9 turns), Extended (15 turns).
✗ No hygiene — "session drift"
Turn 1: Review @auth/login.py for SQL injection
Turn 5: Good. Now check the session middleware too
Turn 9: And while we're at it, what about the API rate limiting?
Turn 13: OK that's interesting — can you also look at the Redis config?
Turn 17: One more thing — the password reset flow has a bug...
By turn 17, the context contains auth code, session middleware, rate limiting analysis, Redis configuration, and password reset logic. Each new message pays input cost for all of it.
✓ With hygiene — one task per session
Session 1: Review @auth/login.py for SQL injection → /rename, /clear
Session 2: Review the session middleware at @src/middleware/session.py → /clear
Session 3: Assess the API rate limiting → /clear
Session 4: Check the Redis config → /clear
Session 5: Fix the password reset bug
Each task starts fresh. Turn 1 of each session has only the 22k system context plus the immediate task. Every session stays micro-sized. Total cost is a fraction of the drifted session.
Agentic — when to compact vs clear
Compact — use when continuity matters
# You're 50 turns into building a notification service.
# Context is 75% full. Claude has built the email and SMS
# providers. You still need push notifications.
/compact focus on the notification service structure built so far and the interfaces of the email and SMS providers
Claude preserves the architecture decisions and interfaces, drops verbose tool output and exploration history. You continue with a clean but informed context for the push notification work.
Clear — use when switching direction
# You've finished the notification service.
# Now you're starting an entirely different task:
# building the user preference settings API.
/rename notification-service-complete
/clear
# Fresh session for the new task
Build a user preference settings API following the pattern in @src/api/v2/messages.py...
Nothing from the notification service is relevant to preferences. Keeping it in context adds cost with zero benefit. Clear completely.
The two-correction rule. Anthropic's guidance: if you've corrected Claude on the same mistake twice in a session, the context is cluttered with failed attempts. /clear and restate the task with the corrections already incorporated into the prompt. A clean session with a better prompt almost always produces a better result at lower cost.
Spec-driven — the phase boundary rule
Spec-driven development introduces a different /clear logic. The standard rule — clear between every unrelated task — does not apply between Phase 1 and Phase 2. The spec written in Phase 1 is the context Phase 2 depends on. Clearing it destroys the primary cost benefit of the entire approach.
✗ Incorrect — /clear between Phase 1 and Phase 2
# Phase 1 complete — spec written
/clear
# Phase 2 begins with an empty context window
The spec is gone from live context. Claude re-reads it from disk on every reference — as regular input, not cached context. The primary cost benefit of spec-driven execution is lost. The spec must be live in the context window from Phase 1 through Phase 2.
✓ Correct — continue directly from Phase 1
# Phase 1 complete — spec is live in context
# Do NOT /clear — continue Phase 2 immediately
Now execute against the spec. Start with
src/auth/service.py — implement the AuthService
interface as defined...
The spec stays warm in cached context from Phase 1. Every Phase 2 turn reads it at 0.1× cost. This is the core caching benefit of spec-driven development — do not break it with a /clear.
Phase
/clear rule
Why
Phase 1 → Phase 2
Do not /clear
Spec must remain live in context. Continue directly or resume named session.
Within Phase 2 (80% context)
/compact — not /clear
Preserve spec and architecture decisions. Focus: /compact focus on spec compliance and decisions made
Phase 2 → Phase 3
/clear is fine
Conformance review starts fresh, references spec file directly by path.
Between unrelated sprints
/rename then /clear
New sprint = new Phase 1 = new spec. Old context has no value.
Persistent instructions go in CLAUDE.md, not conversation
If you find yourself re-stating the same instructions every session — "always use our custom logger, never use print() directly" — these belong in CLAUDE.md, not in your prompt. Instructions in CLAUDE.md survive /clear. Instructions in conversation history do not. See Factor 07 — CLAUDE.md for how to structure persistent project instructions. The spec, however, is not CLAUDE.md material — see Factor 03 for why.
Knowledge Check — Factor 04
Scenario: You're in turn 8 of a DevSecOps session. You asked Claude to fix a bug in the payment validator, and it's made the same mistake twice — using float() for currency amounts when your codebase requires Decimal(). What should you do next?
Show answer
B is correct. Two corrections on the same issue is the trigger to clear. The context is now cluttered with two failed attempts. A fresh session with the constraint stated upfront — "all currency amounts must use Decimal, never float" — will be faster, cheaper, and produce a correct result. Also consider adding this to CLAUDE.md so you never need to state it again.
Defaulting to Sonnet for every task is the most common source of unnecessary cost. Haiku handles a large fraction of DevSecOps and spec-driven conformance work at 3× lower price. Effort levels let you tune reasoning depth without changing model.
TL;DR
Use Haiku for routine tasks and spec-driven Phase 3 conformance review (pattern matching, not reasoning). Use Sonnet for code review, security analysis, bug fixes, and spec-driven Phase 2 execution. Use Opus for spec authoring (Phase 1) — justified for all developer tiers through amortisation — and for novel reasoning tasks with senior approval. Use /effort low to reduce thinking depth on routine Sonnet tasks.
💰
Cost Impact: High — DevSecOps. Moderate — Agentic.
Haiku costs $1.10/MTok input vs Sonnet at $3.30/MTok — a 3× difference. In DevSecOps work where 25–30% of interactions are routine tasks, consistently routing these to Haiku produces meaningful monthly savings. Under spec-driven development, Phase 3 conformance review is pattern matching — Haiku handles it at the same quality as Sonnet at 3× lower cost, raising the blended Haiku share from 15% toward 20%.
Dependency version lookups and compatibility checks
Haiku
Factual retrieval
Boilerplate and scaffold generation
Haiku
Template filling, no design decisions
GitLab issue summarisation
Haiku
Text processing
Spec-driven Phase 3 — conformance review
Haiku
Pattern matching against defined spec criteria — not novel reasoning. Haiku handles it at the same quality at 3× lower cost
Code review and security analysis
Sonnet
Requires understanding of intent and edge cases
Bug fix analysis and implementation
Sonnet
Reasoning about cause, effect, and constraints
Test generation (non-trivial coverage)
Sonnet
Understanding behaviour under failure modes
Multi-file feature implementation
Sonnet
Sustained multi-turn reasoning
Spec-driven Phase 2 — execution
Sonnet
Implementation against spec. Haiku eligible for simple bounded turns within spec scope
IRAP / compliance gap analysis
Opus (senior)
Complex multi-control reasoning; error consequences are high
Novel exploit chain assessment
Opus (senior)
ARC-AGI-2 advantage: 68.8% Opus vs 58.3% Sonnet
Architectural design for greenfield service
Opus (senior)
Single planning session prevents many wrong-direction turns
Spec-driven Phase 1 — spec authoring
Opus (all tiers)
One Opus session amortised across 40–65 Sonnet execution turns. ARC-AGI-2 advantage most material for interface design and scope decisions. See Factor 03.
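The routing table above can be sketched as a simple lookup. This is a hedged illustration only: the category names and the route_model helper are inventions for this sketch, not Claude Code configuration; in practice you choose the model yourself with /model at the prompt.

```python
# Illustrative model router following the table above. Category names
# and this helper are assumptions for the sketch, not a Claude Code API.

ROUTES = {
    # Haiku: retrieval, templating, pattern matching
    "dependency_lookup": "haiku",
    "boilerplate": "haiku",
    "issue_summary": "haiku",
    "phase3_conformance": "haiku",
    # Sonnet: reasoning about intent, cause and constraints
    "code_review": "sonnet",
    "bug_fix": "sonnet",
    "test_generation": "sonnet",
    "multi_file_feature": "sonnet",
    "phase2_execution": "sonnet",
    # Opus: novel reasoning, senior approval (Phase 1 open to all tiers)
    "compliance_gap_analysis": "opus",
    "exploit_chain_assessment": "opus",
    "greenfield_architecture": "opus",
    "phase1_spec_authoring": "opus",
}

def route_model(task_category: str) -> str:
    # Default to Sonnet when a task doesn't match a known category.
    return ROUTES.get(task_category, "sonnet")
```

The default-to-Sonnet fallback mirrors the guide's advice: Haiku only when the task is clearly routine, Opus only with approval.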
Switching models mid-session
Switch models at any time during a session with the /model command. The conversation history carries over, but new API calls use the new model's pricing.
# Start a session in Haiku for triage work
/model haiku
Tell me why pipeline #4821 test-unit job is failing based on this log: [paste log]

# Issue identified — switch to Sonnet to implement the fix
/model sonnet
Fix the ImportError in @tests/test_cache.py — the TokenCache class was renamed to CacheToken in the last commit
Effort levels
Effort levels control how deeply Claude reasons before responding — separate from which model you're using. Sonnet 4.6 and Opus 4.6 support four levels:
/effort max: the most complex reasoning tasks. Senior-approved Opus sessions only.
Knowledge Check — Factor 05
Scenario: You're about to start a session that will generate docstrings for 30 functions across 5 files. Which setup minimises cost?
Show answer
B is correct. Docstring generation is pattern completion — no deep reasoning needed. Haiku handles it well at 3× less cost than Sonnet. Low effort eliminates the thinking block overhead. Starting with /model haiku and /effort low before the first prompt sets both optimisations for the whole session.
Subagents are lightweight Claude instances spawned within your session for specific sub-tasks. Used correctly, they reduce your main session's context overhead by offloading research, verification, and low-intelligence work to cheaper models.
TL;DR
Use subagents by asking Claude directly: "Use a subagent to investigate X." Route research, verification, and routine sub-tasks to Haiku subagents to avoid polluting the main session's context. Subagents are enabled by default — no special configuration needed.
💰
Cost Impact: High — Agentic.
Without subagents, all investigation output — file reads, shell command results, exploration — accumulates in the main session context and is re-sent on every subsequent API call. A subagent handles the investigation and returns only its summary to the main session. Main session input stays clean.
Reduces main session regular input by 20–40% on investigation-heavy tasks
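The arithmetic behind that reduction can be sketched as follows. All figures here are assumptions for illustration, not measured values: 12k tokens of investigation output, 20 remaining turns, and the Sonnet regional input rate quoted in this guide.

```python
# Why offloading investigation to a subagent pays off. The token and
# turn counts below are assumptions for this sketch.

SONNET_INPUT_PER_TOK = 3.30 / 1_000_000  # Sonnet regional input rate

investigation_tokens = 12_000   # file reads, shell output, exploration
remaining_turns = 20            # later turns that would re-send that output
subagent_summary_tokens = 400   # what a subagent returns instead

# Without a subagent: the full output rides along on every later call.
without = investigation_tokens * remaining_turns * SONNET_INPUT_PER_TOK
# With a subagent: only the short summary is re-sent.
with_sub = subagent_summary_tokens * remaining_turns * SONNET_INPUT_PER_TOK

saving = without - with_sub  # ≈ $0.77 under these assumptions
```

The subagent's own cost (a Haiku instance reading the files once) is small against the repeated Sonnet re-sends it avoids.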
How to invoke subagents
You don't need special syntax. Ask Claude naturally, specifying the model if you want cost control:
# Research task — route to Haiku subagent
Use a Haiku subagent to check whether our Redis client version (redis-py 4.5.4) is compatible with Redis 7.2. Return just the compatibility verdict and any breaking changes.

# Verification task — Haiku subagent
Use a subagent to verify that all test files in @tests/ import from the correct module paths after the refactor. Return a list of any files with broken imports.

# Complex investigation — let Claude choose the model
Use a subagent to investigate why the auth middleware is adding 200ms latency. Check the middleware chain in @src/middleware/ and return the most likely cause.
Good subagent use cases
✓ Route to subagents
Library/version compatibility checks
Scanning for a pattern across many files
Verifying test coverage for a specific module
Checking whether a dependency is already present
Summarising a large file or log output
Running shell commands and returning results
○ Keep in main session
Any work that needs the full conversation context
Implementation decisions that depend on prior turns
Subagents vs Agent Teams. Subagents work within your session and are always available. Agent Teams spawn entirely separate parallel sessions and require explicit approval — they're covered in Factor 10. If you're not sure which you need, use a subagent.
Knowledge Check — Factor 06
Scenario: You're implementing a new feature and need to know whether three specific third-party libraries are already in your requirements.txt, and if so, what versions. What's the most cost-effective approach?
Show answer
B is correct — but C is nearly as good. B is optimal because the Haiku subagent reads the file and returns only the specific answer, keeping main session context clean and using the cheapest model. C (reading it yourself and pasting) is also excellent — you've offloaded the file read and added only the relevant lines to context. A is worse because Claude reads the full file into main session context. D is wrong — Plan Mode is for code changes, not simple file lookups.
CLAUDE.md is loaded into every session automatically. That's powerful — but it means every token in it is billed on every API call. Keep it sharp: instructions Claude follows, not background context Claude reads.
TL;DR
Keep CLAUDE.md under 200 lines. Write only actionable instructions — things Claude will actually do differently because they're there. Explanatory background, historical context, and architecture decisions don't belong here; they cost tokens on every call without influencing output. Configure .claudeignore before your first session on any brownfield project.
💰
Cost Impact: Medium-High — All Patterns.
Every token in CLAUDE.md is included in the cached system context, which means it is billed (at the cache-read price) on every single API call in every session on that project. A bloated 600-line CLAUDE.md padded with explanatory prose adds ~9,000 tokens of cache-read cost to every call yet produces no better output than a focused 150-line CLAUDE.md.
Target: under 200 lines · 100% actionable · zero narrative prose
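A rough sketch of what that bloat costs, using the Sonnet regional cache-read rate quoted elsewhere in this guide; the monthly call volume is an assumption for illustration.

```python
# Per-call and monthly cost of CLAUDE.md bloat at the cache-read rate
# (Sonnet regional $3.30/MTok × 0.1). Call volume is an assumption.

CACHE_READ_PER_TOK = 3.30 * 0.1 / 1_000_000

bloated = 9_000   # ~600-line CLAUDE.md with narrative prose
lean = 1_500      # ~150-line actionable-only CLAUDE.md

per_call_waste = (bloated - lean) * CACHE_READ_PER_TOK
calls_per_month = 2_000  # assumption: a moderately active developer

monthly_waste = per_call_waste * calls_per_month  # ≈ $4.95/month
```

Small per call, but it is paid on every call of every session on the project, which is why trimming CLAUDE.md is a one-off fix with a recurring payoff.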
What belongs in CLAUDE.md
✓ Include
Language and framework versions in use
Coding conventions Claude must follow (e.g. "use Decimal for all currency")
Security rules (e.g. "never log PII, never use raw string queries")
Test patterns and required coverage level
Key file paths for common reference
Forbidden patterns that Claude should refuse to implement
Build and run commands
○ Do not include
Why you chose a technology (history/rationale)
Architecture evolution narrative
Git branching strategy explanations
Team onboarding context
Long lists of "things to know about the project"
Content from other docs pasted in wholesale
CLAUDE.md template — secd3v projects
# Project: [service-name]
# Environment: ap-southeast-2 · IRAP Protected

## Stack
Python 3.12 · Django 5.1 · PostgreSQL 16 · Redis 7.2 · Celery 5.4
AWS: ECS Fargate, RDS, ElastiCache, SQS, S3

## Critical rules — follow always
- All currency: use Decimal, never float
- All logging: use src/logging/structured.py — never print()
- All API responses: use ResponseWrapper from src/api/base.py
- No raw SQL string interpolation — parameterised queries only
- Never log: passwords, tokens, PII, credit card data
- Migrations: never auto-generate — write manually and review

## Test requirements
- pytest · fixtures in tests/conftest.py
- Coverage must not drop below 85%
- Security-related functions: always add failure-mode tests

## Key paths
src/api/base.py — base views and ResponseWrapper
src/auth/ — all authentication logic
src/logging/ — structured logging utilities
infrastructure/ — IaC (Terraform) — do not modify in Claude sessions

## Forbidden
- Do not modify: migrations/, infrastructure/, .env files
- Do not use: requests library (use httpx), time.sleep() (use asyncio)
Spec-driven projects — the critical bloat failure mode
When using spec-driven development, the most common CLAUDE.md failure is embedding the spec directly into it. A 2,000-token spec embedded in CLAUDE.md adds that cost to every API call in every session — including sessions that have nothing to do with the current sprint. The spec should live as a separate file loaded only during Phase 2 execution.
✗ Wrong — spec embedded in CLAUDE.md
# CLAUDE.md — 350 lines
## Project rules...
## Current sprint spec:
Service: NotificationService
Interface: send(notification: Notification)...
[300 more lines of spec content]
Every API call in every session — including Phase 3 conformance review, unrelated DevSecOps sessions, CI automation — pays cache-read cost for 2,000+ tokens of spec content. At Sonnet rates: ~$0.007 extra per call, ~$5–8/month unnecessary cost per developer.
✓ Correct — spec as separate file
# CLAUDE.md — 45 lines
## Project rules...
## Current sprint spec: @SPEC.md
# SPEC.md — separate file, 300 lines
Service: NotificationService
Interface: send(notification: Notification)...
[300 lines of spec content]
CLAUDE.md stays under 200 lines. SPEC.md loads on demand during Phase 2 execution. Phase 3 and unrelated sessions don't pay for spec content they don't use.
Telemetry signal: CLAUDE.md above 3,000 tokens on a spec-driven project. Run /context and check the CLAUDE.md line. If it's above 3,000 tokens, spec content has been embedded. Strip it, save as SPEC.md in the repo root, and replace with a single reference line in CLAUDE.md. Target: CLAUDE.md under 200 lines regardless of spec-driven status.
.claudeignore — essential for Brownfield
Without .claudeignore, Claude can read and index any file in your repository when prompted broadly. In Brownfield repos with years of accumulated build artifacts, this can exhaust your session budget on a single general query.
# .claudeignore — place in repository root

# Build outputs
dist/
build/
*.egg-info/
__pycache__/
*.pyc
.pytest_cache/

# Dependencies
node_modules/
vendor/
.venv/
env/

# Generated files
*.min.js
*.min.css
coverage/
.coverage
htmlcov/

# Binaries and large files
*.pdf
*.png
*.jpg
*.zip
*.tar.gz
migrations/   # if auto-generated

# IDE and OS
.idea/
.vscode/
.DS_Store
Brownfield repositories are especially dangerous. Years of accumulated build artifacts, test output, generated migration files, and cached data can easily total millions of tokens. A prompt like "what's in the project?" without .claudeignore can trigger a scan that exhausts a session budget before producing a single useful response. Configure .claudeignore on day one of every brownfield project.
Knowledge Check — Factor 07
Scenario: Your CLAUDE.md currently includes a 400-word explanation of why your team chose Django over FastAPI, a description of your Git branching strategy, a detailed history of the project's architecture evolution, and the full 2,000-token spec for the current sprint. Your team is hitting high token costs. What should you prioritise fixing first?
Show answer
C is correct. The 2,000-token spec is the highest-impact fix — it adds cost to every API call including unrelated sessions. Extract it to SPEC.md immediately and replace with a single reference line. Then remove the Django rationale, branching strategy, and architecture history — none of these are actionable instructions Claude follows differently on any specific task. If architecture decisions are expressed as rules (e.g. "always use ResponseWrapper") they belong; if they're background reading, they don't.
With 1-hour TTL caching, the platform can share a cache write across multiple sessions on the same project within an hour. Your job as a developer is to work in a way that lets this happen.
TL;DR
Group related DevSecOps sessions on the same project into focused 30–60 minute working blocks. Keep your CLAUDE.md stable during these blocks — changes invalidate the cache. Sessions 2 and 3 on the same project within an hour benefit from the cache established by session 1, saving the cost of a fresh write.
💰
Cost Impact: High Structural — DevSecOps.
The 22,000-token system context (31,000 with GitLab MCP) is written to cache at the start of each session at 2.0× input price. With 1-hr TTL, the service routes subsequent sessions on the same project to the same cache state — sessions 2 and 3 read at 0.1× price instead of re-writing at 2.0×. This saves $6–14 per developer per month (Sonnet regional).
$6–14/dev/month savings from cross-session sharing (Sonnet)
Cache economics at a glance
$0.145 — cost of one 22k cache write (Sonnet regional, 1-hr TTL)
$0.007 — cost of one 22k cache read (Sonnet regional), 95% cheaper
3 sessions grouped in 60 min: 1 write + 2 reads = $0.159 vs $0.435 for 3 separate writes
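The figures above can be re-derived from the rates used throughout this guide: Sonnet regional input $3.30/MTok, 1-hr cache write billed at 2.0×, cache read at 0.1×, 22k-token fixed context.

```python
# Cache economics for a 22k-token fixed context at Sonnet regional rates.

INPUT_PER_MTOK = 3.30
FIXED_CONTEXT = 22_000

write_cost = FIXED_CONTEXT * INPUT_PER_MTOK * 2.0 / 1_000_000  # ≈ $0.145
read_cost = FIXED_CONTEXT * INPUT_PER_MTOK * 0.1 / 1_000_000   # ≈ $0.007

grouped = write_cost + 2 * read_cost   # 3 sessions within the hour, ≈ $0.16
scattered = 3 * write_cost             # 3 cold starts, ≈ $0.44
```

Grouping turns two of the three cache writes into reads, which is where the roughly $0.28 difference per block comes from.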
How to group sessions in practice
1
Plan your morning DevSecOps block. Identify which project you'll work on for the first hour. MR reviews, security checks, and bug triage on the same repository are natural groupings.
2
Start your first session. This writes the cache for the project. The 1-hour TTL clock starts now.
3
Use /clear between tasks, not between projects. /clear resets conversation history but does not invalidate the prompt cache. Same project, same CLAUDE.md = cache remains warm.
4
Keep CLAUDE.md stable during the block. Editing CLAUDE.md changes the cached content and requires a new write. If you need to update it, do so outside your working block.
5
The TTL resets on every cache hit. Actively working sessions stay warm automatically. A 3-hour focused session on one project writes the cache once and benefits from cheap reads throughout.
✗ Scattered sessions — multiple cache writes
9:00 AM — Review MR !231 on project-alpha [cache write: $0.145]
9:20 AM — Check a config in project-beta [cache write: $0.145]
9:40 AM — Quick fix in project-alpha [cache write: $0.145 — alpha cache expired]
10:00 AM — Back to MR review on project-alpha [cache write: $0.145]
10:20 AM — Audit issue in project-gamma [cache write: $0.145]
5 cache writes = $0.725
Jumping between projects prevents the cache from warming. Each return to a project requires a fresh write because the previous session's TTL has expired during the detour.
✓ Grouped sessions — cache shared
9:00 AM — Review MR !231 on project-alpha [cache write: $0.145]
9:20 AM — /clear, then: fix auth bug, project-alpha [cache READ: $0.007]
9:40 AM — /clear, then: security audit, project-alpha [cache READ: $0.007]
10:00 AM — /clear, then: final MR check, project-alpha [cache READ: $0.007]
10:20 AM — Move to project-beta [cache write: $0.145]
2 cache writes + 3 reads = $0.311 vs $0.725
Grouping project-alpha work into a single focused block means 4 sessions share one cache write. Total cache cost is less than half of the scattered approach.
You don't need to do anything special to enable this. The platform handles cache routing. Your job is to create the conditions: group related sessions on the same project, keep CLAUDE.md stable during your working block, and don't unnecessarily jump between projects mid-hour.
Knowledge Check — Factor 08
Scenario: You've been working on project-alpha for 45 minutes — 3 sessions, all within the 1-hour TTL window. You've just used /clear to start session 4. You then remember you need to update one line in CLAUDE.md. Should you update it now?
Show answer
B is correct. Editing CLAUDE.md changes the cached content hash, forcing a new cache write at the start of the next session. With 15 minutes remaining in your block, you'd lose the benefit of the warm cache. State the new constraint in your current prompt, then update CLAUDE.md at the end of your block. That way session 4 continues on the warm cache.
MCP tool definitions used to load entirely at session start, adding thousands of tokens of overhead before you typed a word. Tool Search changed this — now only names load upfront, and full schemas load on demand.
TL;DR
Tool Search is enabled by default in recent Claude Code versions (v2.1+) and significantly reduces MCP context overhead. Verify it's active with /mcp — you should see "deferred" status for tool definitions. Connect GitLab MCP only for sessions where you'll actually use it, and disconnect for pure coding sessions where you won't.
💰
Cost Impact: Medium — GitLab MCP users.
"Tool search keeps MCP context usage low by deferring tool definitions until Claude needs them. Only tool names load at session start, so adding more MCP servers has minimal impact on your context window." Previously, GitLab MCP's ~35 tool definitions added approximately 9,000 tokens at session start regardless of whether GitLab was used. With Tool Search, only tool names (~500 tokens) load upfront.
Saves $8–14/dev/month in mixed workflows (Sonnet) vs eager loading
Before and after Tool Search
Before Tool Search (eager loading)
# Session starts — GitLab MCP connected
# Context window at session start:
System prompt: 3,900 tokens
Tool definitions: 16,600 tokens
GitLab MCP schemas: 9,000 tokens ← loaded regardless
CLAUDE.md: 1,500 tokens
─────────────────────────────────────
Total fixed context: 31,000 tokens
# Even if you never use a GitLab tool in this session,
# you pay 9,000 tokens overhead on every API call.
With Tool Search (deferred — default v2.1+)
# Session starts — GitLab MCP connected
# Context window at session start:
System prompt: 3,900 tokens
Tool definitions: 16,600 tokens
GitLab MCP tool names: 500 tokens ← names only
CLAUDE.md: 1,500 tokens
─────────────────────────────────────
Total fixed context: 22,500 tokens
# GitLab tool schemas load only when Claude
# needs to call a specific GitLab tool.
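The per-call saving from deferral can be estimated from the two fixed-context totals above, at the Sonnet regional cache-read rate used elsewhere in this guide.

```python
# Per-call saving from Tool Search deferring GitLab MCP schemas,
# using the fixed-context totals shown above.

CACHE_READ_PER_TOK = 3.30 * 0.1 / 1_000_000

eager = 31_000      # GitLab MCP schemas loaded up front
deferred = 22_500   # tool names only

# 8,500 fewer cached tokens on every API call of the session.
per_call_saving = (eager - deferred) * CACHE_READ_PER_TOK
```

Under $0.01 per call, but multiplied across every call of every mixed-workflow session; sessions that do call GitLab tools load only the schemas they actually use.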
Verify Tool Search is active
# Run this at the start of any session with MCP connected
/mcp

# Good — Tool Search active:
GitLab MCP: connected
Status: deferred (tool search enabled)
Loaded: 0/35 tool schemas

# Warning — eager loading (older config or Tool Search disabled):
GitLab MCP: connected
Status: loaded
Loaded: 35/35 tool schemas [9,247 tokens]
Connect GitLab MCP. Tool Search loads schemas only for tools you actually call.
Mixed session — some coding, occasional GitLab checks
Connect GitLab MCP. With Tool Search active, schemas only load when needed.
Multiple MCP servers connected that you rarely use
Disconnect servers you don't need. Each server adds tool names at session start even with deferred loading.
# Add a server when needed
claude mcp add --transport http GitLab https://<your-instance>/api/v4/mcp

# List connected servers
/mcp

# Remove a server when done
claude mcp remove GitLab
Watch for large diff outputs. A GitLab MR diff on a large feature branch can return 30,000–50,000 tokens. Ask Claude to summarise the diff rather than loading it entirely: "Use GitLab MCP to get MR !847's diff, but only read and summarise the changes in src/auth/ — skip other directories."
Knowledge Check — Factor 09
Scenario: You're starting a 2-hour coding session to refactor the payment module. You have GitLab MCP connected from yesterday's MR review work. You won't need any GitLab operations today — just local file editing and testing. What should you do?
Show answer
B is correct. Tool Search significantly reduces (but doesn't eliminate) MCP overhead. Tool names still load at session start, and Claude may occasionally initiate tool discovery calls during complex reasoning. For a 2-hour pure coding session with no GitLab operations planned, disconnecting is the right call.
Agent Teams are Claude Code's most powerful capability and its most expensive. They multiply your token consumption approximately 7× per additional team member. Misuse can turn a $178/month developer into a $1,200+/month developer overnight. For further information, refer to the Claude Code docs.
TL;DR
Agent Teams are disabled by default and require explicit enabling. Use them only for genuinely parallelisable work — independent feature branches, concurrent test generation across separate modules. Never use them for tasks that can be done serially. Clean up active teammates immediately when done; idle teammates continue consuming tokens.
Anthropic: "Agent teams use approximately 7× more tokens than standard sessions when teammates run in Plan Mode, because each teammate maintains its own context window and runs as a separate Claude instance." A heavy agentic developer running Agent Teams daily moves from ~$178/month to ~$1,200+/month. Agent Teams are disabled by default and require explicit service-layer approval.
~7× token multiplier per team member · requires explicit approval
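The monthly arithmetic behind that jump, using the heavy-agentic baseline from the cost reference table at the end of this guide; treating the entire workload as moving onto a team is a worst-case assumption for illustration.

```python
# Worst-case monthly impact of the ~7× Agent Team multiplier.
# Baseline: heavy agentic developer, Sonnet standalone ($177.93/month,
# from the monthly cost reference table). Moving 100% of the workload
# onto a team is an assumption, not a measured scenario.

baseline = 177.93
multiplier = 7  # approx. token multiplier per teammate in Plan Mode

worst_case = baseline * multiplier  # roughly $1,250/month,
                                    # consistent with the $1,200+ figure above
```

Real teams rarely run 100% of the time, which is why the guide's figure is stated as "$1,200+" rather than a fixed number; the point is the order of magnitude.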
Single Agent vs Subagent vs Agent Team
Single Agent
Subagent
Agent Team
Context windows
1
2 (main + subagent)
1 per team member
Token multiplier
1×
~1.2–1.5×
~7× per member in Plan Mode
Enabled by default
Yes
Yes
No — requires explicit flag
Good for
All work
Research, verification, cost routing
Genuinely parallel independent tasks only
Service approval needed
No
No
Yes
When Agent Teams are appropriate
✓ Good use cases
Building 3 independent microservices simultaneously
Generating test suites for separate, unrelated modules in parallel
Running concurrent documentation generation across disconnected packages
Parallel migration scripts for independent database tables
Any work where tasks are truly independent and have no shared state
○ Wrong use cases — use serial instead
Sequential tasks disguised as parallel ones
Tasks that share files or modules
Work where output of task A feeds input of task B
Feature implementation (use Plan Mode + Sonnet instead)
Any task a single focused session could handle
Clean up immediately when done. Active Agent Team members continue consuming tokens even when idle between instructions. Always explicitly dismiss teammates when their task is complete. Leaving a 3-member team active overnight is a significant unintended cost event.
Knowledge Check — Factor 10
Scenario: You need to implement a new authentication service and write a comprehensive test suite for it. The tests depend on the implementation being complete. Should you use an Agent Team?
Show answer
B is correct. The tests depend on the implementation — they're inherently sequential, not parallel. An Agent Team would waste the 7× multiplier on two instances that can't actually work simultaneously. The right approach: single session, Plan Mode to design the service, implementation, then test generation. This is exactly what Plan Mode + Sonnet is designed for.
Extended thinking is enabled by default and significantly improves performance on complex reasoning tasks. It's also invisible — you don't see the thinking tokens, but you pay for them. Unconstrained, it can add substantial cost to sessions that don't need deep reasoning. For further information, see the Claude Code docs.
TL;DR
Use /effort low for routine DevSecOps tasks — documentation, formatting, simple fixes, pipeline triage. Use /effort high for security analysis, architectural decisions, novel problems where Claude needs to think carefully. The default (medium) is appropriate for most coding work. Never leave effort at high or max for entire sessions of routine work.
💰
Cost Impact: Significant when unmanaged — especially on high-volume simple tasks.
Anthropic: "Extended thinking is enabled by default because it significantly improves performance on complex planning and reasoning tasks. Thinking tokens are billed as output tokens." At Sonnet 4.6 regional rates, a 4,000-token thinking block costs $0.066 per call. In a 25-turn medium session, unconstrained thinking can add $1.65 to session cost on work that didn't need it.
4k thinking block at Sonnet: $0.066/call · 25-turn session: up to +$1.65
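Re-deriving the figures above; the $16.50/MTok output rate is inferred from the $0.066 figure in this guide and should be treated as an assumption.

```python
# Cost of unconstrained extended thinking. Thinking tokens bill as
# output tokens; the output rate here is inferred from this guide's
# $0.066-per-block figure, not an official price list.

OUTPUT_PER_TOK = 16.50 / 1_000_000

thinking_block = 4_000                       # tokens per thinking block
per_call = thinking_block * OUTPUT_PER_TOK   # ≈ $0.066 per call
session = per_call * 25                      # 25-turn session: ≈ $1.65
```

That $1.65 is pure overhead when the turns are routine, which is the case /effort low exists for.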
Controlling extended thinking
Method
Effect
When to use
/effort low
Reduces thinking depth to minimum. Claude still reasons but briefly.
Automation pipelines where deep reasoning is never needed
A practical session strategy
# Start a DevSecOps session — most tasks today are routine
/effort low
Add Google-style docstrings to the 4 public methods in @src/api/client.py
# → Low effort. Fast, cheap, correct.

Fix the typo on line 44 of @src/utils/validator.py: 'paramter' → 'parameter'
# → Low effort. Trivial. Done.

# Now a genuinely complex task
/effort high
Review @src/auth/oauth.py for security vulnerabilities. We're integrating with a third-party OAuth provider that supports token exchange. Identify any flows that could allow token substitution or replay attacks.
# → High effort. Deep reasoning justified.

# After the analysis, return to low effort for the rest of the session
/effort low
Extended thinking vs Opus for seniors
Sonnet + high effort — when:
The problem is complex but well-defined
You have enough context to specify constraints clearly
The task is within Sonnet's 79.6% SWE-bench capability
Opus (senior approval) — when:
Security threat modelling with ambiguous attack surfaces
Compliance analysis where multiple control interpretations exist
Sonnet has already failed twice at high effort
Knowledge Check — Factor 11
Scenario: You're about to start a session that will generate docstrings for 30 functions across 5 files. Which setup minimises cost?
Show answer
B is correct. Docstring generation is pattern completion — no deep reasoning needed. Haiku handles it well at 3× less cost than Sonnet. Low effort eliminates the thinking block overhead. Starting with /model haiku and /effort low before the first prompt sets both optimisations for the whole session. Option D (Sonnet + high effort) would be the most expensive possible choice for the task with no quality benefit.
Without instrumentation, cost optimisation is guesswork. The Claude Code service and CLI give you several tools to see exactly what's consuming tokens before you try to fix it.
TL;DR
Run /context at the start of any session where you suspect bloat. Run /cost at the end of sessions to understand total usage. Check the platform dashboard monthly for team-level patterns. If your cache hit rate is under 70%, your DevSecOps regular input per turn is over 15,000 tokens, or your CLAUDE.md exceeds 3,000 tokens on a spec-driven project — something is wrong and fixable.
📊
Cost Impact: Foundation — Enables All Other Optimisations.
You can't fix what you can't measure. Token telemetry tells you whether your CLAUDE.md is too large, whether your sessions are drifting, whether your cache is actually warm, whether a spec-driven project has embedded spec in CLAUDE.md, and whether a particular developer is consuming 10× more than peers. Without these signals, optimisation is guesswork.
In-session commands
Command
What it shows
When to use
/context
Full breakdown of what's in context: system prompt, CLAUDE.md, files, conversation history, tool definitions. Shows token counts per component.
At session start when cost seems high. After connecting MCP servers. When debugging unexpected token usage. At Phase 2 start to verify spec is loaded.
/cost
Total cost for the current session: input tokens, output tokens, cache writes, cache reads, estimated cost.
At the end of any session to understand where money went. After Agent Team sessions to verify the multiplier. After Phase 1 spec sessions to confirm Opus cost.
/tokens
Current token count and how close you are to the context limit.
Before deciding whether to /compact or /clear. In long agentic and spec-driven Phase 2 sessions to track growth.
Key health metrics — all patterns
>70% — target cache hit rate; below this means sessions are too scattered or CLAUDE.md is changing too often
<15k — target regular input tokens per turn in DevSecOps and spec-driven Phase 2; above this suggests session drift or broad prompting
<50% — cache hit rate requiring action: review session grouping and CLAUDE.md stability
Additional health metrics — spec-driven agentic
>80% — target cache hit rate in Phase 2 execution; the spec as stable fixed context should lift the hit rate above the standard 70% target
>18k — regular input per turn in Phase 2 indicates the spec is not being used and Claude is still file-exploring; review spec scope and the Phase 2 prompt
>3k — CLAUDE.md tokens on a spec-driven project means the spec has been embedded in CLAUDE.md; strip it to SPEC.md immediately
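The thresholds above can be applied mechanically, as in this sketch. The check_health helper and its inputs are inventions for illustration; nothing here is a Claude Code API, and the values are read manually from /context and /cost output.

```python
# Sketch: evaluate the health thresholds from this section against
# numbers read off /context and /cost. Helper name and inputs are
# assumptions for illustration only.

def check_health(cache_hit_rate: float, input_per_turn: int,
                 claude_md_tokens: int, spec_driven: bool) -> list[str]:
    warnings = []
    if cache_hit_rate < 0.50:
        warnings.append("cache hit <50%: action required; review grouping")
    elif cache_hit_rate < 0.70:
        warnings.append("cache hit <70%: sessions scattered or CLAUDE.md unstable")
    if input_per_turn > 15_000:
        warnings.append("input/turn >15k: session drift or broad prompting")
    if spec_driven and claude_md_tokens > 3_000:
        warnings.append("CLAUDE.md >3k on spec-driven project: spec embedded")
    return warnings
```

Fed the Knowledge Check figures below (45% hit rate, 38k input/turn, 8.2k CLAUDE.md), this flags all three problems at once.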
Reading a /context breakdown
/context

# Example output — healthy DevSecOps session
System prompt: 3,920 tokens [cached]
Tool definitions: 16,580 tokens [cached]
CLAUDE.md: 1,480 tokens [cached]
GitLab MCP names: 490 tokens [cached]
─────────────────────────────────────────────
Fixed context: 22,470 tokens [all cached — warm]
Conversation (turns 1–6):
Turn 1 user: 380 tokens
Turn 1 assistant: 890 tokens
Turn 2 user: 210 tokens
Turn 2 assistant: 1,440 tokens
...
Regular input: 8,430 tokens [not cached — billed at full rate]
─────────────────────────────────────────────
Total this call: 30,900 tokens

# Warning signs — all patterns:
CLAUDE.md: 9,800 tokens ← bloated — review and trim
Regular input: 42,000 tokens ← session drift — consider /clear
GitLab MCP schemas: 8,700 tokens ← Tool Search not active

# Warning signs — spec-driven Phase 2 specifically:
CLAUDE.md: 4,200 tokens ← likely contains embedded spec — extract to SPEC.md
Regular input: 18,000 tokens ← spec not being used; Claude still file-exploring
SPEC.md: 0 tokens ← spec not loaded in context; Phase 2 running without it

# Healthy spec-driven Phase 2 context:
CLAUDE.md: 850 tokens ← tight, actionable only
SPEC.md: 2,100 tokens [cached from Phase 1]
Regular input: 11,400 tokens ← file exploration eliminated; within target
Monthly review cadence. Check your admin dashboard monthly. Look for: developers with consistently high regular input per turn (broad prompting), low cache hit rates (session grouping problem), unexpected Agent Team costs, or CLAUDE.md above 3,000 tokens on spec-driven projects. One developer paying 3× peers on the same work type almost always has a fixable behaviour driving the gap.
Knowledge Check — Factor 12
Scenario: You run /context at turn 5 of a DevSecOps session and see: CLAUDE.md = 8,200 tokens, Regular input = 38,000 tokens, Cache hit rate = 45%. What are the two most urgent issues to fix?
Show answer
B is correct. Two clear signals: CLAUDE.md at 8,200 tokens is 5× the efficient target — that's cached prose and history, not actionable instructions. Regular input at 38,000 tokens at turn 5 means the session has already drifted across multiple unrelated topics. The 45% cache hit rate confirms the session pattern is broken. Immediate actions: end this session, trim CLAUDE.md to under 1,500 tokens of actionable instructions, then restart with a single-task specific prompt.
Monthly cost reference (Sonnet 4.6 standard developer)
Pattern
Use case
Light / mo
Medium / mo
Heavy / mo
DevSecOps
Standalone
$45.84
$61.72
$87.09
DevSecOps
+ GitLab MCP
$60.86
$79.37
$111.46
Agentic
Standalone
$44.04
$69.23
$177.93
Agentic
+ GitLab MCP
$54.04
$77.68
$190.68
Spec-Driven
Standalone (est.)
~$40
~$60
~$155
Spec-Driven
+ GitLab MCP (est.)
~$49
~$68
~$167
Spec-driven estimates include Phase 1 Opus overhead (1 spec per 5 sessions), Phase 3 Haiku conformance, and ~48% regular input reduction in Phase 2. Marked est. — actual results depend on spec quality and sprint cadence.
When Opus approval is warranted
Task
Pattern
Approved?
Why
Spec authoring — greenfield service or controlled migration sprint
Spec-Driven
Yes — all tiers
Phase 1 only. One Opus session amortised across 40–65 Sonnet execution turns. ARC-AGI-2 advantage most material for interface design and scope decisions. Standard developers eligible.