A Guide to Claude Code 2.0 and Getting Better at Using Coding Agents¶

English Summary¶

Sankalp, a heavy Claude Code user and developer advocate, delivers the most comprehensive deep-dive into Claude Code 2.0 to date—not merely explaining how to use it, but revealing why it works through reverse-engineered prompts, tool schemas, and context engineering fundamentals. The guide emerges from his lived journey: Claude Code (June-Sept 2025) → Codex exodus during Anthropic's outages → Opus 4.5 redemption arc (Nov 2025), ultimately choosing Claude for its speed, communication quality, and "soul" despite Codex's superior raw capability.

Core Thesis: Mastering coding agents requires understanding the scaffolding beneath—tool calling mechanics, context windows as limited "attention budgets," agentic sub-agent orchestration. Learning Claude Code's architecture transfers directly to Codex, OpenCode, Cursor because they all drew inspiration from its harness design.

Key Technical Insights:

Sub-agents as Specialized Workers - Claude Code spawns sub-agents via the Task tool: Explore (read-only search, fresh context), general-purpose (full tools, inherits context), Plan (architect mode). The Explore agent's strict prompt engineering ("STRICTLY PROHIBITED from creating files") demonstrates how hard accurate tool calling is to achieve.
Context Engineering Fundamentals - Agents are "token guzzlers": every tool call + result stays in context because LLMs are stateless. A single task can consume 6K+ tokens. Context retrieval degrades with length—effective windows are only 50-60% of nominal size (Opus 4.5: 200K → ~100K effective). Don't start complex work mid-conversation; use /compact or /clear.
System Reminders Combat Context Rot - Claude injects <system-reminder> tags into user messages and tool results to "recite objectives into the end of context," avoiding "lost-in-the-middle" issues. Todo lists persist during compaction as state preservation.
Skills as On-Demand Expertise - Skills load domain knowledge just-in-time (like Neo downloading kung fu), avoiding system prompt bloat. Plugins package skills + commands + hooks + MCP servers into distributable units. The popular frontend-design plugin is just a skill with SKILL.md.
Hooks for Behavior Modification - Hooks observe agent loop stages (before tool call, after result) to inject reminders or modify behavior. Combined with skills and reminders, they enable heavy customization without forking the codebase.

Quality-of-Life Evolution (CC 2.0.56 → 2.0.74): - Syntax highlighting (2.0.71) - eliminated need to open Cursor for review - Checkpointing (Esc+Esc / /rewind) - major feature request, can rewind code + conversation - Prompt suggestions (2.0.73) + history search (Ctrl+R) - "token guzzler machine" - Ultrathink mode - spam for rigorous tasks - Background agents (2.0.60) - monitor logs/errors without blocking main loop - Fuzzy file search - 3x faster

Sankalp's Workflow: - Exploration: Ask Opus 4.5 tons of questions (great explainer, ASCII diagrams, May'25 cutoff). Spam /ultrathink before execution. - Execution: Micro-manage changes, use "throw-away first draft" for complex features (new branch → let Claude write end-to-end → compare against mental model → run sharper second iteration with learned insights). - Review: GPT-5.2-Codex for code review and bug finding (marks P1/P2 severity, fewer false positives). "Better than code review products."

Why Opus 4.5 Over Codex (Despite Lower Raw Capability): - Faster: Similar tasks in much less time (better thinking efficiency + higher throughput) - Better communicator: Conversational tone vs Codex's nested bullets, higher contrast UI (Codex's thin font + light thinking traces strain eyes) - Intent detection: Less likely to ignore instructions and make unwanted changes - Soul: Amanda Askell post-trained personality back into Opus 4.5 after it was diluted in Sonnet 4/Opus 4/4.1

MCP Code Execution Insight: Instead of tool definitions bloating context, expose code APIs + give Claude a sandbox environment to write code that calls tools—"prompt on demand" like skills.

The Harness Carries You: Claude's engineering is so thorough that it auto-decides which sub-agent to spawn, which command/tool to run, what to background. "Your task is mainly to use judgement and prompt in right direction."

Augmentation Framework (vs "Keeping Up"): 1. Stay updated with tooling - Use these tools daily, track releases 2. Upskill in domain - More knowledge → better prompts, converting unknown unknowns to known unknowns. "Experience builds judgement and taste—what differentiates professional devs from vibe-coders." 3. Play + open mind - Try SoTA models, ask them to do things you think they can't. Build intuition.

繁體中文總結¶

Sankalp 是 Claude Code 重度使用者，這篇文章是迄今為止最詳盡的 Claude Code 2.0 深度指南——不只教「如何使用」，更揭露「為何有效」：透過逆向工程的 prompts、tool schemas、context engineering 基礎原理。

核心論述： 要真正掌握 coding agents，必須理解底層機制——tool calling 運作方式、context window 是有限的「注意力預算」、agentic sub-agent 編排。學會 Claude Code 的架構知識可以直接轉移到 Codex、OpenCode、Cursor，因為它們都從 CC 的 harness 設計汲取靈感。

作者旅程（Lore）： - 2025 年 6-9 月：主力 Claude Code - 9-10 月：因 Anthropic 大量 API outages + Sonnet 4/Opus 4 表現不佳，切換到 Codex（GPT-5-codex 當時更強） - 11 月 24 日：Opus 4.5 發布 → 回歸 Claude Code - 為何 Opus 4.5 feels good： 更快（thinking 效率 + throughput 雙高）、溝通更好（對話式 vs Codex 巢狀列表）、意圖檢測更準、有「靈魂」（Amanda Askell post-trained 人格回來了）

技術深度洞察：

1. Sub-agents 作為專業工人¶

Claude Code 透過 Task tool 產生 sub-agents： - Explore: 唯讀搜尋專家（Glob、Grep、Read、有限 Bash），從零開始不繼承 context（搜尋任務通常獨立） - general-purpose: 完整工具權限，繼承完整 context - Plan: 軟體架構師，設計實作計劃

Explore agent 的 prompt 非常嚴格（"STRICTLY PROHIBITED from creating files"），展示了「讓 tool calling 精準運作有多困難」。

2. Context Engineering 基礎¶

Agents 是 token guzzlers： 每個 tool call + tool result 都會加到 context（因為 LLMs 是 stateless）。單一任務可以消耗 6K+ tokens。

Context rot（上下文衰退）： Context retrieval 效能隨每個新 token 降低。有效 context window 可能只有名義值的 50-60%（Opus 4.5: 200K → 實際約 100K 有效）。

建議： 不要在對話過半時開始複雜任務。使用 /compact 或 /clear 重新開始。

3. System Reminders 對抗 Context Degradation¶

Claude 會在 user messages 和 tool results 插入 <system-reminder> tags，「把目標重複誦讀到 context 尾端」，避免「lost-in-the-middle」問題。Todo lists 在 compaction 時保留，作為狀態持久化機制。

4. Skills：按需載入的專業知識¶

Skills 是包含 SKILL.md 的資料夾 + 可執行腳本。Meta-data 加入 system prompt，當 Claude 覺得相關時會 tool call 讀取內容，「就像駭客任務裡 Neo 下載功夫」。

避免 system prompt bloat（因為不用預先載入所有 domain knowledge）。

Plugins 是打包機制：將 skills + commands + hooks + MCP servers 封裝成可分發單元（via /plugins）。熱門的 frontend-design plugin 其實就是一個 skill。

5. Hooks：行為修改機制¶

Hooks 觀察 agent loop 階段（tool call 前/結果後），注入 reminders 或修改行為。結合 skills + reminders 可以做到高度客製化，不需要 fork codebase。

CC 2.0 生活品質改進（2.0.56 → 2.0.74）： - 語法高亮（2.0.71）：作者因此幾乎不再打開 Cursor 審查程式碼 - Checkpointing（Esc+Esc / /rewind）：可回到特定 checkpoint，重要功能請求 - Prompt suggestions（2.0.73）+ 歷史搜尋（Ctrl+R）：「token guzzler machine」 - Ultrathink 模式：遇到困難任務時 spam ultrathink 讓模型更嚴謹 - 背景 agents（2.0.60）：監控 logs/errors 不阻塞主迴圈 - Fuzzy file search：快 3 倍

Sankalp 的 Workflow： 1. 探索階段： 問 Opus 4.5 大量問題（超會解釋、畫 ASCII 圖表、有 May'25 知識截止日期）。執行前 spam /ultrathink。 2. 執行階段： 密切監控變更（micro-manage）。對複雜功能使用「throw-away first draft」：開新 branch → 讓 Claude 端到端寫完 → 比較輸出與心智模型 → 用學到的東西跑更精準的第二輪迭代。 3. 審查階段： 用 GPT-5.2-Codex 做 code review 和找 bugs（會標 P1/P2 severity，false positives 更少）。「比 code review 產品更好。」

為何選 Opus 4.5 而非 Codex（儘管後者 raw capability 更強）： - 更快： 類似任務用更少時間（thinking 效率 + throughput 雙高） - 更好的溝通者： 對話式語調 vs Codex 巢狀列表，UI 高對比（Codex 細字 + 淡色 thinking traces 造成眼睛疲勞） - 意圖檢測： 較不會忽略指示、擅自變更 - 靈魂： Amanda Askell 把人格 post-train 回 Opus 4.5（Sonnet 4/Opus 4/4.1 時被稀釋了）

MCP Code Execution 洞察： 與其讓 tool definitions 塞爆 context，不如暴露「code APIs」+ 給 Claude sandbox 環境寫程式呼叫 tools——類似 skills 的「prompt on demand」。

Harness 重度承載： Claude 的工程設計非常完善，它會自動決定要 spawn 哪個 sub-agent、跑哪個 command/tool、哪些要背景執行。「你的任務主要是運用判斷力、把它導向正確方向。」

增強框架（vs「跟上」）： 1. 保持工具更新 - 天天用這些工具，追蹤 releases 2. 提升領域技能 - 知識愈多 → prompts 愈好，把 unknown unknowns 轉為 known unknowns。「經驗建立判斷力與品味——這是專業開發者與 vibe-coders 的差別。」 3. 多玩 + 開放心態 - 試 SoTA 模型，問它們做你覺得做不到的事。建立直覺。

Key Quotes¶

"Claude Code dominated the CLI coding product experience this year and all the CLI products like Codex, OpenCode, Amp CLI, Vibe CLI and even Cursor have heavily taken inspiration from it. This means learning how things work in Claude Code directly transfers to other tools both in terms of personal usage and production grade engineering."

"It's important that the model goes through each of the relevant files itself so that all that ingested context can attend to each other. That's the high level idea of attention. Make context cross with previous context. This way model can extract more pair-wise relationships and therefore better reasoning and prediction."

"The harness is so heavily engineered that Claude knows which sub-agent to spawn, what command/tool call/skill to run, what to run in async manner. It's able to heavy carry the agent loop that your task is mainly to use your judgement and prompt it in right direction."

"By constantly rewriting the todo list, Manus is reciting its objectives into the end of the context. This pushes the global plan into the model's recent attention span, avoiding 'lost-in-the-middle' issues and reducing goal misalignment."

"Agents are token guzzlers. Both the tool call and the tool call outputs are added to the context so that the LLM can know the results. This is because LLMs are stateless—they don't have memory outside the context window."

"Context retrieval performance of LLMs degrades as every new token is introduced. Think of context as a limited 'attention budget'. A rough corollary one can draw is effective context windows are probably 50-60% or even lesser."

"I don't claim this but many people love Claude Opus 4.5 for its personality and the way it talks—some referring to it as Opus 4.5 having soul. This trait was somewhat lesser in Sonnet 3.7, Sonnet 4, Opus 4, Opus 4.1 but it came back in Opus 4.5. Amanda Askell post-trained the soul into Claude haha."

"Experience builds judgement and taste—that's what differentiates professional devs from vibe-coders. Since implementation is much faster now, you can spend more time on taste refinement."

"With skills, you don't have to [bloat the system prompt] as the model loads it on-demand. This is especially useful when you are not sure if you require those instructions always. 'I know Kung Fu'—Skills load on-demand, just like Neo in The Matrix (1999)."

Personal Reflection¶

這篇文章是 Claude Code 社群迄今為止最寶貴的技術資產之一，因為它做到了三件稀缺的事：

揭露黑箱：逆向工程 Explore agent prompt、Task tool schema、system-reminders 機制，讓使用者理解「為何有效」而非只會用。這種洞察在官方文件中幾乎不可能獲得。
經驗校準：Sankalp 的旅程（Claude → Codex → Claude）不是 fanboy 式盲目擁護，而是基於數月實戰的校準：Opus 4.5 不是最強（Codex raw capability 略勝），但它更快、溝通更好、有靈魂。這種細膩的trade-off 分析極其珍貴。
遷移性知識：文章最關鍵的論述是「學 Claude Code 的知識可以遷移到 Codex/OpenCode/Cursor」——這意味著投資理解 CC 架構（sub-agents、context engineering、skills/hooks）不是學一個工具，而是學整個 coding agent 範式。

對我們的啟發：

Context Engineering 是 AI 時代的核心技能：理解 token 如何消耗、context rot 如何發生、如何用 reminders/compaction 對抗衰退。這不只適用於 coding agents，也適用於所有 LLM 應用。
Sub-agents 設計模式值得借鑒：Explore（唯讀、不繼承 context）vs general-purpose（繼承 context）的設計，展示了如何根據任務特性選擇 context 策略——這可以應用到我們自己的 agent 系統（如 OpenClaw 的 subagent 機制）。
Skills as On-Demand Loading 是優雅解決方案：避免 system prompt bloat，只在需要時載入 domain knowledge。這與 OpenClaw 的 skills 系統設計不謀而合，驗證了我們的方向正確。
Workflow 洞察：Throw-away First Draft：對複雜功能先讓 AI 端到端寫完、比較與心智模型差異、再跑精準第二輪——這是利用 AI 速度優勢的聰明策略，值得我們在開發 OpenClaw 時採用。
「Harness 重度承載」的啟示：當工具設計足夠好時，使用者的任務變成「運用判斷力、導向正確方向」而非手動 micro-manage。這是我們設計 OpenClaw skills/agents 的北極星：讓系統自動做對的事，使用者只需高層指導。

關鍵 takeaway： 不要只學「怎麼用 Claude Code」，要學「Claude Code 背後的架構思想」——context engineering、sub-agent orchestration、on-demand loading、system reminders。這些是可遷移的範式，適用於任何 AI agent 系統。