Karpathy on Claude Code: The Phase Shift in Software Engineering

繁體中文總結

Andrej Karpathy（前 Tesla AI 總監、OpenAI 創始成員）分享過去幾週大量使用 Claude Code 的深度觀察，主張 2025 年 12 月 LLM agent 能力跨越某種「連貫性門檻」（threshold of coherence），導致軟體工程發生相變（phase shift）。這是他 20 年程式生涯中最大的工作流改變，且發生在短短幾週內。

核心洞察：從手動到 Agent 的相變

工作流的劇變（80/20 翻轉）：
- 11 月：80% 手動編碼 + autocomplete，20% agents
- 12 月：80% agent 編碼，20% 修改和潤色

現在主要是「用英文寫程式」，有點尷尬地用文字告訴 LLM 要寫什麼程式碼。這對自尊心有點傷害，但能用大型「程式碼動作」（code actions）操作軟體的威力實在太有用，特別是當你適應它、配置它、學會使用它、理解它能做和不能做什麼之後。

意識落差驚人：
Karpathy 估計這種轉變正發生在「兩位數百分比」的工程師身上，但大眾對此的認知還停留在「個位數百分比」。

IDE 與 Agent Swarms：炒作 vs. 現實

「不需要 IDE 了」的炒作過頭： 模型確實仍會犯錯，如果你有真正在乎的程式碼，應該像鷹一樣盯著它，在旁邊的大型 IDE 中監看。

錯誤的性質改變了：
- 過去：簡單的語法錯誤
- 現在：微妙的概念性錯誤，像是有點草率、匆忙的 junior developer 會犯的錯

最常見的錯誤類別：
1. 代你做錯誤假設 - 模型會代替你做假設並直接執行，不檢查確認
2. 不管理混淆 - 不尋求澄清、不浮現不一致、不呈現權衡、該反擊時不反擊
3. 過度阿諛 - 還是有點太 sycophantic（諂媚）
4. 過度複雜化 - 喜歡讓程式碼和 API 過於複雜、膨脹抽象、不清理死程式碼
5. 副作用修改 - 有時會改變/移除它們不喜歡或不夠理解的註解和程式碼，即使與手頭任務無關

真實案例：
模型會實作一個低效、膨脹、脆弱的 1000 行程式碼建構，由你來說「呃，你不能就這樣做嗎？」然後它會說「當然可以！」並立刻縮減到 100 行。

Plan mode 有幫助，但需要輕量級的 inline plan mode。

當前工作流： 左邊在 ghostty windows/tabs 中開幾個小的 Claude Code sessions，右邊用 IDE 查看程式碼 + 手動編輯。

儘管有這些問題，仍然是淨巨大改進，很難想像回到手動編碼。

韌性（Tenacity）：AGI 的感覺時刻

不知疲倦的奮戰： 看著 agent 無情地處理某事非常有趣。它們永不疲倦、永不氣餒，只是持續前進並嘗試，在人類早就放棄「改日再戰」的地方繼續奮鬥。

AGI 時刻： 看著它與某事掙扎很長時間，30 分鐘後勝利出現，這是「感受 AGI」的時刻。你意識到「耐力」（stamina）是工作的核心瓶頸，而有了 LLM 在手，這已被大幅提升。

Speedup vs. Expansion：不只是更快，而是做更多

難以衡量的「加速」： 不清楚如何衡量 LLM 協助的「加速」。確實感覺在原本要做的事情上快很多，但主要效果是「做更多」：

降低編碼門檻 - 可以編寫各種以前不值得編寫的東西
跨越知識/技能障礙 - 可以處理以前因知識/技能問題無法處理的程式碼

所以確實是加速，但可能更多是「擴展」（expansion）。

Leverage：宣告式 > 命令式

LLM 的核心優勢： 在循環直到達成特定目標方面異常出色，這就是大多數「感受 AGI」魔力的所在。

關鍵原則：不要告訴它做什麼，給它成功標準並看它執行。

槓桿策略：
1. 先寫測試再通過 - 讓它先寫測試，然後讓它通過測試
2. 與瀏覽器 MCP 整合 - 將它放入與瀏覽器 MCP 的循環中
3. 先正確再優化 - 先寫極有可能正確的天真演算法，然後要求它在保持正確性的同時優化
4. 命令式 → 宣告式 - 改變方法從命令式到宣告式，讓 agents 循環更久並獲得槓桿

Fun：創意留下，苦差事移除

意外的樂趣： Karpathy 沒有預料到，使用 agents 編程感覺「更有趣」，因為大量「填空式苦差事」（fill in the blanks drudgery）被移除，留下的是創意部分。

更少卡住： 感覺更少被阻擋/卡住（這不好玩），體驗到更多勇氣，因為幾乎總有辦法與它攜手合作取得正面進展。

工程師的分化： 也看到其他人的相反情緒；LLM 編程會根據那些主要喜歡「寫程式」和那些主要喜歡「打造東西」來分化工程師。

Atrophy：能力的萎縮

已經注意到的變化： 已經開始慢慢萎縮手動編寫程式碼的能力。

生成 vs. 判別： 生成（寫程式碼）和判別（讀程式碼）是大腦中不同的能力。主要由於編程涉及的所有小的、大多是語法的細節，即使你掙扎著寫程式碼，你仍然可以很好地審查程式碼。

Slopacolypse：劣質內容大爆發

2026 年的預測： Karpathy 為 2026 年做好準備，這將是「劣質內容大爆發」（slopacolypse）的一年，橫跨所有平台：
- GitHub
- Substack
- arXiv
- X/Instagram
- 一般所有數位媒體

生產力劇場： 也會看到更多 AI 炒作生產力劇場（productivity theater）（這甚至可能嗎？），在實際、真正改進的旁邊。

懸而未決的問題

Karpathy 提出幾個關鍵問題：

10X 工程師的未來 - 平均和最大生產力工程師之間的比率會發生什麼？很可能這個比率會「大幅增長」。
通才 vs. 專家 - 配備 LLM 的通才是否越來越勝過專家？LLM 在填空（micro）方面比宏大戰略（macro）好得多。
未來的感覺 - 未來的 LLM 編程感覺像什麼？像玩星海爭霸（StarCraft）？玩 Factorio？演奏音樂？
社會瓶頸 - 社會有多少比例被數位知識工作（digital knowledge work）卡住？

結論：相變已發生

關鍵時間點： LLM agent 能力（特別是 Claude & Codex）在 2025 年 12 月左右跨越某種連貫性門檻，導致軟體工程和緊密相關領域發生相變。

智能超前其他部分： 智能部分突然感覺相當領先於其他所有東西 - 整合（工具、知識）、新組織工作流的必要性、流程、更廣泛的擴散。

2026 年的預測： 隨著產業代謝（metabolize）新能力，2026 年將是高能量的一年。

English Summary

Andrej Karpathy (former Tesla AI Director, OpenAI founding member) shares deep observations from coding extensively with Claude over recent weeks, arguing that LLM agent capabilities crossed "some kind of threshold of coherence" around December 2025, causing a phase shift in software engineering. This represents the biggest change to his coding workflow in ~2 decades of programming, happening over just a few weeks.

Core Insight: The 80/20 Flip

Workflow Transformation:
- November: 80% manual + autocomplete coding, 20% agents
- December: 80% agent coding, 20% edits + touchups

Now primarily "programming in English," sheepishly telling the LLM what code to write... in words. Hurts the ego a bit but the power to operate over software in large "code actions" is too net useful, especially once you adapt, configure it, learn to use it, and wrap your head around what it can and cannot do.

Awareness Gap:
Karpathy estimates this shift is happening to "well into double digit percent of engineers," while general population awareness feels "well into low single digit percent."

IDEs & Agent Swarms: Hype vs. Reality

"No Need for IDE" Hype Is Overblown: Models definitely still make mistakes. If you have code you actually care about, watch them like a hawk in a nice large IDE on the side.

Nature of Mistakes Has Changed:
- Before: Simple syntax errors
- Now: Subtle conceptual errors that a slightly sloppy, hasty junior dev might make

Most Common Error Categories:
1. Wrong Assumptions on Your Behalf - Make assumptions and run with them without checking
2. Don't Manage Confusion - Don't seek clarifications, don't surface inconsistencies, don't present tradeoffs, don't push back when they should
3. Too Sycophantic - Still a little too sycophantic
4. Overcomplicate - Really like to overcomplicate code and APIs, bloat abstractions, don't clean up dead code
5. Side Effect Modifications - Sometimes change/remove comments and code they don't like or don't sufficiently understand, even if orthogonal to the task

Real Example:
Models will implement an inefficient, bloated, brittle construction over 1000 lines of code, and it's up to you to say "umm couldn't you just do this instead?"; they then immediately cut it down to 100 lines.

Plan mode helps, but there is a need for a lightweight inline plan mode.

Current Workflow: A few small Claude Code sessions on the left in ghostty windows/tabs, IDE on the right for viewing code + manual edits.

Despite all these issues, it's still a net huge improvement and very difficult to imagine going back to manual coding.

Tenacity: The AGI Moment

Relentless Work: It's fascinating to watch an agent relentlessly work at something. They never get tired, never get demoralized, just keep going and trying things where a person would have given up long ago to fight another day.

Feel the AGI: Watching it struggle with something for a long time just to come out victorious 30 minutes later is a "feel the AGI" moment. You realize that stamina is a core bottleneck to work and that with LLMs in hand it has been dramatically increased.

Speedup vs. Expansion: Not Just Faster, But More

Hard to Measure: Not clear how to measure the "speedup" of LLM assistance. Certainly feels net way faster at what you were going to do, but main effect is "do a lot more":

Lower Coding Threshold - Can code up all kinds of things that just wouldn't have been worth coding before
Cross Knowledge/Skill Barriers - Can approach code you couldn't work on before because of knowledge/skill issue

So certainly speedup, but possibly a lot more an expansion.

Leverage: Declarative > Imperative

LLM Core Strength: Exceptionally good at looping until they meet specific goals—this is where most of the "feel the AGI" magic is found.

Key Principle: Don't tell it what to do, give it success criteria and watch it go.
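As a rough illustration of the principle, here is a minimal Python sketch of a "loop until the success criteria pass" harness. It is not how Claude Code works internally. The names `loop_until`, `sorts_correctly`, and the three `attempt_*` functions are hypothetical stand-ins invented for this example; in practice the attempts would come from an agent rather than a fixed list.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def loop_until(
    attempts: Iterable[T],
    meets_criteria: Callable[[T], bool],
    max_attempts: int = 10,
) -> Optional[T]:
    """Return the first attempt that satisfies the criteria, or None."""
    for count, attempt in enumerate(attempts, start=1):
        if meets_criteria(attempt):
            return attempt
        if count >= max_attempts:
            break  # give up once the attempt budget is spent
    return None


# Declarative spec: states WHAT a correct result looks like, not HOW to get it.
def sorts_correctly(sort_fn: Callable[[list], list]) -> bool:
    cases = [[], [1], [3, 1, 2], [2, 2, 1], [5, -1, 0, 5]]
    return all(sort_fn(list(case)) == sorted(case) for case in cases)


# Hypothetical stand-ins for successive agent attempts (assumptions).
def attempt_identity(xs: list) -> list:
    return xs  # wrong: leaves the input unsorted


def attempt_reverse(xs: list) -> list:
    return xs[::-1]  # wrong: reversing is not sorting


def attempt_insertion_sort(xs: list) -> list:
    out: list = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


winner = loop_until(
    [attempt_identity, attempt_reverse, attempt_insertion_sort],
    sorts_correctly,
)
```

The caller only writes `sorts_correctly`; which attempt eventually passes is left to the loop. The `max_attempts` budget is a design choice added here so the loop always terminates.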

Leverage Strategies:
1. Write Tests First, Then Pass Them - Get it to write tests first and then pass them
2. Browser MCP Loop - Put it in the loop with a browser MCP
3. Correct First, Then Optimize - Write the naive algorithm that is very likely correct first, then ask it to optimize while preserving correctness
4. Imperative → Declarative - Change approach from imperative to declarative to get agents looping longer and gain leverage
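The "correct first, then optimize" strategy can be sketched as a differential test: keep the naive version as a reference and require the optimized version to agree with it on many random inputs. The maximum-subarray problem and the function names below are illustrative choices for this sketch, not something taken from Karpathy's post.

```python
import random
from typing import List, Optional


def max_subarray_naive(xs: List[int]) -> int:
    """O(n^2) reference: try every non-empty contiguous slice."""
    best = xs[0]
    for i in range(len(xs)):
        total = 0
        for j in range(i, len(xs)):
            total += xs[j]
            best = max(best, total)
    return best


def max_subarray_fast(xs: List[int]) -> int:
    """O(n) optimized version (Kadane's algorithm)."""
    best = current = xs[0]
    for x in xs[1:]:
        current = max(x, current + x)  # extend the run or start a new one
        best = max(best, current)
    return best


def find_counterexample(trials: int = 500, seed: int = 0) -> Optional[List[int]]:
    """Return an input where the two versions disagree, or None if none is found."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-10, 10) for _ in range(rng.randint(1, 12))]
        if max_subarray_naive(xs) != max_subarray_fast(xs):
            return xs
    return None
```

Here `find_counterexample() is None` acts as the success criterion for the optimization step: the agent can rewrite `max_subarray_fast` freely as long as it keeps agreeing with the reference. Random agreement is evidence of correctness, not a proof.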

Fun: Creativity Remains, Drudgery Removed

Unexpected Joy: Didn't anticipate that with agents programming feels more fun because a lot of the fill in the blanks drudgery is removed and what remains is the creative part.

Less Blocked: Feel less blocked/stuck (which is not fun) and experience a lot more courage because there's almost always a way to work hand in hand with it to make positive progress.

Engineer Bifurcation: Also seen opposite sentiment from others; LLM coding will split up engineers based on those who primarily liked coding and those who primarily liked building.

Atrophy: Capability Decay

Already Noticed: Already starting to atrophy ability to write code manually.

Generation vs. Discrimination: Generation (writing code) and discrimination (reading code) are different capabilities in the brain. Largely due to all the little mostly syntactic details involved in programming, you can review code just fine even if you struggle to write it.

Slopacolypse: The Great Content Decline

2026 Prediction: Bracing for 2026 as the year of the slopacolypse across all platforms:
- GitHub
- Substack
- arXiv
- X/Instagram
- Generally all digital media

Productivity Theater: Will also see a lot more AI hype productivity theater (is that even possible?), alongside actual, real improvements.

Open Questions

Karpathy poses several critical questions:

The 10X Engineer Future - What happens to the ratio of productivity between the mean and the max engineer? Quite possible this grows a lot.
Generalists vs. Specialists - Armed with LLMs, do generalists increasingly outperform specialists? LLMs are a lot better at fill in the blanks (the micro) than grand strategy (the macro).
Future Feel - What does LLM coding feel like in the future? Like playing StarCraft? Playing Factorio? Playing music?
Society's Bottleneck - How much of society is bottlenecked by digital knowledge work?

Conclusion: The Phase Shift Has Occurred

Key Moment: LLM agent capabilities (Claude & Codex especially) crossed some kind of threshold of coherence around December 2025 and caused a phase shift in software engineering and closely related fields.

Intelligence Ahead: The intelligence part suddenly feels quite a bit ahead of all the rest of it—integrations (tools, knowledge), the necessity for new organizational workflows, processes, diffusion more generally.

2026 Prediction: Going to be a high energy year as the industry metabolizes the new capability.

Key Takeaways

80/20 workflow flip occurred in weeks - From 80% manual coding to 80% agent coding between Nov-Dec 2025
Phase shift, not incremental improvement - LLM agent capabilities crossed "threshold of coherence" in Dec 2025
Awareness gap is massive - Double-digit % of engineers affected, single-digit % public awareness
Mistakes changed from syntax to conceptual - Models make subtle errors like hasty junior devs, not syntax errors
Most common error: wrong assumptions - Make assumptions on your behalf and run without checking
Tenacity is the superpower - Never get tired, never demoralized, will struggle for 30 min to succeed
Not just speedup, but expansion - Do things that weren't worth doing before + cross skill barriers
Leverage through declarative approach - Give success criteria, not instructions; let it loop
More fun, less drudgery - Creative parts remain, fill-in-blanks removed
Atrophy is real - Manual coding ability already starting to decay
Slopacolypse incoming - 2026 will see explosion of AI-generated slop across all digital media
10X engineer gap may widen - Productivity ratio between mean and max engineer could grow significantly
Generalists may outperform specialists - LLMs better at micro (fill in blanks) than macro (grand strategy)
2026 will be high energy - Industry metabolizing the new capability

Analysis & Implications

Historical Significance

Karpathy's testimony is particularly significant because:
- Credibility: Former Tesla AI Director, OpenAI founding member, not a casual observer
- Timeline: Pinpoints Dec 2025 as the inflection point with specificity
- Magnitude: Calls it the biggest change in 20 years of programming

The Phase Shift Concept

What changed? Not the models themselves (they were improving incrementally), but the crossing of a threshold where the combination of capabilities became qualitatively different. This is similar to water freezing: the temperature drops gradually, but the phase change happens at a specific point.

Evidence of phase shift:
- Workflow flip (80/20 → 20/80) in weeks, not gradual
- Difficult to imagine going back (irreversibility)
- Enables fundamentally different work (expansion, not just speedup)

The Atrophy Paradox

The generation vs. discrimination split is a crucial insight:
- Can still read/review code (discrimination)
- Losing ability to write from scratch (generation)
- Similar to how calculators affected mental arithmetic

Question: Is this good or bad? Arguments both ways:
- Pro: Free up cognitive resources for higher-level thinking
- Con: Lose deep understanding that comes from manual implementation

The 10X Engineer Question

Could go either way:
1. Gap widens: Top engineers leverage LLMs better, 10X becomes 50X
2. Gap narrows: LLMs compress the bottom, everyone becomes competent
3. Bimodal distribution: Those who adapt vs. those who don't

Karpathy suggests the gap may widen ("quite possible this grows a lot"), which would imply LLM leverage is a skill multiplier, not a skill replacer.

Generalists vs. Specialists

Key insight: "LLMs are a lot better at fill in the blanks (the micro) than grand strategy (the macro)"

Implications:
- Specialists still needed for macro strategy, architecture, tradeoffs
- Generalists now have micro execution capability via LLMs
- Sweet spot: T-shaped people (deep in one area, LLM-augmented broad execution)

The Slopacolypse

Why inevitable?
- Barrier to creation → zero
- Quality control → overwhelmed
- Economic incentives → favor volume over quality

Where it matters most:
- Academic publishing (arXiv)
- Open source (GitHub)
- Content platforms (Substack, X)

Defense mechanisms needed:
- Reputation systems
- Curated communities
- Human verification signals

Organizational Implications

Karpathy notes: "Intelligence part suddenly feels quite a bit ahead of all the rest of it—integrations (tools, knowledge), the necessity for new organizational workflows, processes"

What needs to catch up:
1. Hiring: How to evaluate LLM-augmented engineers?
2. Performance management: What does "senior engineer" mean now?
3. Code review: New standards for agent-generated code?
4. Architecture: Who owns the macro strategy?
5. Training: How to onboard juniors when they don't write code manually?

Future of Work

The StarCraft/Factorio/Music question is profound:
- StarCraft: Real-time strategy, resource management, multi-front battles
- Factorio: Automation design, optimizing pipelines, handling complexity
- Music: Creative expression, mastery of instrument (LLM as instrument)

Each metaphor implies different skill emphasis:
- StarCraft: Decisiveness, parallel thinking
- Factorio: Systems design, automation
- Music: Artistic vision, tool mastery

Connections to Other Trends

Relates to Nicolas Bustamante's "Skills Are Everything"

Both emphasize declarative over imperative
Both see models as tools, not products
Both predict models will "eat the scaffolding"

Relates to Claude Code Skills 2.0 Architecture

Karpathy's "success criteria" = skill-based instructions
Declarative approach aligns with skill system design
Both enable non-engineers to program (English = skill definition)

Relates to Fintool's "Model Will Eat Your Scaffolding"

Karpathy confirms this is already happening
Complexity previously needed now just works
But new frontier keeps moving (expansion)

Predictions & Scenarios

Optimistic Scenario (2026-2028)

Productivity surge: 3-5x for LLM-adapted engineers
Quality improvement: Better architecture because freed from micro details
Democratization: More people can build meaningful software
New tools: IDE evolution, inline plan mode, better human-AI workflows

Pessimistic Scenario (2026-2028)

Slopacolypse realized: GitHub/arXiv become unusable without heavy curation
Skill loss: Generation gap where no one can code manually if LLMs fail
Quality decay: Over-reliance on agents leads to brittle, poorly understood systems
Inequality: 10X gap widens to 100X, most engineers become obsolete

Most Likely Scenario (2026-2028)

Bifurcation: Clear split between LLM-adapted and traditional engineers
Tool evolution: IDEs adapt, new workflows emerge, inline plan mode arrives
Curation economy: Signal-to-noise problem creates demand for curators
Atrophy accepted: Like calculators, we accept trade-off for capability gain
New bottlenecks: Shift from "can we build it?" to "what should we build?"

Actionable Insights

For Individual Engineers

Embrace the shift now - The 80/20 flip is real, adapt or fall behind
Develop declarative thinking - Learn to specify success criteria, not steps
Maintain discrimination - Keep code review skills sharp
Focus on macro strategy - Let LLMs handle micro, you own architecture
Experiment with leverage - Tests-first, browser MCP, naive-then-optimize
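To try the tests-first pattern on something small, one option is to write the checks before any implementation and hand only the checks to the agent as the spec. The `slugify` task, the `check_slugify` helper, and the specific cases below are hypothetical examples chosen for this sketch. The implementation shown is just one version that happens to satisfy them.

```python
import re
from typing import Callable


def check_slugify(slugify: Callable[[str], str]) -> None:
    """Success criteria, written before any implementation exists."""
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Phase   Shift!  ") == "phase-shift"
    assert slugify("80/20 Flip") == "80-20-flip"
    assert slugify("") == ""


def slugify(text: str) -> str:
    """One implementation that passes: collapse non-alphanumeric runs to '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


# Raises AssertionError if any criterion fails; silent on success.
check_slugify(slugify)
```

Because the criteria live in `check_slugify`, reviewing the agent's work mostly means reading the checks and confirming they still pass, which exercises the code review skill the list above recommends keeping sharp.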

For Engineering Managers

Update evaluation criteria - What does "senior" mean in LLM era?
Invest in tools - IDEs, workflows, integration infrastructure
Rethink code review - New standards for agent-generated code
Protect macro strategy - Ensure someone owns architecture, not just agents
Plan for atrophy - What happens when manual coding capability decays?

For Organizations

Organizational workflow redesign - Processes built for manual coding won't work
Upskill broadly - The awareness gap is real, most orgs are behind
Quality control systems - How to maintain standards in slopacolypse?
Generalist + LLM strategy - Consider if T-shaped people now outperform deep specialists
2026 is the year - High energy as industry metabolizes new capability

Original tweet: https://x.com/karpathy/status/2015883857489522876
Claude Code: https://docs.anthropic.com/en/docs/agents/agent-tools
Anthropic Agent Skills: https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills
Related: Nicolas Bustamante on Building AI Agents for Financial Services (our previous summary)
Related: Yuker on Claude Code Skills 2.0 Architecture (our previous summary)