2025 Letter: We're So Wildly Early

Summary (English)

Zhengdong Wang, a DeepMind researcher, argues that AI progress is fundamentally driven by compute scaling—a compound growth trend that has persisted for 15 years and will continue despite repeated appearances of saturation, as human ingenuity consistently pushes it to the next stage. The implication: we remain "wildly early" in the AI revolution, and anyone who thinks they're late is missing the exponential nature of what's coming.

The Repeated Pattern: "It's So Over" → "We're Wildly Early"

The author's personal journey from 2015 to 2025 reveals a pattern: at every inflection point (high school science fair, undergrad ML class, GPT-3 release, joining DeepMind), he felt like he was "so late"—that the field had moved past him. Each time, within months, a new breakthrough made it clear he was "wildly early."

Key milestones:

- 2015: AlexNet already exists, feeling late to computer vision
- 2017: AlphaGo defeats world champion, feeling late to RL
- 2020: GPT-3 released, feeling late to language models
- 2022: ChatGPT goes viral, decides to join DeepMind
- 2025: Realizes every previous "it's over" moment was actually "wildly early"

This isn't survivor bias or hindsight—it's a fundamental property of exponential curves. When you're on one, it always feels like you've missed the boat, because the recent past looks huge. But the future is always bigger.
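
The "future is always bigger" claim follows from simple arithmetic. A minimal numeric sketch (my illustration, not from the letter; the 4x/year rate is borrowed from the compute-scaling figure cited later):

```python
# On an exponential, the recent past dwarfs all earlier history, yet the
# next step dwarfs the recent past in turn. So "I've missed the boat" is
# exactly the wrong conclusion at every point on the curve.
def total_through(year, rate=4.0):
    """Cumulative sum of an exponential 1 + rate + rate^2 + ... + rate^year."""
    return sum(rate**t for t in range(year + 1))

def next_year_vs_history(year, rate=4.0):
    """How the coming year's growth compares to everything that came before."""
    return rate**year / total_through(year - 1, rate)

# Whichever year you stand in, the next step exceeds all prior history
# combined (the ratio approaches rate - 1 = 3 from above).
ratios = [next_year_vs_history(y) for y in range(1, 6)]
```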

The Bitter Lesson: Compute Beats Everything

The core technical insight is Richard Sutton's "Bitter Lesson" (2019): general methods that leverage computation ultimately defeat human-engineered, domain-specific approaches.

This has played out repeatedly:

- Go (2016): AlphaGo's general RL + compute defeated human intuition about Go strategy
- Speech recognition (2010s): End-to-end neural models defeated decades of linguistic feature engineering
- Computer vision (2012-2020): CNNs → Transformers defeated hand-crafted features
- Coding (2021-2025): Large language models defeated rule-based program synthesis

The pattern: researchers invest years building domain expertise into their systems, then someone trains a bigger general model and it wins. It's "bitter" because it devalues specialized knowledge. It's a "lesson" because we keep forgetting it.

Author's personal proof: At DeepMind, he scaled an embodied simulation agent's compute by 1000x. The result shocked him: "I thought there was a bug, because there's no way I could have operated it that smoothly even myself." The agent was doing things he could never have explicitly taught it to do.

The Three Layers of Compute Scaling

Layer 1: Time Horizons (Doubling Every 7 Months)

- AI is getting better at longer-horizon tasks: from image classification → multi-turn conversation → multi-day coding projects
- This is scaling predictably: the "horizon" AI can handle doubles every 7 months
- Implication: Tasks that seem to obviously require human judgment become automatable on a schedule
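
The 7-month doubling figure turns directly into arithmetic. A hypothetical sketch, assuming an illustrative 1-hour task horizon today (the starting horizon and the 40-hour target are my choices, not the letter's):

```python
from math import log2

def horizon_hours(months, start_hours=1.0, doubling_months=7.0):
    """Task horizon after `months`, doubling every `doubling_months` months."""
    return start_hours * 2 ** (months / doubling_months)

# Months until the horizon grows from 1 hour to a 40-hour work week:
# 7 * log2(40) is just over three years, under these assumed numbers.
months_to_week = 7.0 * log2(40.0)
```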

Layer 2: Scaling Laws (4-5x Compute Growth Per Year, 15 Years)

- Model performance scales smoothly with compute, data, and model size
- This has held since ~2010 (AlexNet era) through 2025
- No signs of fundamental saturation: each apparent plateau gets solved (better data, better architectures, better post-training)
- Implication: If you can afford 10x more compute, you can predict the capabilities you'll unlock
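
Compounding the 4-5x annual rate over 15 years gives a sense of scale. A quick back-of-envelope under the letter's stated rates (my arithmetic):

```python
def cumulative_growth(rate_per_year, years):
    """Total compute growth factor under compound annual growth."""
    return rate_per_year ** years

low = cumulative_growth(4, 15)    # roughly a billion-fold
high = cumulative_growth(5, 15)   # roughly thirty-billion-fold
```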

Layer 3: Moore's Law (Chip Doubling, 50 Years)

- Underlying everything is the 50-year trend of exponential chip improvement
- Even though transistor scaling has slowed, specialized AI chips (TPUs, H100s) continue the trend
- The next frontier: 1GW datacenters (enough to power a small city)
- Implication: The physical infrastructure for the next 10-100x compute is being built right now
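
Two sanity checks on Layer 3's magnitudes, assuming a 2-year doubling period and a ~1.2 kW average household draw (both my assumed figures, not from the letter):

```python
# (a) Chip capability doubling every ~2 years, compounded over 50 years.
doublings = 50 / 2
moore_factor = 2 ** doublings            # tens of millions-fold

# (b) How many average homes a 1 GW datacenter's power budget equals.
datacenter_watts = 1e9                   # 1 GW in watts
avg_home_watts = 1200                    # assumed average household draw
homes_equivalent = datacenter_watts / avg_home_watts
```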

Progress as a Meta-Trend

The author zooms out to a billion-year view: information organization has been growing exponentially since the origin of life.

Timeline:

- 3.5 billion years ago: Single-cell life
- 600 million years ago: Multicellular life → Cambrian explosion
- 2 million years ago: Upright walking → tool use
- 10,000 years ago: Agriculture → civilization
- 500 years ago: Printing press → knowledge explosion
- 250 years ago: Industrial revolution → energy abundance
- 50 years ago: Information age → exponential compute

AI is the latest chapter in this multi-billion-year story. Viewing it as "just another tech hype cycle" misses that it's a phase transition in how information is processed and organized on Earth.

Key insight: Progress itself is a trend. The rate at which we solve problems has been accelerating for billions of years. Betting against continuation is betting against a meta-pattern that's older than our species.

First-Order Effects: What We Know Is Coming

Models will get more general:

- Current frontier models are already superhuman at narrow tasks (coding, math, image generation)
- Next: superhuman at integrative tasks (research, strategy, long-term planning)
- Timeline: GPT-2 (2019) → GPT-3 (2020) → GPT-4 (2023) → GPT-5 (2025): four generations in six years. We're 3 years into the ChatGPT era. The next 5-10 years will be wild.

Models will get more efficient:

- 2023: ChatGPT ran on H100 clusters costing millions
- 2025: Equivalent-quality models run on consumer GPUs
- Implication: What's research-grade today will be commodity tomorrow

There's so much headroom:

- We haven't even built the 1GW datacenters yet (they're being planned now)
- We're still discovering better architectures (the transformer is only about eight years old)
- We're still figuring out post-training (RLHF, reasoning-time compute)
- AGI timeline: 8-15 years is the author's estimate, given current trends

COVID parallel: The pandemic went from "Wuhan thing" → "everywhere" in weeks. AI feels the same: it's already transforming high school homework, software engineering, and math research. Next: molecular biology, physical product design, scientific discovery.

Second-Order Effects: What We Need to Think Through

First-order effects are about AI capabilities. Second-order effects are about societal adaptation.

These are much harder to predict:

- How does democracy function when AI can generate perfect propaganda for each voter?
- How does education work when AI can do all the assignments?
- How does employment work when AI can do most knowledge work?
- How does IP law work when AI trains on everything?
- How does geopolitics work when AI is a strategic resource like oil?

Author's position: AI researchers shouldn't dominate these conversations. We need political scientists, economists, philosophers, artists, and domain experts to deeply engage. The current discourse is too AI-researcher-centric.

Peter Thiel's critique of Elon Musk: "If we'll have a billion humanoid robots in 10 years, you don't need to worry about budget deficits—growth will take care of it. But he's still worried about budget deficits." Thiel's point: these things aren't thought through. Either the AI revolution happens (and transforms everything), or it doesn't (and we have other problems). You can't half-believe it.

The Author's Decision: Joining the Post-AGI Team

At the end of the letter, Wang reveals he's joining DeepMind's "Post-AGI team."

What does that mean? It's the team thinking about what comes after AGI—not just "how do we build it" but "what do we do with it."

Why now? Because if AGI is 8-15 years away, post-AGI planning needs to start now. Not in 2035 when it's here.

Author's final line: "This is where I get on the train—we're so wildly early."

Why "Wildly Early" Matters

The phrase appears throughout the letter as a refrain. It's not just optimism—it's a claim about epistemic humility.

Two failure modes:

1. "It's so over" → Assuming the important progress happened in the past, giving up
2. "We've got this figured out" → Assuming the current paradigm will smoothly continue without disruption

"Wildly early" avoids both:

- It acknowledges massive progress has happened (we're not pre-AlexNet anymore)
- But it insists the bulk of the transformation is still ahead (we're not post-AGI yet)

Practical implication: If you're choosing what to work on, bet on the discontinuity, not the continuity. The skills and assumptions that mattered in 2020 may not matter in 2030. The only constant is that compute scaling will continue.

Meta-Commentary: Thinking It Through

The letter opens and closes with Peter Thiel's challenge to think things through. If Musk's billion-robot prediction is true, budget deficits don't matter. If he doesn't actually believe it, why does he talk as though he does?

Wang's answer: we need to think through both the technology (first-order effects) and the societal implications (second-order effects). The current discourse does neither well.

First-order: Most people underestimate how much compute scaling will deliver. They extrapolate linearly from current models, missing the exponential.

Second-order: Most people haven't seriously modeled how society adapts. They either wave it away ("we'll figure it out") or catastrophize ("it's all doomed").

The synthesis: Compute scaling is real, predictable, and will continue. The societal effects are real, unpredictable, and need serious interdisciplinary work. Both can be true.



Key Quotes

"This is where I get on the train—we're so wildly early."

"The Bitter Lesson: general methods that leverage computation ultimately defeat human-engineered, domain-specific approaches."

"When you're on an exponential curve, it always feels like you've missed the boat, because the recent past looks huge. But the future is always bigger."

"I thought there was a bug, because there's no way I could have operated it that smoothly even myself." (On scaling compute 1000x)

"Progress itself is a trend. The rate at which we solve problems has been accelerating for billions of years."

"If we'll have a billion humanoid robots in 10 years, you don't need to worry about budget deficits. But he's still worried about budget deficits." (Thiel on Musk)

"Skills are depreciating as fast as the GPUs that automate them."


Key Concepts

  • Bitter Lesson - Richard Sutton's principle that general computation beats domain expertise
  • Scaling Laws - Predictable relationship between compute, model size, and performance
  • Time Horizons - Length of tasks AI can complete (doubling every 7 months)
  • Moore's Law - Exponential growth in chip capabilities over 50 years
  • AGI (Artificial General Intelligence) - AI systems that match or exceed human capability across domains
  • Post-AGI - Societal and technical challenges that emerge after AGI is achieved
  • Compute Scaling - The compound growth of computational resources driving AI progress
  • 1GW Datacenter - Next-generation infrastructure capable of powering AI at city-scale energy consumption