Karpathy's 2025 LLM Review

Overview

  • Reinforcement Learning from Verifiable Rewards (RLVR): Emerged as a major new training stage in which LLMs learn reasoning strategies by optimizing against automatically checkable rewards on math and code puzzles. Unlike the thin SFT/RLHF stages, RLVR supports much longer optimization runs, shifting compute away from pretraining and toward RL (see the first sketch after this list).

  • Ghosts vs. Animals / Jagged Intelligence: LLMs are "summoned ghosts," not "evolved animals": they are optimized for text imitation and puzzle rewards rather than survival. Their capabilities are jagged: a genius polymath in some areas, a confused grade-schooler in others, and easily tricked by jailbreaks.

  • Cursor and the LLM App Layer: Cursor revealed a new category of "LLM apps" that bundle context engineering, orchestrate multiple LLM calls into complex DAGs, provide vertical-specific GUIs, and offer autonomy sliders for human oversight (see the second sketch after this list).

  • Claude Code / AI on Your Computer: First convincing LLM agent demonstration, running locally with access to your environment, data, and context. Anthropic got the form factor right with a minimal CLI, creating a new paradigm where AI "lives" on your machine.

  • Vibe Coding: AI crossed a threshold that enables programming in English without thinking about the code itself. This empowers non-programmers and lets professionals write ephemeral, single-use apps they would never otherwise build.

  • Nano Banana / LLM GUI: Google's image model hints at the future LLM interface, moving beyond chat toward images, infographics, and the visual formats humans prefer by combining text generation, image generation, and world knowledge.
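
Below, as promised, is a minimal sketch of the RLVR loop. It is illustrative only, assuming a toy policy and a binary programmatic reward; names like verifiable_reward and sample_answer are hypothetical, not from any real training stack.

```python
import random

# Minimal sketch of the RLVR idea: the reward is a programmatic check
# against a known-correct answer, not a learned preference model.
# All names here are illustrative.

def verifiable_reward(problem: dict, answer: str) -> float:
    """Reward is 1.0 iff the answer matches the checkable solution."""
    return 1.0 if answer.strip() == problem["solution"] else 0.0

def sample_answer(problem: dict) -> str:
    """Stand-in for sampling a reasoning trace + answer from the policy."""
    return random.choice([problem["solution"], "wrong answer"])

problems = [
    {"question": "2 + 2 = ?", "solution": "4"},
    {"question": "7 * 6 = ?", "solution": "42"},
]

for step in range(3):
    rewards = [verifiable_reward(p, sample_answer(p)) for p in problems]
    # A real run would feed these rewards into a policy-gradient update
    # (e.g., PPO/GRPO) and repeat for far more steps than SFT/RLHF allow.
    print(f"step {step}: mean reward = {sum(rewards) / len(rewards):.2f}")
```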

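A second sketch, of the "LLM app" shape Cursor exemplifies: a few model calls wired into a small DAG around context engineering, with an autonomy flag standing in for the autonomy slider. The llm function and the pipeline steps are stubs invented for illustration, not a real API.

```python
# Minimal sketch of an "LLM app" pipeline: model calls wired into a small
# DAG around context engineering. llm() is a stub standing in for any
# chat-completion call.

def llm(prompt: str) -> str:
    """Stub for a model call; a real app would hit an inference API here."""
    return f"<model output for: {prompt.splitlines()[0]}>"

def gather_context(file_text: str, query: str) -> str:
    # Context engineering: pack only the relevant code into the prompt.
    return f"Task: {query}\nRelevant code:\n{file_text}"

def draft_edit(context: str) -> str:
    return llm(f"Propose a code edit.\n{context}")

def review_edit(context: str, edit: str) -> str:
    # A second node in the DAG checks the first node's output.
    return llm(f"Review this edit for bugs.\n{context}\nEdit:\n{edit}")

def run_pipeline(file_text: str, query: str, autonomous: bool) -> str:
    context = gather_context(file_text, query)
    edit = draft_edit(context)
    review = review_edit(context, edit)
    if not autonomous:
        # Autonomy slider: keep a human in the loop before applying edits.
        print("Review for human approval:", review)
    return edit

print(run_pipeline("def add(a, b): return a + b", "add type hints", autonomous=False))
```
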
Takeaways

Andrej Karpathy wrote this year-in-review of LLM progress. The key insight is that LLMs are a fundamentally different kind of intelligence: jagged, ghost-like entities that spike in verifiable domains while remaining surprisingly brittle elsewhere.

In Karpathy's words: "LLMs are emerging as a new kind of intelligence, simultaneously a lot smarter than I expected and a lot dumber than I expected."
