Overview
-
ImageNet's Origin Story: Fei-Fei Li pioneered the insight that AI needed massive labeled datasets, not just better algorithms. Her ImageNet project (2006-2012) with 15 million curated images catalyzed the deep learning revolution when combined with neural networks and GPUs.
-
AI Winter Reality: Less than 10 years ago (2015-2016), tech companies avoided the term "AI" because it was considered a dirty word. By 2017, nearly every company was calling itself an AI company, a complete reversal in Silicon Valley's perception.
-
World Models vs LLMs: Li argues spatial intelligence is the missing piece for robotics and embodied AI. Her company World Labs just launched Marble, which generates infinitely explorable 3D worlds from text prompts and reportedly cuts virtual production time by 40x.
-
Bitter Lesson Limits: While "more data + compute = better AI" worked for language models, robotics faces unique challenges: harder data collection, misalignment between training data (videos) and desired outputs (3D actions), and physical system complexities.
-
Current AI Limitations: Today's models can't count chairs in a video, derive Newton's laws from celestial data, or hold emotionally intelligent conversations—tasks humans handle easily on 20 watts of brain power.
Takeaways
Fei-Fei Li helped spark the deep learning revolution with ImageNet but emphasizes how much remains unsolved. One striking detail: just nine years ago, tech companies avoided the term "AI" because it was considered a dirty word. Her company World Labs now focuses on spatial intelligence, which she sees as the missing piece between language models and true embodied AI.
In Li's words: "As a scientist, I take science very seriously and I enter the field because I was inspired by this audacious question of can machines think and do things in the way that humans can do. For me, that's always the north star of AI."
The bitter lesson that "scale conquers all" may hit limits when physics enters the equation.