Summary of "Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are"

Core Idea

People lie in surveys, interviews, and public self-presentation, but digital traces often reveal what they really think, want, fear, and do.
The book’s central claim is that anonymous behavioral data (especially Google searches, but also clicks, views, logs, photos, and other digital exhaust) can expose hidden truths that traditional social science misses.
Stephens-Davidowitz argues that Big Data is valuable not merely because it is large but because it is honest, granular, and well suited to experiments, which matters most where older methods are distorted by self-censorship and social desirability bias.

Surveys routinely overstate socially approved behavior like voting, giving, churchgoing, or sex, because people want to look better, protect their privacy, or even deceive themselves.
Search data works as a digital truth serum because searching is anonymous, done alone, and free of social penalty.
The author repeatedly contrasts what people say with what they search, click, and view: search logs expose depression, infidelity fears, homosexuality, racist curiosity, sexual insecurity, regret about children, and other taboo topics.
One recurring form of proof is the external reality check: for example, condom sales and other real-world measures show that survey reports of sexual frequency and condom use are implausibly inflated.
Google search patterns reveal hidden social attitudes at scale; explicit racism, for instance, turns out to be widespread and concentrated in places that conventional wisdom would not predict.
His Obama-racism research suggested that explicit racism cost Obama measurable support; academic journals initially rejected the finding as too surprising, but areas with high racist search rates later proved to be strong predictors of Trump support in the 2016 Republican primary.
The book also shows that Big Data can uncover hidden suffering and unmet needs, such as child abuse that official reports missed during the Great Recession, demand for self-induced abortion where abortion laws are restrictive, and the prevalence of closeted gay men, especially in less tolerant states.
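
The external reality check described above is, at bottom, simple arithmetic: multiply what respondents claim by the size of the population and compare the total with an independent count such as sales. The Python sketch below shows that comparison. Every number in it is a hypothetical placeholder chosen for round arithmetic, not a figure from the book, and the function names `survey_implied_total` and `inflation_ratio` are illustrative.

```python
def survey_implied_total(population: float,
                         events_per_person_per_year: float,
                         share_of_events_with_product: float) -> float:
    """Annual product use implied if every respondent's self-report were true."""
    return population * events_per_person_per_year * share_of_events_with_product


def inflation_ratio(implied: float, observed: float) -> float:
    """How many times larger the survey-implied total is than the external count."""
    return implied / observed


# Hypothetical inputs for illustration only; NOT figures from the book.
population = 100_000_000        # people in the surveyed group
reported_frequency = 50         # self-reported encounters per person per year
reported_share = 0.2            # self-reported share of encounters using a condom
observed_sales = 500_000_000    # hypothetical independent count of condoms sold

implied = survey_implied_total(population, reported_frequency, reported_share)
ratio = inflation_ratio(implied, observed_sales)
print(f"survey implies {implied:,.0f} uses; sales show {observed_sales:,.0f}; "
      f"ratio {ratio:.1f}x")
```

A ratio well above 1 means the self-reports cannot all be true. The sketch counts each reported encounter once, so it assumes the reports come from only one partner’s side of each encounter.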

Stephens-Davidowitz treats Big Data broadly: it includes new digital exhaust, digitized historical records, and inventive new ways of turning old signals into measurable evidence.
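He argues that Big Data has four distinctive powers: it offers new types of data, it provides honest data, it lets researchers zoom in on small groups and places, and it makes causal experiments cheap to run.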
The strongest examples are not necessarily the largest datasets but the ones with a sharp signal, such as Google Flu Trends, Google Correlate, nighttime satellite lights, and image-based measures from smartphones and satellites.
He shows how nontraditional variables can predict outcomes surprisingly well: horse physiology predicts racing success better than pedigree, Bordeaux weather predicts wine quality, and newspaper word choice reveals political slant.
The book’s “words as data” and “pictures as data” chapters emphasize that language, images, and even satellite lights can become quantitative evidence about politics, economics, culture, and emotion.
A major theme is that prediction often matters more than causal explanation in the short run; Jeff Seder’s horse model and Orley Ashenfelter’s wine formula worked even without full mechanistic understanding.
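
Ashenfelter’s wine formula, as summarized above, maps a vintage’s weather to its quality without modeling grape chemistry. A minimal sketch of that kind of formula is an ordinary least-squares fit, shown here in Python with NumPy on synthetic vintages. The data, the three weather variables, and the "true" effects used to generate the quality scores are assumptions for illustration; they are not Ashenfelter’s actual variables or coefficients.

```python
import numpy as np

rng = np.random.default_rng(7)
n_vintages = 40

# Synthetic weather for each vintage (hypothetical ranges, not real Bordeaux data).
winter_rain = rng.uniform(400.0, 800.0, n_vintages)   # mm
growing_temp = rng.uniform(15.0, 19.0, n_vintages)    # degrees C
harvest_rain = rng.uniform(0.0, 300.0, n_vintages)    # mm

# Assumed effects used ONLY to generate fake quality scores:
# wetter winter slightly better, warmer season better, wetter harvest worse.
X = np.column_stack([winter_rain, growing_temp, harvest_rain])
assumed_effects = np.array([0.001, 0.6, -0.004])
quality = X @ assumed_effects + rng.normal(0.0, 0.2, n_vintages)

# Ordinary least squares with an intercept column: pure prediction, no mechanism.
design = np.column_stack([np.ones(n_vintages), X])
coefs, _residuals, _rank, _singular_values = np.linalg.lstsq(design, quality, rcond=None)


def predict_quality(winter_mm: float, temp_c: float, harvest_mm: float) -> float:
    """Predicted quality score for a new vintage from its weather alone."""
    return float(coefs @ np.array([1.0, winter_mm, temp_c, harvest_mm]))


print("intercept and weather coefficients:", np.round(coefs, 4))
print("warm, dry year:", round(predict_quality(600, 18, 50), 2))
print("cool, wet year:", round(predict_quality(600, 16, 250), 2))
```

The fitted coefficients recover the assumed pattern from noisy data, which is the point the summary makes: a formula can forecast well even if nobody writes down why warm, dry seasons help.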
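
Measuring newspaper slant from word choice can be sketched as counting phrases that one political side uses far more than the other and turning the counts into a score. The Python sketch below assumes two tiny, hand-picked phrase lists that are hypothetical; a real study would derive such lists from data (for example, from congressional speech) rather than choose them by hand. The names `LEFT_PHRASES`, `RIGHT_PHRASES`, and `slant_score` are illustrative.

```python
import re

# Hypothetical, illustrative phrase lists; a real analysis would learn these from data.
LEFT_PHRASES = ["estate tax", "tax breaks for the wealthy"]
RIGHT_PHRASES = ["death tax", "tax relief"]


def count_phrase(text: str, phrase: str) -> int:
    """Case-insensitive count of whole-phrase matches."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text.lower()))


def slant_score(text: str) -> float:
    """Score in [-1, 1]: -1 uses only left-coded phrases, +1 only right-coded ones.

    Returns 0.0 when no listed phrase appears.
    """
    left = sum(count_phrase(text, p) for p in LEFT_PHRASES)
    right = sum(count_phrase(text, p) for p in RIGHT_PHRASES)
    total = left + right
    if total == 0:
        return 0.0
    return (right - left) / total


article = "Critics of the death tax say repeal would be real tax relief."
print(slant_score(article))
```

Turning phrases into a single number is what lets word choice be compared across thousands of articles, in the spirit of the "words as data" idea above.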

Big Data is not magic: the book warns about the curse of dimensionality, where testing many variables against too few observations produces spurious discoveries.
Larry Summers’s challenge about predicting the stock market becomes a lesson in humility: markets are so heavily mined for edges that most apparent patterns are either luck or quickly arbitraged away.
The author warns against becoming obsessed with what is measurable, because proxies can distort behavior; pedometers, standardized tests, and platform metrics can all be gamed.
Big Data should complement, not replace, small data and human judgment; surveys, observation, expert review, and simple checking still matter.
Corporate uses of digital traces can become creepy or unfair: lending, hiring, pricing, and casino retention can all exploit hidden proxies like wording, likes, or “pain points.”
The Prosper lending example shows how innocuous phrases in loan requests can become credit signals, and Facebook likes can be used to infer IQ or personality in ways that raise obvious fairness concerns.
Governments can use search data helpfully at the area level—targeting suicide prevention, hate-crime patrols, or abortion resources—but the author is skeptical of acting on individual searches alone because the privacy and error costs are too high.
A recurring distinction is between area-level prediction, which can guide public health or safety responses, and individual-level intervention, which is usually not yet justified by the data.
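
The curse of dimensionality mentioned above, and the false patterns behind Summers’s stock-market lesson, can be reproduced with pure noise. The Python sketch below (standard library only) tests 1,000 random "predictors" against a random outcome using 20 observations, counts how many clear an approximate p < 0.05 correlation cutoff, and then checks the best one on 20 held-out observations. The sample sizes, the seed, and the cutoff of 0.444 (roughly the two-sided 5% critical value of Pearson’s r for n = 20) are assumptions for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rng = random.Random(42)
N_DISCOVERY, N_HOLDOUT, N_VARS = 20, 20, 1000
R_CRIT = 0.444  # approximate two-sided p < 0.05 cutoff for n = 20

n_total = N_DISCOVERY + N_HOLDOUT
# Outcome and predictors are independent noise: any "finding" is fake by construction.
outcome = [rng.gauss(0, 1) for _ in range(n_total)]
predictors = [[rng.gauss(0, 1) for _ in range(n_total)] for _ in range(N_VARS)]

disc_y = outcome[:N_DISCOVERY]
hold_y = outcome[N_DISCOVERY:]
disc_r = [pearson(p[:N_DISCOVERY], disc_y) for p in predictors]

false_hits = sum(abs(r) > R_CRIT for r in disc_r)
best = max(range(N_VARS), key=lambda j: abs(disc_r[j]))
holdout_r = pearson(predictors[best][N_DISCOVERY:], hold_y)

print(f"{false_hits} of {N_VARS} noise variables look 'significant'")
print(f"best discovery r = {disc_r[best]:.2f}; same variable on holdout r = {holdout_r:.2f}")
```

Roughly 5% of the noise variables should pass the cutoff, and the "best" one typically shrinks toward zero on fresh data, which is why a held-out check matters more than the strength of the original correlation.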
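
The case for area-level use over individual-level intervention rests on base-rate arithmetic: when the behavior being predicted is rare, even an accurate classifier flags mostly people who are not real cases. The Python sketch below works through that arithmetic. The population, base rate, sensitivity, and false-positive rate are hypothetical, and the function names are illustrative.

```python
def flagged_breakdown(population, base_rate, sensitivity, false_positive_rate):
    """Expected true positives, false positives, and precision for individual screening."""
    true_cases = population * base_rate
    non_cases = population - true_cases
    true_positives = true_cases * sensitivity
    false_positives = non_cases * false_positive_rate
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision


def area_flag_rate(base_rate, sensitivity, false_positive_rate):
    """Share of an area's searchers flagged: fpr + base_rate * (sensitivity - fpr)."""
    return false_positive_rate + base_rate * (sensitivity - false_positive_rate)


# Hypothetical scenario: 1 in 10,000 searchers is a real case; the classifier
# catches 90% of real cases and wrongly flags 1% of everyone else.
tp, fp, precision = flagged_breakdown(1_000_000, 1 / 10_000, 0.90, 0.01)
print(f"true positives {tp:.0f}, false positives {fp:.0f}, precision {precision:.1%}")

# Area level: with the same error rates everywhere, flagged rates still track true rates.
low_area = area_flag_rate(1 / 20_000, 0.90, 0.01)
high_area = area_flag_rate(1 / 5_000, 0.90, 0.01)
print(f"low-rate area flagged {low_area:.4%}, high-rate area flagged {high_area:.4%}")
```

With these assumed numbers, fewer than 1 in 100 flagged individuals is a real case, so acting on single searches would mostly hit innocent people. Yet if the false-positive rate is similar across areas, an area’s flagged share still rises with its true base rate, which is one way to see why aggregate targeting is more defensible.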

The internet is not just a place where people perform; it is a massive archive of confession, impulse, and behavior.
The book’s most distinctive lesson is that anonymous actions are often truer than declared beliefs, and that social science improves when it studies behavior at scale.
Big Data’s promise is real, but its best use is selective: it shines when old data are dishonest or weak, and it fails when researchers confuse statistical patterns with meaning or morality.
The ending’s broader point is that a better science of human behavior will come from many specific, testable findings rather than from grand theory alone, and from following what people actually do rather than what they say.