Overview
- The Importance of Thinking About Performance: Dismissing performance concerns until later often creates flat profiles where slowness is everywhere, making optimization difficult. Choose faster alternatives when they don't significantly impact readability.
- Estimation: Use back-of-the-envelope calculations with known operation costs (L1 cache ~0.5ns, main memory ~50ns, SSD read 4KB ~20µs, disk seek ~5ms) to compare design alternatives before implementation (see the worked estimate after this list).
- Measurement: Profiling is essential before making improvements. Use tools like pprof and perf, write microbenchmarks, and gather hardware performance counter data to identify hotspots (see the microbenchmark sketch below).
  - When profiles are flat, pursue many small optimizations, look for structural changes, or reduce allocations.
- API Considerations: Design APIs that enable performance: bulk operations reduce boundary crossings, view types avoid copying, pre-allocated arguments prevent redundant work, and thread-compatible types avoid unnecessary synchronization overhead (see the API sketch below).
- Algorithmic Improvements: The most critical opportunities come from better algorithms: replacing O(N²) with O(N log N), using hash tables for O(1) lookups instead of O(log N) sorted structures, and avoiding exponential behavior (see the duplicate-detection sketch below).
- Better Memory Representation: Use compact data structures, careful field ordering, indices instead of pointers, batched/inlined storage, and bit vectors instead of sets to reduce cache footprint and memory usage (see the layout sketch below).
- Reduce Allocations: Avoid unnecessary allocations, pre-size containers with reserve/resize, prefer moving over copying, and reuse temporary objects by hoisting declarations outside loops (see the allocation sketch below).
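
A worked estimate for the Estimation point, using only the rough costs quoted above. The scenario (fetching 10,000 small records either with one disk seek each or with batched 4KB SSD reads holding 100 records per block) is hypothetical, chosen purely to illustrate the arithmetic.

```cpp
// Back-of-the-envelope comparison of two hypothetical designs, using the
// rough operation costs listed above.
#include <cstdio>

int main() {
  constexpr double kDiskSeekSec = 5e-3;    // ~5 ms per disk seek
  constexpr double kSsdRead4KSec = 20e-6;  // ~20 us per 4KB SSD read
  constexpr int kRecords = 10'000;

  // Design A: one random disk seek per record.
  double design_a = kRecords * kDiskSeekSec;             // 10,000 * 5 ms = 50 s
  // Design B: records packed 100 per 4KB block, read from SSD.
  double design_b = (kRecords / 100.0) * kSsdRead4KSec;  // 100 * 20 us = 2 ms

  std::printf("one seek per record:   %.1f s\n", design_a);
  std::printf("batched 4KB SSD reads: %.4f s\n", design_b);
}
```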
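
For the Measurement point, a minimal microbenchmark sketch. It uses the open-source Google Benchmark library purely as an example harness (the guide names pprof and perf but no particular benchmark framework), and the string-joining workload is made up for illustration; perf or pprof can then be pointed at the resulting binary.

```cpp
// Minimal microbenchmark sketch using Google Benchmark (one possible harness).
#include <benchmark/benchmark.h>

#include <string>
#include <vector>

// Hypothetical workload: building a comma-separated string from N parts.
static void BM_JoinStrings(benchmark::State& state) {
  std::vector<std::string> parts(state.range(0), "field");
  for (auto _ : state) {
    std::string joined;
    for (const auto& p : parts) {
      joined += p;
      joined += ',';
    }
    benchmark::DoNotOptimize(joined);  // keep the result from being optimized away
  }
}
BENCHMARK(BM_JoinStrings)->Range(8, 8 << 10);

BENCHMARK_MAIN();
```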
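
For API Considerations, a sketch of API shapes that enable performance. KeyValueStore and its methods are hypothetical names, not from any real library; the point is the shape: string_view parameters avoid copying keys, a bulk call amortizes per-call overhead, and a caller-supplied output vector lets the caller pre-reserve and reuse storage.

```cpp
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

class KeyValueStore {
 public:
  // Single-key lookup: per-call overhead (lock, virtual call, RPC, ...) is
  // paid once per key.
  bool Lookup(std::string_view key, std::string* value) const {
    auto it = map_.find(std::string(key));
    if (it == map_.end()) return false;
    *value = it->second;
    return true;
  }

  // Bulk lookup: the same overhead is paid once for the whole batch; results
  // are appended to a caller-owned (possibly pre-reserved) vector.
  void BulkLookup(const std::vector<std::string_view>& keys,
                  std::vector<std::string>* values) const {
    values->reserve(values->size() + keys.size());
    for (std::string_view key : keys) {
      auto it = map_.find(std::string(key));
      values->push_back(it == map_.end() ? std::string() : it->second);
    }
  }

 private:
  std::unordered_map<std::string, std::string> map_;
};
```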
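
For Algorithmic Improvements, a small duplicate-detection example: the nested-loop version does O(N²) comparisons, while the hash-set version does O(N) expected work. Both functions are illustrative, not taken from the guide.

```cpp
#include <string>
#include <string_view>
#include <unordered_set>
#include <vector>

// O(N^2): compares every pair of items.
bool HasDuplicateQuadratic(const std::vector<std::string>& items) {
  for (size_t i = 0; i < items.size(); ++i) {
    for (size_t j = i + 1; j < items.size(); ++j) {
      if (items[i] == items[j]) return true;
    }
  }
  return false;
}

// O(N) expected: each item is hashed once.
bool HasDuplicateHashed(const std::vector<std::string>& items) {
  std::unordered_set<std::string_view> seen;
  seen.reserve(items.size());
  for (const std::string& item : items) {
    if (!seen.insert(item).second) return true;  // insert fails => already seen
  }
  return false;
}
```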
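
For Better Memory Representation, a sketch of three of the listed techniques: field ordering to avoid padding, 32-bit indices in place of 64-bit pointers, and a bit vector in place of a node-based set. The structs are illustrative and the sizes assume a typical 64-bit ABI.

```cpp
#include <cstdint>
#include <vector>

// 24 bytes on a typical 64-bit ABI: bool + 7 bytes padding, int64, int32 + padding.
struct Loose {
  bool flag;
  int64_t id;
  int32_t count;
};

// 16 bytes: largest fields first, small fields packed at the end.
struct Packed {
  int64_t id;
  int32_t count;
  bool flag;
};

// A 32-bit index into a shared, contiguous std::vector<Node> is half the size
// of an 8-byte pointer and keeps related nodes close together in memory.
struct Node {
  uint32_t first_child;   // index into the owning std::vector<Node>, or UINT32_MAX
  uint32_t next_sibling;  // likewise
};

// std::vector<bool> is a packed bit vector: membership for ids in [0, n)
// costs n bits, versus dozens of bytes per entry in a node-based std::set<int>.
std::vector<bool> MakeMembership(const std::vector<uint32_t>& ids, uint32_t n) {
  std::vector<bool> present(n, false);
  for (uint32_t id : ids) present[id] = true;
  return present;
}
```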
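
For Reduce Allocations, a sketch combining the listed patterns: reserve() pre-sizes the output, a scratch string hoisted out of the loop reuses its capacity, and pass-by-value plus std::move avoids copying character data. Function and type names are hypothetical.

```cpp
#include <string>
#include <utility>
#include <vector>

struct Record {
  std::string name;
  // Take by value and move: a caller passing a temporary incurs no copy.
  void SetName(std::string n) { name = std::move(n); }
};

std::vector<std::string> FormatAll(const std::vector<int>& values) {
  std::vector<std::string> out;
  out.reserve(values.size());  // one allocation instead of repeated regrowth

  std::string scratch;  // hoisted: its buffer is reused across iterations
  for (int v : values) {
    scratch = "value=";  // overwrites in place once capacity is large enough
    scratch += std::to_string(v);
    out.push_back(scratch);  // one copy into the vector; scratch keeps its buffer
  }
  return out;  // moved (or elided), not copied
}
```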
Takeaways
Jeff Dean and Sanjay Ghemawat created this guide drawing on decades of Google performance work. The key insight is that ignoring performance during development produces diffuse slowness that is far harder to fix after the fact than it would have been to build performance awareness into the code from the start.
We should not pass up our opportunities in that critical 3%.