perf

perf is the standard Linux profiler: a user-space tool built on the kernel's perf_events subsystem. It uses the hardware performance counters found in most modern CPUs, together with kernel software events, to count and sample what your code is doing, usually with low overhead. It covers everything from counting cache misses to collecting the stack samples behind flame graphs that show where CPU time is spent.

perf stat — Hardware Counters

# Count hardware and software events for a command:
perf stat ls /tmp

# Example output:
# Performance counter stats for 'ls /tmp':
#
#     0.543451  task-clock (msec)   #  0.540 CPUs utilized
#            0  context-switches    #  0 per second
#            0  cpu-migrations      #  0 per second
#          210  page-faults         #  386K per second
#      993,542  cycles              #  1.829 GHz
#      851,234  instructions        #  0.86 insn per cycle
#      180,432  branches            #  332M per second
#        5,231  branch-misses       #  2.90% of all branches

# Key metrics (rules of thumb, not hard limits):
#   insn per cycle (IPC): > 1 suggests the CPU is kept busy, < 1 suggests frequent stalls
#   branch-miss %: > 5% suggests the branch predictor is struggling
#   cache-miss %: > 5% suggests the workload may be memory bound

# Repeat the run and report the mean and variance:
perf stat -r 5 mybenchmark    # run 5 times

# Count specific events:
perf stat -e cache-misses,cache-references,LLC-misses myapp
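The derived columns in the sample output above can be reproduced by hand from the raw counts. The following is a minimal sketch, assuming a POSIX shell with awk; the helper names ratio and percent are made up for illustration and are not part of perf.

```shell
#!/bin/sh
# Hypothetical helpers (not part of perf) for deriving metrics
# from raw perf stat counts.

# ratio NUM DEN: print NUM / DEN with two decimals, or "n/a" if DEN is 0.
ratio() {
    awk -v n="$1" -v d="$2" 'BEGIN {
        if (d == 0) print "n/a"
        else printf "%.2f\n", n / d
    }'
}

# percent NUM DEN: print 100 * NUM / DEN with two decimals, or "n/a" if DEN is 0.
percent() {
    awk -v n="$1" -v d="$2" 'BEGIN {
        if (d == 0) print "n/a"
        else printf "%.2f\n", 100 * n / d
    }'
}

# Counts copied from the sample perf stat output (commas removed).
echo "IPC:           $(ratio 851234 993542)"      # instructions / cycles
echo "branch-miss %: $(percent 5231 180432)"      # branch-misses / branches
```

With the sample counts this prints an IPC of 0.86 and a branch-miss rate of 2.90, matching the right-hand column that perf stat prints itself.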

perf record & report — CPU Profiling

# Profile a running process at 99 Hz for 30 seconds:
perf record -g -F 99 -p PID -- sleep 30
#   -g    = capture call graphs (stack traces)
#   -F 99 = sample at 99 Hz (not 100, to avoid lockstep with timers)
# Samples are written to perf.data

# Profile a specific command:
perf record -g ./mybenchmark

# View the report:
perf report
# Samples: 1234 of event 'cpu-clock', Event count: 12345678
#   Overhead  Command   Shared Object  Symbol
#     45.23%  nginx     nginx          do_epoll_wait
#     23.11%  nginx     libc           malloc
#     15.44%  nginx     nginx          ngx_hash_find
#      8.12%  [kernel]  [kernel]       copy_user_enhanced_fast_string
# Navigate: arrow keys, Enter to expand, q to quit
# Shows where CPU time is being spent

Flame Graphs — Visualizing Profiles

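What is a flame graph and why is it useful?

A flame graph shows sampled CPU time aggregated by stack trace. Each box is a stack frame (a function), and each row is one level of stack depth. A box's width is proportional to the number of samples in which that frame was on the stack; the x-axis is not the passage of time. The widest "plateaus" at the top of stacks are the functions that were running on the CPU most often. You can see at a glance which functions are hot, even across deep call chains. Flame graphs were created by Brendan Gregg and are widely used for visualizing CPU profiles.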
# Get the FlameGraph scripts (required):
git clone https://github.com/brendangregg/FlameGraph

# Record a profile:
perf record -g -F 99 -p PID -- sleep 30

# Convert to a flame graph:
perf script | ./FlameGraph/stackcollapse-perf.pl | \
    ./FlameGraph/flamegraph.pl > flame.svg
# Open flame.svg in a browser: it is interactive, click to zoom

# Managed and interpreted runtimes need extra help with stacks and symbols:
#   Java:    run the JVM with -XX:+PreserveFramePointer
#   Python:  py-spy (separate tool)
#   Node.js: perf plus the --perf-basic-prof flag
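Between the two FlameGraph scripts, the profile travels as "folded" stacks: one line per unique stack, frames joined by semicolons, followed by a sample count. The sketch below shows how the "plateaus" can be read from that text form by summing samples per leaf frame. It assumes a POSIX shell with awk and sort; the function name top_frames and the sample stacks are made up for illustration.

```shell
#!/bin/sh
# top_frames (hypothetical helper): read folded stacks on stdin,
# one "frame1;frame2;...;leaf COUNT" per line, and print
# "SAMPLES LEAF" for each leaf frame, hottest first.
top_frames() {
    awk '{
        count = $NF                     # last field is the sample count
        stack = $0
        sub(/ [0-9]+$/, "", stack)      # strip the trailing count
        n = split(stack, frames, ";")
        self[frames[n]] += count        # the leaf frame was on the CPU
    }
    END { for (f in self) print self[f], f }' | sort -k1,1nr -k2,2
}

# Made-up folded stacks, only to show the input format.
printf '%s\n' \
    'main;handle_request;parse_header 30' \
    'main;handle_request;malloc 50' \
    'main;log_write;malloc 15' \
    'main;handle_request 5' | top_frames
# Prints:
# 65 malloc
# 30 parse_header
# 5 handle_request
```

On a real profile the same helper could be fed from perf script | ./FlameGraph/stackcollapse-perf.pl. It reports self time only, so it complements the flame graph rather than replacing it.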

perf top — Live Profiling

# Live CPU profiler (like top, but at function level):
perf top
# Samples: 12K of event 'cpu-clock', 4000 Hz, lost: 0/0
#   Overhead  Shared Object       Symbol
#     18.23%  [kernel]            native_queued_spin_lock_slowpath
#      9.45%  [kernel]            _raw_spin_lock_irqsave
#      8.12%  libpthread-2.31.so  pthread_cond_wait
#      6.34%  nginx               ngx_process_events_and_timers

# Profile a specific process:
perf top -p PID

# Show kernel symbols (needs kernel debug symbols or kallsyms):
perf top --kallsyms /proc/kallsyms

Memory and Cache Analysis

# Find functions causing the most cache misses:
perf record -e cache-misses -g myapp
perf report

# Memory access profiling (uses hardware memory sampling such as Intel PEBS; needs hardware support):
perf mem record myapp
perf mem report

# TLB miss analysis:
perf stat -e dTLB-misses,dTLB-loads myapp
#   dTLB-misses: 1,234,567 (1.23% of loads)
#   dTLB-loads:  100,000,000

# Branch misprediction analysis:
perf stat -e branch-misses,branches myapp    # overall miss rate
perf record -e branch-misses -g myapp        # which functions mispredict
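Miss rates like the dTLB figure above can also be computed in a script from perf stat's machine-readable output (the -x option, which prints one comma-separated line per event with the counter value in the first field and the event name in the third). This is a rough sketch, assuming a POSIX shell with awk; the helper name miss_rate and the sample lines are made up, and the exact event names perf prints can vary by CPU and perf version, so they are passed in as arguments.

```shell
#!/bin/sh
# miss_rate MISS_EVENT TOTAL_EVENT (hypothetical helper): read
# `perf stat -x,` lines on stdin and print 100 * misses / total
# with two decimals, or "n/a" if the total is missing or not numeric.
miss_rate() {
    awk -F, -v miss="$1" -v total="$2" '
        $3 == miss  { m = $1 }          # field 1 = value, field 3 = event name
        $3 == total { t = $1 }
        END {
            if (t + 0 > 0) printf "%.2f\n", 100 * m / t
            else print "n/a"
        }'
}

# Made-up CSV lines in the -x, layout, only to show the input format.
printf '%s\n' \
    '1234567,,dTLB-load-misses,1000000,100.00,,' \
    '100000000,,dTLB-loads,1000000,100.00,,' |
    miss_rate dTLB-load-misses dTLB-loads
# Prints: 1.23
```

For a real run, one option is to write the counters to a file with perf stat -x, -o stats.csv -e dTLB-load-misses,dTLB-loads myapp and then call miss_rate dTLB-load-misses dTLB-loads < stats.csv.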
