NUMA

In multi-socket servers, not all RAM is equally fast. Each CPU socket has RAM physically attached to it (local memory) and can also access RAM attached to other sockets (remote memory) — but remote access is 50-100% slower due to the interconnect. NUMA (Non-Uniform Memory Access) is the architecture; Linux's NUMA scheduler tries to keep processes on CPUs near their memory.
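Whether any of this matters on a given machine depends on how many NUMA nodes it actually has. A quick check that needs no extra packages reads sysfs (the path below is standard on modern Linux kernels; node0 exists even on single-socket machines):

```shell
# Count NUMA nodes via sysfs. A count of 1 means the machine has a single
# node, and NUMA tuning is effectively a no-op there.
ls -d /sys/devices/system/node/node* 2>/dev/null | wc -l
```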

NUMA Topology

Dual-socket server (2 NUMA nodes):

    Socket 0 (NUMA node 0):         Socket 1 (NUMA node 1):
      CPU cores 0-23                  CPU cores 24-47
      Local RAM: 128GB                Local RAM: 128GB
            |                                |
            +----------- QPI/UPI ------------+
              (inter-socket interconnect)

Access latency:
  Node 0 CPU reading Node 0 RAM: ~70ns  (local)
  Node 0 CPU reading Node 1 RAM: ~140ns (remote, 2x slower!)

# Real 4-socket server: 4 NUMA nodes
# Remote access through 2 hops: 3x slower than local
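The latency figures above translate directly into an average cost once memory spreads across nodes. A back-of-envelope check using the ~70ns local / ~140ns remote numbers from the diagram:

```shell
# If half of a process's pages land on the remote node, the average access
# latency sits midway between local (70ns) and remote (140ns):
echo "$(( (70 + 140) / 2 )) ns"   # 105 ns — a 50% penalty vs all-local
```

This is exactly the situation a large process straddling two nodes ends up in, which is why the binding and interleaving tools below exist.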

Discovering NUMA Topology

# numactl (install: apt install numactl)
numactl --hardware
# available: 2 nodes (0-1)
# node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
# node 0 size: 128000 MB
# node 0 free: 98765 MB
# node 1 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
# node 1 size: 128000 MB
# node 1 free: 76543 MB
# node distances:
# node   0   1
#   0:  10  21   ← local=10 (relative), remote=21 (2.1x slower)
#   1:  21  10

# lscpu shows NUMA info:
lscpu | grep NUMA
# NUMA node(s):        2
# NUMA node0 CPU(s):   0-23
# NUMA node1 CPU(s):   24-47

# Kernel NUMA stats:
cat /proc/zoneinfo | grep -A5 "Node 0"
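The distance matrix numactl prints comes from the firmware's ACPI SLIT table, and the kernel also exposes it directly in sysfs — useful on minimal systems where numactl isn't installed:

```shell
# One distance file per node. The Nth value on nodeX's line is the distance
# from node X to node N; the local distance is 10 by ACPI convention.
for d in /sys/devices/system/node/node*/distance; do
  printf '%s: %s\n' "$d" "$(cat "$d")"
done
```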

Binding Processes to NUMA Nodes

# Run a process on a specific NUMA node (CPUs + memory local to that node):
numactl --cpunodebind=0 --membind=0 myapp
# All CPUs from node 0, all memory from node 0

# Preferred node (use local, fall back to remote if needed):
numactl --preferred=0 myapp

# Interleave memory across nodes (for large working sets):
numactl --interleave=all myapp
# Alternates allocation between nodes — averages out latency

# Bind to specific CPUs:
numactl --physcpubind=0,1,2,3 myapp
taskset -c 0-3 myapp    # alternative using taskset

# Check which NUMA node a process is using:
numastat -p myapp
# Per-node process memory usage
# Node           0      1
# Huge           0      0
# Heap        1024      0   ← all heap on node 0 (good if running on node 0)
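It's worth verifying that a binding actually took effect. The kernel exposes each task's allowed CPUs (and allowed NUMA nodes) in /proc/<pid>/status, so a quick self-check using taskset (part of util-linux, so no extra packages needed):

```shell
# Pin a throwaway shell to CPU 0, then read its affinity back from procfs:
taskset -c 0 sh -c 'grep Cpus_allowed_list /proc/self/status'
# Prints: Cpus_allowed_list:  0

# The analogous line for memory bindings (set by numactl --membind):
grep Mems_allowed_list /proc/self/status
```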

Linux NUMA Memory Policies

# Default policy: local allocation —
# memory is allocated on the node where the CPU is currently running.
# Works well when the process stays on one node.

# NUMA allocation policies (set per-thread via syscall or numactl):
# MPOL_DEFAULT    = local node allocation (default)
# MPOL_BIND       = must allocate on specified nodes
# MPOL_PREFERRED  = prefer specified node, fall back if full
# MPOL_INTERLEAVE = round-robin across nodes

# Check kernel NUMA behavior:
cat /proc/sys/vm/zone_reclaim_mode
# 0 = try other nodes before reclaiming (default, usually best)
# 1 = reclaim local memory before going remote (NUMA strict)

# NUMA balancing (automatic migration):
cat /proc/sys/kernel/numa_balancing
# 1 = enabled (kernel migrates pages to the node accessing them most)

# numastat — see remote vs local allocations:
numastat
#                  node0      node1
# numa_hit       9876543    8765432   (allocations that went to preferred node)
# numa_miss       123456     234567   (allocations that went to wrong node)
# local_node     9000000    8000000   (allocated on running CPU's node)
# other_node      876543     765432   (allocated on remote node)
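The per-thread policy is also visible per memory mapping: /proc/<pid>/numa_maps lists each VMA's policy and per-node page counts. (The file only exists on kernels built with CONFIG_NUMA, which is the norm for x86_64 distro kernels, even on single-node machines.)

```shell
# First few mappings of the reading process itself: address, policy
# ("default", "bind:0", "interleave:0-1", ...), then per-node page
# counts such as N0=123.
head -3 /proc/self/numa_maps
```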

NUMA and Databases

Why do databases like PostgreSQL have NUMA configuration options? A database process with a large shared buffer pool (say, 64GB) that spans NUMA nodes will make roughly half its memory accesses to remote nodes, substantially reducing effective memory throughput. PostgreSQL's huge_pages setting and Linux's NUMA interleave policy help. Many DBAs pin PostgreSQL to a single NUMA node, or use numactl --interleave=all on large multi-socket systems to spread the buffer pool evenly.
# Run PostgreSQL with NUMA interleaving:
numactl --interleave=all pg_ctlcluster 14 main start

# Or in systemd service override:
# [Service]
# ExecStart=
# ExecStart=numactl --interleave=all /usr/lib/postgresql/14/bin/postgres ...

# Check PostgreSQL NUMA memory distribution:
numastat -p postgres
