/proc & /sys Observability

Most Linux monitoring agents, including Prometheus node_exporter and the Datadog agent, collect their host metrics from /proc and /sys; dashboards such as Grafana only display what those agents collected. Learning to read these files directly means you can diagnose problems on any Linux system with no tools installed, and understand exactly what your monitoring stack is measuring.

/proc/stat — CPU Utilization

    cat /proc/stat
    #      user   nice system idle   iowait irq softirq steal guest guest_nice
    # cpu  123456 789  23456  987654 3456   0   234     0     0     0
    # cpu0 61728  400  11728  493827 1728   0   117     0     0     0
    # cpu1 61728  389  11728  493827 1728   0   117     0     0     0

    # Values are cumulative ticks of USER_HZ (typically 1/100 s = 10 ms each)

    # To get CPU percentage:
    #   Read /proc/stat twice, 1 second apart
    #   delta_idle  = idle2 - idle1
    #   delta_total = (user+nice+system+idle+iowait+irq+softirq+steal)2 - (same sum)1
    #   CPU% = 100 * (1 - delta_idle / delta_total)
    # guest and guest_nice are already included in user and nice, so do not add them again.
    # This formula counts iowait as busy; add iowait to delta_idle to treat it as idle.

    # Also in /proc/stat:
    #   ctxt NNN         = total context switches since boot
    #   btime NNN        = boot time (Unix timestamp)
    #   processes N      = total forks since boot
    #   procs_running N  = currently runnable (running or waiting for a CPU)
    #   procs_blocked N  = blocked waiting for I/O (D state)
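The two-sample calculation above can be sketched as a small shell function. This is a minimal illustration, not how any particular tool does it: the name cpu_busy_pct is hypothetical, it takes the two aggregate "cpu" lines as arguments, and it follows the formula above by treating only the idle column as idle time.

```shell
# cpu_busy_pct LINE1 LINE2   (hypothetical helper name, for illustration)
# LINE1 and LINE2 are aggregate "cpu ..." lines from /proc/stat,
# captured some time apart. Prints the busy percentage between them.
cpu_busy_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # $2 user, $3 nice, $4 system, $5 idle, $6 iowait,
            # $7 irq, $8 softirq, $9 steal.
            # guest/guest_nice are already part of user/nice, so skip them.
            total[NR] = $2 + $3 + $4 + $5 + $6 + $7 + $8 + $9
            idle[NR]  = $5   # assumption: iowait counts as busy, as in the formula above
        }
        END {
            dt = total[2] - total[1]
            di = idle[2] - idle[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (1 - di / dt)
        }'
}

# Usage on a live system:
#   a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat)
#   cpu_busy_pct "$a" "$b"
```

Passing the samples in as arguments keeps the arithmetic separate from the sampling, so the same function works on lines saved from another machine.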

/proc/meminfo — Memory State

    cat /proc/meminfo
    # MemTotal:       32768000 kB   = usable physical RAM
    # MemFree:         1234568 kB   = completely unused
    # MemAvailable:   18432000 kB   = estimate of memory available to new processes without swapping (USE THIS)
    # Buffers:          512000 kB   = block device buffers (mostly filesystem metadata)
    # Cached:         14000000 kB   = file page cache (also counts tmpfs and shared memory)
    # Dirty:            123456 kB   = modified pages not yet on disk
    # Writeback:          1234 kB   = currently being written to disk
    # AnonPages:       5000000 kB   = anonymous memory (malloc, stack)
    # Mapped:          1500000 kB   = mmapped files
    # Shmem:            200000 kB   = shared memory and tmpfs
    # Slab:             987654 kB   = kernel slab allocator
    # SReclaimable:     654321 kB   = slab pages that can be freed
    # SUnreclaim:       333333 kB   = slab pages that cannot be freed
    # CommitLimit:    33554432 kB   = commit ceiling (enforced only when vm.overcommit_memory=2)
    # Committed_AS:   20000000 kB   = total committed (promised) memory

    # Key formulas:
    #   Real free         = MemAvailable (not MemFree)
    #   Used by processes ≈ AnonPages + Mapped
    #   Kernel used       ≈ Slab + PageTables + KernelStack
    # PageTables and KernelStack are further fields in the same file.
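As a rough sketch, the key formulas above can be computed with awk from the file's contents. The function name mem_summary and its output labels are made up for this example; it reads meminfo-formatted text on standard input and reports values in MiB.

```shell
# mem_summary   (hypothetical helper name, for illustration)
# Reads /proc/meminfo-formatted text on stdin and prints the derived
# values from the key formulas, converted from kB to MiB.
mem_summary() {
    awk '
        { sub(":", "", $1); kb[$1] = $2 }   # "MemTotal:" -> kb["MemTotal"]
        END {
            printf "available_mib %d\n", kb["MemAvailable"] / 1024
            printf "process_mib %d\n", (kb["AnonPages"] + kb["Mapped"]) / 1024
            printf "kernel_mib %d\n", (kb["Slab"] + kb["PageTables"] + kb["KernelStack"]) / 1024
            if (kb["MemTotal"] > 0)
                printf "available_pct %.1f\n", 100 * kb["MemAvailable"] / kb["MemTotal"]
        }'
}

# Usage on a live system:
#   mem_summary < /proc/meminfo
```

The process and kernel figures are approximations, as noted above, so treat them as a guide to where memory went rather than an exact accounting.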

/proc/diskstats — Disk I/O Stats

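    cat /proc/diskstats
    # 8 0 sda 12345 678 98765 43210 5678 910 45678 21098 0 30000 64308
    #
    # Field  Value   Meaning
    #  1     8       major
    #  2     0       minor
    #  3     sda     device name
    #  4     12345   reads (completed)
    #  5     678     reads_merged
    #  6     98765   read_sectors (512-byte units)
    #  7     43210   reads_ms (total time spent on reads)
    #  8     5678    writes (completed)
    #  9     910     writes_merged
    # 10     45678   write_sectors
    # 11     21098   writes_ms (total time spent on writes)
    # 12     0       in_flight (I/Os currently in progress)
    # 13     30000   io_ticks (ms during which the device had I/O in flight)
    # 14     64308   time_in_queue (ms weighted by the number of I/Os in flight)
    # Newer kernels append discard and flush fields after these.

    # Key derived metrics (use deltas between two samples, not the raw counters):
    #   I/O utilization = delta io_ticks / elapsed ms (100% = device busy for the whole interval)
    #   Read latency    = delta reads_ms / delta reads (ms per read)
    #   Write latency   = delta writes_ms / delta writes (ms per write)
    # Devices that serve requests in parallel (SSDs, NVMe, RAID) can take more load at 100% utilization.

    # These are what iostat -x shows; iostat reads /proc/diskstats:
    iostat -x 1
    # Device  r/s  w/s  rkB/s  wkB/s  await  %util
    # sda     50   200  6400   51200  8.5    67.3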
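To make the derived metrics concrete, here is a minimal sketch that computes them from two samples of one device's line. The function name disk_metrics and its output labels are hypothetical. It assumes the standard kernel field order: reads in field 4, read time in field 7, writes in field 8, write time in field 11, and io_ticks in field 13.

```shell
# disk_metrics LINE1 LINE2 INTERVAL_MS   (hypothetical helper name)
# LINE1 and LINE2 are the /proc/diskstats line for one device, sampled
# INTERVAL_MS milliseconds apart.
disk_metrics() {
    printf '%s\n%s\n' "$1" "$2" | awk -v interval_ms="$3" '
        {
            reads[NR]  = $4;  read_ms[NR]  = $7
            writes[NR] = $8;  write_ms[NR] = $11
            io_ticks[NR] = $13
        }
        END {
            dr = reads[2] - reads[1]
            dw = writes[2] - writes[1]
            # Guard every division: an idle device completes no I/O.
            util = (interval_ms > 0) ? 100 * (io_ticks[2] - io_ticks[1]) / interval_ms : 0
            ra = (dr > 0) ? (read_ms[2] - read_ms[1]) / dr : 0
            wa = (dw > 0) ? (write_ms[2] - write_ms[1]) / dw : 0
            printf "util_pct %.1f\n", util
            printf "read_await_ms %.2f\n", ra
            printf "write_await_ms %.2f\n", wa
        }'
}

# Usage on a live system (device name sda is an example):
#   a=$(grep ' sda ' /proc/diskstats); sleep 1; b=$(grep ' sda ' /proc/diskstats)
#   disk_metrics "$a" "$b" 1000
```

The ternaries are assigned to variables before printing because a bare ">" inside an awk printf statement is parsed as output redirection.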

/proc/schedstat and /proc/PID/schedstat

    # Per-CPU scheduler statistics (field layout shown is for version 15):
    cat /proc/schedstat
    # version 15
    # cpu0 0 0 0 0 0 0 12345678 9876543 123456
    #                  ^        ^       ^
    #                  |        |       timeslices run on this CPU
    #                  |        run_delay_ns (total time tasks waited for this CPU)
    #                  run_time_ns (total time tasks spent running on this CPU)

    # Per-process scheduler stats:
    cat /proc/1234/schedstat
    # 12345678 9876543 100
    # ^        ^       ^
    # |        |       timeslices run
    # |        wait_time_ns (time waiting for CPU)
    # run_time_ns (time on CPU)

    # High wait_time relative to run_time = CPU contention:
    # the process wants to run but cannot get CPU time.

    # See scheduler info for a process:
    cat /proc/1234/sched
    # nginx (1234, #threads: 4)
    # se.exec_start            : 12345678.901234
    # se.vruntime              : 4567890.123456
    # se.sum_exec_runtime      : 12345.678901   (ms on CPU)
    # nr_voluntary_switches    : 98765          (voluntarily yielded CPU)
    # nr_involuntary_switches  : 1234           (preempted: time slice expired or a higher-priority task became runnable)
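The wait-versus-run comparison can be reduced to a single number. The sketch below uses a hypothetical function name, sched_wait_pct, and reports what share of a task's runnable time was spent waiting for a CPU. Its input is the three-field content of /proc/PID/schedstat.

```shell
# sched_wait_pct "RUN_NS WAIT_NS TIMESLICES"   (hypothetical helper name)
# Prints wait time as a percentage of (run time + wait time).
sched_wait_pct() {
    printf '%s\n' "$1" | awk '{
        total = $1 + $2
        if (total <= 0) { print "0.0"; next }
        printf "%.1f\n", 100 * $2 / total
    }'
}

# Usage on a live system (PID 1234 is an example):
#   sched_wait_pct "$(cat /proc/1234/schedstat)"
```

The counters are cumulative since the task started, so this gives a lifetime average. To see current contention, take two samples and apply the same ratio to the differences.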

/sys/block — Real-Time I/O Monitoring

    # Device queue limits and saturation:
    cat /sys/block/sda/queue/nr_requests   # maximum requests the block-layer queue will hold
    cat /sys/block/sda/stat                # same counters as /proc/diskstats, without major/minor/name

    # Instantaneous queue depth (pending I/Os):
    cat /sys/block/sda/inflight
    # columns: reads writes
    #     0      3   <- 3 writes currently in flight

    # CPU frequency scaling (thermal throttling check):
    for cpu in /sys/devices/system/cpu/cpu[0-9]*/; do
        printf '%s: ' "$(basename "$cpu")"
        cat "${cpu}cpufreq/scaling_cur_freq" 2>/dev/null || echo "no cpufreq"
    done
    # Values are in kHz:
    # cpu0: 2400000 (2.4 GHz; may be below max because of throttling or a power-saving governor)
    # cpu1: 3200000 (3.2 GHz, at max)

    # Check for CPU throttling events (exposed by the x86 thermal driver; absent on many other systems):
    cat /sys/devices/system/cpu/cpu0/thermal_throttle/core_throttle_count
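A current frequency means little without the maximum to compare it against. As a hedged sketch, the function below (hypothetical name cpu_freq_report) prints each CPU's scaling_cur_freq next to its cpuinfo_max_freq. The optional directory argument is an assumption added for this example so the function can be pointed at a copy of the sysfs tree.

```shell
# cpu_freq_report [CPU_SYSFS_DIR]   (hypothetical helper name)
# Prints current vs. maximum frequency for every CPU that exposes cpufreq.
cpu_freq_report() {
    base="${1:-/sys/devices/system/cpu}"
    # cpu[0-9]* skips the sibling cpufreq/ and cpuidle/ directories.
    for dir in "$base"/cpu[0-9]*; do
        cur_file="$dir/cpufreq/scaling_cur_freq"
        max_file="$dir/cpufreq/cpuinfo_max_freq"
        [ -r "$cur_file" ] && [ -r "$max_file" ] || continue
        cur=$(cat "$cur_file")
        max=$(cat "$max_file")
        [ "$max" -gt 0 ] || continue
        # Both values are in kHz; integer arithmetic is enough here.
        printf '%s: %s of %s kHz (%s%%)\n' "${dir##*/}" "$cur" "$max" "$((100 * cur / max))"
    done
}

# Usage on a live system:
#   cpu_freq_report
```

A CPU sitting well below 100% while under sustained load suggests throttling. On an idle machine a low figure usually just reflects the frequency governor saving power.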
