Linux Capabilities

Traditionally, Linux privilege checks knew only two cases: root (UID 0, which bypasses permission checks) and non-root (subject to all of them). Capabilities split root's power into about 40 distinct privileges. A process can hold just the capabilities it needs, such as binding to port 80, configuring network interfaces, or loading kernel modules, without being fully root. This is how containers can perform selected privileged operations without holding every root privilege.

What Are Capabilities?

Why not just run containers as root? A process with full root can do anything: load kernel modules, read any file, bypass file permission checks, kill any process. Even inside a chroot, a full-root process can break out. Capabilities let you grant specific privileges while denying the rest. A web server needs CAP_NET_BIND_SERVICE (to bind to ports below 1024, such as port 80) but not CAP_SYS_MODULE (to load kernel modules).

Key Linux Capabilities

Capability | Allows | Risk
CAP_NET_BIND_SERVICE | Bind to ports below 1024 | Low
CAP_NET_ADMIN | Network config: interfaces, routes, iptables | High
CAP_SYS_ADMIN | Broad catch-all: mount, sethostname, and many other admin operations | Very High (near-root)
CAP_SYS_MODULE | Load/unload kernel modules | Critical
CAP_SYS_PTRACE | Trace any process and read its memory | High
CAP_DAC_OVERRIDE | Bypass file permission checks | High
CAP_SETUID | Change UID (become any user) | High
CAP_CHOWN | Change file ownership | Medium
CAP_KILL | Send signals to any process | Medium

Checking and Managing Capabilities

# Check the current process's capabilities
grep Cap /proc/self/status
# Example output from a root shell:
# CapInh: 0000000000000000  (Inheritable set)
# CapPrm: 000001ffffffffff  (Permitted set)
# CapEff: 000001ffffffffff  (Effective set: what the kernel actually checks)
# CapBnd: 000001ffffffffff  (Bounding set: upper limit on what can be gained)

# Decode the hex bitmask:
capsh --decode=000001ffffffffff

# View the capabilities of a running process:
getpcaps 1234

# Give a binary a specific capability instead of making it setuid root
# (setcap itself must be run as root):
setcap cap_net_bind_service=ep /usr/bin/nginx
getcap /usr/bin/nginx
# /usr/bin/nginx = cap_net_bind_service+ep
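To make the bitmask format concrete, here is a minimal sketch of what decoding involves, written as plain shell arithmetic. It assumes the bit numbering from the kernel's linux/capability.h (bit 0 is CAP_CHOWN, bit 10 is CAP_NET_BIND_SERVICE, and so on up to bit 40). The names CAP_NAMES and decode_caps are made up for this illustration; on a real system, capsh --decode is the tool to use.

```shell
# Capability names in bit order (bit 0 first), per <linux/capability.h>.
CAP_NAMES="chown dac_override dac_read_search fowner fsetid kill setgid setuid
setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw
ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct
sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod lease
audit_write audit_control setfcap mac_override mac_admin syslog wake_alarm
block_suspend audit_read perfmon bpf checkpoint_restore"

# decode_caps HEXMASK
# Print a comma-separated list of the capabilities whose bits are set.
# The body runs in a subshell "( ... )" so its variables stay local.
decode_caps() (
    mask=$(( 0x$1 ))
    bit=0
    out=""
    for name in $CAP_NAMES; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out,}cap_$name"
        fi
        bit=$(( bit + 1 ))
    done
    printf '%s\n' "$out"
)

decode_caps 0000000000000400   # prints: cap_net_bind_service
```

Each hex digit covers four capabilities, so a mask of 400 means only bit 10 is set.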

Docker and Capabilities

# Docker drops many capabilities by default for security.
# Default Docker capabilities (a subset of full root):
#   CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER,
#   CAP_NET_BIND_SERVICE, CAP_NET_RAW, CAP_SETGID, CAP_SETUID,
#   CAP_SETPCAP, CAP_SETFCAP, CAP_MKNOD, CAP_AUDIT_WRITE, CAP_KILL, CAP_SYS_CHROOT
# Dropped by default (too dangerous):
#   CAP_SYS_ADMIN, CAP_SYS_MODULE, CAP_SYS_PTRACE, CAP_NET_ADMIN ...

# Drop additional capabilities from a container:
docker run --cap-drop NET_RAW --cap-drop CHOWN nginx

# Add a specific capability:
docker run --cap-add NET_ADMIN myimage

# Drop all capabilities, then add back only what the app needs (most secure):
docker run --cap-drop ALL --cap-add NET_BIND_SERVICE myapp

# Run privileged (all capabilities; avoid in production):
docker run --privileged myimage   # DO NOT do this without reason
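To relate these flags to the masks seen in /proc, the following sketch turns a list of capability names into a hex mask and tests individual bits. It assumes the kernel's bit numbering from linux/capability.h and Docker's documented default set, which includes CAP_SETFCAP. The helpers cap_bit, caps_to_mask and has_cap are hypothetical names for this illustration and cover only the capabilities mentioned in this section.

```shell
# cap_bit NAME: print the bit number for a capability (partial table).
cap_bit() {
    case "$1" in
        chown) echo 0 ;;             dac_override) echo 1 ;;
        fowner) echo 3 ;;            fsetid) echo 4 ;;
        kill) echo 5 ;;              setgid) echo 6 ;;
        setuid) echo 7 ;;            setpcap) echo 8 ;;
        net_bind_service) echo 10 ;; net_admin) echo 12 ;;
        net_raw) echo 13 ;;          sys_module) echo 16 ;;
        sys_chroot) echo 18 ;;       sys_ptrace) echo 19 ;;
        sys_admin) echo 21 ;;        mknod) echo 27 ;;
        audit_write) echo 29 ;;      setfcap) echo 31 ;;
        *) return 1 ;;
    esac
}

# caps_to_mask NAME...: OR the bits together and print a 16-digit hex mask.
caps_to_mask() (
    mask=0
    for name in "$@"; do
        bit=$(cap_bit "$name") || { echo "unknown capability: $name" >&2; exit 1; }
        mask=$(( mask | (1 << bit) ))
    done
    printf '%016x\n' "$mask"
)

# has_cap HEXMASK NAME: succeed if the named capability's bit is set.
has_cap() (
    bit=$(cap_bit "$2") || exit 2
    [ $(( (0x$1 >> bit) & 1 )) -eq 1 ]
)

DOCKER_DEFAULT_CAPS="chown dac_override fsetid fowner net_bind_service net_raw
setgid setuid setpcap setfcap mknod audit_write kill sys_chroot"

# Left unquoted on purpose so the list splits into separate arguments.
caps_to_mask $DOCKER_DEFAULT_CAPS   # prints: 00000000a80425fb
caps_to_mask net_bind_service       # prints: 0000000000000400
```

The first value is what a root process in a container started with the default set would be expected to show as CapEff; the second corresponds to --cap-drop ALL --cap-add NET_BIND_SERVICE. A check such as has_cap 00000000a80425fb sys_admin fails, which matches CAP_SYS_ADMIN being dropped by default.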

Ambient Capabilities — Non-root Processes Keeping Caps

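What happens to capabilities when a process stops being root? When a process switches all of its UIDs from root to a regular user (via setuid), the kernel clears its permitted and effective capabilities unless the process has set the "keep capabilities" flag first. A non-root process also normally loses its capabilities when it executes an ordinary program, because capabilities survive execve() only if the program file is setuid or carries file capabilities. Ambient capabilities (Linux 4.3+) solve the second problem: a capability in the ambient set is added to the permitted and effective sets of any unprivileged program the process executes, and it is inherited by child processes even when they are not root. A capability can be made ambient only if it is in both the permitted and inheritable sets. This lets a launcher such as a container init process or service manager switch to a non-root user, raise the needed ambient capabilities once, and then exec the application with those caps retained.

# Ambient capabilities are configured by whatever launches the process
# (a container runtime or service manager), not by the application itself.
# Example using systemd's CapabilityBoundingSet/AmbientCapabilities directives:
# /lib/systemd/system/myservice.service
[Service]
User=myuser
AmbientCapabilities=CAP_NET_BIND_SERVICE
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
ExecStart=/usr/bin/myapp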
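The effect of the ambient set is easiest to see in the rules the kernel applies at execve(), as described in the capabilities(7) manual page. The sketch below models those rules with shell arithmetic for a non-root process. It is a simplification: it treats a file as privileged only when it carries file capabilities (ignoring setuid and setgid bits) and ignores the special handling of UID 0. The name exec_caps and its argument order are made up for this illustration.

```shell
# exec_caps P_INH P_AMB P_BND F_INH F_PRM F_EFF
#   P_*: the process's inheritable, ambient and bounding sets (hex)
#   F_*: the file's inheritable and permitted sets (hex), and its
#        effective flag (0 or 1)
# Prints the capability sets of the process after execve().
exec_caps() (
    p_inh=$(( 0x$1 )); p_amb=$(( 0x$2 )); p_bnd=$(( 0x$3 ))
    f_inh=$(( 0x$4 )); f_prm=$(( 0x$5 )); f_eff=$6

    # Executing a file that has capabilities attached clears the ambient set.
    if [ $(( f_inh | f_prm )) -ne 0 ]; then new_amb=0; else new_amb=$p_amb; fi

    # P'(permitted) = (P(inh) & F(inh)) | (F(prm) & P(bounding)) | P'(ambient)
    new_prm=$(( (p_inh & f_inh) | (f_prm & p_bnd) | new_amb ))

    # P'(effective) = F(effective) ? P'(permitted) : P'(ambient)
    if [ "$f_eff" -eq 1 ]; then new_eff=$new_prm; else new_eff=$new_amb; fi

    printf 'CapPrm=%016x CapEff=%016x CapAmb=%016x\n' \
        "$new_prm" "$new_eff" "$new_amb"
)

# Non-root service, plain binary, CAP_NET_BIND_SERVICE (0x400) made ambient:
exec_caps 400 400 400 0 0 0
# prints: CapPrm=0000000000000400 CapEff=0000000000000400 CapAmb=0000000000000400

# Same process without the ambient bit: everything is lost at exec.
exec_caps 400 0 400 0 0 0
# prints: CapPrm=0000000000000000 CapEff=0000000000000000 CapAmb=0000000000000000
```

The second call shows why the inheritable set alone is not enough for an ordinary binary: without matching file capabilities, only the ambient term contributes to the new permitted set.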
