seccomp

Even with namespaces, cgroups, and capabilities limiting what a container can do, the container still shares the host kernel. Every system call goes directly to that kernel. seccomp (Secure Computing Mode) lets you define exactly which syscalls a process can make, and have the kernel fail the call or kill the process if it tries anything else.

What Is seccomp?

Why is syscall filtering important for containers? The kernel exposes hundreds of system calls, and many are dangerous in a container context: kexec_load (load a new kernel), reboot, perf_event_open (kernel profiling that can leak data), ptrace (inspect and control other processes), create_module and init_module (kernel module loading). A typical container application needs only a small subset, often on the order of 40-60 syscalls. Block the rest and even a compromised container cannot reach those kernel interfaces.
    # Linux has ~350+ syscalls; most containers need only ~50

    # Check syscall numbers (x86_64 shown; numbering differs per architecture):
    ausyscall --dump | head -20
    # 0 read
    # 1 write
    # 2 open
    # 3 close
    # 4 stat
    # ...

    # seccomp modes:
    # SECCOMP_MODE_STRICT: only read, write, exit, sigreturn allowed
    # SECCOMP_MODE_FILTER: a BPF filter program defines the allowed syscalls
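The kernel reports a process's seccomp mode as a number in the `Seccomp:` field of `/proc/<pid>/status`. As a minimal sketch, the two helpers below turn that number into a name; the function names `seccomp_mode_name` and `seccomp_mode_of` are illustrative, not part of any tool.

```shell
#!/bin/sh
# Map the numeric Seccomp field to a mode name.
seccomp_mode_name() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "strict" ;;
        2) echo "filter" ;;
        *) echo "unknown" ;;
    esac
}

# Print the seccomp mode recorded in a status file
# (defaults to the current process).
seccomp_mode_of() {
    status_file="${1:-/proc/self/status}"
    # Match the exact "Seccomp:" key so "Seccomp_filters:" is ignored.
    mode=$(awk '$1 == "Seccomp:" { print $2 }' "$status_file")
    seccomp_mode_name "$mode"
}
```

For example, `seccomp_mode_of /proc/1/status` reports the mode of PID 1, provided you are allowed to read that file.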

Docker's Default seccomp Profile

    # Docker applies a seccomp profile by default.
    # It blocks ~44 syscalls, including:
    #   kexec_load       load a new kernel for later execution
    #   reboot           reboot the system
    #   mount            mount filesystems (namespace escape)
    #   swapon/swapoff   manage swap
    #   create_module    create a kernel module entry (obsolete call)
    #   init_module      insert a module into the kernel
    #   perf_event_open  performance monitoring
    #   ptrace           trace/inject other processes (blocked by the default profile only on kernels older than 4.8)
    #   keyctl           kernel key management

    # Verify seccomp is active in a container:
    docker run alpine grep Seccomp /proc/self/status
    # Seccomp: 2   (2 = filter mode, 1 = strict mode, 0 = disabled)

    # Disable seccomp (for debugging, never in production):
    docker run --security-opt seccomp=unconfined alpine

    # Apply a custom profile:
    docker run --security-opt seccomp=/path/to/profile.json myimage

Writing a Custom seccomp Profile

    # seccomp profile JSON format (subset). The comment lines here are not part of the JSON file.
    # Illustrative only: a real allowlist must include whatever your libc actually calls
    # (for example, many current libc builds use openat rather than open).
    {
      "defaultAction": "SCMP_ACT_ERRNO",
      "syscalls": [
        {
          "names": [
            "read", "write", "open", "close", "stat", "fstat", "lstat",
            "poll", "lseek", "mmap", "mprotect", "munmap", "brk",
            "rt_sigaction", "rt_sigprocmask", "rt_sigreturn", "ioctl",
            "pread64", "pwrite64", "readv", "writev", "access", "pipe",
            "select", "sched_yield", "mremap", "msync", "mincore", "madvise",
            "dup", "dup2", "pause", "nanosleep", "getitimer", "alarm",
            "setitimer", "getpid", "sendfile", "socket", "connect", "accept",
            "sendto", "recvfrom", "sendmsg", "recvmsg", "shutdown", "bind",
            "listen", "getsockname", "getpeername", "socketpair",
            "setsockopt", "getsockopt", "clone", "fork", "execve", "exit",
            "wait4", "kill", "uname", "fcntl", "flock", "fsync", "fdatasync",
            "getcwd", "chdir", "exit_group", "set_tid_address", "futex",
            "set_robust_list", "get_robust_list", "getdents64", "getrlimit",
            "getuid", "getgid", "geteuid", "getegid"
          ],
          "action": "SCMP_ACT_ALLOW"
        }
      ]
    }

    # Actions:
    # SCMP_ACT_ALLOW         permit the syscall
    # SCMP_ACT_ERRNO         fail the syscall with an error (EPERM by default); used here as the default deny
    # SCMP_ACT_KILL          kill the calling thread immediately
    # SCMP_ACT_KILL_PROCESS  kill the whole process immediately
    # SCMP_ACT_TRACE         notify a ptrace tracer (for seccomp-bpf debugging)
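Writing the allowlist by hand is error-prone, so it can help to generate the JSON from a plain list of names. The following is a rough sketch, assuming one syscall name per line on stdin; `make_profile` is a hypothetical helper, and it emits only the two fields shown in the profile above.

```shell
#!/bin/sh
# make_profile: read syscall names (one per line) on stdin and print
# an allowlist profile that fails every other syscall with an errno.
make_profile() {
    awk '
        NF { names[++n] = $1 }    # collect names, skipping blank lines
        END {
            print "{"
            print "  \"defaultAction\": \"SCMP_ACT_ERRNO\","
            print "  \"syscalls\": ["
            print "    {"
            print "      \"names\": ["
            for (i = 1; i <= n; i++)
                printf "        \"%s\"%s\n", names[i], (i < n ? "," : "")
            print "      ],"
            print "      \"action\": \"SCMP_ACT_ALLOW\""
            print "    }"
            print "  ]"
            print "}"
        }
    '
}
```

A possible use is `make_profile < allowed.txt > profile.json`, where `allowed.txt` is a file you maintain; the result can then be passed to `--security-opt seccomp=`. The helper does no validation, so a misspelled name goes straight into the profile.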

Discovering What Syscalls Your App Needs

    # Use strace to record all syscalls (-ff writes one output file per process):
    strace -ff -e trace=all -o /tmp/syscalls myapp

    # Extract unique syscalls:
    grep -h "^[a-z]" /tmp/syscalls.* | awk -F'(' '{print $1}' | sort -u

    # Use oci-seccomp-bpf-hook (auto-generate profiles):
    # Runs the app, records its syscalls, generates a minimal profile
    # Distributed as its own oci-seccomp-bpf-hook package on Fedora and related distributions

    # Check if a running process is being seccomp-filtered (-o picks the oldest matching PID):
    grep Seccomp /proc/$(pgrep -o myapp)/status
    # Seccomp: 2   ← 2 = BPF filter active
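The extraction step can be wrapped in a small function so it is easy to reuse and to check against sample traces. This is a sketch under the assumption that the input is strace's default line format (`name(args) = result`); `list_syscalls` is an illustrative name.

```shell
#!/bin/sh
# list_syscalls: read strace output on stdin, print unique syscall names.
list_syscalls() {
    # Keep only lines of the form "name(...", which drops signal lines
    # ("--- SIGCHLD ... ---") and exit lines ("+++ exited with 0 +++").
    sed -n 's/^\([a-z_][a-z0-9_]*\)(.*/\1/p' | sort -u
}
```

With the trace files from the commands above, `cat /tmp/syscalls.* | list_syscalls` prints one name per line. A single run only covers the code paths that were exercised, so rarely used paths (error handling, reloads, shutdown) may need syscalls the trace never saw.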

seccomp vs Capabilities — Layered Defense

If capabilities already restrict what root can do, why also use seccomp? They defend different things. Capabilities gate specific privileged operations through capability checks inside the kernel. But many risky kernel interfaces require no capability at all: any process that can issue the syscall can reach them. seccomp filters at syscall entry, before the kernel runs the call and therefore before any capability check happens. Together: capabilities reduce what root can do, and seccomp reduces the attack surface regardless of privilege level.
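Both layers are recorded side by side in `/proc/<pid>/status`: `CapEff` holds the effective capability mask and `Seccomp` holds the filter mode. As a small sketch (the function name `layer_summary` is made up for illustration), the helper below prints the two fields together so you can confirm that a process is restricted on both axes.

```shell
#!/bin/sh
# layer_summary: print the effective capability mask and the seccomp mode
# recorded in a status file (defaults to the current process).
layer_summary() {
    status_file="${1:-/proc/self/status}"
    awk '
        $1 == "CapEff:"  { cap = $2 }     # hex bitmask of effective capabilities
        $1 == "Seccomp:" { mode = $2 }    # 0 = disabled, 1 = strict, 2 = filter
        END { printf "capeff=%s seccomp=%s\n", cap, mode }
    ' "$status_file"
}
```

Running `layer_summary` inside a container and again on the host makes the difference between the two environments easy to compare; an all-ones style mask combined with `seccomp=0` suggests neither layer is doing much for that process.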
