Linux Networking Stack
Every network packet — from a web request to a ping — goes on a journey through the Linux kernel. It starts as electrical signals on a wire and ends as bytes in your application's buffer. Understanding this journey shows where latency and throughput limits come from and where the tuning knobs live.
Receiving a Packet — The Inbound Journey
Inbound packet path:
1. NIC receives frame from network
2. NIC uses DMA to copy frame into ring buffer (kernel memory)
3. NIC raises hardware interrupt (IRQ)
4. CPU interrupt handler runs, acknowledges IRQ
5. Handler schedules NAPI poll (deferred softirq — avoids per-packet IRQ overhead)
6. softirq runs: NET_RX_SOFTIRQ
- NIC driver poll() drains ring buffer
- creates sk_buff (socket buffer) for each frame
7. sk_buff passed up protocol stack:
- L2: Ethernet demux (strip header, check dest MAC)
- Netfilter PREROUTING hook (iptables NAT, conntrack)
- L3: IP routing decision (local delivery or forward)
- Netfilter INPUT hook (iptables filter)
- L4: TCP/UDP demux — find the right socket
8. sk_buff placed in socket's receive queue
9. Application calls recv() — data copied from socket queue to user space
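Steps 5 and 6 are visible from user space: every NET_RX softirq increments a per-CPU counter. A quick way to see which CPUs are doing receive processing (counts and column layout will differ on your machine):
# Per-CPU NET_RX softirq counts; rising numbers mean receive work on that CPU
grep NET_RX /proc/softirqs
# Watch it live:
watch -n1 "grep NET_RX /proc/softirqs"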
Sending a Packet — The Outbound Journey
Outbound packet path:
1. Application calls send() / write()
2. Data copied from user space into sk_buff in kernel
3. TCP layer: segment data, add TCP header, manage window/retransmit
4. IP layer: add IP header, fragment if MTU exceeded
5. Routing: look up next hop and outbound interface
6. Netfilter OUTPUT hook (iptables filter/mangle); if mangle rewrites the packet, it is re-routed
7. Netfilter POSTROUTING hook (iptables SNAT/masquerade)
8. Traffic control (tc): qdisc queuing disciplines
9. NIC driver: place sk_buff in TX ring buffer
10. NIC: DMA reads from ring buffer, transmits frame
11. NIC raises TX completion IRQ, kernel frees sk_buff
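Two of these steps can be inspected from the shell. A sketch, with eth0 and a documentation address standing in for your interface and destination:
# Which route and interface a destination would use (step 5):
ip route get 192.0.2.1
# The qdisc handling step 8, with per-qdisc sent/dropped/backlog stats:
tc -s qdisc show dev eth0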
sk_buff — The Central Data Structure
What is an sk_buff and why does it matter for performance?
sk_buff (socket buffer) is the C struct that represents a network packet as it moves through the kernel. It holds pointers into the packet data plus metadata (protocol, timestamps, marks). Buffers are allocated with headroom reserved, so each layer prepends its header by moving the head pointer, not by copying data; passing a packet between layers only adjusts head/tail pointers. Avoiding data copies inside the stack is a key reason Linux networking is fast.
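The kernel exposes tracepoints for the sk_buff lifecycle; counting them is a rough way to watch allocation/free churn. A minimal sketch, assuming perf is installed and you have root:
# Count sk_buff release events system-wide for one second
# (consume_skb = normal free, kfree_skb = drop/error path)
perf stat -e skb:consume_skb -e skb:kfree_skb -a sleep 1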
# Check socket buffer usage:
cat /proc/net/sockstat
# sockets: used 512
# TCP: inuse 48 orphan 0 tw 12 alloc 50 mem 35
# UDP: inuse 12 mem 4
# Default buffer sizes:
cat /proc/sys/net/core/rmem_default # 212992 (208KB receive)
cat /proc/sys/net/core/wmem_default # 212992 (208KB send)
cat /proc/sys/net/core/rmem_max # 212992 by default; 134217728 (128MB) on a tuned host
# Tune for high-throughput servers:
sysctl -w net.core.rmem_max=134217728
sysctl -w net.core.wmem_max=134217728
sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728"
sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728"
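sysctl -w changes are lost at reboot. To persist them, put the settings in a file under /etc/sysctl.d/ (the filename here is just a convention) and reload:
cat <<'EOF' > /etc/sysctl.d/99-net-tuning.conf
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
EOF
sysctl --system   # re-apply every sysctl config file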
NAPI — Why Linux Doesn't Use One Interrupt Per Packet
Why does the kernel use polling instead of interrupts for network packets?
At high packet rates (millions per second), one interrupt per packet overwhelms the CPU — a problem called "receive livelock." NAPI (New API) switches to polling mode under load: the first packet triggers an interrupt, which disables further IRQs and schedules a poll. The poll drains many packets at once before re-enabling interrupts. Under low load, interrupts resume. This hybrid approach handles both low latency and high throughput.
# See NIC interrupts:
cat /proc/interrupts | grep eth
# 42: 1234567 PCI-MSI eth0-TxRx-0
# 43: 987654 PCI-MSI eth0-TxRx-1
# Check NIC ring buffer sizes (NAPI polling drains these):
ethtool -g eth0
# Ring parameters for eth0:
# Pre-set maximums:
# RX: 4096
# TX: 4096
# Current hardware settings:
# RX: 1024
# TX: 1024
# Increase ring buffer to avoid drops at high traffic:
ethtool -G eth0 rx 4096 tx 4096
# Check for RX drops (ring overflow: frames arrived faster than the poll drained them):
ethtool -S eth0 | grep -i drop
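Ring-level drops show up in ethtool; NAPI budget exhaustion shows up in /proc/net/softnet_stat instead. The columns are hex and positional:
# Per-CPU softnet stats: col 1 = packets processed, col 2 = dropped, col 3 = time_squeeze
# (time_squeeze = poll ran out of budget before the ring was empty)
cat /proc/net/softnet_stat
# If time_squeeze keeps climbing, raise the per-poll budget:
sysctl net.core.netdev_budget        # default 300
sysctl -w net.core.netdev_budget=600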
Observing the Network Stack
# Show all connections (modern tool: ss, replaces netstat)
ss -tunap
# -t=TCP, -u=UDP, -n=numeric, -a=all, -p=process
# Show listen ports:
ss -tlnp
# State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
# LISTEN 0 128 0.0.0.0:22 0.0.0.0:* sshd
# Show TCP stats:
ss -s
# Total: 512
# TCP: 50 (estab 48, closed 2, orphaned 0, timewait 0)
# Full kernel network stats:
netstat -s | head -20 # or: nstat
# Watch packet drop counters:
watch -n1 "cat /proc/net/dev | grep eth0"
# eth0: RX bytes packets errs drop ... TX bytes packets errs drop ...
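For per-connection detail, ss can dump the kernel's TCP state (RTT, congestion window, retransmits). The values below are illustrative:
# TCP internals for each connection:
ss -tin
# cubic wscale:7,7 rto:204 rtt:1.5/0.75 mss:1448 cwnd:10 retrans:0/3 ...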