Kubernetes Orchestration
Running one Docker container on your laptop is easy. Running 500 containers across 50 servers — making sure they're healthy, restarting crashed ones, scaling up under load, and routing traffic to the right ones — is a different problem entirely. That's the problem Kubernetes solves.
What is Kubernetes?
Kubernetes (often abbreviated as K8s — the 8 letters between "K" and "s") is an open-source system for automating deployment, scaling, and management of containerized applications. It was originally designed by Google engineers who had been running containers at massive scale internally for years, and open-sourced in 2014.
The Problem It Solves
Imagine your AI inference API suddenly goes from 100 requests/second to 10,000 — maybe your app went viral. Without Kubernetes: you'd need to manually spin up more servers, install Docker, start more containers, and update your load balancer. With Kubernetes: it detects the load spike via metrics and automatically starts more container replicas within seconds. When load drops, it scales back down to save cost.
Core Kubernetes Concepts
Pods — The Smallest Unit
A pod is one or more containers that always run together on the same node (server) and share the same network and storage. Think of a pod as a logical host for your containers. Usually one container per pod, but sometimes you'll see sidecar containers (like a logging agent) paired alongside the main container.
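A minimal pod manifest might look like the following sketch, with a main container and a logging sidecar. The image names and port are illustrative, not real services:

```yaml
# Hypothetical pod: a main app container plus a logging sidecar.
# Both containers share the pod's network namespace and volumes.
apiVersion: v1
kind: Pod
metadata:
  name: inference-api
spec:
  containers:
    - name: app                               # main container
      image: my-registry/inference-api:1.0    # illustrative image name
      ports:
        - containerPort: 8000
    - name: log-agent                         # sidecar container
      image: fluent/fluent-bit:2.2            # example logging agent
```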
Nodes — The Workers
A node is a worker machine (usually a VM) in your Kubernetes cluster. Pods run on nodes. A typical production cluster has 5–100+ nodes. For AI workloads, nodes are often GPU instances (for example, an AWS EC2 p4d instance with 8 NVIDIA A100 GPUs).

Deployments — Managing Pod Replicas
A Deployment tells Kubernetes: "I want 5 replicas of this pod running at all times." Kubernetes makes it so — and if a pod crashes or a node fails, it automatically starts a replacement. Deployments also handle rolling updates: deploy new versions without downtime by replacing pods one at a time.
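A Deployment expressing "5 replicas, updated with a rolling strategy" could be sketched like this (names and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-api
spec:
  replicas: 5                        # "I want 5 replicas running at all times"
  selector:
    matchLabels:
      app: inference-api
  strategy:
    type: RollingUpdate              # replace pods gradually on updates
    rollingUpdate:
      maxUnavailable: 1              # at most 1 pod down during a rollout
  template:
    metadata:
      labels:
        app: inference-api
    spec:
      containers:
        - name: app
          image: my-registry/inference-api:1.1   # illustrative image
```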
Services — Stable Network Endpoints
Pods are temporary — they get created and destroyed. A Service provides a stable IP address and DNS name that always points to the current live pods, no matter which specific pods are running. Your load balancer points to a Service, not individual pods.
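A Service that routes to the pods of a hypothetical `inference-api` Deployment might look like this; it selects pods by label, so it keeps working as individual pods come and go:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: inference-api
spec:
  selector:
    app: inference-api    # routes to whichever live pods carry this label
  ports:
    - port: 80            # stable port clients connect to
      targetPort: 8000    # container port on the pods
```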
Namespaces — Logical Isolation
Namespaces partition a single Kubernetes cluster into logically isolated segments. You might have a "production" namespace and a "staging" namespace in the same cluster, with different resource limits and access controls for each.
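A "staging" namespace with a resource quota attached could be declared roughly like this (the quota numbers are illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: staging
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: staging-quota
  namespace: staging
spec:
  hard:
    requests.cpu: "20"       # staging pods may request at most 20 CPU cores total
    requests.memory: 64Gi    # and at most 64 GiB of memory total
```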
Autoscaling in Kubernetes
Kubernetes has three levels of autoscaling:
HPA
Horizontal Pod Autoscaler — adds/removes pod replicas based on CPU, memory, or custom metrics (requests/sec, queue depth).
VPA
Vertical Pod Autoscaler — adjusts the CPU/memory requests and limits of pods based on observed usage. Good for long-running training jobs.
Cluster Autoscaler
Adds/removes nodes when pods can't be scheduled (not enough capacity) or nodes are underutilized. Integrates with cloud provider APIs to spin up VMs.
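As a sketch of the first level, an HPA that scales a hypothetical `inference-api` Deployment between 2 and 50 replicas on CPU utilization might look like:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-api-hpa
spec:
  scaleTargetRef:              # the Deployment this HPA controls
    apiVersion: apps/v1
    kind: Deployment
    name: inference-api
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add replicas when avg CPU exceeds 70%
```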
Kubernetes for AI & ML Workloads
Serving ML Models at Scale
Kubernetes is the standard platform for deploying ML inference APIs. You package your model server (vLLM, Triton, TorchServe) in a Docker image, define a Kubernetes Deployment with GPU resource requests, and let Kubernetes handle scheduling, health checks, and autoscaling.
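A Deployment for a model server with a GPU resource request could be sketched as follows. The vLLM image tag and health-check path are assumptions for illustration:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: llm-server
  template:
    metadata:
      labels:
        app: llm-server
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest   # assumed model-server image
          resources:
            limits:
              nvidia.com/gpu: 1            # one GPU per replica
          readinessProbe:                  # health check before routing traffic
            httpGet:
              path: /health                # assumed health endpoint
              port: 8000
```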
ML Training on Kubernetes
The Kubeflow project extends Kubernetes with ML-specific primitives — distributed training operators (for PyTorch, TensorFlow), pipeline management, and experiment tracking. Many ML platforms (Vertex AI Pipelines, AWS SageMaker) use Kubernetes under the hood.
GPU Scheduling
Kubernetes schedules GPU resources using NVIDIA's device plugin. You request GPUs in your pod spec: nvidia.com/gpu: 4. Kubernetes places that pod on a node with 4 available GPUs and prevents other pods from using those GPUs simultaneously.
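In a pod spec, that GPU request is a fragment of the container's resources block (this assumes the NVIDIA device plugin is installed on the cluster):

```yaml
# Fragment of a container spec requesting 4 GPUs.
resources:
  limits:
    nvidia.com/gpu: 4    # schedule only on a node with 4 free GPUs
```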
Managed Kubernetes Services
Running Kubernetes yourself is complex. The "control plane" (scheduler, API server, etcd database) is notoriously tricky to operate. All major cloud providers offer managed Kubernetes, where they run the control plane for you: Amazon EKS (Elastic Kubernetes Service), Google GKE (Google Kubernetes Engine), and Microsoft AKS (Azure Kubernetes Service).
Frequently Asked Questions
Is Kubernetes too complex for a small team?
It can be. Kubernetes has significant operational complexity. For small teams or simple applications, managed alternatives like Google Cloud Run, AWS App Runner, or Heroku may be better. Start with Kubernetes when you have multiple services, need GPU scheduling, or require fine-grained control over how your containers are deployed. If you use a managed service like GKE Autopilot, much of the complexity is abstracted away.
What is Helm in the Kubernetes ecosystem?
Helm is the package manager for Kubernetes. Instead of writing raw Kubernetes YAML files for every service, Helm packages them into reusable "charts" with configurable values. Deploying Prometheus monitoring to your cluster? helm install prometheus. Deploying an NVIDIA GPU operator? helm install gpu-operator. Helm dramatically simplifies deploying complex applications to Kubernetes.
What's the difference between Docker Compose and Kubernetes?
Docker Compose is for running multiple containers on a single machine, typically for local development. Kubernetes is for running containers across a cluster of machines in production, with autoscaling, self-healing, and advanced networking. Think of Compose as the local development tool and Kubernetes as the production system.
How much does Kubernetes cost?
Kubernetes itself is open source and free. You pay for the underlying infrastructure (VM nodes). With managed services: EKS charges $0.10/hour for the control plane plus EC2 node costs; GKE Autopilot charges per pod resource rather than per node; AKS has a free control plane but charges for nodes. For AI workloads, GPU node costs dominate — a single H100 GPU node can run $30–60/hour.