Cloud Networking Basics

Every application in the cloud needs to communicate — with users, with databases, with other services. Cloud networking is the invisible plumbing that makes it all work. Understanding it isn't optional: a misconfigured network is one of the most common causes of security breaches and outages in the cloud.

What is a VPC?

A Virtual Private Cloud (VPC) is your own private, isolated network within a cloud provider's infrastructure. Think of it as your slice of the cloud's networking fabric — you decide the IP address ranges, how traffic flows, and what can communicate with what.

Why VPCs Exist

Without a VPC, all your cloud resources would be on a shared public network — anyone could potentially reach them. A VPC gives you a private bubble where you control who gets in and how traffic moves around.

IP Address Ranges (CIDR Blocks)

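When you create a VPC, you assign it an IP address range using CIDR notation. For example, 10.0.0.0/16 fixes the first 16 bits and leaves 16 bits free, which gives you 2^16 = 65,536 possible IP addresses (providers typically reserve a few addresses in each subnet for their own use). Your VMs and services get assigned IPs from this pool. The private IP ranges (10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16) can't be routed over the public internet, so these addresses are only reachable inside your VPC and any networks you explicitly connect to it.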
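The CIDR arithmetic can be checked with Python's standard `ipaddress` module. This is a minimal sketch: the helper name `is_rfc1918` and the sample addresses are illustrative assumptions, not part of any cloud provider's API.

```python
import ipaddress

# The example VPC range from the text.
vpc = ipaddress.ip_network("10.0.0.0/16")

# A /16 leaves 32 - 16 = 16 free bits, so 2**16 addresses.
address_count = vpc.num_addresses

# The three private (RFC 1918) blocks.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(ip: str) -> bool:
    """Return True if the address falls inside one of the private blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in block for block in RFC1918_BLOCKS)

print(address_count)              # 65536
print(is_rfc1918("172.31.5.9"))   # True: 172.16.0.0/12 runs up to 172.31.255.255
print(is_rfc1918("172.32.0.1"))   # False: just outside the /12
```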

Analogy: A VPC is like the floorplan of your office building. You decide which rooms connect to which, which doors lock, and who has keycard access. The public internet is the street outside — you control what (if anything) connects to it.

Subnets: Public vs. Private

A subnet is a subdivision of your VPC — a smaller IP address range within your larger VPC range. Subnets come in two flavors:

Public Subnets

Resources in a public subnet can communicate directly with the internet (via an Internet Gateway). Web servers, load balancers, and bastion hosts typically live here. Anything in a public subnet can potentially be reached from the internet — so only put things that need to be publicly accessible here.

Private Subnets

Resources in a private subnet have no direct path to or from the internet. Databases, application servers, ML training jobs, and AI model stores should live here. If they need to reach the internet (to download packages, for example), they do so through a NAT Gateway, which lets traffic out but blocks unsolicited traffic in.

The golden rule: Put databases and sensitive services in private subnets. Only load balancers and intentionally public endpoints go in public subnets. This single pattern removes the direct internet path that many network-level breaches depend on.
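As a rough sketch of how a VPC range might be carved into subnets, the snippet below splits 10.0.0.0/16 into /24 blocks and labels each subnet by its default route. The subnet names, the `default_route` field, and the `is_public` helper are hypothetical. The rule that a subnet counts as public when its default route points at an Internet Gateway is modeled on AWS and may differ on other providers.

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")

# Split the /16 into /24 subnets: 256 subnets of 256 addresses each.
subnets = list(vpc.subnets(new_prefix=24))

# Hypothetical layout: one public subnet and two private ones.
layout = {
    "public-web": {"cidr": subnets[0], "default_route": "internet-gateway"},
    "private-app": {"cidr": subnets[1], "default_route": "nat-gateway"},
    "private-db": {"cidr": subnets[2], "default_route": None},
}

def is_public(subnet: dict) -> bool:
    """A subnet is treated as public if its default route is an Internet Gateway."""
    return subnet["default_route"] == "internet-gateway"

for name, subnet in layout.items():
    kind = "public" if is_public(subnet) else "private"
    print(f"{name}: {subnet['cidr']} ({kind})")
```

In this layout the app tier can reach out through the NAT Gateway, while the database subnet has no default route at all.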

Security Groups & Network ACLs

These are your firewall rules in the cloud.

Security Groups

A security group is a stateful virtual firewall attached to individual resources (VMs, databases, load balancers). You define rules like "allow inbound port 443 from anywhere" or "allow inbound port 5432 from my app servers only." Stateful means that once a connection is allowed in, its return traffic is allowed out automatically. In AWS, security groups are the most commonly used network access control, and they attach to instances (strictly, to their network interfaces), not to subnets.
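The two example rules can be modeled as a small allow-list. This is a simplified sketch, not a provider API: `InboundRule`, `allows`, and the 10.0.1.0/24 app-server range are assumptions made for illustration. It follows the AWS convention that security groups contain only allow rules, so anything unmatched is denied.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class InboundRule:
    port: int
    source_cidr: str  # range of addresses allowed to connect

def allows(rules: list[InboundRule], src_ip: str, port: int) -> bool:
    """Allow only if some rule matches; with no match the packet is denied."""
    addr = ipaddress.ip_address(src_ip)
    return any(
        rule.port == port and addr in ipaddress.ip_network(rule.source_cidr)
        for rule in rules
    )

# "allow inbound port 443 from anywhere"
web_sg = [InboundRule(443, "0.0.0.0/0")]
# "allow inbound port 5432 from my app servers only" (assumed app subnet)
db_sg = [InboundRule(5432, "10.0.1.0/24")]

print(allows(web_sg, "203.0.113.7", 443))   # True
print(allows(db_sg, "10.0.1.15", 5432))     # True
print(allows(db_sg, "203.0.113.7", 5432))   # False
```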

Network ACLs (NACLs)

Network Access Control Lists work at the subnet level and are stateless: return traffic is not allowed automatically, so you need to explicitly allow both inbound AND outbound. They're an additional layer of defense; inbound traffic is evaluated against the NACL before it reaches a security group. Most configurations rely primarily on security groups and only add NACLs for compliance or defense-in-depth.
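The difference between stateful and stateless filtering can be illustrated with a toy model. Both classes below are hypothetical simplifications: real firewalls track full connection tuples and protocols, and the 1024-65535 ephemeral port range is an assumption about which ports clients use.

```python
class StatefulFirewall:
    """Remembers allowed inbound connections and lets their replies out."""

    def __init__(self, allowed_inbound_ports):
        self.allowed_inbound_ports = set(allowed_inbound_ports)
        self.tracked = set()  # (client_ip, client_port, server_port)

    def inbound(self, client_ip, client_port, server_port):
        if server_port in self.allowed_inbound_ports:
            self.tracked.add((client_ip, client_port, server_port))
            return True
        return False

    def outbound_reply(self, client_ip, client_port, server_port):
        # No outbound rule needed: the connection is already tracked.
        return (client_ip, client_port, server_port) in self.tracked


class StatelessFilter:
    """Checks every packet against explicit rules, with no memory."""

    def __init__(self, inbound_ports, outbound_port_ranges):
        self.inbound_ports = set(inbound_ports)
        self.outbound_port_ranges = list(outbound_port_ranges)

    def inbound(self, server_port):
        return server_port in self.inbound_ports

    def outbound_reply(self, client_port):
        # The reply goes to the client's port and needs its own rule.
        return any(lo <= client_port <= hi for lo, hi in self.outbound_port_ranges)


sg = StatefulFirewall({443})
sg.inbound("203.0.113.7", 50000, 443)
print(sg.outbound_reply("203.0.113.7", 50000, 443))  # True

nacl = StatelessFilter({443}, outbound_port_ranges=[])
print(nacl.inbound(443), nacl.outbound_reply(50000))  # True False
```

With the stateless filter, the request gets in but the reply is dropped until an outbound rule such as `(1024, 65535)` is added.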

Load Balancers

A load balancer distributes incoming traffic across multiple backend servers, so no single server gets overwhelmed. It also provides a single stable entry point (IP address or DNS name) for your service, even when the servers behind it change.
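One simple way to spread traffic is round-robin with health tracking, sketched below. Round-robin is only one of several possible strategies and is chosen here as an assumption; the class name and backend addresses are illustrative.

```python
class RoundRobinBalancer:
    """Hands out backends in rotation, skipping any marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._index = 0

    def mark_unhealthy(self, backend):
        self.healthy.discard(backend)

    def mark_healthy(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        # Try each backend at most once per call.
        for _ in range(len(self.backends)):
            candidate = self.backends[self._index % len(self.backends)]
            self._index += 1
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


lb = RoundRobinBalancer(["10.0.1.10", "10.0.1.11", "10.0.1.12"])
print([lb.next_backend() for _ in range(4)])
lb.mark_unhealthy("10.0.1.11")
print([lb.next_backend() for _ in range(3)])
```

Callers only ever talk to the balancer, which is why the set of servers behind it can change without clients noticing.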

Application Load Balancer (L7)

Routes based on HTTP content — URL paths, headers, hostnames. Perfect for web apps and microservices.
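Content-based routing can be sketched as an ordered list of rules where the first match wins. The rule format, the `route` function, and the target names are hypothetical; real load balancers support richer conditions such as headers and query strings.

```python
def route(host, path, rules, default_target):
    """Return the target of the first rule whose host and path prefix match."""
    for rule_host, path_prefix, target in rules:
        host_matches = rule_host is None or rule_host == host
        if host_matches and path.startswith(path_prefix):
            return target
    return default_target


# (host or None for any host, path prefix, target group)
rules = [
    ("api.yourapp.com", "/v1/predict", "inference-servers"),
    (None, "/static/", "static-assets"),
]

print(route("api.yourapp.com", "/v1/predict", rules, "web-servers"))
print(route("www.yourapp.com", "/static/logo.png", rules, "web-servers"))
print(route("www.yourapp.com", "/checkout", rules, "web-servers"))
```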

Network Load Balancer (L4)

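Forwards TCP/UDP connections based on IP address and port, without inspecting HTTP content. Ultra-low latency, and built to handle millions of requests per second. Often used for latency-sensitive services such as real-time AI inference APIs.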

Global Load Balancer

Routes each request to the nearest healthy region. Google Cloud's global load balancers are a common choice for AI APIs served worldwide.

DNS in the Cloud

DNS (Domain Name System) translates human-readable names (api.yourapp.com) into IP addresses that computers use. Cloud providers offer managed DNS services (AWS Route 53, GCP Cloud DNS, Azure DNS) that integrate tightly with their load balancers and autoscaling groups.

Health Checks & Failover

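Cloud DNS services can route traffic based on health checks. If your primary region goes down, DNS failover automatically points traffic to a secondary region, with no manual intervention needed. Clients that have cached the old answer keep using it until the record's TTL expires, so failover records usually carry a short TTL. This is a key piece of any highly available, multi-region architecture.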
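The failover decision itself is simple to sketch: answer with the highest-priority endpoint that currently passes its health check. The record table, the `resolve` function, and the documentation-range IP addresses below are assumptions for illustration, not any managed DNS service's API.

```python
def resolve(name, records, is_healthy):
    """Return the first healthy IP for a name, in priority order, or None."""
    for ip in records.get(name, []):
        if is_healthy(ip):
            return ip
    return None


# Primary region first, secondary region second (hypothetical addresses).
records = {"api.yourapp.com": ["198.51.100.10", "203.0.113.10"]}
health = {"198.51.100.10": True, "203.0.113.10": True}

def check(ip):
    return health.get(ip, False)

print(resolve("api.yourapp.com", records, check))  # 198.51.100.10

health["198.51.100.10"] = False  # primary region goes down
print(resolve("api.yourapp.com", records, check))  # 203.0.113.10
```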

Frequently Asked Questions

What is a NAT Gateway and why do I need one?

A NAT (Network Address Translation) Gateway lets resources in a private subnet reach the internet (for things like downloading OS updates or calling external APIs) without having a public IP address — and without allowing unsolicited inbound connections from the internet. It acts as an intermediary: traffic goes out through the NAT Gateway, which substitutes its own public IP. Responses come back through the NAT Gateway and are forwarded to the private resource. Think of it as a one-way door.
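The "one-way door" behavior comes from the translation table: only outbound traffic creates entries, so inbound packets with no matching entry have nowhere to go. The toy class below is a deliberate simplification. Real NAT devices key on the full connection tuple and expire idle mappings, and the class name, port numbering, and addresses are assumptions.

```python
class NatGateway:
    """Toy source NAT: maps private (ip, port) pairs to ports on one public IP."""

    def __init__(self, public_ip, first_port=1024):
        self.public_ip = public_ip
        self._next_port = first_port
        self._table = {}  # public_port -> (private_ip, private_port)

    def outbound(self, private_ip, private_port):
        """Rewrite the source of an outgoing packet and remember the mapping."""
        public_port = self._next_port
        self._next_port += 1
        self._table[public_port] = (private_ip, private_port)
        return (self.public_ip, public_port)

    def inbound(self, public_port):
        """Forward a reply to its private owner; None means the packet is dropped."""
        return self._table.get(public_port)


nat = NatGateway("203.0.113.50")
source_seen_by_internet = nat.outbound("10.0.1.15", 40000)
print(source_seen_by_internet)   # ('203.0.113.50', 1024)
print(nat.inbound(1024))         # ('10.0.1.15', 40000)
print(nat.inbound(9999))         # None: unsolicited, no mapping exists
```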

What is VPC peering?

VPC peering lets two VPCs communicate privately as if they were on the same network. Useful when you have separate VPCs for development and production, or when a partner needs to access your services privately. Traffic stays on the provider's backbone and never traverses the public internet. Peering is not transitive: if A is peered with B and B with C, A still cannot reach C through B. For AI systems that span multiple teams or accounts, VPC peering (or AWS Transit Gateway for many-to-many) is common.
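One practical constraint worth checking before peering is that the two VPCs' address ranges must not overlap, since routing between them would otherwise be ambiguous. A minimal pre-flight check, assuming a hypothetical `can_peer` helper and example ranges:

```python
import ipaddress

def can_peer(cidr_a: str, cidr_b: str) -> bool:
    """Two VPC ranges are peerable (address-wise) only if they do not overlap."""
    net_a = ipaddress.ip_network(cidr_a)
    net_b = ipaddress.ip_network(cidr_b)
    return not net_a.overlaps(net_b)

print(can_peer("10.0.0.0/16", "10.1.0.0/16"))    # True: distinct ranges
print(can_peer("10.0.0.0/16", "10.0.128.0/17"))  # False: second sits inside the first
```

Planning distinct ranges per environment up front avoids having to re-address a VPC later.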

How does cloud networking relate to AI infrastructure?

AI training clusters need extremely high-bandwidth, low-latency networking between GPU nodes. This is typically handled by specialized networks (InfiniBand, RoCE) within a cluster, but the cluster itself is connected to storage (for datasets) and the internet (for APIs) via VPC networking. Getting VPC configuration right — security groups, routing, data paths — is essential for both security and performance in AI workloads.

What is a bastion host?

A bastion host (also called a jump server) is a hardened server in a public subnet that you SSH into first, and then use to SSH into private servers. Instead of exposing your database or app servers to the internet, you only expose one small, heavily audited machine. Modern alternatives include AWS Systems Manager Session Manager, which eliminates the need for a bastion by tunneling through the AWS control plane — no exposed SSH at all.
