Serverless Computing
"Serverless" doesn't mean there are no servers — it means you don't think about them. You write a function, upload it, and the cloud runs it for you. You pay only for the milliseconds it actually executes. No idle servers, no capacity planning, no patching. Just code.
What is Serverless Computing?
In serverless computing, you write small, self-contained functions — not full applications. You define when each function should run (in response to an HTTP request, a file upload, a message in a queue, a timer). The cloud provider allocates compute resources on demand, runs your function, and shuts everything down when it's done.
AWS Lambda — The Originator
AWS launched Lambda in 2014, igniting the serverless revolution. You write a function in Python, Node.js, Java, or Go. You configure a trigger (API Gateway for HTTP, S3 for file events, SNS for messages). Lambda runs your function in an isolated container — 128MB to 10GB RAM, up to 15 minutes execution time — and you pay per millisecond of execution (Lambda billed in 100ms increments until late 2020, when it switched to 1ms granularity). Zero requests = zero cost.
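To make "write a function, configure a trigger" concrete, here is a minimal Python handler of the kind Lambda invokes for an API Gateway HTTP trigger. The event fields follow the standard API Gateway proxy payload; the greeting logic itself is just illustrative.

```python
import json

def lambda_handler(event, context):
    """Entry point Lambda calls once per request.

    `event` carries the trigger payload (here, an API Gateway HTTP
    request); `context` exposes runtime metadata such as the time
    remaining before the execution limit.
    """
    # Query-string parameters may be absent entirely, so guard for None.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

The return value's shape (`statusCode`, `headers`, `body`) is what API Gateway expects back from a proxy-integrated function; other triggers let you return arbitrary JSON-serializable values.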
Equivalents: Cloud Functions & Azure Functions
Google Cloud Functions and Azure Functions work the same way — write a function, configure a trigger, deploy. They differ slightly in language support, maximum execution time, and pricing. Cloud Run (Google) and Azure Container Apps extend the serverless model to full containers.
Cold Starts: The Serverless Trade-off
The main downside of serverless is the cold start — when your function hasn't been called recently, the provider needs to initialize a new container before running it. This adds 100ms–2s of latency to the first request after a period of inactivity.
Why Cold Starts Happen
The provider doesn't keep idle containers running (that's the whole point — no idle costs). When a new request arrives and no warm container is available, it downloads your code, creates a container, initializes your runtime, runs your startup code, and then handles the request. All of that takes time.
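The flip side of this lifecycle is that code at module scope runs only once, during the cold start; warm invocations in the same container skip it. This sketch illustrates that split (the `sleep` is a stand-in for real startup work such as loading libraries or a model):

```python
import time

# Module scope executes once per container -- during the cold start.
_BOOTED_AT = time.monotonic()

def _load_resources():
    # Stand-in for expensive initialization (imports, clients, model
    # weights); paid once per container, not once per request.
    time.sleep(0.05)
    return {"ready": True}

RESOURCES = _load_resources()

def lambda_handler(event, context):
    # Per-request work only; the startup cost above is already paid,
    # so warm invocations are fast.
    return {
        "ready": RESOURCES["ready"],
        "container_age_s": round(time.monotonic() - _BOOTED_AT, 3),
    }
```

This is why putting initialization at module scope (rather than inside the handler) is a standard way to keep warm-invocation latency low: the cold start absorbs the one-time cost.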
Mitigations
Provisioned concurrency (Lambda) keeps a specified number of instances warm at all times — you pay for this even when idle, but eliminate cold starts. Keep-alive patterns — scheduled pings to keep functions warm — are a scrappier alternative. For AI inference APIs where latency matters, provisioned concurrency or moving to containers is common.
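The keep-alive pattern usually pairs a scheduled rule (for example, an EventBridge rule firing every few minutes) with a handler that recognizes the ping and returns early. A sketch, where the `keep_warm` marker field is an assumed convention rather than anything Lambda defines:

```python
def do_real_work(event):
    # Placeholder for the function's actual job.
    return event.get("payload", "no-op")

def lambda_handler(event, context):
    # A scheduled rule can invoke the function with a marker payload;
    # short-circuiting here keeps a container warm without doing
    # real work (or incurring downstream side effects).
    if event.get("keep_warm"):
        return {"warmed": True}
    return {"handled": do_real_work(event)}
```

Note the limits of this approach: one ping keeps roughly one container warm, so a traffic burst beyond that still hits cold starts. Provisioned concurrency is the reliable (if costlier) option.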
When to Use Serverless
APIs & Webhooks
Lightweight REST APIs, webhook handlers, and backend-for-frontend patterns. Perfect when traffic is variable or unpredictable.
Event Processing
React to file uploads (resize an image, process a dataset), database changes, or queue messages. Fire-and-forget processing at any scale.
Scheduled Jobs
Cron jobs, nightly reports, data pipelines that run on a schedule. Pay only for the execution time, not for a VM running 24/7.
AI Post-Processing
Trigger serverless functions to post-process AI outputs — transcribe audio after upload, generate thumbnails, run classifiers on incoming data.
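Event processing and post-processing usually start the same way: unpack the trigger event and decide what to do with the object it names. The sketch below parses a standard S3 event notification and plans a thumbnail write; the bucket names are hypothetical, and real code would fetch, resize, and upload the object with an AWS SDK client rather than just reporting the plan.

```python
import urllib.parse

def thumbnail_key(source_key):
    """Derive where the thumbnail should go, e.g.
    'uploads/cat.png' -> 'thumbnails/cat.png'."""
    return "thumbnails/" + source_key.rsplit("/", 1)[-1]

def lambda_handler(event, context):
    # S3 delivers one or more records per event; each names the
    # bucket and the URL-encoded object key that fired the trigger.
    results = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Real code: download `key` from `bucket`, resize the image,
        # upload the result. Here we only report the planned write.
        results.append({
            "bucket": bucket,
            "source": key,
            "target": thumbnail_key(key),
        })
    return results
```

Decoding the key with `unquote_plus` matters: S3 URL-encodes object keys in event notifications, so `cat+photo.png` in the event means `cat photo.png` in the bucket.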
Serverless for AI Inference
A growing pattern in Cloud 3.0 is serverless AI inference — running model inference in serverless containers that scale to zero when idle. Services like AWS Lambda with container images (up to 10GB), Google Cloud Run, and specialized platforms like Modal or Replicate make this possible.
The Pattern
Package your model (a few hundred MB to a few GB) in a container. Deploy to a serverless container service. At zero requests, you pay nothing. When a request arrives, a container spins up in 1–5 seconds, loads the model, runs inference, and returns the result. For low-traffic AI APIs (a demo, an internal tool), this is dramatically cheaper than a dedicated GPU server running 24/7.
Frequently Asked Questions
How is serverless different from PaaS?
PaaS (like Heroku or App Engine) runs your application continuously — even when there's no traffic, a server is running and you're paying for it. Serverless runs your functions only when triggered — zero traffic means zero cost. PaaS is better for always-on applications with complex state; serverless is better for event-driven, bursty, or infrequent workloads.
Can I use databases with serverless functions?
Yes, but carefully. Traditional databases use persistent TCP connections, and opening a new connection per Lambda invocation can overwhelm the database. Solutions include: connection pooling proxies (AWS RDS Proxy, PgBouncer), serverless databases designed for this pattern (PlanetScale, Neon, Turso), or DynamoDB (AWS's NoSQL DB with an HTTP API that works perfectly with Lambda).
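Alongside those options, the basic code-level discipline is to open the connection at module scope so warm invocations reuse it instead of reconnecting per request. A sketch of that pattern, using the standard library's `sqlite3` as a stand-in for a real database client:

```python
import sqlite3

# A connection opened at module scope is created during the cold
# start and reused by every warm invocation in this container --
# avoiding the connection-per-request flood described above.
# (sqlite3 stands in for a real client like psycopg2 or pymysql.)
_conn = sqlite3.connect(":memory:")
_conn.execute("CREATE TABLE hits (id INTEGER PRIMARY KEY)")

def lambda_handler(event, context):
    # Reuse the existing connection rather than opening a new one.
    _conn.execute("INSERT INTO hits DEFAULT VALUES")
    (count,) = _conn.execute("SELECT COUNT(*) FROM hits").fetchone()
    return {"invocations_in_this_container": count}
```

This only bounds connections per container, not per fleet: under heavy concurrency Lambda runs many containers at once, each holding a connection, which is exactly the gap RDS Proxy or PgBouncer closes.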
What is "serverless" at the infrastructure layer vs. the application layer?
At the infrastructure layer, serverless means no servers to manage — Lambda, Cloud Functions, Cloud Run. At the application layer, "serverless" is sometimes used loosely for any architecture that scales to zero and has no fixed capacity (including serverless databases and serverless data warehouses like BigQuery or Athena). The unifying theme is: pay for what you use, no capacity planning required.
Is serverless less secure than running my own servers?
Not inherently. The serverless security model means the provider handles OS and runtime security (an advantage). Your responsibilities are: securing your function code (no injection vulnerabilities), properly configuring IAM permissions (least privilege), and validating all input. The main new risk is excessive permissions — a Lambda with an overly broad IAM role can do a lot of damage if exploited. Scope permissions tightly.