Serverless Computing
"Serverless" doesn't mean there are no servers — it means you don't think about them. You write a function, upload it, and the cloud runs it for you. You pay only for the milliseconds it actually executes. No idle servers, no capacity planning, no patching. Just code.
What is Serverless Computing?
In serverless computing, you write small, self-contained functions — not full applications. You define when each function should run (in response to an HTTP request, a file upload, a message in a queue, a timer). The cloud provider allocates compute resources on demand, runs your function, and shuts everything down when it's done.
AWS Lambda — The Originator
AWS launched Lambda in 2014, igniting the serverless revolution. You write a function in Python, Node.js, Java, or Go. You configure a trigger (API Gateway for HTTP, S3 for file events, SNS for messages). Lambda runs your function in an isolated container — 128MB to 10GB RAM, up to 15 minutes execution time — and you pay per millisecond of execution (Lambda billed in 100ms increments until late 2020, when it switched to 1ms granularity). Zero requests = zero cost.
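To make "write a function, configure a trigger" concrete, here is a minimal Python handler of the kind Lambda invokes for an API Gateway HTTP trigger. The event fields follow the standard API Gateway proxy payload; the greeting logic itself is just illustrative.

```python
import json

def lambda_handler(event, context):
    """Entry point Lambda calls once per request.

    `event` carries the trigger payload (here, an API Gateway HTTP
    request); `context` exposes runtime metadata such as the time
    remaining before the execution limit.
    """
    # Query-string parameters may be absent entirely, so guard for None.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

The return value's shape (`statusCode`, `headers`, `body`) is what API Gateway expects back from a proxy-integrated function; other triggers let you return arbitrary JSON-serializable values.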
Equivalents: Cloud Functions & Azure Functions
Google Cloud Functions and Azure Functions work the same way — write a function, configure a trigger, deploy. They differ slightly in language support, maximum execution time, and pricing. Cloud Run (Google) and Azure Container Apps extend the serverless model to full containers.
Cold Starts: The Serverless Trade-off
The main downside of serverless is the cold start — when your function hasn't been called recently, the provider needs to initialize a new container before running it. This adds 100ms–2s of latency to the first request after a period of inactivity.
Why Cold Starts Happen
The provider doesn't keep idle containers running (that's the whole point — no idle costs). When a new request arrives and no warm container is available, it downloads your code, creates a container, initializes your runtime, runs your startup code, and then handles the request. All of that takes time.
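The flip side of this lifecycle is that code at module scope runs only once, during the cold start; warm invocations in the same container skip it. This sketch illustrates that split (the `sleep` is a stand-in for real startup work such as loading libraries or a model):

```python
import time

# Module scope executes once per container -- during the cold start.
_BOOTED_AT = time.monotonic()

def _load_resources():
    # Stand-in for expensive initialization (imports, clients, model
    # weights); paid once per container, not once per request.
    time.sleep(0.05)
    return {"ready": True}

RESOURCES = _load_resources()

def lambda_handler(event, context):
    # Per-request work only; the startup cost above is already paid,
    # so warm invocations are fast.
    return {
        "ready": RESOURCES["ready"],
        "container_age_s": round(time.monotonic() - _BOOTED_AT, 3),
    }
```

This is why putting initialization at module scope (rather than inside the handler) is a standard way to keep warm-invocation latency low: the cold start absorbs the one-time cost.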
Mitigations
Provisioned concurrency (Lambda) keeps a specified number of instances warm at all times — you pay for this even when idle, but eliminate cold starts. Keep-alive patterns — scheduled pings to keep functions warm — are a scrappier alternative. For AI inference APIs where latency matters, provisioned concurrency or moving to containers is common.
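The keep-alive pattern usually pairs a scheduled rule (for example, an EventBridge rule firing every few minutes) with a handler that recognizes the ping and returns early. A sketch, where the `keep_warm` marker field is an assumed convention rather than anything Lambda defines:

```python
def do_real_work(event):
    # Placeholder for the function's actual job.
    return event.get("payload", "no-op")

def lambda_handler(event, context):
    # A scheduled rule can invoke the function with a marker payload;
    # short-circuiting here keeps a container warm without doing
    # real work (or incurring downstream side effects).
    if event.get("keep_warm"):
        return {"warmed": True}
    return {"handled": do_real_work(event)}
```

Note the limits of this approach: one ping keeps roughly one container warm, so a traffic burst beyond that still hits cold starts. Provisioned concurrency is the reliable (if costlier) option.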
When to Use Serverless
APIs & Webhooks
Lightweight REST APIs, webhook handlers, and backend-for-frontend patterns. Perfect when traffic is variable or unpredictable.
Event Processing
React to file uploads (resize an image, process a dataset), database changes, or queue messages. Fire-and-forget processing at any scale.
Scheduled Jobs
Cron jobs, nightly reports, data pipelines that run on a schedule. Pay only for the execution time, not for a VM running 24/7.
AI Post-Processing
Trigger serverless functions to post-process AI outputs — transcribe audio after upload, generate thumbnails, run classifiers on incoming data.
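Event processing and post-processing usually start the same way: unpack the trigger event and decide what to do with the object it names. The sketch below parses a standard S3 event notification and plans a thumbnail write; the bucket names are hypothetical, and real code would fetch, resize, and upload the object with an AWS SDK client rather than just reporting the plan.

```python
import urllib.parse

def thumbnail_key(source_key):
    """Derive where the thumbnail should go, e.g.
    'uploads/cat.png' -> 'thumbnails/cat.png'."""
    return "thumbnails/" + source_key.rsplit("/", 1)[-1]

def lambda_handler(event, context):
    # S3 delivers one or more records per event; each names the
    # bucket and the URL-encoded object key that fired the trigger.
    results = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Real code: download `key` from `bucket`, resize the image,
        # upload the result. Here we only report the planned write.
        results.append({
            "bucket": bucket,
            "source": key,
            "target": thumbnail_key(key),
        })
    return results
```

Decoding the key with `unquote_plus` matters: S3 URL-encodes object keys in event notifications, so `cat+photo.png` in the event means `cat photo.png` in the bucket.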
Serverless for AI Inference
A growing pattern in Cloud 3.0 is serverless AI inference — running model inference in serverless containers that scale to zero when idle. Services like AWS Lambda with container images (up to 10GB), Google Cloud Run, and specialized platforms like Modal or Replicate make this possible.
The Pattern
Package your model (a few hundred MB to a few GB) in a container. Deploy to a serverless container service. At zero requests, you pay nothing. When a request arrives, a container spins up in 1–5 seconds, loads the model, runs inference, and returns the result. For low-traffic AI APIs (a demo, an internal tool), this is dramatically cheaper than a dedicated GPU server running 24/7.
Frequently Asked Questions
How is serverless different from PaaS?
PaaS (like Heroku or App Engine) runs your application continuously — even when there's no traffic, a server is running and you're paying for it. Serverless runs your functions only when triggered — zero traffic means zero cost. PaaS is better for always-on applications with complex state; serverless is better for event-driven, bursty, or infrequent workloads.
Can I use databases with serverless functions?
Yes, but carefully. Traditional databases use persistent TCP connections, and opening a new connection per Lambda invocation can overwhelm the database. Solutions include: connection pooling proxies (AWS RDS Proxy, PgBouncer), serverless databases designed for this pattern (PlanetScale, Neon, Turso), or DynamoDB (AWS's NoSQL DB with an HTTP API that works perfectly with Lambda).
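Alongside those options, the basic code-level discipline is to open the connection at module scope so warm invocations reuse it instead of reconnecting per request. A sketch of that pattern, using the standard library's `sqlite3` as a stand-in for a real database client:

```python
import sqlite3

# A connection opened at module scope is created during the cold
# start and reused by every warm invocation in this container --
# avoiding the connection-per-request flood described above.
# (sqlite3 stands in for a real client like psycopg2 or pymysql.)
_conn = sqlite3.connect(":memory:")
_conn.execute("CREATE TABLE hits (id INTEGER PRIMARY KEY)")

def lambda_handler(event, context):
    # Reuse the existing connection rather than opening a new one.
    _conn.execute("INSERT INTO hits DEFAULT VALUES")
    (count,) = _conn.execute("SELECT COUNT(*) FROM hits").fetchone()
    return {"invocations_in_this_container": count}
```

This only bounds connections per container, not per fleet: under heavy concurrency Lambda runs many containers at once, each holding a connection, which is exactly the gap RDS Proxy or PgBouncer closes.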
What is "serverless" at the infrastructure layer vs. the application layer?
At the infrastructure layer, serverless means no servers to manage — Lambda, Cloud Functions, Cloud Run. At the application layer, "serverless" is sometimes used loosely for any architecture that scales to zero and has no fixed capacity (including serverless databases and serverless data warehouses like BigQuery or Athena). The unifying theme is: pay for what you use, no capacity planning required.
Is serverless less secure than running my own servers?
Not inherently. The serverless security model means the provider handles OS and runtime security (an advantage). Your responsibilities are: securing your function code (no injection vulnerabilities), properly configuring IAM permissions (least privilege), and validating all input. The main new risk is excessive permissions — a Lambda with an overly broad IAM role can do a lot of damage if exploited. Scope permissions tightly.