Identity & Access Management (IAM)

IAM is the single most important security control in any cloud environment. Get it right and you limit the blast radius of every other mistake. Get it wrong — give too much access to too many identities — and a single compromised credential can cost your company millions.

What is IAM?

Identity and Access Management (IAM) is the system that controls who (or what) can do what to which resources in your cloud environment. "Who" can be a human, an application, a service, or an automated process. "What" can be reading an S3 file, launching a GPU instance, calling an AI API, or deleting a database.

IAM answers two questions. Authentication: "Are you who you say you are?" (identity verification). Authorization: "Are you allowed to do what you're trying to do?" (access control). Both checks must pass for a request to succeed.

Core IAM Concepts

Users — Human Identities

An IAM user represents a specific person. Each user has credentials (password, access keys) and permissions. Best practice: create an individual user for each person and never share credentials. The AWS root account, with its unrestricted access, should be locked down immediately after account creation and used only for the few tasks that require it, such as billing and account management.

Groups — Collections of Users

Instead of assigning permissions to each user individually, assign them to groups and add users to groups. "Developers" group gets access to EC2 and S3. "ML Engineers" group gets additional SageMaker and GPU instance access. "Admins" group gets broad access. Managing permissions at the group level scales — changing a policy on one group affects all its members.
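The group model above can be sketched as a small in-memory lookup. This is a toy illustration, not an AWS API: the dictionaries, the `effective_permissions` helper, the user names, and the action strings are all hypothetical.

```python
# Toy model of group-based permissions (hypothetical data, not an AWS API).
GROUP_POLICIES = {
    "Developers": {"ec2:*", "s3:*"},
    "ML Engineers": {"sagemaker:*"},
}
USER_GROUPS = {
    "alice": ["Developers"],
    "bob": ["Developers", "ML Engineers"],
}


def effective_permissions(user):
    """Return the union of the permissions of every group the user is in."""
    perms = set()
    for group in USER_GROUPS.get(user, []):
        perms |= GROUP_POLICIES.get(group, set())
    return perms


before_alice = effective_permissions("alice")

# Tightening one group's policy changes access for every member at once.
GROUP_POLICIES["Developers"] = {"ec2:*", "s3:GetObject"}

after_alice = effective_permissions("alice")
after_bob = effective_permissions("bob")
```

A single change to the "Developers" entry is reflected for both users, which is the scaling benefit of managing permissions at the group level.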

Roles — Assumed Identities

An IAM role is a set of permissions that can be assumed by any authorized entity: a service, a user, or another account. Roles are the right way to grant permissions to services and applications (a Lambda function assumes a role, not a user). Unlike users, roles have no long-term credentials. When an entity assumes a role, it receives temporary security credentials that expire automatically, so a leaked credential is useful to an attacker only for a short time.
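Who may assume a role is defined by the role's trust policy. As a minimal sketch, the snippet below builds the trust policy that lets the Lambda service assume a role. The function name `lambda_trust_policy` is illustrative; the JSON shape follows the standard AWS policy format.

```python
import json


def lambda_trust_policy():
    """Trust policy allowing the AWS Lambda service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


trust_policy_json = json.dumps(lambda_trust_policy(), indent=2)
print(trust_policy_json)
```

The trust policy only says who can assume the role. What the role can do once assumed is defined separately, by the permission policies attached to it.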

Policies — Permission Documents

An IAM policy is a JSON document that defines what actions are allowed (or denied) on which resources under which conditions. Policies are attached to users, groups, or roles. AWS has hundreds of managed policies (pre-built for common use cases) and you can write custom policies for fine-grained control.
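To make the structure concrete, here is a sketch of a small custom policy built as a Python dictionary and serialized to JSON. The bucket name is hypothetical; the `Effect`, `Action`, `Resource`, and `Condition` fields are the standard policy elements.

```python
import json

BUCKET = "example-reports-bucket"  # hypothetical bucket name

read_one_bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ListTheBucket",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {
            "Sid": "ReadObjectsOverTls",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            # Condition: only allow requests made over HTTPS.
            "Condition": {"Bool": {"aws:SecureTransport": "true"}},
        },
    ],
}

policy_json = json.dumps(read_one_bucket_policy, indent=2)
print(policy_json)
```

Note that listing applies to the bucket ARN while reading applies to the objects under it (`/*`), which is why the two actions sit in separate statements.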

The Principle of Least Privilege

The most fundamental IAM principle: give every identity only the minimum permissions required to do its job — nothing more. A Lambda function that reads from one S3 bucket should have permission to read from exactly that bucket, not all S3 buckets. Not S3 + EC2 + RDS. Just that one bucket.

Why It Matters

If a Lambda function with full S3 access gets compromised (a code injection vulnerability, for example), the attacker can exfiltrate all your S3 data. If it only has access to one bucket, they can only access that bucket. Least privilege doesn't prevent compromise — it limits damage when compromise happens. And it will happen eventually.
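The difference in blast radius can be illustrated with a deliberately simplified policy check. This is not how AWS evaluates policies (real evaluation also handles explicit denies, conditions, and several policy types); the `is_allowed` and `reachable` helpers and all bucket names are hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """Policy fields may be a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def is_allowed(policy, action, resource):
    """Toy check: True if any Allow statement matches action and resource."""
    for stmt in policy["Statement"]:
        if stmt["Effect"] != "Allow":
            continue
        action_ok = any(fnmatchcase(action, a) for a in _as_list(stmt["Action"]))
        resource_ok = any(fnmatchcase(resource, r) for r in _as_list(stmt["Resource"]))
        if action_ok and resource_ok:
            return True
    return False


broad = {"Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}]}
narrow = {
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::app-uploads/*",
        }
    ]
}

OBJECTS = [
    "arn:aws:s3:::app-uploads/img.png",
    "arn:aws:s3:::customer-exports/2024.csv",
    "arn:aws:s3:::db-backups/prod.sql.gz",
]


def reachable(policy):
    """Objects an attacker holding this policy's credentials could read."""
    return [o for o in OBJECTS if is_allowed(policy, "s3:GetObject", o)]


print(len(reachable(broad)), "objects exposed with full S3 access")
print(len(reachable(narrow)), "object exposed with the scoped policy")
```

With the broad policy every object is reachable; with the scoped policy only the objects in the one bucket the function actually needs are exposed.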

The Permission Creep Problem

In practice, permissions tend to grow over time (developers grant broad access for convenience and never clean it up) and shrink only when forced. Periodic access reviews, automated tools that identify unused permissions (AWS IAM Access Analyzer, GCP IAM Recommender), and "just-in-time" access systems that grant elevated privileges temporarily all combat permission creep.
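Two of these countermeasures reduce to simple logic, sketched below with hypothetical data. The first function compares granted actions with actions actually observed in use (the kind of comparison tools in this space automate); the second models a just-in-time grant that expires. None of the names here correspond to a real AWS or GCP API.

```python
from datetime import datetime, timedelta, timezone


def unused_permissions(granted, used):
    """Actions that were granted but never observed in use."""
    return sorted(granted - used)


def jit_grant(action, now, minutes):
    """Create a temporary grant that expires `minutes` after `now`."""
    return {"action": action, "expires_at": now + timedelta(minutes=minutes)}


def grant_is_active(grant, now):
    """A just-in-time grant is valid only until its expiry time."""
    return now < grant["expires_at"]


granted = {"s3:GetObject", "s3:PutObject", "ec2:RunInstances", "rds:DeleteDBInstance"}
used_recently = {"s3:GetObject", "s3:PutObject"}
candidates_for_removal = unused_permissions(granted, used_recently)
print(candidates_for_removal)

start = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
grant = jit_grant("rds:DeleteDBInstance", start, minutes=30)
```

The expiring grant means elevated access disappears on its own, rather than depending on someone remembering to revoke it.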

For AI systems specifically: Your training job needs to read training data (read-only on one S3 prefix) and write checkpoints (write on another). It does NOT need to describe EC2 instances, create VPCs, or access production databases. Define the minimal IAM role, and use it. Many AI security incidents start with overprivileged training jobs or inference services.
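A minimal role policy for such a training job might look like the following sketch. The bucket and prefix names are hypothetical, and a real job may need a few more actions (for example, listing objects under the data prefix), which should be added only when a concrete need appears.

```python
import json


def training_job_policy(bucket, data_prefix, checkpoint_prefix):
    """Read-only on the training data prefix, write-only on checkpoints."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadTrainingData",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{data_prefix}/*",
            },
            {
                "Sid": "WriteCheckpoints",
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/{checkpoint_prefix}/*",
            },
        ],
    }


# Hypothetical bucket and prefixes, for illustration only.
policy = training_job_policy("ml-platform-data", "datasets/v3", "checkpoints/run-42")
print(json.dumps(policy, indent=2))
```

Nothing in this policy mentions EC2, VPCs, or databases, so a compromised training job cannot touch them.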

Service Accounts & Workload Identity

Applications and services need identities too — to call APIs, read secrets, access storage. The wrong way: embed access keys in code or environment variables. The right way: use workload identity.

Instance Profiles / Service Account Bindings

Attach an IAM role directly to a VM, container, or serverless function. The application gets temporary credentials automatically, rotated behind the scenes, without any keys in code. An EC2 instance with an instance profile can call S3 APIs without any long-lived access key stored on the instance: the instance metadata service provides short-lived credentials on demand. This is the correct pattern for all production workloads.
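AWS SDKs perform the metadata lookup for you, so application code normally never touches it. Purely to show what happens under the hood, the sketch below builds the two requests used by version 2 of the EC2 instance metadata service (IMDSv2): one to obtain a session token, one to read the role's temporary credentials. The helper names are illustrative, and `fetch_role_credentials` only works when run on an EC2 instance that has an instance profile attached.

```python
import json
import urllib.request

IMDS = "http://169.254.169.254"  # link-local address of the metadata service


def token_request(ttl_seconds=21600):
    """Step 1: request a short-lived IMDSv2 session token (HTTP PUT)."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def credentials_request(token, role_name):
    """Step 2: ask for the temporary credentials of the attached role."""
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/iam/security-credentials/{role_name}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_role_credentials(role_name):
    """Run both steps. Only succeeds on an EC2 instance with a role attached."""
    with urllib.request.urlopen(token_request(), timeout=2) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(credentials_request(token, role_name), timeout=2) as resp:
        # The JSON response includes AccessKeyId, SecretAccessKey, Token, Expiration.
        return json.loads(resp.read())
```

The credentials returned this way carry an expiration time and are refreshed by the platform, which is why nothing needs to be stored in code or configuration.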

Workload Identity Federation

For Kubernetes workloads, Workload Identity (GKE), IRSA (IAM Roles for Service Accounts, on EKS), and Microsoft Entra Workload ID (on AKS, where it replaces the deprecated pod-managed identity) bind Kubernetes service accounts to cloud IAM identities. A pod gets cloud credentials automatically based on its Kubernetes identity, with no static secrets to manage and no manual key rotation. This is the gold standard for cloud-native AI workloads running on Kubernetes.
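For the EKS case, the binding has two halves: a role trust policy that accepts tokens from the cluster's OIDC provider for one specific service account, and an annotation on that service account naming the role. The sketch below builds both as plain data. The account ID, provider ID, namespace, and names are placeholders, and the helper functions are illustrative.

```python
import json


def irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Trust policy letting one Kubernetes service account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


def service_account_annotation(role_arn):
    """Annotation that tells EKS which role the service account maps to."""
    return {"eks.amazonaws.com/role-arn": role_arn}


# Placeholder values, for illustration only.
PROVIDER = "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE0000000000000000000000000"
trust = irsa_trust_policy("111122223333", PROVIDER, "ml", "trainer")
annotation = service_account_annotation("arn:aws:iam::111122223333:role/trainer-role")
print(json.dumps(trust, indent=2))
```

The `sub` condition is what scopes the role to a single service account in a single namespace; without it, any workload in the cluster could assume the role.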

Frequently Asked Questions

What is the difference between authentication and authorization?

Authentication verifies who you are: "I am user alice@company.com, here's my password and MFA code." The system confirms your identity. Authorization determines what you're allowed to do: "alice is allowed to read S3 bucket X but not delete from it." You can be authenticated (proven identity) but not authorized (no permission for the specific action). IAM handles both, but they're conceptually distinct. For a human, authentication happens at login; for API traffic, every signed request is authenticated, and authorization is then evaluated on every call.
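The distinction can be shown with a toy request handler that runs the two checks in order. Everything here is hypothetical (the session table, the permission table, and the function names); it is a sketch of the idea, not of any cloud provider's implementation.

```python
# Toy data: which session token belongs to whom, and what each identity may do.
SESSIONS = {"token-abc": "alice@company.com"}
PERMISSIONS = {"alice@company.com": {("s3:GetObject", "bucket-x")}}


def authenticate(token):
    """Authentication: map a presented credential to an identity, or None."""
    return SESSIONS.get(token)


def authorize(identity, action, resource):
    """Authorization: is this identity allowed this action on this resource?"""
    return (action, resource) in PERMISSIONS.get(identity, set())


def handle_request(token, action, resource):
    """Return an HTTP-style status code for the request."""
    identity = authenticate(token)
    if identity is None:
        return 401  # not authenticated: we don't know who this is
    if not authorize(identity, action, resource):
        return 403  # authenticated, but not authorized for this action
    return 200


print(handle_request("token-abc", "s3:GetObject", "bucket-x"))
print(handle_request("token-abc", "s3:DeleteObject", "bucket-x"))
print(handle_request("stolen-or-invalid", "s3:GetObject", "bucket-x"))
```

The second call is the "authenticated but not authorized" case: alice's identity is confirmed, yet the delete is refused.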

What is MFA and why is it critical in the cloud?

Multi-Factor Authentication (MFA) requires a second form of verification beyond a password, typically a time-based one-time code from an authenticator app (Google Authenticator, Authy). In the cloud, a compromised password alone gives attackers everything that account can do. With MFA, they also need the physical device that generates the code, which dramatically raises the cost of account takeover. MFA should be mandatory for all human accounts, especially admin accounts. AWS SCPs (Service Control Policies) can enforce this organization-wide by denying specific actions to callers who have not authenticated with MFA.
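As a sketch of the SCP approach, the policy below denies two destructive EC2 actions unless the caller authenticated with MFA, using the `aws:MultiFactorAuthPresent` condition key. The choice of actions is illustrative; which actions to protect is an assumption to adapt to your own environment.

```python
import json

require_mfa_scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyStopAndTerminateWithoutMFA",
            "Effect": "Deny",
            "Action": ["ec2:StopInstances", "ec2:TerminateInstances"],
            "Resource": "*",
            # BoolIfExists also matches requests where the key is absent,
            # such as calls made with long-lived access keys.
            "Condition": {"BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}},
        }
    ],
}

scp_json = json.dumps(require_mfa_scp, indent=2)
print(scp_json)
```

Because this is a Deny statement, it overrides any Allow granted elsewhere, which is what makes it suitable as an organization-wide guardrail.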

What are AWS Access Keys and when should I use them?

AWS Access Keys are long-lived credentials (Access Key ID + Secret Access Key) that authenticate programmatic API calls. They should be used only as a last resort when no better alternative exists. The better alternatives: EC2 instance profiles, ECS task roles, Lambda execution roles, IRSA for EKS, GitHub Actions OIDC federation. If you find yourself pasting access keys into environment variables or `.env` files in 2025, stop and ask whether workload identity is available for your use case. It almost certainly is.
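One cheap safeguard is to scan configuration files for strings that look like long-lived access key IDs before they are committed. The sketch below uses a common heuristic (the `AKIA` prefix followed by 16 uppercase letters or digits); treat the pattern as an assumption rather than a formal specification. The sample value is the placeholder key ID used in AWS documentation, not a real credential.

```python
import re

# Heuristic pattern for long-lived AWS access key IDs (an assumption, not a spec).
ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_access_key_ids(text):
    """Return every substring that looks like a long-lived access key ID."""
    return ACCESS_KEY_ID.findall(text)


sample_env_file = (
    "DATABASE_URL=postgres://localhost/app\n"
    "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\n"
    "LOG_LEVEL=info\n"
)

hits = find_access_key_ids(sample_env_file)
print(hits)
```

A check like this can run in a pre-commit hook or CI step; a hit is a prompt to switch that workload to a role-based identity instead.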
