Cloud Storage Types Explained
Not all storage is the same. A photo album, a database, and a 10TB training dataset each need a different kind of storage. Picking the wrong type wastes money, hurts performance, and can break applications. Here's how to get it right.
Object Storage — The Cloud's Filing Cabinet
Object storage stores data as discrete "objects" — files with metadata and a unique identifier. You access objects via HTTP APIs (PUT, GET, DELETE). There's no directory hierarchy in the traditional sense — just a flat namespace, though you can simulate folders with prefixes in the key names.
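The flat-namespace-with-prefixes idea can be sketched in a few lines of Python. This is an illustrative in-memory stand-in (a plain dict), not a real S3 client; the point is that "folders" are just characters in the key name:

```python
# Object stores keep a flat key space; "folders" are simulated with key prefixes.
bucket = {
    "photos/2024/beach.jpg": b"...",
    "photos/2024/city.jpg": b"...",
    "models/llama-ckpt-001.bin": b"...",
}

def list_keys(bucket, prefix=""):
    """Mimic an object store's prefix filter: plain string matching, no directories."""
    return sorted(k for k in bucket if k.startswith(prefix))

print(list_keys(bucket, prefix="photos/2024/"))
# Both "folder" levels are just part of the key string.
```

Real object-store list APIs (like S3's ListObjectsV2) work the same way: you pass a prefix, and the service filters keys by string match.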
AWS S3, Google Cloud Storage, Azure Blob Storage
These are the canonical examples. You create a "bucket," upload files to it, and retrieve them via URL. Capacity is effectively unlimited, data is replicated across multiple availability zones by default, and pricing is per GB stored plus per request. S3 alone stores exabytes of data for companies like Netflix and Airbnb.
Object Storage for AI & ML
This is the dominant storage type for AI workloads. Training datasets (image files, text corpora, audio clips), model checkpoints during training, and final model artifacts all live in object storage. A typical LLM training dataset might be 10–100TB on S3. S3 scales request throughput automatically, supporting thousands of requests per second per key prefix, which allows huge numbers of parallel reads — essential for feeding hungry GPU clusters.
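The parallel-read pattern can be sketched with a thread pool. Here `fetch_object` is a hypothetical stand-in for an object GET (e.g., a real client's `get_object` call), so the example runs without network access:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_object(key: str) -> bytes:
    """Stand-in for an object-store GET; a real pipeline would download bytes here."""
    return f"data-for-{key}".encode()

# Training data is typically sharded into many objects with a common prefix.
keys = [f"dataset/shard-{i:05d}.tar" for i in range(100)]

# Data loaders issue many GETs concurrently so the GPUs never sit idle
# waiting on a single sequential download.
with ThreadPoolExecutor(max_workers=16) as pool:
    shards = list(pool.map(fetch_object, keys))

print(len(shards), "shards fetched")
```

In production, frameworks like PyTorch DataLoader workers or WebDataset apply this same idea: many concurrent readers pulling shards from object storage.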
Block Storage — The Cloud's Hard Drive
Block storage presents raw storage volumes to a VM — like attaching a hard drive. The OS formats it, creates a file system, and uses it exactly like a local disk. It's fast, low-latency, and designed for transactional workloads.
AWS EBS, Google Persistent Disk, Azure Managed Disks
When you launch an EC2 instance, it comes with a root EBS volume (your OS disk). You can attach additional EBS volumes for data. EBS volumes are fixed-size, persistent, and attached to one instance at a time. They range from cheap throughput-optimized HDD volumes (st1, sc1) through general-purpose SSDs (gp3) to provisioned-IOPS SSDs (io2) for databases requiring extreme IOPS.
Block Storage for AI
GPU training jobs often use local NVMe SSDs (instance storage) for the highest throughput during training — data is staged here from S3. For databases behind AI applications (user data, feature stores), EBS or equivalent is standard. Persistent block storage also backs Kubernetes PersistentVolumes for stateful workloads.
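The staging step described above (object storage to local scratch disk) can be sketched as follows. The `files` dict is a hypothetical stand-in for shards already downloaded from S3; a real job would stream them with an S3 client:

```python
import shutil
import tempfile
from pathlib import Path

def stage_to_scratch(files: dict, scratch_dir: Path) -> list:
    """Write dataset shards onto fast local disk before training starts.

    `files` maps shard names to their bytes (stand-in for downloaded objects).
    """
    staged = []
    for name, data in files.items():
        local = scratch_dir / name
        local.write_bytes(data)   # on a real node, this lands on local NVMe
        staged.append(local)
    return staged

# Use a temp dir as a stand-in for an NVMe scratch mount like /mnt/nvme.
scratch = Path(tempfile.mkdtemp(prefix="nvme-scratch-"))
paths = stage_to_scratch({"shard-0.tar": b"...", "shard-1.tar": b"..."}, scratch)
print([p.name for p in paths])
shutil.rmtree(scratch)  # scratch space is ephemeral; clean up after the job
```

The design point: local NVMe is ephemeral (lost when the instance stops), so it holds only re-downloadable copies, never the source of truth, which stays in object storage.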
File Storage — The Cloud's Network Share
File storage is a shared file system that multiple VMs can mount simultaneously. It behaves like a traditional NAS (Network Attached Storage): it appears as a directory, and multiple machines can read and write files to it at the same time.
AWS EFS, Google Filestore, Azure Files
Ideal when multiple servers need shared read-write access to the same files. AWS EFS auto-scales with your data — no need to pre-provision capacity. Azure Files is great for lifting on-premises SMB file shares to the cloud without changing how applications access them.
File Storage for AI
Distributed training jobs that run across multiple nodes and need to read the same checkpoints or share configuration files use file storage. High-performance parallel file systems like Lustre (available as AWS FSx for Lustre) are optimized for exactly this pattern, delivering hundreds of GB/s of aggregate throughput from storage to GPU clusters.
Choosing the Right Storage Type
| Type | Access Method | Latency | Concurrency | Best For |
|---|---|---|---|---|
| Object (S3) | HTTP API | Medium (tens of ms) | Effectively unlimited | Datasets, models, backups |
| Block (EBS) | Mounted disk | Very low (sub-ms) | Single VM | OS disk, databases |
| File (EFS) | NFS mount | Low | Many VMs | Shared workloads |
| Local NVMe | Mounted disk | Ultra-low | Single VM | Scratch space for training |
Frequently Asked Questions
What is a data lake and how does it relate to cloud storage?
A data lake is a central repository that stores raw data in its native format at any scale. In practice, it's usually S3 (or Google Cloud Storage / Azure Data Lake Storage) with a structured folder hierarchy and governance layer on top. For AI, the data lake is where you collect and store all raw training data — logs, user interactions, sensor data — before it's processed into curated datasets for model training.
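The "structured folder hierarchy" mentioned above is, again, just a prefix convention on top of flat object keys. A minimal sketch of one common layout (zone names and Hive-style date partitions here are illustrative conventions, not S3 features):

```python
from datetime import date

def lake_key(zone: str, source: str, day: date, filename: str) -> str:
    """Build a partitioned data-lake object key.

    Zones (raw/processed/curated) and year=/month=/day= partitions are
    naming conventions that query engines like Athena or Spark understand.
    """
    return (f"{zone}/{source}/year={day.year}/month={day.month:02d}/"
            f"day={day.day:02d}/{filename}")

print(lake_key("raw", "clickstream", date(2024, 5, 7), "events-001.json"))
# raw/clickstream/year=2024/month=05/day=07/events-001.json
```

Partitioning keys by date like this lets downstream processing jobs list and read only the prefixes they need instead of scanning the whole lake.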
How much does cloud storage cost?
S3 Standard costs ~$0.023/GB/month, plus ~$0.0004 per 1,000 GET requests. EBS gp3 costs ~$0.08/GB/month. EFS costs ~$0.30/GB/month. For a 100TB training dataset on S3, you'd pay ~$2,300/month in storage alone (plus egress costs when downloading to GPUs). This is why dataset storage and retrieval efficiency matters enormously in AI infrastructure economics.
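The arithmetic behind the $2,300/month figure is straightforward. A small sketch using the list prices quoted above (which change over time, so treat them as illustrative):

```python
# Approximate list prices quoted in the text; verify against current pricing pages.
S3_STANDARD_PER_GB = 0.023   # USD per GB-month
GET_PER_1000 = 0.0004        # USD per 1,000 GET requests

def s3_monthly_cost(dataset_tb: float, monthly_gets: int = 0) -> float:
    """Rough monthly S3 Standard bill: storage plus GET request charges."""
    storage = dataset_tb * 1024 * S3_STANDARD_PER_GB
    requests = monthly_gets / 1000 * GET_PER_1000
    return storage + requests

print(round(s3_monthly_cost(100)))  # 100 TB of storage -> ~2355 USD/month
```

Note that egress (data transfer out) is billed separately and often dominates when large datasets are repeatedly pulled to GPU clusters outside the region.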
What is S3 Glacier and when would I use it?
S3 Glacier is an archival storage tier — extremely cheap ($0.004/GB/month) but with retrieval times ranging from minutes to hours. Use it for data you almost never access but must retain (regulatory archives, old model checkpoints, audit logs). You can set lifecycle policies to automatically move S3 objects to Glacier after 90 or 180 days, significantly reducing storage costs for older data.
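A lifecycle rule is essentially an age check applied to each object. A minimal sketch of the decision logic (the function and threshold are illustrative; in AWS you'd declare this as a bucket lifecycle configuration rather than write code):

```python
from datetime import date

def storage_class(last_modified: date, today: date, archive_after_days: int = 90) -> str:
    """Mimic a lifecycle rule: transition objects older than the cutoff to Glacier."""
    age_days = (today - last_modified).days
    return "GLACIER" if age_days >= archive_after_days else "STANDARD"

today = date(2024, 6, 1)
print(storage_class(date(2024, 1, 15), today))  # old checkpoint -> GLACIER
print(storage_class(date(2024, 5, 20), today))  # recent data -> STANDARD
```

In practice the equivalent rule is attached to the bucket once, and S3 evaluates and transitions objects automatically with no application code involved.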