AWS AI & SageMaker
Amazon Web Services is the world's largest cloud provider, with one of the most mature and comprehensive sets of AI/ML services. SageMaker is its flagship ML platform, handling everything from data preparation to model training, deployment, and monitoring.
AWS AI Service Landscape
SageMaker: end-to-end ML platform (notebooks, training, deployment, monitoring)
Bedrock: managed LLM API (Claude, Llama, Titan, Stable Diffusion); no GPU management
Rekognition: pre-built computer vision (object detection, face recognition, content moderation)
Comprehend: NLP service (sentiment, entities, key phrases, language detection)
Transcribe & Polly: speech-to-text and text-to-speech services
Glue & Athena: data preparation and SQL analytics on S3 data lakes
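The pre-built services above are plain API calls via boto3. As a hedged sketch using the NLP service (Amazon Comprehend): the live call needs AWS credentials, so it is shown commented out, and the helper is checked offline against Comprehend's documented response shape.

```python
# pip install boto3  (AWS SDK; the live call below needs credentials)
# import boto3
# comprehend = boto3.client('comprehend', region_name='us-east-1')
# response = comprehend.detect_sentiment(Text='Great service!', LanguageCode='en')

def top_sentiment(response):
    """Return the highest-scoring label from a detect_sentiment response."""
    scores = response['SentimentScore']
    return max(scores, key=scores.get)

# Offline check against the documented response shape:
sample = {'Sentiment': 'POSITIVE',
          'SentimentScore': {'Positive': 0.97, 'Negative': 0.01,
                             'Neutral': 0.02, 'Mixed': 0.0}}
print(top_sentiment(sample))  # Positive
```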
SageMaker: The ML Lifecycle
Studio: web-based IDE; Jupyter notebooks with managed compute, no local setup
Data Wrangler: visual data preparation; clean, transform, and visualise S3 data
Training Jobs: managed container-based training; auto-provisions GPUs, handles checkpointing
Experiments: track hyperparameters, metrics, and artefacts across training runs
Model Registry: version models, track lineage, approve/reject for deployment
Endpoints: deploy models as real-time REST APIs or batch transform jobs; auto-scaling built-in
Model Monitor: detect data drift and model quality degradation in production
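The managed training step above ultimately runs a user-supplied script inside a container. A hedged, stdlib-only sketch of such an entry point: SageMaker passes hyperparameters as command-line flags and data/model paths via SM_* environment variables, and the names below follow that convention (the training logic itself is elided).

```python
# Sketch of a train.py entry point for a SageMaker Training Job.
import argparse
import os

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive exactly as named in the estimator definition
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--learning-rate', type=float, default=0.001)
    # SageMaker-provided paths (defaults also allow a plain local run)
    parser.add_argument('--model-dir',
                        default=os.environ.get('SM_MODEL_DIR', '/opt/ml/model'))
    parser.add_argument('--train',
                        default=os.environ.get('SM_CHANNEL_TRAINING',
                                               '/opt/ml/input/data/training'))
    return parser.parse_args(argv)

if __name__ == '__main__':
    args = parse_args()
    print(f'training {args.epochs} epochs at lr={args.learning_rate}')
    # ...load data from args.train, fit the model, save it to args.model_dir...
```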
Launch a Training Job
import sagemaker
from sagemaker.pytorch import PyTorch
role = sagemaker.get_execution_role()
sess = sagemaker.Session()
# Upload training data to S3
s3_data = sess.upload_data('data/', bucket='my-bucket', key_prefix='training')
# Define estimator
estimator = PyTorch(
    entry_point='train.py',          # Your training script
    role=role,
    framework_version='2.1',
    py_version='py310',
    instance_type='ml.p3.2xlarge',   # V100 GPU, $3.82/hr
    instance_count=1,
    hyperparameters={
        'epochs': 50,
        'learning-rate': 0.001,
    }
)
# Start training (provisions GPU, runs, saves model to S3)
estimator.fit({'training': s3_data})

Deploy a Model Endpoint
# Deploy trained model as a REST API
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type='ml.c5.xlarge',    # CPU for inference, cheaper
    endpoint_name='my-model-prod',
    serializer=JSONSerializer(),     # send dict payloads as JSON
    deserializer=JSONDeserializer(), # parse JSON responses into Python objects
)
# Make predictions
result = predictor.predict({'inputs': [[5.1, 3.5, 1.4, 0.2]]})
print(result)
# Clean up (endpoints cost money even when idle!)
predictor.delete_endpoint()

AWS Bedrock — Managed LLM API
Bedrock gives you API access to foundation models (Claude, Llama, Titan) without managing any infrastructure. Pay per token, scale to zero automatically.
import boto3
import json
bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')
response = bedrock.invoke_model(
    modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": "Summarise this document..."}]
    })
)
result = json.loads(response['body'].read())
print(result['content'][0]['text'])

Cost Optimisation Tips
Use Spot instances: save up to 90% on training jobs; SageMaker retries interrupted jobs and resumes from checkpoints.
Right-size instances: ml.p3.2xlarge (~$3.82/hr) for GPU training; ml.c5.xlarge (~$0.19/hr) for CPU inference.
Delete idle endpoints: endpoints bill 24/7; use Lambda to auto-scale to zero or delete during off-hours.
Serverless Inference: for intermittent traffic; charges only per prediction, with zero cost when idle.
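The up-to-90% Spot savings above map to a few estimator arguments in the SageMaker Python SDK. A hedged sketch (the bucket name is hypothetical, and the estimator call is commented out since it needs AWS access):

```python
# Managed Spot training settings. Spot capacity can be reclaimed,
# so a checkpoint S3 location lets SageMaker resume interrupted jobs.
spot_kwargs = dict(
    use_spot_instances=True,
    max_run=3600,      # cap on actual training seconds
    max_wait=7200,     # cap on training time plus waiting for Spot capacity
    checkpoint_s3_uri='s3://my-bucket/checkpoints/',  # hypothetical bucket
)

# estimator = PyTorch(entry_point='train.py', ..., **spot_kwargs)

# SageMaker requires max_wait >= max_run for Spot jobs:
assert spot_kwargs['max_wait'] >= spot_kwargs['max_run']
print('spot config valid')
```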
Frequently Asked Questions
What IAM permissions does SageMaker need?
SageMaker needs an execution role with: AmazonSageMakerFullAccess (for SageMaker APIs), S3 read/write access for your data bucket, ECR access if using custom containers, and CloudWatch Logs access. Never use AdministratorAccess in production — follow least privilege.
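Concretely, the execution role's trust policy must allow the SageMaker service to assume it. A sketch follows (the role name is hypothetical, and the boto3 call is commented out since it needs IAM permissions):

```python
import json

# Trust policy allowing the SageMaker service to assume the execution role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sagemaker.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# import boto3
# iam = boto3.client('iam')
# iam.create_role(RoleName='my-sagemaker-role',  # hypothetical name
#                 AssumeRolePolicyDocument=json.dumps(trust_policy))

print(json.dumps(trust_policy, indent=2))
```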
How do I version my models in SageMaker?
Use the SageMaker Model Registry. Register models after training with estimator.register(). Models go through an approval workflow (Pending → Approved/Rejected) before being deployed to production endpoints. Model lineage tracks which training job and data version produced each model.