MLOps & CI/CD for AI
A model that can't be reliably retrained, tested, and deployed is a liability. MLOps applies software engineering best practices to machine learning — version control, automated testing, and continuous deployment for models and data.
What is MLOps?
MLOps = DevOps for Machine Learning. The goal is to automate the journey from experiment → production, and keep models healthy once deployed.
The core practices:
Experiment tracking: record every training run with its hyperparameters, metrics, and artefacts, so results are reproducible by default.
Data versioning: track which data trained which model, and roll back when things go wrong.
CI/CD pipelines: automatically train, test, and deploy when code or data changes.
Monitoring: detect when model performance degrades in production and trigger retraining.
MLflow — Experiment Tracking
MLflow is the open-source standard for ML experiment tracking. Free, self-hostable, integrates with every major cloud.
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
# Start local MLflow server: `mlflow ui`
mlflow.set_experiment("fraud-detection-v2")
with mlflow.start_run(run_name="rf-100-trees"):
    # Log hyperparameters
    mlflow.log_params({
        "n_estimators": 100,
        "max_depth": 10,
        "class_weight": "balanced"
    })

    # Train with the same hyperparameters that were logged
    model = RandomForestClassifier(
        n_estimators=100, max_depth=10, class_weight="balanced"
    )
    model.fit(X_train, y_train)

    # Log metrics
    f1 = f1_score(y_test, model.predict(X_test))
    mlflow.log_metric("f1_score", f1)

    # Save model artefact
    mlflow.sklearn.log_model(model, "model")

    print(f"Run ID: {mlflow.active_run().info.run_id}")
DVC — Data Version Control
DVC (Data Version Control) extends Git to handle large datasets and model files. It stores data in remote storage (S3, GCS) while tracking versions in Git.
# Initialise DVC in your git repo
git init && dvc init
# Track a large dataset (stored in S3, version in git)
dvc add data/training_data.csv
git add data/training_data.csv.dvc .gitignore
git commit -m "Add training data v1"
# Configure remote storage
dvc remote add -d myremote s3://my-bucket/dvc
dvc push # Upload data to S3
# On a different machine — get the exact same data
git pull
dvc pull  # Download from S3
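DVC can also capture the training step itself as a reproducible pipeline. A minimal sketch, assuming a hypothetical src/train.py that reads the tracked CSV and writes models/model.pkl:

# Register a pipeline stage (writes dvc.yaml); dependency and output paths are illustrative
dvc stage add -n train \
    -d src/train.py -d data/training_data.csv \
    -o models/model.pkl \
    python src/train.py

# Re-run only the stages whose code or data changed
dvc repro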
CI/CD for ML with GitHub Actions
name: ML Training Pipeline

on:
  push:
    paths: ['src/**', 'data/*.dvc']  # Trigger on code or data change

jobs:
  train-and-evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with: { python-version: '3.11' }

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Pull data from DVC remote
        run: dvc pull
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_KEY }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET }}

      - name: Train model
        run: python src/train.py --config configs/prod.yaml

      - name: Evaluate model
        run: |
          python src/evaluate.py
          python src/check_metrics.py --min-f1 0.85  # Fail if below threshold

      - name: Deploy if passed
        if: success()
        run: python src/deploy.py --environment production
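The evaluation gate above assumes a small script that fails the job when quality drops. A minimal sketch of what src/check_metrics.py might look like, assuming evaluate.py writes its results to a metrics.json file (both names are illustrative):

import argparse
import json
import sys

parser = argparse.ArgumentParser()
parser.add_argument("--min-f1", type=float, required=True)
parser.add_argument("--metrics-file", default="metrics.json")  # assumed output of evaluate.py
args = parser.parse_args()

with open(args.metrics_file) as f:
    metrics = json.load(f)

# A non-zero exit code fails the GitHub Actions step, blocking deployment
if metrics["f1_score"] < args.min_f1:
    print(f"F1 {metrics['f1_score']:.3f} is below the {args.min_f1} threshold")
    sys.exit(1)

print("Metrics gate passed")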
Model Monitoring — Keep Models Healthy
Once deployed, models degrade over time as the real world changes. Monitoring detects this early.
Data drift: input feature distributions shift away from the training data, e.g. user demographics change after a product pivot. Tools: Evidently AI, WhyLabs.
Concept drift: the relationship between features and labels changes, e.g. "expensive" meant $500 in 2020 but $800 in 2024. Detected via performance degradation.
Data quality: null values, out-of-range inputs, schema changes from upstream systems. Tools: Great Expectations, dbt tests.
Performance monitoring: track accuracy, latency, and error rates; alert when metrics cross thresholds. Tools: Prometheus + Grafana.

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
# Compare training data distribution to recent production data
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=train_df, current_data=production_last_7_days)
# Save HTML report
report.save_html("drift_report.html")
# Check programmatically
result = report.as_dict()
if result['metrics'][0]['result']['dataset_drift']:
    trigger_retraining_pipeline()

MLOps Maturity Levels
Frequently Asked Questions
What is a Feature Store and do I need one?
A Feature Store (Feast, Tecton, Vertex Feature Store) centralises feature computation and serving — ensuring that features used in training are identical to those computed at inference time (avoiding training-serving skew). You need one when: multiple teams share features, features require complex computation, or serving latency is critical.
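For illustration, a minimal Feast sketch of that idea; the feature view name user_features and its fields are hypothetical, and entity_df would be a DataFrame of user IDs with event timestamps:

from feast import FeatureStore

store = FeatureStore(repo_path=".")  # assumes a Feast feature repo in the current directory

# Online serving: low-latency lookup of the same features at inference time
online_features = store.get_online_features(
    features=["user_features:txn_count_7d", "user_features:avg_txn_amount"],
    entity_rows=[{"user_id": 1234}],
).to_dict()

# Offline retrieval: point-in-time correct join for building training sets
training_df = store.get_historical_features(
    entity_df=entity_df,  # pandas DataFrame with user_id and event_timestamp columns
    features=["user_features:txn_count_7d", "user_features:avg_txn_amount"],
).to_df()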
How often should I retrain my model?
Depends on how fast your data changes. Financial models may need daily retraining. Image classifiers for stable domains can go months. Set up automated drift monitoring and trigger retraining when drift exceeds a threshold — that's better than arbitrary schedules.
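As a sketch of that pattern, a scheduled job (daily cron, Airflow task, etc.) can reuse the Evidently check from the monitoring section and trigger retraining only when drift is detected; retrain is a stand-in for whatever pipeline trigger you use:

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

def check_and_maybe_retrain(reference_df, recent_df, retrain):
    """Retrain only when Evidently flags dataset-level drift."""
    report = Report(metrics=[DataDriftPreset()])
    report.run(reference_data=reference_df, current_data=recent_df)
    drifted = report.as_dict()["metrics"][0]["result"]["dataset_drift"]
    if drifted:
        retrain()  # e.g., dispatch the GitHub Actions workflow above
    return drifted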