MLOps & CI/CD for AI
A model that can't be reliably retrained, tested, and deployed is a liability. MLOps applies software engineering best practices to machine learning — version control, automated testing, and continuous deployment for models and data.
What is MLOps?
MLOps = DevOps for Machine Learning. The goal is to automate the journey from experiment → production, and keep models healthy once deployed.
The core practices:
Experiment tracking: record every training run with its hyperparameters, metrics, and artefacts, so results are reproducible by default.
Data versioning: track which data trained which model, and roll back when things go wrong.
CI/CD pipelines: automatically train, test, and deploy when code or data changes.
Monitoring: detect when model performance degrades in production and trigger retraining.
MLflow — Experiment Tracking
MLflow is the open-source standard for ML experiment tracking. Free, self-hostable, integrates with every major cloud.
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
# Start local MLflow server: `mlflow ui`
mlflow.set_experiment("fraud-detection-v2")
with mlflow.start_run(run_name="rf-100-trees"):
    # Log hyperparameters
    mlflow.log_params({
        "n_estimators": 100,
        "max_depth": 10,
        "class_weight": "balanced"
    })

    # Train with the same hyperparameters that were logged
    model = RandomForestClassifier(
        n_estimators=100, max_depth=10, class_weight="balanced"
    )
    model.fit(X_train, y_train)

    # Log metrics
    f1 = f1_score(y_test, model.predict(X_test))
    mlflow.log_metric("f1_score", f1)

    # Save model artefact
    mlflow.sklearn.log_model(model, "model")

    print(f"Run ID: {mlflow.active_run().info.run_id}")
DVC — Data Version Control
DVC (Data Version Control) extends Git to handle large datasets and model files. It stores data in remote storage (S3, GCS) while tracking versions in Git.
# Initialise DVC in your git repo
git init && dvc init
# Track a large dataset (stored in S3, version in git)
dvc add data/training_data.csv
git add data/training_data.csv.dvc .gitignore
git commit -m "Add training data v1"
# Configure remote storage
dvc remote add -d myremote s3://my-bucket/dvc
dvc push # Upload data to S3
# On a different machine — get the exact same data
git pull
dvc pull  # Download from S3
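DVC can also capture the training step itself as a reproducible pipeline. A minimal sketch, assuming a hypothetical src/train.py that reads the tracked CSV and writes models/model.pkl:

# Register a pipeline stage (writes dvc.yaml); dependency and output paths are illustrative
dvc stage add -n train \
    -d src/train.py -d data/training_data.csv \
    -o models/model.pkl \
    python src/train.py

# Re-run only the stages whose code or data changed
dvc repro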
CI/CD for ML with GitHub Actions
name: ML Training Pipeline

on:
  push:
    paths: ['src/**', 'data/*.dvc']  # Trigger on code or data change

jobs:
  train-and-evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with: { python-version: '3.11' }

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Pull data from DVC remote
        run: dvc pull
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_KEY }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET }}

      - name: Train model
        run: python src/train.py --config configs/prod.yaml

      - name: Evaluate model
        run: |
          python src/evaluate.py
          python src/check_metrics.py --min-f1 0.85  # Fail if below threshold

      - name: Deploy if passed
        if: success()
        run: python src/deploy.py --environment production
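The evaluation gate above assumes a small script that fails the job when quality drops. A minimal sketch of what src/check_metrics.py might look like, assuming evaluate.py writes its results to a metrics.json file (both names are illustrative):

import argparse
import json
import sys

parser = argparse.ArgumentParser()
parser.add_argument("--min-f1", type=float, required=True)
parser.add_argument("--metrics-file", default="metrics.json")  # assumed output of evaluate.py
args = parser.parse_args()

with open(args.metrics_file) as f:
    metrics = json.load(f)

# A non-zero exit code fails the GitHub Actions step, blocking deployment
if metrics["f1_score"] < args.min_f1:
    print(f"F1 {metrics['f1_score']:.3f} is below the {args.min_f1} threshold")
    sys.exit(1)

print("Metrics gate passed")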
Model Monitoring — Keep Models Healthy
Once deployed, models degrade over time as the real world changes. Monitoring detects this early.
Data drift: input feature distributions shift away from the training data, e.g. user demographics change after a product pivot. Tools: Evidently AI, WhyLabs.
Concept drift: the relationship between features and labels changes, e.g. "expensive" meant $500 in 2020 but $800 in 2024. Detected via performance degradation.
Data quality: null values, out-of-range inputs, schema changes from upstream systems. Tools: Great Expectations, dbt tests.
Performance monitoring: track accuracy, latency, and error rates; alert when metrics cross thresholds. Tools: Prometheus + Grafana.

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
# Compare training data distribution to recent production data
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=train_df, current_data=production_last_7_days)
# Save HTML report
report.save_html("drift_report.html")
# Check programmatically
result = report.as_dict()
if result['metrics'][0]['result']['dataset_drift']:
    trigger_retraining_pipeline()

MLOps Maturity Levels
Frequently Asked Questions
What is a Feature Store and do I need one?
A Feature Store (Feast, Tecton, Vertex Feature Store) centralises feature computation and serving — ensuring that features used in training are identical to those computed at inference time (avoiding training-serving skew). You need one when: multiple teams share features, features require complex computation, or serving latency is critical.
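For illustration, a minimal Feast sketch of that idea; the feature view name user_features and its fields are hypothetical, and entity_df would be a DataFrame of user IDs with event timestamps:

from feast import FeatureStore

store = FeatureStore(repo_path=".")  # assumes a Feast feature repo in the current directory

# Online serving: low-latency lookup of the same features at inference time
online_features = store.get_online_features(
    features=["user_features:txn_count_7d", "user_features:avg_txn_amount"],
    entity_rows=[{"user_id": 1234}],
).to_dict()

# Offline retrieval: point-in-time correct join for building training sets
training_df = store.get_historical_features(
    entity_df=entity_df,  # pandas DataFrame with user_id and event_timestamp columns
    features=["user_features:txn_count_7d", "user_features:avg_txn_amount"],
).to_df()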
How often should I retrain my model?
Depends on how fast your data changes. Financial models may need daily retraining. Image classifiers for stable domains can go months. Set up automated drift monitoring and trigger retraining when drift exceeds a threshold — that's better than arbitrary schedules.
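As a sketch of that pattern, a scheduled job (daily cron, Airflow task, etc.) can reuse the Evidently check from the monitoring section and trigger retraining only when drift is detected; retrain is a stand-in for whatever pipeline trigger you use:

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

def check_and_maybe_retrain(reference_df, recent_df, retrain):
    """Retrain only when Evidently flags dataset-level drift."""
    report = Report(metrics=[DataDriftPreset()])
    report.run(reference_data=reference_df, current_data=recent_df)
    drifted = report.as_dict()["metrics"][0]["result"]["dataset_drift"]
    if drifted:
        retrain()  # e.g., dispatch the GitHub Actions workflow above
    return drifted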