AI Ethics & Governance
Powerful AI systems can cause real harm if deployed carelessly — biased hiring algorithms, discriminatory loan models, opaque medical diagnoses. Building responsibly isn't optional; it's a technical and legal requirement.
The Core Problems
- Bias: models learn patterns from historical data, including historical discrimination. A hiring model trained on past hires may perpetuate gender or racial bias.
- Opacity: deep learning models are black boxes. When a model denies a loan, the applicant has a right to know why, but the model can't explain itself.
- Privacy: models trained on personal data can memorise and leak it. LLMs can recite training data verbatim, including PII and copyrighted content.
- Agent autonomy: autonomous agents that can send emails, delete files, or make purchases can cause irreversible harm if they make mistakes.
Fairness Metrics — Measuring Bias
There is no single definition of "fair" — different metrics capture different notions of fairness, and they often conflict. You must choose which matters most for your use case.
from fairlearn.metrics import MetricFrame, demographic_parity_difference
from sklearn.metrics import accuracy_score
# y_true: actual labels, y_pred: model predictions
# sensitive_features: protected attribute (e.g., gender)
metric_frame = MetricFrame(
    metrics=accuracy_score,
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=X_test['gender']
)
print(metric_frame.by_group)
# Output:
# gender
# male 0.91
# female 0.76 ← big gap — potential bias!
# Demographic parity difference (should be close to 0)
dpd = demographic_parity_difference(y_test, y_pred, sensitive_features=X_test['gender'])
print(f"Demographic parity difference: {dpd:.3f}") SHAP — Explain Any Model
SHAP (SHapley Additive exPlanations) is the gold standard for model explainability. It assigns each feature a contribution score for each individual prediction — grounded in game theory (Shapley values).
import shap
import xgboost as xgb
model = xgb.XGBClassifier().fit(X_train, y_train)
# Create SHAP explainer
explainer = shap.Explainer(model)
shap_values = explainer(X_test)
# Waterfall plot — explains ONE prediction
shap.plots.waterfall(shap_values[0])
# Summary plot — feature importance across all predictions
shap.plots.beeswarm(shap_values)
# For a specific prediction:
print("Base value (average prediction):", explainer.expected_value)
print("SHAP values for first instance:")
for feature, value in zip(X_test.columns, shap_values[0].values):
    direction = "↑" if value > 0 else "↓"
    print(f"  {feature}: {direction} {value:.3f}")
LIME — Local Explanations
LIME (Local Interpretable Model-agnostic Explanations) works differently: it creates a simple, interpretable model that approximates the complex model's behaviour for a single prediction.
from lime.lime_text import LimeTextExplainer
from sklearn.pipeline import make_pipeline
# Any fitted vectorizer + text classifier (LIME only needs a predict_proba function)
pipeline = make_pipeline(vectorizer, classifier)
explainer = LimeTextExplainer(class_names=['Not Spam', 'Spam'])
# Explain why this email was classified as spam
explanation = explainer.explain_instance(
    "Congratulations! You won a free iPhone. Click here!",
    pipeline.predict_proba,
    num_features=6
)
# Shows which words pushed toward "Spam"
explanation.show_in_notebook()
# "Congratulations" → +0.42 (spam indicator)
# "free" → +0.38 (spam indicator)
# "won" → +0.21 (spam indicator) EU AI Act — Know the Law
The EU AI Act (in force since August 2024, with obligations phasing in between 2025 and 2027) is the world's first comprehensive AI law. It classifies AI systems into four risk levels:
Unacceptable risk (banned): social scoring by governments, real-time biometric surveillance in public spaces, AI that exploits the vulnerabilities of specific groups such as children.
High risk: AI in hiring, credit scoring, medical devices, law enforcement, critical infrastructure. Requires conformity assessment, human oversight, transparency.
Limited risk: chatbots must disclose they are AI, deepfakes must be labelled, and emotion recognition systems must inform users.
Minimal risk: spam filters, AI in video games, recommendation systems. No specific requirements beyond existing laws.
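As a rough triage aid (not legal advice), you can sketch this classification as a simple lookup. The use cases, tier assignments, and obligation summaries below are illustrative assumptions, not text from the regulation.
# Illustrative only: a simplified mapping, not an authoritative reading of the Act
RISK_TIERS = {
    "social_scoring": "unacceptable",   # banned outright
    "cv_screening": "high",             # hiring decisions
    "credit_scoring": "high",
    "customer_chatbot": "limited",      # must disclose it is AI
    "spam_filter": "minimal",
}
OBLIGATIONS = {
    "unacceptable": "prohibited; do not deploy",
    "high": "conformity assessment, human oversight, logging, transparency",
    "limited": "transparency obligations (e.g. disclose the user is talking to AI)",
    "minimal": "no specific obligations beyond existing law",
}
def triage(use_case: str) -> str:
    # Return the (illustrative) obligation summary for a known use case
    tier = RISK_TIERS.get(use_case)
    if tier is None:
        return f"{use_case}: unmapped, needs legal review"
    return f"{use_case}: {tier} risk -> {OBLIGATIONS[tier]}"
print(triage("cv_screening"))
# cv_screening: high risk -> conformity assessment, human oversight, logging, transparency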
Responsible AI Framework
- Audit training data for representation gaps and historical bias
- Define fairness metrics that matter for your use case
- Conduct a risk assessment — what's the worst-case harm?
- Evaluate on disaggregated subgroups, not just overall accuracy
- Add explainability (SHAP/LIME) for any high-stakes decisions
- Red-team the model — try to elicit harmful outputs
- Implement human-in-the-loop for irreversible decisions (a minimal sketch follows this list)
- Monitor for performance degradation across demographic groups
- Provide an appeal mechanism for automated decisions
- Maintain model cards documenting intended use and limitations
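To make the human-in-the-loop item concrete, here is a minimal sketch of a review gate that only auto-executes reversible, high-confidence actions; the action names, threshold, and perform() helper are hypothetical, for illustration only.
# Minimal human-in-the-loop gate; action list and threshold are assumptions
IRREVERSIBLE_ACTIONS = {"delete_file", "send_payment", "send_email"}
CONFIDENCE_THRESHOLD = 0.90
def perform(action, payload):
    # Hypothetical executor; a real system would call the relevant service here
    return f"executed {action}"
def execute_with_oversight(action, confidence, payload, review_queue):
    # Auto-execute only when the action is reversible AND the model is confident
    if action in IRREVERSIBLE_ACTIONS or confidence < CONFIDENCE_THRESHOLD:
        review_queue.append({"action": action, "confidence": confidence, "payload": payload})
        return "queued_for_human_review"
    return perform(action, payload)
queue = []
print(execute_with_oversight("send_email", 0.97, {"to": "a@example.com"}, queue))  # queued: irreversible action
print(execute_with_oversight("rank_candidates", 0.95, {"batch": 12}, queue))       # executed automatically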
Frequently Asked Questions
Can you make a model that's fair by every metric?
No — this is mathematically impossible in most cases. The "impossibility theorems" show that demographic parity, equalized odds, and predictive parity cannot all hold simultaneously when base rates differ across groups. You must choose which fairness definition matters most for your context and be transparent about the trade-offs you made.
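A quick numerical illustration, assuming fairlearn is available: when base rates differ across groups, even a perfect classifier can satisfy equalized odds while badly violating demographic parity. The data below is synthetic, purely for illustration.
import numpy as np
from fairlearn.metrics import demographic_parity_difference, equalized_odds_difference
# Synthetic data: group A has an 80% base rate, group B only 20%
y_true = np.array([1]*8 + [0]*2 + [1]*2 + [0]*8)
group = np.array(["A"]*10 + ["B"]*10)
y_pred = y_true.copy()  # a perfect classifier
# Equalized odds holds exactly (TPR and FPR match across groups)...
print(equalized_odds_difference(y_true, y_pred, sensitive_features=group))      # 0.0
# ...but demographic parity is badly violated (selection rates 0.8 vs 0.2)
print(demographic_parity_difference(y_true, y_pred, sensitive_features=group))  # 0.6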
What is a Model Card and do I need one?
A Model Card (introduced by Google) is a short document that accompanies a model, documenting: what it does, who trained it, what data it was trained on, performance across demographic groups, limitations, and intended/prohibited uses. You need one whenever your model will be used by others or make consequential decisions. It's both a transparency tool and a liability shield.
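A minimal sketch of what such a card might contain, assuming it lives as a structured file alongside the model; the field names loosely follow the original Model Cards paper, and every value below is an illustrative example, not a real model.
# Illustrative model card; all values are made-up examples
model_card = {
    "model_name": "credit_risk_xgb_v3",  # hypothetical model
    "intended_use": "Pre-screening of consumer loan applications; final decision by a human",
    "prohibited_uses": ["employment screening", "insurance pricing"],
    "training_data": "Internal loan outcomes 2018-2023; applicants under 25 under-represented",
    "evaluation": {
        "overall_accuracy": 0.87,
        "accuracy_by_gender": {"male": 0.91, "female": 0.76},  # disaggregated, as in the fairness example above
        "demographic_parity_difference": 0.14,
    },
    "limitations": "Not validated for self-employed applicants or thin credit files",
    "contact": "ml-governance@example.com",
}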