AI Ethics & Governance

Powerful AI systems can cause real harm if deployed carelessly — biased hiring algorithms, discriminatory loan models, opaque medical diagnoses. Building responsibly isn't optional; it's a technical and legal requirement.

📖 Covers: Bias & Fairness · SHAP · LIME · EU AI Act · Responsible AI Frameworks · Governance

The Core Problems

⚖️ Bias

Models learn patterns from historical data — including historical discrimination. A hiring model trained on past hires may perpetuate gender or racial bias.

🔒 Opacity

Deep learning models are black boxes. When a model denies a loan, the applicant has a right to know why — but the model can't explain itself.

🕵️ Privacy

Models trained on personal data can memorise and leak it. LLMs can recite training data verbatim, including PII and copyrighted content.

⚠️ Safety

Autonomous agents that can send emails, delete files, or make purchases can cause irreversible harm if they make mistakes.

Fairness Metrics — Measuring Bias

There is no single definition of "fair" — different metrics capture different notions of fairness, and they often conflict. You must choose which matters most for your use case.

Metric               | Definition                                     | When to Use
Demographic Parity   | Equal positive prediction rates across groups  | Hiring, loan approval — equal outcomes
Equalized Odds       | Equal TPR and FPR across groups                | Medical diagnosis, risk assessment
Predictive Parity    | Equal precision across groups                  | Recidivism scoring (COMPAS controversy)
Individual Fairness  | Similar individuals get similar predictions    | Any sensitive decision-making
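The first two metrics in the table fall out of simple per-group counts: demographic parity compares P(prediction = 1) across groups, while equalized odds compares TPR (and FPR). A minimal sketch with hypothetical labels and predictions for two groups:

Python · Fairness Metrics by Hand

```python
import numpy as np

# Hypothetical data: two groups, A and B
group = np.array(["A"] * 5 + ["B"] * 5)
y_true = np.array([1, 1, 0, 0, 1,  1, 0, 0, 0, 1])
y_pred = np.array([1, 1, 1, 0, 0,  1, 0, 0, 0, 0])

def positive_rate(mask):
    # Demographic parity compares P(pred = 1) across groups
    return y_pred[mask].mean()

def tpr(mask):
    # Equalized odds compares the true positive rate across groups
    positives = mask & (y_true == 1)
    return y_pred[positives].mean()

for g in ["A", "B"]:
    m = group == g
    print(g, "positive rate:", positive_rate(m), "TPR:", round(tpr(m), 3))
```

Note that the two metrics report different gaps on the same data (here the positive-rate gap is 0.4 but the TPR gap is about 0.17) — this is the conflict the table warns about.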
Python · Measure Bias with Fairlearn
from fairlearn.metrics import MetricFrame, demographic_parity_difference
from sklearn.metrics import accuracy_score

# y_true: actual labels, y_pred: model predictions
# sensitive_features: protected attribute (e.g., gender)
metric_frame = MetricFrame(
    metrics=accuracy_score,
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=X_test['gender']
)

print(metric_frame.by_group)
# Output:
# gender
# male      0.91
# female    0.76   ← big gap — potential bias!

# Demographic parity difference (should be close to 0)
dpd = demographic_parity_difference(y_test, y_pred, sensitive_features=X_test['gender'])
print(f"Demographic parity difference: {dpd:.3f}")

SHAP — Explain Any Model

SHAP (SHapley Additive exPlanations) is the gold standard for model explainability. It assigns each feature a contribution score for each individual prediction — grounded in game theory (Shapley values).

[Interactive SHAP waterfall chart: simulated SHAP values for a loan denial decision — each bar shows how much a feature pushed the score up or down.]
Python · SHAP Explanations
import shap
import xgboost as xgb

model = xgb.XGBClassifier().fit(X_train, y_train)

# Create SHAP explainer
explainer = shap.Explainer(model)
shap_values = explainer(X_test)

# Waterfall plot — explains ONE prediction
shap.plots.waterfall(shap_values[0])

# Summary plot — feature importance across all predictions
shap.plots.beeswarm(shap_values)

# For a specific prediction:
print("Base value (average prediction):", explainer.expected_value)
print("SHAP values for first instance:")
for feature, value in zip(X_test.columns, shap_values[0].values):
    direction = "↑" if value > 0 else "↓"
    print(f"  {feature}: {direction} {value:.3f}")
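The game-theoretic grounding is easy to see on a toy example: a player's (feature's) Shapley value is its average marginal contribution over all coalitions of the other players. A self-contained sketch with hypothetical payoffs for two "features", a and b:

Python · Shapley Values by Hand

```python
from itertools import combinations
from math import factorial

# Hypothetical payoff of each coalition of features
def v(coalition):
    payoffs = {(): 0, ("a",): 10, ("b",): 20, ("a", "b"): 50}
    return payoffs[tuple(sorted(coalition))]

def shapley(player, players):
    # Weighted average of the player's marginal contribution
    # over every subset of the other players
    n = len(players)
    others = [p for p in players if p != player]
    total = 0.0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (v(list(subset) + [player]) - v(list(subset)))
    return total

print(shapley("a", ["a", "b"]))  # 20.0
print(shapley("b", ["a", "b"]))  # 30.0
```

The two values sum to v({a, b}) = 50 — the same "efficiency" property that makes SHAP values for one prediction sum to the gap between that prediction and the base value.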

LIME — Local Explanations

LIME (Local Interpretable Model-agnostic Explanations) works differently: it creates a simple, interpretable model that approximates the complex model's behaviour for a single prediction.
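The mechanism can be sketched in a few lines: perturb the input, weight the perturbed samples by proximity, and fit an interpretable (here linear) surrogate to the black box's outputs. Everything below (the model, the instance, the kernel width) is a hypothetical illustration, not the LIME library itself:

Python · The LIME Idea in a Nutshell

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical black-box model, nonlinear in both features
def black_box(X):
    return X[:, 0] ** 2 + np.sin(X[:, 1])

rng = np.random.default_rng(0)
x0 = np.array([1.0, 0.5])  # the single instance we want to explain

# 1. Perturb around the instance
X_pert = x0 + rng.normal(scale=0.1, size=(500, 2))
# 2. Weight perturbed samples by proximity to x0
weights = np.exp(-np.sum((X_pert - x0) ** 2, axis=1) / 0.01)
# 3. Fit a simple linear surrogate locally
surrogate = Ridge(alpha=1e-3).fit(X_pert, black_box(X_pert), sample_weight=weights)

print("Local coefficients:", surrogate.coef_)
```

The coefficients approximate the model's local sensitivities at x0 (roughly [2.0, 0.88] for this function) — that local linear model is what LIME reports as the explanation.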

Python · LIME Text Explanation
from lime.lime_text import LimeTextExplainer
from sklearn.pipeline import make_pipeline

# Any text classifier + vectorizer
pipeline = make_pipeline(vectorizer, classifier)
explainer = LimeTextExplainer(class_names=['Not Spam', 'Spam'])

# Explain why this email was classified as spam
explanation = explainer.explain_instance(
    "Congratulations! You won a free iPhone. Click here!",
    pipeline.predict_proba,
    num_features=6
)

# Shows which words pushed toward "Spam"
explanation.show_in_notebook()
# "Congratulations" → +0.42 (spam indicator)
# "free" → +0.38 (spam indicator)
# "won" → +0.21 (spam indicator)

EU AI Act — Know the Law

The EU AI Act (entered into force August 2024, with obligations phasing in from 2025 through 2027) is the world's first comprehensive AI law. It classifies AI systems by risk level:

🚫 Unacceptable Risk — Banned

Social scoring by governments, real-time biometric surveillance in public spaces, AI exploiting vulnerabilities of vulnerable groups.

🔴 High Risk — Strict Requirements

AI in hiring, credit scoring, medical devices, law enforcement, critical infrastructure. Requires conformity assessment, human oversight, transparency.

🟡 Limited Risk — Transparency

Chatbots must disclose they are AI. Deepfakes must be labelled. Emotion recognition systems must inform users.

🟢 Minimal Risk — Free to Deploy

Spam filters, AI in video games, recommendation systems. No specific requirements beyond existing laws.

Responsible AI Framework

Before Training
  • Audit training data for representation gaps and historical bias
  • Define fairness metrics that matter for your use case
  • Conduct a risk assessment — what's the worst-case harm?
During Development
  • Evaluate on disaggregated subgroups, not just overall accuracy
  • Add explainability (SHAP/LIME) for any high-stakes decisions
  • Red-team the model — try to elicit harmful outputs
In Production
  • Implement human-in-the-loop for irreversible decisions
  • Monitor for performance degradation across demographic groups
  • Provide an appeal mechanism for automated decisions
  • Maintain model cards documenting intended use and limitations

Frequently Asked Questions

Can you make a model that's fair by every metric?

No — this is mathematically impossible in most cases. The "impossibility theorems" show that demographic parity, equalized odds, and predictive parity cannot all hold simultaneously when base rates differ across groups. You must choose which fairness definition matters most for your context and be transparent about the trade-offs you made.
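A small calculation makes the impossibility concrete: if two groups have different base rates and the classifier has identical TPR and FPR for both (equalized odds holds), Bayes' rule forces their precision to differ — so predictive parity must fail. A sketch with hypothetical rates:

Python · Why the Metrics Conflict

```python
def precision(base_rate, tpr, fpr):
    # P(y = 1 | pred = 1) via Bayes' rule
    true_pos = base_rate * tpr
    false_pos = (1 - base_rate) * fpr
    return true_pos / (true_pos + false_pos)

# Same TPR/FPR in both groups (equalized odds holds),
# but different base rates
for group, base in [("A", 0.5), ("B", 0.2)]:
    print(group, round(precision(base, tpr=0.8, fpr=0.1), 3))
# Precision differs across groups → predictive parity cannot also hold
```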

What is a Model Card and do I need one?

A Model Card (introduced by Google) is a short document that accompanies a model, documenting: what it does, who trained it, what data it was trained on, performance across demographic groups, limitations, and intended/prohibited uses. You need one whenever your model will be used by others or make consequential decisions. It's both a transparency tool and a liability shield.
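In practice a model card can be a small structured document checked into the repository alongside the model. A minimal sketch — every field value below is hypothetical:

Python · A Minimal Model Card

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    name: str
    intended_use: str
    training_data: str
    subgroup_performance: dict = field(default_factory=dict)
    limitations: list = field(default_factory=list)
    prohibited_uses: list = field(default_factory=list)

card = ModelCard(
    name="loan-approval-v3",
    intended_use="Pre-screening of consumer loan applications; human review required",
    training_data="2018-2023 applications (known under-representation of some groups)",
    subgroup_performance={"male": {"accuracy": 0.91}, "female": {"accuracy": 0.76}},
    limitations=["Not validated for business loans"],
    prohibited_uses=["Fully automated denial without an appeal mechanism"],
)
print(card.name, "->", card.subgroup_performance)
```

Serialising this to YAML or markdown and versioning it with the model keeps the documentation from drifting out of date.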
