AI Ethics & Governance
Powerful AI systems can cause real harm if deployed carelessly — biased hiring algorithms, discriminatory loan models, opaque medical diagnoses. Building responsibly isn't optional; it's a technical and legal requirement.
The Core Problems
- Bias: models learn patterns from historical data, including historical discrimination. A hiring model trained on past hires may perpetuate gender or racial bias.
- Opacity: deep learning models are black boxes. When a model denies a loan, the applicant has a right to know why, but the model can't explain itself.
- Privacy: models trained on personal data can memorise and leak it. LLMs can recite training data verbatim, including PII and copyrighted content.
- Agent autonomy: autonomous agents that can send emails, delete files, or make purchases can cause irreversible harm if they make mistakes.
Fairness Metrics — Measuring Bias
There is no single definition of "fair" — different metrics capture different notions of fairness, and they often conflict. You must choose which matters most for your use case.
from fairlearn.metrics import MetricFrame, demographic_parity_difference
from sklearn.metrics import accuracy_score
# y_true: actual labels, y_pred: model predictions
# sensitive_features: protected attribute (e.g., gender)
metric_frame = MetricFrame(
    metrics=accuracy_score,
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=X_test['gender']
)
print(metric_frame.by_group)
# Output:
# gender
# male 0.91
# female 0.76 ← big gap — potential bias!
# Demographic parity difference (should be close to 0)
dpd = demographic_parity_difference(y_test, y_pred, sensitive_features=X_test['gender'])
print(f"Demographic parity difference: {dpd:.3f}") SHAP — Explain Any Model
SHAP (SHapley Additive exPlanations) is the gold standard for model explainability. It assigns each feature a contribution score for each individual prediction — grounded in game theory (Shapley values).
import shap
import xgboost as xgb
model = xgb.XGBClassifier().fit(X_train, y_train)
# Create SHAP explainer
explainer = shap.Explainer(model)
shap_values = explainer(X_test)
# Waterfall plot — explains ONE prediction
shap.plots.waterfall(shap_values[0])
# Summary plot — feature importance across all predictions
shap.plots.beeswarm(shap_values)
# For a specific prediction:
print("Base value (average prediction):", explainer.expected_value)
print("SHAP values for first instance:")
for feature, value in zip(X_test.columns, shap_values[0].values):
    direction = "↑" if value > 0 else "↓"
    print(f"  {feature}: {direction} {value:.3f}")
LIME — Local Explanations
LIME (Local Interpretable Model-agnostic Explanations) works differently: it creates a simple, interpretable model that approximates the complex model's behaviour for a single prediction.
from lime.lime_text import LimeTextExplainer
from sklearn.pipeline import make_pipeline
# Any fitted vectorizer + text classifier (LIME only needs a predict_proba function)
pipeline = make_pipeline(vectorizer, classifier)
explainer = LimeTextExplainer(class_names=['Not Spam', 'Spam'])
# Explain why this email was classified as spam
explanation = explainer.explain_instance(
    "Congratulations! You won a free iPhone. Click here!",
    pipeline.predict_proba,
    num_features=6
)
# Shows which words pushed toward "Spam"
explanation.show_in_notebook()
# "Congratulations" → +0.42 (spam indicator)
# "free" → +0.38 (spam indicator)
# "won" → +0.21 (spam indicator) EU AI Act — Know the Law
The EU AI Act (in force since August 2024, with obligations phasing in between 2025 and 2027) is the world's first comprehensive AI law. It classifies AI systems into four risk levels:
Unacceptable risk (banned): social scoring by governments, real-time biometric surveillance in public spaces, AI that exploits the vulnerabilities of specific groups such as children.
High risk: AI in hiring, credit scoring, medical devices, law enforcement, critical infrastructure. Requires conformity assessment, human oversight, transparency.
Limited risk: chatbots must disclose they are AI, deepfakes must be labelled, and emotion recognition systems must inform users.
Minimal risk: spam filters, AI in video games, recommendation systems. No specific requirements beyond existing laws.
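As a rough triage aid (not legal advice), you can sketch this classification as a simple lookup. The use cases, tier assignments, and obligation summaries below are illustrative assumptions, not text from the regulation.
# Illustrative only: a simplified mapping, not an authoritative reading of the Act
RISK_TIERS = {
    "social_scoring": "unacceptable",   # banned outright
    "cv_screening": "high",             # hiring decisions
    "credit_scoring": "high",
    "customer_chatbot": "limited",      # must disclose it is AI
    "spam_filter": "minimal",
}
OBLIGATIONS = {
    "unacceptable": "prohibited; do not deploy",
    "high": "conformity assessment, human oversight, logging, transparency",
    "limited": "transparency obligations (e.g. disclose the user is talking to AI)",
    "minimal": "no specific obligations beyond existing law",
}
def triage(use_case: str) -> str:
    # Return the (illustrative) obligation summary for a known use case
    tier = RISK_TIERS.get(use_case)
    if tier is None:
        return f"{use_case}: unmapped, needs legal review"
    return f"{use_case}: {tier} risk -> {OBLIGATIONS[tier]}"
print(triage("cv_screening"))
# cv_screening: high risk -> conformity assessment, human oversight, logging, transparency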
Responsible AI Framework
- Audit training data for representation gaps and historical bias
- Define fairness metrics that matter for your use case
- Conduct a risk assessment — what's the worst-case harm?
- Evaluate on disaggregated subgroups, not just overall accuracy
- Add explainability (SHAP/LIME) for any high-stakes decisions
- Red-team the model — try to elicit harmful outputs
- Implement human-in-the-loop for irreversible decisions (a minimal sketch follows this list)
- Monitor for performance degradation across demographic groups
- Provide an appeal mechanism for automated decisions
- Maintain model cards documenting intended use and limitations
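To make the human-in-the-loop item concrete, here is a minimal sketch of a review gate that only auto-executes reversible, high-confidence actions; the action names, threshold, and perform() helper are hypothetical, for illustration only.
# Minimal human-in-the-loop gate; action list and threshold are assumptions
IRREVERSIBLE_ACTIONS = {"delete_file", "send_payment", "send_email"}
CONFIDENCE_THRESHOLD = 0.90
def perform(action, payload):
    # Hypothetical executor; a real system would call the relevant service here
    return f"executed {action}"
def execute_with_oversight(action, confidence, payload, review_queue):
    # Auto-execute only when the action is reversible AND the model is confident
    if action in IRREVERSIBLE_ACTIONS or confidence < CONFIDENCE_THRESHOLD:
        review_queue.append({"action": action, "confidence": confidence, "payload": payload})
        return "queued_for_human_review"
    return perform(action, payload)
queue = []
print(execute_with_oversight("send_email", 0.97, {"to": "a@example.com"}, queue))  # queued: irreversible action
print(execute_with_oversight("rank_candidates", 0.95, {"batch": 12}, queue))       # executed automatically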
Frequently Asked Questions
Can you make a model that's fair by every metric?
No — this is mathematically impossible in most cases. The "impossibility theorems" show that demographic parity, equalized odds, and predictive parity cannot all hold simultaneously when base rates differ across groups. You must choose which fairness definition matters most for your context and be transparent about the trade-offs you made.
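A quick numerical illustration, assuming fairlearn is available: when base rates differ across groups, even a perfect classifier can satisfy equalized odds while badly violating demographic parity. The data below is synthetic, purely for illustration.
import numpy as np
from fairlearn.metrics import demographic_parity_difference, equalized_odds_difference
# Synthetic data: group A has an 80% base rate, group B only 20%
y_true = np.array([1]*8 + [0]*2 + [1]*2 + [0]*8)
group = np.array(["A"]*10 + ["B"]*10)
y_pred = y_true.copy()  # a perfect classifier
# Equalized odds holds exactly (TPR and FPR match across groups)...
print(equalized_odds_difference(y_true, y_pred, sensitive_features=group))      # 0.0
# ...but demographic parity is badly violated (selection rates 0.8 vs 0.2)
print(demographic_parity_difference(y_true, y_pred, sensitive_features=group))  # 0.6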
What is a Model Card and do I need one?
A Model Card (introduced by Google) is a short document that accompanies a model, documenting: what it does, who trained it, what data it was trained on, performance across demographic groups, limitations, and intended/prohibited uses. You need one whenever your model will be used by others or make consequential decisions. It's both a transparency tool and a liability shield.
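A minimal sketch of what such a card might contain, assuming it lives as a structured file alongside the model; the field names loosely follow the original Model Cards paper, and every value below is an illustrative example, not a real model.
# Illustrative model card; all values are made-up examples
model_card = {
    "model_name": "credit_risk_xgb_v3",  # hypothetical model
    "intended_use": "Pre-screening of consumer loan applications; final decision by a human",
    "prohibited_uses": ["employment screening", "insurance pricing"],
    "training_data": "Internal loan outcomes 2018-2023; applicants under 25 under-represented",
    "evaluation": {
        "overall_accuracy": 0.87,
        "accuracy_by_gender": {"male": 0.91, "female": 0.76},  # disaggregated, as in the fairness example above
        "demographic_parity_difference": 0.14,
    },
    "limitations": "Not validated for self-employed applicants or thin credit files",
    "contact": "ml-governance@example.com",
}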