Supervised Learning
Supervised learning is the most widely used type of machine learning. You give the model labelled examples — inputs paired with known correct outputs — and it learns to predict the output for new, unseen inputs. Think of it as learning with an answer key.
Regression vs Classification
Most supervised problems fall into one of two buckets:
📈 Regression
The output is a continuous number.
- Predicting house prices ($345,000)
- Forecasting tomorrow's temperature (24°C)
- Estimating a stock's closing price
Loss function: Mean Squared Error (MSE)
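MSE is just the average of the squared differences between predictions and true values. A tiny sketch with made-up house prices (the numbers are illustrative, not from any dataset):

```python
import numpy as np

# hypothetical predictions vs. true prices, in $1000s
y_true = np.array([345.0, 410.0, 289.0])
y_pred = np.array([350.0, 400.0, 300.0])

# mean of the squared errors: (5^2 + 10^2 + 11^2) / 3
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # → 82.0
```

Squaring penalises large misses far more than small ones, which is why a single wild prediction can dominate the MSE.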
🏷️ Classification
The output is a category / label.
- Is this email spam? (Yes/No)
- Which digit is this? (0-9)
- Is this tumour malignant? (Yes/No)
Loss function: Cross-Entropy / Log-Loss
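Cross-entropy compares predicted probabilities against the true labels; confident wrong answers are punished heavily. A minimal sketch with made-up spam probabilities (illustrative values only):

```python
import numpy as np

# true labels (1 = spam) and hypothetical predicted probabilities of spam
y_true = np.array([1, 0, 1])
p = np.array([0.9, 0.2, 0.7])

# binary cross-entropy / log-loss, averaged over examples
log_loss = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
print(round(log_loss, 4))
```

A perfect classifier (probabilities of exactly 1 for the true class) would score 0; predicting 0.5 everywhere scores about 0.693.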
Linear Regression — The Simplest Model
Linear regression fits a straight line through your data. The line is defined by two parameters:
slope (w) and intercept (b). The model predicts: y = w·x + b.
Training finds the values of w and b that minimise the average squared error between predictions and real values (the MSE loss).
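A short sketch of this fit on synthetic data, where we know the true line is y = 2x + 1 (the data itself is made up for the example):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# toy data generated from y = 2x + 1 with a little Gaussian noise
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 2 * X.ravel() + 1 + rng.normal(0, 0.5, size=50)

# fitting minimises the MSE between predictions and y
model = LinearRegression()
model.fit(X, y)

# the learned slope and intercept should land close to w=2, b=1
print(f"w = {model.coef_[0]:.2f}, b = {model.intercept_:.2f}")
```

Because the noise is small, the recovered w and b sit close to the true values used to generate the data.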
Logistic Regression — Binary Classification
Despite the name, Logistic Regression is a classification algorithm. It passes the linear output through a sigmoid function to squash it between 0 and 1, giving a probability.
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.2%}")
# → Accuracy: ~96% (exact value varies with the random split)
Decision Trees
A decision tree splits data using a series of yes/no questions. At each node it picks the feature and threshold that best separates the classes (measured by Gini impurity or information gain).
A single deep decision tree memorises training data perfectly but generalises poorly. Use max_depth to limit depth, or switch to Random Forest (many trees averaged together).
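The overfitting effect is easy to see by comparing an unconstrained tree with a depth-limited one on the same split (a sketch on the sklearn breast-cancer dataset; the exact scores depend on the split):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# unconstrained tree: memorises the training data (train accuracy ≈ 1.0)
deep = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# depth-limited tree: trades training accuracy for simpler, more general rules
shallow = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)

print(f"deep    train={deep.score(X_train, y_train):.2f}  test={deep.score(X_test, y_test):.2f}")
print(f"shallow train={shallow.score(X_train, y_train):.2f}  test={shallow.score(X_test, y_test):.2f}")
```

The deep tree's perfect training score paired with a lower test score is the signature of overfitting.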
Support Vector Machines (SVM)
SVMs find the maximum-margin hyperplane — the line (or plane, in higher dimensions) that separates classes with the largest possible gap. Points closest to the boundary are called support vectors.
SVMs also work for non-linearly-separable data via the kernel trick, which implicitly maps data into a higher-dimensional space where it becomes separable.
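The kernel trick is easiest to see on data no straight line can separate. A sketch using sklearn's two-moons toy dataset, comparing a linear kernel against RBF:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# two interleaving half-moons: not linearly separable
X, y = make_moons(n_samples=200, noise=0.1, random_state=42)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)

# the RBF kernel bends the decision boundary around the moons
print(f"linear kernel accuracy: {linear_svm.score(X, y):.2f}")
print(f"rbf kernel accuracy:    {rbf_svm.score(X, y):.2f}")
```

The linear kernel tops out well below the RBF kernel here, because no single line can separate the interleaved classes.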
Kernel cheat-sheet:
- linear: data is roughly linearly separable
- rbf: default choice; handles most non-linear problems
- poly: when polynomial decision boundaries are expected
Ensemble Methods — The Power of Many
Instead of relying on one model, ensemble methods combine multiple weak models into one strong one. Two key strategies:
Bagging (Random Forest)
Train many trees on random subsets of data. Average their predictions. Reduces variance. Works great on tabular data.
RandomForestClassifier(n_estimators=100)
Boosting (XGBoost, LightGBM)
Train trees sequentially. Each new tree corrects the errors of the previous ones. Reduces bias. Often wins Kaggle competitions.
XGBClassifier(n_estimators=200, learning_rate=0.1)
Hands-On: Titanic Survival Prediction
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
# Load data
df = pd.read_csv('titanic.csv')
df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})
df = df[['Pclass', 'Sex', 'Age', 'Fare', 'Survived']].dropna()
X = df.drop('Survived', axis=1)
y = df['Survived']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
Choosing the Right Algorithm
Not sure which algorithm to try first? Use this decision guide:
- Start simple — try Logistic Regression or Linear Regression first as a baseline
- Structured tabular data — Random Forest or XGBoost usually wins
- Need interpretability — Decision Tree or Logistic Regression
- High-dimensional sparse data — SVM with linear kernel or Lasso Regression
- Images, audio, text — Skip to Phase 3 (Deep Learning)
Frequently Asked Questions
What's the difference between a parameter and a hyperparameter?
Parameters (like weights in a neural network) are learned from data during training. Hyperparameters (like learning rate, n_estimators, max_depth) are set by you before training starts and control how the model learns.
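The distinction shows up directly in sklearn: you pass hyperparameters into the constructor, while the fitted model's internals are learned. A small sketch using a decision tree:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# max_depth is a hyperparameter: we choose it before training
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X, y)

# the split features and thresholds inside the fitted tree are parameters,
# learned from the data during fit()
print("hyperparameter max_depth:", model.get_params()["max_depth"])
print("learned tree depth:", model.get_depth())
```

Tuning means searching over hyperparameter values (e.g. with GridSearchCV) and letting training find the parameters for each candidate.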
When should I use XGBoost vs Random Forest?
XGBoost typically achieves better accuracy because it corrects errors iteratively. Random Forest is faster to train and more robust to hyperparameter choices. Start with Random Forest, then try XGBoost for that extra performance boost.
How much data do I need for supervised learning?
It depends on complexity. Simple linear regression can work with 50 rows. Deep learning needs thousands to millions. For traditional ML, aim for at least 1,000 rows per class for reliable classification.