Supervised Learning
Supervised learning is the most widely used type of machine learning. You give the model labelled examples — inputs paired with known correct outputs — and it learns to predict the output for new, unseen inputs. Think of it as learning with an answer key.
Regression vs Classification
Most supervised problems fall into one of two buckets:
📈 Regression
The output is a continuous number.
- Predicting house prices ($345,000)
- Forecasting tomorrow's temperature (24°C)
- Estimating a stock's closing price
Loss function: Mean Squared Error (MSE)
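MSE is just the average of the squared differences between predictions and true values. A tiny sketch with made-up house prices (the numbers are illustrative, not from any dataset):

```python
import numpy as np

# hypothetical predictions vs. true prices, in $1000s
y_true = np.array([345.0, 410.0, 289.0])
y_pred = np.array([350.0, 400.0, 300.0])

# mean of the squared errors: (5^2 + 10^2 + 11^2) / 3
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # → 82.0
```

Squaring penalises large misses far more than small ones, which is why a single wild prediction can dominate the MSE.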
🏷️ Classification
The output is a category / label.
- Is this email spam? (Yes/No)
- Which digit is this? (0-9)
- Is this tumour malignant? (Yes/No)
Loss function: Cross-Entropy / Log-Loss
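Cross-entropy compares predicted probabilities against the true labels; confident wrong answers are punished heavily. A minimal sketch with made-up spam probabilities (illustrative values only):

```python
import numpy as np

# true labels (1 = spam) and hypothetical predicted probabilities of spam
y_true = np.array([1, 0, 1])
p = np.array([0.9, 0.2, 0.7])

# binary cross-entropy / log-loss, averaged over examples
log_loss = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
print(round(log_loss, 4))
```

A perfect classifier (probabilities of exactly 1 for the true class) would score 0; predicting 0.5 everywhere scores about 0.693.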
Linear Regression — The Simplest Model
Linear regression fits a straight line through your data. The line is defined by two parameters:
slope (w) and intercept (b). The model predicts: y = w·x + b.
Training finds the values of w and b that minimise the average squared error between predictions and real values (the MSE loss).
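A short sketch of this fit on synthetic data, where we know the true line is y = 2x + 1 (the data itself is made up for the example):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# toy data generated from y = 2x + 1 with a little Gaussian noise
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 2 * X.ravel() + 1 + rng.normal(0, 0.5, size=50)

# fitting minimises the MSE between predictions and y
model = LinearRegression()
model.fit(X, y)

# the learned slope and intercept should land close to w=2, b=1
print(f"w = {model.coef_[0]:.2f}, b = {model.intercept_:.2f}")
```

Because the noise is small, the recovered w and b sit close to the true values used to generate the data.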
Logistic Regression — Binary Classification
Despite the name, Logistic Regression is a classification algorithm. It passes the linear output through a sigmoid function to squash it between 0 and 1, giving a probability.
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.2%}")
# → Accuracy: ~96% (exact value varies with the random split)
Decision Trees
A decision tree splits data using a series of yes/no questions. At each node it picks the feature and threshold that best separates the classes (measured by Gini impurity or information gain).
A single deep decision tree memorises training data perfectly but generalises poorly. Use max_depth to limit depth, or switch to Random Forest (many trees averaged together).
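The overfitting effect is easy to see by comparing an unconstrained tree with a depth-limited one on the same split (a sketch on the sklearn breast-cancer dataset; the exact scores depend on the split):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# unconstrained tree: memorises the training data (train accuracy ≈ 1.0)
deep = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# depth-limited tree: trades training accuracy for simpler, more general rules
shallow = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)

print(f"deep    train={deep.score(X_train, y_train):.2f}  test={deep.score(X_test, y_test):.2f}")
print(f"shallow train={shallow.score(X_train, y_train):.2f}  test={shallow.score(X_test, y_test):.2f}")
```

The deep tree's perfect training score paired with a lower test score is the signature of overfitting.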
Support Vector Machines (SVM)
SVMs find the maximum-margin hyperplane — the line (or plane, in higher dimensions) that separates classes with the largest possible gap. Points closest to the boundary are called support vectors.
SVMs also work for non-linearly-separable data via the kernel trick, which implicitly maps data into a higher-dimensional space where it becomes separable.
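The kernel trick is easiest to see on data no straight line can separate. A sketch using sklearn's two-moons toy dataset, comparing a linear kernel against RBF:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# two interleaving half-moons: not linearly separable
X, y = make_moons(n_samples=200, noise=0.1, random_state=42)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)

# the RBF kernel bends the decision boundary around the moons
print(f"linear kernel accuracy: {linear_svm.score(X, y):.2f}")
print(f"rbf kernel accuracy:    {rbf_svm.score(X, y):.2f}")
```

The linear kernel tops out well below the RBF kernel here, because no single line can separate the interleaved classes.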
Kernel cheat-sheet:
- linear: data is roughly linearly separable
- rbf: default choice; handles most non-linear problems
- poly: when polynomial decision boundaries are expected
Ensemble Methods — The Power of Many
Instead of relying on one model, ensemble methods combine multiple weak models into one strong one. Two key strategies:
Bagging (Random Forest)
Train many trees on random subsets of data. Average their predictions. Reduces variance. Works great on tabular data.
RandomForestClassifier(n_estimators=100)
Boosting (XGBoost, LightGBM)
Train trees sequentially. Each new tree corrects the errors of the previous ones. Reduces bias. Often wins Kaggle competitions.
XGBClassifier(n_estimators=200, learning_rate=0.1)
Hands-On: Titanic Survival Prediction
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
# Load data
df = pd.read_csv('titanic.csv')
df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})
df = df[['Pclass', 'Sex', 'Age', 'Fare', 'Survived']].dropna()
X = df.drop('Survived', axis=1)
y = df['Survived']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
Choosing the Right Algorithm
Not sure which algorithm to try first? Use this decision guide:
- Start simple — try Logistic Regression or Linear Regression first as a baseline
- Structured tabular data — Random Forest or XGBoost usually wins
- Need interpretability — Decision Tree or Logistic Regression
- High-dimensional sparse data — SVM with linear kernel or Lasso Regression
- Images, audio, text — Skip to Phase 3 (Deep Learning)
Frequently Asked Questions
What's the difference between a parameter and a hyperparameter?
Parameters (like weights in a neural network) are learned from data during training. Hyperparameters (like learning rate, n_estimators, max_depth) are set by you before training starts and control how the model learns.
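The distinction shows up directly in sklearn: you pass hyperparameters into the constructor, while the fitted model's internals are learned. A small sketch using a decision tree:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# max_depth is a hyperparameter: we choose it before training
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X, y)

# the split features and thresholds inside the fitted tree are parameters,
# learned from the data during fit()
print("hyperparameter max_depth:", model.get_params()["max_depth"])
print("learned tree depth:", model.get_depth())
```

Tuning means searching over hyperparameter values (e.g. with GridSearchCV) and letting training find the parameters for each candidate.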
When should I use XGBoost vs Random Forest?
XGBoost typically achieves better accuracy because it corrects errors iteratively. Random Forest is faster to train and more robust to hyperparameter choices. Start with Random Forest, then try XGBoost for that extra performance boost.
How much data do I need for supervised learning?
It depends on complexity. Simple linear regression can work with 50 rows. Deep learning needs thousands to millions. For traditional ML, aim for at least 1,000 rows per class for reliable classification.