Supervised Learning

Supervised learning is the most widely used type of machine learning. You give the model labelled examples — inputs paired with known correct outputs — and it learns to predict the output for new, unseen inputs. Think of it as learning with an answer key.

📖 This page covers: Regression · Classification · Decision Trees · SVMs · Ensemble Methods · Hands-on Python code

Regression vs Classification

Nearly every supervised problem falls into one of two buckets:

📈 Regression

The output is a continuous number.

  • Predicting house prices ($345,000)
  • Forecasting tomorrow's temperature (24°C)
  • Estimating a stock's closing price

Typical loss function: Mean Squared Error (MSE)

🏷️ Classification

The output is a category / label.

  • Is this email spam? (Yes/No)
  • Which digit is this? (0-9)
  • Is this tumour malignant? (Yes/No)

Typical loss function: Cross-Entropy / Log-Loss

Linear Regression — The Simplest Model

Linear regression fits a straight line through your data. The line is defined by two parameters: slope (w) and intercept (b). The model predicts: y = w·x + b.

Training finds the values of w and b that minimise the average squared error between predictions and real values (the MSE loss).

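To make this concrete, here is a minimal sketch that fits w and b with scikit-learn and reports the MSE loss. The synthetic data (a noisy line with slope 2 and intercept 1) is made up purely for illustration.

Python · Linear Regression
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic data for illustration: y ≈ 2x + 1 plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 * X.ravel() + 1 + rng.normal(0, 1, size=100)

model = LinearRegression()
model.fit(X, y)  # finds the w and b that minimise MSE

w, b = model.coef_[0], model.intercept_
print(f"Learned slope w ≈ {w:.2f}, intercept b ≈ {b:.2f}")
print(f"MSE loss: {mean_squared_error(y, model.predict(X)):.3f}")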

Logistic Regression — Binary Classification

Despite the name, Logistic Regression is a classification algorithm. It passes the linear output z = w·x + b through the sigmoid function σ(z) = 1 / (1 + e^(-z)), which squashes it between 0 and 1, giving a probability.

Python · Scikit-Learn
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print(f"Accuracy: {model.score(X_test, y_test):.2%}")
# → Accuracy: 96.49% (exact value varies with the random split)
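
Because the sigmoid turns the linear output into a probability, the fitted model can also report class probabilities directly. The printed values here are illustrative and will change with the random split.

print(model.predict_proba(X_test[:1]))
# → e.g. [[0.02 0.98]]  (probability of class 0 vs class 1)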

Decision Trees

A decision tree splits data using a series of yes/no questions. At each node it picks the feature and threshold that best separates the classes (measured by Gini impurity or information gain).

Is Age > 30?
├── Yes → Income > 50k?
│   ├── Yes → ✅ Approved
│   └── No → ❌ Declined
└── No → ❌ Declined
⚠️ Overfitting Risk

A single deep decision tree memorises training data perfectly but generalises poorly. Use max_depth to limit depth, or switch to Random Forest (many trees averaged together).
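
To see the effect, here is a minimal sketch on scikit-learn's built-in breast-cancer dataset; the max_depth=3 cap is an illustrative choice, not a tuned value.

Python · Decision Tree Depth
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Unlimited depth: tends to memorise the training set
deep = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
# Capped depth: trades training fit for better generalisation
shallow = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)

for name, tree in [("deep", deep), ("max_depth=3", shallow)]:
    print(f"{name}: train={tree.score(X_train, y_train):.2%}  test={tree.score(X_test, y_test):.2%}")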

Support Vector Machines (SVM)

SVMs find the maximum-margin hyperplane — the line (or plane, in higher dimensions) that separates classes with the largest possible gap. Points closest to the boundary are called support vectors.

SVMs also work for non-linearly-separable data via the kernel trick, which implicitly maps data into a higher-dimensional space where it becomes separable.

Kernel   When to Use
linear   Data is roughly linearly separable
rbf      Default choice; handles most non-linear problems
poly     When polynomial decision boundaries are expected
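
A minimal sketch trying each kernel on the same breast-cancer dataset; the StandardScaler step is included because SVM margins are sensitive to feature scale.

Python · SVM Kernels
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for kernel in ["linear", "rbf", "poly"]:
    # Scale features first, then fit the SVM with the chosen kernel
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    clf.fit(X_train, y_train)
    print(f"{kernel:6} kernel accuracy: {clf.score(X_test, y_test):.2%}")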

Ensemble Methods — The Power of Many

Instead of relying on one model, ensemble methods combine multiple weak models into one strong one. Two key strategies:

Bagging (Random Forest)

Train many trees on random subsets of data. Average their predictions. Reduces variance. Works great on tabular data.

RandomForestClassifier(n_estimators=100)

Boosting (XGBoost, LightGBM)

Train trees sequentially. Each new tree corrects the errors of the previous ones. Reduces bias. Often wins Kaggle competitions.

XGBClassifier(n_estimators=200, learning_rate=0.1)
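
A minimal sketch comparing the two strategies on a synthetic dataset; scikit-learn's GradientBoostingClassifier stands in for XGBoost here so the snippet needs no extra dependency.

Python · Bagging vs Boosting
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Bagging: independent trees on bootstrap samples, predictions averaged
bagging = RandomForestClassifier(n_estimators=100, random_state=42)
# Boosting: trees trained sequentially, each one fitting the previous trees' errors
boosting = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, random_state=42)

for name, model in [("Random Forest", bagging), ("Gradient Boosting", boosting)]:
    model.fit(X_train, y_train)
    print(f"{name}: {model.score(X_test, y_test):.2%}")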

Hands-On: Titanic Survival Prediction

Python · End-to-End Example
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Load data and prepare a small feature set
df = pd.read_csv('titanic.csv')
df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})  # encode the categorical column as 0/1
df = df[['Pclass', 'Sex', 'Age', 'Fare', 'Survived']].dropna()  # keep four features, drop rows with missing values

X = df.drop('Survived', axis=1)
y = df['Survived']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print(classification_report(y_test, model.predict(X_test)))
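
Once trained, the same model can score a new passenger. The feature values below are made up for illustration; the column names and order must match the training data.

# Hypothetical passenger: 2nd class, female, age 28, fare 30
new_passenger = pd.DataFrame([[2, 1, 28.0, 30.0]], columns=['Pclass', 'Sex', 'Age', 'Fare'])
print(model.predict(new_passenger))        # 0 = did not survive, 1 = survived
print(model.predict_proba(new_passenger))  # class probabilities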

Choosing the Right Algorithm

Not sure which algorithm to try first? Use this decision guide:

  • Start simple — try Logistic Regression or Linear Regression first as a baseline
  • Structured tabular data — Random Forest or XGBoost usually wins
  • Need interpretability — Decision Tree or Logistic Regression
  • High-dimensional sparse data — SVM with linear kernel or Lasso Regression
  • Images, audio, text — Skip to Phase 3 (Deep Learning)

Frequently Asked Questions

What's the difference between a parameter and a hyperparameter?

Parameters (like weights in a neural network) are learned from data during training. Hyperparameters (like learning rate, n_estimators, max_depth) are set by you before training starts and control how the model learns.

When should I use XGBoost vs Random Forest?

XGBoost typically achieves better accuracy because it corrects errors iteratively. Random Forest is faster to train and more robust to hyperparameter choices. Start with Random Forest, then try XGBoost for that extra performance boost.

How much data do I need for supervised learning?

It depends on complexity. Simple linear regression can work with 50 rows. Deep learning needs thousands to millions. For traditional ML, aim for at least 1,000 rows per class for reliable classification.
