Neural Networks

Neural networks are loosely inspired by the human brain. They are computational graphs made of interconnected nodes (neurons) that can learn remarkably complex patterns from data — given enough layers and enough data.

📖 Covers: Neurons · Layers · Activation Functions · Forward Pass · Backpropagation · Gradient Descent

The Neuron

A single artificial neuron does three things:

1. Multiply each input by a weight: x₁w₁ + x₂w₂ + x₃w₃
2. Add a bias term: + b
3. Pass the sum through an activation function: output = f(x₁w₁ + x₂w₂ + x₃w₃ + b)

The weights and bias are learned from data during training. The activation function adds non-linearity; without it, a stack of layers would collapse into a single linear transformation.
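To make the three steps concrete, here is a minimal sketch of a single neuron in plain Python with NumPy; the inputs, weights, and bias are illustrative values, not learned ones.

Python · NumPy Single Neuron
import numpy as np

def relu(x):
    return np.maximum(0, x)

# Illustrative values; in a real network the weights and bias are learned
x = np.array([0.5, -1.2, 3.0])   # three inputs
w = np.array([0.8, 0.1, -0.4])   # one weight per input
b = 0.2                          # bias term

z = np.dot(x, w) + b             # steps 1 and 2: weighted sum plus bias
output = relu(z)                 # step 3: activation function
print(output)                    # → 0.0 (the sum is negative, so ReLU clips it)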

Activation Functions

Activation functions determine whether a neuron "fires" and introduce non-linearity:

ReLU

f(x) = max(0, x)

Most common for hidden layers. Fast, simple, avoids vanishing gradients.

✅ Use for: Hidden layers in most networks

Sigmoid

f(x) = 1 / (1 + e⁻ˣ)

Squashes output to (0, 1). Useful for binary probability output.

✅ Use for: Binary classification output layer

Softmax

f(xᵢ) = exp(xᵢ) / Σⱼ exp(xⱼ)

Converts logits to a probability distribution summing to 1.

✅ Use for: Multi-class classification output

Tanh

f(x) = (eˣ - e⁻ˣ) / (eˣ + e⁻ˣ)

Output in (-1, 1). Better than sigmoid for hidden layers in RNNs.

✅ Use for: RNN hidden states
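All four activations ship with PyTorch; here is a quick sketch of how each transforms the same input tensor (the values are arbitrary):

Python · PyTorch Activations
import torch

x = torch.tensor([-2.0, -0.5, 0.0, 1.0, 3.0])

print(torch.relu(x))            # negatives clipped to 0
print(torch.sigmoid(x))         # squashed into (0, 1)
print(torch.tanh(x))            # squashed into (-1, 1)
print(torch.softmax(x, dim=0))  # probability distribution summing to 1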

Layers: Input → Hidden → Output

Neural networks are organised into layers:

Input layer: one neuron per feature. No computation; it just passes data in.
Hidden layers: where learning happens. Each layer extracts increasingly abstract features.
Output layer: the final prediction. Neuron count = number of classes (or 1 for regression).

Forward Pass

A forward pass is when data flows from input → output to generate a prediction. At each layer, each neuron computes its weighted sum + bias, then applies its activation function.

Python · PyTorch — Simple Network
import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(4, 16),   # Input: 4 features → 16 neurons
            nn.ReLU(),
            nn.Linear(16, 8),   # Hidden: 16 → 8 neurons
            nn.ReLU(),
            nn.Linear(8, 1),    # Output: 8 → 1 (regression)
        )

    def forward(self, x):
        return self.layers(x)

model = SimpleNet()
x = torch.randn(32, 4)  # Batch of 32 samples, 4 features each
output = model(x)        # Forward pass
print(output.shape)      # → torch.Size([32, 1])

Backpropagation & Gradient Descent

Training a neural network means finding the right weights. This happens through:

1. Forward pass: compute predictions from the current weights.
2. Compute loss: measure how wrong the predictions are (e.g. MSE or cross-entropy).
3. Backpropagation: use the chain rule to compute the gradient of the loss with respect to each weight.
4. Gradient descent: update each weight: w ← w − lr × ∂L/∂w.

Repeat these steps for many batches.
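Here is a minimal sketch of one such update done by hand with PyTorch autograd, using a single weight and a toy squared-error loss (all values are illustrative):

Python · One Gradient Descent Step
import torch

w = torch.tensor(2.0, requires_grad=True)         # current weight
x, y_true = torch.tensor(3.0), torch.tensor(12.0)
lr = 0.01

y_pred = w * x                   # 1. forward pass
loss = (y_pred - y_true) ** 2    # 2. compute loss
loss.backward()                  # 3. backpropagation fills w.grad with ∂loss/∂w

with torch.no_grad():            # 4. gradient descent update
    w -= lr * w.grad
w.grad.zero_()                   # clear the gradient before the next step

print(w.item())                  # 2.36: the weight moved towards y_true / x = 4.0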
💡 The Learning Rate

The learning rate controls how big each weight update is. Too high → training diverges; too low → training crawls. Typical starting values: 0.001 or 0.0001. Use a learning rate scheduler to decay it over time, as sketched below.
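For example, PyTorch's built-in StepLR scheduler multiplies the learning rate by a fixed factor on a fixed schedule. This sketch reuses the SimpleNet model from above; the step size and decay factor are illustrative:

Python · Learning Rate Scheduler
import torch.optim as optim

optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)  # halve lr every 30 epochs

for epoch in range(100):
    # ... forward pass, loss, backward, optimizer.step() ...
    scheduler.step()  # apply the decay schedule once per epoch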

Complete Training Loop in PyTorch

Python · PyTorch Training Loop
import torch
import torch.nn as nn
import torch.optim as optim

# Toy data so the loop runs end to end; replace with your real dataset
X_train = torch.randn(100, 4)   # 100 samples, 4 features (matches SimpleNet's input)
y_train = torch.randn(100, 1)   # 100 regression targets

model = SimpleNet()             # the network defined above
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

for epoch in range(100):
    # Forward pass
    predictions = model(X_train)
    loss = criterion(predictions, y_train)

    # Backward pass
    optimizer.zero_grad()  # Clear previous gradients
    loss.backward()        # Compute gradients
    optimizer.step()       # Update weights

    if epoch % 10 == 0:
        print(f"Epoch {epoch}: Loss = {loss.item():.4f}")

Key Hyperparameters

Hyperparameter | What It Controls                      | Typical Values
Learning Rate  | Step size for weight updates          | 0.0001 – 0.01
Batch Size     | Samples per gradient update           | 32, 64, 128, 256
Epochs         | Full passes through the training data | 10 – 200
Hidden Units   | Capacity / expressiveness             | 64 – 4096
Dropout Rate   | Regularisation strength               | 0.2 – 0.5
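As a sketch of where two of these appear in code, here is a SimpleNet-style model with a wider hidden layer and dropout; the layer sizes and dropout rate are illustrative:

Python · Hidden Units and Dropout
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 64),   # hidden units: 64 (more units = more capacity)
    nn.ReLU(),
    nn.Dropout(p=0.3),  # dropout: randomly zeroes 30% of activations during training
    nn.Linear(64, 1),
)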

Frequently Asked Questions

How many layers do I need?

Start with 1–3 hidden layers. Modern deep learning uses tens or hundreds of layers (ResNet-152 has 152). For tabular data, 2–3 hidden layers are usually enough. Add more only if you have enough data and the simpler model underfits.

What is the vanishing gradient problem?

In very deep networks, gradients shrink as they travel backwards through many layers. Layers close to the input receive tiny gradient updates and stop learning. ReLU activations and residual connections (ResNets) largely solve this.
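Here is a minimal sketch of a residual connection, simplified from the ResNet idea rather than the exact published block:

Python · Residual Connection
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)

    def forward(self, x):
        # The skip connection (+ x) gives gradients a direct path backwards,
        # so early layers keep receiving a usable learning signal
        return torch.relu(self.fc2(torch.relu(self.fc1(x))) + x)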

PyTorch or TensorFlow — which should I learn?

Both are excellent. PyTorch is more popular in research (imperative style, easier debugging). TensorFlow/Keras is strong in production deployment (TFLite, TFServing). We recommend starting with PyTorch — the syntax is more Pythonic and beginner-friendly.
