Neural Networks

Neural networks are loosely inspired by the human brain. They are computational graphs made of interconnected nodes (neurons) that can learn remarkably complex patterns from data — given enough layers and enough data.

📖 Covers: Neurons · Layers · Activation Functions · Forward Pass · Backpropagation · Gradient Descent

The Neuron

A single artificial neuron does three things:

1. Multiply each input by a weight: x₁w₁ + x₂w₂ + x₃w₃
2. Add a bias term: + b
3. Pass the sum through an activation function: output = f(x₁w₁ + x₂w₂ + x₃w₃ + b)

The weights and bias are learned from data during training. The activation function adds non-linearity; without it, a stack of layers would collapse into a single linear transformation.
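To make the three steps concrete, here is a minimal sketch of a single neuron in plain Python with NumPy; the inputs, weights, and bias are illustrative values, not learned ones.

Python · NumPy Single Neuron
import numpy as np

def relu(x):
    return np.maximum(0, x)

# Illustrative values; in a real network the weights and bias are learned
x = np.array([0.5, -1.2, 3.0])   # three inputs
w = np.array([0.8, 0.1, -0.4])   # one weight per input
b = 0.2                          # bias term

z = np.dot(x, w) + b             # steps 1 and 2: weighted sum plus bias
output = relu(z)                 # step 3: activation function
print(output)                    # → 0.0 (the sum is negative, so ReLU clips it)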

Activation Functions

Activation functions determine whether a neuron "fires" and introduce non-linearity:

ReLU

f(x) = max(0, x)

Most common for hidden layers. Fast, simple, avoids vanishing gradients.

✅ Use for: Hidden layers in most networks

Sigmoid

f(x) = 1 / (1 + e⁻ˣ)

Squashes output to (0, 1). Useful for binary probability output.

✅ Use for: Binary classification output layer

Softmax

f(xᵢ) = exp(xᵢ) / Σⱼ exp(xⱼ)

Converts logits to a probability distribution summing to 1.

✅ Use for: Multi-class classification output

Tanh

f(x) = (eˣ - e⁻ˣ) / (eˣ + e⁻ˣ)

Output in (-1, 1). Better than sigmoid for hidden layers in RNNs.

✅ Use for: RNN hidden states
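All four activations ship with PyTorch; here is a quick sketch of how each transforms the same input tensor (the values are arbitrary):

Python · PyTorch Activations
import torch

x = torch.tensor([-2.0, -0.5, 0.0, 1.0, 3.0])

print(torch.relu(x))            # negatives clipped to 0
print(torch.sigmoid(x))         # squashed into (0, 1)
print(torch.tanh(x))            # squashed into (-1, 1)
print(torch.softmax(x, dim=0))  # probability distribution summing to 1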

Layers: Input → Hidden → Output

Neural networks are organised into layers:

Input layer: one neuron per feature. No computation; it just passes data in.
Hidden layers: where learning happens. Each layer extracts increasingly abstract features.
Output layer: the final prediction. Neuron count = number of classes (or 1 for regression).

Forward Pass

A forward pass is when data flows from input → output to generate a prediction. At each layer, each neuron computes its weighted sum + bias, then applies its activation function.

Python · PyTorch — Simple Network
import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(4, 16),   # Input: 4 features → 16 neurons
            nn.ReLU(),
            nn.Linear(16, 8),   # Hidden: 16 → 8 neurons
            nn.ReLU(),
            nn.Linear(8, 1),    # Output: 8 → 1 (regression)
        )

    def forward(self, x):
        return self.layers(x)

model = SimpleNet()
x = torch.randn(32, 4)  # Batch of 32 samples, 4 features each
output = model(x)        # Forward pass
print(output.shape)      # → torch.Size([32, 1])

Backpropagation & Gradient Descent

Training a neural network means finding the right weights. This happens through:

1. Forward pass: compute predictions from the current weights.
2. Compute loss: measure how wrong the predictions are (e.g. MSE or cross-entropy).
3. Backpropagation: use the chain rule to compute the gradient of the loss with respect to each weight.
4. Gradient descent: update each weight: w ← w − lr × ∂L/∂w.

Repeat these steps for many batches.
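Here is a minimal sketch of one such update done by hand with PyTorch autograd, using a single weight and a toy squared-error loss (all values are illustrative):

Python · One Gradient Descent Step
import torch

w = torch.tensor(2.0, requires_grad=True)         # current weight
x, y_true = torch.tensor(3.0), torch.tensor(12.0)
lr = 0.01

y_pred = w * x                   # 1. forward pass
loss = (y_pred - y_true) ** 2    # 2. compute loss
loss.backward()                  # 3. backpropagation fills w.grad with ∂loss/∂w

with torch.no_grad():            # 4. gradient descent update
    w -= lr * w.grad
w.grad.zero_()                   # clear the gradient before the next step

print(w.item())                  # 2.36: the weight moved towards y_true / x = 4.0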
💡 The Learning Rate

The learning rate controls how big each weight update is. Too high → training diverges; too low → training crawls. Typical starting values: 0.001 or 0.0001. Use a learning rate scheduler to decay it over time, as sketched below.
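For example, PyTorch's built-in StepLR scheduler multiplies the learning rate by a fixed factor on a fixed schedule. This sketch reuses the SimpleNet model from above; the step size and decay factor are illustrative:

Python · Learning Rate Scheduler
import torch.optim as optim

optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)  # halve lr every 30 epochs

for epoch in range(100):
    # ... forward pass, loss, backward, optimizer.step() ...
    scheduler.step()  # apply the decay schedule once per epoch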

Complete Training Loop in PyTorch

Python · PyTorch Training Loop
import torch
import torch.nn as nn
import torch.optim as optim

# Toy data so the loop runs end to end; replace with your real dataset
X_train = torch.randn(100, 4)   # 100 samples, 4 features (matches SimpleNet's input)
y_train = torch.randn(100, 1)   # 100 regression targets

model = SimpleNet()             # the network defined above
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

for epoch in range(100):
    # Forward pass
    predictions = model(X_train)
    loss = criterion(predictions, y_train)

    # Backward pass
    optimizer.zero_grad()  # Clear previous gradients
    loss.backward()        # Compute gradients
    optimizer.step()       # Update weights

    if epoch % 10 == 0:
        print(f"Epoch {epoch}: Loss = {loss.item():.4f}")

Key Hyperparameters

Hyperparameter | What It Controls                      | Typical Values
Learning Rate  | Step size for weight updates          | 0.0001 – 0.01
Batch Size     | Samples per gradient update           | 32, 64, 128, 256
Epochs         | Full passes through the training data | 10 – 200
Hidden Units   | Capacity / expressiveness             | 64 – 4096
Dropout Rate   | Regularisation strength               | 0.2 – 0.5
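As a sketch of where two of these appear in code, here is a SimpleNet-style model with a wider hidden layer and dropout; the layer sizes and dropout rate are illustrative:

Python · Hidden Units and Dropout
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 64),   # hidden units: 64 (more units = more capacity)
    nn.ReLU(),
    nn.Dropout(p=0.3),  # dropout: randomly zeroes 30% of activations during training
    nn.Linear(64, 1),
)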

Frequently Asked Questions

How many layers do I need?

Start with 1–3 hidden layers. Modern deep learning uses tens or hundreds of layers (ResNet-152 has 152). For tabular data, 2–3 hidden layers are usually enough. Add more only if you have enough data and the simpler model underfits.

What is the vanishing gradient problem?

In very deep networks, gradients shrink as they travel backwards through many layers. Layers close to the input receive tiny gradient updates and stop learning. ReLU activations and residual connections (ResNets) largely solve this.
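Here is a minimal sketch of a residual connection, simplified from the ResNet idea rather than the exact published block:

Python · Residual Connection
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)

    def forward(self, x):
        # The skip connection (+ x) gives gradients a direct path backwards,
        # so early layers keep receiving a usable learning signal
        return torch.relu(self.fc2(torch.relu(self.fc1(x))) + x)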

PyTorch or TensorFlow — which should I learn?

Both are excellent. PyTorch is more popular in research (imperative style, easier debugging). TensorFlow/Keras is strong in production deployment (TFLite, TFServing). We recommend starting with PyTorch — the syntax is more Pythonic and beginner-friendly.
