Edge AI & Federated Learning
Not every AI decision can wait for a round trip to the cloud. A self-driving car needs to detect a pedestrian in milliseconds, not wait for a server in Virginia to respond. A hospital can't send patient data to a third-party cloud. Edge AI and federated learning solve these problems — bringing intelligence to where the data lives.
What is Edge AI?
Edge AI is the deployment of AI models directly on devices at the "edge" of the network — smartphones, cameras, sensors, industrial controllers, autonomous vehicles — rather than sending data to a central cloud for processing.
Why Run AI at the Edge?
Latency: Round-trip to cloud adds 50–200ms. Edge inference takes 5–20ms. For robotics, AR/VR, and autonomous systems, this difference is critical.
Privacy: Sensitive data (medical images, personal conversations, security footage) never leaves the device. No data transmission = no data breach risk.
Reliability: Edge devices work offline. A factory robot running local AI keeps operating when the internet connection drops.
Cost: Processing data locally instead of streaming it all to the cloud dramatically reduces bandwidth and cloud compute costs.
Edge AI Hardware
NVIDIA Jetson Series
The dominant platform for edge AI. Jetson Nano (for hobbyists and prototyping), Jetson Orin NX (mid-range), and Jetson AGX Orin (up to 275 TOPS — powerful enough for autonomous vehicles and industrial robots). All run CUDA and support TensorRT for optimized inference. The same code that runs on a data center GPU can (mostly) run on Jetson with minimal changes.
Google Coral TPU
Google's edge TPU accelerator — a tiny chip (or USB stick) that runs TensorFlow Lite models at 4 TOPS. Designed for always-on, ultra-low-power inference (smart cameras, IoT devices). Much less capable than Jetson, but draws a couple of watts instead of tens of watts. Perfect for simple classification tasks that run 24/7.
Apple Neural Engine & Qualcomm AI Engine
Modern smartphones include dedicated AI accelerators. Apple's Neural Engine (in M-series and A-series chips) handles face recognition, image enhancement, and on-device Siri. Qualcomm's AI Engine is in most Android flagships. These run billions of AI operations per second on battery power — enabling AI features that would have required a data center ten years ago.
Optimizing Models for Edge Deployment
Cloud models are too large and slow for most edge devices. Several techniques make them edge-friendly:
Quantization
Reduce weight precision from 32-bit floats to 8-bit or 4-bit integers. Often 4x smaller, 2–4x faster, with minimal accuracy loss.
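The core idea can be shown in a few lines. This is a conceptual sketch in plain numpy (the helper names `quantize_int8` and `dequantize` are ours, not any framework's API); real deployments would use a toolchain like TensorFlow Lite or PyTorch quantization, which also quantize activations and calibrate scales per channel.

```python
import numpy as np

def quantize_int8(w):
    """Map float32 weights to int8 using a single per-tensor scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

w = np.random.randn(64, 128).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(w - dequantize(q, scale)).max()
print(q.nbytes, w.nbytes)  # int8 storage is 4x smaller than float32
```

The maximum rounding error is half a quantization step (`scale / 2`), which is why accuracy loss is usually small for well-behaved weight distributions.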
Pruning
Remove weights or neurons that contribute little to accuracy. Creates sparse models that run faster on compatible hardware.
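A minimal sketch of the most common variant, magnitude pruning, again in plain numpy (the function name is ours; libraries like `torch.nn.utils.prune` provide production versions): zero out the fraction of weights with the smallest absolute values.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights, keeping a binary mask."""
    k = int(w.size * sparsity)
    threshold = np.sort(np.abs(w).ravel())[k]
    mask = np.abs(w) >= threshold
    return w * mask, mask

w = np.random.randn(64, 128).astype(np.float32)
pruned, mask = magnitude_prune(w, sparsity=0.5)
print(1 - mask.mean())  # fraction of weights removed
```

The speedup only materializes on hardware or kernels that exploit sparsity; on dense hardware the zeros are still multiplied.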
Knowledge Distillation
Train a small "student" model to mimic a large "teacher" model. The student captures most of the capability at a fraction of the size.
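The standard training signal for this is a KL-divergence loss between temperature-softened teacher and student outputs (the Hinton et al. formulation). A toy numpy sketch, with function names of our choosing:

```python
import numpy as np

def softmax(z, T=1.0):
    """Softmax with temperature T; higher T produces softer distributions."""
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence from softened teacher targets to student predictions."""
    p = softmax(teacher_logits, T)  # soft targets carry the teacher's "dark knowledge"
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))) * T * T)

teacher = np.array([[5.0, 1.0, 0.5]])
student = np.array([[4.0, 1.5, 0.2]])
print(distillation_loss(student, teacher))
```

The soft targets reveal how the teacher ranks the wrong classes, which is far more informative for the student than a one-hot label; in practice this loss is mixed with the ordinary cross-entropy on true labels.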
TensorRT / CoreML / ONNX
Hardware-specific compilation tools that fuse operations and optimize memory layout for a target device. Often 2–5x inference speedup.
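One representative fusion these compilers perform is folding a BatchNorm layer into the preceding linear or convolution layer, eliminating a whole pass over the activations. A numpy sketch of the algebra (the helper name is ours, not any compiler's API):

```python
import numpy as np

def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm parameters into the preceding linear layer's weights and bias."""
    scale = gamma / np.sqrt(var + eps)
    return w * scale[:, None], (b - mean) * scale + beta

# Toy linear layer followed by a BatchNorm with arbitrary statistics.
w, b = np.random.randn(4, 8), np.random.randn(4)
gamma, beta = np.ones(4), np.zeros(4)
mean, var = np.random.randn(4), np.abs(np.random.randn(4)) + 0.1

wf, bf = fold_batchnorm(w, b, gamma, beta, mean, var)
x = np.random.randn(8)
y_fused = wf @ x + bf
y_two_step = (w @ x + b - mean) / np.sqrt(var + 1e-5) * gamma + beta
print(np.allclose(y_fused, y_two_step))  # True: one layer now does the work of two
```

TensorRT, CoreML, and ONNX Runtime apply dozens of such rewrites automatically, along with memory-layout and kernel-selection optimizations specific to the target hardware.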
Federated Learning
Federated learning is a training approach where the model goes to the data, instead of the data going to the model. Multiple devices or organizations each train the model locally on their own data, then share only the model updates (gradients) — never the raw data itself.
How It Works
1. A central server sends the current model to all participating devices.
2. Each device trains the model on its local data and computes updates (gradients).
3. Devices send gradients (not raw data) back to the server.
4. The server aggregates gradients (typically via FedAvg — federated averaging) and updates the global model.
5. Repeat.
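The loop above can be sketched in a few lines. This toy numpy version trains a linear model with squared loss and has clients return updated weights, which the server averages weighted by local dataset size (the FedAvg scheme); function names are ours, and a real system would use a framework such as TensorFlow Federated or Flower.

```python
import numpy as np

def local_update(weights, data, labels, lr=0.1):
    """One gradient step of local training on a single device (linear model)."""
    grad = data.T @ (data @ weights - labels) / len(data)
    return weights - lr * grad

def fed_avg(updates, sizes):
    """Server-side aggregation: average client models, weighted by data size."""
    total = sum(sizes)
    return sum(w * (n / total) for w, n in zip(updates, sizes))

rng = np.random.default_rng(0)
global_w = np.zeros(3)
# Four simulated devices, each holding its own private dataset.
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]

for _ in range(5):  # communication rounds
    updates = [local_update(global_w.copy(), X, y) for X, y in clients]
    global_w = fed_avg(updates, [len(X) for X, _ in clients])
print(global_w)
```

Note that only model parameters cross the network; each client's `(X, y)` pair never leaves the (simulated) device.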
Real-World Applications
Keyboard prediction: Google's Gboard uses federated learning — your phone trains the next-word prediction model on your typing without Google ever seeing your messages.
Healthcare: Multiple hospitals train a cancer detection model without sharing patient data across institutional boundaries.
Finance: Banks collaborate on fraud detection models without exposing transaction data to competitors.
Frequently Asked Questions
Is federated learning truly private?
Better than sharing raw data, but not completely private. Research has shown that gradients can be "inverted" to reconstruct training data (gradient inversion attacks). Production federated learning systems combine gradient aggregation with differential privacy (adding calibrated mathematical noise to gradients) and secure aggregation (cryptographic protocols where the server only sees the aggregate, never individual gradients). With these protections, federated learning provides strong privacy guarantees.
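The differential-privacy step is conceptually simple: clip each gradient to bound any one example's influence, then add calibrated Gaussian noise before release. A toy numpy sketch (the function name and parameter values are illustrative, not from any particular library; production systems use frameworks like Opacus or TensorFlow Privacy, which also track the cumulative privacy budget):

```python
import numpy as np

def clip_and_noise(grad, clip_norm=1.0, noise_std=0.5, rng=None):
    """DP-style gradient release: clip to bound sensitivity, then add Gaussian noise."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / norm)  # rescale only if norm exceeds the bound
    return clipped + rng.normal(0.0, noise_std * clip_norm, size=grad.shape)

g = np.array([3.0, 4.0])  # norm 5, so it gets clipped down to norm 1
noisy = clip_and_noise(g, rng=np.random.default_rng(42))
print(noisy)
```

Clipping caps how much any single training example can move the model, and the noise masks what remains — which is exactly what defeats gradient inversion at the cost of some training signal.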
What is the difference between edge AI and IoT?
IoT (Internet of Things) refers to the ecosystem of connected devices that collect and transmit data — sensors, smart appliances, industrial machines. Edge AI adds intelligence to those devices — instead of just transmitting raw sensor data to the cloud, IoT devices with edge AI process it locally and make decisions autonomously. All edge AI devices are IoT devices, but not all IoT devices have edge AI capability.
Can I run an LLM at the edge?
Increasingly yes. Small LLMs (1B–7B parameters, quantized to 4-bit) can run on recent smartphones, laptops, and high-end edge devices. Apple's iPhone 15 Pro can run 3B parameter models locally. Tools like llama.cpp, Ollama, and MLC LLM enable local LLM inference. The quality gap vs. frontier models (GPT-4, Claude) is significant, but on-device LLMs excel for use cases requiring offline capability or strict privacy.