Phase 4: LLMs & Generative AI
Large Language Models have transformed what AI can do. ChatGPT, Claude, Gemini, Llama — they all share the same core architecture: a massive Transformer trained on internet-scale text. In this phase, you'll understand how they work and how to build with them.
Build LLM-powered applications from scratch
8–12 weeks
Hugging Face, LangChain, OpenAI API, FAISS
What is a Large Language Model?
An LLM is a neural network — specifically a Transformer — trained to predict the next token given all previous tokens. "Large" means billions of parameters. "Language" means it understands and generates human text. The key insight: a model that predicts the next token well enough is forced to pick up broad knowledge of language, reasoning, and the world.
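The next-token objective itself is simple enough to demonstrate with a toy bigram model — counts of which token follows which — which is a drastic simplification of what a Transformer learns with billions of parameters, but the training signal is the same:

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count which token follows each token: a toy stand-in for the
    next-token distribution a Transformer learns from its corpus."""
    counts = defaultdict(Counter)
    tokens = corpus.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Greedy prediction: return the most frequent continuation."""
    return counts[token].most_common(1)[0][0]

model = train_bigram("the cat sat on the mat the cat ran")
print(predict_next(model, "the"))  # → cat
```

A real LLM replaces the count table with a neural network that generalises to contexts it has never seen, and conditions on the whole preceding sequence rather than one token.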
The LLM Training Pipeline
1. Pretraining: train on a massive text corpus (Common Crawl, books, Wikipedia) to predict the next token. Cost: $1M–$100M+.
2. Supervised fine-tuning (SFT): fine-tune on high-quality instruction-response pairs to teach the model to follow instructions.
3. RLHF: human raters rank model responses, a reward model is trained on those rankings, and PPO optimises the LLM against the reward signal.
Topics in This Phase
How LLMs Work
Tokenisation, embeddings, context windows, temperature, sampling strategies. Deep dive into the mechanics.
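One of those mechanics — temperature — can be shown in a few lines. This is a minimal sketch of temperature-scaled softmax sampling over raw logits; the function name and toy logits are illustrative, not from any library:

```python
import math, random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Divide logits by the temperature, softmax, then sample.
    Low temperature -> near-greedy (argmax); high -> more diverse."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r <= cum:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.1]
# Near-zero temperature collapses the distribution onto the argmax token.
print(sample_with_temperature(logits, temperature=0.01))  # → 0
```

At temperature 1.0 the same call would return index 1 or 2 a meaningful fraction of the time — which is why identical prompts can give different completions.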
Prompt Engineering
Zero-shot, few-shot, chain-of-thought, system prompts. Get better results from any LLM without training.
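Few-shot prompting is mostly about message structure. The sketch below assembles a chat-format message list — a system prompt, worked examples, then the real query — in the role/content shape used by OpenAI-style chat APIs; the helper name and example task are illustrative:

```python
def build_few_shot_messages(system, examples, query):
    """Assemble a few-shot chat prompt: system instructions, then
    (user, assistant) example pairs, then the actual query."""
    messages = [{"role": "system", "content": system}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages

msgs = build_few_shot_messages(
    system="Classify sentiment as positive or negative. Answer with one word.",
    examples=[("I loved it", "positive"), ("Terrible service", "negative")],
    query="The food was amazing",
)
print(len(msgs))  # → 6
```

The example pairs show the model the expected format and style, which often improves accuracy more than lengthening the instructions.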
Fine-Tuning & LoRA
Adapt open-source LLMs to specific tasks. LoRA, QLoRA, PEFT — fine-tune 7B models on a single GPU.
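The reason LoRA fits on a single GPU is arithmetic: instead of updating a full weight matrix W, it trains a low-rank update B @ A and uses W + (alpha / r) · B @ A at inference. A NumPy sketch of the maths (dimensions chosen to match a 4096-wide projection in a 7B-class model; not any library's actual API):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Effective weight is W + (alpha / r) * B @ A; W stays frozen
    and only the small matrices A (r x d_in) and B (d_out x r) train."""
    r = A.shape[0]
    return x @ (W + (alpha / r) * B @ A).T

# Parameter count for one 4096x4096 projection:
d, r = 4096, 8
full = d * d          # weights updated by full fine-tuning
lora = r * (d + d)    # weights LoRA trains instead
print(f"{lora:,} vs {full:,} ({100 * lora / full:.2f}%)")  # → 65,536 vs 16,777,216 (0.39%)

# Zero-initialising B makes the adapter start as a no-op:
rng = np.random.default_rng(0)
x, W = rng.normal(size=(2, 16)), rng.normal(size=(16, 16))
A, B = rng.normal(size=(r, 16)), np.zeros((16, r))
assert np.allclose(lora_forward(x, W, A, B, alpha=16), x @ W.T)
```

Libraries like PEFT apply this update inside the attention projections for you; the 0.39% figure is why a 7B model's adapters fit comfortably in consumer GPU memory.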
RAG Systems
Retrieval-Augmented Generation: give LLMs access to your private knowledge base, grounding answers in retrieved documents and sharply reducing hallucinations.
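The generation half of RAG is prompt assembly: retrieved passages are stuffed into the prompt so the model answers from your data rather than from parametric memory. A minimal sketch (function name and wording are illustrative):

```python
def build_rag_prompt(question, retrieved_chunks):
    """RAG, generation step: number the retrieved passages and
    instruct the model to answer only from them."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "What is our refund window?",
    ["Refunds are accepted within 30 days of purchase.",
     "Shipping takes 3-5 business days."],
)
print(prompt)
```

The "say you don't know" instruction is what curbs hallucination: the model is given an explicit alternative to inventing an answer. Frameworks like LangChain wrap exactly this pattern.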
Vector Databases
FAISS, Pinecone, Weaviate, ChromaDB. Store and search embeddings for semantic similarity at scale.
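Under the hood these systems rank vectors by similarity. The sketch below is a brute-force cosine-similarity search in NumPy — conceptually what a flat FAISS index does over normalised vectors, minus the optimised kernels and approximate-search structures; the 4-dimensional "embeddings" are toys standing in for real sentence-embedding vectors:

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    """Normalise, take dot products (= cosine similarity),
    and return the indices and scores of the k best matches."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    idx = np.argsort(-scores)[:k]
    return idx, scores[idx]

docs = np.array([
    [0.9, 0.1, 0.0, 0.0],   # doc 0: about refunds
    [0.0, 0.8, 0.2, 0.0],   # doc 1: about shipping
    [0.85, 0.2, 0.1, 0.0],  # doc 2: also about refunds
])
query = np.array([1.0, 0.1, 0.0, 0.0])  # "refund"-flavoured query
idx, scores = top_k(query, docs)
print(idx)  # the two refund-like docs rank first
```

Brute force is fine up to a few hundred thousand vectors; beyond that, the approximate indexes (IVF, HNSW) that FAISS, Pinecone and friends provide trade a little recall for large speedups.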
LLM Capabilities & Limitations
✅ What LLMs Are Good At
- Text generation & creative writing
- Code generation & debugging
- Summarisation & translation
- Question answering with context
- Classification & entity extraction
- Reasoning through structured problems
❌ LLM Limitations
- Hallucination (confident but wrong)
- Knowledge cutoff (no real-time info)
- Inconsistency across runs
- Weak at precise arithmetic
- Long-context degradation
- Expensive to serve at scale
Before fine-tuning, always try prompt engineering first. In practice, most LLM performance improvements come from better prompts, not from training. Only fine-tune when you need consistent domain-specific style or behaviour that prompts can't achieve.
Frequently Asked Questions
What's the difference between GPT, BERT, and Llama?
GPT (decoder-only) is trained to generate text — great for completion and chat. BERT (encoder-only) is trained to understand context — great for classification and named-entity recognition (NER). Llama is Meta's open-weights decoder-only model, similar to GPT but freely downloadable and runnable locally.
Can I run LLMs locally?
Yes! Tools like Ollama and LM Studio let you run Llama, Mistral, and other open models on your laptop. Expect 7B models to run at 10–20 tokens/second on a modern MacBook M-series chip. Quantised (4-bit) models are smaller and faster.
How much does it cost to use LLM APIs?
Claude Sonnet: ~$3 per million input tokens, ~$15 per million output. GPT-4o: ~$5/$15. Prices change frequently and have been falling sharply year over year, so check the providers' pricing pages. For high-volume production apps, self-hosting open-weights models can become cheaper somewhere around a billion tokens per month.
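Since prices are quoted per million tokens, a monthly bill is a two-term sum. A sketch of the arithmetic, using the illustrative $3/$15 figures above (real prices drift, so treat them as placeholders):

```python
def monthly_cost(input_tokens, output_tokens, in_price, out_price):
    """API cost: prices are per million tokens, so divide token
    counts by 1e6 before multiplying by the per-million rate."""
    return (input_tokens / 1e6) * in_price + (output_tokens / 1e6) * out_price

# Example: 100M input + 20M output tokens per month at $3 in / $15 out:
cost = monthly_cost(100e6, 20e6, in_price=3.0, out_price=15.0)
print(f"${cost:,.0f}/month")  # → $600/month
```

Note the asymmetry: output tokens typically cost several times more than input tokens, so verbose completions dominate the bill for chat-heavy workloads.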