Vector Databases — Store and Search AI Embeddings

Traditional databases search by exact value. Vector databases search by meaning. They are the backbone of RAG systems, semantic search, recommendation engines, and image retrieval.

🗄️ Covers: Embeddings · Cosine Similarity · FAISS · Pinecone · Weaviate · ChromaDB · ANN Algorithms · Python Examples

What Are Embeddings?

An embedding is a list of numbers (a vector) that represents the meaning of text, images, or audio. Sentences with similar meaning produce vectors that are geometrically close to each other.

"The cat sat on the mat" [0.12, -0.45, 0.88, ...] 1536 numbers
"A kitten rested on the rug" [0.14, -0.41, 0.85, ...] Very close! (similar meaning)
"Stock market crashes 5%" [-0.67, 0.23, -0.31, ...] Very far! (different meaning)

Cosine Similarity — The Core Metric

Cosine similarity measures the angle between two vectors, not their magnitude. A score of 1.0 means identical direction (same meaning), 0.0 means perpendicular (unrelated), -1.0 means opposite.

Python · Cosine Similarity from Scratch
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Example with OpenAI embeddings
from openai import OpenAI
client = OpenAI()

def embed(text):
    resp = client.embeddings.create(
        input=text,
        model="text-embedding-3-small"
    )
    return np.array(resp.data[0].embedding)

vec_a = embed("machine learning model")
vec_b = embed("neural network training")
vec_c = embed("chocolate cake recipe")

print(cosine_similarity(vec_a, vec_b))  # ~0.87 (similar)
print(cosine_similarity(vec_a, vec_c))  # ~0.12 (different)
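
Note that OpenAI's embedding endpoints return vectors already normalised to unit length, so for these particular vectors the dot product alone gives the same score as the full cosine formula.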

Vector Database Comparison

| Database | Type | Scale | Managed | Best For |
| --- | --- | --- | --- | --- |
| FAISS | Library | Single machine | ❌ Self-hosted | Research, prototyping, local use |
| Pinecone | Cloud SaaS | Billions | ✅ Fully managed | Production, minimal ops, real-time |
| Weaviate | Open source | Millions–billions | ✅ Cloud + self | Rich metadata, multi-modal, GraphQL |
| ChromaDB | Open source | Millions | ❌ Self-hosted | Local development, LangChain |
| Qdrant | Open source | Millions–billions | ✅ Cloud + self | Performance-critical, filtering |
| pgvector | PostgreSQL ext | Millions | Depends | Already use Postgres, smaller scale |

FAISS — Fast Local Similarity Search

Python · FAISS Indexing and Search
import faiss
import numpy as np

# Create 10,000 sample vectors of dimension 1536
dim = 1536
n_vectors = 10_000
vectors = np.random.randn(n_vectors, dim).astype('float32')

# L2-normalise so that inner product equals cosine similarity
faiss.normalize_L2(vectors)

# Build a flat inner-product index (exact search, good for < 1M vectors)
index = faiss.IndexFlatIP(dim)   # Exact brute-force search; IP = cosine after normalisation
index.add(vectors)
print(f"Index size: {index.ntotal} vectors")

# Search — find 5 nearest neighbours to a query
query = np.random.randn(1, dim).astype('float32')
faiss.normalize_L2(query)

scores, indices = index.search(query, k=5)
print("Top 5 matches (indices):", indices[0])
print("Similarity scores:", scores[0])

# For large collections: use IVF (inverted file) index for speed
nlist = 100   # Number of clusters
quantizer = faiss.IndexFlatIP(dim)
index_ivf = faiss.IndexIVFFlat(quantizer, dim, nlist)
index_ivf.train(vectors)
index_ivf.add(vectors)
index_ivf.nprobe = 10  # Search 10 clusters (trade speed for recall)
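
# Querying the approximate IVF index works the same way (recall depends on nprobe)
scores_ivf, indices_ivf = index_ivf.search(query, 5)
print("IVF top 5 (approximate):", indices_ivf[0])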

ChromaDB — Simplest Local Setup

Python · ChromaDB with Persistent Storage
import chromadb
from chromadb.utils import embedding_functions

# Persistent local database
client = chromadb.PersistentClient(path="./my_chroma_db")

# Use OpenAI embeddings automatically
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="sk-...",
    model_name="text-embedding-3-small"
)

collection = client.get_or_create_collection(
    name="knowledge_base",
    embedding_function=openai_ef
)

# Add documents (ChromaDB embeds them automatically)
collection.add(
    documents=[
        "Python is a high-level programming language",
        "Machine learning requires large datasets",
        "The Eiffel Tower is in Paris, France",
    ],
    ids=["doc1", "doc2", "doc3"],
    metadatas=[{"source": "wiki"}, {"source": "textbook"}, {"source": "wiki"}]
)

# Query — semantic search
results = collection.query(
    query_texts=["coding with Python"],
    n_results=2,
    where={"source": "wiki"}   # Metadata filter
)
print(results["documents"])

Pinecone — Production-Ready Managed

Python · Pinecone Upsert and Query
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-api-key")

# Create index (one-time)
pc.create_index(
    name="knowledge-base",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

index = pc.Index("knowledge-base")

# Upsert vectors with metadata
vectors = [
    ("id-001", [0.1, 0.2, ...], {"text": "Doc 1", "category": "tech"}),
    ("id-002", [0.3, 0.1, ...], {"text": "Doc 2", "category": "science"}),
]
index.upsert(vectors=vectors, namespace="production")

# Query
results = index.query(
    vector=[0.15, 0.18, ...],
    top_k=5,
    filter={"category": {"$eq": "tech"}},  # Metadata filtering
    include_metadata=True,
    namespace="production"
)
for match in results.matches:
    print(f"{match.id}: score={match.score:.3f}, text={match.metadata['text']}")

ANN Algorithms — How Fast Search Works

| Algorithm | How it works | Typical implementation |
| --- | --- | --- |
| Flat (brute force) | Compares the query against every vector. Perfect recall, O(n) time. Use for <100K vectors or when accuracy is critical. | FAISS IndexFlatL2 |
| IVF (inverted file) | Clusters the vectors and searches only nearby clusters. 10–100× faster than flat, with a small recall loss controlled by nprobe. | FAISS IndexIVFFlat |
| HNSW | Hierarchical Navigable Small World graph. Excellent recall/speed balance. Default in Weaviate and Qdrant. | Most cloud DBs |
| PQ (product quantisation) | Compresses vectors 8–64× by splitting them into sub-vectors. Trades a small accuracy loss for huge memory savings. | FAISS IndexPQ |
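
HNSW and PQ are also available in FAISS. Here is a minimal sketch reusing the dim, vectors, and query variables from the FAISS example above; the M value and sub-vector count are illustrative, not tuned recommendations:

Python · HNSW and PQ in FAISS (sketch)
# HNSW graph index: no training step; the default metric is L2, which gives the
# same ranking as cosine similarity on the normalised vectors used above
index_hnsw = faiss.IndexHNSWFlat(dim, 32)   # 32 = neighbours per node (M)
index_hnsw.add(vectors)
dist_h, idx_h = index_hnsw.search(query, 5)

# Product Quantisation: each vector compressed to 64 one-byte codes
index_pq = faiss.IndexPQ(dim, 64, 8)        # 64 sub-vectors, 8 bits each
index_pq.train(vectors)
index_pq.add(vectors)
dist_p, idx_p = index_pq.search(query, 5)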

Frequently Asked Questions

Which embedding model should I use?

For English text: OpenAI text-embedding-3-small (cheap, good) or text-embedding-3-large (best). Open source: bge-large-en-v1.5 (HuggingFace) for local use. Multi-lingual: Cohere Embed v3 or multilingual-e5-large. For images: CLIP. Always use the same model for indexing and querying.

How many vectors can each database handle?

FAISS: limited by RAM — roughly 100M vectors on a 32GB machine with compression. Pinecone/Weaviate/Qdrant: designed for billions of vectors with distributed sharding. ChromaDB: comfortably handles millions locally. For most RAG applications, 1M vectors is more than enough to cover an entire company knowledge base.
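
The arithmetic behind the FAISS estimate: a 1536-dimensional float32 vector takes 1536 × 4 ≈ 6 KB, so 100M raw vectors would need roughly 600 GB; compressed to 64-byte PQ codes, the same collection fits in about 6.4 GB, which is why compression is what makes a 32GB machine viable.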

Should I store raw text or just vectors?

Always store both. Store the vector for searching and the original text (or a reference to it) as metadata. When you find the top-K vectors, you need the original text to inject into your LLM prompt. Most vector databases support metadata fields alongside vectors for exactly this purpose.
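
A minimal sketch of that retrieve-then-prompt step, reusing the ChromaDB collection from earlier (the question and prompt template are illustrative):

Python · Retrieve, Then Prompt (sketch)
question = "How do I get started with Python?"

# Top-3 semantic matches; "documents" holds the original text stored alongside the vectors
results = collection.query(query_texts=[question], n_results=3)
context = "\n\n".join(results["documents"][0])

prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)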
