Vector Databases — Store and Search AI Embeddings

Traditional databases search by exact value. Vector databases search by meaning. They are the backbone of RAG systems, semantic search, recommendation engines, and image retrieval.

🗄️ Covers: Embeddings · Cosine Similarity · FAISS · Pinecone · Weaviate · ChromaDB · ANN Algorithms · Python Examples

What Are Embeddings?

An embedding is a list of numbers (a vector) that represents the meaning of text, images, or audio. Sentences with similar meaning produce vectors that are geometrically close to each other.

"The cat sat on the mat" [0.12, -0.45, 0.88, ...] 1536 numbers
"A kitten rested on the rug" [0.14, -0.41, 0.85, ...] Very close! (similar meaning)
"Stock market crashes 5%" [-0.67, 0.23, -0.31, ...] Very far! (different meaning)

Cosine Similarity — The Core Metric

Cosine similarity measures the angle between two vectors, not their magnitude. A score of 1.0 means identical direction (same meaning), 0.0 means perpendicular (unrelated), -1.0 means opposite.

Python · Cosine Similarity from Scratch
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Example with OpenAI embeddings
from openai import OpenAI
client = OpenAI()

def embed(text):
    resp = client.embeddings.create(
        input=text,
        model="text-embedding-3-small"
    )
    return np.array(resp.data[0].embedding)

vec_a = embed("machine learning model")
vec_b = embed("neural network training")
vec_c = embed("chocolate cake recipe")

print(cosine_similarity(vec_a, vec_b))  # ~0.87 (similar)
print(cosine_similarity(vec_a, vec_c))  # ~0.12 (different)
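
Note that OpenAI's embedding endpoints return vectors already normalised to unit length, so for these particular vectors the dot product alone gives the same score as the full cosine formula.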

Vector Database Comparison

| Database | Type | Scale | Managed | Best For |
| --- | --- | --- | --- | --- |
| FAISS | Library | Single machine | ❌ Self-hosted | Research, prototyping, local use |
| Pinecone | Cloud SaaS | Billions | ✅ Fully managed | Production, minimal ops, real-time |
| Weaviate | Open source | Millions–billions | ✅ Cloud + self | Rich metadata, multi-modal, GraphQL |
| ChromaDB | Open source | Millions | ❌ Self-hosted | Local development, LangChain |
| Qdrant | Open source | Millions–billions | ✅ Cloud + self | Performance-critical, filtering |
| pgvector | PostgreSQL ext | Millions | Depends | Already use Postgres, smaller scale |

FAISS — Fast Local Similarity Search

Python · FAISS Indexing and Search
import faiss
import numpy as np

# Create 10,000 sample vectors of dimension 1536
dim = 1536
n_vectors = 10_000
vectors = np.random.randn(n_vectors, dim).astype('float32')

# L2-normalise so that inner product equals cosine similarity
faiss.normalize_L2(vectors)

# Build a flat inner-product index (exact search, good for < 1M vectors)
index = faiss.IndexFlatIP(dim)   # Exact brute-force search; IP = cosine after normalisation
index.add(vectors)
print(f"Index size: {index.ntotal} vectors")

# Search — find 5 nearest neighbours to a query
query = np.random.randn(1, dim).astype('float32')
faiss.normalize_L2(query)

scores, indices = index.search(query, k=5)
print("Top 5 matches (indices):", indices[0])
print("Similarity scores:", scores[0])

# For large collections: use IVF (inverted file) index for speed
nlist = 100   # Number of clusters
quantizer = faiss.IndexFlatIP(dim)
index_ivf = faiss.IndexIVFFlat(quantizer, dim, nlist)
index_ivf.train(vectors)
index_ivf.add(vectors)
index_ivf.nprobe = 10  # Search 10 clusters (trade speed for recall)
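
# Querying the approximate IVF index works the same way (recall depends on nprobe)
scores_ivf, indices_ivf = index_ivf.search(query, 5)
print("IVF top 5 (approximate):", indices_ivf[0])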

ChromaDB — Simplest Local Setup

Python · ChromaDB with Persistent Storage
import chromadb
from chromadb.utils import embedding_functions

# Persistent local database
client = chromadb.PersistentClient(path="./my_chroma_db")

# Use OpenAI embeddings automatically
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="sk-...",
    model_name="text-embedding-3-small"
)

collection = client.get_or_create_collection(
    name="knowledge_base",
    embedding_function=openai_ef
)

# Add documents (ChromaDB embeds them automatically)
collection.add(
    documents=[
        "Python is a high-level programming language",
        "Machine learning requires large datasets",
        "The Eiffel Tower is in Paris, France",
    ],
    ids=["doc1", "doc2", "doc3"],
    metadatas=[{"source": "wiki"}, {"source": "textbook"}, {"source": "wiki"}]
)

# Query — semantic search
results = collection.query(
    query_texts=["coding with Python"],
    n_results=2,
    where={"source": "wiki"}   # Metadata filter
)
print(results["documents"])

Pinecone — Production-Ready Managed

Python · Pinecone Upsert and Query
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-api-key")

# Create index (one-time)
pc.create_index(
    name="knowledge-base",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

index = pc.Index("knowledge-base")

# Upsert vectors with metadata
vectors = [
    ("id-001", [0.1, 0.2, ...], {"text": "Doc 1", "category": "tech"}),
    ("id-002", [0.3, 0.1, ...], {"text": "Doc 2", "category": "science"}),
]
index.upsert(vectors=vectors, namespace="production")

# Query
results = index.query(
    vector=[0.15, 0.18, ...],
    top_k=5,
    filter={"category": {"$eq": "tech"}},  # Metadata filtering
    include_metadata=True,
    namespace="production"
)
for match in results.matches:
    print(f"{match.id}: score={match.score:.3f}, text={match.metadata['text']}")

ANN Algorithms — How Fast Search Works

| Algorithm | How it works | Typical implementation |
| --- | --- | --- |
| Flat (brute force) | Compares the query against every vector. Perfect recall, O(n) time. Use for <100K vectors or when accuracy is critical. | FAISS IndexFlatL2 |
| IVF (inverted file) | Clusters the vectors and searches only nearby clusters. 10–100× faster than flat, with a small recall loss controlled by nprobe. | FAISS IndexIVFFlat |
| HNSW | Hierarchical Navigable Small World graph. Excellent recall/speed balance. Default in Weaviate and Qdrant. | Most cloud DBs |
| PQ (product quantisation) | Compresses vectors 8–64× by splitting them into sub-vectors. Trades a small accuracy loss for huge memory savings. | FAISS IndexPQ |
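
HNSW and PQ are also available in FAISS. Here is a minimal sketch reusing the dim, vectors, and query variables from the FAISS example above; the M value and sub-vector count are illustrative, not tuned recommendations:

Python · HNSW and PQ in FAISS (sketch)
# HNSW graph index: no training step; the default metric is L2, which gives the
# same ranking as cosine similarity on the normalised vectors used above
index_hnsw = faiss.IndexHNSWFlat(dim, 32)   # 32 = neighbours per node (M)
index_hnsw.add(vectors)
dist_h, idx_h = index_hnsw.search(query, 5)

# Product Quantisation: each vector compressed to 64 one-byte codes
index_pq = faiss.IndexPQ(dim, 64, 8)        # 64 sub-vectors, 8 bits each
index_pq.train(vectors)
index_pq.add(vectors)
dist_p, idx_p = index_pq.search(query, 5)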

Frequently Asked Questions

Which embedding model should I use?

For English text: OpenAI text-embedding-3-small (cheap, good) or text-embedding-3-large (best). Open source: bge-large-en-v1.5 (HuggingFace) for local use. Multi-lingual: Cohere Embed v3 or multilingual-e5-large. For images: CLIP. Always use the same model for indexing and querying.

How many vectors can each database handle?

FAISS: limited by RAM — roughly 100M vectors on a 32GB machine with compression. Pinecone/Weaviate/Qdrant: designed for billions of vectors with distributed sharding. ChromaDB: comfortably handles millions locally. For most RAG applications, 1M vectors is more than enough to cover an entire company knowledge base.
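
The arithmetic behind the FAISS estimate: a 1536-dimensional float32 vector takes 1536 × 4 ≈ 6 KB, so 100M raw vectors would need roughly 600 GB; compressed to 64-byte PQ codes, the same collection fits in about 6.4 GB, which is why compression is what makes a 32GB machine viable.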

Should I store raw text or just vectors?

Always store both. Store the vector for searching and the original text (or a reference to it) as metadata. When you find the top-K vectors, you need the original text to inject into your LLM prompt. Most vector databases support metadata fields alongside vectors for exactly this purpose.
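
A minimal sketch of that retrieve-then-prompt step, reusing the ChromaDB collection from earlier (the question and prompt template are illustrative):

Python · Retrieve, Then Prompt (sketch)
question = "How do I get started with Python?"

# Top-3 semantic matches; "documents" holds the original text stored alongside the vectors
results = collection.query(query_texts=[question], n_results=3)
context = "\n\n".join(results["documents"][0])

prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)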
