Vector Databases in the Cloud

Traditional databases search by exact match or range. Vector databases search by meaning. They're the reason you can ask an AI chatbot a question and it finds the right answer from your company's documentation — even if none of the exact words match. Vector databases are the memory system of the AI era.

What is a Vector Embedding?

A vector embedding is a list of numbers that represents the meaning of a piece of text (or an image, or audio). An embedding model converts "the cat sat on the mat" into something like [0.12, -0.45, 0.89, ...] — a point in high-dimensional space (typically 768–4,096 dimensions).

Why Vectors Capture Meaning

Embedding models are trained to place semantically similar items close together in vector space. "Dog" and "puppy" will have very similar vectors. "Paris" and "France" will be close. "King - Man + Woman ≈ Queen" is a famous demonstration. The distance between two vectors (cosine similarity or L2 distance) measures semantic similarity — without any keyword matching.
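The similarity computation itself is simple. A minimal sketch of cosine similarity in plain Python, using toy 4-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and the values here are invented for illustration):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors: "dog" and "puppy" point in similar directions; "car" does not.
dog   = [0.9, 0.8, 0.1, 0.0]
puppy = [0.8, 0.9, 0.2, 0.1]
car   = [0.1, 0.0, 0.9, 0.8]

print(cosine_similarity(dog, puppy))  # close to 1.0
print(cosine_similarity(dog, car))    # much lower
```

L2 distance works the same way in principle: closer vectors mean more similar meanings; only the geometry of "close" differs.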

The RAG pipeline:
1. Embed all your documents into vectors.
2. Store the vectors in a vector database.
3. When a user asks a question, embed the question.
4. Find the stored vectors most similar to the question vector.
5. Include the corresponding documents in your LLM prompt.
6. The LLM answers based on the retrieved context.
This is Retrieval-Augmented Generation — the most common pattern for knowledge-base AI.
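The pipeline above can be sketched end-to-end in a few lines. This is a toy: the `embed` function here is an assumed stand-in that hashes words into a fixed-size count vector, and a plain list stands in for the database. A real system would call an embedding model and a vector database instead.

```python
import math

def embed(text: str, dim: int = 256) -> list[float]:
    # Stand-in for a real embedding model: hash each word into a bucket
    # and count. Real systems call a model such as text-embedding-3-large.
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[hash(word.strip(".,?!")) % dim] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-2: embed the documents and store them (a list as the "database").
docs = [
    "Reset your password from the account settings page.",
    "Invoices are emailed on the first day of each month.",
    "Contact support via the in-app chat widget.",
]
store = [(doc, embed(doc)) for doc in docs]

# Steps 3-4: embed the question and rank documents by similarity.
question = "How do I reset my password?"
q_vec = embed(question)
ranked = sorted(store, key=lambda item: cosine(q_vec, item[1]), reverse=True)

# Step 5: the top documents become context in the LLM prompt.
context = ranked[0][0]
print(context)
```

With these toy documents, the password-reset document shares the words "reset" and "password" with the question and ranks first.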

How Vector Databases Work

A vector database stores vectors and enables fast approximate nearest-neighbor (ANN) search — finding the N vectors most similar to a query vector among potentially billions of stored vectors.

The ANN Challenge

Brute-force nearest-neighbor search (comparing your query against every stored vector) is O(n) — fine for thousands of vectors, impractical for millions. Vector databases use ANN indexes — data structures that trade a small amount of recall accuracy for dramatic speed improvements. HNSW (Hierarchical Navigable Small World) and IVF (Inverted File Index) are the dominant index types, supporting millisecond search across billions of vectors.
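The O(n) brute-force baseline that ANN indexes improve on looks like this, as a minimal sketch (the function and variable names are illustrative, not from any particular library):

```python
import heapq
import math

def top_k(query: list[float], vectors: list[list[float]], k: int) -> list[int]:
    """Exact nearest neighbors by scanning every stored vector: O(n * d)."""
    def cos(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(x * x for x in b)))
    # Score all n vectors, then keep the k best.
    scores = [(cos(query, v), i) for i, v in enumerate(vectors)]
    return [i for _, i in heapq.nlargest(k, scores)]

vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(top_k([1.0, 0.0], vectors, k=2))  # [0, 1]
```

An HNSW or IVF index avoids the full scan by visiting only a small, carefully chosen subset of vectors, which is why it may occasionally miss a true neighbor (the recall trade-off mentioned above).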

Hybrid Search

Modern vector databases combine vector search (semantic similarity) with traditional keyword search (BM25/TF-IDF) — a technique called hybrid search. This handles both semantic queries ("articles about dog care") and exact lookups ("find documents mentioning SKU-4521") in a single query. Most production RAG systems use hybrid search rather than pure vector search.

Vector Database Options in the Cloud

Pinecone — Managed, Serverless

Pinecone is purpose-built for vector search — no other database features. It's serverless: you just insert vectors and query them, with no cluster management. It scales automatically and has a generous free tier. Best choice for getting started quickly or for applications where you want zero infrastructure overhead. Proprietary (not open source).

Weaviate — Open Source, Schema-Rich

Weaviate is open-source and self-hostable (or managed via Weaviate Cloud). It combines vector search with a graph-like object model — you define schemas with relationships between data types. It also has built-in embedding modules, so you can skip the separate embedding step. Good for complex data models and enterprise RAG applications.

Qdrant — Open Source, High Performance

Qdrant is written in Rust (extremely fast), open-source, and self-hostable. Strong on filtering (vector search + metadata filters applied efficiently together). Supports named vectors (multiple embeddings per record — useful for multimodal). Good choice for teams who want self-hosted control with high performance.

pgvector — Vectors in PostgreSQL

pgvector is a PostgreSQL extension that adds vector storage and search to your existing Postgres database. Available on AWS RDS, Supabase, Neon, and any managed Postgres. If your application already uses Postgres, pgvector is compelling: one database, standard SQL, no new service to manage. Slower than purpose-built vector DBs at very large scale, but sufficient for most applications up to tens of millions of vectors.

Choosing an Embedding Model

The quality of your vector search depends heavily on your embedding model. Common choices:

text-embedding-3-large (OpenAI) — 3,072 dimensions, best for English RAG, ~$0.13 per million tokens
voyage-3 (Voyage AI) — state-of-the-art retrieval quality, used by Anthropic
bge-m3 (BAAI) — open-source, multilingual, self-hostable, excellent quality
nomic-embed (Nomic AI) — open-source, 8K-token context window, runs locally

Frequently Asked Questions

Do I need a vector database to use RAG?

Not necessarily — for small datasets (under 10K documents), you can store embeddings in memory or in a simple file, compute cosine similarity at query time, and skip the dedicated database. Libraries like FAISS, Chroma, and LanceDB work locally without a server. Only reach for a managed vector database when your dataset is large, you need multi-user access, or you need production reliability guarantees.

How do I chunk documents for embedding?

Document chunking (splitting documents into smaller pieces before embedding) is one of the most impactful decisions in RAG. Common strategies: fixed-size chunks (512 tokens, 128 overlap), sentence-based chunks, paragraph-based chunks, and semantic chunking (split when the topic changes). Best practice: chunk size should match the granularity of questions you expect. For FAQ retrieval, sentence-level chunks work well. For detailed technical docs, paragraph chunks often perform better.
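A fixed-size chunker with overlap can be sketched in a few lines. This version splits on whitespace words as a stand-in for tokens (a real pipeline would count model tokens with a tokenizer); the function name and defaults are illustrative:

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 128) -> list[str]:
    """Split text into fixed-size chunks with overlap (words stand in for tokens)."""
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already covered the end of the document
    return chunks

# A 1,000-word document yields 3 overlapping chunks with these settings.
doc = " ".join(f"word{i}" for i in range(1000))
chunks = chunk_text(doc, chunk_size=512, overlap=128)
print(len(chunks))           # 3
print(chunks[1].split()[0])  # word384 — the window advances 512 - 128 words
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.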

What is the difference between semantic search and vector search?

They're essentially the same thing — "semantic search" is the use case (finding semantically similar content), "vector search" is the mechanism (finding nearest neighbors in embedding space). All modern semantic search systems use vector embeddings under the hood. The terms are used interchangeably in most contexts.
