๐Ÿ‡ Rabbithole โ† All lessons Work with us โ†’
Embeddings · Day 29 of 30
Learn by clicking · ~4 min read

01 / 05

What are embeddings, and how does search find things by meaning?

Think of a giant map where every idea gets a pin, and ideas that mean similar things sit close together. An embedding is the address of that pin: a list of numbers, produced from a piece of text, that captures its meaning. Here is the whole idea in one picture:

A piece of text: "how do I get my money back"
→ Embedding model: reads the meaning, not the letters, and outputs [0.12, -0.04, 0.88, ...]
→ A point on the map: lands near similar ideas, beside "refund policy" and far from "office hours"
๐Ÿ“ Similar meaning, nearby points ๐Ÿ”ข Meaning, stored as numbers ๐Ÿง  The list of numbers is called a vector ๐Ÿ”Ž Finding the nearest points is vector search

Two terms, unpacked. A vector is just that list of numbers (the pin's coordinates). Vector search turns your question into its own pin, then finds the stored pins sitting closest to it. Closest pins mean closest meanings, which is how a computer can match ideas, not just words. You will try it live in a minute.
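For the curious, here is a minimal Python sketch of that search, under loud assumptions. The three-number vectors are made up by hand, while a real embedding model returns hundreds of numbers per text. "Closest" is measured with cosine similarity, which is one common choice among several. The names `KNOWLEDGE_BASE` and `vector_search` are illustrative and do not come from any library.

```python
import math

# Toy 3-number "pins". ASSUMPTION: values are invented for illustration;
# a real embedding model would produce much longer vectors.
KNOWLEDGE_BASE = {
    "refund policy": [0.9, 0.1, 0.0],
    "office hours": [0.0, 0.2, 0.9],
    "shipping times": [0.3, 0.8, 0.1],
}

def cosine_similarity(a, b):
    """Closeness by angle: 1.0 means same direction, near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def vector_search(query_vector, top_k=2):
    """Score every stored pin against the query pin and return the closest first."""
    scored = [
        (text, cosine_similarity(query_vector, vector))
        for text, vector in KNOWLEDGE_BASE.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# ASSUMPTION: pretend the model turned "how do I get my money back" into this.
query = [0.8, 0.3, 0.1]
for text, score in vector_search(query):
    print(f"{score:.2f}  {text}")
```

"Refund policy" comes out on top even though it shares no words with the question. If you swap in real vectors from an embedding model, the logic stays the same. Larger systems replace the loop with an index built for fast nearest-neighbor lookup.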

02 / 05 · The myth

"Search only works if you use the exact keywords."

This is the belief embeddings quietly killed. Let us name it plainly, then correct it.

The myth

"If my document does not contain the words I typed, search cannot find it. I have to guess the right keywords."

The correction
That is true of classic keyword search (matching the actual letters and words). Meaning-aware search works differently. It compares meanings, so "how do I get my money back" can find a page titled "Refund policy" even though the two share zero words. The match is on the idea, not the spelling.
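To see why the myth held for so long, here is a tiny Python sketch of keyword matching. It assumes the simplest possible rule: a document matches if it shares at least one lower-cased word with the query. Real keyword engines add stemming and ranking, but no amount of stemming turns "money back" into "refund". The helper names are hypothetical.

```python
import re

def words(text):
    """Lower-case the text and split it into a set of words (letters and digits only)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_match(query, document):
    """Keyword-style matching: return the words the query and document share."""
    return words(query) & words(document)

shared = keyword_match("how do I get my money back", "Refund policy")
print(shared or "no shared words, so keyword search finds nothing")
```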

🔤 Keyword search

Matches the words themselves.

  • – Needs a shared word to find anything
  • – "money back" misses "refund" entirely
  • – Synonyms and rephrasings slip through
  • ✓ Great for exact codes, names, IDs

🧭 Meaning search (embeddings)

Matches what the words mean.

  • ✓ Finds the idea even with no shared words
  • ✓ Handles synonyms and plain-English phrasing
  • ✓ Ranks results by how close in meaning
  • – Can over-reach: "close" is not "correct"
The honest version: meaning search does not replace keyword search; it complements it. The best systems use both. Embeddings find the right neighborhood of ideas, and you still want exact matching for part numbers and names.
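One common way to "use both" is reciprocal rank fusion (RRF). You run keyword search and meaning search separately, then merge the two ranked lists so that items ranking high in either list rise. Here is a minimal Python sketch. The result lists for the query "send back part AB-1234" are hypothetical, and k = 60 is a widely used default, not something this lesson prescribes.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each appearance adds 1 / (k + rank), with rank starting at 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

# HYPOTHETICAL results for the query "send back part AB-1234".
keyword_ranked = ["Part AB-1234 datasheet"]  # only the exact part number matched
meaning_ranked = ["Refund policy", "Returns form", "Part AB-1234 datasheet"]

for doc, score in reciprocal_rank_fusion([keyword_ranked, meaning_ranked]):
    print(f"{score:.4f}  {doc}")
```

The datasheet rises to the top because the keyword list put it first for the exact part number, even though meaning search alone ranked it last.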

03 / 05 · Watch it work

Search by meaning, live.

Below is a tiny knowledge base of 6 phrases. Type a question, or tap an example, and watch the closest meanings light up, even when your question shares no words with them.

🔒 This runs entirely in your browser. Nothing you type here is sent anywhere.

How this demo cheats (on purpose): a real system asks an embedding model for each vector. To keep this page self-contained and offline, we fake the "meaning" with a small hand-built map of related concepts. The principle is the same (closer meaning, higher score), but the exact numbers are illustrative only.
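Here is a hypothetical Python sketch of that kind of hand-built map. It is not this page's actual code. Everything in it is an assumption: the `CONCEPT_MAP` entries, the three phrases, and the scoring rule, which is the share of the question's concepts that a phrase also touches. Unlike a real model, it only understands words someone typed into the map.

```python
import re

# HYPOTHETICAL hand-built map: each known word points to a broader concept.
CONCEPT_MAP = {
    "money": "refund", "back": "refund", "refund": "refund", "return": "refund",
    "open": "schedule", "hours": "schedule", "office": "schedule", "when": "schedule",
    "ship": "delivery", "shipping": "delivery", "arrive": "delivery", "package": "delivery",
}

# HYPOTHETICAL mini knowledge base (the demo on this page uses its own 6 phrases).
PHRASES = ["Refund policy", "Office hours", "Shipping times"]

def concepts(text):
    """Map each recognized word to its concept; words not in the map are ignored."""
    return {CONCEPT_MAP[w] for w in re.findall(r"[a-z]+", text.lower()) if w in CONCEPT_MAP}

def fake_meaning_score(query, phrase):
    """Fraction of the query's concepts that the phrase also touches (0.0 to 1.0)."""
    query_concepts = concepts(query)
    if not query_concepts:
        return 0.0
    return len(query_concepts & concepts(phrase)) / len(query_concepts)

def rank(query):
    """Score every phrase and list the closest 'meanings' first."""
    scored = [(phrase, fake_meaning_score(query, phrase)) for phrase in PHRASES]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

print(rank("how do I get my money back"))
```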

💡 This is the engine under RAG (retrieval-augmented generation). See it doing real work over a real knowledge base in the RAG lesson and the quote-bot kit.

04 / 05 · Use it well

What to know before you trust meaning search.

Embeddings are powerful, but "close in meaning" is not the same as "right." A few things worth keeping in mind.

  • 1. Nearest is not the same as correct. Vector search returns the closest match, even when nothing in your data truly answers the question. It never says "I have nothing." That is your job to check.
  • 2. It reduces hallucination; it does not erase it. Feeding an AI the nearest passages (that is RAG) keeps it grounded in your real documents, but the AI can still misread or overstate them. Always allow for "not found."
  • 3. Garbage in, garbage near. If your knowledge base is thin or out of date, the nearest match can still be wrong, just confidently wrong. Good search starts with good content.
  • 4. Chunk size matters. You embed pieces of text, not whole documents. Too big and the meaning blurs; too small and context is lost. This is most of the real tuning work.
  • 5. Embeddings carry their training's blind spots. The model decides what counts as "similar." Test it on your own real questions before you rely on it.
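Points 1, 2, and 4 can be sketched in Python under stated assumptions. First, chunks are fixed windows of words with a little overlap, although real systems often split on paragraphs or headings instead. Second, the vectors are made up by hand. Third, the 0.75 cutoff is an arbitrary placeholder that you would tune on your own real questions. `chunk_words` and `search_or_not_found` are hypothetical helper names.

```python
import math

def chunk_words(text, size=50, overlap=10):
    """Split text into windows of `size` words; neighbors share `overlap` words of context."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    words = text.split()
    step = size - overlap
    # Stop once the remaining words are already covered by the previous window's overlap.
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def cosine_similarity(a, b):
    """1.0 means the vectors point the same way; near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# ASSUMPTION: made-up vectors standing in for embedded chunks.
INDEX = [
    ("Refund policy", [0.9, 0.1, 0.0]),
    ("Office hours", [0.0, 0.2, 0.9]),
]

def search_or_not_found(query_vector, index, min_score=0.75):
    """Return the nearest chunk's text, or None when even the nearest is not close enough."""
    best_text, best_vector = max(index, key=lambda item: cosine_similarity(query_vector, item[1]))
    if cosine_similarity(query_vector, best_vector) < min_score:
        return None  # nearest is not the same as correct: report "not found"
    return best_text

print(search_or_not_found([0.8, 0.3, 0.1], INDEX))  # refund-like question -> Refund policy
print(search_or_not_found([0.1, 0.9, 0.1], INDEX))  # unrelated question -> None
```

Without the cutoff, the unrelated question would still get "Office hours" back as its nearest match. With the cutoff, the search admits it has nothing.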

05 / 05 · Done

You now understand embeddings better than most people who use them daily.

You know that an embedding is a list of numbers capturing a text's meaning, that this list is called a vector, that vector search finds the nearest meanings, and why "money back" can find your refund policy. That is the real engine under RAG.

This demo used a hand-built map of 6 phrases. The real power comes from meaning-aware search over your actual knowledge base (your handbook, past quotes, support history, and docs), so customers and staff can find answers by asking in plain English.

Built by rabbithole.consulting: custom-built infrastructure that runs your business. This lesson runs entirely in your browser · Free to read.