RAG · Day 8 of 30 · 01 / 05
Learn by clicking · Day 8 of 30 · ~2 min read

What is "RAG", and how does AI read your own documents?

RAG (Retrieval-Augmented Generation) is a simple trick. Before the AI answers, it first looks up the most relevant snippets from your documents and slips them into the prompt. The answer is grounded in your data, not just what the model picked up in general training.

The one analogy that makes it click: it's an open-book exam. The model is the smart student, but it has never read your employee handbook. RAG is the moment you hand it the exact right page, open to the right paragraph, just before it answers.

A question: "What's our refund policy?"
→ look up your docs
Retrieve snippets: the 2 to 3 passages closest in meaning (refunds.md, policy.pdf)
→ hand to the model
Grounded answer: cites the snippets it used

📚 Answers from your documents · 🔄 No retraining: just update the files · 🔗 Can cite its sources · 🧠 Reduces (not eliminates) made-up answers

A few terms you'll meet, each in one line: a large language model (LLM) is the AI that writes the answer; training is the slow, expensive process that taught it general language; the prompt is the text you send it right now. RAG never touches training. It just makes the prompt smarter. Now let's bust the biggest myth about it.

02 / 05 · The myth

"Don't you have to retrain the AI on my data?"

This is the most common misunderstanding, and it scares people off, because retraining sounds slow, expensive, and risky. It usually isn't what you need at all.

⚠ The myth

"To make AI use my documents, I have to fine-tune or retrain the model on them, and once I do, it can't make things up anymore."

✓ The reality

You don't retrain anything. With RAG, your documents stay exactly where they are. At the moment of the question, the system retrieves the right snippets and pastes them into the prompt. Change a price in your handbook this morning, and the AI uses the new price this afternoon: no model update, no waiting, no retraining bill.

And it does not make hallucination impossible. "Hallucination" is when an AI states something false with total confidence. RAG reduces it by handing the model real, relevant text to lean on. But the model can still misread a snippet, blend two together, or answer when the right passage wasn't retrieved. RAG grounds answers; it doesn't make them flawless. We'll come back to that limit, honestly, at the end.

The mental model: retraining changes who the model is. RAG changes what the model can see at the instant it answers. For "use my private, changing facts," you almost always want the second one.
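
The "paste the snippets into the prompt" step can be sketched roughly as follows. This is a minimal illustration, not any particular product's code: `build_prompt`, the instruction wording, and the sample snippet text are all hypothetical; only the file names echo the diagram earlier in this lesson.

```python
def build_prompt(question: str, snippets: list[tuple[str, str]]) -> str:
    """Place retrieved (source, text) snippets above the question."""
    lines = [
        "Answer using only the sources below. Cite the source name you used.",
        "",
    ]
    for source, text in snippets:
        lines.append(f"[{source}] {text}")
    lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Hypothetical snippets that a retrieval step might have returned.
snippets = [
    ("refunds.md", "Refunds are available within 30 days of purchase."),
    ("policy.pdf", "Refunds go back to the original payment method."),
]
prompt = build_prompt("What's our refund policy?", snippets)
print(prompt)
```

Nothing about the model changes here. Edit the text behind `refunds.md` and the next prompt simply carries the new wording.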

03 / 05 · How it works

The whole pipeline, in five steps.

RAG isn't magic. It's a short assembly line. Three stages happen when you set it up, and again for any document that changes (chunk, embed, store); two happen every time someone asks (retrieve, generate).

1 · SETUP · Chunk: split your docs into bite-size passages
2 · SETUP · Embed: turn each passage into numbers that capture its meaning
3 · SETUP · Store: keep them in a searchable index (a vector database)
4 · PER ASK · Retrieve: find the chunks whose meaning is closest to the question
5 · PER ASK · Generate: feed those chunks + the question to the model; it answers, ideally citing them
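
Step 1, chunking, is the easiest stage to picture in code. The sketch below is one simple approach among many, assuming word-based windows; the function name `chunk_text` and the sizes (40 words per chunk, 10 words of overlap) are illustrative assumptions, not values from this lesson.

```python
def chunk_text(text: str, max_words: int = 40, overlap: int = 10) -> list[str]:
    """Split text into passages of at most max_words words.

    Consecutive passages share `overlap` words, so text near a boundary
    shows up in both neighbouring chunks.
    """
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Stand-in for a real document: 100 numbered words.
handbook = " ".join(f"word{i}" for i in range(100))
chunks = chunk_text(handbook)
print(len(chunks), "chunks")
```

Real systems often split on headings, paragraphs, or sentences instead of a fixed word count; the fixed window here just keeps the idea visible.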

Two terms unpacked, plainly:

An embedding is a passage's meaning turned into a list of numbers. The clever part: text that means similar things lands on nearby numbers, so "money back" sits close to "refund" even though they share no words. That's how retrieval matches by meaning, not just keywords.
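
To make "nearby numbers" concrete, here is a small sketch using cosine similarity, a common way to compare embeddings. The three-number vectors are hand-made for illustration (an assumption: real embeddings come from an embedding model and are far longer), so only the relative scores matter.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; closer to 1.0 means more alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy "embeddings", NOT output from a real model.
toy_embeddings = {
    "refund": [0.9, 0.1, 0.0],
    "money back": [0.8, 0.2, 0.1],
    "office hours": [0.0, 0.2, 0.9],
}

query = toy_embeddings["money back"]
sim_refund = cosine_similarity(query, toy_embeddings["refund"])
sim_hours = cosine_similarity(query, toy_embeddings["office hours"])
print(f"refund: {sim_refund:.2f}, office hours: {sim_hours:.2f}")
```

In this toy setup "money back" scores much closer to "refund" than to "office hours", even though the phrases share no words.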

A vector database is just a search index built for those number-lists. Ask it "what's nearest to this question?" and it returns the closest passages, fast, even across thousands of documents.
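
The "what's nearest to this question?" lookup can be sketched with a tiny in-memory index. This is a stand-in under stated assumptions, not how a production vector database works: `TinyVectorIndex` is a hypothetical class, the vectors are hand-made, and it compares the query against every stored passage, whereas real vector databases typically use approximate nearest-neighbour structures to stay fast at scale.

```python
import heapq
import math


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class TinyVectorIndex:
    """Hypothetical in-memory index: store vectors, return the nearest passages."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, passage: str, vector: list[float]) -> None:
        self._items.append((passage, vector))

    def nearest(self, query: list[float], k: int = 3) -> list[tuple[float, str]]:
        """Return up to k (score, passage) pairs, best match first."""
        scored = ((_cosine(query, vec), passage) for passage, vec in self._items)
        return heapq.nlargest(k, scored)


index = TinyVectorIndex()
# Toy vectors written by hand; a real system would get them from an embedding model.
index.add("Refunds are available within 30 days.", [0.9, 0.1, 0.0])
index.add("The office is open 9 to 5 on weekdays.", [0.0, 0.2, 0.9])
index.add("Refunds go to the original payment method.", [0.8, 0.3, 0.1])

results = index.nearest([0.8, 0.2, 0.1], k=2)
for score, passage in results:
    print(f"{score:.2f}  {passage}")
```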

💡 The cost of pasting text into a prompt is measured in tokens, the AI's unit of text. More retrieved text = more tokens = more cost per question. That's why RAG retrieves only the best few snippets instead of your whole library. See /learn/tokens.
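
A back-of-the-envelope calculation shows why "only the best few snippets" matters. Everything numeric here is an assumption for illustration: the rule of thumb of roughly 4 characters per token is a coarse estimate for English text, the price is made up, and the library is a stand-in of 500 passages.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate, assuming about 4 characters per token."""
    return max(1, len(text) // 4)


def cost_per_question(texts: list[str], price_per_1k_tokens: float) -> float:
    """Cost of pasting all the given texts into one prompt."""
    tokens = sum(estimate_tokens(t) for t in texts)
    return tokens / 1000 * price_per_1k_tokens


PRICE = 0.01  # hypothetical price per 1,000 input tokens, not a real quote

library = ["x" * 2000] * 500   # stand-in: 500 passages of 2,000 characters each
top_snippets = library[:3]     # what a RAG step would paste in

cost_rag = cost_per_question(top_snippets, PRICE)
cost_everything = cost_per_question(library, PRICE)
print(f"3 snippets: {cost_rag:.3f} per question")
print(f"whole library: {cost_everything:.3f} per question")
```

Under these made-up numbers, pasting the whole library costs a few hundred times more per question than pasting three snippets, and a library that size may not fit in a single prompt at all.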

04 / 05 · Watch a retrieval

Watch it look something up.

Below is a tiny pretend company handbook of 7 snippets. Pick a question (or type your own). The demo scores every snippet, lights up the closest 2 to 3, and writes an answer that points at exactly those sources. The retrieval step is the whole aha: watch which snippets light up.

📚 The knowledge base · 7 snippets · retrieved ones light up
Pick a question above, then press Retrieve & answer.
🔒 This runs entirely in your browser. Nothing is sent anywhere. No server, no API, no key.

Honest about the fake: a real system scores by meaning using embeddings. To keep this 100% offline, this demo fakes the scoring with simple word-overlap (it counts shared and related words), so you can see the shape of retrieval without any backend. The aha is the same: look up first, then answer from what you found.
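
For the curious, word-overlap scoring of the kind described above might look like the sketch below. It is not the demo's actual code: the `RELATED` table, the stop-word list, and the three sample snippets are hypothetical, chosen only to show the shape of the idea.

```python
import re

# Hypothetical tables; the demo's real word lists are not shown on this page.
RELATED = {"money": "refund", "refunds": "refund", "reimburse": "refund"}
STOPWORDS = {"a", "the", "is", "are", "to", "for", "i", "my", "can", "get"}


def normalize(text: str) -> set[str]:
    """Lowercase, drop stop words, and map related words to one shared form."""
    words = re.findall(r"[a-z]+", text.lower())
    return {RELATED.get(w, w) for w in words if w not in STOPWORDS}


def overlap_score(question: str, snippet: str) -> int:
    """Count the normalized words the question and the snippet share."""
    return len(normalize(question) & normalize(snippet))


def retrieve(question: str, snippets: list[str], k: int = 3) -> list[str]:
    """Return up to k snippets with a non-zero score, best first."""
    scored = [(overlap_score(question, s), s) for s in snippets]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]


handbook = [
    "Refunds are available within 30 days.",
    "The office is open 9 to 5.",
    "Parking is free for visitors.",
]
hits = retrieve("Can I get my money back?", handbook, k=2)
print(hits)
```

The related-words table is doing by hand what embeddings do automatically: letting "money back" find a snippet that only says "refunds".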

05 / 05 · You've got it

You now understand RAG better than most people who use it daily.

You know it's an open-book exam, not retraining. You can name the pipeline: chunk, embed, store, retrieve, generate. And you watched a retrieval choose its sources. Two honest notes before you go.

1 · It reduces hallucination; it doesn't end it. RAG's answers are only as good as what it retrieved. Bad chunking, a thin document, or a near-miss match can still produce a confident wrong answer, and the model can misread a perfectly good snippet. So for anything that matters: show the sources and keep a human in the loop. More on that in /learn/ai-safety.

2 · RAG vs. the alternatives. Pick by what you're trying to change:

📚 RAG

Give the model knowledge at answer-time.

  • ✓ Best for large, private, or fast-changing info
  • ✓ Policies, catalogs, past tickets & jobs
  • ✓ Update the docs, no retraining
  • ✓ Can cite which snippet it used

🎚 Fine-tuning

Change the model's behavior, not its facts.

  • ✓ Best for "always sound like us"
  • ✓ Or "always output this structure"
  • – Doesn't reliably teach new facts
  • – Re-do it when things change

📄 Long context

Just paste it all in the prompt.

  • ✓ Simplest; great for one small doc
  • – Costs tokens every single call
  • – Doesn't scale to a big library
  • ✓ See /learn/tokens

RAG doing real work: an AI receptionist that answers customer questions from your actual handbook, price list, and past jobs (and cites where each answer came from) is RAG, end to end. That's exactly the kind of thing we build for small businesses, safely.

Built by rabbithole.consulting: custom-built infrastructure that runs your business. This lesson's demo runs entirely in your browser · Free to read.