Learn by clicking · Day 7 of 30 · ~2 min read

What is "RAG", and how does AI read your own documents?

RAG (Retrieval-Augmented Generation) is a simple trick. Before the AI answers, the system first looks up the most relevant snippets from your documents and slips them into the prompt. That grounds the answer in your data, not just in what the model picked up during general training.

The one analogy that makes it click: it's an open-book exam. The model is the smart student, but it has never read your employee handbook. RAG is the moment you hand it the exact right page, open to the right paragraph, just before it answers.

A question

"What's our refund policy?"

→ look up your docs

Retrieve snippets

the 2 to 3 passages closest in meaning

refunds.md · policy.pdf

→ hand to the model

Grounded answer

cites the snippets it used

📚 Answers from your documents
🔄 No retraining: just update the files
🔗 Can cite its sources
🧠 Reduces (not eliminates) made-up answers

A few terms you'll meet, each in one line: a large language model (LLM) is the AI that writes the answer; training is the slow, expensive process that taught it general language; the prompt is the text you send it right now. RAG never touches training. It just makes the prompt smarter. Now let's bust the biggest myth about it.

02 / 05 · The myth

"Don't you have to retrain the AI on my data?"

This is the most common misunderstanding, and it scares people off, because retraining sounds slow, expensive, and risky. It usually isn't what you need at all.

⚠ The myth

"To make AI use my documents, I have to fine-tune or retrain the model on them, and once I do, it can't make things up anymore."

✓ The reality

You don't retrain anything. With RAG, your documents stay exactly where they are. At the moment of the question, the system retrieves the right snippets and pastes them into the prompt. Change a price in your handbook this morning, and the AI uses the new price this afternoon: no model update, no waiting, no retraining bill.

And RAG does not make hallucination impossible. "Hallucination" is when an AI states something false with total confidence. RAG reduces it by handing the model real, relevant text to lean on. But the model can still misread a snippet, blend two together, or answer anyway when the right passage wasn't retrieved. RAG grounds answers; it doesn't make them flawless. We'll come back to this in the honest notes at the end.

The mental model: retraining changes who the model is. RAG changes what the model can see at the instant it answers. For "use my private, changing facts," you almost always want the second one.
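
To make "RAG changes what the model can see" concrete, here is a minimal sketch in Python. It assumes a made-up one-file handbook (`prices.md`) and a hypothetical helper called `build_prompt`; no real model is called. The only point is that editing the file changes the very next prompt, with no training step anywhere.

```python
def build_prompt(question: str, snippets: list[tuple[str, str]]) -> str:
    """Paste snippets into the prompt, each tagged with the file it came from."""
    lines = ["Answer using only the sources below. Cite the source you used.", ""]
    for source, text in snippets:
        lines.append(f"[{source}] {text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)

# Hypothetical handbook: plain data that the model was never trained on.
handbook = {"prices.md": "A standard service call costs $90."}

def prompt_for(question: str) -> str:
    # A real system would retrieve only the best few snippets.
    # This toy handbook has one file, so we pass everything.
    return build_prompt(question, list(handbook.items()))

morning = prompt_for("How much is a service call?")

# Someone edits the handbook at lunchtime. Nothing else changes.
handbook["prices.md"] = "A standard service call costs $95."

afternoon = prompt_for("How much is a service call?")
print(afternoon)
```

The afternoon prompt carries the new price simply because the file was read again at question time.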

03 / 05 · How it works

The whole pipeline, in five steps.

RAG isn't magic. It's a short assembly line. Three stages happen when you set it up, and again only for documents you add or change (chunk, embed, store). Two happen every time someone asks (retrieve, generate).

1 · SETUP

Chunk

Split your docs into bite-size passages

→

2 · SETUP

Embed

Turn each passage into numbers that capture its meaning

→

3 · SETUP

Store

Keep them in a searchable index (a vector database)

→

4 · PER ASK

Retrieve

Find the chunks whose meaning is closest to the question

→

5 · PER ASK

Generate

Feed those chunks + the question to the model; it answers, ideally citing them
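
The five steps above can be sketched end to end in plain Python. This is a toy under loud assumptions: the "embedding" here is just word counts (a real system would call an embedding model), the "store" is an ordinary list, and step 5 only builds the prompt rather than calling a model. All function names and the two sample files are invented for illustration.

```python
import math
import re
from collections import Counter

def chunk(doc: str) -> list[str]:
    """Step 1: split a document into passages (here, one per paragraph)."""
    return [p.strip() for p in doc.split("\n\n") if p.strip()]

def embed(text: str) -> Counter:
    """Step 2 (stand-in): word counts instead of a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(docs: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Step 3: store (source, passage, vector) rows. A list stands in for a vector database."""
    return [(name, p, embed(p)) for name, text in docs.items() for p in chunk(text)]

def retrieve(index, question: str, k: int = 2) -> list[tuple[str, str]]:
    """Step 4: return the k passages closest to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda row: similarity(q, row[2]), reverse=True)
    return [(name, passage) for name, passage, _ in ranked[:k]]

def generate_prompt(question: str, hits) -> str:
    """Step 5 (prompt only): the text that would be sent to the model."""
    sources = "\n".join(f"[{name}] {passage}" for name, passage in hits)
    return f"Use only these sources and cite them:\n{sources}\n\nQuestion: {question}"

docs = {
    "refunds.md": "Refunds are issued within 30 days of purchase.\n\nStore credit never expires.",
    "hours.md": "The shop is open Monday to Friday, 9 to 5.",
}
index = build_index(docs)                                 # setup
hits = retrieve(index, "When are refunds issued?", k=1)   # per ask
print(generate_prompt("When are refunds issued?", hits))
```

Notice the split: `build_index` runs at setup, while `retrieve` and `generate_prompt` run on every question.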

Two terms unpacked, plainly:

An embedding is a passage's meaning turned into a list of numbers. The clever part: text that means similar things lands on nearby numbers, so "money back" sits close to "refund" even though they share no words. That's how retrieval matches by meaning, not just keywords.
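
A small sketch of "nearby numbers", assuming three hand-made, three-number embeddings. The numbers are invented for illustration; real embeddings come from a model and usually have hundreds or thousands of numbers each. Closeness is measured with cosine similarity, one common choice.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: near 1.0 means same direction, near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Invented toy embeddings, not output from any real model.
vectors = {
    "refund":        [0.9, 0.1, 0.0],
    "money back":    [0.8, 0.2, 0.1],
    "opening hours": [0.0, 0.1, 0.9],
}

close = cosine(vectors["refund"], vectors["money back"])
far = cosine(vectors["refund"], vectors["opening hours"])
print(f"refund vs money back:    {close:.2f}")
print(f"refund vs opening hours: {far:.2f}")
```

"refund" and "money back" share no words, yet their vectors point almost the same way, so they score as close.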

A vector database is just a search index built for those number-lists. Ask it "what's nearest to this question?" and it returns the closest passages, fast, even across thousands of documents.
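
To show the idea of "what's nearest to this question?", here is a brute-force stand-in for a vector database. The class name `TinyVectorIndex` and the toy vectors are hypothetical. It compares the query against every stored vector, which is fine for a handful of passages; real vector databases typically use smarter index structures so they don't have to check every row.

```python
import heapq
import math

class TinyVectorIndex:
    """Toy in-memory index: stores (passage, vector) rows and searches them all."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, list[float]]] = []

    def add(self, passage: str, vector: list[float]) -> None:
        self.rows.append((passage, vector))

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.hypot(*a) * math.hypot(*b)
        return dot / norm if norm else 0.0

    def nearest(self, query: list[float], k: int = 2) -> list[str]:
        """Return the k passages whose vectors are closest to the query."""
        best = heapq.nlargest(k, self.rows, key=lambda row: self._cosine(query, row[1]))
        return [passage for passage, _ in best]

index = TinyVectorIndex()
index.add("Refunds within 30 days", [0.9, 0.1, 0.0])       # invented vectors
index.add("Money-back guarantee on repairs", [0.8, 0.2, 0.1])
index.add("Open Monday to Friday", [0.0, 0.1, 0.9])

# Pretend this vector is the embedding of a refund question.
print(index.nearest([1.0, 0.0, 0.0], k=2))
```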

💡 The cost of pasting text into a prompt is measured in tokens, the AI's unit of text. More retrieved text = more tokens = more cost per question. That's why RAG retrieves only the best few snippets instead of your whole library. See /learn/tokens.
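
A back-of-the-envelope sketch of that cost difference. Two assumptions are doing all the work here: the "about 4 characters per token" figure is only a rough rule of thumb for English text, and the price per million tokens is a made-up number, not any provider's real rate.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (rule of thumb, not exact)."""
    return max(1, len(text) // 4)

def prompt_cost(texts: list[str], price_per_million_tokens: float) -> float:
    """Estimated cost of pasting these texts into one prompt."""
    tokens = sum(estimate_tokens(t) for t in texts)
    return tokens * price_per_million_tokens / 1_000_000

PRICE = 3.00                  # hypothetical dollars per million input tokens
snippet = "x" * 400           # one 400-character snippet, roughly 100 tokens
library = [snippet] * 1000    # hypothetical library of 1,000 such snippets

few = prompt_cost(library[:3], PRICE)      # RAG: send the best 3 snippets
everything = prompt_cost(library, PRICE)   # paste the whole library every time

print(f"3 snippets per question:    ${few:.4f}")
print(f"whole library per question: ${everything:.4f}")
```

Under these assumptions the whole-library prompt costs several hundred times more per question, and that gap repeats on every single call.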

04 / 05 · Watch a retrieval

Watch it look something up.

Below is a tiny pretend company handbook of 7 snippets. Pick a question (or type your own). The demo scores every snippet, lights up the closest 2 to 3, and writes an answer that points at exactly those sources. The retrieval step is the whole aha: watch which snippets light up.

Ask the handbook

📚 The knowledge base · 7 snippets · retrieved ones light up

Pick a question above, then press Retrieve & answer.

🔒 This runs entirely in your browser. Nothing is sent anywhere. No server, no API, no key.

Honest about the fake: a real system scores by meaning using embeddings. To keep this 100% offline, this demo fakes the scoring with simple word-overlap (it counts shared and related words), so you can see the shape of retrieval without any backend. The aha is the same: look up first, then answer from what you found.
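
For the curious, word-overlap scoring of this kind might look roughly like the sketch below. It is not the demo's actual code: the related-word table, the list of ignored words, and the sample snippets are all invented here to show the shape of the idea.

```python
import re

# Hypothetical related-word table: each word maps to a shared "concept" word.
RELATED = {"refunds": "refund", "money": "refund", "back": "refund",
           "hours": "open", "opening": "open"}
# Small words that say nothing about the topic, so they are ignored.
IGNORED = {"the", "a", "is", "are", "our", "what", "s", "do", "i", "my",
           "to", "of", "we", "can", "get"}

def key_words(text: str) -> set[str]:
    """Lowercase the text, map related words to one concept, drop ignored words."""
    found = re.findall(r"[a-z]+", text.lower())
    return {RELATED.get(w, w) for w in found} - IGNORED

def score(question: str, snippet: str) -> int:
    """Count the shared (or related) words between question and snippet."""
    return len(key_words(question) & key_words(snippet))

def top_snippets(question: str, snippets: list[str], k: int = 3) -> list[str]:
    """Return up to k snippets with the highest non-zero overlap score."""
    scored = [(score(question, s), s) for s in snippets]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]

handbook = [
    "Refunds are issued within 30 days.",
    "We are open 9 to 5 on weekdays.",
    "Parking is behind the building.",
]
print(top_snippets("Can I get my money back?", handbook))
```

The related-word table is what lets "money back" find the refund snippet. It is also the weak spot: it only knows the pairs someone typed in, which is exactly the gap real embeddings fill.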

05 / 05 · You've got it

You now understand RAG better than most people who use it daily.

You know it's an open-book exam, not retraining. You can name the pipeline: chunk, embed, store, retrieve, generate. And you watched a retrieval choose its sources. Two honest notes before you go.

1 · It reduces hallucination; it doesn't end it. RAG's answers are only as good as what it retrieved. Bad chunking, a thin document, or a near-miss match can still produce a confident wrong answer, and the model can misread a perfectly good snippet. So for anything that matters: show the sources and keep a human in the loop. More on that in /learn/ai-safety.
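
One way to act on "show the sources and keep a human in the loop" is a simple guard in front of the model. The sketch below is an assumption-heavy illustration: the function name, the score scale (0 to 1, higher is closer), and the 0.5 cut-off are all hypothetical, and a real cut-off would need tuning on your own documents. A guard like this catches weak retrieval only; it cannot catch a model misreading a good snippet.

```python
def answer_or_escalate(scored_snippets: list[tuple[float, str, str]],
                       min_score: float = 0.5) -> dict:
    """Decide whether retrieval was good enough to answer.

    scored_snippets holds (score, source, text) rows, higher score = closer match.
    """
    good = [row for row in scored_snippets if row[0] >= min_score]
    if not good:
        # Nothing matched closely: hand off instead of letting the model guess.
        return {"action": "escalate_to_human", "sources": []}
    good.sort(key=lambda row: row[0], reverse=True)
    # Always return the sources so a person can check the answer against them.
    return {"action": "answer_with_sources", "sources": [src for _, src, _ in good]}

strong = answer_or_escalate([(0.82, "refunds.md", "Refunds within 30 days."),
                             (0.31, "hours.md", "Open 9 to 5.")])
weak = answer_or_escalate([(0.22, "hours.md", "Open 9 to 5.")])
print(strong)
print(weak)
```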

2 · RAG vs. the alternatives. Pick by what you're trying to change:

📚 RAG

Give the model knowledge at answer-time.

✓ Best for large, private, or fast-changing info
✓ Policies, catalogs, past tickets & jobs
✓ Update the docs, no retraining
✓ Can cite which snippet it used

🎚 Fine-tuning

Change the model's behavior, not its facts.

✓ Best for "always sound like us"
✓ Or "always output this structure"
– Doesn't reliably teach new facts
– Re-do it when things change

📄 Long context

Just paste it all in the prompt.

✓ Simplest; great for one small doc
– Costs tokens every single call
– Doesn't scale to a big library
→ See /learn/tokens

RAG doing real work: an AI receptionist that answers customer questions from your actual handbook, price list, and past jobs (and cites where each answer came from) is RAG, end to end. That's exactly the kind of thing we build for small businesses, safely.

We connect AI to your real documents (handbooks, catalogs, past jobs) safely →

Day 7 of 30 free, plain-English AI lessons for small business.