Why your chatbot makes things up (and what RAG has to do with it)
A chatbot doesn't lie; it generates the most likely text, not the truth. A plain-language take on hallucinations and how RAG curbs them by attaching sources before the model answers.
Anyone who has spent five minutes with a chatbot has seen the same thing: a confident, fluent answer — and complete nonsense. An invented procedure, a law section that doesn’t exist, a book nobody ever wrote. This isn’t a bug they’ll “fix in the next version.” It comes straight from how the model actually works — which is exactly why it can be curbed.
Where the made-up stuff comes from
A language model doesn’t carry an internal database of facts to check its answers against. It does one thing: it predicts the most likely next word (strictly speaking, the next token, which is a word or a fragment of one), then the next, then the next. It learned this from an enormous amount of text, so those predictions usually sound reasonable. But “sounds reasonable” is not the same as “is true.”
Hence the technical name: a hallucination. The model isn’t lying in any human sense — it doesn’t even know it’s making things up. It simply generated text that statistically fits, and it happened that the content was false.
You see it most clearly with specifics. Ask for an exact section number, a date, or a quote — something you can’t “guess from the style.” The model will still give you something, because its job is to finish the sentence, not to say “I don’t know.” And that “something” looks exactly as confident as a real answer.
A concrete example: you ask for the deadline to file some return. The correct date and a made-up date sound identical — both are just “a day of the month.” The model has no signal that stops it on the wrong one; it simply picks whichever fits the rest of the sentence statistically. So you can’t “tell from the tone” whether the answer is true. The model’s confidence says nothing about its accuracy — and that’s the crux of the problem when a client’s decision or a contract clause is at stake.
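To make that concrete, here is a toy sketch in Python. It is not a real language model: the three candidate dates and their probabilities are invented for illustration, and `pick_next_word` is a hypothetical helper. What it shows is the shape of the problem: the picking step looks only at probabilities, and nothing in it checks which date is actually correct.

```python
import random

# Hypothetical, hand-written probabilities for the word that follows
# "The filing deadline is April ...". A real model computes such a
# distribution over its whole vocabulary; these numbers are invented.
NEXT_WORD_PROBS = {"15": 0.40, "30": 0.35, "1": 0.25}


def pick_next_word(probs, rng):
    """Sample one continuation in proportion to its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    # Note what is missing: no lookup, no fact check, only weights.
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so repeated runs give the same picks
samples = [pick_next_word(NEXT_WORD_PROBS, rng) for _ in range(1000)]

# Count how often each candidate date was produced.
for word in NEXT_WORD_PROBS:
    print(word, samples.count(word))
```

Run it and each date gets picked many times. Whichever one happens to be the real deadline, the code treats it exactly like the others.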
What RAG does
RAG (retrieval-augmented generation) is a simple idea for working around this: before you ask the model to answer, first find the relevant source passages — in your documents, knowledge base, regulations — and attach them to the question. Then you tell it: answer only based on what you were given.
That’s called grounding — anchoring the answer in real text instead of in the model’s memory.
An analogy: it’s the difference between a closed-book exam and an open-book one. From memory, a student will say anything as long as it sounds confident. With the book open, they have a specific passage in front of them and answer from it. RAG flips a chatbot from the first mode into the second.
This has two consequences that matter in practice. First, the model answers primarily from your documents, rather than from general internet knowledge — so it can know your own procedures, the ones that exist nowhere else. Second, since the answer comes from a specific passage, you can show which one — which means you can check it instead of taking it on faith.
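Here is a minimal sketch of that “attach the sources” step in Python. The `Passage` type, the `build_grounded_prompt` helper, the file name, and the prompt wording are all assumptions made up for this illustration. The search that finds the passages and the call to the model are left out.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # where the text came from, e.g. a file name (hypothetical)
    text: str    # the passage itself


def build_grounded_prompt(question, passages):
    """Attach numbered source passages to the question before it reaches the model."""
    lines = [
        "Answer using only the numbered passages below.",
        "Cite the passage number for every claim.",
        "If the passages do not contain the answer, say you don't know.",
        "",
    ]
    # Number each passage and keep its source next to it,
    # so a citation in the answer can be traced back.
    for i, p in enumerate(passages, start=1):
        lines.append(f"[{i}] ({p.source}) {p.text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


# Invented example data.
passages = [
    Passage("onboarding.txt", "New laptops are requested through the IT desk form."),
]
prompt = build_grounded_prompt("How do I request a laptop?", passages)
print(prompt)
```

Because every passage carries a number and a source name, an answer that cites “[1]” can be traced back to its text and checked.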
Why it isn’t magic
One caveat: this is no magic wand. RAG shifts the problem; it doesn’t erase it.
The answer is only as good as the passages the search step found. If the system hands the model the wrong passage, or finds nothing useful, the model will still make things up, just now “with an open but wrong book.” So the whole weight moves onto retrieval: how you actually search those documents.
You can search by words (literal matching) or by meaning (matching paraphrases and synonyms). Each one alone has holes: word search misses a question phrased differently from the document, and meaning search can overlook a rare, literal term such as a number, a symbol, or a field name. That’s why a good RAG system doesn’t pick one method. It combines them, and it measures on real questions whether the search actually finds the right passage.
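One common way to combine the two searches is reciprocal rank fusion (RRF): each method produces its own ranked list, and a document scores higher the nearer the top it appears in either list. The sketch below assumes the two rankings already exist. The document ids, the question, and the `hit_rate` helper are invented for illustration, and `k=60` is just a commonly used default.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one.

    Each document earns 1 / (k + rank) for every list it appears in,
    so a document ranked well by either method rises toward the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hit_rate(results_per_question, expected, top_n=3):
    """Fraction of questions whose expected document is in the top_n results."""
    hits = sum(
        1
        for question, doc_id in expected.items()
        if doc_id in results_per_question[question][:top_n]
    )
    return hits / len(expected)


# Invented rankings for one question.
word_ranking = ["form_17b", "faq", "intro"]            # literal matching
meaning_ranking = ["leave_policy", "faq", "form_17b"]  # matching by meaning
fused = reciprocal_rank_fusion([word_ranking, meaning_ranking])
print(fused)

# Measure on "real" questions: here a single made-up one.
results = {"Where is form 17b?": fused}
expected = {"Where is form 17b?": "form_17b"}
print(hit_rate(results, expected, top_n=1))
```

In this made-up case `form_17b` ends up first even though the meaning search ranked it last, because the word search put it at the top. The `hit_rate` number is the part worth tracking over a real set of questions: it tells you whether the right document actually lands among the top results.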
In other words: RAG doesn’t magic hallucinations away. What it gives you are two levers a bare chatbot doesn’t have — grounding in a source and the ability to verify — provided you get the search right.
Next
If you want to see what this looks like in practice — with answers that point to the exact file and line of the source, and an honest result where a single method fails — I wrote up the whole build step by step: how I built a RAG with file:line citations.