In the Loop: How RAG Fails, and How to Fix It
In this episode of In The Loop, ML Evangelist Micaela Kaplan breaks down seven common failure points in retrieval-augmented generation (RAG) systems—and how to fix them. Learn how to diagnose issues in both the retrieval and generation stages to build more reliable, effective pipelines.
Transcript
Hi, I’m Micaela Kaplan, the ML Evangelist at HumanSignal, and this is In The Loop—the series where we help you stay up to date with all things data science and machine learning.
Last time, we explored what a RAG system is and how it works. Today, we’re covering how RAG systems can fail—and what you can do to fix them.
There are many ways a RAG system can go wrong. In their 2024 paper, Barnett et al. identified seven common failure points, from incomplete knowledge bases to generation issues. These failures fall into two broad categories: retrieval errors and generation errors.
Retrieval errors come from the knowledge base or the way documents are pulled from it. Generation errors come from the language model itself and are often harder to fix.
Let’s start with retrieval errors.
The first and most obvious: the answer isn’t in the knowledge base at all. Barnett et al. call this Missing Content. It’s often tricky to identify—it usually requires a manual review—but fixing it is straightforward: add the missing information.
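One rough way to narrow down that manual review is to flag queries where even the best-matching document scores poorly. Here’s a minimal sketch of that idea; the `embed()` function and the 0.3 threshold are placeholders you’d swap for your own embedding model and a value tuned on your data.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def flag_for_review(query, doc_embeddings, embed, threshold=0.3):
    """Flag a query for manual review when even its best-matching
    document scores below a similarity threshold."""
    q_vec = embed(query)                       # embed() is a hypothetical embedding function
    best = max(cosine(q_vec, d) for d in doc_embeddings)
    return best < threshold                    # True -> likely Missing Content; review by hand
```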
If the content exists but isn’t retrieved, you may be looking at Missed Top-Ranked Documents. Most systems return only the top K ranked documents, so if the correct document isn’t among those top K, it never reaches the model and you may need to retrain your ranker. A human-in-the-loop review, where people label which retrieved documents are actually relevant to each question, gives you both a way to measure the problem and the data to fix it.
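A quick way to check for this failure is to measure recall@K over a small hand-labeled set of query-to-document pairs. The sketch below assumes a hypothetical `retrieve()` function that returns ranked document IDs; the labeled pairs would come from that human review.

```python
def recall_at_k(labeled_pairs, retrieve, k=5):
    """Fraction of labeled queries whose known-relevant document
    shows up in the top-k retrieved results."""
    hits = 0
    for query, relevant_id in labeled_pairs:
        top_k_ids = retrieve(query, k=k)       # retrieve() is a hypothetical ranked retriever
        if relevant_id in top_k_ids:
            hits += 1
    return hits / len(labeled_pairs)

# Low recall@K with a complete knowledge base points at the ranker,
# and the same human-labeled pairs can feed a retraining pass.
```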
If the document is in the top K but still not making it into the model’s context window, you’re likely hitting Consolidation Strategy Limitations. Review how you consolidate and truncate context, and make sure the answer-relevant documents actually make it into the final prompt.
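As a concrete example, here’s a minimal consolidation sketch that packs the highest-ranked chunks into a rough token budget. The word count stands in for a real tokenizer, and the 3,000-token budget is an arbitrary placeholder.

```python
def consolidate_context(chunks, max_tokens=3000):
    """Greedily pack the highest-ranked chunks into the prompt without
    blowing past a rough token budget."""
    packed, used = [], 0
    for chunk in chunks:                       # assumes chunks are sorted best-first
        cost = len(chunk.split())              # crude token estimate; swap in a real tokenizer
        if used + cost > max_tokens:
            break                              # everything after this never reaches the model
        packed.append(chunk)
        used += cost
    return "\n\n---\n\n".join(packed)
```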
Now let’s talk about generation errors.
Not Extracted means the answer is in the context, but the model doesn’t surface it. This could be due to noisy or conflicting information. Denoising your context and cleaning up contradictions in your knowledge base can help.
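One simple form of denoising is dropping near-duplicate chunks before they reach the prompt, since repeated and slightly conflicting copies of the same passage add exactly this kind of noise. Here’s a sketch using word-overlap (Jaccard) similarity; the 0.8 threshold is illustrative.

```python
def jaccard(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def drop_near_duplicates(chunks, threshold=0.8):
    """Keep a chunk only if it isn't nearly identical to one already kept."""
    kept = []
    for chunk in chunks:
        if all(jaccard(chunk, k) < threshold for k in kept):
            kept.append(chunk)
    return kept
```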
Wrong Format happens when the model outputs the right answer in the wrong structure—like plain text instead of JSON. You can fix this by clearly separating formatting instructions or using structured output tools available in many LLM APIs.
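Here’s one way to do that separation: keep the formatting contract in its own block, then validate the output and retry instead of trusting free-form text. The `llm()` call is a stand-in, since the exact structured-output API differs by provider.

```python
import json

FORMAT_INSTRUCTIONS = (
    "Respond with ONLY a JSON object of the form "
    '{"answer": "<string>", "sources": ["<doc id>", ...]} and nothing else.'
)

def answer_as_json(question, context, llm, max_retries=2):
    """Keep the formatting contract separate from the task, then validate
    the output and retry instead of trusting free-form text."""
    prompt = f"Context:\n{context}\n\nQuestion: {question}\n\n{FORMAT_INSTRUCTIONS}"
    for _ in range(max_retries + 1):
        raw = llm(prompt)                      # llm() is a hypothetical completion function
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            prompt += "\n\nYour previous reply was not valid JSON. " + FORMAT_INSTRUCTIONS
    raise ValueError("Model never returned valid JSON")
```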
Incorrect Specificity refers to answers that are technically correct but either too general or too specific to be useful. This might be caused by overly vague user questions or rigid expectations from the system designer. Fixes include improving prompt design, guiding users to ask better questions, or rethinking the system’s output expectations.
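On the prompt-design side, one option is to state the audience and the expected level of detail explicitly. This template is purely illustrative, and the default values are placeholders you would set for your own system.

```python
PROMPT_TEMPLATE = """Answer the question using only the context below.
Audience: {audience}
Level of detail: {detail_level}

Context:
{context}

Question: {question}
"""

def build_prompt(question, context,
                 audience="a new support engineer",
                 detail_level="two to three sentences with one concrete example"):
    # Pinning the audience and level of detail is one way to steer answers
    # away from being too general or too specific for the user's needs.
    return PROMPT_TEMPLATE.format(audience=audience, detail_level=detail_level,
                                  context=context, question=question)
```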
The final issue is Incomplete Answers. This often happens when you ask a model to return multiple pieces of information—for example, “What are the key points in documents A, B, and C?”—but only get the answer for A. The solution is to break compound queries into separate questions and process them individually.
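Here’s a minimal sketch of that decomposition, assuming a hypothetical `rag_answer()` function that runs a single question through the full pipeline.

```python
def answer_compound_query(sub_questions, rag_answer):
    """Run each sub-question through the RAG pipeline on its own,
    then return the individual answers together."""
    return {q: rag_answer(q) for q in sub_questions}

# Instead of one compound prompt, ask three focused questions.
results = answer_compound_query(
    ["What are the key points in document A?",
     "What are the key points in document B?",
     "What are the key points in document C?"],
    rag_answer=lambda q: f"(answer to: {q})",  # stand-in for the real pipeline
)
```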
To sum it up: RAG errors come from either the retrieval or generation stage. Diagnosing them takes careful review, but with the right tools and strategy, most issues can be resolved.
What would you like to see in the next In The Loop? Let me know in the comments. Thanks for watching, and don’t forget to like and subscribe.
Looking for more on debugging RAG systems? Check out our March 13th webinar for a detailed walkthrough.