RAG - Retrieval Augmented Generation Concepts¶

🤖 What is RAG (Retrieval Augmented Generation)?¶

The Basic Idea¶

RAG enhances language models by giving them access to external knowledge.

Without RAG:¶

User: "What are the safety procedures for Model X-500?"
LLM: "I don't have specific information about Model X-500..."

With RAG:¶

User: "What are the safety procedures for Model X-500?"

[System retrieves relevant documentation]

LLM: "According to the safety manual, Model X-500 requires:
1. Lock-out tag-out procedures...
2. Personal protective equipment including...
3. ..."

🏯 RAG Pipeline¶

┌──────────────────┐
│      Query       │
└──────┬───────────┘
       │
       ▼
┌──────────────────┐
│   RETRIEVAL      │ ← Search through docs
│  - Vector Search │
│  - BM25, ...     │
└──────┬───────────┘
       │
       ▼
┌──────────────────┐
│  Retrieved Docs  │ ← Docs for context
│  [Doc1, Doc2...] │
└──────┬───────────┘
       │
       ▼
┌──────────────────┐
│   GENERATION     │ ← LLM uses docs to answer
│   (LLM + Docs)   │
└──────┬───────────┘
       │
       ▼
┌──────────────────┐
│      Answer      │
└──────────────────┘
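The flow in the diagram can be sketched in a few lines of Python. This is only a minimal illustration, not RAGO's implementation: `tokenize`, `retrieve`, `generate`, and `fake_llm` are hypothetical names, retrieval is naive word overlap standing in for vector search or BM25, and the LLM is a stub that simply returns its prompt.

```python
from typing import Callable


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().replace("?", " ").replace(".", " ").split())


def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """RETRIEVAL step: rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(query_words & tokenize(d)), reverse=True)
    return ranked[:top_k]


def generate(query: str, context_docs: list[str], llm: Callable[[str], str]) -> str:
    """GENERATION step: put the retrieved documents and the query into one prompt."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm(prompt)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption): returns the prompt unchanged."""
    return prompt


docs = [
    "Model X-500 requires lock-out tag-out procedures before maintenance.",
    "The cafeteria opens at 8 am.",
    "Operators of Model X-500 must wear personal protective equipment.",
]
query = "What are the safety procedures for Model X-500?"
retrieved = retrieve(query, docs)
answer = generate(query, retrieved, fake_llm)
print(answer)
```

With a real model in place of `fake_llm`, the prompt built in `generate` is what grounds the answer in the retrieved documents.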

Why RAG?¶

Up-to-date Information: Access current documents without retraining
Domain Specificity: Use your own proprietary knowledge
Transparency: See which documents informed the answer
Reduced Hallucinations: Grounded in actual documents

✅ RAG vs. Fine-tuning¶

Aspect	RAG	Fine-tuning
Data Changes	Update documents immediately	Requires retraining
Cost	Low (storage and some GPU)	High (many GPUs, time)
Transparency	Can cite sources	Black box
Use Case	Dynamic knowledge	Specialized behavior

⚙️ RAG Configurations¶

The RAG configuration consists of:

A Splitter / Node Parser: during the pre-processing phase, the splitter splits each document into chunks (not implemented at this stage).
A Retriever: at inference time, the retriever selects the chunks that form the context used to answer a query.
A Reader / Synthesizer: also at inference time, the reader uses the selected chunks to answer the query.

⚠️ Many methods exist for each component; only the methods implemented at this stage are described in the following subsections.

More information about the RAG configuration space in RAGO is available here 👉 RAG Configuration

Splitter (not implemented yet)¶

⚠️ Sentence Splitter, Recursive Character Text Splitter, and Semantic Splitter are planned for future versions.
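To make the role of a splitter concrete, here is a generic sketch of a sentence-based splitter. Since RAGO does not implement splitters yet, nothing here reflects its API: `split_into_chunks` and its `max_chars` parameter are hypothetical, and the sentence detection is a simple regular expression rather than a robust tokenizer.

```python
import re


def split_into_chunks(text: str, max_chars: int = 85) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own oversized chunk.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = (
    "Model X-500 requires lock-out tag-out procedures. "
    "Operators must wear protective equipment. "
    "Inspect the machine before each shift."
)
chunks = split_into_chunks(text, max_chars=85)
for chunk in chunks:
    print(repr(chunk))
```

Keeping sentences whole avoids cutting a statement in the middle, at the cost of chunks with uneven lengths.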

Retriever¶

Once we have a set of chunks, a retriever selects the chunks that are useful for answering a given query. More information on using RAGO's retriever configuration spaces is available here 👉 Retriever Methods
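As an illustration of chunk selection, the sketch below scores each chunk against the query with cosine similarity and keeps the best ones. It is a toy example under stated assumptions: `embed` is a bag-of-words count vector standing in for a real embedding model, and `retrieve_top_k` is a hypothetical name, not a RAGO retriever.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-count vector (assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Score every chunk against the query and return the k best (score, chunk) pairs."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


chunks = [
    "thomas is 12 years old",
    "the weather is sunny today",
    "thomas likes football",
]
results = retrieve_top_k("how old is thomas", chunks, k=2)
for score, chunk in results:
    print(f"{score:.2f}  {chunk}")
```

Here the chunk sharing the most terms with the query ranks first; a real vector retriever follows the same score-and-rank pattern with learned embeddings.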

Reader¶

The classical reader method takes the retrieved chunks, compacts them into a single context block, and uses the LLM to generate the answer directly. More complex reader methods proposed by the LlamaIndex library, such as Refine, Compact & Refine, and Tree Summarize, are implemented in the RAGO library. More information here 👉 Reader Methods
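The difference between the classical reader and a Refine-style reader can be sketched as follows. This is a simplified illustration, not the LlamaIndex or RAGO code: `classical_reader`, `refine_reader`, and `counting_llm` are hypothetical names, the prompt wording is invented, and the LLM is a stub that only records the prompts it receives.

```python
from typing import Callable

LLM = Callable[[str], str]


def classical_reader(query: str, chunks: list[str], llm: LLM) -> str:
    """Classical reader: one LLM call over all chunks merged into one context block."""
    context = "\n".join(chunks)
    return llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")


def refine_reader(query: str, chunks: list[str], llm: LLM) -> str:
    """Refine-style reader: one LLM call per chunk, each call updating the previous answer."""
    answer = ""
    for i, chunk in enumerate(chunks):
        if i == 0:
            # First chunk: answer the question from this chunk alone.
            prompt = f"Context:\n{chunk}\n\nQuestion: {query}\nAnswer:"
        else:
            # Later chunks: ask the LLM to improve the existing answer.
            prompt = (
                f"Question: {query}\nExisting answer: {answer}\n"
                f"New context:\n{chunk}\n\nRefined answer:"
            )
        answer = llm(prompt)
    return answer


calls: list[str] = []


def counting_llm(prompt: str) -> str:
    """Stub LLM (assumption): records each prompt and returns a placeholder answer."""
    calls.append(prompt)
    return f"answer after call {len(calls)}"


chunks = ["Thomas is 12.", "Thomas lives in Paris."]
query = "How old is Thomas?"

classical_answer = classical_reader(query, chunks, counting_llm)
calls_for_classical = len(calls)

refined_answer = refine_reader(query, chunks, counting_llm)
calls_for_refine = len(calls) - calls_for_classical
print(calls_for_classical, calls_for_refine)
```

The trade-off is visible in the call counts: the classical reader uses a single call but needs all chunks to fit in one prompt, while the Refine-style reader makes one call per chunk.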

📄 RAG Output in RAGO¶

A RAGOutput, as its name suggests, is the output of a RAG for a given query. It contains:

The answer of the RAG to the query.
The retrieved_documents used by the RAG to answer the query.

Below is an example usage of the RAGOutput class:

from rago.data_objects import RAGOutput, RetrievedContext

# First argument: the answer; second argument: the retrieved documents.
RAGOutput("Thomas is 12.", [RetrievedContext("Thomas is 12.")])

A RAGOutput can be built manually, as in the example, or produced by running a RAG.