Retriever methods¶
Once we have a set of chunks, we can use a retriever to select the chunks that are useful for answering a given query.
VectorIndexRetriever¶
The Vector Index Retriever is the basic retriever. The idea is to use an embedding model to compute a similarity score, using cosine similarity, between each chunk and the query.
The retrieved chunks are the \(k\) chunks with the highest similarity score. Hence, the parameters are:

- top k: the number of retrieved chunks
- similarity score threshold: a threshold between 0 and 1 above which a retrieved chunk is kept
- embedding: the encoder language model used to embed the documents and queries
To find relevant documents, we measure cosine similarity:

\[
\text{similarity}(A, B) = \frac{A \cdot B}{|A| \, |B|}
\]

Where:

- \(A \cdot B\) = dot product, i.e. how aligned the vectors are. A similarity of 1 means "identical meaning" and 0 means "unrelated".
- \(|A|\) = length of vector \(A\)
- \(|B|\) = length of vector \(B\)
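For intuition, here is a minimal sketch of vector retrieval in plain Python (not the RAGO API); it assumes the chunks and the query have already been embedded by some encoder model:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_emb, chunk_embs, k=3, score_threshold=0.0):
    """Return (chunk index, score) pairs for the k chunks most similar
    to the query whose similarity is at least score_threshold."""
    scores = [(i, cosine_similarity(query_emb, e)) for i, e in enumerate(chunk_embs)]
    scores = [(i, s) for i, s in scores if s >= score_threshold]
    return sorted(scores, key=lambda t: t[1], reverse=True)[:k]
```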
BM25 Retriever¶
BM25 stands for Best Match 25. The BM25 retriever scores each document based on the terms (words) it shares with the query, using the following formula:

\[
\text{BM25}(D, Q) = \sum_{i=1}^{n} IDF(q_i) \cdot \frac{TF(q_i, D) \cdot (k_1 + 1)}{TF(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{avgdl}\right)}
\]
where:

- \(q_i\): The \(i\)-th term in the query \(Q\).
- \(n\): The total number of terms in the query \(Q\).
- \(TF(q_i, D)\): Term Frequency, i.e., the number of times term \(q_i\) appears in document \(D\).
- \(|D|\): The length of the document \(D\) (e.g., the number of words in \(D\)).
- \(avgdl\): The average document length across all documents in the corpus.
- \(k_1\): A tuning parameter that controls the saturation of term frequency, typically set between \(1.2\) and \(2.0\).
- \(b\): A tuning parameter that controls the degree of length normalization, typically set around \(0.75\).
- \(IDF(q_i)\): Inverse Document Frequency, calculated as:
\[
IDF(q_i) = \ln\left(\frac{N - DF(q_i) + 0.5}{DF(q_i) + 0.5} + 1\right)
\]

where:

- \(N\) is the total number of documents.
- \(DF(q_i)\) is the number of documents containing the term \(q_i\).
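To make the formula concrete, here is a minimal plain-Python BM25 scorer (a sketch, not the RAGO implementation; documents are assumed to be pre-tokenized lists of terms, and the IDF follows the smoothed variant above):

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Score one document against a query with BM25.

    query_terms: list of query terms; doc: list of terms in the document;
    corpus: list of all documents, each a list of terms.
    """
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for term in query_terms:
        tf = doc.count(term)                              # TF(q_i, D)
        df = sum(1 for d in corpus if term in d)          # DF(q_i)
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)   # IDF(q_i)
        score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score
```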
When to Use BM25¶
Good for:

- Exact term matching (product codes, model numbers)
- Queries with rare, specific keywords
- Multilingual scenarios (no pre-trained embeddings needed)
- Explainable retrieval (you can see why a document matched)
Less good for:

- Synonyms ("car" vs "automobile")
- Paraphrasing
- Semantic similarity
Vector vs BM25¶
| Aspect | Vector Search | BM25 |
|---|---|---|
| Type | Semantic | Lexical |
| Synonyms | ✅ Understands | ❌ Misses |
| Exact Match | Sometimes misses | ✅ Perfect |
| Speed | Fast (with indexing) | Very fast |
| Setup | Needs embedding model | No training needed |
| Multilingual | Model-dependent | Works naturally |
Best practice: Use hybrid retrieval (combine both)!
⚠️ Hybrid Retriever and Cluster-Based Retriever will be implemented in future versions.
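Until those land in RAGO, a common way to combine vector search and BM25 yourself is reciprocal rank fusion (RRF). Below is a minimal, library-agnostic sketch; the `rrf` helper and the constant 60 are conventional illustrative choices, not part of RAGO:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids with reciprocal rank fusion.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked highly by either retriever rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# e.g. fuse the output of a vector retriever with a BM25 retriever:
# fused = rrf([vector_ranking, bm25_ranking])
```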
Retrieved Context¶
The retrieved context is a list of RetrievedContext objects. A retrieved context contains:

- a text: a chunk of information (e.g. a document, a sentence, or a paragraph)
- optionally, its relevancy (the score) and its embedding (the vector representation of the text)
Below is an example usage of the RetrievedContext class:
from rago.data_objects import RetrievedContext
RetrievedContext(text="Thomas will be twelve in 2012.", embedding=[0, 0, 0, 0, 0, 0, 0], score=0.8)
See the Retrieved Context dataclass for more details.
Retriever Wrappers¶
The retriever wrapper is a key component of the RAG pipeline: it takes in a query and returns the context it deems relevant for answering that query. A retriever wrapper can be instantiated as follows:
from rago.model.wrapper.retriever.langchain_retriever import LangchainRetrieverWrapper
from rago.model.configs.retriever_config.langchain import LangchainRetrieverConfig
from rago.model.configs.encoder_config.langchain import HuggingFaceLangchainEncoderConfig
encoder_config = HuggingFaceLangchainEncoderConfig(model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
config = LangchainRetrieverConfig(
    type="VectorIndexRetriever",
    similarity_function="cosine",
    search_type="similarity_score_threshold",
    search_kwargs={"k": 3, "score_threshold": 0.0},
    encoder=encoder_config,
)
retriever = LangchainRetrieverWrapper.make(config, inputs_chunks=["Thomas is 12."])
print(retriever.get_retriever_output("Hello"))
🧭 Retrievers Configuration Space in RAGO¶
For a deeper look at the retriever configuration spaces implemented in RAGO, see the Retriever Configuration Space Python methods. All of the following parameter values can be adapted as needed, and you can combine the methods or choose a single one.
Default Retriever Configuration Space¶
By default, the Retriever Configuration Space used in RAGO is based on the VectorIndexRetriever method, with these default parameter values:
similarity_function: CategoricalParamSpace = Field(default=CategoricalParamSpace(choices=["cosine"]))
search_type: CategoricalParamSpace = Field(default=CategoricalParamSpace(choices=["similarity_score_threshold"]))
top_k: IntParamSpace = Field(default=IntParamSpace(low=1, high=5))
score_threshold: FloatParamSpace = Field(default=FloatParamSpace(low=0.0, high=0.9))
encoder: HFEncoderConfigSpace = Field(default=HFEncoderConfigSpace())
model_name: CategoricalParamSpace = Field(
    default=CategoricalParamSpace(
        name="encoder_model_name",
        choices=[
            "BAAI/bge-m3",
            "intfloat/e5-large-v2",
            "Qwen/Qwen3-Embedding-0.6B",
            "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
        ],
    ),
)
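As an illustration, one point sampled from this space (say top_k=3, a score threshold of 0.5, and the BAAI/bge-m3 encoder; the values here are arbitrary picks within the declared ranges, not defaults) maps to a concrete LangchainRetrieverConfig like the one shown earlier:

```python
from rago.model.configs.retriever_config.langchain import LangchainRetrieverConfig
from rago.model.configs.encoder_config.langchain import HuggingFaceLangchainEncoderConfig

# One concrete configuration drawn from the default space above
# (parameter values chosen arbitrarily within the declared ranges).
config = LangchainRetrieverConfig(
    type="VectorIndexRetriever",
    similarity_function="cosine",
    search_type="similarity_score_threshold",
    search_kwargs={"k": 3, "score_threshold": 0.5},
    encoder=HuggingFaceLangchainEncoderConfig(model_name="BAAI/bge-m3"),
)
```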
BM25 Configuration Space¶
Here is the default configuration space for the BM25 retriever:
bm25_k1: FloatParamSpace = Field(default=FloatParamSpace(low=1.2, high=2.0))
bm25_b: FloatParamSpace = Field(default=FloatParamSpace(low=0.0, high=1.0))
bm25_similarity: CategoricalParamSpace = Field(default=CategoricalParamSpace(choices=["custom_bm25"]))
Related Documentation¶
- RAG Configurations
- Reader Methods