LlamaIndex Router

Read Markdown documents from a folder, index them using OpenAI embeddings, and answer questions about their content.

Useful Links

  • Building Agentic RAG with LlamaIndex: https://learn.deeplearning.ai/courses/building-agentic-rag-with-llamaindex/

  • LlamaIndex: https://docs.llamaindex.ai/en/stable/

  • Getting Started: https://docs.llamaindex.ai/en/stable/#getting-started

  • Starter Tutorial (Local Models): https://docs.llamaindex.ai/en/stable/getting_started/starter_example_local/#starter-tutorial-local-models

  • RAG CLI: https://docs.llamaindex.ai/en/stable/getting_started/starter_tools/rag_cli/

  • RAG CLI using Local Model: https://github.com/run-llama/llama_index/issues/17013

The Python code in this chapter demonstrates how to use the LlamaIndex library to process and query text data from a directory of Markdown files. It sets up two query engines, one for summarization and one for retrieving specific information, and routes each question to the appropriate engine.

Data Loading:

  • The data_dir variable specifies the directory containing the text documents.

  • The SimpleDirectoryReader class from LlamaIndex is used to load documents from this directory.

Useful Links

  • SimpleDirectoryReader: https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/#simpledirectoryreader

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("data_dir", help="Directory containing Markdown documents")
args = parser.parse_args()

print(f"Data dir: {args.data_dir}")

from llama_index.core import SimpleDirectoryReader

# load documents
documents = SimpleDirectoryReader(args.data_dir).load_data()
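
A quick sanity check right after loading can catch an empty or mistyped directory early. This optional snippet (not part of the original listing) prints the document count and the metadata LlamaIndex attaches to each file:

# Optional sanity check (illustrative, not in the original listing)
print(f"Loaded {len(documents)} document(s)")
print(documents[0].metadata)  # e.g. file_path and file_name of the first file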

Text Processing:

  • The SentenceSplitter is initialized with a chunk_size of 1024, which means documents will be split into chunks of at most 1024 tokens (not characters), preferring to break at sentence boundaries.

  • get_nodes_from_documents breaks down the documents into manageable nodes (chunks of text).

Useful Links

  • SentenceSplitter: https://docs.llamaindex.ai/en/stable/api_reference/node_parsers/sentence_splitter/#llama_index.core.node_parser.SentenceSplitter

  • Using LLMs: https://docs.llamaindex.ai/en/stable/module_guides/models/llms/

from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
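
It can help to inspect the chunking result before building any indices. This optional check (not in the original listing) prints the node count and a preview of the first chunk:

# Optional: inspect the chunking result (illustrative)
print(f"Created {len(nodes)} nodes")
print(nodes[0].get_content()[:200])  # first 200 characters of the first chunk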

Model Setup:

  • The code configures OpenAI's gpt-4o-mini as the language model (LLM) and text-embedding-3-small as the model for generating text embeddings.

  • Both are registered globally via the Settings class from the LlamaIndex library, so subsequent indexing and query steps pick them up automatically.

Useful Links

  • Settings: https://docs.llamaindex.ai/en/stable/module_guides/supporting_modules/settings/

  • Ollama: https://docs.llamaindex.ai/en/stable/examples/llm/ollama/

  • OpenAIEmbedding: https://docs.llamaindex.ai/en/stable/api_reference/embeddings/openai/#llama_index.embeddings.openai.OpenAIEmbedding

  • OllamaEmbedding: https://docs.llamaindex.ai/en/stable/api_reference/embeddings/ollama/#llama_index.embeddings.ollama.OllamaEmbedding

from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model="gpt-4o-mini")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
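
To run everything locally instead (see the Ollama and OllamaEmbedding links above), the same Settings hooks accept local models. A minimal sketch, assuming the llama-index-llms-ollama and llama-index-embeddings-ollama packages are installed, an Ollama server is running, and the named models have been pulled; the model names here are placeholders, not from the original code:

# Local alternative via Ollama (sketch; model names are placeholders)
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

Settings.llm = Ollama(model="llama3.1", request_timeout=120.0)
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")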

Index Creation:

Two types of indices are created:

  • SummaryIndex: Used for summarizing documents.

  • VectorStoreIndex: Used for retrieving specific context or information from the documents.

Useful Links

  • Document Summary Index: https://docs.llamaindex.ai/en/stable/examples/index_structs/doc_summary/DocSummary/

  • VectorStoreIndex: https://docs.llamaindex.ai/en/stable/examples/vector_stores/SimpleIndexDemoLlama2/

  • Starter Tutorial (OpenAI): https://docs.llamaindex.ai/en/stable/getting_started/starter_example/

from llama_index.core import SummaryIndex, VectorStoreIndex

summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)

Query Engine Initialization:

Two query engines are initialized:

  • summary_query_engine: Configured for summarization tasks, using a tree-based summarization response mode and asynchronous processing.

  • vector_query_engine: Configured for retrieving specific contexts from the documents.

summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
vector_query_engine = vector_index.as_query_engine()
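
Either engine can also be queried on its own, before any routing is involved; the question text below is illustrative:

# Direct query against the vector engine, bypassing the router (illustrative)
direct_response = vector_query_engine.query("What topics does the document cover?")
print(str(direct_response))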

Tool Definition:

QueryEngineTool instances wrap each query engine, pairing it with a description that the router's selector uses to decide where to send a query:

  • summary_tool: For summarization queries.

  • vector_tool: For context retrieval queries.

from llama_index.core.tools import QueryEngineTool

summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to the book"
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from the book."
    ),
)
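
from_defaults also accepts an optional name for each tool, which makes the router's verbose output easier to follow. A sketch (the variable and tool name here are our choice, not from the original code):

# Optional: a named variant of the summary tool (illustrative)
named_summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    name="summary_tool",
    description="Useful for summarization questions related to the book.",
)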

Router Query Engine:

  • A RouterQueryEngine is set up to handle queries using a selector (LLMSingleSelector) that chooses the appropriate query engine tool based on the query type.

  • verbose=True enables detailed logging of the query processing.

from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    verbose=True,
)
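
LlamaIndex also provides a multi-selector that can route a single query to several tools and combine their answers. A minimal variant of the router above, using LLMMultiSelector:

# Variant: route one query to multiple tools when appropriate (sketch)
from llama_index.core.selectors import LLMMultiSelector

multi_query_engine = RouterQueryEngine(
    selector=LLMMultiSelector.from_defaults(),
    query_engine_tools=[summary_tool, vector_tool],
    verbose=True,
)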

Query Execution:

  • The query_engine is used to execute a query asking for a summary of the document.

  • The response is printed, along with the number of source nodes (text chunks) used to generate it. Because this is a summarization question, the selector should route it to summary_tool; the summary index consults every node, so the count matches the total number of chunks.

response = query_engine.query("What is the summary of the document?")
print(str(response))

# number of source nodes (text chunks) used to build the response
print(len(response.source_nodes))
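
A detail-oriented follow-up should make the selector pick vector_tool instead; the question below is illustrative:

# A specific question should route to the vector tool (illustrative)
response = query_engine.query("What prerequisites does the document mention?")
print(str(response))
print(len(response.source_nodes))  # typically just the retriever's top-k chunks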

In summary, this code loads text data from a directory, builds one index for summarization and one for context retrieval, and answers queries through a router query engine that selects the appropriate tool for each incoming question.