64. Basic RAG Pipeline
A basic RAG chain is built with LangChain by:
- splitting a document into chunks and storing them in a vector store
- creating a similarity-based retriever
- formatting retrieved chunks into a context string
- passing context and question through a prompt, LLM, and output parser
The pipeline is modular and uses runnable composition:
- retrieve relevant documents
- format them as context
- insert them into the prompt
- generate an answer
- parse the result
The model is instructed to answer only from context and say it does not know if the answer is unavailable.
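The five composition steps above can be sketched in plain Python, with stub functions standing in for the real LangChain retriever, prompt template, chat model, and output parser (all names here are illustrative, not LangChain APIs):

```python
# Minimal sketch of the pipeline shape; retriever and LLM are stand-ins.

def retrieve(question: str) -> list[str]:
    # A real retriever would run a similarity search over a vector store.
    corpus = [
        "LangChain composes chains from runnables.",
        "A vector store holds embedded document chunks.",
    ]
    return [doc for doc in corpus
            if any(w in doc.lower() for w in question.lower().split())]

def format_docs(docs: list[str]) -> str:
    return "\n\n".join(docs)

def build_prompt(context: str, question: str) -> str:
    # The model is told to answer only from context.
    return ("Answer only from the context below. "
            "If the answer is not there, say you do not know.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")

def fake_llm(prompt: str) -> str:
    # Stand-in for a chat model call.
    return "LLM answer based on: " + prompt.splitlines()[-1]

def rag_chain(question: str) -> str:
    docs = retrieve(question)            # retrieve
    context = format_docs(docs)          # format
    prompt = build_prompt(context, question)  # insert into prompt
    return fake_llm(prompt)              # generate (parsing omitted)

print(rag_chain("What does a vector store hold?"))
```

In real LangChain code the same shape is expressed with the `|` runnable-composition operator.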
65. RAG with Sources
This extends basic RAG so the response includes citations or source references.
- The retriever and vector store stay the same
- Retrieved documents are formatted with source metadata
- The prompt asks the model to answer using context and include sources
This improves transparency, trust, and usefulness for Q&A systems and enterprise search.
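The formatting step can be sketched as follows, with plain dicts standing in for LangChain `Document` objects (which carry `page_content` plus `metadata`):

```python
# Sketch: prepend each chunk with its source so the model can cite it.

def format_docs_with_sources(docs: list[dict]) -> str:
    parts = []
    for doc in docs:
        source = doc["metadata"].get("source", "unknown")
        parts.append(f"[Source: {source}]\n{doc['page_content']}")
    return "\n\n".join(parts)

docs = [
    {"page_content": "RAG augments the prompt with retrieved text.",
     "metadata": {"source": "rag_intro.md"}},
    {"page_content": "Citations build user trust.",
     "metadata": {"source": "ux_notes.md"}},
]
print(format_docs_with_sources(docs))
```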
66. RAG with Fallback
This adds safe handling for out-of-scope questions.
- The prompt tells the model to answer only from the provided context
- If no answer exists, it should return a fixed fallback message
This reduces hallucinations and makes the assistant more reliable when the knowledge base does not contain the answer.
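A minimal sketch of the fallback pattern, with illustrative names: the prompt pins a fixed refusal string, and the application can also short-circuit before calling the model when retrieval returns nothing at all.

```python
# Sketch: a fixed fallback message enforced in the prompt and in code.

FALLBACK = "I don't know based on the provided documents."

PROMPT_TEMPLATE = (
    "Answer using ONLY the context below.\n"
    "If the context does not contain the answer, reply exactly: " + FALLBACK + "\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

def answer(question: str, retrieved: list[str], llm) -> str:
    if not retrieved:               # nothing to ground the answer in
        return FALLBACK             # skip the LLM call entirely
    prompt = PROMPT_TEMPLATE.format(
        context="\n\n".join(retrieved), question=question)
    return llm(prompt)

# With an empty retrieval result, the fixed message comes back:
print(answer("Who won in 1962?", [], llm=lambda p: "unused"))
```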
67. RAG with Structured Outputs
A structured RAG pipeline returns a validated object instead of plain text.
- A `RAGResponse` schema defines fields such as `answer`, `confidence`, `sources_used`, and `follow_up`
- The LLM is wrapped with structured output support
- Retrieved documents are formatted with source metadata before prompting
This makes the output predictable and easier to use in applications.
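The schema can be sketched like this; a stdlib dataclass stands in here for the Pydantic model that LangChain's structured-output support would actually validate against, with the field names taken from the text above:

```python
# Sketch of a RAGResponse schema (dataclass as a Pydantic stand-in).
from dataclasses import dataclass, field

@dataclass
class RAGResponse:
    answer: str
    confidence: float                      # e.g. 0.0 - 1.0
    sources_used: list[str] = field(default_factory=list)
    follow_up: str = ""

resp = RAGResponse(
    answer="Chunks are stored in a vector store.",
    confidence=0.9,
    sources_used=["rag_intro.md"],
)
print(resp.answer, resp.confidence)
```

Application code can then read `resp.answer` or `resp.sources_used` directly instead of parsing free text.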
70. Advanced RAG - Multi-Query Retriever
Several advanced retrieval strategies are introduced, especially multi-query retrieval.
- An LLM rewrites one user question into multiple similar queries
- Each query searches the vector store
- Results from all queries are combined
This improves recall and helps find relevant documents even when the original wording is different, though it increases cost and latency.
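The mechanics can be sketched with stand-ins: a rule-based "rewriter" replaces the LLM and a keyword search replaces the vector store, but the combine-and-deduplicate step is the same shape a multi-query retriever uses.

```python
# Sketch of multi-query retrieval (rewriter and search are stand-ins).

def rewrite_queries(question: str) -> list[str]:
    # A real implementation asks an LLM for paraphrases of the question.
    return [question,
            question.replace("car", "automobile"),
            question.replace("car", "vehicle")]

CORPUS = {
    1: "Automobile engines convert fuel into motion.",
    2: "Vehicle registration is handled by the state.",
    3: "Gardening tips for spring.",
}

def search(query: str) -> list[int]:
    words = query.lower().split()
    return [doc_id for doc_id, text in CORPUS.items()
            if any(w in text.lower() for w in words)]

def multi_query_retrieve(question: str) -> list[int]:
    seen: list[int] = []
    for q in rewrite_queries(question):      # run every rewrite
        for doc_id in search(q):
            if doc_id not in seen:           # deduplicate across queries
                seen.append(doc_id)
    return seen

print(multi_query_retrieve("car engines"))
```

The original wording "car engines" misses document 2, but the "vehicle" rewrite finds it.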
71. Advanced RAG - Contextual Compression
Contextual compression reduces retrieved content before sending it to the final model.
- A base retriever fetches documents
- An LLM-based compressor extracts only relevant excerpts
- The final context is smaller and cleaner
Benefits include lower token usage, better precision, and faster generation, but at the cost of extra retrieval-time LLM calls.
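A sketch of the two-stage shape, with a keyword filter standing in for the LLM-based extractor that keeps only query-relevant sentences:

```python
# Sketch of contextual compression (compressor is a keyword stand-in).

def base_retrieve(query: str) -> list[str]:
    return [
        "Paris is the capital of France. The city hosts many museums. "
        "Bread is a staple food.",
    ]

def compress(query: str, doc: str) -> str:
    # Keep only sentences sharing a word with the query; a real
    # compressor would ask an LLM to extract the relevant excerpts.
    words = set(query.lower().split())
    sentences = [s.strip() for s in doc.split(".") if s.strip()]
    kept = [s for s in sentences if words & set(s.lower().split())]
    return ". ".join(kept) + ("." if kept else "")

def compressed_retrieve(query: str) -> list[str]:
    return [compress(query, d) for d in base_retrieve(query)]

print(compressed_retrieve("capital of France"))
```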
72. Advanced RAG - Hybrid Search
Hybrid search combines keyword and semantic retrieval.
- BM25 handles exact keyword matching well
- Semantic retrieval handles meaning and intent
- An ensemble retriever merges both using weighted rank fusion
This is more robust than using either method alone and works well across both exact-term and concept-based queries.
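The merge step can be sketched as weighted reciprocal-rank fusion; both ranked lists below are hand-made for illustration, standing in for real BM25 and semantic results.

```python
# Sketch of weighted reciprocal-rank fusion (RRF) over two rankings.

def weighted_rrf(rankings: list[list[str]], weights: list[float],
                 k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc in enumerate(ranking):
            # Each list contributes weight / (k + rank + 1) for the doc.
            scores[doc] = scores.get(doc, 0.0) + weight / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["doc_a", "doc_b", "doc_c"]      # exact keyword matches
semantic_ranking = ["doc_b", "doc_d", "doc_a"]  # meaning-based matches

print(weighted_rrf([bm25_ranking, semantic_ranking], weights=[0.5, 0.5]))
```

`doc_b` wins because both retrievers rank it highly, even though neither puts it first.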
73. Advanced RAG - Parent Document Retriever
Parent document retrieval balances retrieval precision and generation context.
- Documents are split into small child chunks for search
- The retriever returns the larger parent chunk to the model
This gives accurate matching from small chunks while preserving broader context for the LLM.
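The child-to-parent mapping can be sketched like this (split sizes and the keyword matcher are illustrative stand-ins for real chunking and vector search):

```python
# Sketch of parent-document retrieval: search small chunks, return parents.

parents = {
    "p1": "Section on embeddings. Embeddings map text to vectors. "
          "Similar texts land near each other.",
    "p2": "Section on prompts. Prompts steer model behavior.",
}

# Child chunks point back to their parent document.
children = [
    {"text": "Embeddings map text to vectors.", "parent_id": "p1"},
    {"text": "Similar texts land near each other.", "parent_id": "p1"},
    {"text": "Prompts steer model behavior.", "parent_id": "p2"},
]

def retrieve_parent(query: str) -> list[str]:
    words = set(query.lower().split())
    hit_parents: list[str] = []
    for child in children:                        # match against SMALL chunks
        if words & set(child["text"].lower().rstrip(".").split()):
            if child["parent_id"] not in hit_parents:
                hit_parents.append(child["parent_id"])
    return [parents[pid] for pid in hit_parents]  # return the LARGE parent

print(retrieve_parent("what are embeddings"))
```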
74. Combining Multi-Query and Compression
Multi-query retrieval and contextual compression are combined into one RAG pipeline.
- Multi-query improves recall
- Compression improves precision
- A prompt formats the retrieved context and drives the final answer
The key lesson is that you can mix RAG strategies selectively based on the use case.
76. Conversation Memory - Basics
Conversation memory is implemented with session-based message history.
- A chat model is wrapped with a prompt that includes a `MessagesPlaceholder`
- `RunnableWithMessageHistory` automatically loads and stores messages
- `InMemoryChatMessageHistory` keeps session-specific conversation state
This allows the assistant to remember earlier messages such as a user’s name or current topic.
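The load-then-store cycle can be sketched in plain Python, with a dict of per-session histories standing in for `RunnableWithMessageHistory` plus `InMemoryChatMessageHistory` (all names below are illustrative):

```python
# Sketch of session-based message history.

store: dict[str, list[tuple[str, str]]] = {}   # session_id -> [(role, text)]

def get_history(session_id: str) -> list[tuple[str, str]]:
    return store.setdefault(session_id, [])

def chat(session_id: str, user_text: str, llm) -> str:
    history = get_history(session_id)
    # Prior turns go into the prompt, like a MessagesPlaceholder would.
    prompt = "\n".join(f"{role}: {text}" for role, text in history)
    prompt += f"\nhuman: {user_text}"
    reply = llm(prompt)
    history.append(("human", user_text))       # store both sides of the turn
    history.append(("ai", reply))
    return reply

def echo(prompt: str) -> str:
    # Stand-in model that just reports how much history it saw.
    return f"(model saw {prompt.count('human:')} human turns)"

chat("s1", "My name is Ada.", echo)
print(chat("s1", "What is my name?", echo))
```

On the second call the model's prompt already contains the first exchange, which is what lets a real model answer "Ada".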
77. Multiple Sessions Memory
Multiple users can share one model while keeping separate conversation histories.
- Each session ID maps to its own memory object
- A helper creates or retrieves the correct history
- The prompt uses the corresponding history for each session
This prevents conversation mixing and keeps each user’s memory isolated.
78. Message Trimming
Message trimming shortens conversation history to fit context limits.
- A token limit is applied
- Strategies like `last` keep the most recent messages
- System messages can be preserved
- Partial messages may be disallowed
This controls token usage and prevents long histories from overflowing the model context window.
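A sketch of a `last`-style trimmer: keep the system message, then as many of the most recent messages as fit the budget, dropping whole messages rather than truncating them. Token counting is faked with word counts; a real trimmer would use the model's tokenizer.

```python
# Sketch of "last" trimming with a preserved system message.

def count_tokens(text: str) -> int:
    return len(text.split())                 # crude tokenizer stand-in

def trim_last(messages: list[tuple[str, str]], max_tokens: int,
              keep_system: bool = True) -> list[tuple[str, str]]:
    system = [m for m in messages if m[0] == "system"][:1] if keep_system else []
    budget = max_tokens - sum(count_tokens(t) for _, t in system)
    kept: list[tuple[str, str]] = []
    for role, text in reversed(messages):    # newest first
        if role == "system":
            continue
        cost = count_tokens(text)
        if cost > budget:
            break                            # no partial messages
        kept.insert(0, (role, text))
        budget -= cost
    return system + kept

msgs = [
    ("system", "Be concise."),
    ("human", "one two three four five"),
    ("ai", "six seven"),
    ("human", "eight nine"),
]
print(trim_last(msgs, max_tokens=7))
```

The oldest human message is dropped whole because its five tokens no longer fit the remaining budget.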
79. Windowed Memory
Sliding window memory keeps only the last K exchanges.
- A custom history class removes older messages once the limit is exceeded
- Each exchange consists of one human and one AI message
This gives fixed-size memory and predictable cost, but older context is lost.
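Such a history class can be sketched in a few lines (class and method names are illustrative):

```python
# Sketch of sliding-window memory: keep only the last K exchanges.

class WindowedHistory:
    def __init__(self, k: int):
        self.k = k
        self.messages: list[tuple[str, str]] = []

    def add_exchange(self, human: str, ai: str) -> None:
        self.messages += [("human", human), ("ai", ai)]
        # Drop the oldest messages once more than k exchanges are stored.
        if len(self.messages) > 2 * self.k:
            self.messages = self.messages[-2 * self.k:]

history = WindowedHistory(k=2)
for i in range(1, 4):
    history.add_exchange(f"question {i}", f"answer {i}")
print(history.messages)
```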
80. Summary Memory
Summary memory compresses old conversation history into a running summary.
- Recent messages are kept verbatim
- Older messages are summarized
- The summary is updated over time
This preserves important long-term context while keeping token usage bounded.
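The update cycle can be sketched with a stub summarizer (a real system would call an LLM for the summarization step; all names here are illustrative):

```python
# Sketch of summary memory: fold older turns into a running summary.

KEEP_RECENT = 2                              # messages kept word-for-word

def summarize(summary: str, old: list[tuple[str, str]]) -> str:
    # Stand-in for an LLM summarization call.
    topics = ", ".join(text for _, text in old)
    return (summary + "; " if summary else "") + f"discussed: {topics}"

class SummaryMemory:
    def __init__(self):
        self.summary = ""
        self.recent: list[tuple[str, str]] = []

    def add(self, role: str, text: str) -> None:
        self.recent.append((role, text))
        if len(self.recent) > KEEP_RECENT:
            # Move overflow into the summary; keep the newest verbatim.
            overflow = self.recent[:-KEEP_RECENT]
            self.recent = self.recent[-KEEP_RECENT:]
            self.summary = summarize(self.summary, overflow)

memory = SummaryMemory()
for role, text in [("human", "hello"), ("ai", "hi"),
                   ("human", "weather?"), ("ai", "sunny")]:
    memory.add(role, text)
print(memory.summary)
print(memory.recent)
```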
81. Persistent Memory
Persistent memory stores chat history in SQLite.
- `RunnableWithMessageHistory` is combined with `SQLChatMessageHistory`
- Each session is stored under a session ID
- Memory survives restarts
This enables durable chatbot memory and can be verified by inspecting the database directly.
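A sketch of the storage layer, with the stdlib `sqlite3` module standing in for `SQLChatMessageHistory`: each row carries a session ID, so reconnecting to the same database file restores the history.

```python
# Sketch of persistent per-session chat history in SQLite.
import sqlite3

def connect(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS messages (
        session_id TEXT, role TEXT, content TEXT)""")
    return conn

def add_message(conn, session_id: str, role: str, content: str) -> None:
    conn.execute("INSERT INTO messages VALUES (?, ?, ?)",
                 (session_id, role, content))
    conn.commit()

def load_history(conn, session_id: str) -> list[tuple[str, str]]:
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ?",
        (session_id,))
    return rows.fetchall()

conn = connect(":memory:")   # use a file path for real durability
add_message(conn, "s1", "human", "remember me")
add_message(conn, "s1", "ai", "noted")
print(load_history(conn, "s1"))
```

Inspecting the `messages` table directly is exactly the verification step the text describes.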
83. AI Research Assistant - Indexing Documents
An AI Research Assistant project begins with document ingestion and indexing.
- Documents are split with `RecursiveCharacterTextSplitter`
- Chunks are embedded with `OpenAIEmbeddings`
- A persistent `Chroma` vector store stores the indexed content
The assistant can ingest text, track sources, count documents, and persist data, but it does not yet answer questions at this stage.
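The ingestion step can be sketched with stand-ins: a fixed-size splitter with overlap in place of `RecursiveCharacterTextSplitter`, and a plain list in place of the `Chroma` store (embedding omitted; chunk sizes are illustrative).

```python
# Sketch of ingestion: split with overlap, store chunks with source metadata.

def split_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

index: list[dict] = []                   # stand-in for the vector store

def ingest(text: str, source: str) -> int:
    chunks = split_text(text)
    for chunk in chunks:
        # A real pipeline would embed the chunk before storing it.
        index.append({"page_content": chunk, "metadata": {"source": source}})
    return len(chunks)

n = ingest("LangChain splits long documents into overlapping "
           "chunks for retrieval.", "notes.md")
print(n, len(index))
```

Note how each chunk's tail repeats in the next chunk's head; the overlap keeps sentences from being cut off at chunk boundaries.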
84. AI Research Assistant - Prompt and Output Parser
The assistant is extended into a basic RAG Q&A chain.
- A retriever fetches the top relevant chunks
- Retrieved chunks are formatted into context
- A prompt, LLM, and `StrOutputParser` produce an answer
This works for factual questions, but follow-up questions reveal the lack of memory.
85. AI Research Assistant - Adding Memory
Session-based memory is added to the assistant.
- A session store keeps message history per session
- `MessagesPlaceholder` inserts history into the prompt
- The assistant saves both human and AI messages after each turn
- Utility methods support clearing and displaying session history
This makes follow-up questions work while keeping sessions separate.
86. AI Research Assistant - Multi-Query Retrieval
The assistant’s retriever is upgraded with multi-query retrieval.
- A basic retriever can miss relevant chunks if wording differs
- Multi-query retrieval generates multiple semantically related queries
- Retrieval becomes broader and more flexible
This improves recall and usually yields better context for answering.
87. AI Research Assistant - Structured Output
The final improvement changes the assistant from plain-text answers to structured responses.
- A `ResearchResponse` schema defines fields like `answer`, `confidence`, `sources`, `key_quotes`, and `follow_up_questions`
- The LLM is wrapped with `with_structured_output`
- The `ask_structured` function returns a validated object
This makes the assistant easier to integrate with downstream code and completes a more advanced RAG system with retrieval, memory, and structured answers.