64. Hands-on ~ Basic RAG Pipeline

The passage explains how to build a simple RAG pipeline with LangChain.

Main points

  • Imports and setup

    • Uses common LangChain components like OpenAIEmbeddings, PromptTemplate, RunnablePassthrough, RunnableParallel, and related utilities.

    • An embeddings model is already initialized for retrieval.

1) Create the knowledge base

  • Defines a create_kb function.

  • Splits a manually created Document using RecursiveCharacterTextSplitter with:

    • chunk_size = 5500

    • chunk_overlap = 50

  • The document includes metadata such as source="blank_chain.md".

  • Uses split_documents() because the input is a Document object.

  • Creates a vector store from the chunks using a from_documents constructor (e.g., Chroma.from_documents(…)), the embeddings model, and a persistence directory like ./temp.

  • Returns the vector store.

2) Create a basic RAG system

  • Calls create_kb() to get the vector store.

  • Builds a retriever with:

    • search_type="similarity"

    • k=2

  • Initializes a chat model with temperature=0.2.

  • Creates a prompt that instructs the model to answer only from the given context and say “I don’t know” if unsure.

3) Format retrieved documents

  • Defines a helper like format_docs(docs) to join retrieved chunks into a single string for the prompt.

4) Build the RAG chain

  • Creates a chain with two inputs:

    • context: retrieved docs passed through format_docs

    • question: passed through unchanged with RunnablePassthrough()

  • Pipes the inputs through:

    1. prompt template

    2. LLM

    3. StrOutputParser

  • Produces a final string answer.

5) Test the chain

  • Invokes the chain with sample questions like:

    • “What is LangChain?”

    • “Who created LangChain?”

    • “What is LangGraph used for?”

  • Prints the responses.
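
A minimal end-to-end sketch of the steps above, assuming Chroma as the vector store and OpenAI models (both used elsewhere in these notes); the document text, model name, and prompt wording are illustrative placeholders:

```python
from langchain_chroma import Chroma  # or: from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

def create_kb():
    doc = Document(
        page_content="LangChain is a framework for building LLM apps...",  # placeholder text
        metadata={"source": "blank_chain.md"},
    )
    # split_documents() because the input is a Document object; sizes as given in the passage
    splitter = RecursiveCharacterTextSplitter(chunk_size=5500, chunk_overlap=50)
    chunks = splitter.split_documents([doc])
    return Chroma.from_documents(chunks, embeddings, persist_directory="./temp")

def format_docs(docs):
    # Join retrieved chunks into a single context string for the prompt
    return "\n\n".join(d.page_content for d in docs)

vectorstore = create_kb()
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 2})
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)  # model name is an assumption

prompt = ChatPromptTemplate.from_template(
    "Answer only from the context below. If unsure, say \"I don't know\".\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

rag_chain = (
    RunnableParallel(context=retriever | format_docs, question=RunnablePassthrough())
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("What is LangChain?"))
```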

Why it works

It follows the standard RAG flow:

  1. Retrieve relevant documents

  2. Format them as context

  3. Add them to the prompt

  4. Generate an answer with the LLM

  5. Parse the output cleanly

The key benefit is that LangChain’s runnable and pipe syntax makes the entire pipeline modular, readable, and easy to compose.

65. Hands-on ~ RAG with Resources

The passage explains how to extend basic RAG into RAG with sources.

Key ideas

  • The core setup stays the same: knowledge base/vector store, retriever, and LLM are reused.

  • The main change is that the system now returns sources or citations along with the answer.

Main steps

  1. Create a prompt that tells the model to:

    • answer using the provided context

    • include the sources used

  2. Format retrieved documents with source info using a helper like format_docs_with_sources.

  3. Build the chain in the same RAG flow:

    • retriever → prompt → LLM → output parser

  4. Ask a question, and the response includes:

    • the answer

    • the documents or sources it came from
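
A hedged sketch of the delta from basic RAG, reusing the retriever, llm, and imports from the previous section; format_docs_with_sources is named in the passage, the prompt wording is illustrative:

```python
def format_docs_with_sources(docs):
    # Prefix each chunk with its source metadata so the model can cite it
    return "\n\n".join(
        f"[source: {d.metadata.get('source', 'unknown')}]\n{d.page_content}"
        for d in docs
    )

sources_prompt = ChatPromptTemplate.from_template(
    "Answer using only the context below, and list the sources you used.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

rag_with_sources = (
    RunnableParallel(
        context=retriever | format_docs_with_sources,
        question=RunnablePassthrough(),
    )
    | sources_prompt
    | llm
    | StrOutputParser()
)
```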

Why it matters

This is useful for:

  • Q&A bots

  • enterprise search

  • trustworthy AI systems

Because users can:

  • verify answers

  • trace them back to original documents

  • trust the results more easily

Bottom line

RAG with sources works like basic RAG, but adds source formatting, citations, and a prompt that asks for references, making the output more transparent and practical.

66. Hands-on ~ RAG with Fallback

The passage explains how to add a fallback mechanism to a RAG pipeline so it can handle out-of-scope questions safely.

Key points

  • The pipeline uses:

    • a vector store

    • a retriever

    • a prompt

  • The prompt instructs the model to:

    • answer only using the provided context

    • reply with: “I don’t have information about that in my knowledge base.” if the answer is not present

  • The chain works by:

    1. retrieving relevant documents

    2. formatting and inserting them into the prompt

    3. sending the prompt to the LLM

    4. parsing the output as text
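
A sketch of the fallback prompt; only the refusal sentence comes from the passage, the rest of the wording is illustrative (chain construction is the same as in section 64):

```python
fallback_prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n"
    "If the answer is not in the context, reply exactly:\n"
    "\"I don't have information about that in my knowledge base.\"\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)
```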

Testing

  • Questions covered by the knowledge base produce normal answers.

  • Questions outside the knowledge base trigger the fallback response.

Why it matters

  • It reduces hallucinations by preventing the model from guessing.

  • It makes the system more honest, reliable, and useful in real-world situations where users may ask unsupported questions.

Overall result

This approach makes the RAG pipeline more robust by keeping answers grounded in the available context and gracefully handling unknown queries.

67. Hands-on ~ RAG with Structured Outputs

demo_structured_rag() demonstrates a small structured RAG pipeline.

  • It creates a knowledge base and turns it into a retriever that fetches the top 3 relevant documents.

  • It defines a RAGResponse Pydantic schema with fields for:

    • answer

    • confidence

    • sources_used

    • follow_up

  • It wraps the LLM with with_structured_output(RAGResponse) so the model returns a validated structured object instead of free-form text.

  • It builds a prompt that includes retrieved context and the user question.

  • A helper formats retrieved docs by combining each document’s source metadata and content into one context string.

  • The pipeline uses runnable composition:

    • question → retriever → formatted context

    • question → passthrough

    • both go into the prompt

    • prompt goes to the structured LLM

  • It invokes the chain with "What is LangGraph?" and prints the structured fields from the result.
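
A sketch of the structured pieces, reusing the vector store, llm, prompt, and source-formatting helper from earlier sections; the field descriptions are illustrative:

```python
from pydantic import BaseModel, Field

class RAGResponse(BaseModel):
    answer: str = Field(description="Answer grounded in the retrieved context")
    confidence: str = Field(description="high, medium, or low")
    sources_used: list[str] = Field(description="Sources drawn on for the answer")
    follow_up: str = Field(description="A suggested follow-up question")

structured_llm = llm.with_structured_output(RAGResponse)

structured_chain = (
    RunnableParallel(
        context=vectorstore.as_retriever(search_kwargs={"k": 3}) | format_docs_with_sources,
        question=RunnablePassthrough(),
    )
    | prompt
    | structured_llm  # returns a validated RAGResponse, not free-form text
)

result = structured_chain.invoke("What is LangGraph?")
print(result.answer, result.confidence, result.sources_used, result.follow_up)
```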

Key ideas:

  • RAG grounds answers in retrieved documents.

  • Structured output makes responses predictable and easier to use programmatically.

  • The | operator composes retrieval, formatting, prompting, and generation into one chain.

It also notes that confidence is only described as "high, medium, or low" but not strictly enforced; using an Enum would add validation.
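
A minimal sketch of that stricter validation:

```python
from enum import Enum

class Confidence(str, Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

# In RAGResponse, declare the field as:
#     confidence: Confidence
# Pydantic will then reject any value outside the enum.
```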

70. Hands-on ~ Advanced RAG - Multi-Query Retriever

The section introduces several advanced RAG retrieval patterns:

  • multi-query retrieval

  • self-query retrieval

  • contextual compression

  • hybrid search

It explains that the examples use some langchain_community imports because LangChain has been reorganizing its packages, and this compatibility layer is still useful for learning, even though some parts may be deprecated later. Some of these retrievers are also moving into LangGraph.

New components introduced include:

  • MultiQueryRetriever

  • ContextualCompressionRetriever

  • LLMChainExtractor

  • EnsembleRetriever

  • BM25Retriever

  • ParentDocumentRetriever

Logging is enabled so the generated sub-queries from multi-query retrieval can be inspected during execution.

The demo builds a small technical knowledge base, creates a Chroma vector store with embeddings like text-embedding-3-small, and uses that as the foundation for retrieval experiments.

The main example covered is Multi-Query Retriever:

  • It uses an LLM to rewrite a single user query into multiple alternative phrasings.

  • These different versions help surface documents that might not match the original wording exactly.

  • For example, “What tools can I use to build AI applications?” might be expanded into several related queries about AI app development tools, platforms, or software.

When run, the retriever generates these alternate queries, searches the vector store for each one, and returns a broader set of relevant documents. This improves recall but requires more computation and may increase cost.

In the example, the retrieved results included documents related to AI tools, AI platforms, and databases/infrastructure, showing how multi-query retrieval can expand coverage beyond a single query.
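
A minimal sketch, reusing the vector store and llm from earlier; the logging lines expose the generated sub-queries as described:

```python
import logging

from langchain.retrievers.multi_query import MultiQueryRetriever

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),
    llm=llm,  # rewrites the question into several alternative phrasings
)

docs = multi_query_retriever.invoke("What tools can I use to build AI applications?")
```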

71. Hands-on ~ Advanced RAG - Contextual Compression

Contextual compression is a retrieval technique that uses an LLM to extract only the most relevant parts of retrieved documents before passing them to the final model.

How it works

  • Set up the vector store, retriever, and LLM.

  • Create an LLM chain extractor to act as the compressor.

  • Wrap the base retriever with a Contextual Compression Retriever using:

    • the compressor

    • the base retriever

  • Run the query and compare:

    • Without compression: full document chunks are returned.

    • With compression: only relevant excerpts are returned.
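
A sketch of the wrapping step, reusing the vector store and llm:

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

compressor = LLMChainExtractor.from_llm(llm)  # extracts only query-relevant spans
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever(),
)

compressed_docs = compression_retriever.invoke("What is LangChain used for?")
```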

What it shows

  • In simple cases, compression may not seem dramatic because the documents are already focused.

  • In more complex documents, the reduction is much clearer:

    • full chunks may be around 1500–1700 characters

    • compressed results may shrink to around 214 characters

  • The output keeps only the information needed to answer the question, such as framework names like LangChain and LangGraph.

Benefits

  1. Lower token usage

  2. Better answer quality due to less noise

  3. Faster processing for large contexts

Trade-off

  • It adds extra LLM calls during retrieval, which increases latency and cost.

Overall

Contextual compression is useful when documents are long, noisy, or expensive to send to the model. It improves precision and efficiency, but at the cost of extra retrieval-time computation.

72. Hands-on ~ Advanced RAG - Hybrid Search

This walkthrough explains how to build a hybrid search system that combines BM25 keyword search and semantic search using a tech docs dataset.

Main steps

  1. BM25 retriever

    • Built from the documents with from_documents

    • Configured with k=3 to return the top 3 keyword matches

  2. Semantic retriever

    • Uses the existing semantic retriever setup

    • Also set to k=3

  3. Ensemble retriever

    • Combines BM25 and semantic retrievers using rank fusion

    • Example weighting: 40% BM25, 60% semantic

    • Weights should be tuned based on the kinds of queries users ask

  4. Testing queries

    • Keyword-heavy queries like Postgres, SQL, and pgvector work well with BM25

    • More meaning-based queries benefit from semantic search

  5. BM25 installation issue

    • The rank-bm25 package was missing

    • After installing it, the hybrid retriever worked correctly
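
A sketch under the assumption that documents holds the tech-docs Document list; the weights follow the example 40/60 split:

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever  # requires: pip install rank-bm25

bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 3  # top 3 keyword matches

semantic_retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, semantic_retriever],
    weights=[0.4, 0.6],  # tune for the kinds of queries users actually ask
)

results = hybrid_retriever.invoke("What is Postgres?")
```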

Results and takeaways

  • For “What is Postgres?”, the ensemble gives the best combined result.

  • For “What database stores vectors?”, both retrievers identify relevant vector database content.

  • For “asset transactions”, BM25 succeeds where semantic search drifts off-topic.

  • For “How do I store AI model outputs for later retrieval?” and “fast similarity lookup embeddings”, both retrievers contribute useful signals.

Why it matters

  • BM25 is strong for exact keyword matching

  • Semantic search is strong for intent and meaning

  • Ensemble retrieval combines both to improve accuracy and robustness

The key lesson is that hybrid search handles both exact terms and conceptual similarity, making it more reliable than using either approach alone.

73. Hands-on ~ Advanced RAG - Parent Document Retriever

The document explains how to build a parent document retriever, which combines small chunks for retrieval accuracy with large chunks for better context.

Main idea

  • Split documents into:

    • Parent chunks: larger pieces of about 800 characters

    • Child chunks: smaller pieces of about 200 characters with overlap

  • Search is done over the small child chunks

  • The system returns the full parent chunk to the LLM

Setup

  • Use an in-memory vector store for embeddings

  • Use an in-memory document store for parent documents

  • Name the collection something like parent-child-demo

  • Build the retriever with:

    • vector store

    • document store

    • child splitter

    • parent splitter
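
A sketch of the wiring, assuming the embeddings model and documents list from earlier; the child overlap value is illustrative:

```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_chroma import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

parent_splitter = RecursiveCharacterTextSplitter(chunk_size=800)
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=50)

parent_retriever = ParentDocumentRetriever(
    vectorstore=Chroma(collection_name="parent-child-demo", embedding_function=embeddings),
    docstore=InMemoryStore(),  # holds the full parent chunks
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

parent_retriever.add_documents(documents)  # indexes children, stores parents
docs = parent_retriever.invoke("What is LangGraph used for?")  # returns parent chunks
```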

How it works

  1. Add documents

  2. Query something like “What is LangGraph used for?”

  3. Compare:

    • regular retrieval: returns a small, focused chunk

    • parent document retrieval: returns a larger chunk with more context

Why it helps

  • Small chunks

    • better retrieval precision

    • more focused embeddings

  • Large chunks

    • better context for generation

    • less fragmentation

Key benefit

This approach gives the best of both worlds:

  • accurate search from small chunks

  • complete context from large chunks

Summary

A parent document retriever is a two-stage retrieval system:

  • first, find the most relevant small child chunk

  • then, return its corresponding larger parent document

It is especially useful for larger documents where both precision and context matter.

74. Hands-on ~ Advanced RAG - Combining Multi-Query and Compression Strategies

The passage describes how to combine advanced RAG techniques into one retrieval chain:

  • Start with a vector store and an LLM.

  • Add multi-query retrieval to improve recall by generating query variations.

  • Add contextual compression to improve precision by filtering retrieved results for relevance.

  • Define a RAG prompt, format retrieved documents, and build the final chain.

  • Test the chain with example questions.
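
A sketch of the stacked retriever, combining the pieces from sections 70 and 71 and reusing names from earlier blocks:

```python
# Multi-query widens recall; compression then trims each hit for precision.
recall_retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(), llm=llm
)
combined_retriever = ContextualCompressionRetriever(
    base_compressor=LLMChainExtractor.from_llm(llm),
    base_retriever=recall_retriever,
)

advanced_rag_chain = (
    RunnableParallel(
        context=combined_retriever | format_docs,
        question=RunnablePassthrough(),
    )
    | prompt
    | llm
    | StrOutputParser()
)
```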

Key takeaways:

  • Multi-query retrieval helps find more relevant documents.

  • Contextual compression helps keep only the most useful context.

  • You do not need to use every RAG strategy at once; choose what fits your use case.

The example setup uses:

  • ChromaDB for vector storage

  • OpenAI for embeddings and completions

  • Multi-query retrieval

  • Contextual compression

  • An LLM to generate the final answer

Overall, it presents a clean blueprint for a more advanced, effective RAG pipeline.

76. Hands-on ~ Conversation Memory - Basics

This document explains how to build conversation_memory.py to demonstrate conversational memory in LangChain.

Main idea

The chat model remembers earlier parts of a conversation by storing messages in session-based history and reusing them in later turns.

Key components

  • Chat model setup using init_chat_model or ChatOpenAI

  • Prompt template with:

    • a system message

    • a human input

    • a MessagesPlaceholder for chat history

  • Message history storage with:

    • InMemoryChatMessageHistory

    • a dictionary keyed by session_id

  • RunnableWithMessageHistory to automatically load and save messages for each session

  • StrOutputParser to format model output

basic_memory() workflow

  1. Initialize the chat model.

  2. Build a prompt that includes history.

  3. Chain the prompt, model, and parser.

  4. Create an in-memory store for session histories.

  5. Define get_session_history() to retrieve or create history for a session.

  6. Wrap the chain with RunnableWithMessageHistory.

  7. Use a fixed session_id to simulate one conversation.

  8. Send several user messages through the chain.

  9. Print the stored history to inspect saved human and AI messages.
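
A minimal sketch of basic_memory(); the system text and model name are illustrative:

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])
chain = prompt | llm | StrOutputParser()

store = {}  # session_id -> InMemoryChatMessageHistory

def get_session_history(session_id):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

chat = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

config = {"configurable": {"session_id": "demo"}}  # fixed id = one conversation
chat.invoke({"input": "Hi, I'm learning LangChain."}, config=config)
print(chat.invoke({"input": "What am I learning?"}, config=config))
print(store["demo"].messages)  # inspect saved human and AI messages
```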

Result

The model can answer follow-up questions using earlier context, such as:

  • remembering the user’s name

  • remembering what the user is learning

Conclusion

This is a simple example of session-based conversational memory in LangChain using modern runnable utilities and message history placeholders.

77. Hands-on ~ Multiple Sessions Memory

This describes how to support multiple independent chat sessions with one shared LLM by giving each user their own memory.

Main idea

  • Use one shared LLM

  • Build a prompt that accepts:

    • history for prior messages

    • current input

  • Create a chain from the prompt and LLM

  • Store conversation histories in a dictionary

  • Add a helper function that gets or creates a session’s history

  • Wrap the chain so:

    • message maps to the current user input

    • history maps to that user’s stored conversation history

How it works

If a session ID is new, a history object is created and saved.
That means each user gets separate memory instead of sharing one global chat history.

Example

  • User A says: “My favorite language is Python.”

  • User B says: “I love JavaScript.”

When they later ask:

  • “What is my favorite language?”

the system uses the correct session ID to load the right history:

  • User A → Python

  • User B → JavaScript
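
Reusing the wrapped chat chain from the previous section, the only per-user difference is the session_id in the config:

```python
chat.invoke({"input": "My favorite language is Python."},
            config={"configurable": {"session_id": "user_a"}})
chat.invoke({"input": "I love JavaScript."},
            config={"configurable": {"session_id": "user_b"}})

# Each lookup loads only that user's own history:
chat.invoke({"input": "What is my favorite language?"},
            config={"configurable": {"session_id": "user_a"}})  # -> Python
chat.invoke({"input": "What is my favorite language?"},
            config={"configurable": {"session_id": "user_b"}})  # -> JavaScript
```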

Why it matters

This setup lets the model:

  • remember past messages

  • keep conversations separate by user

  • answer based on each user’s own history

Summary: each session has its own memory, so multiple users can talk to the same model without mixing their conversations.

78. Hands-on ~ Message Trimming

The passage explains message trimming, which is the process of shortening a conversation history so it fits within a model’s context window and token limits.

Key points:

  • It uses a simulated long chat made of SystemMessage, HumanMessage, and AIMessage.

  • Trimming is done with a token limit and a strategy such as "last" or "first".

  • In the example, the last strategy is used, so the most recent messages are kept.

  • include_system=True means system messages are preserved.

  • allow_partial=False means messages are only kept if they fit completely.
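
A sketch with langchain_core's trim_messages and the options named above; the sample history and token limit are illustrative, and the chat model itself serves as the token counter:

```python
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, trim_messages

history = [
    SystemMessage("You are a helpful assistant."),
    HumanMessage("Hi, I'm learning LangChain."),
    AIMessage("Great! How can I help?"),
    HumanMessage("What is a retriever?"),
    AIMessage("A retriever fetches documents relevant to a query."),
]

trimmed = trim_messages(
    history,
    max_tokens=60,          # illustrative limit
    strategy="last",        # keep the most recent messages
    token_counter=llm,      # count tokens with the chat model's tokenizer
    include_system=True,    # always preserve the system message
    allow_partial=False,    # keep a message only if it fits completely
)
```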

Why it matters:

  • Reduces token usage

  • Keeps conversations within context limits

  • Avoids sending unnecessary history

  • Helps manage long-term memory efficiently

Example outcome:

  • The original chat had 8 messages

  • After trimming with a small token limit, it may shrink to only 2 messages

Overall, message trimming is a practical way to control how much conversation history is retained in AI applications.

79. Hands-on ~ Windowed Memory

The passage explains sliding window memory for LLM conversations:

  • LLM costs grow because each new request may include the full chat history.

  • To control this, sliding window memory keeps only the last K exchanges and discards older messages.

  • In the demo, a custom WindowChatHistory class extends LangChain’s InMemoryChatMessageHistory.

  • It overrides add_messages to check whether the number of messages exceeds K * 2:

    • 1 exchange = 1 human + 1 AI message

    • So K exchanges = K * 2 messages

  • If the limit is exceeded, it slices the list to keep only the newest messages:

    • self.messages = self.messages[-(K * 2):]
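
A sketch of the custom class as described, subclassing InMemoryChatMessageHistory; the attribute name k is an assumption:

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.messages import BaseMessage

class WindowChatHistory(InMemoryChatMessageHistory):
    k: int = 2  # number of exchanges (human + AI pairs) to keep

    def add_messages(self, messages: list[BaseMessage]) -> None:
        super().add_messages(messages)
        if len(self.messages) > self.k * 2:
            # 1 exchange = 1 human + 1 AI message, so keep the newest k * 2
            self.messages = self.messages[-(self.k * 2):]
```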

The demo conversation shows the memory shrinking as new messages arrive. With K = 2, only the last two exchanges remain, so the model remembers recent facts like:

  • “I work as an engineer.”

  • “I have two cats.”

It forgets earlier ones like:

  • “My name is Paulo.”

  • “I live in Seattle.”

Main takeaway

Sliding window memory provides fixed-size, predictable conversation memory, lowering cost and avoiding context-window overflow, but it loses older context.

80. Hands-on ~ Summary Memory

Summary memory keeps a conversation manageable by compressing older messages into a running summary instead of deleting them. It uses:

  • a summary LLM to maintain a stable, deterministic summary,

  • a chat LLM for the live conversation,

  • a prompt built from running summary + recent message buffer + current user input.

How it works

  1. The model responds using the summary, recent messages, and new input.

  2. The new exchange is added to a recent-message buffer.

  3. When the buffer gets too large, the oldest messages are summarized.

  4. That summary is merged into the running summary.
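
A hedged sketch of the folding step; the buffer threshold, prompt wording, and function name are illustrative, and summary_llm would typically be a temperature-0 model, matching the stable, deterministic summarizer the passage mentions:

```python
MAX_BUFFER = 6  # recent messages kept verbatim before folding begins

def fold_into_summary(summary, buffer, summary_llm):
    """Summarize the oldest buffered messages and merge them into the running summary."""
    if len(buffer) <= MAX_BUFFER:
        return summary, buffer  # buffer still small enough; nothing to fold
    old, recent = buffer[:-MAX_BUFFER], buffer[-MAX_BUFFER:]
    transcript = "\n".join(f"{m.type}: {m.content}" for m in old)
    new_summary = summary_llm.invoke(
        f"Current summary:\n{summary}\n\n"
        f"New messages to incorporate:\n{transcript}\n\n"
        "Return an updated, concise summary."
    ).content
    return new_summary, recent
```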

Why it’s useful

  • Recent context stays exact

  • Older context is preserved in compressed form

  • Token usage remains bounded, preventing context overflow

Key idea

It’s a hybrid memory strategy:

  • old info → summarized

  • new info → kept verbatim

This is especially useful for chatbots, RAG systems, and other long-running AI interactions.

81. Exercise and Solution ~ Persistent Memory

The passage explains how to build a chatbot with persistent memory using LangChain and SQLite.

Main points

  • Use RunnableWithMessageHistory and SQLChatMessageHistory to store conversation history in a local SQLite database.

  • Each chat session is identified by a session ID, so messages are saved and retrieved per user/session.

  • The chatbot can remember preferences across restarts, such as:

    • “I prefer dark mode themes.”

    • “What theme do I prefer?”

  • To make this work, you:

    1. Import the needed LangChain chat history tools.

    2. Set a SQLite .db file path.

    3. Create a function that returns a SQLChatMessageHistory for a given session.

    4. Build a prompt with a system message, history, and user input.

    5. Wrap the chain with RunnableWithMessageHistory.

    6. Pass a config containing the session_id.

    7. Test that the bot remembers past messages.
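
A sketch of the SQLite-backed wiring, reusing the prompt/LLM chain from section 76; note the connection argument name has varied across LangChain versions (connection vs. connection_string), and the .db path is a placeholder:

```python
from langchain_community.chat_message_histories import SQLChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

DB_URL = "sqlite:///chat_memory.db"  # placeholder path

def get_sql_history(session_id):
    # Messages are stored on disk, keyed by session_id, so they survive restarts
    return SQLChatMessageHistory(session_id=session_id, connection=DB_URL)

persistent_chat = RunnableWithMessageHistory(
    chain,
    get_sql_history,
    input_messages_key="input",
    history_messages_key="history",
)

config = {"configurable": {"session_id": "user-123"}}
persistent_chat.invoke({"input": "I prefer dark mode themes."}, config=config)
# Even after a restart, the same session_id reloads the saved messages:
persistent_chat.invoke({"input": "What theme do I prefer?"}, config=config)
```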

Persistence verification

  • To confirm memory is truly saved, you can:

    • run a conversation,

    • restart the chain,

    • reuse the same SQLite database,

    • and ask about earlier information.

  • You can also inspect the SQLite database directly to see stored human and AI messages.

Summarization idea

  • After about 10 messages, the conversation can be summarized automatically.

  • The summary can be stored as memory, while older raw messages may be pruned if desired.

Overall goal

The result is a chatbot that:

  • remembers preferences,

  • persists across restarts,

  • stores memory locally,

  • and can later be extended with automatic summarization.

83. Project ~ AI Research Assistant - Indexing Documents (Part 1)

The document outlines the setup of an AIResearchAssistant for a RAG pipeline using Chroma, OpenAIEmbeddings, and RecursiveCharacterTextSplitter.

Main points

  • Introduces structured output models:

    • ResearchResponse: includes answer, confidence, sources, and key_quotes

    • follow_up_questions: for generating follow-up prompts

  • Builds an AIResearchAssistant class that bundles the three core RAG components:

    1. Embedding model (OpenAIEmbeddings, text-embedding-3-small)

    2. Text splitter (RecursiveCharacterTextSplitter)

    3. Vector store (Chroma with persistent storage)

  • The constructor sets defaults like:

    • persistent_directory="research_db"

    • chunk_size=1000

    • chunk_overlap=200

  • Adds document ingestion methods:

    • add_documents to split, tag, timestamp, and store documents

    • add_text and add_texts as convenience wrappers for raw text

  • Includes inspection utilities:

    • get_document_count

    • list_sources

  • Confirms persistence and indexing through tests and cleanup steps
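
A hedged sketch of the constructor and ingestion wiring described above; the tagging and timestamp details are simplified:

```python
from datetime import datetime, timezone

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

class AIResearchAssistant:
    def __init__(self, persistent_directory="research_db",
                 chunk_size=1000, chunk_overlap=200):
        self.embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=chunk_size, chunk_overlap=chunk_overlap
        )
        self.vectorstore = Chroma(
            persist_directory=persistent_directory,
            embedding_function=self.embeddings,
        )

    def add_documents(self, documents):
        chunks = self.text_splitter.split_documents(documents)
        for chunk in chunks:  # tag each chunk with an ingestion timestamp
            chunk.metadata.setdefault("indexed_at", datetime.now(timezone.utc).isoformat())
        self.vectorstore.add_documents(chunks)

    def get_document_count(self):
        return self.vectorstore._collection.count()  # Chroma-internal shortcut
```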

Overall takeaway

The assistant is now able to ingest, chunk, index, and persist documents, but it does not yet support retrieval or question answering. The next step is to add those capabilities so it can respond to user queries.

84. Project ~ AI Research Assistant - LLM Prompt and Output Parser (Part 2)

The passage explains how to turn a basic document retriever into a simple RAG-style Q&A chain using three main parts: an LLM, a prompt, and an output parser.

Main steps covered

  • Add a ChatOpenAI model to the assistant.

  • Build a retriever that uses similarity search and returns the top 4 relevant chunks.

  • Test the retriever to confirm it returns relevant document fragments.

  • Add a function to format retrieved documents into plain-text context for the LLM.

  • Create an ask method that:

    1. retrieves documents,

    2. formats them,

    3. builds a prompt with system and human instructions,

    4. runs a chain like prompt | llm | StrOutputParser(),

    5. returns the generated answer.

Testing and behavior

The assistant is tested with three questions:

  1. A factual question about RAG

  2. A question needing information from multiple sources

  3. A follow-up question

Key result

The system works, but it has a major weakness: no memory.
Because of that, follow-up questions can be misinterpreted or hallucinated instead of being answered correctly. This shows why grounding helps, but also why conversational memory will be needed next.

85. Project ~ AI Research Assistant - Adding Memory (Part 3)


The passage explains how to add session-based memory to an AI Research Assistant so each user session keeps its own conversation history.

Key points:

  • Add self.session_store as an in-memory dictionary to hold per-session chat history.

  • Create _get_session_history(self, session_id) to return or initialize a session’s message list.

  • Update the prompt in ask by inserting a MessagesPlaceholder named history between the system and human messages.

  • Inspecting session history shows it is just a list of stored messages, which only becomes useful when injected into the prompt.

  • In ask, retrieve the session history first and pass history.messages into the chain, optionally limiting the number of recent messages.

  • Before returning a response, save both sides of the exchange:

    • HumanMessage for the user question

    • AIMessage for the assistant reply

  • Add utility methods:

    • clear_session(…) to erase a session’s history

    • get_session_history_display(…) to view history in a readable format
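
A sketch of the session-memory additions as a class fragment, following the names in the passage; the remember helper is hypothetical shorthand for the save step at the end of ask:

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.messages import AIMessage, HumanMessage

class AIResearchAssistant:
    def __init__(self):
        self.session_store = {}  # session_id -> InMemoryChatMessageHistory

    def _get_session_history(self, session_id):
        if session_id not in self.session_store:
            self.session_store[session_id] = InMemoryChatMessageHistory()
        return self.session_store[session_id]

    def remember(self, session_id, question, answer):
        # Hypothetical helper: save both sides of the exchange after ask()
        history = self._get_session_history(session_id)
        history.add_messages([HumanMessage(question), AIMessage(answer)])

    def clear_session(self, session_id):
        self.session_store.pop(session_id, None)
```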

Testing shows:

  • Follow-up questions now work because prior context is available.

  • Each Q&A pair adds two messages to memory.

  • Different session IDs have isolated histories, so one user’s conversation does not affect another’s.

Overall, the update gives the chatbot real conversational memory while keeping session histories separate.

86. Project ~ AI Research Assistant - Multi-Query Implementation (Part 4)

The passage explains how to improve a RAG retriever by adding multi-query retrieval.

Main idea

  • The current retriever only matches the query using similar words.

  • This works, but it can miss relevant chunks if the answer uses different terminology.

  • Multi-query retrieval fixes this by using an LLM to generate several semantically similar queries from the original question.

Basic vs. advanced retriever

  • Basic retriever: simple similarity search, returns about four documents.

  • Advanced retriever: multi-query retrieval, which expands the original question into multiple related searches.

Why it helps

Different people may ask the same thing in different ways. Multi-query retrieval improves recall by searching from multiple angles, making it more likely to find useful chunks even when wording differs.

Implementation described

  • Update _build_retriever in AIResearchAssistant.

  • Add a use_advanced flag.

  • If false, use the basic retriever.

  • If true, use the multi-query retriever.

  • Update ask to pass this flag through and handle retrieval the same way afterward.

Testing and results

  • With debugging enabled, the advanced retriever shows generated alternate queries.

  • It returns more relevant and unique chunks than the basic retriever.

  • This gives the model richer context and usually improves answer quality.

Conclusion

Multi-query retrieval makes the RAG system smarter and more flexible by retrieving information from several semantically related searches instead of relying on one keyword-based query.

The passage ends by noting the next step: moving from raw string answers to structured output, such as returning confidence, sources, and answer text in a predictable format.

87. Project ~ AI Research Assistant - Structured Output - Final Part

The passage explains how to improve an ask function in a RAG system by changing its output from plain text to a structured object.

Key points

  • The current ask function returns a plain string, which is hard to use programmatically.

  • A ResearchResponse data model already exists to solve this by structuring outputs with fields like:

    • answer

    • confidence

    • sources

    • key_quotes

    • follow_up_questions

  • A new function, ask_structured, is introduced right before ask.

    • It takes the same inputs as ask:

      • question

      • session_id

      • use_default

      • use_advanced

    • But it returns a ResearchResponse instead of a string.

  • Inside ask_structured, the LLM is wrapped with with_structured_output(ResearchResponse) so the model returns data in the schema format.

  • This makes it easy to access individual parts of the response directly in code, such as response.answer or response.sources.

Why this matters

  • Plain text is fine for display, but structured output is better for downstream processing.

  • It makes it easier to extract answers, confidence scores, sources, and follow-up questions.

Broader context

  • The project now includes a full RAG pipeline with:

    • a research assistant

    • advanced retrieval

    • conversation memory

    • structured responses

  • Document ingestion was intentionally left out for simplicity, but should be implemented separately in a real project:

    • upload file

    • extract content

    • store it in the database

Conclusion

The lesson shows how to use LangChain to build a more powerful RAG system, and it sets up the next topic: LangGraph, stateful agents, and how LangGraph and LangChain work together.