Memory and LangChain

Interacting with language models can be challenging because they are stateless: each call is independent, and the model does not remember earlier exchanges.

This becomes a problem when building applications such as chatbots, where you want a consistent conversation flow. To address it, LangChain provides memory management.

LangChain provides several types of memory to maintain the conversation context. This material covers four of them:

  1. ConversationBufferMemory

  2. ConversationBufferWindowMemory

  3. ConversationTokenBufferMemory

  4. ConversationSummaryBufferMemory

These memories manage the conversation history and feed it back into the language model to maintain the conversational flow.

import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file
from langchain.chat_models import ChatOpenAI

OpenAIModel = 'gpt-4'
llm = ChatOpenAI(model=OpenAIModel, temperature=0.1)

ConversationBufferMemory

This type of memory simply stores every utterance from a conversation. When you use a large language model for a chat conversation, each call is independent and stateless; the model only appears to have memory because we provide it with the entire conversation so far as context.

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory()
conversationUsingBufferMemory = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=True
)
conversationUsingBufferMemory.predict(input="Hello, My name is John.")


> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Hello, My name is John.
AI:

> Finished chain.
"Hello, John! It's a pleasure to meet you. How can I assist you today?"
conversationUsingBufferMemory.predict(input="What is 123 times 3?")


> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hello, My name is John.
AI: Hello, John! It's a pleasure to meet you. How can I assist you today?
Human: What is 123 times 3?
AI:

> Finished chain.
'123 times 3 equals 369.'
conversationUsingBufferMemory.predict(input="Do you know my name?")


> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hello, My name is John.
AI: Hello, John! It's a pleasure to meet you. How can I assist you today?
Human: What is 123 times 3?
AI: 123 times 3 equals 369.
Human: Do you know my name?
AI:

> Finished chain.
'Yes, you mentioned earlier that your name is John.'
print(memory.buffer)
Human: Hello, My name is John.
AI: Hello, John! It's a pleasure to meet you. How can I assist you today?
Human: What is 123 times 3?
AI: 123 times 3 equals 369.
Human: Do you know my name?
AI: Yes, you mentioned earlier that your name is John.
memory.load_memory_variables({})
{'history': "Human: Hello, My name is John.\nAI: Hello, John! It's a pleasure to meet you. How can I assist you today?\nHuman: What is 123 times 3?\nAI: 123 times 3 equals 369.\nHuman: Do you know my name?\nAI: Yes, you mentioned earlier that your name is John."}
memory = ConversationBufferMemory()
memory.save_context({"input": "Hi"}, 
                    {"output": "What's up"})
print(memory.buffer)
Human: Hi
AI: What's up
memory.load_memory_variables({})
{'history': "Human: Hi\nAI: What's up"}
memory.save_context({"input": "Not much, just hanging"}, 
                    {"output": "Cool"})
memory.load_memory_variables({})
{'history': "Human: Hi\nAI: What's up\nHuman: Not much, just hanging\nAI: Cool"}

ConversationBufferMemory is used in scenarios where it’s vital for the AI to have full access to the history of an ongoing conversation. It stores every utterance from the conversation and feeds the whole history back into the language model on each call, allowing it to respond contextually as the conversation progresses.

Here are some instances where ConversationBufferMemory can be particularly useful:

  1. Chatbots: In an interactive chatbot application where maintaining continuity and context is critical to the flow of the conversation, ConversationBufferMemory can play a crucial role. Users can ask multiple questions, and each answer can depend heavily on the previous exchanges.

  2. Customer Support Systems: In applications like automated customer service or support, ConversationBufferMemory helps keep track of the customer’s entire interaction with the AI, leading to more personalized and effective assistance.

  3. Interactive Games: In AI-driven interactive games where each user input can change the course of the game, this type of memory would be beneficial.

In any circumstance where preserving the full conversation context is important for maintaining accurate and meaningful interactions, ConversationBufferMemory provides an essential solution.
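As a side note, the stored history does not have to be a single string. Below is a minimal sketch of the return_messages option, which returns the history as a list of message objects; this is often more convenient when working with chat models directly.

from langchain.memory import ConversationBufferMemory

# Minimal sketch: the same buffer memory, but returning message objects instead of a string.
messageMemory = ConversationBufferMemory(return_messages=True)
messageMemory.save_context({"input": "Hi"},
                           {"output": "What's up"})
messageMemory.load_memory_variables({})
# Expected shape (roughly): {'history': [HumanMessage(content='Hi'), AIMessage(content="What's up")]}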

ConversationBufferWindowMemory

This memory limits the stored conversation to a fixed number of exchanges. For example, setting k=1 keeps only the most recent exchange, while larger values of k keep correspondingly more. This prevents the memory from growing without control as the conversation gets longer.

from langchain.memory import ConversationBufferWindowMemory
memory = ConversationBufferWindowMemory(k=1)               
memory.save_context({"input": "Hi"},
                    {"output": "What's up"})


memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.load_memory_variables({})
{'history': 'Human: Not much, just hanging\nAI: Cool'}
memory = ConversationBufferWindowMemory(k=1)               

conversationUsingBufferWindowMemory = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=True
)
conversationUsingBufferWindowMemory.predict(input="Hello, My name is John.")


> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Hello, My name is John.
AI:

> Finished chain.
"Hello, John! It's a pleasure to meet you. How can I assist you today?"
conversationUsingBufferWindowMemory.predict(input="What is 123 times 3?")


> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hello, My name is John.
AI: Hello, John! It's a pleasure to meet you. How can I assist you today?
Human: What is 123 times 3?
AI:

> Finished chain.
'123 times 3 equals 369.'
conversationUsingBufferWindowMemory.predict(input="Do you know my name?")


> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: What is 123 times 3?
AI: 123 times 3 equals 369.
Human: Do you know my name?
AI:

> Finished chain.
"I'm sorry, but as an AI, I don't have access to personal data about individuals unless it has been shared with me in the course of our conversation. I am designed to respect user privacy and confidentiality."

ConversationBufferWindowMemory can be utilized when you want the AI to remember only a limited number of past exchanges. This type of memory grows only up to a certain point, retaining the most recent elements of the conversation and discarding the oldest once its limit is reached.

This type of memory can be crucial in situations like:

  1. Lengthy Dialogues: During drawn-out conversations where only the most recent exchanges are relevant, using a ConversationBufferWindowMemory can prevent unnecessary use of memory resources.

  2. Highly Contextual Exchanges: If the AI needs to respond based on recent contextual information rather than older, potentially irrelevant facts, this memory type is beneficial.

  3. Memory and Cost Efficiency: As lengthy conversations can occupy large amounts of memory and increase the cost of processing tokens with an LLM, using a ConversationBufferWindowMemory can help control costs by reducing memory use.

For instance, in certain customer support scenarios, an AI might need to keep track of just the last few exchanges to handle the current query, making the ConversationBufferWindowMemory a fitting option.

ConversationTokenBufferMemory

This memory limits the number of tokens stored. It’s useful because a large portion of the cost in language model usage comes from the number of tokens processed.

!pip install tiktoken
Requirement already satisfied: tiktoken in /home/eddypermana22/.local/lib/python3.8/site-packages (0.4.0)
Requirement already satisfied: requests>=2.26.0 in /home/eddypermana22/.local/lib/python3.8/site-packages (from tiktoken) (2.31.0)
Requirement already satisfied: regex>=2022.1.18 in /home/eddypermana22/.local/lib/python3.8/site-packages (from tiktoken) (2023.6.3)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/eddypermana22/.local/lib/python3.8/site-packages (from requests>=2.26.0->tiktoken) (3.2.0)
Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests>=2.26.0->tiktoken) (2019.11.28)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/lib/python3/dist-packages (from requests>=2.26.0->tiktoken) (1.25.8)
Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3/dist-packages (from requests>=2.26.0->tiktoken) (2.8)
from langchain.memory import ConversationTokenBufferMemory
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=200)
memory.save_context({"input": "Embeddings are what?"},
                    {"output": "Exciting!"})

memory.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})

memory.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})

memory.save_context({"input": "Deep learning is what?"},
                    {"output": "Delightful!"})

memory.save_context({"input": "Chatbots are what?"}, 
                    {"output": "Charming!"})
memory.load_memory_variables({})
{'history': 'Human: Embeddings are what?\nAI: Exciting!\nHuman: AI is what?!\nAI: Amazing!\nHuman: Backpropagation is what?\nAI: Beautiful!\nHuman: Deep learning is what?\nAI: Delightful!\nHuman: Chatbots are what?\nAI: Charming!'}

The ConversationTokenBufferMemory is used when it’s crucial to control both memory usage and the cost associated with a language model’s token processing. This type of memory can limit the number of tokens stored to a defined maximum, which directly maps to the cost of the language model calls.

Here are some examples where ConversationTokenBufferMemory can be beneficial:

  1. Cost-Effective Chatbots: For interactive chatbots being used extensively, limiting the number of tokens processed can help reduce operation costs associated with large language models, particularly when a user interacts frequently or provides extensive inputs.

  2. Large-Scale Applications: In large-scale applications that handle multiple instances of language models for multiple users, using ConversationTokenBufferMemory could improve performance while remaining cost-effective.

  3. Data Capping: When you want to put a cap on the amount of data stored and processed for practical or regulatory reasons, the token buffer memory allows this flexibility.

In essence, wherever controlling the amount of processed data or tokens is crucial for performance, cost, or compliance reasons, ConversationTokenBufferMemory provides an effective solution.
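To actually see the pruning behavior, you can lower the limit. The sketch below uses the same llm as above with a deliberately small max_token_limit; exactly which messages survive depends on how many tokens each one uses.

from langchain.memory import ConversationTokenBufferMemory

# Minimal sketch: with a small token budget, the oldest exchanges are pruned from the buffer.
smallMemory = ConversationTokenBufferMemory(llm=llm, max_token_limit=30)
smallMemory.save_context({"input": "Embeddings are what?"},
                         {"output": "Exciting!"})
smallMemory.save_context({"input": "AI is what?!"},
                         {"output": "Amazing!"})
smallMemory.save_context({"input": "Chatbots are what?"},
                         {"output": "Charming!"})
smallMemory.load_memory_variables({})
# Only the most recent exchange(s) that fit within roughly 30 tokens remain in 'history'.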

ConversationSummaryBufferMemory

Finally, ConversationSummaryBufferMemory uses a language model to summarize the older parts of the conversation once the token limit is reached, keeping that summary alongside the most recent turns. This is helpful when a conversation is long and you still want to retain its gist.

from langchain.memory import ConversationSummaryBufferMemory
scheduleString = "This is your personalized and detailed daily schedule reminder. You begin your day bright and early at 6 am with a few minutes of mindfulness and meditation to set the tone for the day. At 7 am, you have a nutritious breakfast, followed by a re-energizing morning workout at 7:30 am. After freshening up, you dive into your first meeting of the day at 9 am that continues until 10 am. Subsequently, from 10:30 am to 12 noon, you'll be conducting a project review. When lunchtime rolls around at 12 pm, take a well-deserved break until 1 pm. Once you're rejuvenated, you're set to join a brainstorming session from 1:30 pm to 3 pm. You've set aside the time between 3 pm to 4 pm to catch up on emails and pending tasks. From 4:30 pm to 5:30 pm, you're scheduled for another meeting. Afterwards, from 5:30 pm onwards, it's your leisure time to be used as you see fit. Remember to have your dinner at 7:30 pm to keep you fueled for the next day. Before retiring for the night at 10 pm, allow yourself some relaxation or quiet reading time. Throughout the day, it's critically important to stay hydrated and take short breaks for proper rest. Here's to a productive and successful day ahead!"
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=100)
memory.save_context({"input": "Hello"}, {"output": "What's up"})

memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})

memory.save_context({"input": "What is on the schedule today?"}, 
                    {"output": f"{scheduleString}"})
memory.load_memory_variables({})
{'history': 'System: The human and AI exchange greetings before the human asks about their schedule for the day. The AI provides a detailed schedule, starting with mindfulness and meditation at 6 am, followed by breakfast and a workout. The human has a meeting at 9 am, a project review from 10:30 am to 12 pm, lunch at 12 pm, a brainstorming session from 1:30 pm to 3 pm, time to catch up on emails and tasks from 3 pm to 4 pm, another meeting from 4:30 pm to 5:30 pm, and leisure time from 5:30 pm onwards. The AI reminds the human to stay hydrated, take breaks, and have dinner at 7:30 pm before going to bed at 10 pm.'}
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=True
)
conversation.predict(input="When the first meeting?")


> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
System: The human and AI exchange greetings before the human asks about their schedule for the day. The AI provides a detailed schedule, starting with mindfulness and meditation at 6 am, followed by breakfast and a workout. The human has a meeting at 9 am, a project review from 10:30 am to 12 pm, lunch at 12 pm, a brainstorming session from 1:30 pm to 3 pm, time to catch up on emails and tasks from 3 pm to 4 pm, another meeting from 4:30 pm to 5:30 pm, and leisure time from 5:30 pm onwards. The AI reminds the human to stay hydrated, take breaks, and have dinner at 7:30 pm before going to bed at 10 pm.
Human: When the first meeting?
AI:

> Finished chain.
'The first meeting is scheduled for 9 am.'
memory.load_memory_variables({})
{'history': 'System: The human and AI exchange greetings before the human asks about their schedule for the day. The AI provides a detailed schedule, starting with mindfulness and meditation at 6 am, followed by breakfast and a workout. The human has a meeting at 9 am, a project review from 10:30 am to 12 pm, lunch at 12 pm, a brainstorming session from 1:30 pm to 3 pm, time to catch up on emails and tasks from 3 pm to 4 pm, another meeting from 4:30 pm to 5:30 pm, and leisure time from 5:30 pm onwards. The AI reminds the human to stay hydrated, take breaks, and have dinner at 7:30 pm before going to bed at 10 pm.\nHuman: When the first meeting?\nAI: The first meeting is scheduled for 9 am.'}

ConversationSummaryBufferMemory is employed in situations where you’d like to maintain the gist of the conversation without storing every single detail because of token or resource limitations. It uses a language model to summarize the dialogue and stores the summary.

Here are a few typical instances where you might find this memory type useful:

  1. Large-scale Conversations: If a conversation is extensive or spans over a long period of time and storing every detail isn’t feasible due to token limitations or efficiency concerns, ConversationSummaryBufferMemory would be useful for retaining the most critical information without exhausting the resources.

  2. Business Meetings: In digital platforms for brainstorming, meetings, or webinars, we often need to create a concise summary of the long conversations. This memory type could be very effective in such contexts.

  3. Educational Platforms: In an e-learning environment, where a teacher interacts with several students, ConversationSummaryBufferMemory could assist in summarizing the key points of this interaction, aiding in generating reviewable content after a class or a session.

  4. News or Document Summarizing Applications: If a conversation or text represents a news article or a document that needs to be summarized, ConversationSummaryBufferMemory could store the summarized version, helping users get the crux of the information promptly.

In essence, ConversationSummaryBufferMemory enables you to retain the core thread of a dialogue or content even with a high volume of tokens, offering a summary of the main points while using fewer resources.
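If you want to look at just the condensed part of the history, the sketch below inspects the memory object from the schedule example above; the attribute names follow the legacy LangChain API used in this material and may differ in newer versions.

# Minimal sketch: inspect the running summary and the raw recent turns separately.
print(memory.moving_summary_buffer)   # the LLM-generated summary of the older turns
print(memory.chat_memory.messages)    # the recent turns that have not been summarized yet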


In addition to what has been mentioned, LangChain also supports two other memory types: VectorStoreRetrieverMemory, which stores pieces of the conversation as embeddings in a vector store and retrieves the most relevant ones, and ConversationEntityMemory, which retains details about specific entities mentioned in the conversation.

These two features will be explained in detail in the subsequent material.
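As a quick preview, here is a minimal sketch of ConversationEntityMemory based on the legacy LangChain API (the prompt template import and attribute names may differ in newer versions):

from langchain.chains import ConversationChain
from langchain.memory import ConversationEntityMemory
from langchain.memory.prompt import ENTITY_MEMORY_CONVERSATION_TEMPLATE

# Minimal preview sketch: the entity memory extracts and stores facts about named entities.
entityMemory = ConversationEntityMemory(llm=llm)
entityConversation = ConversationChain(
    llm=llm,
    prompt=ENTITY_MEMORY_CONVERSATION_TEMPLATE,
    memory=entityMemory,
    verbose=True
)
entityConversation.predict(input="John works at Acme and leads the design team.")
entityMemory.entity_store.store  # e.g. {'John': '...', 'Acme': '...'} (details vary per run)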

Memory Exercise

%pip install rggrader
# @title #### Student Identity
student_id = "your student id" # @param {type:"string"}
name = "your name" # @param {type:"string"}
# @title #### 00. Conversation Buffer Memory
from rggrader import submit
import os
from langchain.chat_models import ChatOpenAI
from dotenv import load_dotenv, find_dotenv
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# TODO:
# Create a chat application using the `ConversationBufferMemory` memory type
# Assign a `ConversationBufferMemory` to the `memory` variable, which will store data across calls to the chain.
# Also use the gpt-3.5-turbo chat model in the process.
# Have several conversations to see whether your chat application can remember/recognize information from earlier turns.

# Put your code here:
memory = ConversationBufferMemory()

# Build your `llm` (gpt-3.5-turbo) and a `ConversationChain` named `conversationUsingBufferMemory` here.

question="your question" # @param {type:"string"}
conversationUsingBufferMemory.predict(input=question)
# ---- End of your code ----

# Example of Expected Output:
"""
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hello, i'm Djarot Purnomo. I'm a UI/UX Designer
AI: Hello Djarot Purnomo! It's nice to meet you. As a UI/UX Designer, you must have a lot of experience in creating user-friendly and visually appealing interfaces. How long have you been working in this field?
Human: explain the relationship between a UI/UX Designer and a Front End Engineer
AI: A UI/UX Designer and a Front End Engineer often work closely together to create a seamless user experience. The UI/UX Designer focuses on the overall look and feel of the interface, including the layout, color scheme, and typography. They also conduct user research and create wireframes and prototypes to test and refine the design.

On the other hand, the Front End Engineer is responsible for implementing the design into a functional website or application. They take the visual elements created by the UI/UX Designer and use programming languages like HTML, CSS, and JavaScript to bring them to life. They ensure that the design is responsive, interactive, and optimized for different devices and browsers.

In summary, the UI/UX Designer focuses on the user experience and visual design, while the Front End Engineer focuses on the technical implementation of that design. They collaborate closely to ensure that the final product is both visually appealing and user-friendly.
Human: What skills do I currently have?
AI:

> Finished chain.
As an AI, I don't have access to personal information or knowledge about your specific skills. Therefore, I cannot accurately determine what skills you currently have. However, based on your previous statement that you are a UI/UX Designer, it can be inferred that you have skills in creating user-friendly interfaces, conducting user research, creating wireframes and prototypes, and possibly knowledge of design tools such as Adobe XD or Sketch.
"""
# Submit Method
assignment_id = "00_memory"
question_id = "00_conversation_buffer_memory"
history = memory.buffer
submit(student_id, name, assignment_id, history, question_id)