LangChain Integration
Build a conversational AI that remembers users across sessions. This guide shows how to use MemoryRelay as a persistent memory backend for LangChain, so your chatbot retains context long after a conversation ends.
What You'll Build
A LangChain ConversationChain backed by MemoryRelay that:
- Stores every exchange as a searchable memory
- Loads the most relevant past memories before each response using semantic search
- Works across process restarts, server redeployments, and multiple instances
Prerequisites
- Python 3.9+
- A MemoryRelay API key (get one here)
- An OpenAI API key (for the LLM)
Installation
pip install memoryrelay langchain langchain-openai
Set your API keys as environment variables:
export MEMORYRELAY_API_KEY="mem_your_key_here"
export OPENAI_API_KEY="sk-your_key_here"
Create a Custom LangChain Memory Class
LangChain's BaseMemory interface lets you plug in any storage backend. The class below bridges LangChain and MemoryRelay:
from typing import Any

from memoryrelay import MemoryRelay
from langchain.memory import BaseMemory
from pydantic import Field


class MemoryRelayMemory(BaseMemory):
    """LangChain memory backend that persists to MemoryRelay.

    On each turn:
    - load_memory_variables: searches MemoryRelay for memories relevant
      to the current user input and returns them as context.
    - save_context: stores the user/assistant exchange as a new memory.
    """

    client: MemoryRelay = Field(exclude=True)
    agent_id: str
    memory_key: str = "history"
    search_limit: int = 5
    min_score: float = 0.5

    class Config:
        arbitrary_types_allowed = True

    @property
    def memory_variables(self) -> list[str]:
        return [self.memory_key]

    def load_memory_variables(self, inputs: dict[str, Any]) -> dict[str, str]:
        """Search MemoryRelay for memories relevant to the current input."""
        query = inputs.get("input", "")
        if not query:
            return {self.memory_key: ""}
        results = self.client.memories.search(
            query=query,
            agent_id=self.agent_id,
            limit=self.search_limit,
        )
        # Filter by minimum similarity score
        relevant = [r for r in results.data if r.score >= self.min_score]
        context = "\n".join(r.content for r in relevant)
        return {self.memory_key: context}

    def save_context(self, inputs: dict[str, Any], outputs: dict[str, str]) -> None:
        """Store the conversation turn as a memory in MemoryRelay."""
        user_input = inputs.get("input", "")
        assistant_output = outputs.get("output", "")
        self.client.memories.create(
            content=f"User: {user_input}\nAssistant: {assistant_output}",
            agent_id=self.agent_id,
            metadata={
                "type": "conversation",
                "source": "langchain",
            },
        )

    def clear(self) -> None:
        """No-op — memories are persistent by design."""
        pass
LangChain's built-in memory classes (ConversationBufferMemory, ConversationSummaryMemory) store data in-process and lose everything on restart. MemoryRelayMemory persists memories to the cloud and uses semantic search to retrieve only the most relevant context — not the entire conversation history.
Wire It Into a Conversation Chain
import os

from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
from memoryrelay import MemoryRelay

# Initialize the MemoryRelay client
client = MemoryRelay(api_key=os.environ["MEMORYRELAY_API_KEY"])

# Create (or reuse) an agent — this is the memory namespace
agent = client.agents.create(name="langchain-bot")

# Build the memory-backed chain
memory = MemoryRelayMemory(
    client=client,
    agent_id=str(agent.id),
    search_limit=5,
    min_score=0.5,
)

chain = ConversationChain(
    llm=ChatOpenAI(model="gpt-4o"),
    memory=memory,
    verbose=True,  # Prints the loaded memory context on each turn
)
Run a Conversation
Session 1: Teach the bot about yourself
response = chain.predict(input="My name is Alice and I work on ML pipelines at Acme Corp.")
print(response)
# "Nice to meet you, Alice! ML pipelines sound interesting — what kind of
# data are you working with at Acme Corp?"
response = chain.predict(input="Mostly time-series sensor data. We use Airflow for orchestration.")
print(response)
# "Airflow is a solid choice for time-series pipelines. Are you using
# any specific ML frameworks for your models?"
Both exchanges are now stored in MemoryRelay with embeddings generated automatically.
Session 2: The bot remembers (even after restart)
Imagine the process restarts — a new Python session, a new deployment, or even a different server. As long as you use the same agent ID, memories persist:
# New session — reconnect to the same agent
client = MemoryRelay(api_key=os.environ["MEMORYRELAY_API_KEY"])

memory = MemoryRelayMemory(
    client=client,
    agent_id="<same-agent-id-from-session-1>",
)

chain = ConversationChain(
    llm=ChatOpenAI(model="gpt-4o"),
    memory=memory,
)
response = chain.predict(input="What do you know about me?")
print(response)
# "You're Alice from Acme Corp, working on ML pipelines for time-series
# sensor data. You use Apache Airflow for orchestration."
The chain called load_memory_variables, which searched MemoryRelay for memories relevant to "What do you know about me?" and injected the matching results as context for the LLM.
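The filtering step inside load_memory_variables can be illustrated offline with stubbed search hits (the contents and scores below are hypothetical, not real API responses):

```python
from types import SimpleNamespace


def build_context(hits, min_score=0.5):
    """Mirror of the filtering in load_memory_variables: drop hits below
    the similarity threshold and join the rest into one context string."""
    relevant = [h for h in hits if h.score >= min_score]
    return "\n".join(h.content for h in relevant)


# Stubbed search hits standing in for results.data
hits = [
    SimpleNamespace(content="User: My name is Alice...", score=0.91),
    SimpleNamespace(content="User: The weather is nice.", score=0.31),
]
print(build_context(hits))  # only the 0.91 hit survives the 0.5 cutoff
```

Raising min_score tightens this cutoff, which is exactly the tuning knob discussed in Best Practices below.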
Full Working Example
Here is a complete, self-contained script you can run:
"""langchain_memory_demo.py — LangChain + MemoryRelay persistent memory demo."""
import os
from typing import Any
from langchain.chains import ConversationChain
from langchain.memory import BaseMemory
from langchain_openai import ChatOpenAI
from memoryrelay import MemoryRelay
from pydantic import Field
class MemoryRelayMemory(BaseMemory):
client: MemoryRelay = Field(exclude=True)
agent_id: str
memory_key: str = "history"
search_limit: int = 5
min_score: float = 0.5
class Config:
arbitrary_types_allowed = True
@property
def memory_variables(self) -> list[str]:
return [self.memory_key]
def load_memory_variables(self, inputs: dict[str, Any]) -> dict[str, str]:
query = inputs.get("input", "")
if not query:
return {self.memory_key: ""}
results = self.client.memories.search(
query=query, agent_id=self.agent_id, limit=self.search_limit
)
relevant = [r for r in results.data if r.score >= self.min_score]
context = "\n".join(r.content for r in relevant)
return {self.memory_key: context}
def save_context(self, inputs: dict[str, Any], outputs: dict[str, str]) -> None:
self.client.memories.create(
content=f"User: {inputs['input']}\nAssistant: {outputs['output']}",
agent_id=self.agent_id,
metadata={"type": "conversation", "source": "langchain"},
)
def clear(self) -> None:
pass
def main():
client = MemoryRelay(api_key=os.environ["MEMORYRELAY_API_KEY"])
agent = client.agents.create(name="langchain-demo")
print(f"Agent ID: {agent.id}")
print("Save this ID to continue the conversation in a later session.\n")
memory = MemoryRelayMemory(client=client, agent_id=str(agent.id))
chain = ConversationChain(
llm=ChatOpenAI(model="gpt-4o"),
memory=memory,
)
print("Chat with the bot (type 'quit' to exit):\n")
while True:
user_input = input("You: ").strip()
if user_input.lower() in ("quit", "exit", "q"):
break
response = chain.predict(input=user_input)
print(f"Bot: {response}\n")
if __name__ == "__main__":
main()
Run it:
python langchain_memory_demo.py
How It Works
┌──────────┐ input ┌────────────────────┐ search() ┌──────────────┐
│ User │ ───────────► │ MemoryRelayMemory │ ──────────────► │ MemoryRelay │
└──────────┘ │ (load_memory_vars)│ ◄────results──── │ API │
└────────┬───────────┘ └──────────────┘
│ context ▲
▼ │
┌────────────────────┐ │
│ ChatOpenAI (LLM) │ │
└────────┬───────────┘ │
│ response │
▼ │
┌────────────────────┐ create() │
│ MemoryRelayMemory │ ──────────────────────┘
│ (save_context) │
└────────────────────┘
- User sends input to the chain.
- load_memory_variables searches MemoryRelay for relevant past memories.
- Matching memories are injected into the LLM prompt as context.
- The LLM generates a response informed by past conversations.
- save_context stores the full exchange as a new memory with an embedding.
Best Practices
Tune search_limit and min_score. Start with search_limit=5 and min_score=0.5. If the bot recalls too much irrelevant context, raise min_score. If it misses things, lower it or increase search_limit.
Tag memories with metadata like {"type": "preference"} or {"type": "fact"}. You can later filter searches by metadata to retrieve only specific kinds of memories.
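If search hits echo back the metadata they were created with (an assumption about the SDK's result shape — check the API reference), you can also post-filter client-side as a sketch of this idea:

```python
from types import SimpleNamespace


def filter_by_type(hits, memory_type):
    """Client-side post-filter: keep only hits whose metadata 'type'
    matches, e.g. 'preference' or 'fact'."""
    return [h for h in hits if (h.metadata or {}).get("type") == memory_type]


# Stubbed hits with the metadata shape used by save_context above
hits = [
    SimpleNamespace(content="Prefers dark mode.", metadata={"type": "preference"}),
    SimpleNamespace(content="User: Hi\nAssistant: Hello!", metadata={"type": "conversation"}),
]
print([h.content for h in filter_by_type(hits, "preference")])
```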
The agent ID is your memory namespace. Store it in a database or config file so returning users always connect to their existing memory. Creating a new agent starts with a blank slate.
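One simple way to persist the namespace is a small local file; the agent.json path and helper below are illustrative (use a database in production), and only the documented client.agents.create call is assumed:

```python
import json
import os

AGENT_FILE = "agent.json"  # illustrative location, not part of the SDK


def load_or_create_agent_id(client, name="langchain-bot"):
    """Reuse a saved agent ID so memories accumulate across runs;
    create a fresh agent (and a blank memory slate) only on first launch."""
    if os.path.exists(AGENT_FILE):
        with open(AGENT_FILE) as f:
            return json.load(f)["agent_id"]
    agent = client.agents.create(name=name)
    with open(AGENT_FILE, "w") as f:
        json.dump({"agent_id": str(agent.id)}, f)
    return str(agent.id)
```

Pass the returned ID to MemoryRelayMemory and returning users reconnect to their existing memory automatically.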