LangChain Integration

Build a conversational AI that remembers users across sessions. This guide shows how to use MemoryRelay as a persistent memory backend for LangChain, so your chatbot retains context long after a conversation ends.

What You'll Build

A LangChain ConversationChain backed by MemoryRelay that:

  • Stores every exchange as a searchable memory
  • Loads the most relevant past memories before each response using semantic search
  • Works across process restarts, server redeployments, and multiple instances

Prerequisites

  • Python 3.9+
  • A MemoryRelay API key (get one here)
  • An OpenAI API key (for the LLM)

Installation

pip install memoryrelay langchain langchain-openai

Set your API keys as environment variables:

export MEMORYRELAY_API_KEY="mem_your_key_here"
export OPENAI_API_KEY="sk-your_key_here"

Create a Custom LangChain Memory Class

LangChain's BaseMemory interface lets you plug in any storage backend. The class below bridges LangChain and MemoryRelay:

from typing import Any

from memoryrelay import MemoryRelay
from langchain_core.memory import BaseMemory
from pydantic import Field


class MemoryRelayMemory(BaseMemory):
    """LangChain memory backend that persists to MemoryRelay.

    On each turn:
    - load_memory_variables: searches MemoryRelay for memories relevant
      to the current user input and returns them as context.
    - save_context: stores the user/assistant exchange as a new memory.
    """

    client: MemoryRelay = Field(exclude=True)
    agent_id: str
    memory_key: str = "history"
    search_limit: int = 5
    min_score: float = 0.5

    class Config:
        arbitrary_types_allowed = True

    @property
    def memory_variables(self) -> list[str]:
        return [self.memory_key]

    def load_memory_variables(self, inputs: dict[str, Any]) -> dict[str, str]:
        """Search MemoryRelay for memories relevant to the current input."""
        query = inputs.get("input", "")
        if not query:
            return {self.memory_key: ""}

        results = self.client.memories.search(
            query=query,
            agent_id=self.agent_id,
            limit=self.search_limit,
        )

        # Filter by minimum similarity score
        relevant = [r for r in results.data if r.score >= self.min_score]
        context = "\n".join(r.content for r in relevant)
        return {self.memory_key: context}

    def save_context(self, inputs: dict[str, Any], outputs: dict[str, str]) -> None:
        """Store the conversation turn as a memory in MemoryRelay."""
        user_input = inputs.get("input", "")
        assistant_output = outputs.get("output", "")

        self.client.memories.create(
            content=f"User: {user_input}\nAssistant: {assistant_output}",
            agent_id=self.agent_id,
            metadata={
                "type": "conversation",
                "source": "langchain",
            },
        )

    def clear(self) -> None:
        """No-op — memories are persistent by design."""
        pass
Why a custom class?

LangChain's built-in memory classes (ConversationBufferMemory, ConversationSummaryMemory) store data in-process and lose everything on restart. MemoryRelayMemory persists memories to the cloud and uses semantic search to retrieve only the most relevant context — not the entire conversation history.

Wire It Into a Conversation Chain

import os
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
from memoryrelay import MemoryRelay

# Initialize the MemoryRelay client
client = MemoryRelay(api_key=os.environ["MEMORYRELAY_API_KEY"])

# Create an agent — this is the memory namespace; reuse its ID in later sessions
agent = client.agents.create(name="langchain-bot")

# Build the memory-backed chain
memory = MemoryRelayMemory(
    client=client,
    agent_id=str(agent.id),
    search_limit=5,
    min_score=0.5,
)

chain = ConversationChain(
    llm=ChatOpenAI(model="gpt-4o"),
    memory=memory,
    verbose=True,  # Logs memory loading so you can watch it in action
)

Run a Conversation

Session 1: Teach the bot about yourself

response = chain.predict(input="My name is Alice and I work on ML pipelines at Acme Corp.")
print(response)
# "Nice to meet you, Alice! ML pipelines sound interesting — what kind of
# data are you working with at Acme Corp?"

response = chain.predict(input="Mostly time-series sensor data. We use Airflow for orchestration.")
print(response)
# "Airflow is a solid choice for time-series pipelines. Are you using
# any specific ML frameworks for your models?"

Both exchanges are now stored in MemoryRelay with embeddings generated automatically.

Session 2: The bot remembers (even after restart)

Imagine the process restarts — a new Python session, a new deployment, or even a different server. As long as you use the same agent ID, memories persist:

# New session — reconnect to the same agent
client = MemoryRelay(api_key=os.environ["MEMORYRELAY_API_KEY"])

memory = MemoryRelayMemory(
    client=client,
    agent_id="<same-agent-id-from-session-1>",
)

chain = ConversationChain(
    llm=ChatOpenAI(model="gpt-4o"),
    memory=memory,
)

response = chain.predict(input="What do you know about me?")
print(response)
# "You're Alice from Acme Corp, working on ML pipelines for time-series
# sensor data. You use Apache Airflow for orchestration."

The chain called load_memory_variables, which searched MemoryRelay for memories relevant to "What do you know about me?" and injected the matching results as context for the LLM.

Full Working Example

Here is a complete, self-contained script you can run:

"""langchain_memory_demo.py — LangChain + MemoryRelay persistent memory demo."""

import os
from typing import Any

from langchain.chains import ConversationChain
from langchain_core.memory import BaseMemory
from langchain_openai import ChatOpenAI
from memoryrelay import MemoryRelay
from pydantic import Field


class MemoryRelayMemory(BaseMemory):
    client: MemoryRelay = Field(exclude=True)
    agent_id: str
    memory_key: str = "history"
    search_limit: int = 5
    min_score: float = 0.5

    class Config:
        arbitrary_types_allowed = True

    @property
    def memory_variables(self) -> list[str]:
        return [self.memory_key]

    def load_memory_variables(self, inputs: dict[str, Any]) -> dict[str, str]:
        query = inputs.get("input", "")
        if not query:
            return {self.memory_key: ""}
        results = self.client.memories.search(
            query=query, agent_id=self.agent_id, limit=self.search_limit
        )
        relevant = [r for r in results.data if r.score >= self.min_score]
        context = "\n".join(r.content for r in relevant)
        return {self.memory_key: context}

    def save_context(self, inputs: dict[str, Any], outputs: dict[str, str]) -> None:
        self.client.memories.create(
            content=f"User: {inputs['input']}\nAssistant: {outputs['output']}",
            agent_id=self.agent_id,
            metadata={"type": "conversation", "source": "langchain"},
        )

    def clear(self) -> None:
        pass


def main():
    client = MemoryRelay(api_key=os.environ["MEMORYRELAY_API_KEY"])
    agent = client.agents.create(name="langchain-demo")

    print(f"Agent ID: {agent.id}")
    print("Save this ID to continue the conversation in a later session.\n")

    memory = MemoryRelayMemory(client=client, agent_id=str(agent.id))
    chain = ConversationChain(
        llm=ChatOpenAI(model="gpt-4o"),
        memory=memory,
    )

    print("Chat with the bot (type 'quit' to exit):\n")
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in ("quit", "exit", "q"):
            break
        response = chain.predict(input=user_input)
        print(f"Bot: {response}\n")


if __name__ == "__main__":
    main()

Run it:

python langchain_memory_demo.py

How It Works

┌──────────┐    input     ┌────────────────────┐    search()     ┌──────────────┐
│   User   │ ───────────► │ MemoryRelayMemory  │ ──────────────► │ MemoryRelay  │
└──────────┘              │ (load_memory_vars) │ ◄───results──── │     API      │
                          └────────┬───────────┘                 └──────────────┘
                                   │ context                            ▲
                                   ▼                                    │
                          ┌────────────────────┐                        │
                          │  ChatOpenAI (LLM)  │                        │
                          └────────┬───────────┘                        │
                                   │ response                           │
                                   ▼                                    │
                          ┌────────────────────┐       create()         │
                          │ MemoryRelayMemory  │ ───────────────────────┘
                          │  (save_context)    │
                          └────────────────────┘
  1. User sends input to the chain.
  2. load_memory_variables searches MemoryRelay for relevant past memories.
  3. Matching memories are injected into the LLM prompt as context.
  4. The LLM generates a response informed by past conversations.
  5. save_context stores the full exchange as a new memory with an embedding.
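The cycle above can be traced end to end without the live API, using a small in-memory stand-in for the MemoryRelay client. FakeMemoryStore and its keyword-overlap search are purely illustrative — the real service does semantic search over embeddings — but the load/save flow is the same:

```python
class FakeMemoryStore:
    """In-memory stand-in for MemoryRelay, for illustration only."""

    def __init__(self) -> None:
        self.memories: list[str] = []

    def search(self, query: str, limit: int = 5) -> list[str]:
        # Naive keyword overlap instead of real semantic search.
        terms = set(query.lower().split())
        scored = [(len(terms & set(m.lower().split())), m) for m in self.memories]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [m for score, m in scored[:limit] if score > 0]

    def create(self, content: str) -> None:
        self.memories.append(content)


store = FakeMemoryStore()

# Step 5 on earlier turns: save_context stored these exchanges.
store.create("User: My name is Alice.\nAssistant: Nice to meet you, Alice!")
store.create("User: I like Airflow.\nAssistant: Great choice for orchestration.")

# Steps 2-3 on the next turn: only the matching memory comes back as context.
context = "\n".join(store.search("what is my name"))
print(context)  # Prints only the Alice exchange
```

Swapping FakeMemoryStore for the real client is exactly what MemoryRelayMemory does, with embeddings doing the matching instead of keyword overlap.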

Best Practices

Tune search_limit and min_score

Start with search_limit=5 and min_score=0.5. If the bot recalls too much irrelevant context, raise min_score. If it misses things, lower it or increase search_limit.
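To see what the threshold trades off, here is the min_score filter from load_memory_variables applied to made-up (score, content) pairs — the numbers and memories are invented for illustration:

```python
# Hypothetical search results as (similarity score, content) pairs.
results = [
    (0.91, "User works on ML pipelines at Acme Corp."),
    (0.62, "User uses Airflow for orchestration."),
    (0.31, "User once asked about the weather."),
]


def filter_context(results: list[tuple[float, str]], min_score: float) -> list[str]:
    """Keep only memories at or above the similarity threshold."""
    return [content for score, content in results if score >= min_score]


print(filter_context(results, min_score=0.5))  # Drops the weak 0.31 match
print(filter_context(results, min_score=0.8))  # Keeps only the strongest match
```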

Use metadata for filtering

Tag memories with metadata like {"type": "preference"} or {"type": "fact"}. You can later filter searches by metadata to retrieve only specific kinds of memories.
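If your SDK version's search call does not accept a metadata filter, the same effect can be sketched client-side. The dict-shaped results below are illustrative, not the SDK's actual return type:

```python
# Illustrative result records; the real SDK returns result objects, not dicts.
results = [
    {"content": "User prefers dark mode.", "metadata": {"type": "preference"}},
    {"content": "User: hi\nAssistant: hello!", "metadata": {"type": "conversation"}},
    {"content": "User's team deploys with Airflow.", "metadata": {"type": "fact"}},
]


def by_type(results: list[dict], memory_type: str) -> list[str]:
    """Client-side filter: keep only memories tagged with the given type."""
    return [r["content"] for r in results if r["metadata"].get("type") == memory_type]


print(by_type(results, "preference"))  # ['User prefers dark mode.']
```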

Reuse agent IDs across sessions

The agent ID is your memory namespace. Store it in a database or config file so returning users always connect to their existing memory. Creating a new agent starts with a blank slate.
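A minimal sketch of that pattern caches the ID in a local JSON file; the file name and helper function here are ours, not part of the SDK:

```python
import json
import os


def load_or_create_agent_id(client, path: str = "agent_config.json") -> str:
    """Reuse a cached agent ID so the bot keeps its memories across restarts."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["agent_id"]
    # First run: create a fresh agent and cache its ID for later sessions.
    agent = client.agents.create(name="langchain-bot")
    with open(path, "w") as f:
        json.dump({"agent_id": str(agent.id)}, f)
    return str(agent.id)
```

For multi-user apps, key the stored ID by user (e.g. one agent per user row in your database) so each user gets their own memory namespace.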

Next Steps