LangChain Integration

Build a conversational AI that remembers users across sessions. This guide shows how to use MemoryRelay as a persistent memory backend for LangChain, so your chatbot retains context long after a conversation ends.

What You'll Build

A LangChain ConversationChain backed by MemoryRelay that:

  • Stores every exchange as a searchable memory
  • Loads the most relevant past memories before each response using semantic search
  • Works across process restarts, server redeployments, and multiple instances

Prerequisites

  • Python 3.9+
  • A MemoryRelay API key (get one here)
  • An OpenAI API key (for the LLM)

Installation

pip install memoryrelay langchain langchain-openai

Set your API keys as environment variables:

export MEMORYRELAY_API_KEY="mem_your_key_here"
export OPENAI_API_KEY="sk-your_key_here"

Create a Custom LangChain Memory Class

LangChain's BaseMemory interface lets you plug in any storage backend. The class below bridges LangChain and MemoryRelay:

from typing import Any

from memoryrelay import MemoryRelay
from langchain_core.memory import BaseMemory
from pydantic import Field


class MemoryRelayMemory(BaseMemory):
    """LangChain memory backend that persists to MemoryRelay.

    On each turn:
    - load_memory_variables: searches MemoryRelay for memories relevant
      to the current user input and returns them as context.
    - save_context: stores the user/assistant exchange as a new memory.
    """

    client: MemoryRelay = Field(exclude=True)
    agent_id: str
    memory_key: str = "history"
    search_limit: int = 5
    min_score: float = 0.5

    class Config:
        arbitrary_types_allowed = True

    @property
    def memory_variables(self) -> list[str]:
        return [self.memory_key]

    def load_memory_variables(self, inputs: dict[str, Any]) -> dict[str, str]:
        """Search MemoryRelay for memories relevant to the current input."""
        query = inputs.get("input", "")
        if not query:
            return {self.memory_key: ""}

        results = self.client.memories.search(
            query=query,
            agent_id=self.agent_id,
            limit=self.search_limit,
        )

        # Filter by minimum similarity score
        relevant = [r for r in results.data if r.score >= self.min_score]
        context = "\n".join(r.content for r in relevant)
        return {self.memory_key: context}

    def save_context(self, inputs: dict[str, Any], outputs: dict[str, str]) -> None:
        """Store the conversation turn as a memory in MemoryRelay."""
        user_input = inputs.get("input", "")
        assistant_output = outputs.get("output", "")

        self.client.memories.create(
            content=f"User: {user_input}\nAssistant: {assistant_output}",
            agent_id=self.agent_id,
            metadata={
                "type": "conversation",
                "source": "langchain",
            },
        )

    def clear(self) -> None:
        """No-op — memories are persistent by design."""
        pass
Why a custom class?

LangChain's built-in memory classes (ConversationBufferMemory, ConversationSummaryMemory) store data in-process and lose everything on restart. MemoryRelayMemory persists memories to the cloud and uses semantic search to retrieve only the most relevant context — not the entire conversation history.

Wire It Into a Conversation Chain

import os
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
from memoryrelay import MemoryRelay

# Initialize the MemoryRelay client
client = MemoryRelay(api_key=os.environ["MEMORYRELAY_API_KEY"])

# Create an agent — this is the memory namespace; reuse its ID in later sessions
agent = client.agents.create(name="langchain-bot")

# Build the memory-backed chain
memory = MemoryRelayMemory(
    client=client,
    agent_id=str(agent.id),
    search_limit=5,
    min_score=0.5,
)

chain = ConversationChain(
    llm=ChatOpenAI(model="gpt-4o"),
    memory=memory,
    verbose=True,  # Logs memory loading so you can watch it in action
)

Run a Conversation

Session 1: Teach the bot about yourself

response = chain.predict(input="My name is Alice and I work on ML pipelines at Acme Corp.")
print(response)
# "Nice to meet you, Alice! ML pipelines sound interesting — what kind of
# data are you working with at Acme Corp?"

response = chain.predict(input="Mostly time-series sensor data. We use Airflow for orchestration.")
print(response)
# "Airflow is a solid choice for time-series pipelines. Are you using
# any specific ML frameworks for your models?"

Both exchanges are now stored in MemoryRelay with embeddings generated automatically.

Session 2: The bot remembers (even after restart)

Imagine the process restarts — a new Python session, a new deployment, or even a different server. As long as you use the same agent ID, memories persist:

# New session — reconnect to the same agent
client = MemoryRelay(api_key=os.environ["MEMORYRELAY_API_KEY"])

memory = MemoryRelayMemory(
    client=client,
    agent_id="<same-agent-id-from-session-1>",
)

chain = ConversationChain(
    llm=ChatOpenAI(model="gpt-4o"),
    memory=memory,
)

response = chain.predict(input="What do you know about me?")
print(response)
# "You're Alice from Acme Corp, working on ML pipelines for time-series
# sensor data. You use Apache Airflow for orchestration."

The chain called load_memory_variables, which searched MemoryRelay for memories relevant to "What do you know about me?" and injected the matching results as context for the LLM.

Full Working Example

Here is a complete, self-contained script you can run:

"""langchain_memory_demo.py — LangChain + MemoryRelay persistent memory demo."""

import os
from typing import Any

from langchain.chains import ConversationChain
from langchain_core.memory import BaseMemory
from langchain_openai import ChatOpenAI
from memoryrelay import MemoryRelay
from pydantic import Field


class MemoryRelayMemory(BaseMemory):
    client: MemoryRelay = Field(exclude=True)
    agent_id: str
    memory_key: str = "history"
    search_limit: int = 5
    min_score: float = 0.5

    class Config:
        arbitrary_types_allowed = True

    @property
    def memory_variables(self) -> list[str]:
        return [self.memory_key]

    def load_memory_variables(self, inputs: dict[str, Any]) -> dict[str, str]:
        query = inputs.get("input", "")
        if not query:
            return {self.memory_key: ""}
        results = self.client.memories.search(
            query=query, agent_id=self.agent_id, limit=self.search_limit
        )
        relevant = [r for r in results.data if r.score >= self.min_score]
        context = "\n".join(r.content for r in relevant)
        return {self.memory_key: context}

    def save_context(self, inputs: dict[str, Any], outputs: dict[str, str]) -> None:
        self.client.memories.create(
            content=f"User: {inputs['input']}\nAssistant: {outputs['output']}",
            agent_id=self.agent_id,
            metadata={"type": "conversation", "source": "langchain"},
        )

    def clear(self) -> None:
        pass


def main():
    client = MemoryRelay(api_key=os.environ["MEMORYRELAY_API_KEY"])
    agent = client.agents.create(name="langchain-demo")

    print(f"Agent ID: {agent.id}")
    print("Save this ID to continue the conversation in a later session.\n")

    memory = MemoryRelayMemory(client=client, agent_id=str(agent.id))
    chain = ConversationChain(
        llm=ChatOpenAI(model="gpt-4o"),
        memory=memory,
    )

    print("Chat with the bot (type 'quit' to exit):\n")
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in ("quit", "exit", "q"):
            break
        response = chain.predict(input=user_input)
        print(f"Bot: {response}\n")


if __name__ == "__main__":
    main()

Run it:

python langchain_memory_demo.py

How It Works

┌──────────┐    input     ┌────────────────────┐    search()     ┌──────────────┐
│   User   │ ───────────► │ MemoryRelayMemory  │ ──────────────► │ MemoryRelay  │
└──────────┘              │ (load_memory_vars) │ ◄───results──── │     API      │
                          └────────┬───────────┘                 └──────────────┘
                                   │ context                            ▲
                                   ▼                                    │
                          ┌────────────────────┐                        │
                          │  ChatOpenAI (LLM)  │                        │
                          └────────┬───────────┘                        │
                                   │ response                           │
                                   ▼                                    │
                          ┌────────────────────┐       create()         │
                          │ MemoryRelayMemory  │ ───────────────────────┘
                          │  (save_context)    │
                          └────────────────────┘
  1. User sends input to the chain.
  2. load_memory_variables searches MemoryRelay for relevant past memories.
  3. Matching memories are injected into the LLM prompt as context.
  4. The LLM generates a response informed by past conversations.
  5. save_context stores the full exchange as a new memory with an embedding.
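The cycle above can be traced end to end without the live API, using a small in-memory stand-in for the MemoryRelay client. FakeMemoryStore and its keyword-overlap search are purely illustrative — the real service does semantic search over embeddings — but the load/save flow is the same:

```python
class FakeMemoryStore:
    """In-memory stand-in for MemoryRelay, for illustration only."""

    def __init__(self) -> None:
        self.memories: list[str] = []

    def search(self, query: str, limit: int = 5) -> list[str]:
        # Naive keyword overlap instead of real semantic search.
        terms = set(query.lower().split())
        scored = [(len(terms & set(m.lower().split())), m) for m in self.memories]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [m for score, m in scored[:limit] if score > 0]

    def create(self, content: str) -> None:
        self.memories.append(content)


store = FakeMemoryStore()

# Step 5 on earlier turns: save_context stored these exchanges.
store.create("User: My name is Alice.\nAssistant: Nice to meet you, Alice!")
store.create("User: I like Airflow.\nAssistant: Great choice for orchestration.")

# Steps 2-3 on the next turn: only the matching memory comes back as context.
context = "\n".join(store.search("what is my name"))
print(context)  # Prints only the Alice exchange
```

Swapping FakeMemoryStore for the real client is exactly what MemoryRelayMemory does, with embeddings doing the matching instead of keyword overlap.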

Best Practices

Tune search_limit and min_score

Start with search_limit=5 and min_score=0.5. If the bot recalls too much irrelevant context, raise min_score. If it misses things, lower it or increase search_limit.
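To see what the threshold trades off, here is the min_score filter from load_memory_variables applied to made-up (score, content) pairs — the numbers and memories are invented for illustration:

```python
# Hypothetical search results as (similarity score, content) pairs.
results = [
    (0.91, "User works on ML pipelines at Acme Corp."),
    (0.62, "User uses Airflow for orchestration."),
    (0.31, "User once asked about the weather."),
]


def filter_context(results: list[tuple[float, str]], min_score: float) -> list[str]:
    """Keep only memories at or above the similarity threshold."""
    return [content for score, content in results if score >= min_score]


print(filter_context(results, min_score=0.5))  # Drops the weak 0.31 match
print(filter_context(results, min_score=0.8))  # Keeps only the strongest match
```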

Use metadata for filtering

Tag memories with metadata like {"type": "preference"} or {"type": "fact"}. You can later filter searches by metadata to retrieve only specific kinds of memories.
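If your SDK version's search call does not accept a metadata filter, the same effect can be sketched client-side. The dict-shaped results below are illustrative, not the SDK's actual return type:

```python
# Illustrative result records; the real SDK returns result objects, not dicts.
results = [
    {"content": "User prefers dark mode.", "metadata": {"type": "preference"}},
    {"content": "User: hi\nAssistant: hello!", "metadata": {"type": "conversation"}},
    {"content": "User's team deploys with Airflow.", "metadata": {"type": "fact"}},
]


def by_type(results: list[dict], memory_type: str) -> list[str]:
    """Client-side filter: keep only memories tagged with the given type."""
    return [r["content"] for r in results if r["metadata"].get("type") == memory_type]


print(by_type(results, "preference"))  # ['User prefers dark mode.']
```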

Reuse agent IDs across sessions

The agent ID is your memory namespace. Store it in a database or config file so returning users always connect to their existing memory. Creating a new agent starts with a blank slate.
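A minimal sketch of that pattern caches the ID in a local JSON file; the file name and helper function here are ours, not part of the SDK:

```python
import json
import os


def load_or_create_agent_id(client, path: str = "agent_config.json") -> str:
    """Reuse a cached agent ID so the bot keeps its memories across restarts."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["agent_id"]
    # First run: create a fresh agent and cache its ID for later sessions.
    agent = client.agents.create(name="langchain-bot")
    with open(path, "w") as f:
        json.dump({"agent_id": str(agent.id)}, f)
    return str(agent.id)
```

For multi-user apps, key the stored ID by user (e.g. one agent per user row in your database) so each user gets their own memory namespace.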

Next Steps