v2 Async API

The v2 API provides asynchronous memory operations. Instead of waiting for embedding generation to complete synchronously, the v2 endpoints return immediately with a 202 Accepted status and process the embedding in the background. This reduces response times from ~300ms to ~20ms for memory creation.

When to use v2 vs v1
  • v1 (synchronous): Use when you need the memory to be immediately searchable. Simpler to integrate -- the memory is ready when the request returns.
  • v2 (asynchronous): Use when you are storing many memories in a hot path and latency matters. The memory exists immediately but is not searchable until embedding generation completes (typically 1-5 seconds).
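The guidance above can be condensed into a small decision helper. This is purely illustrative: the function and its inputs are ours, not part of the SDK.

```python
def choose_create_endpoint(must_search_immediately: bool,
                           latency_sensitive: bool) -> str:
    """Pick a create endpoint following the v1/v2 guidance above."""
    if must_search_immediately:
        return "POST /v1/memories"   # synchronous: searchable on return
    if latency_sensitive:
        return "POST /v2/memories"   # async: ~20 ms, searchable in 1-5 s
    return "POST /v1/memories"       # default to the simpler integration
```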

Async Create Memory

POST /v2/memories

Create a memory asynchronously. The memory is stored immediately and a background worker generates the embedding. The request body is identical to POST /v1/memories.

Request Body

Same as POST /v1/memories. All fields are supported.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| content | string | Yes | Memory content (1-50,000 characters) |
| agent_id | string | Yes | Agent namespace identifier |
| metadata | object | No | Custom key-value metadata (max 10KB) |
| memory_type | string | No | Memory type classification |
| session_id | string | No | Session ID to associate with |
| project | string | No | Project slug |
| importance | float | No | Importance score 0.0-1.0 |
| webhook_url | string | No | HTTPS URL to notify when embedding is ready |

Response 202 Accepted

{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "pending",
  "job_id": "arq:abc123def456",
  "estimated_completion_seconds": 3
}
| Field | Type | Description |
|-------|------|-------------|
| id | string | Memory UUID (can be used immediately for GET) |
| status | string | Processing status: pending, processing, ready, failed |
| job_id | string | Background job ID for tracking (null if worker queue unavailable) |
| estimated_completion_seconds | integer | Estimated time until embedding is ready |

curl

curl -X POST https://api.memoryrelay.net/v2/memories \
  -H "Authorization: Bearer $MEMORYRELAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "The deployment pipeline uses GitHub Actions with staging and production environments",
    "agent_id": "my-assistant",
    "metadata": { "source": "onboarding" },
    "webhook_url": "https://example.com/webhooks/memoryrelay"
  }'

Python SDK

from memoryrelay import AsyncMemoryRelay

client = AsyncMemoryRelay(api_key="mem_prod_...")

result = await client.memories.create_async(
    content="The deployment pipeline uses GitHub Actions with staging and production environments",
    agent_id="my-assistant",
    webhook_url="https://example.com/webhooks/memoryrelay",
)
print(f"Memory {result.id} is {result.status}, ETA: {result.estimated_completion_seconds}s")

Poll Processing Status

GET /v2/memories/{id}/status

Check the processing status of an asynchronously created memory. Use this to poll until the memory is ready for search.

Path Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| id | UUID | Memory ID |

Response

{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "ready",
  "created_at": 1710672000,
  "updated_at": 1710672003,
  "error": null
}

Status Values

| Status | Description |
|--------|-------------|
| pending | Memory stored, waiting for worker to pick up |
| processing | Embedding generation in progress |
| ready | Embedding complete, memory is fully searchable |
| failed | Embedding generation failed (check error field) |

curl

curl https://api.memoryrelay.net/v2/memories/a1b2c3d4-e5f6-7890-abcd-ef1234567890/status \
  -H "Authorization: Bearer $MEMORYRELAY_API_KEY"

Python SDK

status = await client.memories.get_status("a1b2c3d4-e5f6-7890-abcd-ef1234567890")
if status.status == "ready":
    print("Memory is searchable!")
elif status.status == "failed":
    print(f"Error: {status.error}")

Polling Example

import asyncio

async def wait_for_memory(client, memory_id, timeout=30):
    """Poll until memory is ready or timeout."""
    for _ in range(timeout):
        status = await client.memories.get_status(memory_id)
        if status.status == "ready":
            return True
        if status.status == "failed":
            raise RuntimeError(f"Memory failed: {status.error}")
        await asyncio.sleep(1)
    raise TimeoutError(f"Memory {memory_id} not ready after {timeout}s")

Use Webhooks Instead of Polling

For production workloads, register a webhook for the embedding.completed event instead of polling. This is more efficient and eliminates the need for retry logic.
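A minimal receiver for such a webhook could be sketched with the standard library as below. The event payload shape (an `id` and a `status` field) is an assumption here, not something this page documents; check the webhooks reference for the real schema.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def parse_embedding_event(body: bytes) -> dict:
    """Pull the fields we care about out of an embedding.completed payload.

    The payload shape is assumed, not confirmed by this page.
    """
    event = json.loads(body)
    return {"memory_id": event.get("id"), "status": event.get("status")}

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        info = parse_embedding_event(self.rfile.read(length))
        if info["status"] == "ready":
            print(f"Memory {info['memory_id']} is searchable")
        # Acknowledge quickly; do any heavy work outside the handler.
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), WebhookHandler).serve_forever()
```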


Build Context

POST /v2/context/build

Build a formatted context string from an agent's most relevant memories. This is designed for injecting memory context into LLM prompts. Optionally compresses the output to fit within a token budget.

Request Body

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| query | string | Yes | Query to search for relevant memories (1-5,000 characters) |
| agent_id | string | No | Agent namespace. Omit to search across all agents |
| limit | integer | No | Maximum memories to include, 1-50 (default: 10) |
| threshold | float | No | Minimum similarity score, 0.0-1.0 (default: 0.5) |
| max_tokens | integer | No | Token budget for the output (1 token is approximately 4 characters). Context is truncated to fit |
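Since the API approximates 1 token as 4 characters, a token budget maps onto a character budget roughly as follows (the helper name is ours, for illustration):

```python
def char_budget(max_tokens: int, chars_per_token: int = 4) -> int:
    """Approximate character budget for a token budget, using the
    ~4 characters-per-token heuristic this endpoint applies."""
    return max_tokens * chars_per_token

# A max_tokens of 2000 allows roughly 8000 characters of context.
```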

Response

{
  "context": "## Relevant Memories\n\n1. The user prefers dark mode in all applications (importance: 0.7, stored 2 days ago)\n2. User requested larger font sizes for accessibility (importance: 0.6, stored 3 days ago)\n3. The deployment pipeline uses GitHub Actions (importance: 0.5, stored 1 week ago)",
  "memories_used": 3,
  "total_chars": 287
}
| Field | Type | Description |
|-------|------|-------------|
| context | string | Formatted context string, ready for prompt injection |
| memories_used | integer | Number of memories included |
| total_chars | integer | Total character count of the context |

curl

curl -X POST https://api.memoryrelay.net/v2/context/build \
  -H "Authorization: Bearer $MEMORYRELAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What do I know about this user'\''s preferences?",
    "agent_id": "my-assistant",
    "limit": 10,
    "threshold": 0.5,
    "max_tokens": 2000
  }'

Python SDK

context = await client.context.build(
    query="What do I know about this user's preferences?",
    agent_id="my-assistant",
    limit=10,
    max_tokens=2000,
)

# Inject into your LLM prompt
prompt = f"""You are a helpful assistant. Here is what you remember about this user:

{context.context}

Now respond to the user's message:
"""

Async Processing Model

The v2 async flow works as follows:

  1. Client sends POST /v2/memories with memory content
  2. API stores the memory in the database immediately (content, metadata, timestamps)
  3. API queues an embedding job on the background worker (ARQ + Redis)
  4. API returns 202 Accepted with the memory ID and job tracking info
  5. Background worker generates the embedding (typically 1-5 seconds)
  6. Worker updates the memory with the embedding vector
  7. Memory becomes searchable via POST /v1/memories/search
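The status lifecycle implied by the steps above can be sketched as a small state machine. The transition graph below is our reading of the status table, not an official state chart from this page.

```python
# Status transitions implied by the processing model (an interpretation).
TRANSITIONS = {
    "pending": {"processing"},          # worker picks the job up
    "processing": {"ready", "failed"},  # embedding succeeds or fails
    "ready": set(),                     # terminal: memory is searchable
    "failed": set(),                    # terminal: check the error field
}

def is_valid_transition(old: str, new: str) -> bool:
    """Whether a status change is expected under this model."""
    return new in TRANSITIONS.get(old, set())
```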

Notification Options

You have two ways to know when a memory is ready:

| Method | How | Best For |
|--------|-----|----------|
| Polling | GET /v2/memories/{id}/status | Simple integrations, small volume |
| Webhooks | Register via POST /v1/webhooks | Production workloads, high volume |
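A registration request to POST /v1/webhooks might carry a body like the one built below. The field names `url` and `events` are assumptions for illustration; consult the v1 webhooks reference for the actual schema.

```python
import json

def registration_payload(url: str) -> str:
    """Body for subscribing to embedding.completed via POST /v1/webhooks.

    Field names here are assumed, not confirmed by this page.
    """
    return json.dumps({"url": url, "events": ["embedding.completed"]})
```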

When the Worker is Unavailable

If the background worker queue is unavailable, the API falls back to synchronous embedding generation. In this case:

  • The job_id field in the response will be null
  • The memory is fully ready when the request returns
  • No polling or webhook notification is needed

This fallback ensures the API remains functional even if the Redis-backed job queue is temporarily down.
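Client code can branch on `job_id` to handle this fallback. A sketch, where the `result` object's fields follow the 202 response table above:

```python
async def ensure_ready(client, result) -> bool:
    """Return True if a v2-created memory is already searchable.

    A null job_id means the API fell back to synchronous embedding,
    so there is nothing to wait for; otherwise check the status once.
    """
    if result.job_id is None:
        return True  # synchronous fallback: ready on return
    status = await client.memories.get_status(result.id)
    return status.status == "ready"
```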