v2 Async API

The v2 API provides asynchronous memory operations. Instead of waiting for embedding generation to complete synchronously, the v2 endpoints return immediately with a 202 Accepted status and process the embedding in the background. This reduces response times from ~300ms to ~20ms for memory creation.

When to use v2 vs v1
  • v1 (synchronous): Use when you need the memory to be immediately searchable. Simpler to integrate -- the memory is ready when the request returns.
  • v2 (asynchronous): Use when you are storing many memories in a hot path and latency matters. The memory exists immediately but is not searchable until embedding generation completes (typically 1-5 seconds).
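The guidance above can be condensed into a small decision helper. This is purely illustrative: the function and its inputs are ours, not part of the SDK.

```python
def choose_create_endpoint(must_search_immediately: bool,
                           latency_sensitive: bool) -> str:
    """Pick a create endpoint following the v1/v2 guidance above."""
    if must_search_immediately:
        return "POST /v1/memories"   # synchronous: searchable on return
    if latency_sensitive:
        return "POST /v2/memories"   # async: ~20 ms, searchable in 1-5 s
    return "POST /v1/memories"       # default to the simpler integration
```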

Async Create Memory

POST /v2/memories

Create a memory asynchronously. The memory is stored immediately and a background worker generates the embedding. The request body is identical to POST /v1/memories.

Request Body

Same as POST /v1/memories. All fields are supported.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| content | string | Yes | Memory content (1-50,000 characters) |
| agent_id | string | Yes | Agent namespace identifier |
| metadata | object | No | Custom key-value metadata (max 10KB) |
| memory_type | string | No | Memory type classification |
| session_id | string | No | Session ID to associate with |
| project | string | No | Project slug |
| importance | float | No | Importance score 0.0-1.0 |
| webhook_url | string | No | HTTPS URL to notify when embedding is ready |

Response 202 Accepted

{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "pending",
  "job_id": "arq:abc123def456",
  "estimated_completion_seconds": 3
}
| Field | Type | Description |
|-------|------|-------------|
| id | string | Memory UUID (can be used immediately for GET) |
| status | string | Processing status: pending, processing, ready, failed |
| job_id | string | Background job ID for tracking (null if worker queue unavailable) |
| estimated_completion_seconds | integer | Estimated time until embedding is ready |

curl

curl -X POST https://api.memoryrelay.net/v2/memories \
  -H "Authorization: Bearer $MEMORYRELAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "The deployment pipeline uses GitHub Actions with staging and production environments",
    "agent_id": "my-assistant",
    "metadata": { "source": "onboarding" },
    "webhook_url": "https://example.com/webhooks/memoryrelay"
  }'

Python SDK

from memoryrelay import AsyncMemoryRelay

client = AsyncMemoryRelay(api_key="mem_prod_...")

result = await client.memories.create_async(
    content="The deployment pipeline uses GitHub Actions with staging and production environments",
    agent_id="my-assistant",
    webhook_url="https://example.com/webhooks/memoryrelay",
)
print(f"Memory {result.id} is {result.status}, ETA: {result.estimated_completion_seconds}s")

Poll Processing Status

GET /v2/memories/{id}/status

Check the processing status of an asynchronously created memory. Use this to poll until the memory is ready for search.

Path Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| id | UUID | Memory ID |

Response

{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "ready",
  "created_at": 1710672000,
  "updated_at": 1710672003,
  "error": null
}

Status Values

| Status | Description |
|--------|-------------|
| pending | Memory stored, waiting for worker to pick up |
| processing | Embedding generation in progress |
| ready | Embedding complete, memory is fully searchable |
| failed | Embedding generation failed (check error field) |

curl

curl https://api.memoryrelay.net/v2/memories/a1b2c3d4-e5f6-7890-abcd-ef1234567890/status \
  -H "Authorization: Bearer $MEMORYRELAY_API_KEY"

Python SDK

status = await client.memories.get_status("a1b2c3d4-e5f6-7890-abcd-ef1234567890")
if status.status == "ready":
    print("Memory is searchable!")
elif status.status == "failed":
    print(f"Error: {status.error}")

Polling Example

import asyncio

async def wait_for_memory(client, memory_id, timeout=30):
    """Poll until memory is ready or timeout."""
    for _ in range(timeout):
        status = await client.memories.get_status(memory_id)
        if status.status == "ready":
            return True
        if status.status == "failed":
            raise RuntimeError(f"Memory failed: {status.error}")
        await asyncio.sleep(1)
    raise TimeoutError(f"Memory {memory_id} not ready after {timeout}s")

Use Webhooks Instead of Polling

For production workloads, register a webhook for the embedding.completed event instead of polling. This is more efficient and eliminates the need for retry logic.
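A minimal receiver for such a webhook could be sketched with the standard library as below. The event payload shape (an `id` and a `status` field) is an assumption here, not something this page documents; check the webhooks reference for the real schema.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def parse_embedding_event(body: bytes) -> dict:
    """Pull the fields we care about out of an embedding.completed payload.

    The payload shape is assumed, not confirmed by this page.
    """
    event = json.loads(body)
    return {"memory_id": event.get("id"), "status": event.get("status")}

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        info = parse_embedding_event(self.rfile.read(length))
        if info["status"] == "ready":
            print(f"Memory {info['memory_id']} is searchable")
        # Acknowledge quickly; do any heavy work outside the handler.
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), WebhookHandler).serve_forever()
```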


Build Context

POST /v2/context/build

Build a formatted context string from an agent's most relevant memories. This is designed for injecting memory context into LLM prompts. Optionally compresses the output to fit within a token budget.

Request Body

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| query | string | Yes | Query to search for relevant memories (1-5,000 characters) |
| agent_id | string | No | Agent namespace. Omit to search across all agents |
| limit | integer | No | Maximum memories to include, 1-50 (default: 10) |
| threshold | float | No | Minimum similarity score, 0.0-1.0 (default: 0.5) |
| max_tokens | integer | No | Token budget for the output (1 token is approximately 4 characters). Context is truncated to fit |
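Since the API approximates 1 token as 4 characters, a token budget maps onto a character budget roughly as follows (the helper name is ours, for illustration):

```python
def char_budget(max_tokens: int, chars_per_token: int = 4) -> int:
    """Approximate character budget for a token budget, using the
    ~4 characters-per-token heuristic this endpoint applies."""
    return max_tokens * chars_per_token

# A max_tokens of 2000 allows roughly 8000 characters of context.
```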

Response

{
  "context": "## Relevant Memories\n\n1. The user prefers dark mode in all applications (importance: 0.7, stored 2 days ago)\n2. User requested larger font sizes for accessibility (importance: 0.6, stored 3 days ago)\n3. The deployment pipeline uses GitHub Actions (importance: 0.5, stored 1 week ago)",
  "memories_used": 3,
  "total_chars": 287
}
| Field | Type | Description |
|-------|------|-------------|
| context | string | Formatted context string, ready for prompt injection |
| memories_used | integer | Number of memories included |
| total_chars | integer | Total character count of the context |

curl

curl -X POST https://api.memoryrelay.net/v2/context/build \
  -H "Authorization: Bearer $MEMORYRELAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What do I know about this user'\''s preferences?",
    "agent_id": "my-assistant",
    "limit": 10,
    "threshold": 0.5,
    "max_tokens": 2000
  }'

Python SDK

context = await client.context.build(
    query="What do I know about this user's preferences?",
    agent_id="my-assistant",
    limit=10,
    max_tokens=2000,
)

# Inject into your LLM prompt
prompt = f"""You are a helpful assistant. Here is what you remember about this user:

{context.context}

Now respond to the user's message:
"""

Async Processing Model

The v2 async flow works as follows:

  1. Client sends POST /v2/memories with memory content
  2. API stores the memory in the database immediately (content, metadata, timestamps)
  3. API queues an embedding job on the background worker (ARQ + Redis)
  4. API returns 202 Accepted with the memory ID and job tracking info
  5. Background worker generates the embedding (typically 1-5 seconds)
  6. Worker updates the memory with the embedding vector
  7. Memory becomes searchable via POST /v1/memories/search
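The status lifecycle implied by the steps above can be sketched as a small state machine. The transition graph below is our reading of the status table, not an official state chart from this page.

```python
# Status transitions implied by the processing model (an interpretation).
TRANSITIONS = {
    "pending": {"processing"},          # worker picks the job up
    "processing": {"ready", "failed"},  # embedding succeeds or fails
    "ready": set(),                     # terminal: memory is searchable
    "failed": set(),                    # terminal: check the error field
}

def is_valid_transition(old: str, new: str) -> bool:
    """Whether a status change is expected under this model."""
    return new in TRANSITIONS.get(old, set())
```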

Notification Options

You have two ways to know when a memory is ready:

| Method | How | Best For |
|--------|-----|----------|
| Polling | GET /v2/memories/{id}/status | Simple integrations, small volume |
| Webhooks | Register via POST /v1/webhooks | Production workloads, high volume |
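A registration request to POST /v1/webhooks might carry a body like the one built below. The field names `url` and `events` are assumptions for illustration; consult the v1 webhooks reference for the actual schema.

```python
import json

def registration_payload(url: str) -> str:
    """Body for subscribing to embedding.completed via POST /v1/webhooks.

    Field names here are assumed, not confirmed by this page.
    """
    return json.dumps({"url": url, "events": ["embedding.completed"]})
```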

When the Worker is Unavailable

If the background worker queue is unavailable, the API falls back to synchronous embedding generation. In this case:

  • The job_id field in the response will be null
  • The memory is fully ready when the request returns
  • No polling or webhook notification is needed

This fallback ensures the API remains functional even if the Redis-backed job queue is temporarily down.
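Client code can branch on `job_id` to handle this fallback. A sketch, where the `result` object's fields follow the 202 response table above:

```python
async def ensure_ready(client, result) -> bool:
    """Return True if a v2-created memory is already searchable.

    A null job_id means the API fell back to synchronous embedding,
    so there is nothing to wait for; otherwise check the status once.
    """
    if result.job_id is None:
        return True  # synchronous fallback: ready on return
    status = await client.memories.get_status(result.id)
    return status.status == "ready"
```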