v2 Async API
The v2 API provides asynchronous memory operations. Instead of waiting for embedding generation to complete synchronously, the v2 endpoints return immediately with a 202 Accepted status and process the embedding in the background. This reduces response times from ~300ms to ~20ms for memory creation.
- v1 (synchronous): Use when you need the memory to be immediately searchable. Simpler to integrate -- the memory is ready when the request returns.
- v2 (asynchronous): Use when you are storing many memories in a hot path and latency matters. The memory exists immediately but is not searchable until embedding generation completes (typically 1-5 seconds).
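The tradeoff above can be sketched in client code. Note this is a hypothetical helper: `client.memories.create` (a synchronous v1 create method on the async SDK) is an assumption not documented here, while `create_async` is the v2 method shown below.

```python
# Sketch: choosing between v1 and v2 creation paths.
# `client.memories.create` (v1) is an assumed method name;
# `create_async` (v2) is documented in this section.

async def store_memory(client, content, agent_id, need_immediate_search):
    if need_immediate_search:
        # v1: ~300ms, but the memory is searchable when this returns
        return await client.memories.create(content=content, agent_id=agent_id)
    # v2: ~20ms, searchable once embedding completes (typically 1-5s)
    return await client.memories.create_async(content=content, agent_id=agent_id)
```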
Async Create Memory
POST /v2/memories
Create a memory asynchronously. The memory is stored immediately and a background worker generates the embedding. The request body is identical to POST /v1/memories.
Request Body
Same as POST /v1/memories. All fields are supported.
| Field | Type | Required | Description |
|---|---|---|---|
| content | string | Yes | Memory content (1-50,000 characters) |
| agent_id | string | Yes | Agent namespace identifier |
| metadata | object | No | Custom key-value metadata (max 10KB) |
| memory_type | string | No | Memory type classification |
| session_id | string | No | Session ID to associate with |
| project | string | No | Project slug |
| importance | float | No | Importance score 0.0-1.0 |
| webhook_url | string | No | HTTPS URL to notify when embedding is ready |
Response 202 Accepted
{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "pending",
  "job_id": "arq:abc123def456",
  "estimated_completion_seconds": 3
}
| Field | Type | Description |
|---|---|---|
| id | string | Memory UUID (can be used immediately for GET) |
| status | string | Processing status: pending, processing, ready, failed |
| job_id | string | Background job ID for tracking (null if worker queue unavailable) |
| estimated_completion_seconds | integer | Estimated time until embedding is ready |
curl
curl -X POST https://api.memoryrelay.net/v2/memories \
  -H "Authorization: Bearer $MEMORYRELAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "The deployment pipeline uses GitHub Actions with staging and production environments",
    "agent_id": "my-assistant",
    "metadata": { "source": "onboarding" },
    "webhook_url": "https://example.com/webhooks/memoryrelay"
  }'
Python SDK
from memoryrelay import AsyncMemoryRelay

client = AsyncMemoryRelay(api_key="mem_prod_...")

result = await client.memories.create_async(
    content="The deployment pipeline uses GitHub Actions with staging and production environments",
    agent_id="my-assistant",
    webhook_url="https://example.com/webhooks/memoryrelay",
)
print(f"Memory {result.id} is {result.status}, ETA: {result.estimated_completion_seconds}s")
Poll Processing Status
GET /v2/memories/{id}/status
Check the processing status of an asynchronously created memory. Use this to poll until the memory is ready for search.
Path Parameters
| Parameter | Type | Description |
|---|---|---|
| id | UUID | Memory ID |
Response
{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "ready",
  "created_at": 1710672000,
  "updated_at": 1710672003,
  "error": null
}
Status Values
| Status | Description |
|---|---|
| pending | Memory stored, waiting for a worker to pick it up |
| processing | Embedding generation in progress |
| ready | Embedding complete; memory is fully searchable |
| failed | Embedding generation failed (check the error field) |
curl
curl https://api.memoryrelay.net/v2/memories/a1b2c3d4-e5f6-7890-abcd-ef1234567890/status \
  -H "Authorization: Bearer $MEMORYRELAY_API_KEY"
Python SDK
status = await client.memories.get_status("a1b2c3d4-e5f6-7890-abcd-ef1234567890")
if status.status == "ready":
    print("Memory is searchable!")
elif status.status == "failed":
    print(f"Error: {status.error}")
Polling Example
import asyncio

async def wait_for_memory(client, memory_id, timeout=30):
    """Poll until the memory is ready or the timeout elapses."""
    for _ in range(timeout):
        status = await client.memories.get_status(memory_id)
        if status.status == "ready":
            return True
        if status.status == "failed":
            raise RuntimeError(f"Memory failed: {status.error}")
        await asyncio.sleep(1)
    raise TimeoutError(f"Memory {memory_id} not ready after {timeout}s")
For production workloads, register a webhook for the embedding.completed event instead of polling. This is more efficient and eliminates the need for retry logic.
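A minimal webhook receiver might look like the sketch below. The payload shape (id, status, and error fields) is an assumption inferred from the status response documented above, not a confirmed contract; check your webhook registration docs for the exact schema.

```python
import json

def handle_embedding_completed(raw_body: str) -> str:
    """Handle a hypothetical embedding.completed webhook delivery.

    Assumes the payload carries the same id/status/error fields as the
    GET /v2/memories/{id}/status response; this shape is an assumption.
    """
    event = json.loads(raw_body)
    if event.get("status") == "ready":
        return f"memory {event['id']} is searchable"
    if event.get("status") == "failed":
        return f"memory {event['id']} failed: {event.get('error')}"
    return "ignored"
```

In a real service this function would sit behind your HTTPS endpoint (the `webhook_url` you passed at creation time) and would typically mark the memory as usable in your own bookkeeping.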
Build Context
POST /v2/context/build
Build a formatted context string from an agent's most relevant memories. This is designed for injecting memory context into LLM prompts. Optionally compresses the output to fit within a token budget.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | Query to search for relevant memories (1-5,000 characters) |
| agent_id | string | No | Agent namespace. Omit to search across all agents |
| limit | integer | No | Maximum memories to include, 1-50 (default: 10) |
| threshold | float | No | Minimum similarity score, 0.0-1.0 (default: 0.5) |
| max_tokens | integer | No | Token budget for the output (1 token is approximately 4 characters). Context is truncated to fit |
Response
{
  "context": "## Relevant Memories\n\n1. The user prefers dark mode in all applications (importance: 0.7, stored 2 days ago)\n2. User requested larger font sizes for accessibility (importance: 0.6, stored 3 days ago)\n3. The deployment pipeline uses GitHub Actions (importance: 0.5, stored 1 week ago)",
  "memories_used": 3,
  "total_chars": 287
}
| Field | Type | Description |
|---|---|---|
| context | string | Formatted context string, ready for prompt injection |
| memories_used | integer | Number of memories included |
| total_chars | integer | Total character count of the context |
curl
curl -X POST https://api.memoryrelay.net/v2/context/build \
  -H "Authorization: Bearer $MEMORYRELAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What do I know about this user'\''s preferences?",
    "agent_id": "my-assistant",
    "limit": 10,
    "threshold": 0.5,
    "max_tokens": 2000
  }'
Python SDK
context = await client.context.build(
    query="What do I know about this user's preferences?",
    agent_id="my-assistant",
    limit=10,
    max_tokens=2000,
)

# Inject into your LLM prompt
prompt = f"""You are a helpful assistant. Here is what you remember about this user:

{context.context}

Now respond to the user's message:
"""
Async Processing Model
The v2 async flow works as follows:
- Client sends POST /v2/memories with the memory content
- API stores the memory in the database immediately (content, metadata, timestamps)
- API queues an embedding job on the background worker (ARQ + Redis)
- API returns 202 Accepted with the memory ID and job tracking info
- Background worker generates the embedding (typically 1-5 seconds)
- Worker updates the memory with the embedding vector
- Memory becomes searchable via POST /v1/memories/search
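The lifecycle above can be summarized as a small state machine. This is a sketch: the transitions are inferred from the status values documented in this section (pending, processing, ready, failed), not from a stated spec.

```python
# Valid status transitions for an async memory, inferred from the
# processing model above: pending -> processing -> ready | failed.
# ready and failed are terminal states.
TRANSITIONS = {
    "pending": {"processing"},
    "processing": {"ready", "failed"},
    "ready": set(),
    "failed": set(),
}

def can_transition(current: str, new: str) -> bool:
    """Return True if a memory may move from `current` to `new`."""
    return new in TRANSITIONS.get(current, set())
```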
Notification Options
You have two ways to know when a memory is ready:
| Method | How | Best For |
|---|---|---|
| Polling | GET /v2/memories/{id}/status | Simple integrations, small volume |
| Webhooks | Register via POST /v1/webhooks | Production workloads, high volume |
When the Worker is Unavailable
If the background worker queue is unavailable, the API falls back to synchronous embedding generation. In this case:
- The job_id field in the response will be null
- The memory is fully ready when the request returns
- No polling or webhook notification is needed
This fallback ensures the API remains functional even if the Redis-backed job queue is temporarily down.
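Client code can detect this fallback by checking job_id in the 202 response. A minimal sketch, treating the response as the JSON object documented above:

```python
def needs_polling(response: dict) -> bool:
    """Return True if the client should poll (or wait for a webhook).

    Per the fallback behavior above: a null job_id means the worker queue
    was unavailable, the embedding was generated synchronously, and the
    memory is already searchable.
    """
    return response.get("job_id") is not None
```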