Agent Memory and Knowledge Markets: How Agents Acquire, Store, and Monetise Information
1. Overview
The emerging knowledge market for autonomous AI agents is bifurcating into two distinct layers: stateful memory infrastructure (vector databases, episodic stores, key-value caches) that agents rent to persist their own context, and knowledge-as-a-service APIs (search, research, structured data feeds) that agents query to acquire external information at inference time. Pricing in both layers is rapidly shifting away from human-developer-oriented seat licences toward per-query, per-token, and per-agent-fleet models that better match autonomous consumption patterns. The competitive frontier is no longer raw retrieval — it is structured, citable, agent-native output that compresses downstream LLM token costs and reduces hallucination liability.
2. Key findings
- Vector DB pricing has collapsed to roughly $0.05–$0.40 per million vectors per month for serverless storage tiers, with query pricing on the order of $2–$10 per million queries (Pinecone serverless pricing — https://www.pinecone.io/pricing/; Weaviate Cloud pricing — https://weaviate.io/pricing). Vendors bill in different units (e.g. GB stored, read/write units, vector dimensions), so these are normalised approximations. At these rates, an agent fleet maintaining 10M memory vectors and issuing 5M queries/month pays roughly $0.50–$4 for storage plus $10–$50 for queries, about $10–$55/month in total: trivial compared with LLM inference. [EMPIRICA ANALYSIS]
- Knowledge-API pricing for agent-grade search is converging on a band of roughly $5–$15 per 1,000 queries. Tavily prices its search/extract endpoints in the single-digit-cents-per-call range (Tavily pricing — https://tavily.com/#pricing); Exa lists neural search starting around $5 per 1,000 searches, with content extraction billed separately (Exa pricing — https://exa.ai/pricing); Perplexity's Sonar API uses per-token billing similar to LLM APIs, so its per-query cost depends on prompt and response length (Perplexity API pricing — https://docs.perplexity.ai/guides/pricing). Brave Search API sits slightly below the band, with data plans starting near $3/1,000 queries (Brave Search API — https://brave.com/search/api/).
- Retrieval beats fine-tuning on unit economics for volatile knowledge. Fine-tuning a mid-sized model on OpenAI's platform costs roughly $10–$25 per million training tokens, plus a usage premium on the served fine-tuned model (OpenAI fine-tuning — https://openai.com/api/pricing/). For knowledge that turns over weekly, even monthly re-training leaves the model weeks out of date while re-incurring the full training-token charge on every run, which is uneconomic next to retrieval at sub-cent per query. [EMPIRICA ANALYSIS]
- Agent discovery of knowledge sources is currently dominated by hard-coded toolchains, not dynamic marketplace lookup. LangChain, LlamaIndex, and the Anthropic Model Context Protocol (MCP — https://modelcontextprotocol.io/) all assume the agent operator pre-registers data sources; there is no equivalent of DNS or PyPI for knowledge endpoints. [OPPORTUNITY]
- Monetisation patterns for knowledge providers selling to agent fleets are clustering into four archetypes: (a) metered per-query (Tavily, Exa), (b) token-metered like LLMs (Perplexity Sonar), (c) seat+volume hybrids inherited from SaaS (Bloomberg-style enterprise feeds adapted for AI), and (d) structured-research subscriptions with API access (Empirica's model). The fourth is the least crowded.
- Memory is becoming a thin commodity; curation is the margin. Pinecone, Weaviate, Qdrant, Chroma, Milvus, Turbopuffer, and pgvector compete on latency and $/vector. None of them generate the knowledge — they store it. The economic surplus is migrating to whoever produces citable, structured, time-stamped facts. [EMPIRICA FIT]
- Caching is the dominant cost lever for agent fleets consuming knowledge APIs. Anthropic's prompt caching bills cache reads at 10% of the base input-token price, roughly a 90% saving on hits, while cache writes cost 1.25x the base price for the default five-minute cache (Anthropic prompt caching — https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching); OpenAI applies prompt caching automatically, discounting cached input tokens by roughly 50% or more depending on the model (OpenAI pricing — https://openai.com/api/pricing/). Both caches match on exact prompt prefixes, so knowledge providers that return deterministic, cacheable payloads with stable IDs become disproportionately cheap to consume at fleet scale. [EMPIRICA ANALYSIS]
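The vector-memory cost figure in the key findings above is straightforward arithmetic on the quoted rate ranges. The sketch below reproduces it, assuming flat per-million pricing for both storage and queries, which simplifies how vendors actually bill; the function name `monthly_memory_cost` is hypothetical.

```python
def monthly_memory_cost(vectors_m: float, queries_m: float,
                        storage_rate: float, query_rate: float) -> float:
    """Estimated monthly USD cost of a serverless vector memory store.

    Assumes flat pricing (a hypothetical simplification):
      storage_rate: USD per million vectors stored per month
      query_rate:   USD per million queries
    """
    storage = vectors_m * storage_rate
    queries = queries_m * query_rate
    return storage + queries


# 10M memory vectors and 5M queries/month, at both ends of the quoted ranges.
low = monthly_memory_cost(10, 5, storage_rate=0.05, query_rate=2.0)
high = monthly_memory_cost(10, 5, storage_rate=0.40, query_rate=10.0)
print(f"${low:,.2f} to ${high:,.2f} per month")
```

The low and high cases come out at $10.50 and $54 per month, and query charges make up most of each total, so query volume rather than corpus size is the term to watch as a fleet grows.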
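Because the monetisation archetypes listed above bill in different units, comparing offers means normalising each to a common figure such as USD per 1,000 queries. A minimal sketch covering archetypes (a) to (c), assuming hypothetical class names, prices, and average token counts per query (none of these are vendor quotes). The per-token case only becomes comparable once a typical query size is fixed.

```python
from dataclasses import dataclass


@dataclass
class PerQuery:
    """Archetype (a): metered per query."""
    usd_per_1k: float

    def cost_per_1k(self) -> float:
        return self.usd_per_1k


@dataclass
class PerToken:
    """Archetype (b): token-metered like an LLM API."""
    usd_per_m_input: float
    usd_per_m_output: float
    input_tokens: int   # assumed average input tokens per query
    output_tokens: int  # assumed average output tokens per query

    def cost_per_1k(self) -> float:
        # One query costs tokens * rate / 1e6, so 1,000 queries cost tokens * rate / 1e3.
        return (self.input_tokens * self.usd_per_m_input
                + self.output_tokens * self.usd_per_m_output) / 1_000


@dataclass
class SeatPlusVolume:
    """Archetype (c): fixed monthly fee with an included quota, then overage."""
    monthly_fee: float
    included_queries: int
    overage_per_1k: float
    monthly_queries: int  # actual monthly usage; must be > 0

    def cost_per_1k(self) -> float:
        overage = max(0, self.monthly_queries - self.included_queries)
        total = self.monthly_fee + overage * self.overage_per_1k / 1_000
        return total * 1_000 / self.monthly_queries


offers = {
    "per-query": PerQuery(usd_per_1k=5.0),
    "per-token": PerToken(1.0, 1.0, input_tokens=2_000, output_tokens=500),
    "seat+volume": SeatPlusVolume(1_000.0, 100_000, 5.0, monthly_queries=200_000),
}
for name, offer in offers.items():
    print(f"{name}: ${offer.cost_per_1k():.2f} per 1,000 queries")
```

With these made-up inputs the three offers normalise to $5.00, $2.50 and $7.50 per 1,000 queries. Doubling the assumed input tokens pushes the per-token offer to $4.50, which is why the query-size assumption has to travel with any comparison.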
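The retrieval-versus-fine-tuning finding above can be checked as a monthly cost comparison. A minimal sketch, assuming a hypothetical 20M-token training corpus, 3 epochs, roughly weekly re-training (4 runs/month), the top of the quoted $10–$25 training range, and a hypothetical sub-cent retrieval price of $0.005 per query. It deliberately omits two terms: the usage premium on the served fine-tuned model (which would favour retrieval further) and the extra prompt tokens that retrieved context adds to each LLM call (which would favour fine-tuning).

```python
def monthly_costs(corpus_tokens_m: float, usd_per_m_training: float,
                  epochs: int, runs_per_month: float,
                  queries_per_month: float, usd_per_query: float) -> tuple[float, float]:
    """Return (fine_tuning_usd, retrieval_usd) per month.

    Fine-tuning: every re-training run bills the corpus once per epoch.
    Retrieval: a flat per-query fee (context-token overhead not modelled).
    """
    per_run = corpus_tokens_m * usd_per_m_training * epochs
    return per_run * runs_per_month, queries_per_month * usd_per_query


def breakeven_queries(corpus_tokens_m: float, usd_per_m_training: float,
                      epochs: int, runs_per_month: float,
                      usd_per_query: float) -> float:
    """Monthly query volume at which retrieval fees equal re-training spend."""
    fine_tune, _ = monthly_costs(corpus_tokens_m, usd_per_m_training, epochs,
                                 runs_per_month, 0, usd_per_query)
    return fine_tune / usd_per_query


ft, rag = monthly_costs(20, 25.0, epochs=3, runs_per_month=4,
                        queries_per_month=500_000, usd_per_query=0.005)
print(f"fine-tuning ${ft:,.0f}/month vs retrieval ${rag:,.0f}/month")
print(f"break-even at {breakeven_queries(20, 25.0, 3, 4, 0.005):,.0f} queries/month")
```

Under these assumptions fine-tuning costs $6,000/month against $2,500 for retrieval, and retrieval stays cheaper on this measure up to about 1.2M queries per month; beyond that volume the two omitted terms decide the comparison.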
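The caching finding above turns on how often a knowledge payload is reused verbatim. The sketch below computes the expected input cost of a call whose prompt begins with a cacheable knowledge prefix, and shows one way a provider might make payloads byte-stable. The multipliers are assumptions modelled on the pricing described above (Anthropic-style: reads at 0.1x and writes at 1.25x the base input price; OpenAI-style: reads at 0.5x with no write premium); the base price, token counts, hit rate, and the `id` field are hypothetical.

```python
import json


def canonical_payload(records: list[dict]) -> str:
    """Serialise knowledge records deterministically.

    Sorting records by a stable "id" (an assumed field) and keys alphabetically,
    with fixed separators, makes identical facts produce byte-identical text,
    which is what prefix-based prompt caches match on.
    """
    ordered = sorted(records, key=lambda r: r["id"])
    return json.dumps(ordered, sort_keys=True, separators=(",", ":"))


def expected_input_cost(prefix_tokens: int, suffix_tokens: int,
                        usd_per_m_input: float, hit_rate: float,
                        read_mult: float, write_mult: float = 1.0) -> float:
    """Expected input-token cost (USD) of one call with a cacheable prefix.

    On a cache hit the prefix is billed at read_mult x base; on a miss it is
    (re)written at write_mult x base. The query-specific suffix is always
    billed at the base rate.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    base = usd_per_m_input / 1_000_000
    prefix_mult = hit_rate * read_mult + (1.0 - hit_rate) * write_mult
    return (prefix_tokens * prefix_mult + suffix_tokens) * base


# Record order and key order differ, but the serialised payloads are identical.
payload_1 = canonical_payload([{"id": "b", "v": 1}, {"v": 2, "id": "a"}])
payload_2 = canonical_payload([{"id": "a", "v": 2}, {"id": "b", "v": 1}])
assert payload_1 == payload_2

# 10k-token knowledge prefix, 500-token question, $3/M base input, 90% hit rate.
calls = 1_000_000
uncached = expected_input_cost(10_000, 500, 3.0, 0.0, read_mult=1.0)
anthropic_style = expected_input_cost(10_000, 500, 3.0, 0.9, read_mult=0.1, write_mult=1.25)
openai_style = expected_input_cost(10_000, 500, 3.0, 0.9, read_mult=0.5)
for label, cost in [("no caching", uncached), ("Anthropic-style", anthropic_style),
                    ("OpenAI-style", openai_style)]:
    print(f"{label}: ${cost * calls:,.0f} per {calls:,} calls")
```

With these hypothetical inputs, a million calls cost about $31,500 uncached, $7,950 with Anthropic-style multipliers and $18,000 with OpenAI-style multipliers. The hit rate is the lever a knowledge provider controls: a payload that embeds a fresh timestamp or reorders fields on every response pushes it toward zero.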
3. Agent service patterns — what agents buy and why
3.1 The memory stack agents actually rent
A typical production agent fleet maintains a layered memory stack: