Discovery Infrastructure for AI Agents: Making Your Service Discoverable to Autonomous Systems
A course lesson for builders, product teams, and infrastructure strategists entering the agent economy.
1. Why Discovery Infrastructure Matters for Agents
Autonomous agents do not browse the web the way humans do. They do not read marketing copy, follow visual hierarchy, or interpret brand tone. When an agent needs to accomplish a task — retrieve research, call an API, book a service, extract structured data — it must first answer a prior question: does this service exist, what can it do, and how do I interact with it programmatically?
This is the discovery problem.
For human users, discovery is solved by search engines, social proof, and UI design. For agents, discovery requires machine-readable signals that answer four questions unambiguously:
- What does this service do? (capability description)
- How do I call it? (interface contract)
- What will it return? (output schema)
- Am I authorized to use it? (authentication and access model)
Services that cannot answer these questions in agent-readable formats are effectively invisible to autonomous systems, regardless of how well known they are to human users. As agent-mediated commerce and research grow, invisibility to agents translates directly into lost traffic, lost revenue, and lost relevance.
Discovery infrastructure is the set of standards, files, and markup patterns that make a service legible to agents before any task-level interaction begins.
2. The Four Discovery Standards Explained
Four complementary standards have emerged to address different layers of the discovery problem. They are not mutually exclusive — production-grade agent-ready services typically implement all four.
2.1 llms.txt: Human-Readable Service Descriptions
What it is: A plain-text file, conventionally placed at yourdomain.com/llms.txt, that describes what a service does in natural language optimized for large language model consumption.
What it contains:
- A concise description of the service's purpose and scope
- The primary use cases the service supports
- Key constraints, limitations, or access requirements
- Links to more structured resources (API docs, agents.json, OpenAPI specs)
Why it matters: LLMs used inside agents frequently perform a preliminary "should I use this service?" reasoning step before committing to an API call. A well-written llms.txt gives the model accurate context to make that decision without hallucinating capabilities or misunderstanding scope.
Design principle: Write llms.txt as if briefing a capable but uninformed colleague who will immediately try to use your service programmatically. Precision over marketing language. State what the service cannot do as clearly as what it can.
Example structure:
Service: Empirica Research API
Purpose: Provides structured research notes, market analysis, and
agent-economy intelligence in machine-readable formats.
Primary use cases: Competitive intelligence retrieval, structured
knowledge ingestion, trend monitoring.
Output formats: JSON, Markdown, structured notes.
Authentication: API key required. See /docs/auth.
Rate limits: 100 requests/minute on standard tier.
Not suitable for: Real-time data, financial advice, legal guidance.
Full API spec: /openapi.json
Agent capabilities: /agents.json
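A file laid out like the example above can also be read without an LLM. The following is a minimal sketch, assuming the "Key: value" layout used in this lesson, where a line without a leading key continues the previous value. The helper name parse_llms_txt is hypothetical and not part of any library; a free-form llms.txt would need a language model rather than a parser.

```python
import re

# A field starts with a capitalized key made of letters and spaces, then a colon.
FIELD = re.compile(r"^([A-Z][A-Za-z ]+):\s*(.*)$")

def parse_llms_txt(text: str) -> dict[str, str]:
    """Parse 'Key: value' lines; other lines continue the previous value."""
    fields: dict[str, str] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = FIELD.match(line)
        if match:
            # Normalize "Full API spec" to "full_api_spec".
            current = match.group(1).strip().lower().replace(" ", "_")
            fields[current] = match.group(2).strip()
        elif current is not None:
            fields[current] = (fields[current] + " " + line).strip()
    return fields

SAMPLE = """\
Service: Empirica Research API
Purpose: Provides structured research notes, market analysis, and
agent-economy intelligence in machine-readable formats.
Not suitable for: Real-time data, financial advice, legal guidance.
Full API spec: /openapi.json
"""

info = parse_llms_txt(SAMPLE)
print(info["service"])        # Empirica Research API
print(info["full_api_spec"])  # /openapi.json
```

An agent framework could use the parsed links to locate the structured files, and pass the purpose and limitations text to the model for the "should I use this service?" step.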
2.2 agents.json: Machine-Readable Agent Capabilities
What it is: A structured JSON file, conventionally at yourdomain.com/agents.json, that declares the agent-specific capabilities of a service in a machine-parseable format.
What it contains:
- Capability declarations (what tasks the service can perform)
- Supported interaction modalities (REST, streaming, webhooks)
- Authentication schemes
- Pricing and rate-limit metadata
- Output format declarations
- Trust and verification signals
Why it matters: Where llms.txt is optimized for language model reasoning, agents.json is optimized for programmatic parsing by agent orchestration layers. An agent framework deciding whether to register a service as a tool can parse agents.json without invoking an LLM at all — reducing latency and cost in the discovery phase.
Design principle: Treat agents.json as a capability manifest, not a marketing document. Every field should be machine-actionable. Avoid free-text descriptions where enumerated values or structured schemas can be used instead.
Minimal viable structure:
{
  "service": "Empirica Research API",
  "version": "2.1",
  "capabilities": [
    "structured_research_retrieval",
    "trend_analysis",
    "entity_lookup"
  ],
  "auth": {
    "type": "api_key",
    "header": "X-API-Key"
  },
  "output_formats": ["json", "markdown"],
  "rate_limits": {
    "requests_per_minute": 100,
    "burst": 150
  },
  "pricing_model": "subscription_tiered",
  "openapi_spec": "/openapi.json",
  "llms_txt": "/llms.txt"
}
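To illustrate the "no LLM required" point, here is a minimal sketch of how an orchestration layer might decide whether to register a service from its manifest. It assumes the field names used in the manifest above; the set of mandatory fields and the helper names load_manifest and supports are assumptions for illustration, not a published schema or API.

```python
import json

# Fields this sketch treats as mandatory (an assumption, not a published schema).
REQUIRED_FIELDS = ("service", "version", "capabilities", "auth")

def load_manifest(raw: str) -> dict:
    """Parse agents.json text and fail fast if a mandatory field is absent."""
    manifest = json.loads(raw)
    missing = [field for field in REQUIRED_FIELDS if field not in manifest]
    if missing:
        raise ValueError(f"agents.json is missing fields: {missing}")
    return manifest

def supports(manifest: dict, capability: str, output_format: str) -> bool:
    """True if the service declares both the capability and the output format."""
    return (capability in manifest["capabilities"]
            and output_format in manifest.get("output_formats", []))

RAW_MANIFEST = """
{
  "service": "Empirica Research API",
  "version": "2.1",
  "capabilities": ["structured_research_retrieval", "trend_analysis", "entity_lookup"],
  "auth": {"type": "api_key", "header": "X-API-Key"},
  "output_formats": ["json", "markdown"]
}
"""

manifest = load_manifest(RAW_MANIFEST)
print(supports(manifest, "trend_analysis", "json"))    # True
print(supports(manifest, "image_generation", "json"))  # False
```

Because every check is a set-membership test on enumerated values, the decision is deterministic, which is the practical reason to prefer enumerations over free text in the manifest.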
2.3 OpenAPI: Standardized API Contracts
What it is: A formal specification (formerly the Swagger Specification) for describing REST APIs in a language-agnostic, machine-readable format, typically published as a JSON or YAML file at /openapi.json or /openapi.yaml.
What it contains:
- Every endpoint, with path, method, and description
- Request parameter schemas (types, constraints, required vs optional)
- Response schemas for all status codes
- Authentication requirements per endpoint
- Example requests and responses
Why it matters: OpenAPI is the most mature and widely adopted of the four standards. Agent frameworks including LangChain, AutoGPT derivatives, and enterprise orchestration platforms can automatically generate tool definitions from a valid OpenAPI spec, so a well-written spec translates into agent-callable tools with little additional integration work.
Critical quality factors for agent consumption:
| Factor | Human API docs | Agent-optimized OpenAPI |
|---|---|---|
| Endpoint descriptions | Marketing-friendly | Precise, action-verb led |
| Parameter descriptions | Conversational | Type + constraint + example |
| Response schemas | Partial coverage | Complete, including error states |
| Examples | Illustrative | Representative of real agent use cases |
| Authentication | Described in prose | Formally declared in securitySchemes |
Common failure mode: OpenAPI specs written for human developer documentation often have incomplete response schemas and vague parameter descriptions. Agents attempting to use these specs generate incorrect calls, misinterpret responses, and fail silently. Write OpenAPI specs as if the only consumer is a code generator — because for agents, it is.
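The "code generator" framing can be made concrete with a rough sketch of how a framework might turn OpenAPI operations into tool definitions. The output shape (name, description, JSON-Schema-style parameters) is an assumption for illustration and does not reproduce any particular framework's format; the sketch also assumes inline parameters and does not resolve $ref pointers.

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete"}

def tools_from_openapi(spec: dict) -> list[dict]:
    """Build one tool definition per operation in an OpenAPI 3.x document."""
    tools = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            params = op.get("parameters", [])
            tools.append({
                "name": op.get("operationId", f"{method}_{path}"),
                "description": op.get("summary") or op.get("description", ""),
                "method": method.upper(),
                "path": path,
                "parameters": {
                    "type": "object",
                    "properties": {p["name"]: p.get("schema", {}) for p in params},
                    "required": [p["name"] for p in params if p.get("required")],
                },
            })
    return tools

# Hypothetical one-endpoint spec used only to exercise the function.
SPEC = {
    "openapi": "3.0.3",
    "info": {"title": "Example", "version": "2.1"},
    "paths": {
        "/notes": {
            "get": {
                "operationId": "searchNotes",
                "summary": "Search research notes by keyword.",
                "parameters": [
                    {"name": "q", "in": "query", "required": True,
                     "schema": {"type": "string"}},
                    {"name": "limit", "in": "query",
                     "schema": {"type": "integer", "maximum": 50}},
                ],
                "responses": {"200": {"description": "Matching notes"}},
            }
        }
    },
}

tools = tools_from_openapi(SPEC)
print(tools[0]["name"], tools[0]["parameters"]["required"])  # searchNotes ['q']
```

Everything the agent knows about the tool comes from the spec: a missing summary becomes an empty description, and a missing schema becomes an unconstrained parameter, which is how vague specs turn into incorrect calls.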
2.4 Semantic HTML: Web-Native Agent Navigation
What it is: The use of standard HTML semantic elements — <article>, <section>, <nav>, <header>, <main>, <aside>, <figure>, <time>, <address> — combined with structured data markup (Schema.org via JSON-LD or microdata) to make web page content machine-interpretable.
What it contains (in practice):
- Proper heading hierarchy (<h1> through <h6>) that reflects content structure
- Schema.org JSON-LD blocks declaring content type, author, date, and relationships
- <meta> tags with accurate, non-promotional descriptions
- aria-label attributes on interactive elements
- Explicit link text (not "click here")
Why it matters: Not all agent interactions go through APIs. Web-browsing agents — increasingly common in research, monitoring, and competitive intelligence tasks — parse HTML directly. Semantic HTML dramatically reduces the error rate when agents extract structured information from pages, identify navigation paths, or determine whether a page is relevant to a task.
Schema.org JSON-LD example for a research article:
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Discovery Infrastructure for AI Agents",
  "author": {
    "@type": "Organization",
    "name": "Empirica"
  },
  "datePublished": "2025-01-15",
  "description": "A technical guide to llms.txt, agents.json, OpenAPI, and semantic HTML for agent-readable services.",
  "keywords": ["AI agents", "discovery infrastructure", "OpenAPI", "llms.txt"]
}
</script>
Design principle: Semantic HTML is the lowest-cost, highest-leverage discovery investment for content-heavy services. It requires no new infrastructure — only disciplined markup practices applied to existing content.
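To show what a web-browsing agent gains from JSON-LD, here is a small sketch that pulls every application/ld+json block out of a page using only Python's standard html.parser module. The class and function names are illustrative, and the sample page is made up for the example.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # ignore malformed blocks rather than failing the whole page

def extract_json_ld(html: str) -> list:
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items

PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "headline": "Discovery Infrastructure for AI Agents",
 "datePublished": "2025-01-15"}
</script>
</head><body><main><article><h1>Discovery Infrastructure for AI Agents</h1>
</article></main></body></html>"""

items = extract_json_ld(PAGE)
print(items[0]["@type"], items[0]["datePublished"])  # Article 2025-01-15
```

The agent obtains the content type and publication date from structured fields instead of inferring them from visible text, which is where most extraction errors come from.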
3. How Agents Use Discovery Infrastructure
Understanding the agent discovery workflow clarifies why each standard matters and where gaps cause failures.
The Agent Discovery Sequence
1. TASK ASSIGNMENT
   Agent receives goal: "Find recent analysis on AI infrastructure costs"
2. SERVICE DISCOVERY
   Agent queries known registries OR follows links OR searches
   → Finds candidate service domain
3. CAPABILITY ASSESSMENT
   Agent fetches /llms.txt → reads natural language description
   Agent fetches /agents.json → parses capability declarations
   → Decision: "This service can fulfill the task"
4. INTERFACE RESOLUTION
   Agent fetches /openapi.json → generates tool definition
   → Knows exactly how to call the service
5. AUTHENTICATION
   Agent retrieves credentials from secure store
   → Constructs authenticated request
6. EXECUTION
   Agent calls API endpoint with correct parameters
   → Receives structured response
7. RESULT INTEGRATION
   Agent parses response using declared schema
   → Integrates into task output
Where discovery failures occur:
- Step 3 failure: Service has no llms.txt or agents.json → agent cannot assess capability without making speculative API calls
- Step 4 failure: No OpenAPI spec, or spec is incomplete → agent generates malformed requests
- Step 5 failure: Authentication not declared in machine-readable format → agent cannot self-configure credentials
- Step 7 failure: Response schema undocumented → agent misparses output, propagates errors downstream
Each failure mode has a direct cost: wasted API calls, incorrect task outputs, or complete task abandonment. Services with complete discovery infrastructure eliminate these failure modes systematically.
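Steps 3 through 5 of the sequence can be sketched as a single function that reports which step failed. This is an illustrative outline, not a real agent framework: the discover function, its result shape, and the example.com URLs are assumptions, and the fetch callable is injected so the sketch runs without network access.

```python
import json
from typing import Callable, Optional

# A fetcher returns the response body as text, or None if the file is absent.
Fetch = Callable[[str], Optional[str]]

def discover(base_url: str, capability: str, fetch: Fetch) -> dict:
    # Step 3: capability assessment from agents.json
    manifest_raw = fetch(f"{base_url}/agents.json")
    if manifest_raw is None:
        return {"ok": False, "failed_step": 3, "reason": "agents.json not found"}
    manifest = json.loads(manifest_raw)
    if capability not in manifest.get("capabilities", []):
        return {"ok": False, "failed_step": 3, "reason": "capability not declared"}

    # Step 4: interface resolution from the OpenAPI spec
    spec_path = manifest.get("openapi_spec", "/openapi.json")
    spec_raw = fetch(f"{base_url}{spec_path}")
    if spec_raw is None:
        return {"ok": False, "failed_step": 4, "reason": "OpenAPI spec not found"}
    spec = json.loads(spec_raw)

    # Step 5: the auth scheme must be declared before credentials can be looked up
    auth = manifest.get("auth")
    if not auth:
        return {"ok": False, "failed_step": 5, "reason": "auth scheme not declared"}

    return {"ok": True, "auth": auth, "paths": sorted(spec.get("paths", {}))}

# In-memory stand-in for a site; dict.get returns None for missing URLs.
FAKE_SITE = {
    "https://api.example.com/agents.json": json.dumps({
        "capabilities": ["trend_analysis"],
        "auth": {"type": "api_key", "header": "X-API-Key"},
        "openapi_spec": "/openapi.json",
    }),
    "https://api.example.com/openapi.json": json.dumps({"paths": {"/notes": {}}}),
}

result = discover("https://api.example.com", "trend_analysis", FAKE_SITE.get)
print(result["ok"], result["paths"])  # True ['/notes']
```

Each early return corresponds to one of the failure modes listed above, which makes the cost visible: the agent stops before it ever reaches execution.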
Multi-Agent Discovery
In multi-agent architectures, an orchestrator agent assembles a team of specialist agents and tools for a complex task. The orchestrator's tool-selection logic depends entirely on discovery infrastructure — it cannot evaluate a service it cannot read. Services with complete, accurate discovery files are more likely to be selected by orchestrators, more likely to be registered in agent tool libraries, and more likely to appear in agent marketplace listings.
4. Implementation Patterns & Best Practices
Layered Implementation Strategy
Implement discovery infrastructure in order of leverage-to-effort ratio:
Tier 1 — Immediate (hours of effort):
- Write and publish llms.txt
- Add Schema.org JSON-LD to key pages
- Audit existing HTML for semantic element usage
Tier 2 — Short-term (days of effort):
- Create and publish agents.json
- Audit existing API documentation for OpenAPI completeness
- Fill gaps in response schemas and parameter descriptions
Tier 3 — Ongoing (continuous):
- Keep all four files synchronized with actual service capabilities
- Version agents.json and OpenAPI specs explicitly
- Monitor agent traffic patterns to identify discovery gaps
Consistency Across Standards
The four standards must describe the same service consistently. Contradictions between llms.txt and agents.json, or between agents.json and the OpenAPI spec, cause agent reasoning failures. Treat the four artifacts as views of a single source of truth and update them together.
Consistency checklist:
- [ ] Capability names in agents.json match endpoint descriptions in OpenAPI
- [ ] Rate limits declared in agents.json match actual enforcement
- [ ] Authentication scheme in agents.json matches securitySchemes in OpenAPI
- [ ] llms.txt limitations section accurately reflects OpenAPI error responses
- [ ] Schema.org markup on web pages matches content described in llms.txt
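Some of these checks can be automated. The sketch below compares the auth declaration and version in agents.json against an OpenAPI document. It assumes the agents.json field names used earlier in this lesson and the standard OpenAPI 3.x securitySchemes layout for API keys; the function name consistency_problems is hypothetical.

```python
def consistency_problems(manifest: dict, spec: dict) -> list[str]:
    """Return human-readable mismatches between agents.json and an OpenAPI spec."""
    problems = []

    # Auth: an api_key header in agents.json should appear as an apiKey
    # security scheme located in a header in the OpenAPI document.
    auth = manifest.get("auth", {})
    schemes = spec.get("components", {}).get("securitySchemes", {})
    if auth.get("type") == "api_key":
        declared = {
            str(s.get("name", "")).lower()
            for s in schemes.values()
            if s.get("type") == "apiKey" and s.get("in") == "header"
        }
        header = str(auth.get("header", ""))
        if header.lower() not in declared:
            problems.append(f"auth header {header!r} not declared in securitySchemes")

    # Version: both files should report the same version string.
    if manifest.get("version") != spec.get("info", {}).get("version"):
        problems.append("agents.json version differs from OpenAPI info.version")

    return problems

MANIFEST = {"version": "2.1", "auth": {"type": "api_key", "header": "X-API-Key"}}
GOOD_SPEC = {
    "info": {"version": "2.1"},
    "components": {"securitySchemes": {
        "ApiKeyAuth": {"type": "apiKey", "in": "header", "name": "X-API-Key"}}},
}
BAD_SPEC = {
    "info": {"version": "2.0"},
    "components": {"securitySchemes": {
        "ApiKeyAuth": {"type": "apiKey", "in": "header", "name": "Authorization"}}},
}

print(consistency_problems(MANIFEST, GOOD_SPEC))       # []
print(len(consistency_problems(MANIFEST, BAD_SPEC)))   # 2
```

Running a check like this in continuous integration is one way to keep the files from drifting apart between releases.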
Versioning and Change Management
Agents that have cached your discovery files will break if you change capabilities without versioning. Best practices:
- Include a version field in agents.json and increment it on any capability change
- Use OpenAPI's info.version field consistently
- Maintain a changelog section in llms.txt for significant changes
- Consider a deprecated flag in agents.json for capabilities being phased out
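On the consuming side, an agent with a cached manifest could use these fields to decide when to refresh its tool definitions. The sketch below assumes one possible encoding of the deprecated flag, in which a capability is either a plain string or an object with name and deprecated keys; that encoding and the function names are illustrative assumptions, not an established agents.json format.

```python
def capability_names(manifest: dict, include_deprecated: bool = True) -> set[str]:
    """Names of declared capabilities; entries may be strings or objects."""
    names = set()
    for cap in manifest.get("capabilities", []):
        if isinstance(cap, str):
            names.add(cap)
        elif include_deprecated or not cap.get("deprecated", False):
            names.add(cap["name"])
    return names

def diff_manifests(cached: dict, fresh: dict) -> dict:
    """Summarize what changed between a cached and a freshly fetched manifest."""
    return {
        "version_changed": cached.get("version") != fresh.get("version"),
        "removed": sorted(capability_names(cached) - capability_names(fresh)),
        "deprecated": sorted(
            capability_names(fresh)
            - capability_names(fresh, include_deprecated=False)
        ),
    }

CACHED = {
    "version": "2.1",
    "capabilities": ["structured_research_retrieval", "trend_analysis", "entity_lookup"],
}
FRESH = {
    "version": "2.2",
    "capabilities": [
        "structured_research_retrieval",
        {"name": "trend_analysis", "deprecated": True},
    ],
}

changes = diff_manifests(CACHED, FRESH)
print(changes)
# {'version_changed': True, 'removed': ['entity_lookup'], 'deprecated': ['trend_analysis']}
```

A version bump tells the agent to re-read the manifest, while the deprecated list gives it time to migrate before a capability disappears, as entity_lookup did here without warning.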
Testing Discovery Infrastructure
Before publishing, test your discovery files as an agent would consume them:
- Parse agents.json programmatically and verify all declared capabilities have corresponding OpenAPI endpoints
- Use an OpenAPI validator to check spec completeness
- Run a web crawler against your semantic HTML and verify Schema.org markup parses correctly
- Ask an LLM to read your llms.txt and describe your service; if the description is inaccurate, rewrite the file
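The first two tests can be approximated with a short audit script. This sketch assumes each OpenAPI operation names the capability it implements in an x-capability extension field; OpenAPI permits x- prefixed extensions, but this particular field and the audit function are hypothetical conventions chosen for the example. It is a spot check, not a substitute for a full OpenAPI validator.

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete"}

def operations(spec: dict):
    """Yield ("METHOD /path", operation) pairs from an OpenAPI 3.x document."""
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method in HTTP_METHODS:
                yield f"{method.upper()} {path}", op

def audit(manifest: dict, spec: dict) -> list[str]:
    findings = []

    # Every declared capability should map to at least one operation.
    covered = {op.get("x-capability") for _, op in operations(spec)}
    for cap in manifest.get("capabilities", []):
        if cap not in covered:
            findings.append(f"capability {cap!r} has no OpenAPI operation")

    for label, op in operations(spec):
        # Parameters need descriptions for an agent to fill them correctly.
        for param in op.get("parameters", []):
            if not param.get("description"):
                findings.append(f"{label}: parameter {param.get('name')!r} lacks a description")
        # At least one 4xx, 5xx, or default response should be documented.
        codes = op.get("responses", {})
        if not any(str(c).startswith(("4", "5")) or c == "default" for c in codes):
            findings.append(f"{label}: no error response documented")

    return findings

MANIFEST = {"capabilities": ["trend_analysis", "entity_lookup"]}
SPEC = {"paths": {"/trends": {"get": {
    "x-capability": "trend_analysis",
    "parameters": [{"name": "topic", "in": "query", "schema": {"type": "string"}}],
    "responses": {"200": {"description": "OK"}},
}}}}

findings = audit(MANIFEST, SPEC)
for finding in findings:
    print(finding)
```

For the sample data the audit reports three gaps: an unmapped capability, an undescribed parameter, and a missing error response.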
5. Discovery as Competitive Advantage in the Agent Economy
Discovery infrastructure is not a compliance exercise. It is a distribution strategy.
As autonomous agents increasingly mediate access to services — conducting research, making purchases, managing workflows — the services that agents can discover, understand, and call reliably will capture disproportionate agent-driven traffic. Services that are opaque to agents will be bypassed, regardless of their quality.
The network effect of agent-readability:
Agent frameworks maintain tool registries — curated lists of services that agents can call. Inclusion in these registries requires machine-readable discovery infrastructure. Once included, a service benefits from every agent that uses that framework, without additional marketing spend. The marginal cost of serving an additional agent-driven request approaches zero; the marginal benefit compounds as agent adoption grows.
First-mover dynamics:
In most service categories, agent-readable competitors are still rare. A service that publishes complete, accurate discovery infrastructure today occupies a structurally advantaged position: agents can discover it when they cannot discover its competitors. As agent-mediated traffic grows from a small fraction to a dominant channel, this advantage compounds.
Trust signals:
Well-maintained discovery infrastructure signals operational maturity to both agents and the humans who configure them. An agents.json with accurate rate limits, explicit pricing model declarations, and versioned capability manifests communicates that a service is built for programmatic consumption — a meaningful trust signal in enterprise agent procurement decisions.
6. Practical Checklist: Making Your Service Agent-Ready
Use this checklist to audit your current state and prioritize implementation work.
llms.txt
- [ ] File exists at yourdomain.com/llms.txt
- [ ] Service purpose described in 2–4 sentences, precision over marketing language
- [ ] Primary use cases enumerated
- [ ] Explicit limitations and out-of-scope uses stated
- [ ] Links to agents.json and OpenAPI spec included
- [ ] Authentication requirements summarized in plain language
- [ ] Last-updated date included
agents.json
- [ ] File exists at yourdomain.com/agents.json
- [ ] version field present and incremented on changes
- [ ] capabilities array uses consistent, action-oriented naming
- [ ] auth object formally declares scheme and header/parameter name
- [ ] output_formats array is complete and accurate
- [ ] rate_limits object matches actual enforcement
- [ ] pricing_model field present (even if "free")
- [ ] Links to openapi_spec and llms_txt included
OpenAPI Specification
- [ ] Spec file exists at /openapi.json or /openapi.yaml
- [ ] All production endpoints documented
- [ ] All parameters have type, description, and example
- [ ] All response schemas documented for 200, 4xx, and 5xx codes
- [ ] securitySchemes formally declared
- [ ] info.version matches agents.json version
- [ ] Spec validates against OpenAPI 3.x schema without errors
Semantic HTML
- [ ] Key pages use semantic elements (<main>, <article>, <section>, <nav>)
- [ ] Heading hierarchy is logical and non-decorative
- [ ] Schema.org JSON-LD present on content pages
- [ ] @type declarations accurate for content type (Article, Product, Service, etc.)
- [ ] datePublished and dateModified present on time-sensitive content
- [ ] Meta descriptions accurate and non-promotional
- [ ] Link text is descriptive (not "click here" or "read more")
7. Connection to Empirica's Infrastructure Strategy
Empirica's research outputs — structured notes, API-accessible intelligence, and agent-readable formats — are designed with this discovery infrastructure in mind. The value of structured knowledge compounds when agents can discover it, assess its relevance, and consume it without friction.
The four standards covered in this lesson represent the interface layer between Empirica's knowledge infrastructure and the autonomous systems that will increasingly consume it. Services that invest in this layer now are building distribution infrastructure for a channel — agent-mediated access — that is growing faster than any human-facing channel.
The practical implication: discovery infrastructure is not a feature to add after product-market fit. It is a foundational layer that determines whether autonomous systems can find and use your service at all. In the agent economy, undiscoverable equals nonexistent.
This lesson is part of Empirica's course series on building for the agent economy. Related lessons cover API design for agent consumption, pricing models for agent-driven usage, and structuring knowledge outputs for machine readability.