Discovery Infrastructure for AI Agents: Making Your Service Discoverable to Autonomous Systems

A course lesson for builders, product teams, and infrastructure strategists entering the agent economy.

1. Why Discovery Infrastructure Matters for Agents

Autonomous agents do not browse the web the way humans do. They do not read marketing copy, follow visual hierarchy, or interpret brand tone. When an agent needs to accomplish a task (retrieve research, call an API, book a service, extract structured data), it must first answer three prior questions: does this service exist, what can it do, and how do I interact with it programmatically?

This is the discovery problem.

For human users, discovery is solved by search engines, social proof, and UI design. For agents, discovery requires machine-readable signals that answer four questions unambiguously:

What does this service do? (capability description)
How do I call it? (interface contract)
What will it return? (output schema)
Am I authorized to use it? (authentication and access model)

Services that cannot answer these questions in agent-readable formats are effectively invisible to autonomous systems, regardless of how well known they are to human users. As agent-mediated commerce and research grow, invisibility to agents translates directly into lost traffic, lost revenue, and lost relevance.

Discovery infrastructure is the set of standards, files, and markup patterns that make a service legible to agents before any task-level interaction begins.

2. The Four Discovery Standards Explained

Four complementary standards have emerged to address different layers of the discovery problem. They are not mutually exclusive — production-grade agent-ready services typically implement all four.

2.1 llms.txt: Human-Readable Service Descriptions

What it is: A plain-text file, conventionally placed at yourdomain.com/llms.txt, that describes what a service does in natural language optimized for large language model consumption.

What it contains:
- A concise description of the service's purpose and scope
- The primary use cases the service supports
- Key constraints, limitations, or access requirements
- Links to more structured resources (API docs, agents.json, OpenAPI specs)

Why it matters: LLMs used inside agents frequently perform a preliminary "should I use this service?" reasoning step before committing to an API call. A well-written llms.txt gives the model accurate context to make that decision without hallucinating capabilities or misunderstanding scope.

Design principle: Write llms.txt as if briefing a capable but uninformed colleague who will immediately try to use your service programmatically. Precision over marketing language. State what the service cannot do as clearly as what it can.

Example structure:

Service: Empirica Research API
Purpose: Provides structured research notes, market analysis, and 
         agent-economy intelligence in machine-readable formats.
Primary use cases: Competitive intelligence retrieval, structured 
                   knowledge ingestion, trend monitoring.
Output formats: JSON, Markdown, structured notes.
Authentication: API key required. See /docs/auth.
Rate limits: 100 requests/minute on standard tier.
Not suitable for: Real-time data, financial advice, legal guidance.
Full API spec: /openapi.json
Agent capabilities: /agents.json
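
A file like the example above can be read with a few lines of code before any LLM is involved. The sketch below assumes the "Key: value" layout used in this lesson's example, with indented lines continuing the previous value. That layout is an illustration here, not a formal llms.txt grammar, and parse_llms_txt is a hypothetical helper name.

```python
def parse_llms_txt(text: str) -> dict[str, str]:
    """Parse 'Key: value' lines; indented lines continue the previous value."""
    fields: dict[str, str] = {}
    last_key = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        if raw[0].isspace() and last_key is not None:
            # Continuation line: append to the previous field's value.
            fields[last_key] += " " + raw.strip()
            continue
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # not a key-value line (assumed layout); ignore it
        last_key = key.strip()
        fields[last_key] = value.strip()
    return fields


SAMPLE = """\
Service: Empirica Research API
Purpose: Provides structured research notes, market analysis, and
         agent-economy intelligence in machine-readable formats.
Authentication: API key required. See /docs/auth.
Not suitable for: Real-time data, financial advice, legal guidance.
Full API spec: /openapi.json
"""

fields = parse_llms_txt(SAMPLE)
print(fields["Not suitable for"])
```

Splitting on the first colon only keeps values that themselves contain colons (such as URLs) intact.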

2.2 agents.json: Machine-Readable Agent Capabilities

What it is: A structured JSON file, conventionally at yourdomain.com/agents.json, that declares the agent-specific capabilities of a service in a machine-parseable format.

What it contains:
- Capability declarations (what tasks the service can perform)
- Supported interaction modalities (REST, streaming, webhooks)
- Authentication schemes
- Pricing and rate-limit metadata
- Output format declarations
- Trust and verification signals

Why it matters: Where llms.txt is optimized for language model reasoning, agents.json is optimized for programmatic parsing by agent orchestration layers. An agent framework deciding whether to register a service as a tool can parse agents.json without invoking an LLM at all — reducing latency and cost in the discovery phase.

Design principle: Treat agents.json as a capability manifest, not a marketing document. Every field should be machine-actionable. Avoid free-text descriptions where enumerated values or structured schemas can be used instead.

Minimal viable structure:

{
  "service": "Empirica Research API",
  "version": "2.1",
  "capabilities": [
    "structured_research_retrieval",
    "trend_analysis",
    "entity_lookup"
  ],
  "auth": {
    "type": "api_key",
    "header": "X-API-Key"
  },
  "output_formats": ["json", "markdown"],
  "rate_limits": {
    "requests_per_minute": 100,
    "burst": 150
  },
  "pricing_model": "subscription_tiered",
  "openapi_spec": "/openapi.json",
  "llms_txt": "/llms.txt"
}
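
Because every field is meant to be machine-actionable, an orchestration layer can reject a malformed manifest before registering the service. The following is a minimal sketch of such a check. The required fields and their types are taken from the example above and are an assumption of this lesson, not a published agents.json schema; validate_manifest is a hypothetical helper.

```python
import json

# Assumed shape, mirroring the example manifest in this lesson.
REQUIRED_FIELDS = {
    "service": str,
    "version": str,
    "capabilities": list,
    "auth": dict,
    "output_formats": list,
}


def validate_manifest(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the manifest passed."""
    try:
        manifest = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(manifest, dict):
        return ["top-level value must be an object"]

    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in manifest:
            problems.append(f"missing field: {name}")
        elif not isinstance(manifest[name], expected):
            problems.append(f"{name} should be {expected.__name__}")

    # Capability names should be non-empty strings so they can be matched.
    caps = manifest.get("capabilities")
    if isinstance(caps, list) and not all(isinstance(c, str) and c for c in caps):
        problems.append("capabilities must be non-empty strings")

    auth = manifest.get("auth")
    if isinstance(auth, dict) and "type" not in auth:
        problems.append("auth.type is missing")
    return problems
```

A caller would typically refuse to register the service when the returned list is non-empty, and log the problems for the service owner.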

2.3 OpenAPI: Standardized API Contracts

What it is: A formal specification standard (formerly Swagger) for describing REST APIs in a language-agnostic, machine-readable format — typically a YAML or JSON file at /openapi.json or /openapi.yaml.

What it contains:
- Every endpoint, with path, method, and description
- Request parameter schemas (types, constraints, required vs optional)
- Response schemas for all status codes
- Authentication requirements per endpoint
- Example requests and responses

Why it matters: OpenAPI is the most mature and widely adopted of the four standards. Agent frameworks including LangChain, AutoGPT derivatives, and enterprise orchestration platforms can automatically generate tool definitions from a valid OpenAPI spec — meaning a well-written spec directly translates into agent-callable tools with no additional integration work.

Critical quality factors for agent consumption:

Factor	Human API docs	Agent-optimized OpenAPI
Endpoint descriptions	Marketing-friendly	Precise, action-verb led
Parameter descriptions	Conversational	Type + constraint + example
Response schemas	Partial coverage	Complete, including error states
Examples	Illustrative	Representative of real agent use cases
Authentication	Described in prose	Formally declared in securitySchemes

Common failure mode: OpenAPI specs written for human developer documentation often have incomplete response schemas and vague parameter descriptions. Agents attempting to use these specs generate incorrect calls, misinterpret responses, and fail silently. Write OpenAPI specs as if the only consumer is a code generator — because for agents, it is.
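
To make the "code generator" point concrete, here is a rough sketch of how a tool definition might be derived from a spec. The output shape (name, description, method, path, parameters) is a generic illustration, not the format of any particular agent framework, and the /notes endpoint in the sample spec is hypothetical. The sketch does not resolve $ref parameters or request bodies.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def tools_from_openapi(spec: dict) -> list[dict]:
    """Turn each OpenAPI operation into a simple tool description."""
    tools = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            tools.append({
                "name": op.get("operationId") or f"{method}_{path}",
                "description": op.get("description") or op.get("summary", ""),
                "method": method.upper(),
                "path": path,
                "parameters": {
                    p["name"]: {
                        "type": p.get("schema", {}).get("type", "string"),
                        "required": p.get("required", False),
                    }
                    for p in op.get("parameters", [])
                    if "name" in p  # $ref parameters are not resolved here
                },
            })
    return tools


SPEC = {
    "openapi": "3.0.3",
    "info": {"title": "Empirica Research API", "version": "2.1"},
    "paths": {
        "/notes": {  # hypothetical endpoint, for illustration only
            "get": {
                "operationId": "search_notes",
                "description": "Search structured research notes by keyword.",
                "parameters": [
                    {"name": "q", "in": "query", "required": True,
                     "schema": {"type": "string"}},
                    {"name": "limit", "in": "query",
                     "schema": {"type": "integer"}},
                ],
                "responses": {"200": {"description": "Matching notes."}},
            }
        }
    },
}

tools = tools_from_openapi(SPEC)
```

Note how a vague or missing description ends up verbatim in the tool definition: whatever the spec says is all the agent has to reason with.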

2.4 Semantic HTML: Web-Native Agent Navigation

What it is: The use of standard HTML semantic elements (<article>, <section>, <nav>, <header>, <main>, <aside>, <figure>, <time>, <address>) combined with structured data markup (Schema.org via JSON-LD or microdata) to make web page content machine-interpretable.

What it contains (in practice):
- Proper heading hierarchy (<h1> through <h6>) that reflects content structure
- Schema.org JSON-LD blocks declaring content type, author, date, and relationships
- <meta> tags with accurate, non-promotional descriptions
- aria-label attributes on interactive elements
- Explicit link text (not "click here")

Why it matters: Not all agent interactions go through APIs. Web-browsing agents — increasingly common in research, monitoring, and competitive intelligence tasks — parse HTML directly. Semantic HTML dramatically reduces the error rate when agents extract structured information from pages, identify navigation paths, or determine whether a page is relevant to a task.

Schema.org JSON-LD example for a research article:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Discovery Infrastructure for AI Agents",
  "author": {
    "@type": "Organization",
    "name": "Empirica"
  },
  "datePublished": "2025-01-15",
  "description": "A technical guide to llms.txt, agents.json, OpenAPI, and semantic HTML for agent-readable services.",
  "keywords": ["AI agents", "discovery infrastructure", "OpenAPI", "llms.txt"]
}
</script>

Design principle: Semantic HTML is the lowest-cost, highest-leverage discovery investment for content-heavy services. It requires no new infrastructure — only disciplined markup practices applied to existing content.
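
For a sense of how a web-browsing agent might pick up this markup, the sketch below pulls JSON-LD blocks out of a page using only Python's standard library. It is a minimal illustration under simple assumptions (well-formed script tags, one JSON value per block), and JsonLdExtractor and extract_jsonld are hypothetical names.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed block: ignore it rather than guess


def extract_jsonld(html: str) -> list:
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "headline": "Discovery Infrastructure for AI Agents",
 "datePublished": "2025-01-15"}
</script></head>
<body><main><h1>Discovery Infrastructure for AI Agents</h1></main></body></html>"""

items = extract_jsonld(PAGE)
print(items[0]["@type"], items[0]["datePublished"])
```

A page without JSON-LD simply yields an empty list, which is the case where an agent has to fall back on heading structure and link text.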

3. How Agents Use Discovery Infrastructure

Understanding the agent discovery workflow clarifies why each standard matters and where gaps cause failures.

The Agent Discovery Sequence

1. TASK ASSIGNMENT
   Agent receives goal: "Find recent analysis on AI infrastructure costs"

2. SERVICE DISCOVERY
   Agent queries known registries OR follows links OR searches
   → Finds candidate service domain

3. CAPABILITY ASSESSMENT
   Agent fetches /llms.txt → reads natural language description
   Agent fetches /agents.json → parses capability declarations
   → Decision: "This service can fulfill the task"

4. INTERFACE RESOLUTION
   Agent fetches /openapi.json → generates tool definition
   → Knows exactly how to call the service

5. AUTHENTICATION
   Agent retrieves credentials from secure store
   → Constructs authenticated request

6. EXECUTION
   Agent calls API endpoint with correct parameters
   → Receives structured response

7. RESULT INTEGRATION
   Agent parses response using declared schema
   → Integrates into task output

Where discovery failures occur:

Step 3 failure: Service has no llms.txt or agents.json → agent cannot assess capability without making speculative API calls
Step 4 failure: No OpenAPI spec, or spec is incomplete → agent generates malformed requests
Step 5 failure: Authentication not declared in machine-readable format → agent cannot self-configure credentials
Step 7 failure: Response schema undocumented → agent misparses output, propagates errors downstream

Each failure mode has a direct cost: wasted API calls, incorrect task outputs, or complete task abandonment. Services with complete discovery infrastructure eliminate these failure modes systematically.
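
Steps 3 through 5 of the sequence can be sketched as a gap report that a service owner might run against their own domain. This is an illustrative outline, not how any specific agent framework behaves: the fetch callable is injected (so no particular HTTP client is assumed), only JSON specs are handled, and assess_service and the example.test domain are hypothetical.

```python
import json
from typing import Callable, Optional

# fetch(url) returns the response body as text, or None if unavailable.
Fetch = Callable[[str], Optional[str]]


def _load_json(raw: Optional[str]) -> Optional[dict]:
    """Parse a JSON object, returning None for missing or invalid input."""
    if raw is None:
        return None
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def assess_service(base_url: str, fetch: Fetch) -> dict:
    """Walk capability assessment, interface resolution, and auth discovery."""
    gaps = []

    # Step 3: capability assessment
    if fetch(base_url + "/llms.txt") is None:
        gaps.append("step 3: no llms.txt")
    manifest = _load_json(fetch(base_url + "/agents.json"))
    if manifest is None:
        gaps.append("step 3: agents.json missing or unparseable")

    # Step 4: interface resolution (follow the manifest's link if present)
    spec_path = (manifest or {}).get("openapi_spec", "/openapi.json")
    spec = _load_json(fetch(base_url + spec_path))
    if spec is None:
        gaps.append("step 4: OpenAPI spec missing or unparseable")
    # Step 5: authentication must be declared in machine-readable form
    elif not spec.get("components", {}).get("securitySchemes"):
        gaps.append("step 5: no securitySchemes declared")

    return {"manifest": manifest, "spec": spec, "gaps": gaps}


# Usage with an in-memory stand-in for the network:
pages = {
    "https://example.test/llms.txt": "Service: Demo",
    "https://example.test/agents.json": json.dumps({"openapi_spec": "/openapi.json"}),
    "https://example.test/openapi.json": json.dumps({"openapi": "3.0.3", "paths": {}}),
}
report = assess_service("https://example.test", pages.get)
print(report["gaps"])
```

Injecting fetch keeps the check testable offline; in practice it would wrap whatever HTTP client the agent already uses.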

Multi-Agent Discovery

In multi-agent architectures, an orchestrator agent assembles a team of specialist agents and tools for a complex task. The orchestrator's tool-selection logic depends entirely on discovery infrastructure — it cannot evaluate a service it cannot read. Services with complete, accurate discovery files are more likely to be selected by orchestrators, more likely to be registered in agent tool libraries, and more likely to appear in agent marketplace listings.

4. Implementation Patterns & Best Practices

Layered Implementation Strategy

Implement discovery infrastructure in order of leverage-to-effort ratio:

Tier 1 (immediate, hours of effort):
- Write and publish llms.txt
- Add Schema.org JSON-LD to key pages
- Audit existing HTML for semantic element usage

Tier 2 (short-term, days of effort):
- Create and publish agents.json
- Audit existing API documentation for OpenAPI completeness
- Fill gaps in response schemas and parameter descriptions

Tier 3 (ongoing, continuous):
- Keep all four files synchronized with actual service capabilities
- Version agents.json and OpenAPI specs explicitly
- Monitor agent traffic patterns to identify discovery gaps

Consistency Across Standards

The four standards must describe the same service consistently. Contradictions between llms.txt and agents.json, or between agents.json and the OpenAPI spec, cause agent reasoning failures. Treat the four artifacts as views of a single source of truth and update them together.

Consistency checklist:

[ ] Capability names in agents.json match endpoint descriptions in OpenAPI
[ ] Rate limits declared in agents.json match actual enforcement
[ ] Authentication scheme in agents.json matches securitySchemes in OpenAPI
[ ] llms.txt limitations section accurately reflects OpenAPI error responses
[ ] Schema.org markup on web pages matches content described in llms.txt
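
Two of these checks (version and authentication) are mechanical enough to automate. The sketch below compares an agents.json manifest shaped like the example in section 2.2 against an OpenAPI 3.x document. The mapping from the manifest's "api_key" type to OpenAPI's "apiKey" scheme is an assumption made for illustration, and check_consistency is a hypothetical helper.

```python
def check_consistency(manifest: dict, spec: dict) -> list[str]:
    """Return mismatches between agents.json and an OpenAPI 3.x spec."""
    problems = []

    # Version: info.version in OpenAPI should match the manifest version.
    if manifest.get("version") != spec.get("info", {}).get("version"):
        problems.append("version differs from OpenAPI info.version")

    auth = manifest.get("auth", {})
    schemes = spec.get("components", {}).get("securitySchemes", {})
    if auth.get("type") == "api_key":
        # OpenAPI 3.x declares API keys as type "apiKey" with "in" and "name".
        matched = any(
            s.get("type") == "apiKey"
            and s.get("in") == "header"
            and s.get("name") == auth.get("header")
            for s in schemes.values()
        )
        if not matched:
            problems.append("no apiKey header scheme matches auth.header")
    return problems


manifest = {"version": "2.1", "auth": {"type": "api_key", "header": "X-API-Key"}}
spec = {
    "info": {"version": "2.1"},
    "components": {"securitySchemes": {
        "ApiKeyAuth": {"type": "apiKey", "in": "header", "name": "X-API-Key"},
    }},
}
print(check_consistency(manifest, spec))
```

Running a check like this in CI is one way to catch drift before agents do; rate limits and llms.txt wording still need human review.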

Versioning and Change Management

Agents that have cached your discovery files will break if you change capabilities without versioning. Best practices:

Include a version field in agents.json and increment it on any capability change
Use OpenAPI's info.version field consistently
Maintain a changelog section in llms.txt for significant changes
Consider a deprecated flag in agents.json for capabilities being phased out
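
The first practice lends itself to an automated guard: compare the previously published manifest with the new one and flag capability changes that were not accompanied by a version change. This is a minimal sketch assuming capabilities are plain strings, as in the example manifest; diff_manifests is a hypothetical helper.

```python
def diff_manifests(cached: dict, fresh: dict) -> dict:
    """Compare two agents.json manifests and flag unversioned changes."""
    old = set(cached.get("capabilities", []))
    new = set(fresh.get("capabilities", []))
    added, removed = sorted(new - old), sorted(old - new)
    version_changed = cached.get("version") != fresh.get("version")
    return {
        "added": added,
        "removed": removed,
        "version_changed": version_changed,
        # A capability change without a version change breaks cached clients.
        "violation": bool(added or removed) and not version_changed,
    }


published = {"version": "2.1", "capabilities": ["entity_lookup", "trend_analysis"]}
candidate = {"version": "2.1", "capabilities": ["entity_lookup"]}
result = diff_manifests(published, candidate)
print(result["removed"], result["violation"])
```

The same comparison works on the agent side: a client holding a cached manifest can use the version field to decide whether its registered tools need refreshing.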

Testing Discovery Infrastructure

Before publishing, test your discovery files as an agent would consume them:

Parse agents.json programmatically and verify all declared capabilities have corresponding OpenAPI endpoints
Use an OpenAPI validator to check spec completeness
Run a web crawler against your semantic HTML and verify Schema.org markup parses correctly
Ask an LLM to read your llms.txt and describe your service — if the description is inaccurate, rewrite the file
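
The first of these tests can be sketched as follows. How a capability maps to an endpoint is not defined by this lesson, so the code assumes one possible convention: each capability name in agents.json equals an operationId in the OpenAPI spec. Treat that convention, and the uncovered_capabilities helper, as hypothetical.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def uncovered_capabilities(manifest: dict, spec: dict) -> list[str]:
    """List declared capabilities with no operation of the same operationId."""
    operation_ids = {
        op.get("operationId")
        for item in spec.get("paths", {}).values()
        for method, op in item.items()
        if method in HTTP_METHODS  # ignore path-level keys like "parameters"
    }
    return [c for c in manifest.get("capabilities", []) if c not in operation_ids]


manifest = {"capabilities": ["entity_lookup", "trend_analysis"]}
spec = {
    "paths": {
        "/entities/{id}": {  # hypothetical endpoint, for illustration only
            "get": {"operationId": "entity_lookup",
                    "responses": {"200": {"description": "Entity record."}}},
        }
    }
}
missing = uncovered_capabilities(manifest, spec)
print(missing)
```

If a service uses a different mapping (tags, or a custom extension field), only the set comprehension needs to change.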

5. Discovery as Competitive Advantage in the Agent Economy

Discovery infrastructure is not a compliance exercise. It is a distribution strategy.

As autonomous agents increasingly mediate access to services — conducting research, making purchases, managing workflows — the services that agents can discover, understand, and call reliably will capture disproportionate agent-driven traffic. Services that are opaque to agents will be bypassed, regardless of their quality.

The network effect of agent-readability:

Agent frameworks maintain tool registries — curated lists of services that agents can call. Inclusion in these registries requires machine-readable discovery infrastructure. Once included, a service benefits from every agent that uses that framework, without additional marketing spend. The marginal cost of serving an additional agent-driven request approaches zero; the marginal benefit compounds as agent adoption grows.

First-mover dynamics:

In most service categories, agent-readable competitors are still rare. A service that publishes complete, accurate discovery infrastructure today occupies a structurally advantaged position: agents can discover it while its competitors remain invisible to them. As agent-mediated traffic grows from a small fraction to a dominant channel, this advantage compounds.

Trust signals:

Well-maintained discovery infrastructure signals operational maturity to both agents and the humans who configure them. An agents.json with accurate rate limits, explicit pricing model declarations, and versioned capability manifests communicates that a service is built for programmatic consumption — a meaningful trust signal in enterprise agent procurement decisions.

6. Practical Checklist: Making Your Service Agent-Ready

Use this checklist to audit your current state and prioritize implementation work.

llms.txt

[ ] File exists at yourdomain.com/llms.txt
[ ] Service purpose described in 2–4 sentences, precision over marketing language
[ ] Primary use cases enumerated
[ ] Explicit limitations and out-of-scope uses stated
[ ] Links to agents.json and OpenAPI spec included
[ ] Authentication requirements summarized in plain language
[ ] Last-updated date included

agents.json

[ ] File exists at yourdomain.com/agents.json
[ ] version field present and incremented on changes
[ ] capabilities array uses consistent, action-oriented naming
[ ] auth object formally declares scheme and header/parameter name
[ ] output_formats array is complete and accurate
[ ] rate_limits object matches actual enforcement
[ ] pricing_model field present (even if "free")
[ ] Links to openapi_spec and llms_txt included

OpenAPI Specification

[ ] Spec file exists at /openapi.json or /openapi.yaml
[ ] All production endpoints documented
[ ] All parameters have type, description, and example
[ ] All response schemas documented for 200, 4xx, and 5xx codes
[ ] securitySchemes formally declared
[ ] info.version matches agents.json version
[ ] Spec validates against OpenAPI 3.x schema without errors

Semantic HTML

[ ] Key pages use semantic elements (<main>, <article>, <section>, <nav>)
[ ] Heading hierarchy is logical and non-decorative
[ ] Schema.org JSON-LD present on content pages
[ ] @type declarations accurate for content type (Article, Product, Service, etc.)
[ ] datePublished and dateModified present on time-sensitive content
[ ] Meta descriptions accurate and non-promotional
[ ] Link text is descriptive (not "click here" or "read more")

7. Connection to Empirica's Infrastructure Strategy

Empirica's research outputs — structured notes, API-accessible intelligence, and agent-readable formats — are designed with this discovery infrastructure in mind. The value of structured knowledge compounds when agents can discover it, assess its relevance, and consume it without friction.

The four standards covered in this lesson represent the interface layer between Empirica's knowledge infrastructure and the autonomous systems that will increasingly consume it. Services that invest in this layer now are building distribution infrastructure for a channel — agent-mediated access — that is growing faster than any human-facing channel.

The practical implication: discovery infrastructure is not a feature to add after product-market fit. It is a foundational layer that determines whether autonomous systems can find and use your service at all. In the agent economy, undiscoverable equals nonexistent.

This lesson is part of Empirica's course series on building for the agent economy. Related lessons cover API design for agent consumption, pricing models for agent-driven usage, and structuring knowledge outputs for machine readability.
