Wiki · Concept · Last reviewed June 25, 2026

llms.txt

llms.txt is a proposed Markdown convention for publishing a curated, LLM-readable map of a website's important context at /llms.txt.

Category: Concept / Web Governance
Published: June 25, 2026
Modified: June 25, 2026
Last reviewed: June 25, 2026
Tags: llms.txt, LLMs, agents, documentation, retrieval, web governance

Definition

llms.txt is a proposal, published by Jeremy Howard on September 3, 2024, for websites to expose a Markdown file that helps large language models use the site at inference time. The proposal frames the problem as context selection: ordinary websites can be too large, navigationally noisy, or HTML-heavy for an LLM or agent to use cleanly inside a prompt or retrieval workflow.

The core idea is simple. A site author publishes /llms.txt with a short description, guidance, and links to important Markdown-readable resources. The file is meant to tell an LLM or agent which pages matter, how to interpret them, and which links can be skipped when context must stay short.

llms.txt should not be confused with the Robots Exclusion Protocol (robots.txt), AI Preferences (AIPREF), or AI data licensing. robots.txt expresses crawler access rules. AIPREF expresses machine-readable content-use preferences. Licensing is a legal or contractual permission layer. llms.txt is none of these: it is a curated context map.

Format

The proposal specifies Markdown rather than XML or JSON because the file is meant to be readable by both humans and language models while remaining structured enough for ordinary parsers. The file normally lives at the root path /llms.txt, though the proposal also allows subpath use.

The only required element is an H1 naming the project or site. After it, in order, the proposed structure allows a short blockquote summary, explanatory text in any Markdown form except headings, and H2 sections containing file lists. Each file-list item is a Markdown link, optionally followed by a colon and notes. An H2 section titled Optional has a special meaning: its links may be skipped when a smaller context is needed.
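The structure described above can be read with a few lines of ordinary parsing. The following is a minimal sketch, not the reference parser from the proposal's repository: the sample file, the names LlmsTxt, parse_llms_txt, and select_links, and the example.com URLs are all hypothetical, and the regular expression covers only the simple "link, optional colon, notes" form.

```python
import re
from dataclasses import dataclass, field

# Hypothetical sample file; the project name and URLs are illustrative only.
SAMPLE = """\
# Example Project

> A short summary of what the project is.

Some free-form guidance for readers.

## Docs

- [Quick start](https://example.com/quickstart.md): Install and first steps
- [API reference](https://example.com/api.md)

## Optional

- [Changelog](https://example.com/changelog.md): Release history
"""

# A list item holding a Markdown link, optionally followed by ": notes".
LINK_RE = re.compile(
    r"^[-*]\s+\[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<notes>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str
    summary: str = ""
    details: list[str] = field(default_factory=list)
    sections: dict[str, list[dict]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse the H1, blockquote summary, free text, and H2 file lists."""
    doc = None
    current = None  # name of the H2 section being read, if any
    summary_lines = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if doc is None:
            # The H1 is the only required element and must come first.
            if not line.startswith("# "):
                raise ValueError("llms.txt must start with an H1 title")
            doc = LlmsTxt(title=line[2:].strip())
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is None and line.startswith(">"):
            summary_lines.append(line.lstrip("> ").strip())
        elif current is None:
            doc.details.append(line)
        else:
            match = LINK_RE.match(line)
            if match:  # lines that are not link items are ignored here
                doc.sections[current].append({
                    "name": match.group("name"),
                    "url": match.group("url"),
                    "notes": match.group("notes") or "",
                })
    if doc is None:
        raise ValueError("llms.txt is empty")
    doc.summary = " ".join(summary_lines)
    return doc


def select_links(doc: LlmsTxt, include_optional: bool = True) -> list[str]:
    """Return link URLs, dropping the Optional section when context is tight."""
    return [
        item["url"]
        for name, items in doc.sections.items()
        if include_optional or name != "Optional"
        for item in items
    ]
```

Calling select_links(parse_llms_txt(SAMPLE), include_optional=False) returns only the two Docs links, which is the behavior the Optional section is meant to enable when context is short.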

The proposal also suggests that pages useful to LLMs offer a clean Markdown version at the same URL with .md appended, or index.html.md appended for URLs that have no file name. The Answer.AI post and repository emphasize that the proposal does not prescribe a single processing method: one tool might fetch only linked pages, another might expand the file into a context bundle, and another might use it only as a routing hint.
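As an illustration of the suffix convention, the sketch below derives a candidate Markdown URL from a page URL. It assumes the rule as stated in the proposal, namely appending .md, or index.html.md when the URL has no file name. The function name markdown_url is hypothetical, and a real site may not serve anything at the derived address.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(page_url: str) -> str:
    """Derive the candidate Markdown URL for a page (assumed .md convention)."""
    parts = urlsplit(page_url)
    path = parts.path or "/"
    if path.endswith("/"):
        # No file name in the URL: use index.html.md.
        path += "index.html.md"
    elif not path.endswith(".md"):
        path += ".md"
    # The fragment is dropped because it does not affect which file is fetched.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))
```

A tool would still need to request the derived URL and fall back to the HTML page when the Markdown version is missing.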

Governance and Safety

The governance value of llms.txt is editorial. It lets the site owner identify authoritative docs, stable entry points, policy pages, API references, correction routes, and interpretive notes for machine readers. That can reduce accidental grounding on obsolete, decorative, or low-value pages.

Its governance limit is enforcement. The file is public, advisory, and unauthenticated unless other systems are layered around it. It cannot make a private page private, bind a crawler to a license, prove consent for training, or ensure that an answer engine will read the linked pages. RFC 9309 makes a similar point about robots.txt: crawler rules are not access authorization. llms.txt is even less of an access-control mechanism because it is about guidance rather than denial.

The safety risk is poisoning by authorship, staleness, or conflict. A malicious site can publish misleading guidance. A neglected site can point agents toward broken or superseded documents. A platform can expose llms.txt while its sitemap, robots.txt, schema markup, license text, and human-facing pages say different things. Agents should therefore treat the file as one signal, not as truth.
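One of these conflicts can be checked mechanically: whether llms.txt recommends a URL that the same site's robots.txt disallows. The sketch below uses Python's standard urllib.robotparser. The function name links_blocked_by_robots and the user agent string ExampleAgent are hypothetical, and a mismatch is only a signal worth recording, not proof that either file is wrong.

```python
from urllib.robotparser import RobotFileParser


def links_blocked_by_robots(
    links: list[str], robots_txt: str, user_agent: str = "ExampleAgent"
) -> list[str]:
    """Return the llms.txt links that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return [url for url in links if not parser.can_fetch(user_agent, url)]
```

For example, given a robots.txt that disallows /private/ for all agents, a link to https://example.com/private/notes.md would be returned as a conflict while a link under /docs/ would not.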

Evidence Record

Any system that uses llms.txt for retrieval, coding help, policy interpretation, or agent planning should preserve the URL fetched, fetch time, response status, file hash, parser version, selected section, selected links, skipped optional links, downstream documents fetched, conflict checks against sitemap and robots.txt, and the answer or action that used the context. Without that record, the file becomes invisible infrastructure inside a prompt.
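The fields listed above map naturally onto a small record type. The following is a sketch under stated assumptions: the class LlmsTxtEvidence, the helper build_evidence, and every field name are hypothetical, SHA-256 is one reasonable choice of file hash rather than a requirement, and the record is serialized as JSON only for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class LlmsTxtEvidence:
    url: str                      # URL fetched
    fetched_at: str               # fetch time, ISO 8601 in UTC
    status: int                   # HTTP response status
    sha256: str                   # hash of the raw file bytes
    parser_version: str
    selected_section: str
    selected_links: list[str]
    skipped_optional_links: list[str] = field(default_factory=list)
    downstream_documents: list[str] = field(default_factory=list)
    conflict_checks: dict[str, str] = field(default_factory=dict)
    used_in: str = ""             # the answer or action that used the context


def build_evidence(
    url: str,
    status: int,
    body: bytes,
    *,
    parser_version: str,
    selected_section: str,
    selected_links: list[str],
    **extra,
) -> LlmsTxtEvidence:
    """Create a record, stamping the fetch time and hashing the raw body."""
    return LlmsTxtEvidence(
        url=url,
        fetched_at=datetime.now(timezone.utc).isoformat(),
        status=status,
        sha256=hashlib.sha256(body).hexdigest(),
        parser_version=parser_version,
        selected_section=selected_section,
        selected_links=selected_links,
        **extra,
    )


def to_json(record: LlmsTxtEvidence) -> str:
    """Serialize the record so it can be logged next to the answer it informed."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)
```

Hashing the raw bytes rather than the parsed structure means the record can later be compared against an archived copy of the file without depending on a particular parser.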

Source Discipline

Use the official proposal, the AnswerDotAI repository, and the Answer.AI post for format and purpose claims. Use RFC 9309 for robots.txt comparisons and sitemaps.org for sitemap comparisons. Do not claim that a model provider, search engine, or agent framework honors llms.txt unless that provider's own documentation says so.

Spiralist Reading

Spiralism reads llms.txt as a table of contents for machine attention. It is not a spell that forces machines to understand a site. It is an offered map: start here, trust this more, skip that if you are short on context.

The political question is who gets to write the map. A good llms.txt can clarify public memory. A bad one can launder authority into a convenient list. Machine readers need maps, but maps need provenance.

Open Questions

Should agents prefer llms.txt, sitemap, search results, structured data, or page-level links when those signals conflict?
How should a reader detect stale, malicious, or vendor-serving llms.txt guidance?
Should the file be signed, hashed, archived, or versioned for high-stakes domains?
What should public institutions include so machine readers preserve accountability rather than only convenience?

Sources

llms-txt, The /llms.txt file, proposal by Jeremy Howard, published September 3, 2024.
GitHub, AnswerDotAI/llms-txt, official repository for the proposal, reviewed June 25, 2026.
Answer.AI, /llms.txt: a proposal to provide information to help LLMs use websites, September 3, 2024.
RFC Editor, RFC 9309: Robots Exclusion Protocol, September 2022.
sitemaps.org, Sitemaps XML format protocol, reviewed June 25, 2026.
