What is llms.txt and how does it help AI crawlers understand your site?
The llms.txt proposal was introduced by Jeremy Howard of Answer.AI in September 2024. The motivation was straightforward: large language models increasingly access websites directly - either during training data collection or via real-time retrieval in RAG (Retrieval-Augmented Generation) systems - but they have no equivalent of the sitemap or structured metadata that search engine crawlers rely on. llms.txt aims to fill that gap.
The file format is intentionally simple: Markdown-formatted text that a language model can parse natively. A well-structured llms.txt typically includes a brief one-to-two paragraph description of the site and its purpose; a structured list of the most important pages, each with its title, URL, and a one-line description of what the page contains; a note about what the site does not cover (to help the model avoid false associations); and, optionally, preferred citation formats or contact information for licensing inquiries.
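As an illustration of the elements listed above, here is a minimal Python sketch that renders them into Markdown. It loosely follows the title, blockquote summary, and link-list layout of the published proposal; the Page class, the render_llms_txt function, the section names ("Key pages", "Not covered", "Contact"), and the example.com data are all assumptions made for this example, not part of any official tooling.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    summary: str  # one-line description of what the page contains


def render_llms_txt(site_name, description, pages, not_covered=None, contact=None):
    """Render a site description and page list as Markdown text.

    Section names here are illustrative assumptions, not mandated names.
    """
    lines = [f"# {site_name}", "", f"> {description}", "", "## Key pages", ""]
    for page in pages:
        lines.append(f"- [{page.title}]({page.url}): {page.summary}")
    if not_covered:
        lines += ["", "## Not covered", ""]
        lines += [f"- {topic}" for topic in not_covered]
    if contact:
        lines += ["", "## Contact", "", contact]
    return "\n".join(lines) + "\n"


# Hypothetical site data, for illustration only.
example = render_llms_txt(
    site_name="Example Docs",
    description="Documentation for a hypothetical invoicing API.",
    pages=[
        Page("Quickstart", "https://example.com/docs/quickstart",
             "Send a first invoice in five minutes."),
        Page("API reference", "https://example.com/docs/api",
             "Every endpoint, parameter, and error code."),
    ],
    not_covered=["Tax or legal advice"],
    contact="Licensing inquiries: licensing@example.com",
)
print(example)
```

Keeping the page data in a structured list like this makes it easy to regenerate the file whenever the set of important pages changes.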
The distinction from robots.txt is important. robots.txt is an access-control convention: it tells crawlers which paths they may fetch, and well-behaved crawlers honor it, although compliance is voluntary and the file is not a security mechanism. llms.txt is purely informational: it restricts nothing, no AI system is obligated to read or follow it, and there is no enforcement mechanism. Its value lies entirely in the quality of the information it provides. A well-written llms.txt makes it easier for an AI to represent your site accurately, which in turn can improve how AI-generated responses describe or cite your content.
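To make the contrast concrete, the sketch below uses Python's standard urllib.robotparser module to evaluate a robots.txt rule the way a cooperating crawler would. The rules, the "ExampleBot" user agent, and the example.com URLs are hypothetical. An llms.txt file has no analogous allow-or-deny decision to compute; it is simply text for a model to read.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for example.com.
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# A cooperating crawler asks before fetching; nothing forces it to ask.
allowed_public = parser.can_fetch("ExampleBot", "https://example.com/docs/")
allowed_private = parser.can_fetch("ExampleBot", "https://example.com/private/report")
print(allowed_public, allowed_private)
```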
As of mid-2025, support for llms.txt is growing. Cloudflare added an llms.txt generation feature to its AI Audit product. Several documentation platforms (Mintlify, GitBook) auto-generate llms.txt files. Perplexity has confirmed it reads llms.txt files as part of its crawling process. Whether llms.txt becomes as universal as robots.txt remains to be seen, but adoption by major infrastructure providers suggests it is worth implementing now.
For most sites, implementing llms.txt takes under an hour: identify your ten to twenty most important pages, write a sentence describing each, add a two-paragraph site description, and publish the file at /llms.txt in the site root. The downside risk is minimal: crawlers that do not support the file simply ignore it.
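Before publishing, a draft can be checked against the steps above. The following is a minimal sketch, assuming page entries are written as Markdown list items of the form "- [Title](URL): description". The check_draft function, its ten-to-twenty default range, and the sample draft are illustrative assumptions, not part of any specification.

```python
import re

# Matches Markdown list items of the form "- [Title](URL): description".
LINK_ITEM = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<summary>.+))?$"
)


def check_draft(text, min_pages=10, max_pages=20):
    """Return a list of human-readable problems found in a draft llms.txt."""
    lines = text.splitlines()
    problems = []
    if not lines or not lines[0].startswith("# "):
        problems.append("first line should be a '# ' title")
    items = [m for m in (LINK_ITEM.match(line.strip()) for line in lines) if m]
    if not min_pages <= len(items) <= max_pages:
        problems.append(
            f"expected {min_pages}-{max_pages} page links, found {len(items)}"
        )
    for m in items:
        if not m.group("summary"):
            problems.append(f"missing description for {m.group('url')}")
    return problems


# Hypothetical draft, deliberately incomplete.
draft = """# Example Docs

> Documentation for a hypothetical invoicing API.

## Key pages

- [Quickstart](https://example.com/docs/quickstart): Send a first invoice.
- [API reference](https://example.com/docs/api)
"""

for problem in check_draft(draft):
    print(problem)
```

An empty result means the draft has a title, a page count within the chosen range, and a description for every listed page; it says nothing about whether the descriptions themselves are accurate.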