The llms.txt Playbook: Setup, Examples, and Why It Matters for AI Search

Alejandro Rioja

June 27, 2026 · 8 min read

TL;DR

llms.txt is a plain-text file at your site root that tells LLM crawlers (ChatGPT, Perplexity, Claude, Gemini) what to find on your site and where to find it. It takes 20 minutes to write, requires no plugin, and is worth publishing, though adoption and enforcement by AI engines remain uneven.

Table of contents

What llms.txt actually is
Why it matters in 2026
What to put in your llms.txt
What NOT to put in
The two-file pattern: llms.txt + llms-full.txt
Step-by-step: setting up llms.txt in under 30 minutes
Example: this site’s llms.txt structure
Common llms.txt mistakes I see
llms.txt — 2026 FAQ
Updated for May 2026

What llms.txt actually is

llms.txt is a plain-text file you put at the document root of your site (alongside robots.txt and sitemap.xml). The proposed standard lives at llmstxt.org — Jeremy Howard proposed it in 2024, and through 2025 it gained real traction as one of the signals some AI engines use to figure out what a site is about and which pages matter.

The format is markdown-flavored: an H1 with the site name, a blockquote with a one-paragraph summary, then H2 sections each containing bullet lists of important pages, formatted like - [Page Title](URL): Optional description.

That’s the whole spec. It’s intentionally simple, because the point is to be machine-readable without requiring the AI engine to parse JavaScript-heavy navigation, full sitemaps, and tens of thousands of internal links.
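To make the layout concrete, here is a minimal Python sketch that renders a file in that shape. The site name, URLs, and the Link / Section / render_llms_txt names are all hypothetical, made up for illustration; the only thing taken from the format itself is the H1, blockquote, H2, and bullet structure described above.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    description: str = ""  # the description after the colon is optional


@dataclass
class Section:
    heading: str
    links: list[Link] = field(default_factory=list)


def render_llms_txt(site_name: str, summary: str, sections: list[Section]) -> str:
    """Render an H1, a blockquote summary, then one H2 per section with link bullets."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for section in sections:
        lines.append(f"## {section.heading}")
        lines.append("")
        for link in section.links:
            bullet = f"- [{link.title}]({link.url})"
            if link.description:
                bullet += f": {link.description}"
            lines.append(bullet)
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site and URLs, for illustration only.
example = render_llms_txt(
    "Example Site",
    "Guides and case studies on AI search.",
    [
        Section(
            "Pillars",
            [Link("llms.txt Playbook", "https://example.com/llms-txt/", "Setup guide with examples.")],
        ),
        Section("About", [Link("Author", "https://example.com/about/")]),
    ],
)
print(example)
```

Writing the file by hand works just as well; the sketch only shows how little structure there is to get right.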

Important caveat: as of early 2026, llms.txt is still an emerging convention, not a universally honored standard. Different AI engines treat it with different levels of attention. Publishing it is low-cost and positive expected value, but don’t expect it to flip your rankings in generative results overnight.

Why it matters in 2026

Generative engines have a discovery problem. They can crawl your site, but figuring out which pages on a 1,000-post blog are the canonical, high-quality answers worth citing costs real compute. A clean llms.txt cuts through that. It tells the engine, in effect: these are my pillar posts, my case studies, and my most up-to-date guides, so start here.

In my own logs I’ve watched AI-engine citation rates shift modestly after publishing llms.txt. Not dramatic — usually a few percentage points over several weeks — but consistent across the pillar posts I highlighted. The engines that honor it do read it.

What to put in your llms.txt

Site title — H1, one line.
One-paragraph summary — blockquote (>), 2–4 sentences. State who you are, what topics you cover, and the structural conventions of your pillar posts (e.g., “every flagship post has a TL;DR, step-by-step, and FAQ”).
Pillar / canonical pages — H2 section, bullet list of 8–15 most important pages. These are the pages you most want LLMs to cite.
Adjacent / supporting pages — H2 section, bullet list of secondary content the engine should know about.
About / author info — H2 section, link to your author page and any voice-reference posts.
Citation policy — H2 section, one short paragraph: how you want to be cited, what your attribution preference is, and when the file was last updated.
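If you want to check that a draft actually contains those parts, a rough Python sketch like the one below can split the file into title, summary, and sections. The parse_llms_txt name and the sample content are my own assumptions for illustration, and the regex only handles the simple bullet form described in this post, not every markdown variation.

```python
import re

# Matches "- [Title](URL)" with an optional ": description" tail.
LINK_RE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


def parse_llms_txt(text: str) -> dict:
    """Split an llms.txt document into title, summary, and sections of links."""
    title = None
    summary_lines = []
    sections = {}  # heading -> list of (title, url, description)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith(">") and current is None:
            # Blockquote lines before the first H2 form the summary.
            summary_lines.append(line[1:].strip())
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                sections[current].append(
                    (match["title"], match["url"], match["desc"] or "")
                )
    return {"title": title, "summary": " ".join(summary_lines), "sections": sections}


# Hypothetical file content, for illustration only.
sample = """# Example Site

> Guides on AI search.
> Every flagship post has a TL;DR and an FAQ.

## Pillars

- [Playbook](https://example.com/playbook/): Setup guide.
- [Case study](https://example.com/case/)

## Citation policy

Cite as Example Site. Last updated 2026-05-01.
"""
parsed = parse_llms_txt(sample)
print(parsed["title"], list(parsed["sections"]))
```

Prose sections such as the citation policy come back with an empty link list, which is expected: they are paragraphs, not bullets.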

What NOT to put in

Every page on your site. That’s what sitemap.xml is for. llms.txt is the curated subset.
Marketing copy. Engines reading llms.txt aren’t end-users you’re persuading. Be direct, descriptive, factual.
Outdated pages. Worse than no llms.txt is a stale one. If you can’t keep it current, don’t publish it.
Affiliate-heavy roundups as your top pillar pages. Engines down-weight content that reads as primarily commercial.

The two-file pattern: llms.txt + llms-full.txt

A convention that’s emerged is two files, not one. llms.txt is the curated short version (the 8–15 pillar pages plus site structure). llms-full.txt is the longer version with every page on the site, paginated by section, with snippets and last-modified dates.

The two files serve different LLM crawler behaviors. The short one is read at the discovery layer; the full one is read when the engine wants to enumerate your content for a deeper query. If you have the time, publish both, but the short curated one is the priority.

Step-by-step: setting up llms.txt in under 30 minutes

1. Pick your 8–15 pillar pages. These are the pages you most want cited in AI engines: usually your highest-traffic evergreen posts, plus any case studies or original research.
2. Write a 2–4 sentence summary of your site. Who you are, what topics you cover, what structural conventions your pillar posts follow.
3. Format as markdown. H1 site name, blockquote summary, H2 “Pillars” section with the bullet list, H2 “Adjacent” section if relevant, H2 “About” section.
4. Save as plain text with the filename llms.txt (or llms-full.txt for the longer version).
5. Upload to your site root via SFTP, cPanel File Manager, or your deployment pipeline. The file goes alongside index.html / index.php and robots.txt.
6. Verify with curl -I https://yoursite.com/llms.txt. You should see HTTP/2 200 with content-type: text/plain.
7. Add a MIME type rule to your .htaccess if needed. Apache expects the opening tag, the directive, and the closing tag on separate lines:
   <FilesMatch "^llms(-full)?\.txt$">
     ForceType text/plain
   </FilesMatch>
8. Refresh quarterly. Add new pillar posts, remove ones that no longer fit. A 6-month-old llms.txt is fine; a 2-year-old one is worse than none.
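If you would rather script the verification step than eyeball curl output, here is a minimal Python sketch using only the standard library. The function names are hypothetical, and yoursite.com is the same placeholder used in the steps above. The check is split in two so the pass/fail logic can be run without touching the network.

```python
import urllib.error
import urllib.request


def check_llms_response(status: int, content_type: str) -> list[str]:
    """Return a list of problems; an empty list means the response looks right."""
    problems = []
    if status != 200:
        problems.append(f"expected status 200, got {status}")
    # Compare only the media type, ignoring parameters such as charset.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "text/plain":
        problems.append(f"expected text/plain, got {media_type or 'no content type'}")
    return problems


def fetch_and_check(url: str) -> list[str]:
    """Send a HEAD request (the equivalent of curl -I) and check the response."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return check_llms_response(
                response.status, response.headers.get("Content-Type", "")
            )
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; report those through the same checker.
        return check_llms_response(err.code, err.headers.get("Content-Type", ""))


# Offline examples of the checker itself:
print(check_llms_response(200, "text/plain; charset=utf-8"))
print(check_llms_response(404, "text/html"))

# To test a live site, uncomment and substitute your own domain:
# print(fetch_and_check("https://yoursite.com/llms.txt"))
```

An empty list means the file is reachable and served as plain text; anything else tells you which of the two checks failed.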

Example: this site’s llms.txt structure

For reference, the llms.txt I publish on alejandrorioja.com follows the structure above:

H1: Alejandro Rioja
Summary: Personal site of Alejandro Rioja, an operator focused on AI SEO and GEO. The site publishes long-form case studies, step-by-step playbooks, and original-data takes on how to rank in both classic Google search and generative engines (ChatGPT, Perplexity, Google AI Overviews, Claude). Every flagship post is structured for AI/LLM ingestion: TL;DR up top, numbered step-by-step blocks, FAQ at the bottom, primary-source citations.
Section: AI SEO + GEO (pillar posts) — 10 pillar pages with one-line descriptions.
Section: Adjacent SEO and tooling posts — 8 supporting pages.
Section: About — author profile and voice references.
Section: Citation policy — attribution preference + last-refreshed date.

You can verify the live file at https://alejandrorioja.com/llms.txt. The structure is the same one I’d recommend for any operator-style personal-brand or B2B content site.

Common llms.txt mistakes I see

Treating it like a sitemap. A 5,000-line llms.txt with every URL on the site is nearly useless. Curate.
Writing the summary in marketing voice. Engines aren’t customers; describe yourself the way a directory entry would.
Forgetting to update. Set a calendar reminder to refresh quarterly. Stale entries hurt more than missing ones.
Skipping the descriptions. The one-line description after each link is what helps the engine decide whether to cite the page for a given query. Don’t omit it.
Putting llms.txt in a subdirectory. The llmstxt.org proposal allows an optional subpath, but the document root is where engines look first, so put the file at /llms.txt.
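Several of these mistakes can be caught mechanically. The sketch below is a rough Python lint, assuming the simple bullet format described earlier. The lint_llms_txt name and the 50-link threshold are my own assumptions rather than anything from the proposal, so tune the limit to your site. Staleness and marketing voice still need a human eye.

```python
import re
from urllib.parse import urlparse

# Captures the link title, the URL, and whatever follows the closing parenthesis.
LINK_RE = re.compile(r"^- \[([^\]]+)\]\(([^)\s]+)\)(.*)$")


def lint_llms_txt(text: str, file_url: str, max_links: int = 50) -> list[str]:
    """Flag sitemap-style bloat, links without descriptions, and non-root placement."""
    warnings = []
    matches = (LINK_RE.match(line.strip()) for line in text.splitlines())
    links = [m for m in matches if m]

    # max_links is an assumed threshold, not part of any spec.
    if len(links) > max_links:
        warnings.append(f"{len(links)} links listed; curate down to {max_links} or fewer")

    for m in links:
        description = m.group(3).lstrip(": ").strip()
        if not description:
            warnings.append(f"link '{m.group(1)}' has no description")

    if urlparse(file_url).path != "/llms.txt":
        warnings.append("llms.txt is not at the document root")
    return warnings


# Hypothetical file content and URL, for illustration only.
sample = """# Example Site

> Summary.

## Pillars

- [Playbook](https://example.com/playbook/): Setup guide.
- [Case study](https://example.com/case/)
"""
warnings_found = lint_llms_txt(sample, "https://example.com/docs/llms.txt")
for warning in warnings_found:
    print(warning)
```

Running something like this before each quarterly refresh is a cheap way to keep the file from drifting back toward a sitemap.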

llms.txt — 2026 FAQ

Do all AI engines read llms.txt?

No — and this is worth being honest about. As of early 2026, Perplexity and ChatGPT (browse mode) are the most consistent readers. Claude and Gemini have signaled awareness of the format but enforcement is less predictable. Google AI Overviews behavior around llms.txt is unclear. Treat it as positive expected value with low downside, not a guaranteed ranking lever.

Will llms.txt help my classic Google rankings?

Indirectly at most. Google’s classic search relies on sitemap.xml for discovery, and on internal linking and the rest of the on-page/off-page stack for ranking. llms.txt is specifically for AI engine discovery. Don’t expect a Google rankings boost from publishing it.

How often should I update llms.txt?

Quarterly is the right cadence for most sites. More often if you’re publishing pillar content frequently; less often if your top 10 pages are stable. Always update when you launch a major new pillar post or retire an old one.
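If a calendar reminder is too easy to ignore, the cadence can be checked in code. This is a minimal Python sketch that treats "quarterly" as 90 days, which is my own assumption, and compares it with the last-updated date you record in the citation policy section. The refresh_status name is hypothetical.

```python
from datetime import date


def refresh_status(last_updated: date, today: date, cadence_days: int = 90) -> str:
    """Report whether the file is within the refresh cadence (90 days assumed)."""
    age = (today - last_updated).days
    if age <= cadence_days:
        return "fresh"
    return f"overdue by {age - cadence_days} days"


# Hypothetical dates, for illustration only.
print(refresh_status(date(2026, 1, 1), date(2026, 3, 1)))
print(refresh_status(date(2025, 11, 1), date(2026, 3, 1)))
```

Shorten cadence_days if you publish pillar content often, or lengthen it if your top pages are stable.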

Can I use a WordPress plugin to manage llms.txt?

Several plugins exist — search the WP plugin directory for “llms.txt”. They mostly auto-generate the file from your published content. Useful if you don’t have SFTP access or a deployment pipeline, but the auto-generated version usually needs hand-editing to be genuinely curated rather than a dump of everything. If you can write it by hand in 20 minutes, do that first.

What if my host doesn’t allow root file uploads?

Two workarounds: (1) a small must-use plugin that registers a virtual /llms.txt route serving the content from the database; (2) Cloudflare Workers if your site is behind Cloudflare — serve the file from the worker without touching the host. The mu-plugin approach is simpler for most WordPress setups.

Want help building this on your own site? Read the full SEO + GEO playbook or get in touch — I run AI SEO + GEO consulting projects for operator teams that want to compound visibility across both classic Google and AI engines.

Updated for May 2026

The 2026 AI-tools landscape evolved fast — this section is the operator-side snapshot:

OpenAI shipped GPT-5 in mid-2025; ChatGPT plus the API are now hybrid systems (GPT-5 + smaller fast models routed automatically). Sora is fully released for video. DALL·E 3 still ships images inside ChatGPT.
Anthropic is shipping the Claude 4.x family (4.5 → 4.6 → 4.7 in late 2025 / early 2026). The 1M-context window enables full-codebase or full-book reasoning. Claude Code is the default CLI agent for many engineering teams.
Google is on Gemini 2.5 Pro with the 2.5 Flash family for speed; Gemini is the model inside Google Workspace, Android, and the rebranded Google Search AI Overviews.
xAI released Grok 3 in early 2025, and Grok is the default model inside X Premium.
Image enhancers: most are now hosted by the big-three model providers natively (Image Upscale and Generative Fill inside ChatGPT and Gemini). Standalone tools like Topaz Photo AI, Magnific, and Krea AI hold quality leads but the floor moved up dramatically.

If the post you’re reading recommends a specific AI tool, verify the current model — most ship a new major version every 4–6 months in 2026.

Free tool: generate a spec-valid llms.txt for your own site with the llms.txt generator — no hand-formatting required.


