XML Sitemaps & Robots.txt: Guide Search Engines
XML sitemaps and robots.txt are two critical files that tell search engines which pages to crawl and which to skip. Most websites set them up poorly or not at all.
These files are free, easy to set up, and have a significant impact on which pages Google finds and indexes.
XML Sitemap: The What and Why
An XML sitemap is a map of your website. It lists every important page you want Google to index.
Google can find pages through links, but a sitemap makes discovery faster and more complete, especially for large sites, new sites with few inbound links, and pages buried deep in your site architecture.
Creating Your Sitemap
Most modern website platforms (WordPress, Shopify, Webflow) auto-generate sitemaps. Check if yours is at yoursite.com/sitemap.xml.
If not, use a free tool like XML-Sitemaps.com to generate one. Or manually create the XML:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page1</loc>
    <lastmod>2026-03-27</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://example.com/page2</loc>
    <lastmod>2026-03-27</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.6</priority>
  </url>
</urlset>
Sitemap.xml Best Practices
Include only important pages. Pagination, duplicate content, and low-value pages can be excluded.
Keep updated. Remove deleted pages. Add new pages. Update lastmod dates when content changes.
Use priority wisely. Set your most important pages to 0.8-1.0 and less important ones to 0.3-0.5. Be aware that Google has said it ignores the priority field, so treat it as internal documentation rather than a ranking lever.
Set changefreq honestly. Your homepage might be weekly (changes often); old blog posts might be yearly (rarely updated). Google treats changefreq as a hint at best and largely ignores it as well; an accurate lastmod is what actually helps it schedule recrawls.
Keep under 50,000 URLs. A single sitemap is limited to 50,000 URLs and 50 MB uncompressed. If you have more, create a sitemap index file that lists multiple sitemaps.
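A sitemap index is just another XML file that points to each child sitemap. A minimal sketch (the file names here are placeholders; use whatever your platform generates):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2026-03-27</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2026-03-27</lastmod>
  </sitemap>
</sitemapindex>
```

Submit the index file itself to search engines; crawlers discover each child sitemap from it.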
Submitting Your Sitemap
Submit your sitemap to Google Search Console. This tells Google exactly where to find it.
Also submit to Bing Webmaster Tools if you care about Bing rankings.
Robots.txt: Tell Google What NOT to Crawl
Robots.txt is a file at yoursite.com/robots.txt that tells crawlers what they can and can't crawl.
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/
Disallow: /search-results/

Sitemap: https://example.com/sitemap.xml
This says: "Everyone can crawl the whole site except /admin/, /private/, and /search-results/. Also, here's my sitemap."
What to Disallow
Admin pages. /admin/, /dashboard/, /account/ (these don't need to be indexed).
Private content. /private/, /members-only/ (content not meant for public search).
Duplicate content. /search-results/, /filters/ (pages that are variants of other pages).
Crawl traps. Infinite pagination or session-based URLs that create unlimited crawl paths.
Sensitive flows. /password-reset/, /checkout/ (no value in search results). Note that robots.txt doesn't keep a URL out of the index, and the file itself is publicly readable, so use noindex or authentication for anything genuinely sensitive.
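For crawl traps, robots.txt supports simple patterns: * matches any sequence of characters and $ anchors the end of a URL, and both Google and Bing honor them. A sketch, assuming your site exposes session IDs as a URL parameter named sessionid (adjust the parameter name to match your URLs):

```
User-agent: *
# Block any URL carrying a session ID parameter, in either position
Disallow: /*?sessionid=
Disallow: /*&sessionid=
# Block printer-friendly duplicates (illustrative path only)
Disallow: /print/
```

Check your server logs or Search Console's crawl stats to find which parameterized URLs Googlebot is actually wasting requests on before writing rules.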
What NOT to Disallow
Don't disallow pages you want to rank, and don't disallow pages just because they seem unimportant; Google needs to crawl a page to see its content and links. Also, robots.txt is not an indexing control: a blocked URL can still be indexed from external links, and Google can't see a noindex tag on a page it isn't allowed to crawl.
Crawl Optimization
Your crawl budget (how many pages Google crawls daily) is limited. Massive sites have bigger budgets, but it's still finite.
Optimize by:
- Blocking unnecessary pages in robots.txt
- Using canonical tags to consolidate duplicate content
- Fixing crawl errors so Google doesn't waste budget on broken pages
- Creating an efficient internal linking structure so important pages get crawled more often
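For the canonical-tag point above: a canonical is one line in the duplicate page's head. A sketch, assuming a filtered category URL that duplicates a main category page (both URLs are illustrative):

```html
<!-- On https://example.com/shirts?color=blue, point search engines at the main page -->
<link rel="canonical" href="https://example.com/shirts" />
```

Google consolidates ranking signals from the variant onto the canonical URL and crawls the variants less often, so budget goes to the pages you care about.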
Common Robots.txt Mistakes
Blocking CSS/JavaScript. If you block /css/ or /js/, Google can't render your pages properly.
Blocking images. Block /images/ and Google can't fully render your pages, and your images can't appear in Google Images.
Blocking important pages. Don't disallow /blog/ or /products/ unless you really don't want them indexed.
Disallow all. Don't use "Disallow: /" (which blocks everything) unless you genuinely want no crawling at all, such as on a staging site.
Syntax errors. Robots.txt has a specific syntax, and path rules are case-sensitive. A stray character can silently block pages you wanted crawled, so review every Disallow line carefully.
Testing Your Setup
Google Search Console includes a robots.txt report showing the version of the file Google last fetched and any errors it found. Check that pages you want indexed aren't blocked.
Also check URL Inspection to verify pages are being crawled correctly.
The Bigger Picture
Robots.txt and sitemap.xml are foundational. They're easy to get right and have outsized impact on crawlability.
Spend 30 minutes setting these up correctly. You'll improve Google's ability to find and index your pages.
RankWizrd checks your robots.txt and sitemap.xml, identifies configuration issues, and recommends improvements.
Check your site's SEO score
Free audit in under 60 seconds. No credit card required.
Audit My Site Free →