What is a robots.txt file?
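robots.txt is a plain-text file at the root of your site (for example, https://example.com/robots.txt) that tells crawlers which URLs they may and may not crawl. It is a voluntary convention: well-behaved crawlers such as Googlebot and Bingbot respect it, but malicious scrapers can simply ignore it, so it does not work as access control.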
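Because compliance happens on the crawler's side, it helps to see what a polite crawler actually does: it derives the robots.txt URL from the site root, downloads the file, and checks each URL before requesting it. The sketch below uses Python's standard-library `urllib.robotparser`; the rules, the `/private/` path, the helper names `robots_url_for` and `make_parser`, and the agent name `MyCrawler` are all hypothetical examples.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url: str) -> str:
    """robots.txt always lives at the root of the host, whatever the page path."""
    return urljoin(page_url, "/robots.txt")


def make_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been fetched."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


# Hypothetical rules for example.com.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = make_parser(ROBOTS_TXT)
print(robots_url_for("https://example.com/blog/post?id=1"))  # https://example.com/robots.txt
print(rp.can_fetch("MyCrawler", "https://example.com/blog/post"))  # True
print(rp.can_fetch("MyCrawler", "https://example.com/private/x"))  # False
```

For a live site, a crawler would instead call `RobotFileParser(robots_url_for(url))` followed by `.read()` to download the file. Nothing on the server enforces the result; a scraper that skips this check can request `/private/x` anyway.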
Does blocking a URL in robots.txt remove it from Google?
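No. Disallowing a URL prevents Google from crawling it, but if other pages link to it, Google may still index the URL (typically showing it without a description). To keep a URL out of search results, use a noindex directive, either in an X-Robots-Tag HTTP response header or in a robots meta tag (<meta name="robots" content="noindex">). Do not also disallow that URL in robots.txt: Google has to crawl the page to see the noindex directive.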
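As a rough illustration of the header approach, here is a minimal WSGI application that adds `X-Robots-Tag: noindex` to selected pages while leaving them crawlable. The prefixes in `NOINDEX_PREFIXES` and the page body are hypothetical; a real site would set the header in its own framework or web-server configuration.

```python
# Hypothetical URL prefixes that should stay out of search results.
# They must NOT also be disallowed in robots.txt, or Googlebot never sees the header.
NOINDEX_PREFIXES = ("/thank-you", "/internal-reports/")


def app(environ, start_response):
    """Minimal WSGI app that adds X-Robots-Tag: noindex to selected pages."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path.startswith(NOINDEX_PREFIXES):
        # Same effect as <meta name="robots" content="noindex"> in the HTML,
        # but the header also works for non-HTML responses such as PDFs.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<!doctype html><title>Example</title><p>Hello</p>"]
```

To try it locally, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` serves the app, and the header shows up in any response under `/thank-you`.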
What is crawl budget and how does robots.txt help?
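Crawl budget is roughly the number of URLs Googlebot can and wants to crawl on your site in a given period; it mostly matters for large sites or sites that generate many URLs automatically. Disallowing low-value URLs (admin panels, login pages, internal search results, infinite-scroll or endlessly paginated URLs) keeps Googlebot from spending that budget on them, so more of it goes to your important content.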
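A quick way to sanity-check such rules is to run a sample of your URLs through a parser and see which ones remain crawlable. The paths below (`/admin/`, `/login`, `/search`, `/cart`) are hypothetical stand-ins for low-value sections. Note that `urllib.robotparser` only does plain prefix matching, so this sketch sticks to prefix rules; Google additionally supports `*` and `$` wildcards, which this parser does not evaluate.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical low-value sections of example.com; adjust to your own URL layout.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /login
Disallow: /search
Disallow: /cart
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

urls = [
    "https://example.com/products/blue-shoes",
    "https://example.com/search?q=shoes&page=7",  # internal search results
    "https://example.com/admin/settings",         # admin panel
    "https://example.com/login?next=/account",    # login page
    "https://example.com/blog/crawl-budget",
]

# Keep only the URLs Googlebot would still be allowed to crawl.
crawlable = [u for u in urls if rp.can_fetch("Googlebot", u)]
for u in crawlable:
    print(u)
```

Only the product page and the blog post survive, which is the intent: crawl activity is steered toward content you want in search. Because matching is by prefix, `Disallow: /login` would also block a URL such as `/login-help`.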
Can I block just one user-agent?
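Yes. Start a group with User-agent: GPTBot followed by Disallow: / to block OpenAI's crawler, or with User-agent: Bingbot to set rules for Bing's crawler specifically. User-agent: * applies to every crawler that has no group of its own. Rules apply only to the group they appear under, and a crawler that matches a named group follows that group instead of the * group.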
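The group behaviour is easy to get wrong, so here is a small sketch that checks a few agent and path combinations with `urllib.robotparser`. The `/beta/` and `/admin/` paths are hypothetical. The interesting case is Bingbot: because it has its own group, the `Disallow: /admin/` rule in the `*` group does not apply to it.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Bingbot
Disallow: /beta/

User-agent: *
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

checks = [
    ("GPTBot", "/blog/"),      # blocked: GPTBot's own group disallows everything
    ("Bingbot", "/beta/new"),  # blocked by Bingbot's group
    ("Bingbot", "/admin/"),    # allowed: Bingbot follows its own group, not *
    ("Googlebot", "/admin/"),  # blocked: no named group, so the * group applies
    ("Googlebot", "/blog/"),   # allowed
]
for agent, path in checks:
    allowed = rp.can_fetch(agent, "https://example.com" + path)
    print(f"{agent:10} {path:10} {'allowed' if allowed else 'blocked'}")
```

If you want Bingbot to stay out of `/admin/` as well, repeat `Disallow: /admin/` inside its own group.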