
BEGINNERS: EVERYTHING TO KNOW ABOUT THE robots.txt FILE

While considering SEO for your website, you must have come across the terms robots.txt file and XML sitemap.

If you are an SEO expert, you probably already know how to deal with these files. For a beginner, though, this small task can be tricky, because a mistake can slow down or block the indexing of your web pages.

This blog covers the robots.txt file. XML sitemaps will be covered in an upcoming blog.

Use of robots.txt?

A web crawler (a.k.a. spider, spiderbot or crawler) is an Internet bot, typically run by a search engine, that systematically browses your website for the purpose of web indexing. Before a crawler visits the pages of your website, its first step is to look for the robots.txt file at the root of the site (for example, https://www.example.com/robots.txt) and read the crawling instructions in it. Crawlers can crawl all sorts of data, such as content, links on a page, broken links, sitemaps, images and HTML code.

One of the following two cases then arises:

Case 1: Absence of robots.txt file: In this case the crawler gets a thumbs up to crawl every page, file and folder of your website and considers them for indexing.

Case 2: Presence of robots.txt file: In this case a well-behaved crawler crawls only as per the instructions specified in the file. The file is a set of instructions, not an access control: crawlers that choose to ignore it can still request your pages.
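The two cases can be sketched in Python with the standard library's urllib.robotparser module. This is only an illustration of how a polite crawler might decide, not how any particular search engine is implemented. The helper names robots_url and may_crawl, the ExampleBot agent and the example.com URLs are assumptions made for this example.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url):
    """Return the robots.txt location for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # robots.txt lives at the root of the host, whatever page is requested
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def may_crawl(robots_body, agent, page_url):
    """Decide whether agent may fetch page_url.

    robots_body is the text of robots.txt, or None when the file is absent.
    """
    if robots_body is None:
        # Case 1: no robots.txt, so nothing is off limits
        return True
    # Case 2: follow the instructions written in the file
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(agent, page_url)


page = "https://www.example.com/private/report.html"  # hypothetical page
print(robots_url(page))
print(may_crawl(None, "ExampleBot", page))
print(may_crawl("User-agent: *\nDisallow: /private/", "ExampleBot", page))
```

The first call shows where the crawler looks for the file, the second models Case 1, and the third models Case 2 with a rule that blocks the /private/ folder.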

General terms involved:

1: User-agent: Every search engine has its own crawler bot, and each bot identifies itself by a user-agent name (for example, Googlebot for Google). If you want a set of rules to apply only to a specific crawler, name that crawler on the User-agent line. It is always advisable to check your web server logs to see which crawlers visit your site and how often.

2: User-agent: *: The asterisk is a wildcard, so the rules that follow apply to every crawler that does not have a more specific group of its own. Whether pages are allowed or blocked is then decided by the Disallow and Allow lines beneath it.

Example 1:

User-agent: *
Disallow:

Here the Disallow value is empty, so you are not blocking any content on your website from the crawlers.

Example 2:

User-agent: *
Disallow: /

Here the single slash matches every path, so you are blocking all the content on your website from the crawlers.
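Examples 1 and 2 can be checked side by side with Python's urllib.robotparser. The file below is a hypothetical combination written for this sketch: it gives Googlebot the empty Disallow from Example 1 and every other crawler the "Disallow: /" from Example 2. ExampleBot and the example.com URL are placeholder names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: Example 1 for Googlebot, Example 2 for everyone else
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://www.example.com/blog/post.html"
# The empty Disallow in Googlebot's group blocks nothing
googlebot_ok = parser.can_fetch("Googlebot", url)
# Any other crawler falls back to the * group, where "/" blocks everything
otherbot_ok = parser.can_fetch("ExampleBot", url)
print(googlebot_ok, otherbot_ok)
```

Note that each User-agent line and its rules sit on consecutive lines. Python's parser treats a blank line as the end of a group, so a blank line between User-agent and Disallow would detach the rule from its crawler.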

Example 3:

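User-agent: *
Disallow: /folder/
Allow: /folder/file/

If you are blocking a folder but want a particular path inside it to remain available for crawling, use the above syntax. Crawlers such as Googlebot follow the most specific (longest) matching rule, so the Allow line wins for anything under /folder/file/ while the rest of /folder/ stays blocked.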

You can use this syntax to allow or disallow any folder, file, link or sitemap.
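How Disallow and Allow interact in Example 3 can be sketched with a few lines of Python. The function below is a simplified, hand-written version of the longest-match rule that Google documents for Googlebot: it compares plain path prefixes only and ignores the * and $ wildcards. The name is_allowed and the sample paths are assumptions for this illustration. Python's built-in urllib.robotparser applies rules in file order instead, which is why it is not used here.

```python
def is_allowed(path, rules):
    """Apply the longest-match rule to a URL path.

    rules is a list of (directive, prefix) pairs in any order,
    for example ("disallow", "/folder/").
    """
    best_len = -1
    allowed = True  # a path that matches no rule may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty or non-matching rules do not apply
        is_allow = directive.lower() == "allow"
        # The longer prefix is more specific; on a tie, Allow wins
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# The rules from Example 3
rules = [("disallow", "/folder/"), ("allow", "/folder/file/")]

print(is_allowed("/folder/file/page.html", rules))  # longer Allow prefix matches
print(is_allowed("/folder/other.html", rules))      # only the Disallow matches
print(is_allowed("/about.html", rules))             # no rule matches
```

Because the decision depends on prefix length rather than position, the order of the Allow and Disallow lines in the file does not change the result for crawlers that work this way.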

robots.txt for WordPress

Though WordPress started out as a blogging tool, today over 36% of websites are built on it. The pointers below will therefore help you create a robots.txt file for any WordPress website.

User-agent: *
Disallow: /wp-admin/
Allow: /wp-content/uploads/
Disallow: /wp-content/plugins/
Disallow: /readme.html
Disallow: /wp-content/themes/

Sitemap: https://www.example.com/sitemap.xml

Above is a basic structure of robots.txt for any WordPress website. You can of course allow or disallow any folder, file or image as required.
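Before uploading a file like this, you can test it locally. The sketch below feeds a WordPress-style robots.txt to Python's urllib.robotparser and checks a few sample paths. The paths, the ExampleBot agent and the example.com domain are made up for illustration. A single file such as /readme.html is written without a trailing slash, since a rule ending in / only matches URLs underneath that path.

```python
from urllib.robotparser import RobotFileParser

WORDPRESS_ROBOTS = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-content/uploads/
Disallow: /wp-content/plugins/
Disallow: /readme.html
Disallow: /wp-content/themes/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(WORDPRESS_ROBOTS.splitlines())

base = "https://www.example.com"
# Hypothetical paths, one per rule plus one ordinary post
sample_paths = (
    "/wp-admin/options.php",
    "/wp-content/uploads/logo.png",
    "/wp-content/plugins/seo/script.js",
    "/readme.html",
    "/blog/hello-world/",
)
results = {path: parser.can_fetch("ExampleBot", base + path) for path in sample_paths}

for path, allowed in results.items():
    print(path, "allowed" if allowed else "blocked")

# site_maps() requires Python 3.8 or newer
print(parser.site_maps())
```

The admin area, plugins, themes and readme file come back as blocked, while uploads and ordinary posts remain crawlable, and the Sitemap line is reported separately from the crawl rules.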
