All About the Robots.txt File

Last Edited June 5, 2019 by Garenne Bigby in Search Engine Optimization

Those who create websites use robots.txt files to tell web robots, such as search engine crawlers, how to crawl the pages on their websites. The Robots Exclusion Protocol (REP) is a set of rules that dictate how robots may or may not crawl the web and deal with the content they come across. The robots.txt file is part of the REP. It indicates whether certain web crawlers can or cannot crawl the various parts of a website by allowing (or disallowing) the behavior of specific user agents.

It’s worth learning how robots.txt works, because it can really help or really hurt your website. Read on to learn what you need to do to make the most of your website.


Is a Robots.txt File Important?     

If you use robots.txt incorrectly, it can really hurt your rankings, because the file controls how search engine spiders, or crawlers, see and interact with your web pages. Well-behaved bots read your robots.txt file (if you have one) before crawling. The file tells them which parts of your site they may crawl and, for some bots, how quickly. Keep in mind that robots.txt is a request, not a lock: reputable crawlers honor it, but malicious bots can simply ignore it.

Before Googlebot crawls a site, it fetches the site’s robots.txt file to see which pages it has permission to crawl. Your robots.txt file is a set of instructions for bots, and if you know what you are doing, you can tailor those instructions to your needs. Some crawlers also honor a crawl delay, which you specify in robots.txt, to slow down how often they request pages. Googlebot, however, ignores this setting.

How to Tell if You Have a Robots.txt File

There are ways to tell whether you already have a robots.txt file. The most common is to type your root domain URL into a browser and add /robots.txt to the end of it. For example, if your website is www.fansofthegrimreaper.com, go to www.fansofthegrimreaper.com/robots.txt. If that address returns an error page (such as 404 Not Found) instead of a plain-text file, you do not currently have a live robots.txt file.
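
If you would rather check from a script than a browser, the following Python sketch builds the robots.txt address for any page on a site and requests it. It uses only the standard library. The function names are hypothetical, and the sketch assumes you pass a full URL including the scheme (https://), which a browser normally fills in for you.

```python
from urllib.error import HTTPError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(page_url: str) -> str:
    """Return the robots.txt address at the root of the host serving page_url."""
    parts = urlsplit(page_url)
    # robots.txt always lives at the root path of the host.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def has_robots_txt(page_url: str, timeout: float = 10.0) -> bool:
    """Return True if the site serves its robots.txt file successfully.

    An HTTP error status (such as 404 Not Found) is treated as "no file".
    Network failures (DNS errors, timeouts) are not caught and propagate
    to the caller.
    """
    try:
        with urlopen(robots_url(page_url), timeout=timeout) as response:
            return response.status == 200
    except HTTPError:
        return False


if __name__ == "__main__":
    print(robots_url("https://www.fansofthegrimreaper.com/blog/post?id=3"))
```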

This can be good or bad, depending on what you want to do with your website. If you do have a robots.txt file, make sure it is not hurting your rankings by blocking content that you want search engines to crawl.

Reasons to Have a Robots.txt File     

While you do not necessarily need a robots.txt file, in some cases it is beneficial to have one up and running. If you have content that you want blocked from certain search engines, or if you want to fine-tune access for reputable robots, you need a working robots.txt file. Or perhaps your site is live, but you are still editing it, so you do not want it to show up in search engines quite yet.

You can configure robots.txt to meet all of these criteria. Most webmasters have the skills and permissions needed to create, customize and successfully use a robots.txt file.

When Not to Have a Robots.txt File     

If your website is relatively simple, error-free and does not contain any files that you want blocked from search engines, there is no need for a robots.txt file. Even without one, search engine bots will still have full access to your site, so there is no reason to worry that they will not be able to find you. In fact, they may find your content more easily than they would with a robots.txt file that is poorly configured or contains mistakes.

If you want anyone and everyone (the more, the merrier) to find your site and see everything on it, your best bet may be not to have a robots.txt file at all. There is nothing wrong with this, and it is a widespread practice. Do not feel as though you are missing out on some key tool for search engine rankings. Skipping the file is safer than publishing one that accidentally blocks important pages.


How to Create a Robots.txt File    

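If you can copy and paste, you too can create a robots.txt file. It is very simple, and no programming skills are needed. Any plain-text editor, such as Notepad, is sufficient, and a code editor would be overkill. Avoid saving from a word processor like Microsoft Word in its own format, because the added formatting is not plain text. Save the file as plain text (UTF-8), name it robots.txt, and place it in the root directory of your site. There are countless sites with instructions on how to set up your robots.txt file.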

Simply find an example that does what you need and copy and paste the text into your own file. Do not be afraid: it is just as easy to check whether your file is set up correctly as it is to create or fix it. Many free online tools can help.
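
Whatever editor you use, the result must be a plain-text file named robots.txt. As a rough illustration, this Python sketch saves a set of rules as UTF-8 plain text. The /private/ path and the write_robots_txt helper are hypothetical placeholders. You would still need to upload the file to the root of your site.

```python
from pathlib import Path

# Hypothetical example rules: block one folder for every crawler.
ROBOTS_TXT = (
    "User-agent: *\n"
    "Disallow: /private/\n"
)


def write_robots_txt(site_root: str, content: str = ROBOTS_TXT) -> Path:
    """Save content as plain UTF-8 text in a file named robots.txt."""
    path = Path(site_root) / "robots.txt"
    path.write_text(content, encoding="utf-8")
    return path
```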

What Should the Robots.txt File Say?     

A robots.txt file does one of three main things: it allows, disallows, or partially allows crawling of your site. If you want your entire website to be crawled, you have three options. First, you can simply not have a robots.txt file at all. When a bot comes to crawl, it looks for a robots.txt file right off the bat. If it does not find one, it treats all of the content on all of your pages as open to crawling, because there is nothing telling it not to.

Second, you can make a blank (empty) robots.txt file. This serves the same purpose as not having one. When a bot visits, there is nothing for it to read, so it again crawls all of your material. Third, you can publish a file that explicitly allows everything: a User-agent: * line followed by an empty Disallow: line. If you do not want any of your content crawled by bots, you must instead set up a full disallow robots.txt file (User-agent: * followed by Disallow: /). Be careful, though. Google and other compliant search engines will then stop crawling your site, so it will effectively vanish from their results. At most, Google may list a bare URL without a description if other sites link to it. This method is not recommended.
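
To see how a crawler reads these options, the sketch below feeds each one to Python's built-in urllib.robotparser module and asks whether Googlebot may fetch a sample page. The example.com URL is a placeholder. robotparser follows the original robots.txt conventions rather than every search engine's extensions, so treat it as an approximation of a real crawler.

```python
from urllib.robotparser import RobotFileParser


def parser_for(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


NO_RULES = ""                                  # empty file: same effect as no file
ALLOW_ALL = "User-agent: *\nDisallow:\n"       # empty Disallow value allows everything
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"  # "/" matches every path on the site

SAMPLE_URL = "https://www.example.com/about.html"

for name, text in [("empty", NO_RULES), ("allow all", ALLOW_ALL), ("disallow all", DISALLOW_ALL)]:
    print(name, parser_for(text).can_fetch("Googlebot", SAMPLE_URL))
# empty True
# allow all True
# disallow all False
```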

Why You Should Utilize a Robots.txt File    

If you went through the time, trouble and expense of creating a website, chances are that you want people to look at it and to find it even if they do not already know it exists. Crawlers are your best bet for climbing the search engine rankings. At times, however, you may not want bots crawling all over your site, at least not right away.

For example, you might have a page that is still a rough draft. A crawl delay can also help keep your server from being overloaded by too many bot requests. You also may not want your internal search results pages to show up in search engines, because they make little sense out of context.


How to Test Your Robots.txt File

If you have set up a disallow or partial disallow robots.txt file, it is a good idea to check that it works as intended. Several free tools can do this. They can tell you whether files that are important to Google, such as pages, CSS or JavaScript, are being blocked, and they can show you what your robots.txt file says.
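
Alongside online tools, you can run a quick local check. This Python sketch uses the standard urllib.robotparser module to list which of a handful of important URLs a given crawler would be blocked from. The blocked_urls helper, the sample rules and the example.com addresses are hypothetical. robotparser only approximates Google's own matching rules; for instance, it may not handle * wildcards inside paths the way Google does.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs that user_agent may not crawl under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules that accidentally block the site's stylesheets.
RULES = "User-agent: *\nDisallow: /css/\n"
IMPORTANT = [
    "https://www.example.com/",
    "https://www.example.com/css/site.css",
]

print(blocked_urls(RULES, IMPORTANT))
```

To test your live file instead, RobotFileParser also provides set_url() and read(), which download and parse robots.txt from your site.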

Calling All Bots     

The robots.txt file is very much like a set of directions for the bots that visit your site. You can leave specific instructions for specific bots by naming them in a User-agent line. Alternatively, you can use the “*” wildcard to give the same set of instructions to all of the bots. Googlebot and Bingbot are two examples of bots that may visit your site. Generally speaking, it is a good thing when bots visit your site, provided that you do not have any information or graphics that you do not want indexed.
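
Each set of instructions is a group that starts with one or more User-agent lines. In the hypothetical file below, Bingbot gets its own group and every other bot falls back to the * wildcard group. A crawler follows the group that names it, so Bingbot ignores the wildcard rules entirely. The sketch checks this with Python's urllib.robotparser, and the paths are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group just for Bingbot, one wildcard group for all others.
ROBOTS_TXT = """\
User-agent: Bingbot
Disallow: /beta/

User-agent: *
Disallow: /checkout/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

site = "https://www.example.com"
for bot in ("Bingbot", "Googlebot"):
    for path in ("/beta/page.html", "/checkout/cart"):
        print(bot, path, parser.can_fetch(bot, site + path))
```

If you want Bingbot to respect the checkout rule as well, repeat that Disallow line inside Bingbot's own group.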

If that is the case, reconsider whether that private content should be posted on a website to begin with. If you have a picture that you do not want anyone else to see, it should not be on the internet. However, if you are a professional photographer selling prints of your work, you will want to be careful that your images cannot easily be stolen.

You may want bots to find your site so that new customers can find you, yet not want the actual pictures you are trying to sell to show up in search engine results. If you do let them appear, make sure they are marked as copyrighted or carry a watermark, which makes them less useful if they are downloaded or copied. If you have a picture that you simply do not feel is relevant to your site, that is the kind of thing you may want to partially disallow in your robots.txt file.
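
One way to let your pages be found while keeping a folder of sale images out of image search is to address Google's image crawler, which uses its own Googlebot-Image user agent. The /prints/ folder and file names below are hypothetical. The check uses Python's urllib.robotparser as an approximation of a real crawler. As with any robots.txt rule, this only keeps out bots that choose to obey it; it does not stop anyone from downloading the images.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule: keep Google's image crawler out of the /prints/ folder only.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /prints/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

image = "https://www.example.com/prints/sunset.jpg"
page = "https://www.example.com/gallery.html"

print(parser.can_fetch("Googlebot-Image", image))  # False: image folder is blocked
print(parser.can_fetch("Googlebot-Image", page))   # True: other paths stay open
print(parser.can_fetch("Googlebot", image))        # True: no group names Googlebot
```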

What is a Crawl Delay and Why You Should Care

Sometimes bots crawl more aggressively than you would like, and this is undesirable, to say the least. Yahoo, Yandex, and Bing are several examples of bots that are often very quick to arrive. You can ask crawlers to slow down by adding Crawl-delay: 10 to their group in robots.txt, which asks them to wait about ten seconds between requests. Support varies: Bing documents support for Crawl-delay, but Googlebot ignores it. A crawl delay can help if bot traffic is bogging down your site.
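
A crawl delay goes inside the group for the bot you want to slow down. This sketch assumes Python 3.6 or later, which provides RobotFileParser.crawl_delay(). It shows a hypothetical file that asks Bingbot to wait ten seconds between requests while leaving other bots unrestricted. Whether a crawler actually honors the value is up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: slow Bingbot down, allow everyone everything.
ROBOTS_TXT = """\
User-agent: Bingbot
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.crawl_delay("Bingbot"))    # 10
print(parser.crawl_delay("Googlebot"))  # None: the wildcard group sets no delay
```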

A crawl delay is sometimes suggested for sites that are being edited live. However, it only slows crawlers down; it does not stop them, and it has no effect on human visitors who happen upon a work in progress. If you do not want unfinished pages to appear in search results, where they could make people think the website is subpar and never return, disallow those pages until they are ready.

Things You May Not Want Crawled

There may be some instances where you do not want certain content on your site crawled by bots. This could include personal photography that you have taken or information that you do not want made more public than it already is. Or, perhaps you have an internal search bar that searches only within your site.

That is great, but you probably do not want Google displaying your internal search results pages. They may be useless out of context or, worse, confuse a potential new visitor who will not take the time to poke around and find the relevant information.
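
Suppose your internal search results live under a /search path (a hypothetical path; check what your own site uses). A single Disallow line then keeps compliant crawlers out of them. The Python check below uses urllib.robotparser as a stand-in for a real crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule: block internal search result pages for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "https://www.example.com/search?q=running+shoes",
    "https://www.example.com/products/running-shoes.html",
):
    print(url, parser.can_fetch("Googlebot", url))
```

Rules match the start of the path, so this line would also block an unrelated page such as /search-tips.html. Choose the prefix with that in mind.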

What Should You Avoid Regarding Robots.txt Files     

Crawl delays can be useful at times, but be careful, because they can easily do more harm than good. This is especially true on a large website with many pages: a crawler that waits ten seconds between requests can fetch at most 8,640 pages a day. You should also avoid blocking bots from crawling your entire site, because then you will never show up in search engine results.
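
The arithmetic behind that warning is simple. A day has 86,400 seconds, so a crawler that honors a delay of d seconds can make at most 86,400 / d requests per day. This Python sketch assumes one page per request and a crawler that works around the clock, which gives an upper bound; real crawlers usually fetch less. The 50,000-page site is a hypothetical example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def max_pages_per_day(crawl_delay_seconds: float) -> int:
    """Upper bound on pages a crawler can fetch per day at the given delay."""
    return int(SECONDS_PER_DAY // crawl_delay_seconds)


def days_to_crawl(page_count: int, crawl_delay_seconds: float) -> float:
    """Minimum days needed to fetch page_count pages once at the given delay."""
    return page_count * crawl_delay_seconds / SECONDS_PER_DAY


print(max_pages_per_day(10))                # 8640
print(round(days_to_crawl(50_000, 10), 1))  # 5.8
```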

Unless you are making some kind of scrapbook for yourself, or you are a very private person who for some reason still wants to be on the internet, it is vital that you show up in search engine results. In fact, many people spend a lot of time and money trying to improve their rankings. By forbidding crawlers on your site, you could be shooting yourself in the foot, so to speak.


Conclusion

It is very important to decide whether you want to use a robots.txt file and, if you do, to make sure it is configured properly. An incorrect robots.txt file can block bots from crawling your pages, which will hurt your search engine rankings. In fact, you may not show up at all. Remember that not using a robots.txt file is not necessarily a bad thing if you want everything on your website to be crawled by bots. This is a very common and useful practice, and one that is perfectly acceptable.

Robots.txt is just one of a wide array of ways you can improve (or not improve, depending on what you ultimately want to do) your search engine rankings. Some people live to be the best. Others prefer secrecy. It is your website, and you can do with it whatever you want.

Author: Garenne Bigby
Website: http://garennebigby.com
Founder of DYNO Mapper and Advisory Committee Representative at the W3C.
