Blocking a URL on your website stops Google from indexing that page and displaying it in its search results. When a URL is blocked, people browsing the search results cannot see it, navigate to it, or read any of its content. If there are pages on your site that you would rather keep out of Google's search results, there are a few ways to do it.
Most people might not give this a second thought, but there are a few reasons someone would want to hide content from Google.
Keeping your data secure. Your website may hold private data that you want to keep out of users' reach, such as contact information for members. This kind of information needs to be blocked from Google so that members' details are not shown in Google's search results pages.
Getting rid of third-party content. A website sometimes publishes information supplied by a third-party source that is likely available elsewhere on the internet. Google sees less value in a site that contains large amounts of duplicate content, so blocking those duplicate pages improves how Google views your site and can boost its position in the search results.
Hiding less valuable content from your website visitors. If the same content appears in several places on your site, it can hurt your rankings in Google Search. A site-wide search gives you a good idea of where duplicate content might live and how it relates to users and the way they navigate the site. Some search functions generate and show a custom results page every time a user enters a query; if those pages are not blocked, Google crawls each one of them, sees a site containing many near-identical pages, and can categorize the duplicate content as spam. The result is that Google Search pushes the website further down the results pages.
A robots.txt file sits at the root of your site and indicates the portion(s) of the site you do not want search engine crawlers to access. It uses the Robots Exclusion Standard, a protocol with a small set of commands that indicate where web crawlers are and are not allowed to go.
Robots.txt can be used for web pages, but it should only be used to control crawling, for example so that the server isn't overwhelmed by requests for duplicate content. Keeping this in mind, it should NOT be used to hide pages from Google's search results: other pages can link to yours, and the page can be indexed from those links regardless of what the robots.txt file says. If you want to keep pages out of the search results, use another method, such as password protection.
Robots.txt can also prevent image files from showing up in Google's search results, but it does not stop other users from linking to a specific image.
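As a rough sketch, a rule like the one below keeps Google's image crawler out of an image folder; the /images/ path is a placeholder for whatever directory your site actually uses, and you could disallow a single file instead by giving its full path.

    # Keep Google's image crawler out of an image folder
    User-agent: Googlebot-Image
    Disallow: /images/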
NOTE: combining more than one crawling or indexing directive can cause the directives to counteract each other.
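For example, if a page is disallowed in robots.txt, a noindex meta tag on that same page can never take effect, because crawlers are not allowed to fetch the page and read the tag. The path below is purely illustrative.

    # In robots.txt: crawlers may not fetch this page
    User-agent: *
    Disallow: /members-only.html

    <!-- In /members-only.html: crawlers never get far enough to see this tag -->
    <meta name="robots" content="noindex">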
Learn how to create a robots.txt file. First, you will need access to the root of the domain; if you are not sure whether you have it, contact your web host.
The syntax of robots.txt matters greatly. In its simplest form, the file uses two keywords: User-agent and Disallow. A user-agent is a piece of web crawler software, and most of them are listed online. Disallow is a command aimed at that user-agent, telling it not to access a particular URL. Conversely, to give user-agents access to a specific URL that is a child directory inside a parent directory that has been disallowed, use the Allow keyword.
A user-agent line together with its allow or disallow lines is treated as a single entry in the file, and the rules apply only to the user-agent named in that entry. To direct an entry at all user-agents, use an asterisk (*) as the user-agent.
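Put together, a minimal robots.txt might look like the sketch below; the directory names are examples, not paths Google requires.

    # Entry that applies to every crawler
    User-agent: *
    # Keep crawlers out of the whole /private/ directory...
    Disallow: /private/
    # ...but allow one child directory back in
    Allow: /private/public-reports/

    # A second entry that applies only to Googlebot
    User-agent: Googlebot
    Disallow: /search-results/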
You will then need to save your robots.txt file. So that web crawlers can find and identify it, save it as a plain text file named exactly robots.txt and place it in the top-level (root) directory of your site.
There is a testing tool specifically for robots.txt, and it shows you whether the file successfully blocks Google's web crawlers from specific URLs on your site. The tool operates the way Googlebot does, so you can verify that everything is working properly.
To test the file, open the tester tool and follow the instructions it provides.
There are some limitations to the robots.txt testing tool. Changes made within the tool are not saved automatically to your own web server, so you will have to copy them over as described previously. The tester also only tests the file against Google's user-agents and crawlers, such as Googlebot; Google is not responsible for how other web crawlers interpret the robots.txt file.
Finally, submit the file once it has been edited. Within the editor, click Submit, download your edited code from the tester page, and upload it to the root of the domain. Verify the live version, then submit it.
When there is private information or content that you do not want included in Google's search results, storing it in a password-protected directory on your website's server is the most effective way to block it. All web crawlers are blocked from accessing content inside protected directories.
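As one hedged illustration, on an Apache server a directory can be protected with an .htaccess file like the one below; the realm name and the path to the password file are assumptions you would replace with your own values.

    # .htaccess placed inside the directory you want to protect
    AuthType Basic
    AuthName "Private area"
    # Path to a password file created with the htpasswd utility
    AuthUserFile /home/example/.htpasswd
    Require valid-user

Any crawler requesting content in that directory gets a 401 response instead of the page, so there is nothing for it to index.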
You can block a page from appearing in Google Search by including a noindex meta tag in the page's HTML. When Googlebot crawls the page and sees the tag, it drops the page entirely from the search results, even if other websites link to it. NOTE: for this meta tag to work, the page must not be blocked by a robots.txt file. If it is blocked, crawlers never see the noindex tag, and the page might still turn up in the search results if other pages link to it.
The noindex tag is very useful when you do not have access to the root of your server, because it lets you control indexing page by page. To prevent most search engines from indexing a specific page on your website, add the meta tag <meta name="robots" content="noindex"> to the <head> section of the page. To prevent only Google from indexing the page, exchange "robots" for "googlebot". Different search engine crawlers interpret the noindex instruction differently, so it is possible the page could still appear in results from some search engines.
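A minimal sketch of where the tag sits in a page's markup; the title and body are placeholders.

    <!DOCTYPE html>
    <html>
      <head>
        <!-- Ask crawlers not to index this page -->
        <meta name="robots" content="noindex">
        <!-- Or, to address only Google's crawler:
             <meta name="googlebot" content="noindex"> -->
        <title>Members-only page</title>
      </head>
      <body>...</body>
    </html>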
You can actually help Google spot your meta tags when blocking access to certain content. Because Googlebot has to crawl a page to see its meta tags, it is possible for the noindex tag to be missed. If a page you have tried to block is still showing up in search results, Google may simply not have crawled it since the tag was added; you can request a fresh crawl with the Fetch as Google tool. If the content still appears, the robots.txt file may be stopping Google's crawlers from seeing the page at all, which means the tag cannot be seen either. To unblock the page for Google, edit the robots.txt file, which can be done right from the robots.txt testing tool.
You can also have your content blocked from display on various Google properties after it has been crawled, including Google Local, Google Hotels, Google Flights, and Google Shopping. When you opt out of being displayed on these outlets, content that has been crawled is not listed on them, and any content currently displayed on these platforms is removed within 30 days of opting out. Opting out of Google Local applies globally; for the other Google properties, the opt-out applies to the services hosted on Google's domain.