How to prevent your page from showing up on Google - Noindex vs Disallow

Disallow does not always work. Google can still show your page in its search results without crawling it.

I still see a lot of websites using Disallow to keep pages from showing up on Google.
The problem with using Disallow in the robots.txt file is that it does not guarantee that your page won't be listed on Google.

For example, this website's robots.txt file asks Google not to crawl the /devices/ folder.
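To make the rule concrete, here is a minimal sketch of what such a Disallow rule does, using Python's standard urllib.robotparser. The robots.txt content and the example.com URLs are assumptions made up for illustration, not the actual file of the site mentioned above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration,
# not the real file of the site discussed in this post).
ROBOTS_TXT = """\
User-agent: *
Disallow: /devices/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A URL inside /devices/ is off limits for crawling...
blocked = can_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/devices/phone.html")
# ...while a URL outside the folder can still be crawled.
allowed = can_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/about.html")

print("devices page crawlable:", blocked)
print("about page crawlable:", allowed)
```

Note that this only answers "may a crawler fetch this URL?". It says nothing about whether the URL can be listed in search results, which is the whole point of this post.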


But this page is still being shown on Google!


This happens for a variety of reasons. In this blog post we will first look at why it happens, and then at ways to prevent it.

Google can show your page on SERP results without crawling your page

First, there is a difference between crawling a page and listing the page on Google.
Even though your robots.txt file tells Google not to crawl the page, Google can still work out what the page is about by looking at the links pointing to it and their anchor text.
Google gets this information without violating the robots.txt file, and can therefore still display the page in its search results even though it never crawled it.

Google can also get information about the page from other places that talk about it.
In the old days this used to be the Yahoo Directory or DMOZ,
but the same information could come from any website that mentions the page.
So, even without fetching the URL itself, Google can still show un-crawled URLs in its search results.

How to prevent your page from showing up on Google

If you truly do not want your page to show up on Google, add a noindex meta tag to the <head> of the page.
When Google sees the noindex meta tag on the page, it will drop the page from its search results completely.

Add this meta tag to the <head> of every page you don't want to show up on Google:

<meta name="robots" content="noindex,nofollow">

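Keep in mind that Google has to crawl the page to see this tag, so the page must not also be disallowed in robots.txt. If the page is blocked there, Google never reads the noindex and the URL can stay listed.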
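If you want to check that a page really carries the tag, a small script can do it. This is a rough sketch using Python's built-in html.parser; the class and function names (RobotsMetaParser, has_noindex) are my own for illustration, and it only inspects the HTML you pass in, it does not fetch anything.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None when written without a value.
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
print(has_noindex(page))
```

Running this over the pages you want hidden is a quick way to catch templates where the tag was forgotten.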

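The problem with this approach is that you have to add the tag to each page individually.
To make this easier, search engines support the X-Robots-Tag HTTP header, which lets the server send the same directives in the HTTP response for many URLs at once, including files that are not HTML.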
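How you set the header depends on your server. As one possible sketch, assuming a Python WSGI application, a small middleware can add X-Robots-Tag to every response under a path prefix. The names noindex_middleware and demo_app and the /devices/ prefix are assumptions for illustration only.

```python
def noindex_middleware(app, prefixes=("/devices/",)):
    """Wrap a WSGI app so responses under the given path prefixes
    carry an X-Robots-Tag: noindex, nofollow header."""
    prefixes = tuple(prefixes)

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start_response(status, headers, exc_info=None):
            if path.startswith(prefixes):
                # Append the header without mutating the caller's list.
                headers = list(headers) + [("X-Robots-Tag", "noindex, nofollow")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped


def demo_app(environ, start_response):
    """Tiny placeholder app standing in for a real site."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<h1>Hello</h1>"]


# Every URL under /devices/ now gets the header; other URLs are untouched.
app = noindex_middleware(demo_app)
```

The same idea applies to a web server configuration: match the URLs you want hidden and attach the header to their responses, instead of editing each page.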
