One of the most fundamental parts of SEO is indexing. If Google can't, or won't, store your pages in its index, nothing else you do will make any difference; Google still has nothing to show.
In this blog post I will discuss indexing and what to do when Google won't index anything at all on a site, as well as when it leaves out parts of a site. Since this is a general text, there is no guarantee that the advice will solve your specific problem, but hopefully it will point you in the right direction and give you an idea of where it is reasonable to start looking.
Find Out If A Page Is Indexed
It's pretty easy to find out whether Google has indexed your site. Go to Google and type site:domainname.com in the search bar. That will list the pages from your site that Google has in its index, and above the results you will see roughly how many pages are indexed in total.
In our case, there are 279 pages indexed, a reasonable number considering that we have approximately 170 blog posts, around 50 other pages and about 30 tags and categories, some of which span several pages. On a site of our size most pages should be indexed; if they aren't, that is a strong indication that something is wrong. As sites grow larger (think huge web shops or newspaper sites) it is common that not all pages are indexed, and that's fine as long as most of them are. A site: search also gives an indication of how important Google considers your pages to be, reflected in the order in which they are displayed. Normally the home page should be first in the list; if it isn't, that may indicate a problem with your site.
If you have a lot of pages on your site and want to know whether a specific page has been indexed, you can search in the same way as above, but with the full URL of the page:
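For example (the domain and path here are just placeholders for your own page):

    site:domainname.com/a-blog-post-you-want-to-check/

If the page shows up in the results it is in the index; if the search returns nothing, it isn't.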
No Pages on the Site Are Indexed
If Google hasn't been able to index any page at all on your site, you have either an easily fixed problem on your hands or a pretty tough job ahead of you.
The most common cause of this issue is that you have somehow blocked Google from visiting the site. Start by opening the source code of a page (right-click and choose "Show source code", "Show page source" or similar) and see if you can find a robots meta tag that includes "noindex".
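Such a tag sits in the <head> section of the page and typically looks something like this (the exact attribute values can vary, for example "noindex, nofollow"):

    <meta name="robots" content="noindex">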
If you find such a tag, that is most likely the problem and you simply need to remove it. How to do that depends on the CMS you are using.
If you can't find that tag anywhere, the next step is to check your robots.txt file. You can find it by typing your domain name into the address bar of your browser and adding /robots.txt. Look for a line with the text "Disallow: /". If you find one, it means that you are not allowing Google or other bots to visit your site. Remove the "/" and the problem will be solved!
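A robots.txt that blocks the entire site for all crawlers looks something like this:

    User-agent: *
    Disallow: /

Changing the last line to "Disallow:" (or removing it altogether) tells crawlers that nothing is off limits.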
Unfortunately, if you don't find a noindex tag or that "/" in robots.txt, things get a bit more complicated. The issue is then most probably that Google can't find your site, or that Google really doesn't like it. Check that you have at least some external links pointing to the site, or make sure to get a few so that Google can find it. Go to Google Search Console (previously called Google Webmaster Tools) and check the manual actions report to make sure no manual action has been issued against the site. Also check that Google isn't reporting that it has trouble reading your pages, and make sure you have submitted a sitemap. If none of this helps, it might be in your best interest to get in touch with a professional search engine optimizer who can take a look at your site.
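A minimal XML sitemap, following the standard sitemaps.org format, can be as simple as this (the URLs are placeholders for your own pages):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://domainname.com/</loc>
      </url>
      <url>
        <loc>https://domainname.com/a-blog-post/</loc>
      </url>
    </urlset>

Upload it to your site and submit it in Search Console so Google knows which pages you want indexed.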
Part of the Site Is Not Indexed
Now we're getting to the part where the problems are harder to find and solve. When part of a site is not indexed, or when seemingly random pages aren't picked up by Google, the cause could be almost anything. Below I will go through a few common reasons why Google chooses not to index certain pages.
Bad structure
One of the most common problems is that the site structure, that is the internal links and menus, is poorly built: either there are so many links in the menus that Google can't crawl them all, or important sections are missing from the main menus, so the crawler has to find them through back doors. Make sure you have a clear, hierarchical structure for your content; this becomes more important the bigger your site gets.
Internal duplicate content
Having several pages with the same or similar content is something Google does not like. This may be due to having many similar products, or to a technical error that creates copies of pages by, for example, adding parameters or session IDs. The solution is to create unique pages, or, if the duplicates are caused by a technical problem, to use canonical tags.
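A canonical tag is placed in the <head> of the duplicate page and points to the version you want indexed, something like this (the URL is a placeholder):

    <link rel="canonical" href="https://domainname.com/the-original-page/">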
External duplicate content
If much of your content can also be found on other sites (you might use the same supplier, or perhaps someone has copied your content), it is common for Google to index that content on only one of the sites. In this case the solution is to create content that can only be found on your site.
Thin content
When Google comes across a site with many pages but little substantial information on them, it sometimes chooses to skip that site. In that case, create more interesting and useful content.
Not enough links
If your site doesn't have enough inbound links, Google may not find it interesting enough to crawl all of its pages. That can be solved by earning more relevant links.
Too many/spammy links
If you have a lot of bad links, Google might see that as a negative signal and choose not to show the affected pages. Solve this by removing the bad links or, as a last resort, by using the Disavow Tool.
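The disavow file you upload through the tool is a plain text file with one URL or domain per line; lines starting with # are comments (the sites below are made-up examples):

    # Spammy directory we could not get removed
    domain:spammy-directory-example.com
    # A single bad page linking to us
    http://another-spam-example.com/bad-links-page.html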
Too many ads
If Google comes across a site with a lot of ads, especially if they are placed "above the fold", there is a chance that it will remove such pages from its index. Make sure that your own unique content, not ads, takes up most of the space on the page.
There are several other things that can cause