
Google’s PDF on search engine optimization has been missing from the search results for several years. Now it seems that they have, at least for the moment, found it again.
The disappearance of Google’s PDF has been a valuable lesson in one of the lesser-known features of Google’s algorithm and a very clear illustration of the problems you can encounter with duplicate content. It has also been a bit of a frustration for all of us who work to make sites visible in search engines. We’ll take a look at this today, but first, let’s go over what actually happened:

In 2011, Google published a PDF in a number of languages. In Swedish, it became known as Grundhandbok om sökmotoroptimering (The SEO Handbook). That is rather ironic considering how poorly the search engine optimization around the handbook itself has been managed. Still, the PDF contained relatively good content and it garnered some links, both in Sweden and abroad. Combined with the sheer amount of content, that made it rank well for keywords like Sökmotoroptimering (Search Engine Optimization).
As clumsy as they are over in Mountain View (just kidding, they have a lot of really smart people there), they, of course, didn’t optimize their own site particularly well, meaning it wasn’t even easy for Google to crawl it. For that reason, the guide “disappeared.” Ever since it first made its way into the search results, copies of the PDF hosted on other domains have essentially appeared in place of the original.
Everyone from playful types to less scrupulous SEO agencies has simply published a copy of Google’s material (I can understand individual bloggers doing that, but if you run a company that works with, against, or near Google, I’d probably be wary of copyright violations). When Google later failed to recognize that their own version was the original, the copies suddenly started appearing in the position it once held. Exactly which copy that has been has varied over time.
What can we learn from this?
The feature at play here is Google’s handling of duplicate content. Google has methods for picking out and ranking what they believe to be the original version of a text, and that is what causes this effect. Since they don’t rely on the publication date or similar factors (that would be too risky), any copy can be counted as the original, especially if the real original is hard to index.
When Google then assumes that another PDF is the source, they assign the authority earned by the other copies to that page. In other words, they merge the trust that the document has gained across all copies, which is why completely unknown SEO sites and bloggers have been able to shoot up into the top 10 simply by publishing Google’s PDF.
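To make the mechanism concrete, here is a minimal sketch of how duplicate consolidation could work in principle. It is not Google’s actual algorithm: the Page fields, the fingerprint function, the rule of picking the most-linked crawlable copy, and the example URLs and link counts are all assumptions made up for illustration. The point is only that when the original can’t be crawled and the publication date is ignored, a copy ends up as the representative and inherits the pooled links.

```python
from collections import defaultdict
from dataclasses import dataclass
import hashlib


@dataclass
class Page:
    url: str
    text: str
    inbound_links: int  # links pointing at this exact URL (made-up numbers)
    crawlable: bool     # could the crawler fetch and index it?


def fingerprint(text: str) -> str:
    """Hash of case- and whitespace-normalized text; exact duplicates collide."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def consolidate(pages):
    """Group duplicates, pick one representative per group, pool the links."""
    groups = defaultdict(list)
    for page in pages:
        groups[fingerprint(page.text)].append(page)

    results = []
    for duplicates in groups.values():
        # Only indexable copies can be chosen; publication date is ignored.
        candidates = [p for p in duplicates if p.crawlable]
        if not candidates:
            continue
        chosen = max(candidates, key=lambda p: p.inbound_links)
        # The chosen copy gets credit for links earned by every copy.
        pooled = sum(p.inbound_links for p in duplicates)
        results.append((chosen.url, pooled))
    return results


# Hypothetical data: a well-linked original that is hard to crawl, plus two copies.
pages = [
    Page("https://original.example/guide.pdf", "SEO  starter guide", 500, False),
    Page("https://copy-a.example/guide.pdf", "seo starter guide", 3, True),
    Page("https://copy-b.example/guide.pdf", "SEO starter guide", 1, True),
]
print(consolidate(pages))  # [('https://copy-a.example/guide.pdf', 504)]
```

In this toy model, the copy with only three links of its own ends up credited with all 504, which is the same effect described above.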
Admittedly, what has shown up in the results is the PDF itself, so I find it hard to imagine anyone benefiting much from it, except for an SEO company in southern Sweden that chose to “cloak”, that is, show one type of content to Google and another to visitors. They simply sent visitors to their own site while the spiders read the PDF.
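A rough way to spot this kind of cloaking is to request the same URL once as a crawler and once as an ordinary browser and compare what comes back. The sketch below assumes a hypothetical fetch(url, user_agent) function that returns the response body; the user agent strings, the example URL, and the stand-in fetchers are illustrative only. Real cloaking can also key on IP address rather than user agent, and pages with dynamic content will differ between requests anyway, so treat a mismatch as a hint rather than proof.

```python
from typing import Callable

# Illustrative user agent strings, not an exhaustive or authoritative list.
CRAWLER_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def looks_cloaked(url: str, fetch: Callable[[str, str], str]) -> bool:
    """True if the body served to the crawler differs from the browser's."""
    return fetch(url, CRAWLER_UA) != fetch(url, BROWSER_UA)


def cloaking_fetch(url: str, user_agent: str) -> str:
    """Stand-in for a cloaking server: PDF for spiders, agency page for people."""
    if "Googlebot" in user_agent:
        return "%PDF-1.4 guide contents"
    return "<html>Welcome to our agency</html>"


def honest_fetch(url: str, user_agent: str) -> str:
    """Stand-in for a server that treats every visitor the same."""
    return "%PDF-1.4 guide contents"


print(looks_cloaked("https://agency.example/guide.pdf", cloaking_fetch))  # True
print(looks_cloaked("https://agency.example/guide.pdf", honest_fetch))    # False
```

Passing the fetcher in as a parameter keeps the check free of network code, so it can be tried with stand-ins like the two above before being wired to a real HTTP client.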
Think about this
I think it’s worth considering just how poorly this PDF has performed in search results. It says something about the SEO advice Google gives. If simply creating a good service is enough for Google to show it (as they so often claim), then they can’t possibly have such a good service themselves, right?


Magnus is one of the world's most prominent search marketing specialists and primarily works with management and strategy at his agency Brath AB.