The Googlebot

by Thomas Klein on 2011/06/23

Search engines gather information from websites around the clock. This process is called “crawling”. The technical instrument used to crawl the web, or individual websites, is called a (web) crawler, spider, robot, or bot for short. Each of the big three search engines hosts its own web crawler. The most prominent, and also the most important, comes from the biggest search engine, Google: prominent because the Googlebot appears to be more advanced than its competitors, and important because Google’s market share was about 66 percent in May 2011, according to comScore.


Search Engine Rankings May 2011, U.S.

The crawl process

But what exactly is a web crawler? Technically, you can describe it as a script, i.e. a large amount of code. A web crawler such as the Googlebot crawls and copies the whole content of your website, including text and images. It works through a page from top to bottom and left to right. Afterwards, all the information from your pages is stored in Google’s index, where it can be found by searchers. But which files get indexed?
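To make the crawl process more concrete, here is a minimal sketch of the two core jobs a crawler performs on a page: collecting outgoing links to visit next, and extracting the visible text for the index. It uses only Python’s standard library; the parser class and the sample page are my own illustrative assumptions, not Google’s actual implementation.

```python
from html.parser import HTMLParser

class MiniCrawlerParser(HTMLParser):
    """Toy parser mimicking two crawler tasks: collect outgoing
    links and gather the visible text that would be indexed."""
    def __init__(self):
        super().__init__()
        self.links = []       # hrefs the crawler would follow next
        self.text_parts = []  # visible text for the index
        self._skip = 0        # depth inside <script>/<style> blocks

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Skip script/style content; keep only visible text.
        if not self._skip and data.strip():
            self.text_parts.append(data.strip())

# A hypothetical page to crawl.
page = """<html><body>
<h1>Welcome</h1>
<p>Some article text.</p>
<a href="/about">About</a>
<a href="http://example.com/contact">Contact</a>
<script>var hidden = 1;</script>
</body></html>"""

parser = MiniCrawlerParser()
parser.feed(page)
print(parser.links)                  # URLs queued for crawling
print(" ".join(parser.text_parts))   # text stored in the index
```

A real crawler would of course fetch pages over HTTP, respect robots.txt, and queue the discovered links for further crawling; this sketch only shows the parse-and-extract step.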

[Illustration: the Googlebot]


What the Googlebot can read (and what not)

Webmasters should know which files get indexed by a web crawler such as the Googlebot. For that reason I created my first infographic; in addition, I drew the Googlebot as I imagine it. The infographic shows which formats and files are readable by the spider and which are not. Some of them, like Silverlight, videos, and iframes, cannot be read by the bot. Others are readable, for instance Flash (SWF files), AJAX, JavaScript, and images. If these formats are implemented correctly, the crawler has no problem interpreting the code.
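A webmaster can get a quick overview of the problematic embeds on a page with a small audit script. The sketch below flags the tag types the infographic marks as hard for the bot to read (iframes, Silverlight objects, video embeds); the tag list and the sample page are illustrative assumptions on my part, not an official Google rule set.

```python
from html.parser import HTMLParser

# Tags that commonly carry content a crawler may not be able to read
# (iframes, Silverlight via <object>, generic embeds, video).
# This list is an illustrative assumption, not Google's actual rules.
PROBLEM_TAGS = {"iframe", "object", "embed", "video"}

class EmbedAuditor(HTMLParser):
    """Lists embedded resources a crawler may not be able to read."""
    def __init__(self):
        super().__init__()
        self.findings = []  # (tag, resource) pairs to review manually

    def handle_starttag(self, tag, attrs):
        if tag in PROBLEM_TAGS:
            a = dict(attrs)
            src = a.get("src") or a.get("data") or "(inline)"
            self.findings.append((tag, src))

# A hypothetical page mixing readable and problematic content.
page = """<html><body>
<img src="logo.png" alt="Logo">
<iframe src="widget.html"></iframe>
<object data="app.xap" type="application/x-silverlight-2"></object>
</body></html>"""

auditor = EmbedAuditor()
auditor.feed(page)
for tag, src in auditor.findings:
    print(f"check manually: <{tag}> -> {src}")
```

Note that the `<img>` element is not flagged: images are readable by the bot, especially when they carry a descriptive `alt` attribute.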


Googlebot infographic