Search engines gather information from websites every second. This process is called "crawling". The program that crawls the web, or individual websites, is known as a (web) crawler, spider, robot, or simply bot. Each of the big three search engines runs its own web crawler. The most prominent and also the most important one belongs to the biggest search engine, Google. Prominent because the Googlebot appears to be more advanced than its competitors; important because Google's search market share stood at roughly 66 percent in May 2011, according to comScore.
The crawl process
But what exactly is a web crawler? In technical terms, it is essentially a program, a script containing a lot of code. A web crawler such as the Googlebot visits your website and copies its entire content, including text and images. It works through each page from top to bottom and left to right. The information it collects is then stored in Google's index, where searchers can find it. But which files actually get indexed?
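To make the idea more concrete, here is a minimal sketch of such a crawler in Python. It is only an illustration of the general principle (fetch a page, store its text, follow its links), not how the Googlebot actually works; the start URL is a placeholder, and the use of the `requests` and `beautifulsoup4` packages is an assumption for the example.

```python
# A minimal illustration of what a web crawler does: fetch a page,
# extract its text and links, and follow links on the same site.
# This is NOT how the Googlebot works internally; it only sketches the idea.
# Assumes the third-party packages `requests` and `beautifulsoup4` are installed.
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def crawl(start_url: str, max_pages: int = 10) -> dict[str, str]:
    """Crawl pages of a single site and return a tiny 'index': URL -> page text."""
    index: dict[str, str] = {}
    queue = deque([start_url])
    seen = {start_url}
    domain = urlparse(start_url).netloc

    while queue and len(index) < max_pages:
        url = queue.popleft()
        try:
            response = requests.get(url, timeout=10)
        except requests.RequestException:
            continue  # skip pages that cannot be fetched

        soup = BeautifulSoup(response.text, "html.parser")
        # Store the visible text of the page, roughly "top to bottom".
        index[url] = soup.get_text(separator=" ", strip=True)

        # Queue further links, but stay on the same site.
        for link in soup.find_all("a", href=True):
            next_url = urljoin(url, link["href"])
            if urlparse(next_url).netloc == domain and next_url not in seen:
                seen.add(next_url)
                queue.append(next_url)

    return index


if __name__ == "__main__":
    # Placeholder start URL (an assumption); replace it with a real site.
    pages = crawl("https://example.com")
    print(f"Indexed {len(pages)} pages")
```

A real search-engine crawler adds much more on top of this, for example respecting robots.txt, handling duplicate content, and prioritising which pages to revisit, but the basic loop of fetching, storing, and following links is the same.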
What the Googlebot can read (and what not)