Quote:
Originally Posted by thommy
I think this is just one reason; the other is that they don't get fined for what they show.
Actually, Google shows many documents and websites that do not have a robots.txt.
Now let's imagine a funny example:
A weapons company uploads the newest secret version of a killer machine to their website. Google crawls it and publishes it without being explicitly asked to do so. Google would also be in trouble then.
There is no such thing as THE internet law; Google operates worldwide under the laws of 255 different countries.
I think that robots.txt would be the simplest way to allow or deny crawling and publishing of content from a site.
We can see everywhere on the internet that rules and laws are being taken to excess: users have to agree to cookies (even though these have been a common technique for the past 25 years).
In addition, an internet presence is not exclusively a privilege of companies; consumer protection can also apply to the site operator here.
The average internet user has no knowledge of robots or crawling, so you can't really expect everyone to follow the convention. A better solution, instead of having crawlers index everything on a website by default, would be to explicitly state what should be crawled, i.e. opt-in rather than opt-out (see the sketch below).
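To illustrate the opt-in idea, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the example.com URLs, and the /public/ path are all hypothetical; the file denies everything by default and explicitly allows only one section.

from urllib.robotparser import RobotFileParser

# Hypothetical opt-in robots.txt: allow only /public/, deny everything else.
# The Allow line comes first because urllib.robotparser applies the first
# matching rule (RFC 9309 crawlers instead pick the most specific match,
# which gives the same result here).
ROBOTS_TXT = """\
User-agent: *
Allow: /public/
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Only the explicitly allowed section may be crawled.
print(parser.can_fetch("*", "https://example.com/public/page.html"))   # True
print(parser.can_fetch("*", "https://example.com/private/plans.pdf"))  # False

Note that robots.txt is purely advisory either way: a crawler has to choose to honor it, which is exactly why the legal questions discussed above come up at all.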