#1
StraightBro
Industry Role:
Join Date: Aug 2003
Location: Monarch Beach, CA USA
Posts: 56,232
Google to make robots.txt an Internet standard after 25 years
Google demanding more free work & expense from people to bend to their fucking will

Google to make robots.txt an Internet standard after 25 years

The Robots Exclusion Protocol (REP) — better known as robots.txt — allows website owners to exclude web crawlers and other automatic clients from accessing a site. “One of the most basic and critical components of the web,” Google wants to make robots.txt an Internet standard after 25 years.

Despite its prevalence, REP never became an Internet standard, with developers interpreting the “ambiguous de-facto” protocol “somewhat differently over the years.” Additionally, it doesn’t address modern edge cases, with web devs and site owners ultimately still having to worry about implementation today.

On one hand, for webmasters, it meant uncertainty in corner cases, like when their text editor included BOM characters in their robots.txt files. On the other hand, for crawler and tool developers, it also brought uncertainty; for example, how should they deal with robots.txt files that are hundreds of megabytes large?

To address this, Google — along with the original author of the protocol from 1994, webmasters, and other search engines — has now documented how REP is used on the modern web and submitted it to the IETF.

The proposed REP draft reflects over 20 years of real world experience of relying on robots.txt rules, used both by Googlebot and other major crawlers, as well as about half a billion websites that rely on REP. These fine grained controls give the publisher the power to decide what they’d like to be crawled on their site and potentially shown to interested users. It doesn’t change the rules created in 1994, but rather defines essentially all undefined scenarios for robots.txt parsing and matching, and extends it for the modern web.

The robots.txt standard is currently a draft, with Google requesting comments from developers. The standard will be adjusted as web creators specify “how much information they want to make available to Googlebot, and by extension, eligible to appear in Search.”

This standardization will result in “extra work” for developers that parse robots.txt files, with Google open sourcing the robots.txt parser used in its production systems.

This library has been around for 20 years and it contains pieces of code that were written in the 90’s. Since then, the library evolved; we learned a lot about how webmasters write robots.txt files and corner cases that we had to cover for, and added what we learned over the years also to the internet draft when it made sense.
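For anyone curious what "parsing and matching" actually looks like in practice, here is a minimal sketch using Python's stdlib urllib.robotparser. The rules, bot names and URLs below are invented, and the stdlib matcher predates Google's draft, so corner cases won't necessarily line up with Google's open-sourced C++ parser (which, as I understand the draft, resolves conflicts by longest match). Treat it purely as an illustration.

Code:
# Minimal robots.txt parsing/matching sketch with Python's standard library.
# The rules, bot names and URLs are made up for illustration only.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /members/
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot may fetch everything except /members/ ...
print(parser.can_fetch("Googlebot", "https://example.com/tour/"))      # True
print(parser.can_fetch("Googlebot", "https://example.com/members/x"))  # False
# ... while every other crawler is blocked outright by the catch-all group.
print(parser.can_fetch("SomeOtherBot", "https://example.com/tour/"))   # False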
#2
Pay It Forward
Industry Role:
Join Date: Sep 2005
Location: Yo Mama House
Posts: 76,894
They want to reduce page removals, right? That's something they currently have to pay for. I think they're trimming the fat to focus on tech items. I use robots.txt on everything.
__________________
TRUMP 2025 KEKAW!!! - Support The Laken Riley Act!!! END DACA - SUPPORT AZ HCR 2060 52R - email: brassballz-at-techie.com |
#3
Confirmed User
Industry Role:
Join Date: Aug 2006
Location: Midwest
Posts: 3,802
Been running websites for over 15 years and making money from it. Tens of thousands of sites at least...
And every single one of them has had a robots.txt file. I don't see the issue.
#4
StraightBro
Industry Role:
Join Date: Aug 2003
Location: Monarch Beach, CA USA
Posts: 56,232
Quote:
It will be mandatory, and you'll have to add all sorts of parameters you don't currently have and likely aren't aware of. If any of them are null, or your robots.txt file isn't exactly how Google wants it, you'll be dinged and your SE placement will suffer.
#5
Pay It Forward
Industry Role:
Join Date: Sep 2005
Location: Yo Mama House
Posts: 76,894
Quote:
#6
StraightBro
Industry Role:
Join Date: Aug 2003
Location: Monarch Beach, CA USA
Posts: 56,232
We agree
#7
Too lazy to set a custom title
Join Date: Mar 2002
Location: Australia
Posts: 17,393
Funny how Google is going on about making a de-facto protocol a standard, when they explicitly ignore a fairly important (IMHO) de-facto directive: Crawl-delay.

Website: I'm asking you nicely to please limit your fetching to once per 60 seconds.
GoogleBot: No.
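For what it's worth, the directive itself is trivial to read out; honouring it is the part Googlebot skips. A small sketch with Python's stdlib parser (made-up robots.txt, hypothetical bot name):

Code:
# Crawl-delay is easy to parse - acting on it is entirely up to the crawler.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 60
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

delay = parser.crawl_delay("SomePoliteBot")   # -> 60
print(f"a polite bot sleeps {delay}s between fetches")
# Googlebot parses the same file but simply doesn't act on Crawl-delay.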
#8
Confirmed User
Industry Role:
Join Date: Jun 2003
Location: Switzerland / Germany / Thailand
Posts: 5,469
Quote:
Actually, Google shows many documents and websites that don't have a robots.txt at all.

Now let's imagine a funny example: a weapons company uploads the newest secret version of a killer machine to their site, Google crawls it and publishes it without any explicit demand to do so, and they would also be in trouble. THE INTERNET law does not exist, and Google works worldwide under the laws of 255 different countries. I think that robots.txt would be the simplest way to allow or deny crawling and publishing of stuff from a site.

We can see everywhere on the internet that rules and laws are being pushed to an excessive point. Users have to agree to cookies, even though this has been a common technique for the past 25 years. In addition, an internet presence is not necessarily a privilege of companies; consumer protection can also apply here to the site operator.
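On the "simplest way to allow or deny" point, the two blanket policies really are one line apart. A throwaway check with Python's stdlib parser (URL and bot name invented):

Code:
# Blanket allow vs. blanket deny - each whole policy fits in two lines.
from urllib.robotparser import RobotFileParser

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty Disallow = crawl anything
DENY_ALL = "User-agent: *\nDisallow: /\n"   # leading slash = block the whole site

for name, rules in (("allow-all", ALLOW_ALL), ("deny-all", DENY_ALL)):
    p = RobotFileParser()
    p.parse(rules.splitlines())
    print(name, p.can_fetch("AnyBot", "https://example.com/secret-prototype.html"))
# allow-all True
# deny-all False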
#9
God Bless You
Industry Role:
Join Date: Aug 2014
Location: Glasgow, $cotland
Posts: 1,467
Quote:
If in the robots file you select which files or directories to bypass, it's possible that Google will comply. But for other bots it will be a gift.
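For the directory case being described, a rough sketch with Python's stdlib parser (paths invented). The catch is exactly the "gift" part: robots.txt is public, so it also advertises what you asked crawlers to skip.

Code:
# Per-directory exclusion - and a reminder that the file itself is public.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private-content/
Allow: /
"""

p = RobotFileParser()
p.parse(ROBOTS_TXT.splitlines())

print(p.can_fetch("Googlebot", "https://example.com/tour/join.html"))        # True
print(p.can_fetch("Googlebot", "https://example.com/private-content/a.jpg")) # False
# Any bot (or person) can still download https://example.com/robots.txt and see
# exactly which paths you tried to keep out - compliance is voluntary.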
__________________
magneto664 📧 gmail.com Adult Backlinks 💘Best Website Stats 💘 Best CDN for Adult Content My Fav: 👍Chaturbate 👍 Stripchat 👍 Dateprofits 👍 AdultFriendFinder |
#10
Industry Role:
Join Date: Aug 2006
Location: Little Vienna
Posts: 32,235
Quote:
#11
Confirmed User
Industry Role:
Join Date: Jun 2003
Location: Switzerland / Germany / Thailand
Posts: 5,469
Quote:
I know very well how a robots.txt works, but the point is that millions of people who have an internet presence don't. If Google crawls something from their site WITHOUT AN EXPLICIT demand to do so, one judge or another may see them as the "victim", and they can sue Google for millions. This is why it would make sense to make robots.txt THE rule for crawling your site, and sites without a robots.txt would not be touched.

Quote:
#12
Confirmed User
Industry Role:
Join Date: Jun 2003
Location: Switzerland / Germany / Thailand
Posts: 5,469
Quote:
The laws in the various countries are so different that you cannot even decide who is a professional who HAS to know this and who is not. When the internet started, nobody ever thought about things like privacy and permission to crawl a page; it was simply assumed that everyone who posts something on the internet wants others to find it. That has changed a lot in the meantime, and the views on right and wrong in the world are so completely different that everything has to be EXPLICITLY allowed and not just assumed.