Old 05-28-2022, 12:04 PM  
dcortez
DINO CORTEZ™
Join Date: Jun 2003
Location: Vancouver Island
Posts: 2,145
To clarify Googlebot and robots.txt...

At face value, Google does honor robots.txt and some of its directives. But there are caveats, which Google itself discloses.

It officially dropped support for the NOINDEX directive in robots.txt in 2019.

It does claim to support the "disallow" directive, but...

https://developers.google.com/search...obots.txt-file

Google: "Warning: Don't use a robots.txt file as a means to hide your web pages from Google search results."

This may be moot for disallowed paths in robots.txt that nobody else links to, such as your outbound affiliate-link bounce scripts. But any external references (outside the realm of your robots.txt) to the same bounce URL may still cause Googlebot to crawl deeper or list the URL anyway.
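To see what a compliant crawler is supposed to do with a Disallow rule, here's a minimal sketch using Python's standard-library robots.txt parser. The `/out/` path and `example.com` domain are hypothetical stand-ins for an affiliate bounce directory; this shows how the rule is *meant* to be read, not what Googlebot actually does in practice.

```python
# Sketch: how a rule-following crawler evaluates a Disallow directive.
# "/out/" is a hypothetical affiliate-bounce directory, not a real path.
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow: /out/",
]

parser = RobotFileParser()
parser.parse(rules)

# A compliant bot must skip anything under /out/ ...
print(parser.can_fetch("Googlebot", "https://example.com/out/sponsor123"))  # False
# ... but everything else stays fair game.
print(parser.can_fetch("Googlebot", "https://example.com/index.html"))      # True
```

Note that `can_fetch()` only tells you whether crawling is permitted. As Google's own warning above makes clear, a disallowed URL can still end up *indexed* if it's linked from elsewhere.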

At face value, suggestions in this thread to disallow paths that reveal a sponsor destination are "correct".

I've seen Googlebot do too many things since its beginning in the nineties that make it impossible for me to take Google at its word.

For example, Google notoriously does, what I call, the "pig and the electric fence" trick.

If you try to keep a pig contained with an electric fence, it's important to note that pigs are exceptionally smart and will constantly "test" the fence for a momentary outage. They are so tenacious that pigs will often bust out, whereas other livestock relies on old memories of what happened when it touched the fence months ago.

I have watched Googlebot test/spider for the existence of pages that neither existed on my websites nor were ever linked to from anywhere. The spider would literally concoct URLs with random text and try them.

Anyhow, as far as I'm concerned, only .htaccess can keep Googlebot out.
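For anyone wanting to go that route, here's a rough .htaccess sketch of the idea: refuse the request at the server instead of asking the bot to behave. It assumes Apache with mod_rewrite enabled, and matching on the User-Agent string is only as reliable as the bot's honesty about its identity, so treat it as a starting point, not a guarantee.

```apacheconf
# Sketch: return 403 Forbidden to any request whose User-Agent
# contains "Googlebot" (case-insensitive). Assumes mod_rewrite.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteRule ^ - [F,L]
```

Unlike robots.txt, this doesn't rely on the crawler's cooperation: the server simply never serves the page.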