05-20-2012, 02:13 PM   #1
Darrell
Confirmed User
 
Join Date: Feb 2003
Location: In bed asleep
Posts: 803
Help with robots.txt please

I don't want search engines to index any pages, in the root folder or in any other folder, that have the word "page" in the URL. For example, I don't want domain.com/page1.php indexed.

I believe Disallow: /page will stop any pages starting with page in the root folder, but can a wild card be added to include all files in folders as well?

Thanks for any help.
05-20-2012, 02:16 PM   #2
CYF
Coupon Guru
 
Join Date: Mar 2009
Location: Minneapolis
Posts: 10,973
User-agent: *
Disallow: /

or:

User-agent: *
Disallow: /folder

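Disallow: / blocks crawling of your whole site, and Disallow: /folder blocks every URL whose path starts with /folder. That covers everything in that folder, but also paths such as /folder.html or /folder-old/.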
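
To illustrate the plain prefix matching these two rules rely on, here is a minimal Python sketch. `blocked_by_prefix` is a hypothetical helper written for this thread, not part of any library, and it ignores Allow lines and wildcards.

```python
def blocked_by_prefix(path, disallow_rules):
    """Return True if any Disallow value is a prefix of the URL path.

    An empty Disallow value blocks nothing, so it is skipped.
    """
    return any(rule and path.startswith(rule) for rule in disallow_rules)


if __name__ == "__main__":
    # "/" is a prefix of every path, so it blocks the whole site.
    print(blocked_by_prefix("/anything.html", ["/"]))
    # "/folder" blocks the folder's contents and also "/folder.html".
    print(blocked_by_prefix("/folder/page1.php", ["/folder"]))
    print(blocked_by_prefix("/folder.html", ["/folder"]))
    # "/page" alone does not reach files inside other folders.
    print(blocked_by_prefix("/folder/page1.php", ["/page"]))
```

The last call shows why a rule like /page on its own only covers the root folder: the path has to start with the rule text.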
__________________
Webmaster Coupons Coupons and discounts for hosting, domains, SSL Certs, and more!
AmeriNOC Coupons | Certified Hosting Coupons | Hosting Coupons | Domain Name Coupons

05-20-2012, 02:29 PM   #3
Darrell
Confirmed User
 
Join Date: Feb 2003
Location: In bed asleep
Posts: 803
Thanks for the quick reply but you didn't understand my question. I want every folder and page to be indexed except for any file that has the word page in it.

So for example I don't want the following indexed:
domain.com/page1.php
domain.com/page2.php
domain.com/folder/page1.php
domain.com/folder/page2.php

I read somewhere that Disallow: */page- should do that but I am not sure.
05-20-2012, 02:51 PM   #4
CYF
Coupon Guru
 
Join Date: Mar 2009
Location: Minneapolis
Posts: 10,973
Not all robots support wildcards. Google and Yahoo do; I'm not sure about the others.

This should work for what you want to do:

User-agent: *
Disallow: /folder/page*
Disallow: /page*
05-20-2012, 02:56 PM   #5
CYF
Coupon Guru
 
Join Date: Mar 2009
Location: Minneapolis
Posts: 10,973
You could also use the following, but note that it blocks every URL containing "php", not just the page files:

Disallow: /*php
Disallow: /folder/*php
05-20-2012, 09:27 PM   #6
acrylix
Confirmed User
 
Join Date: Oct 2006
Posts: 362
Quote:
Originally Posted by Darrell
I believe Disallow: /page will stop any pages starting with page in the root folder, but can a wild card be added to include all files in folders as well?
Yes, you could do this:

User-agent: *
Disallow: /page
Disallow: /*/page

But that will also block all folders beginning with "page", for example "mysite.com/pages-with-the-hottest-babes/" will be blocked.


More specifically, if these pages are all PHP files, use this instead:

User-agent: *
Disallow: /page*.php
Disallow: /*/page*.php


You can test with a tool like this: http://tools.seobook.com/robots-txt/analyzer/
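
For a quick local check of the patterns above, a minimal Python sketch could look like this. It assumes the wildcard semantics Google documents ("*" matches any run of characters, a trailing "$" anchors the end of the path); other crawlers may behave differently. `rule_to_regex` and `is_blocked` are hypothetical helper names, and the sketch ignores Allow lines and rule precedence.

```python
import re


def rule_to_regex(rule):
    """Translate one Disallow pattern into a compiled regex.

    '*' becomes '.*'; a trailing '$' becomes an end-of-path anchor.
    Everything else is matched literally.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, disallow_rules):
    """Return True if any rule matches from the start of the URL path."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


if __name__ == "__main__":
    rules = ["/page*.php", "/*/page*.php"]
    for path in ["/page1.php", "/folder/page2.php",
                 "/pages/index.html", "/index.php"]:
        print(path, is_blocked(path, rules))
```

With the narrower .php rules, a folder such as /pages/ is left alone, while the broader pair /page and /*/page would block it.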


However, it seems you're asking how to prevent these pages from being indexed. Note the difference between "Disallow" and "noindex". Disallow will not prevent those URLs from appearing in the index; it only tells search engines not to crawl those files and folders. If they are linked to from somewhere on your site (or from someone else's), they can still appear in the index whether you disallow them or not. Google, for example, will still index the URL, using anchor text to describe it.

To prevent these pages from being indexed, you should use the <meta name="robots" content="noindex"> tag on them. Crawlers have to fetch a page to see that tag, so don't also Disallow those pages in robots.txt.

If you can't do that, then look into adding a noindex X-Robots-Tag to your .htaccess file. Scroll down to the bottom of this page, and look for "Practical implementation of X-Robots-Tag with Apache" for some examples on how to do that:

https://developers.google.com/webmas...obots_meta_tag
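
To confirm that a page actually carries a noindex signal, something like the following Python sketch can help. It takes response headers and HTML you have already fetched and looks for "noindex" in either an X-Robots-Tag header or a robots meta tag. `has_noindex` and `RobotsMetaParser` are hypothetical names, and the sketch only handles a plain comma-separated directive list (no per-crawler prefixes such as "googlebot:").

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(headers, html):
    """Return True if the header or the meta tag asks for noindex."""
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]

    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head></html>'
    print(has_noindex({}, page))
    print(has_noindex({"X-Robots-Tag": "noindex, nofollow"}, "<html></html>"))
    print(has_noindex({}, "<html><head></head></html>"))
```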