Quote:
Originally Posted by Darrell
I believe Disallow: /page will stop any pages starting with page in the root folder, but can a wild card be added to include all files in folders as well?
Yes, you could do this:
User-agent: *
Disallow: /page
Disallow: /*/page
But that will also block any folder whose name begins with "page"; for example, "mysite.com/pages-with-the-hottest-babes/" would be blocked.
So, more specifically, if these pages are all PHP files, use this instead:
User-agent: *
Disallow: /page*.php
Disallow: /*/page*.php
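To illustrate (hypothetical URLs, assuming Google's documented wildcard handling):
/page1.php — blocked by Disallow: /page*.php
/archive/page-old.php — blocked by Disallow: /*/page*.php
/pages-with-the-hottest-babes/ — not blocked, since it doesn't contain .php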
You can test with a tool like this:
http://tools.seobook.com/robots-txt/analyzer/
However, it seems you're asking how to prevent these pages from being indexed. Note the difference between "Disallow" and "noindex". Disallow will not prevent those URLs from appearing in the index; it only tells search engines not to crawl those files/folders. If they are being linked to from somewhere on your site (or from someone else's), they will still appear in the index whether you disallow them or not. Google, for example, will still index the URL, using anchor text to describe it.
To prevent these pages from being indexed, you should use the <meta name="robots" content="noindex"> tag on them.
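That tag goes in the <head> of each page you want kept out of the index; a minimal sketch (the rest of the head is whatever you already have):
<head>
<meta name="robots" content="noindex">
...
</head>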
If you can't do that, then look into adding a noindex X-Robots-Tag to your .htaccess file. Scroll down to the bottom of this page and look for "Practical implementation of X-Robots-Tag with Apache" for some examples of how to do that:
https://developers.google.com/webmas...obots_meta_tag
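As a rough sketch, assuming Apache with mod_headers enabled and that the file names really do all start with "page", something like this in .htaccess would add the header for those PHP files:

# Send a noindex header for any file named page*.php
<FilesMatch "^page.*\.php$">
Header set X-Robots-Tag "noindex"
</FilesMatch>

One caveat: for noindex (meta tag or header) to take effect, search engines must be able to crawl those URLs, so don't combine it with a robots.txt Disallow on the same pages.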